[cfe-dev] Question about hashing AST nodes

Wed Jan 30 21:30:29 PST 2019

First of all, why do you want to hash them? Like, are you serializing 
these hashes to disk or sending them over network? Are these hash values 
allowed to change between runs as long as equal nodes still get equal 
hashes within a single run?

Just in case you missed it, if the hashes you're looking for are never 
going to outlive a single Clang invocation, then you can simply compare 
AST nodes by pointers. For example, if a function call 'foo(x)' refers 
to a variable 'x', then it's going to have a DeclRefExpr that points to 
a VarDecl for 'x', and the pointer to VarDecl that you're going to 
retrieve from that DeclRefExpr is going to be exactly the same pointer 
that you'll retrieve by looking at the actual declaration of the 
variable somewhere above the call. Every AST node is stored exactly once 
in memory and is never moved to a different place.

Well, there are also forward declarations that are different from the 
definition (the latter may not even exist in the current translation 
unit), and sometimes some statements may refer to a forward declaration, 
but it's easy to obtain a "canonical" declaration in all cases.

If you want something that's more stable across runs (pointers would 
obviously change every time), you can have a look at the getID() method 
in various AST nodes, which produces an integer that doesn't change 
across runs but does change across host machines. These stable IDs are 
mostly for debugging, but if your work is not supposed to interact with 
other computers, these may do.

If you want a comparison up to a certain property (eg., you don't want 
to discriminate between "x + 1" and "1 + x" but you do want to 
discriminate both of them from "x + y"), then you might want to have a 
look at the CloneDetection framework that conducts a series of passes to 
find similar statements in different places of the program. One of the 
existing passes involves some actual hashing. Also the infrastructure 
for making these passes is re-usable for making different sets of passes 
in order to look for different similarities.

On 1/29/19 1:49 PM, Kevin Sullivan via cfe-dev wrote:
> Dear Clang Community:
>
> A colleague and I need to use AST nodes as keys in a map. For example, 
> having entered a few variable declaration nodes as keys, linking to 
> associated meta-data, and now looking at a member function application 
> node, with the variables as arguments, we'd like to be able to query 
> the map to find the associated meta-data for those variable objects.
>
> To this end, we need to implement hash and == for AST nodes. That's 
> what's needed, e.g., by the C++ unordered map class.
>
> There's little available information as far as we can tell on this 
> topic. Would you be so kind as to let us know the best way to do this, 
> in your view? Is ODRHash the key to unlocking a solution? Are there 
> other approaches you'd recommend?
>
> Kevin Sullivan
> University of Virginia
>
> _______________________________________________
> cfe-dev mailing list
> cfe-dev at lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev