[cfe-dev] Question about hashing AST nodes
Artem Dergachev via cfe-dev
cfe-dev at lists.llvm.org
Wed Jan 30 21:30:29 PST 2019
First of all, why do you want to hash them? Like, are you serializing
these hashes to disk or sending them over network? Are these hash values
allowed to change between runs as long as equal nodes still get equal
hashes within a single run?
Just in case you missed it, if the hashes you're looking for are never
going to outlive a single Clang invocation, then you can simply compare
AST nodes by pointers. For example, if a function call 'foo(x)' refers
to a variable 'x', then it's going to have a DeclRefExpr that points to
a VarDecl for 'x', and the pointer to VarDecl that you're going to
retrieve from that DeclRefExpr is going to be exactly the same pointer
that you'll retrieve by looking at the actual declaration of the
variable somewhere above the call. Every AST node is stored exactly once
in memory and is never moved to a different place.
Well, there are also forward declarations that are different from the
definition (the latter may not even exist in the current translation
unit), and sometimes some statements may refer to a forward declaration,
but it's easy to obtain a "canonical" declaration in all cases.
If you want something that's more stable across runs (pointers would
obviously change every time), you can have a look at the getID() method
in various AST nodes, which produces an integer that doesn't change
across runs but does change across host machines. These stable IDs are
mostly for debugging, but if your work is not supposed to interact with
other computers, these may do.
If you want a comparison up to a certain property (eg., you don't want
to discriminate between "x + 1" and "1 + x" but you do want to
discriminate both of them from "x + y"), then you might want to have a
look at the CloneDetection framework that conducts a series of passes to
find similar statements in different places of the program. One of the
existing passes involves some actual hashing. Also the infrastructure
for making these passes is re-usable for making different sets of passes
in order to look for different similarities.
On 1/29/19 1:49 PM, Kevin Sullivan via cfe-dev wrote:
> Dear Clang Community:
>
> A colleague and I need to use AST nodes as keys in a map. For example,
> having entered a few variable declaration nodes as keys, linking to
> associated meta-data, and now looking at a member function application
> node, with the variables as arguments, we'd like to be able to query
> the map to find the associated meta-data for those variable objects.
>
> To this end, we need to implement hash and == for AST nodes. That's
> what's needed, e.g., by the C++ unordered map class.
>
> There's little available information as far as we can tell on this
> topic. Would you be so kind as to let us know the best way to do this,
> in your view? Is ODRHash the key to unlocking a solution? Are there
> other approaches you'd recommend?
>
> Kevin Sullivan
> University of Virginia
>
> _______________________________________________
> cfe-dev mailing list
> cfe-dev at lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
More information about the cfe-dev
mailing list