[cfe-dev] [RFC] cHash: Integration of AST hashing
Christian Dietrich via cfe-dev
cfe-dev at lists.llvm.org
Mon Oct 23 07:00:26 PDT 2017
First of all, I'm sorry for the long delay in my answer. There was
vacation and an awful lot of traveling in August and September.
Johannes Altmanninger <aclopte at gmail.com> writes:
>> <UNIQUE-NODE-CLASS-ID> <FIELD1> <FIELD2> <FIELD3>
>
> We should probably do the same, maybe also for other enum members.
>
> In addition, it may be necessary to add some padding or separator
> between when hashing multiple nodes as they have different numbers of
> fields, I am not an expert though.
I would see two basic cases:
1. The number of fields is constant -> The knowledge is derivable from
the UNIQUE-NODE-CLASS-ID.
2. Variable number of fields (or children): Just hash in a 4 byte
integer.
>> - Stable?: Should the hash be stable over compiler invocations? Or can
>> we take some shortcuts for types/declarations/things that are not
>> syntactially enclosed into the node.
>
> Hopefully it is not too difficult to make it stable.
>
> If there is a node that is referred to by its name, then it suffices to hash the
> qualified name, provided that the referenced node is also included in
> the hash.
I see one problem here, at least if we want to make a syntactial hash,
and I believe that we have to include not only the qualified name, but
the hash behind that name.
I think a problem arises, if we make one pass over the whole tree
(starting from a TranslationUnit) and record not only the top-level
hash, but also every function level hash. Let's assume the following two
scenarios, where we expect both function hashes of foo to be equal:
: struct bar { int x; };
:
: void foo(struct bar y) {};
hash(translation-unit) = hash(foo) + hash(struct bar)
where
hash(foo) = hash(void) + hash(foo)
+ hash("struct bar") + hash(y) + hash({})
hash(struct bar) = ....
: struct bar { long x; };
:
: void foo(struct bar y) {};
hash(translation-unit) = hash(foo) + hash(baz) + hash(struct bar)
where
hash(foo) = hash(void) + hash(foo)
+ hash("struct bar") + hash(y) + hash({})
hash(struct bar) = ....
In this case, the top-level hash changes, as struct bar changes, but the
hash for 'foo' does not change. This is OK for a syntactic hash, but not
for a semantic hash.
> So there are references to other nodes by name.
> All other references between nodes I can think of are based on the tree
> structure (correct me if I am wrong), so we should be fine if we hash
> subtrees.
It depends. You can have an expression that references a type without
explicitly naming it:
: struct bar { int x }
: struct foo { struct bar * BAR; }
:
: int foo() {
: struct foo FOO;
: FOO.BAR.x = 23;
: return FOO.BAR.x;
: }
:
: int barfoo() {
: struct *foo FOO = mallow(23);
: }
Here, we see that that:
a) foo depends on a type it does not name explicitly, but only
implicitly, by using foo.
b) The binary of barfoo does NOT change, if 'struct bar' changes.
Therefore, I believe it is necessary to include all type references with
their hash for the semantic hash.
As I see, you have done a lot of work on the StmtDataCollectors and used
TableGen to generate the DataCollector. So my idea would be the following:
- Extend the StmtDataCollectors.td to handle all possible nodes/types in
the clang AST. Invoking this operator once, we get the hash value for
one node.
- Derive an recursive AST hashing visitor from RecursiveASTVisitor. This
recursive AST visitor then has our desired flags:
- semantic_hash vs syntactic_hash
- Hash mechanism as a template parameter
- Include debug flags
One problem I have with RecursiveASTVisitor is either preorder, or post
order. However, to calculate an Hash for let's say a type, I have to:
PreVisitType: H = new Hash
VisitFuncType: H.addData(FUNC_TYPE)
hash(T.getReturnType())
-> Recursion
PostVisitType: Store Hash for function
And I have not seen, how I could do this with RecursiveASTVisitor
chris
--
Christian Dietrich, M.Sc. (Scientific Staff)
Institute for Systems Engineering (Systems and Computerarchitecture)
Leibniz Universität Hannover
Appelstraße 4
30167 Hannover, Germany
Tel: +49 511 762-19737
Fax: +49 511 762-19733
eMail: dietrich at sra.uni-hannover.de
WWW: https://www.sra.uni-hannover.de
More information about the cfe-dev
mailing list