[cfe-dev] [RFC] cHash: Integration of AST hashing

Richard Trieu via cfe-dev cfe-dev at lists.llvm.org
Tue Dec 12 18:49:43 PST 2017


On Tue, Dec 12, 2017 at 12:24 AM, Christian Dietrich <
dietrich at sra.uni-hannover.de> wrote:

> Richard Trieu <rtrieu at google.com> writes:
>
> > On Mon, Dec 11, 2017 at 5:14 AM, Christian Dietrich <
> > dietrich at sra.uni-hannover.de> wrote:
> >
> >> Richard Trieu <rtrieu at google.com> writes:
> >>
> >> > I don't understand your need for pseudrandom identifiers.  If a
> specific
> >> > type of AST node (for instance, a Decl) is expected, then adding to
> the
> >> > stream the node kind (like from Decl::getKind()) is enough to avoid
> >> > collisions.  If any node can be expected, then first pass in an enum
> to
> >> > differentiate the different types (Decl, Stmt, Type, Attr, etc), then
> add
> >> > the node kind, and then continue on.  Stmt::Profile and ODRHash use
> this
> >> > method to avoid collisions.  I don't see any gain from using new
> >> > identifiers here.
> >>
> >> Ok, so let's explain a little bit my idea here: Of course, including the
> >> Decl::getKind() also distinguishes node types from each other. However,
> >> these values come from enums and are, therefore, small integers. Small
> >> integers (SI) occur very often in an AST. So, if a sequence of addData()
> >> operations produces the same sequence of SIs because node types are not
> >> distinguishable from SI that occur in the AST, we get a collision[1].
> >>
> > How does the result of Decl::getKind() being a small integer allows any
> > chance of collision?  I'll admit that arbitrary sub-sequences of
> different
> > hash streams can be equal, but we shouldn't be hashing sub-sequences.
>
> Sure, we shouldn't take an arbitrary subsequence of the stream and hash
> it. That would be a bad idea.
>
> > Decl
> > <Decl::getKind()>
> >
> > Decl with two fields
> > <Decl::getKind()> <Field 1> < Field 2>
> >
> > Decl with one sub-Decl
> > <Decl::getKind()> <Stream of sub-decl>
> >
> > Decl with arbitrary number of sub-Decl's
> > <Decl::getKind()> <Number of sub-Decl's> <Stream of sub-decl 1> ...
> <Stream
> > of sub-decl N>
> >
> > The result of Decl::getKind() will basically specify the format of the
> hash
> > stream.  I think better controlling of the input stream is a better way
> of
> > avoiding collisions then using magic numbers as identifiers.
> >
> > We could see the same sequence for different ASTs, if an AST node has a
>
> Sure, this would definitly be a better option. A quick look into the
> RecursiveASTVisitor tells me that there are at least 76 for (...) loops.
> So we have to add them.
>
> Perhaps we can combine both, length-indicators and and
> non-small-integers, by using llvm::hash on the <Decl::getKind()> values.
> As the documentation says, we have to spent less than 20 cycles per
> 32-bit value on this. By this, we avoid the magic numbers. I will start
> on doing the changes to incorporate the length fields.
>
> Furthermore, I'm not sure how the clang community handles the "how to
> find a reviewer for my patch". Are people adding themselves to be a
> reviewer? Or do I just add people how I think could be interessted in
> reviewing the change?
>
>  People will add themselves as reviewers, so you can CC people that you
think may have an interest in it.  If neither happens, adding a code owner
and they will help find reviewers.

Personally, I've only looked through small portions of your patch partially
because I have other projects to work on and partially because I was
waiting on your answers to Richard Smith's queries.  Basically, your plan
for integrating with the existing hash functions and how the other hash
functions, especially ODRHash, doesn't fill your need.  From what I gather,
you think the ODRHash is too strict on the AST structure.  I would like to
understand what differences in the AST that you are fine ignoring.


> chris
> --
> Christian Dietrich, M.Sc. (Scientific Staff)
> Institute for Systems Engineering (Systems and Computerarchitecture)
> Leibniz Universität Hannover
> Appelstraße 4
> 30167 Hannover, Germany
>
> Tel:    +49 511 762-19737
> Fax:    +49 511 762-19733
> eMail:  dietrich at sra.uni-hannover.de
> WWW:    https://www.sra.uni-hannover.de
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20171212/62a264df/attachment.html>


More information about the cfe-dev mailing list