[cfe-dev] [RFC] cHash: Integration of AST hashing

Alex Lorenz via cfe-dev cfe-dev at lists.llvm.org
Thu Aug 3 03:57:16 PDT 2017


Hey,

This looks like an awesome project! It would be great to see something like this in Clang itself.
What impact does the hashing itself have on the build time? I.e. If I were to fully rebuild a project from scratch, how much longer would it take?

Alex

> On 3 Aug 2017, at 11:41, Christian Dietrich via cfe-dev <cfe-dev at lists.llvm.org> wrote:
> 
> Hello!
> 
> First, I'd like to give you a little bit of context, before I get into
> the details what all of this has to do with the cfe-dev list.
> 
> TL;DR: cHash calculates an hash over the AST and, therefore, allows
>       the detection of redundant builds.
> 
> In the last months, we have been working on a mechanism to avoid
> redundant recompilations. In C/C++ projects, we observe that a change
> to a central header file can cause a lot of compiler invocations.
> However, not all compiler runs will lead to an object file that
> differs from the last invocation. For example, if I add a typedef
> that is never used in any .c file, none of the object files will
> change.
> 
> Our approach to this problem is that we hook into the compiler front
> end, after the parsing and the semantic analysis and recursively
> calculate an AST hash. We start with all definitions (variables and
> functions) and include all relevant AST-node properties and the hashes
> of all children and referenced types. Thereby, we get a hash value
> that changes if a changed translation unit would result in a different
> object file. As a byproduct, we get a hash for every function, every
> referenced type, and all global variables.
> 
> So far, we have only implemented cHash for C, not yet for C++. We have
> already published our current version of the clang plugin[1] and are
> also working on a GCC version.
> 
> Furthermore, we presented our approach at USENIX ATC (and got a best
> paper award, yay!)[2]. Now, we consider proposing some parts of cHash
> for mainline.
> 
> Our impression is that the AST hashing mechanism[3], which calculates
> a semantic fingerprint for a given AST node is not only useful to
> detect redundant builds. AST hashes are (computationally) fast to
> calculate, easy to store, easy to calculate, and they capture all of
> the semantic of an AST node. These properties should ease other techniques:
> 
> - partial recompilation: we now know at an early state that a function
>  body has changed directly or indirectly.

Do you currently hash the full AST at once, or do you also compute/store hashes for individual functions?

> - change impact analysis: we can now quantify which parts of the
>  program is actually touched, even if we only modfiy a type.
> - project-wide analysis: have all types with the same name also
>  the name hash, or is there an inconsistency?
> 
> Besides the AST hashing mechanism, cHash also includes the caching and
> compiler abortion logic. However, in my opinion this is better located
> in an external tool, or a plugin. For example, a more close
> integration of CLang with CCache, could give the compilation speedups
> that we have observed in our prototype.
> 
> Before I start on preparing parts of the chash plugin to go upstream
> (there is definitely some cleanup required), I would love to hear your
> opinions on the following topics:
> 
> - Would you like to see the AST hashing in clang mainline?
> - Would you like to have a standalone tool that calculates AST hashes?
> - Should the detection of redundant compilations be part of CLang mainline?
> - Should we integrate the test-suite[4] into the repo? If yes, how?
> 
> chris
> 
> [1] https://github.com/luhsra/chash
> [2] https://www.usenix.org/conference/atc17/technical-sessions/presentation/dietrich
> [3] https://github.com/luhsra/chash/blob/master/clang-plugin/hash-visitor.cc
> [4] https://github.com/luhsra/chash/tree/master/test
>    https://github.com/luhsra/chash/blob/master/test/control_flow/if/IfStmt_10.c
> -- 
> Christian Dietrich, M.Sc. (Scientific Staff)
> Institute for Systems Engineering (Systems and Computerarchitecture)
> Leibniz Universität Hannover
> Appelstraße 4
> 30167 Hannover, Germany
> 
> Tel:    +49 511 762-19737
> Fax:    +49 511 762-19733
> eMail:  dietrich at sra.uni-hannover.de
> WWW:    https://www.sra.uni-hannover.de
> _______________________________________________
> cfe-dev mailing list
> cfe-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev




More information about the cfe-dev mailing list