[cfe-dev] [RFC] cHash: Integration of AST hashing

Sat Aug 12 09:33:47 PDT 2017

Christian Dietrich <dietrich at sra.uni-hannover.de> writes:

> The benefit of doing it on the token stream is that you can avoid the
> expensive parsing.

And it works with any compiler that supports the -E option.

> But I wonder how hashing the token stream can be any better than doing
> a textual hash on the preprocessed code before lexing (as ccache
> does)?

Probably not by much (it can ignore some redundant #line directives, etc.).
The reason we do it this way in build2 is to be ready for the day when
we no longer need preprocessing, at least for some translation units
(with C++ modules being the first step in that direction). Such
translation units will still contain comments, line continuations,
etc., which we would want to ignore.
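
To illustrate the kind of normalization meant here, below is a minimal
C++ sketch. It is not build2's implementation and not a real lexer: the
names (splice, normalize, fnv1a, source_hash) are hypothetical, FNV-1a
merely stands in for whatever hash one would actually use, and raw string
literals and digit separators are not handled. It splices line
continuations, drops comments and "# <n>"/#line markers, collapses
whitespace, and hashes the result. It is deliberately conservative, so
"a+b" and "a + b" still hash differently, which a true token-stream hash
would avoid.

```cpp
#include <cctype>
#include <cstddef>
#include <cstdint>
#include <string>

// Remove line continuations (a backslash immediately followed by a newline).
std::string splice(const std::string& s) {
  std::string r;
  for (std::size_t i = 0; i < s.size(); ++i) {
    if (s[i] == '\\' && i + 1 < s.size() && s[i + 1] == '\n') { ++i; continue; }
    r += s[i];
  }
  return r;
}

// Hypothetical helper: reduce source text to a canonical form. Comments and
// whitespace runs become one space, line markers are dropped, string and
// character literals are copied verbatim, and the newline that ends a
// remaining directive (e.g. #pragma) is kept because it is significant.
std::string normalize(const std::string& in) {
  const std::string s = splice(in);
  const std::size_t n = s.size();
  std::string r;
  char sep = 0;            // pending separator: 0 (none), ' ' or '\n'
  bool bol = true;         // no token seen yet on the current line
  bool directive = false;  // inside a preprocessor directive

  for (std::size_t i = 0; i < n;) {
    const char c = s[i];
    if (c == '\n') {
      if (directive) sep = '\n'; else if (!sep) sep = ' ';
      directive = false; bol = true; ++i; continue;
    }
    if (std::isspace(static_cast<unsigned char>(c))) {
      if (!sep) sep = ' ';
      ++i; continue;
    }
    if (c == '/' && i + 1 < n && s[i + 1] == '/') {   // line comment
      i = s.find('\n', i);
      if (i == std::string::npos) i = n;
      if (!sep) sep = ' ';
      continue;
    }
    if (c == '/' && i + 1 < n && s[i + 1] == '*') {   // block comment
      i = s.find("*/", i + 2);
      i = (i == std::string::npos) ? n : i + 2;
      if (!sep) sep = ' ';
      continue;
    }
    if (bol && c == '#') {
      std::size_t j = i + 1;
      while (j < n && (s[j] == ' ' || s[j] == '\t')) ++j;
      // "# 12 ..." or "#line 12 ...": position info only, skip the line.
      if (j < n && (std::isdigit(static_cast<unsigned char>(s[j])) ||
                    s.compare(j, 4, "line") == 0)) {
        while (i < n && s[i] != '\n') ++i;
        continue;
      }
      directive = true;
    }
    bol = false;
    if (sep && !r.empty()) r += sep;  // emit at most one separator
    sep = 0;
    if (c == '"' || c == '\'') {      // copy a literal verbatim
      r += c; ++i;
      while (i < n) {
        const char d = s[i++];
        r += d;
        if (d == '\\' && i < n) r += s[i++];  // keep escaped character
        else if (d == c) break;               // closing quote
      }
      continue;
    }
    r += c; ++i;
  }
  return r;
}

// 64-bit FNV-1a, used here only as a simple stand-in hash.
std::uint64_t fnv1a(const std::string& s) {
  std::uint64_t h = 14695981039346656037ull;
  for (unsigned char c : s) { h ^= c; h *= 1099511628211ull; }
  return h;
}

std::uint64_t source_hash(const std::string& src) {
  return fnv1a(normalize(src));
}
```

With this sketch, source_hash() gives the same value for "int x; // c"
and "int   x;", whether the input is the output of -E or a file that
was never preprocessed.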

> In the current prototype, we do not include debugging information.
> Therefore, the hash does not change if you introduce a newline into the
> source. In the future, line numbers/filenames should be included if
> debugging information is requested.

In our experience, this feature is most useful during development
when debug info is almost always required.

Boris