[PATCH] D136624: [clang][modules] Account for non-affecting inputs in `ASTWriter`

Jan Svoboda via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Tue Nov 1 13:10:12 PDT 2022


jansvoboda11 added a comment.

In D136624#3893607 <https://reviews.llvm.org/D136624#3893607>, @dexonsmith wrote:

> I'm curious; are system modules allowed to be non-affecting yet, or are they still assumed to be affecting? (It's the system modules that I think are most likely to be adjacent.)

Yes, all the measurements are done with system modules allowed to be non-affecting.

> My intuition is that there is likely some peephole that would be quite effective, that might not be useful for general `getFileID()` lookups.
>
> - I already suggested "same as last lookup?"... I'm curious if that'll help. Maybe that's already in `getFileID()`, but now that you've factored out that call, it could be useful to replicate.

I tried that and we seemingly never hit that case. I think that's because these two optimizations take precedence:

> - You could also try: "past the the last non-affecting module?"
> - You could also try: "before the first non-affecting module?"

These avoid the binary search for the vast majority of calls to `getAdjustment()` with local offsets.

> I suspect you could collect some data to guide this, such as, for loaded locations (you could ignore "local" locations since they already have a peephole):
>
> - Histogram of "loaded" vs. "between" vs. "after" non-affecting modules.
> - Histogram of "same as last" vs. "same as last-1" vs. "different from last 2".
> - [...]

Loaded locations make up 68% of all calls to `getAdjustment()`, so it's good it's the first thing we check for. Around 4% of all calls are for locations before the first non-affecting module. That's the second special case. Around 28% are pointing after the last non-affecting module, and the current revision has a special case for that as well. I think it would make sense to prioritize this check. Only the remaining 0.3% of calls do the actual binary search.

> Other things that might be useful to know:
>
> - What effect is the merging having (or would it have)? (i.e., what's the histogram of "adjacent" non-affecting files? (e.g.: 9 ranges of non-affecting files, with two blocks of 5 files and seven blocks of 1 (which aren't adjacent to any others)))

Big one. We usually get between 70-100 non-affecting module maps, but they merge into 4-6 consecutive regions.

> - Is there a change in cycles/instructions when the module cache is hot? (presumably the common case)

I didn't notice this (but didn't look for it specifically). How could that affect performance for PCM writes?

> - Are the PCM artifacts smaller?

Yes, we leave out the 70-100 SLocEntries. Nothing much changes otherwise.

> - Are the PCMs bit-for-bit identical now when a non-affecting module is added to the input? (If not, why not?)

They are not in the current revision, but I'll create a patch that makes them so (we also need to adjust the `NumCreatedFileIDs`).

> - What's the data for implicitly-discovered, explicitly-built modules?

I didn't measure that. I don't expect much of a difference, though. The scanner has an empty AST, so the number of `SourceLocation`s will be pretty low. (Other parts of the PCM file, like the metadata, don't write much `SourceManager` offsets.) Even if there was a small regression, I'd be fine with it, since we'll be moving off of implicit build in the scanner. For the explicit builds themselves, this shouldn't affect performance at all. Implicit module map search is disabled, all necessary/affecting module maps are deserialized from the PCM files or provided on the command-line.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D136624/new/

https://reviews.llvm.org/D136624



More information about the cfe-commits mailing list