[PATCH] D136624: [clang][modules] Account for non-affecting inputs in `ASTWriter`
Duncan P. N. Exon Smith via Phabricator via cfe-commits
cfe-commits at lists.llvm.org
Thu Oct 27 08:01:36 PDT 2022
dexonsmith added a comment.
In D136624#3888387 <https://reviews.llvm.org/D136624#3888387>, @jansvoboda11 wrote:
> I tried optimizing this patch a bit. Instead of creating compact data structure and using binary search to find the preceding non-affecting file, I now store the adjustment information for each `FileID` in a vector. During deserialization, `FileID` is simply used as an index into `SLocEntryInfos`. That didn't yield any measurable improvement in performance, though. I think the regression must be coming from the `SourceLocation`/`Offset` to `FileID` translation.
>
> I don't see any obvious way to work around that. `SourceManager::getFileIDLocal()` already implements some optimizations that makes accessing nearby offsets fast. A separate `SourceManager` would avoid this bottleneck, but I'm not sure how much work that would entail (seems substantial).
>
> @Bigcheese LMK if you're fine with the performance implications here.
I don't think you need to call `getFileID()` inside `getAdjustment()`. There is also an opportunity for a peephole, if `getAdjustment()` remains expensive.
I've left some comments on the previous version of the patch since it's not obvious to me how to avoid the `getFileID()` call in the new version.
In D136624#3888097 <https://reviews.llvm.org/D136624#3888097>, @jansvoboda11 wrote:
> The serialization slowdown I understand, but I expected deserialization to get faster, since we now have less of `SourceManager` to look through.
Seems worth digging into the deserialization regression. Does the PCM actually get smaller and the ranges more condensed?
One quick test would be to manufacture a situation where two output PCMs would previously have different non-affecting inputs, but now should be bit-for-bit identical. Are they, in fact, bit-for-bit identical? If not, maybe there's something funny to look into...
================
Comment at: clang/include/clang/Serialization/ASTWriter.h:449-452
+ /// Exclusive prefix sum of the lengths of preceding non-affecting inputs.
+ std::vector<SourceLocation::UIntTy> NonAffectingInputOffsetAdjustments;
+ /// Exclusive prefix sum of the count of preceding non-affecting inputs.
+ std::vector<unsigned> NonAffectingInputFileIDAdjustments;
----------------
Can you collect a histogram for how big these vectors are? Can we avoid pointer chasing in the common case by making them `SmallVector` of some size during lookup?
================
Comment at: clang/lib/Serialization/ASTWriter.cpp:2054
// Starting offset of this entry within this module, so skip the dummy.
- Record.push_back(SLoc->getOffset() - 2);
+ Record.push_back(getAdjustedOffset(SLoc->getOffset()) - 2);
if (SLoc->isFile()) {
----------------
Can we shift this `getAdjustedOffset()` computation to after deciding whether to skip the record?
================
Comment at: clang/lib/Serialization/ASTWriter.cpp:5282-5283
+
+SourceLocation::UIntTy
+ASTWriter::getAdjustment(SourceLocation::UIntTy Offset) const {
+ if (PP->getSourceManager().isLoadedOffset(Offset) ||
----------------
How often does `getAdjustment()` return the same answer in consecutive calls? If at all common, this would likely benefit from a peephole:
```
lang=c++
Optional<SLocRange> ASTWriter::CachedAdjustmentRange;
Optional<UIntTy> ASTWriter::CachedAdjustment;
SourceLocation::UIntTy
ASTWriter::getAdjustment(SourceLocation::UIntTy Offset) const {
// Check for 0.
//
// How fast is "isLoadedOffset()"? Can/should we add a peephole, or is it just bit
// manipulation? (I seem to remember it checking the high bit or something, but if
// it's doing some sort of look up, maybe it should be in the slow path so it can
// get cached by LastAdjustment.)
if (PP->getSourceManager().isLoadedOffset(Offset) ||
NonAffectingInputs.empty())
return 0;
// Check CachedAdjustment.
if (CachedAdjustment && CachedAdjustmentRange->includes(Offset))
return *CachedAdjustment;
// Call getAdjustmentSlow, which updates CachedAdjustment and CachedAdjustmentRange.
// It's out-of-line so that `getAdjustment` can easily be inlined without inlining
// the slow path.
//
// LastAdjustmentRange would be the size of the "gap" between this adjustment
// level and the next one (end would be UINTMAX if it's after the last
// non-affecting range).
return getAdjustmentSlow(Offset);
}
```
================
Comment at: clang/lib/Serialization/ASTWriter.cpp:5289-5290
+ ? NonAffectingInputs.end()
+ : llvm::lower_bound(NonAffectingInputs,
+ PP->getSourceManager().getFileID(Offset));
+ unsigned Idx = std::distance(NonAffectingInputs.begin(), It);
----------------
Why do you need to call `getFileID()` here?
Instead, I would expect this to be a search through a range of offsets (e.g., see my suggestion at https://reviews.llvm.org/D106876#3869247 -- `DroppedMMs` contains SourceLocations, not FileIDs).
Two benefits:
1. You don't need to call `getFileID()` to look up an offset.
2. You can merge adjacent non-affecting files (shrinking the search/storage significantly).
================
Comment at: clang/test/Modules/non-affecting-module-maps-source-locations.m:32
+
+// RUN: %clang_cc1 -I %t/first -I %t/second -I %t/third -fmodules -fimplicit-module-maps -fmodules-cache-path=%t/cache %t/tu.m -o %t/tu.o
----------------
This is exercising the code, but it could do one better and check if the output PCMs are bit-for-bit identical when we (now) expect them to be.
Maybe you could do this by having two run lines: one that includes `-I %t/second` and another that doesn't. Then check if the output PCMs are equal.
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D136624/new/
https://reviews.llvm.org/D136624
More information about the cfe-commits
mailing list