[PATCH] D20401: [Lexer] Don't merge macro args from different macro files

Nick Desaulniers via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Thu Sep 29 11:12:37 PDT 2022


nickdesaulniers added subscribers: justinstitt, nikic, tstellar, serge-sans-paille.
nickdesaulniers added a comment.

In D20401#3823476 <https://reviews.llvm.org/D20401#3823476>, @sammccall wrote:

> In D20401#2770059 <https://reviews.llvm.org/D20401#2770059>, @nickdesaulniers wrote:
>
>> I know this was sped up slightly in 3339c568c43e4644f02289e5edfc78c860f19c9f, but this change makes `updateConsecutiveMacroArgTokens` the hottest function in clang in a bottom up profile of an entire build of the Linux kernel.  It thrashes the one entry LastFileIDLookup cache, and we end up looking up the same FileID again and again and again for each token when we expand nested function like macros.
>>
>> Is there anything we can do to speed this up?
>
> @hokein and I spent some time looking at this (initially trying to understand behavior, now performance).

Wonderful! I'm glad to see you also identified this change in particular.  Thanks as well for looking into this code.

> Short version is:
>
> - we can *simplify* the code a lot, we think it's now just partitioning based on FileID and this can be done more clearly. This may have some speedups at the margin.
> - in terms of performance: I suspect when clang is built by GCC it's doing roughly 2x as much work as when it's built by clang. @nickdesaulniers can you tell me which you're measuring/deploying? To give some idea if we're likely to actually help much...

In all of my personal measurements, I've been bootstrapping clang with AOSP's clang, and `updateConsecutiveMacroArgTokens` is still the worst method in profiles.

But I wouldn't be surprised if other Linux distros like RHEL bootstrap their clang packages with GCC.  @tstellar or @serge-sans-paille or @nikic might know.  We did get a curious comment from a kernel developer recently claiming that clang was "twice as slow as GCC", which didn't make much sense; I'm not sure whether that was an exaggeration or a precise measurement, but the evaluation-order difference you identified could well play into it, making the hottest method even slower.  I'll try to find a link to the thread...

My summer intern @justinstitt briefly looked into this again.  We found that minor tweaks to `SourceManager::isWrittenInSameFile` to avoid a few more calls to `getFileID` made no measurable difference in performance.  But we also didn't test a GCC-built clang; we were bootstrapping clang with clang.
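For reference, here's a minimal sketch of the shape of the change we experimented with (the helper name and the exact short-circuit are illustrative, not the actual patch): skip the second `getFileID` lookup when the answer is trivially known.

```cpp
#include "clang/Basic/SourceLocation.h"
#include "clang/Basic/SourceManager.h"

using namespace clang;

// Hypothetical helper, not the real SourceManager API: short-circuit the
// trivial case so the common path does one getFileID() lookup instead of two.
static bool isWrittenInSameFileFast(const SourceManager &SM,
                                    SourceLocation Loc1, SourceLocation Loc2) {
  if (Loc1 == Loc2)
    return true;                      // identical locations: same file
  return SM.getFileID(Loc1) == SM.getFileID(Loc2);
}
```

The idea was to shave lookups in the common case where both locations come from the same expansion; variations on this theme didn't move the profile.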

---

Meanwhile, besides revisiting the high-level logic in `TokenLexer` and how it might be improved, I think there's potentially an opportunity for an "AoS vs. SoA" speedup in `SourceManager`.  `SourceManager::LoadedSLocEntryTable` is a `llvm::SmallVector<SrcMgr::SLocEntry>`, but `SourceManager::getFileIDLoaded` only really cares about each `SLocEntry`'s `Offset`.  I suspect we could get major memory-locality wins by packing just the offsets into a standalone vector so that the binary search touches far less memory per probe.
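To make that concrete, here's a rough sketch of the idea (the names and layout are hypothetical, and I'm glossing over the fact that loaded SLocEntries actually grow downward from the top of the address space): keep the start offsets in their own densely packed array and binary-search that, only touching a full `SLocEntry` once the index is known.

```cpp
#include "llvm/ADT/STLExtras.h"
#include "llvm/ADT/SmallVector.h"
#include <cstdint>

// Hypothetical side table, not the current SourceManager layout: the start
// offsets of the loaded SLocEntries packed contiguously, parallel to
// LoadedSLocEntryTable.  A binary search over plain integers touches far
// fewer cache lines than one over full SLocEntry objects.
struct LoadedSLocOffsets {
  llvm::SmallVector<uint64_t, 0> StartOffsets; // sorted ascending (simplified)

  // Index of the entry whose range contains RawOffset, or -1 if none.
  int findIndex(uint64_t RawOffset) const {
    const uint64_t *It = llvm::partition_point(
        StartOffsets, [&](uint64_t Start) { return Start <= RawOffset; });
    return static_cast<int>(It - StartOffsets.begin()) - 1;
  }
};
```

The real `getFileIDLoaded` would still need the full `SLocEntry` afterwards, but only for the single entry it lands on, not for every probe of the search.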

> @hokein or I will try to find time to take a stab at this.

Awesome, please keep me in the loop.


Repository:
  rL LLVM

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D20401/new/

https://reviews.llvm.org/D20401


