[PATCH] D78845: [COFF] Add a fastpath for /INCLUDE: in .drective sections

Reid Kleckner via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Tue Apr 28 10:11:49 PDT 2020


rnk added a comment.

In D78845#2003705 <https://reviews.llvm.org/D78845#2003705>, @aganea wrote:

> As for LLVMOptions, what prevents a BumpAllocator + placement new on the Arg(s)? Or is the perf. wasted somewhere else?


I think that's a good first step. After that, I would focus on making the object leaner. It is huge.

The `Option` class is two pointers in size, and we pass it by value. It needs a pointer to the parent option table so that it can implement `getAlias` and `getGroup`. I think we could trim out the parent pointer by assuming that the option `Info` structs are laid out in an array indexed by option ID: subtract the option's own ID from its `Info` pointer to recover the table base, then add the ID of the option you want to reach.

For `Arg` itself, I would suggest making all seldom-used fields trailing objects (`llvm::TrailingObjects`). This complicates the use of `BumpPtrAllocator`, but it seems worth it.

> Side-note: I was profiling the build-time regression between Clang 9, 10 & 11 on building LLVM and a few of our games, with and without debug info. There's a severe regression from Clang 9 to 10: +10% CPU without debug info, and +15 to +18% with debug info. Clang 11 adds an extra +2%. Additionally, no matter what angle I look from, allocations in clang take 10.5%-11% of the total CPU time (on Windows with the standard heap allocator). Replacing the allocator reduces that to 3.5% CPU for allocations and improves some bandwidth/cache-sensitive functions along the way, which effectively reduces CPU usage by 15% (compared to baseline Clang 10). But at the same time it sweeps the issue under the carpet. This all seems related to the number of (small) allocations in LLVM generally.

I think the Google C++ production toolchain team noticed similar results and switched to tcmalloc to achieve the same thing.

I think early in LLVM project history, developers did a lot of micro-optimization focused on reducing heap allocations (see the prevalence (and overuse!) of `SmallVector`), and a lot of that has gone by the wayside as generic containers proliferate in new code.

However, I know that for the option library specifically, performance was not a priority because it was considered to only impact startup time, and therefore not worth optimizing. Reusing it for `.drective` parsing, where throughput is important, takes it outside of the original problem domain.

---

As a next step, I noticed that `cl::TokenizeWindowsCommandLine` copies all strings. ;_; When I initially wrote it, I intended that it would only copy an argument when it had to deal with quoting, but it looks like some developer has "helpfully" fixed a use-after-free by making it always copy. :(


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D78845/new/

https://reviews.llvm.org/D78845




