[clang] Bfi precision (PR #66285)
David Li via cfe-commits
cfe-commits at lists.llvm.org
Sat Oct 28 17:00:36 PDT 2023
david-xl wrote:
> And digging even deeper:
>
> * FWIW I noticed that I only used `clang -c` as benchmark previously, should have used `clang -c -O3` resulting in this:
>
> ```
> Old-BFI: insn: 37,821,687,947 (baseline)
> New-BFI: insn: 38,133,312,923 +0.82%
> Old-BFI, no-cold: insn: 37,423,365,184 -1.05%
> New-BFI, no-cold: insn: 37,438,736,713 -1.01%
> ```
>
> * The problematic part of the code that is inlined/not-inlined is actually marked as `LLVM_UNLIKELY` here: https://github.com/llvm/llvm-project/blob/main/llvm/include/llvm/ADT/DenseMap.h#L607 and intuitively one would think that it is probably best to not inline `grow` (as will happen with the new BFI version)
> * In practice not-inlining `grow` mostly ends up being worse because of knock-on effects as far as I can tell. These are the inlining decisions I noticed for the most prominent situation:
>
> ```
> - `grow` inlined (Old-BFI):
> - Instruction::getMetadataImpl
> -> Value::getMetadata not inlined
> -> DenseMap::operator[] inlined
> -> DenseMap::FindAndConstruct inlined
> -> DenseMap::InsertIntoBucket not inlined, size
> likely too big with `grow` inlined here
> -> DenseMap::grow inlined
> - `grow` inlined (New-BFI):
> - Instruction::getMadataImpl
> -> Value::getMetadata inlined
> -> DenseMap::operator[] inlined
> -> DenseMap::FindAndConstruct not inlined, size
> -> DenseMap::InsertIntoBucket inlined
> -> DenseMap::grow not inlined
> ```
>
> Though if you look into this then I would state that the code got better for the wrong reasons! Not inlining `grow` is a sensible decision in this context and the `LLVM_UNLIKELY` annotation makes sense (I actually added some counters and see the unlikely branch taken somewhere between 1% and 10% of the cases depending on inputs, so seems sensible).
>
> Unfortunately the particular code here `getMetadataImpl` never inserts new things into the map, but unfortunately `operator[]` gives you that behavior by default so nobody noticed. So not inlining `InsertIntoBucket` happened to be a great decision previously that the compiler did by accident without having good data to support this. Now with better but still insufficient data (as this is PGO) we happen to end up inlining `InsertIntoBucket` wasting code size leading to worse inlining decisions further up the stack...
>
> Long story short: This ended up another of the many stories where the compiler cannot make the right inlining decisions without having actual runtime data. It tends to make a better decision based on the data that ends up being wrong anyway here. I'm gonna leave things as is and rather put up some improvements to LLVM code instead!
Good analysis.
Your plan sounds good but lowering the cold threshold is probably a good thing to do with the new BFI. The FindAndConstruct method in DenseMap can probably be annotated with buitin_expect to indicate the insertion is a unlikely path.
https://github.com/llvm/llvm-project/pull/66285
More information about the cfe-commits
mailing list