[PATCH] D118385: [NFC] Optimize FoldingSet usage where it matters

Dawid Jurczak via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Thu Jan 27 09:03:21 PST 2022


yurai007 created this revision.
yurai007 added reviewers: nikic, xbolva00, aeubanks, ChuanqiXu, v.g.vassilev, serge-sans-paille, rsmith.
Herald added subscribers: dexonsmith, pengfei, hiraditya.
yurai007 requested review of this revision.
Herald added projects: clang, LLVM.
Herald added subscribers: llvm-commits, cfe-commits.

When building huge code bases, it's not uncommon to see perf reports with the following FoldingSet entries:

      
  1.56%     0.47%  clang  clang-14  [.] llvm::FoldingSetBase::FindNodeOrInsertPos
  0.30%     0.01%  clang  clang-14  [.] llvm::ContextualFoldingSet<clang::FunctionProtoType, clang::ASTContext&>::NodeEquals
  0.25%     0.02%  clang  clang-14  [.] llvm::FoldingSetBase::InsertNode
  0.23%     0.12%  clang  clang-14  [.] llvm::FoldingSetBase::GrowBucketCount
  0.22%     0.21%  clang  clang-14  [.] llvm::FoldingSetNodeID::AddPointer
  0.47%     0.06%  clang  clang-14  [.] llvm::FoldingSetBase::InsertNode
      
  or
      
  1.12%     0.75%  clang++       libLLVM-13.so        [.] llvm::FoldingSetBase::GrowBucketCount
  0.49%     0.48%  clang++       libLLVM-13.so        [.] llvm::FoldingSetNodeID::AddPointer
  0.41%     0.09%  clang++       libLLVM-13.so        [.] llvm::FoldingSetNodeID::operator==
      
  etc.
      

Among the many FoldingSet users, the most notable seem to be ASTContext and CodeGenTypes.
The reasons we spend a not-so-tiny amount of time in FoldingSet calls from there are the following:

  
  1. The default FoldingSet capacity of 2^6 items is very often not enough.
     For PointerTypes/ElaboratedTypes/ParenTypes it's common to see the set grow to 256 or 512 items.
     FunctionProtoTypes can easily exceed a capacity of 1k items, growing to 4k or even 8k.
  
  2. The cost of FoldingSetBase::GrowBucketCount itself is not very high (pure reallocations are rather cheap thanks to BumpPtrAllocator).
     What matters is the high collision rate when a lot of items end up in the same bucket, which slows down FoldingSetBase::FindNodeOrInsertPos and thrashes the CPU cache
     (items with the same hash are organized in an intrusive linked list which needs to be traversed).
  
  3. The lack of inlining for AddInteger/AddPointer and computeHash slows down NodeEquals/Profile/operator== calls.
     Inlining makes the Profile functions of FunctionProtoTypes/PointerTypes/ElaboratedTypes/ParenTypes faster, but
     since NodeEquals is still called indirectly through a function pointer from FindNodeOrInsertPos,
     there is room for further inlining improvements.
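
To illustrate points 1 and 2, here is a minimal toy model of a chained hash set with intrusive links. It is not llvm::FoldingSet and not the code from this patch: ToyNode, ToyChainedSet, fillSet, the hash mixer and the growth policy (double the bucket array once the average exceeds two nodes per bucket) are all assumptions made for the sketch. It only shows how the initial bucket count determines how many times every node has to be relinked while the set grows to 8k items.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy stand-in for a chained hash set; NOT llvm::FoldingSet.
struct ToyNode {
  std::uint64_t Key = 0;
  ToyNode *Next = nullptr; // intrusive chain link, walked on every lookup
};

class ToyChainedSet {
  std::vector<ToyNode *> Buckets;
  std::size_t NumNodes = 0;

  // Hypothetical 64-bit mixer; any reasonable hash would do here.
  static std::uint64_t hash(std::uint64_t K) {
    K ^= K >> 33;
    K *= 0xff51afd7ed558ccdULL;
    K ^= K >> 33;
    return K;
  }

  // Push N onto the front of its bucket's chain.
  void link(ToyNode *N) {
    ToyNode *&Head = Buckets[hash(N->Key) & (Buckets.size() - 1)];
    N->Next = Head;
    Head = N;
  }

  // Double the bucket array and relink every node already in the set.
  void grow() {
    std::vector<ToyNode *> Old(Buckets.size() * 2, nullptr);
    Old.swap(Buckets); // Buckets is now the new, empty, doubled array
    for (ToyNode *Head : Old)
      while (Head) {
        ToyNode *Next = Head->Next; // save before link() overwrites it
        link(Head);
        Head = Next;
      }
    ++Grows;
  }

public:
  std::size_t Grows = 0; // number of rehashes performed so far

  explicit ToyChainedSet(unsigned Log2InitSize = 6)
      : Buckets(std::size_t(1) << Log2InitSize, nullptr) {
    assert(Log2InitSize < 32 && "unreasonable initial size");
  }

  std::size_t bucketCount() const { return Buckets.size(); }

  ToyNode *find(std::uint64_t Key) {
    ToyNode *N = Buckets[hash(Key) & (Buckets.size() - 1)];
    while (N && N->Key != Key)
      N = N->Next; // each step is a dependent pointer load
    return N;
  }

  // Assumed policy: grow once the average exceeds two nodes per bucket.
  void insert(ToyNode *N) {
    if (NumNodes + 1 > Buckets.size() * 2)
      grow();
    link(N);
    ++NumNodes;
  }
};

struct Stats {
  std::size_t Grows;
  std::size_t Buckets;
  std::size_t Found;
};

// Insert Count distinct keys, then look all of them up again.
Stats fillSet(unsigned Log2InitSize, std::size_t Count) {
  std::vector<ToyNode> Storage(Count); // caller-owned nodes, as in an intrusive set
  ToyChainedSet Set(Log2InitSize);
  for (std::size_t I = 0; I < Count; ++I) {
    Storage[I].Key = I;
    Set.insert(&Storage[I]);
  }
  std::size_t Found = 0;
  for (std::size_t I = 0; I < Count; ++I)
    Found += Set.find(I) != nullptr;
  return {Set.Grows, Set.bucketCount(), Found};
}
```

Under these assumptions, fillSet(6, 8192) starts with 64 buckets and rehashes six times before settling at 4096 buckets, while fillSet(12, 8192) starts at 4096 buckets and never rehashes. In this model, pre-sizing also keeps the chains short from the first insertion rather than only after the last grow.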


After addressing the above issues, I built Linux (with the default config) on isolated CPU cores in a quiet x86-64 Linux environment.
The differences in compile-time statistics reported by perf before and after the change are as follows:
instructions -0.4%, cycles -0.9%
The size-text increase of the resulting Clang binary is below 0.1%.

      

As with https://reviews.llvm.org/D118169, a less significant speedup is expected from this patch
for code bases containing smaller translation units.


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D118385

Files:
  clang/include/clang/AST/ASTContext.h
  clang/lib/AST/ASTContext.cpp
  clang/lib/CodeGen/CodeGenTypes.h
  llvm/include/llvm/ADT/FoldingSet.h
  llvm/lib/Support/FoldingSet.cpp

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D118385.403683.patch
Type: text/x-patch
Size: 8225 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20220127/0df991f6/attachment.bin>

