[all-commits] [llvm/llvm-project] 386930: [CSSPGO][llvm-profgen] Aggregate samples on call f...

Thu Feb 4 08:44:51 PST 2021

  Branch: refs/heads/main
  Home:   https://github.com/llvm/llvm-project
  Commit: 3869309a0c92735228afaa5663d1c10bb372bb54
      https://github.com/llvm/llvm-project/commit/3869309a0c92735228afaa5663d1c10bb372bb54
  Author: wlei <wlei at fb.com>
  Date:   2021-02-04 (Thu, 04 Feb 2021)

  Changed paths:
    M llvm/tools/llvm-profgen/PerfReader.cpp
    M llvm/tools/llvm-profgen/PerfReader.h
    M llvm/tools/llvm-profgen/ProfiledBinary.cpp
    M llvm/tools/llvm-profgen/ProfiledBinary.h

  Log Message:
  -----------
  [CSSPGO][llvm-profgen] Aggregate samples on call frame trie to speed up profile generation

For CS profile generation, the process of call stack unwinding is time-consuming since for each LBR entry we need linear time to generate the context( hash, compression, string concatenation). This change speeds up this by grouping all the call frame within one LBR sample into a trie and aggregating the result(sample counter) on it, deferring the context compression and string generation to the end of unwinding.

Specifically, it uses `StackLeaf` as the top frame on the stack and manipulates(pop or push a trie node) it dynamically during virtual unwinding so that the raw sample can just be recoded on the leaf node, the path(root to leaf) will represent its calling context. In the end, it traverses the trie and generates the context on the fly.

Results:
Our internal branch shows about 5X speed-up on some large workloads in SPEC06 benchmark.

Differential Revision: https://reviews.llvm.org/D94110