[all-commits] [llvm/llvm-project] 0609f2: [CSSPGO][llvm-profgen] Compress recursive cycles i...

ictwanglei via All-commits all-commits at lists.llvm.org
Wed Feb 3 18:53:17 PST 2021


  Branch: refs/heads/main
  Home:   https://github.com/llvm/llvm-project
  Commit: 0609f257dc2e2c3e4c7cd30fe2ffd520117e706b
      https://github.com/llvm/llvm-project/commit/0609f257dc2e2c3e4c7cd30fe2ffd520117e706b
  Author: wlei <wlei at fb.com>
  Date:   2021-02-03 (Wed, 03 Feb 2021)

  Changed paths:
    A llvm/test/tools/llvm-profgen/Inputs/recursion-compression-noprobe.perfbin
    A llvm/test/tools/llvm-profgen/Inputs/recursion-compression-noprobe.perfscript
    A llvm/test/tools/llvm-profgen/Inputs/recursion-compression-pseudoprobe.perfbin
    A llvm/test/tools/llvm-profgen/Inputs/recursion-compression-pseudoprobe.perfscript
    A llvm/test/tools/llvm-profgen/recursion-compression-noprobe.test
    A llvm/test/tools/llvm-profgen/recursion-compression-pseudoprobe.test
    M llvm/tools/llvm-profgen/PerfReader.cpp
    M llvm/tools/llvm-profgen/ProfileGenerator.cpp
    M llvm/tools/llvm-profgen/ProfileGenerator.h
    M llvm/tools/llvm-profgen/ProfiledBinary.cpp
    M llvm/tools/llvm-profgen/ProfiledBinary.h
    M llvm/tools/llvm-profgen/PseudoProbe.cpp
    M llvm/tools/llvm-profgen/PseudoProbe.h
    M llvm/unittests/tools/CMakeLists.txt
    A llvm/unittests/tools/llvm-profgen/CMakeLists.txt
    A llvm/unittests/tools/llvm-profgen/ContextCompressionTest.cpp

  Log Message:
  -----------
  [CSSPGO][llvm-profgen] Compress recursive cycles in calling context

This change compresses the context string by removing cycles caused by recursive functions during CS profile generation. Removing recursion cycles normalizes the calling context, which improves sample aggregation and also makes context promotion deterministic.
Specifically, the implementation recognizes adjacent repeated frames as cycles and deduplicates them over multiple rounds of iteration.
For example, consider the following input context string stack:
["a", "a", "b", "c", "a", "b", "c", "b", "c", "d"]
In the first iteration, it removes all adjacent repeated frames of size 1:
["a", "b", "c", "a", "b", "c", "b", "c", "d"]
In the second iteration, it removes all adjacent repeated frames of size 2:
["a", "b", "c", "a", "b", "c", "d"]
In the third iteration, it removes all adjacent repeated frames of size 3, which yields the final compressed output:
["a", "b", "c", "d"]
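
A minimal C++ sketch of the multi-round deduplication described above, assuming the context is a `std::vector<std::string>` of frame names. The names `compressRecursion`, `repeatsAt`, and `MaxDedupSize` are hypothetical and this is not the llvm-profgen code; `MaxDedupSize` plays the role of the `compress-recursion` size limit, where -1 means no limit.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Returns true if the K frames starting at I are immediately repeated,
// i.e. Ctx[I, I+K) == Ctx[I+K, I+2K). Caller guarantees I + 2K <= size.
static bool repeatsAt(const std::vector<std::string> &Ctx, std::size_t I,
                      std::size_t K) {
  return std::equal(Ctx.begin() + I, Ctx.begin() + I + K,
                    Ctx.begin() + I + K);
}

// Hypothetical helper: removes recursion cycles by collapsing adjacent
// repeated frame sequences, one round per sequence size K = 1, 2, ...
// MaxDedupSize caps K; -1 means no limit.
void compressRecursion(std::vector<std::string> &Ctx, int MaxDedupSize = -1) {
  assert(MaxDedupSize >= -1 && "MaxDedupSize must be -1 or non-negative");
  // A repeated sequence needs at least 2K frames, so K never exceeds size / 2.
  std::size_t Limit = Ctx.size() / 2;
  if (MaxDedupSize >= 0)
    Limit = std::min(Limit, static_cast<std::size_t>(MaxDedupSize));

  for (std::size_t K = 1; K <= Limit; ++K) {
    std::size_t I = 0;
    while (I + 2 * K <= Ctx.size()) {
      if (repeatsAt(Ctx, I, K)) {
        // Drop the second copy and re-check the same position, so a
        // sequence repeated three or more times collapses fully.
        Ctx.erase(Ctx.begin() + I + K, Ctx.begin() + I + 2 * K);
      } else {
        ++I;
      }
    }
  }
}
```

Applied to the example stack, this sketch returns ["a", "b", "c", "d"]; with `MaxDedupSize = 1` it stops after the first round. Because two recursion depths of the same cycle compress to the same vector, samples from both end up under one context key.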

Compression is applied in two places: once to the sample's context key right after unwinding, and once to the eventual context string ID in the ProfileGenerator.
Added a switch `compress-recursion` to limit the size of the duplicated frames that are removed; the default of -1 means no size limit.
Added unit tests and regression tests for this.

Differential Revision: https://reviews.llvm.org/D93556


  Commit: 1714ad2336293f351b15dd4b518f9e8618ec38f2
      https://github.com/llvm/llvm-project/commit/1714ad2336293f351b15dd4b518f9e8618ec38f2
  Author: wlei <wlei at fb.com>
  Date:   2021-02-03 (Wed, 03 Feb 2021)

  Changed paths:
    M llvm/tools/llvm-profgen/PerfReader.cpp
    M llvm/tools/llvm-profgen/PerfReader.h
    M llvm/tools/llvm-profgen/ProfiledBinary.cpp
    M llvm/tools/llvm-profgen/ProfiledBinary.h

  Log Message:
  -----------
  [CSSPGO][llvm-profgen] Aggregate samples on call frame trie to speed up profile generation

For CS profile generation, call stack unwinding is time-consuming because, for each LBR entry, generating the context (hashing, compression, string concatenation) takes linear time. This change speeds this up by grouping all the call frames within one LBR sample into a trie and aggregating the results (sample counters) on it, deferring context compression and string generation to the end of unwinding.

Specifically, it uses `StackLeaf` as the top frame of the stack and updates it dynamically during virtual unwinding, popping or pushing a trie node, so that each raw sample can simply be recorded on the leaf node, with the root-to-leaf path representing its calling context. At the end, it traverses the trie and generates the contexts on the fly.
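
The trie-based aggregation can be sketched in C++ roughly as follows. `FrameNode`, `FrameTrie`, `pushFrame`, `popFrame`, `recordSample`, and `collect` are hypothetical names, not llvm-profgen's actual classes; only the role of `StackLeaf` comes from the description above. Frames are plain strings, the " @ " separator is an illustrative choice, and the recursion compression that the change defers to the final traversal is omitted here.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <memory>
#include <string>

// Hypothetical trie node: one node per call frame; the root-to-node path
// is the calling context.
struct FrameNode {
  std::string Frame;
  FrameNode *Parent = nullptr;
  std::map<std::string, std::unique_ptr<FrameNode>> Children;
  uint64_t Count = 0; // aggregated sample counter for this exact context

  FrameNode *getOrCreateChild(const std::string &Name) {
    auto &Slot = Children[Name];
    if (!Slot) {
      Slot = std::make_unique<FrameNode>();
      Slot->Frame = Name;
      Slot->Parent = this;
    }
    return Slot.get();
  }
};

// Hypothetical unwinding state: StackLeaf always points at the top frame.
struct FrameTrie {
  FrameNode Root;
  FrameNode *StackLeaf = &Root;

  FrameTrie() = default;
  FrameTrie(const FrameTrie &) = delete; // StackLeaf points into Root
  FrameTrie &operator=(const FrameTrie &) = delete;

  // Entering a deeper frame moves the leaf down (creating a node if needed).
  void pushFrame(const std::string &Name) {
    StackLeaf = StackLeaf->getOrCreateChild(Name);
  }
  // Leaving a frame moves the leaf back up to its parent.
  void popFrame() {
    assert(StackLeaf->Parent && "popFrame called at the root");
    if (StackLeaf->Parent)
      StackLeaf = StackLeaf->Parent;
  }
  // A sample only bumps a counter; no context string is built here.
  void recordSample(uint64_t N = 1) { StackLeaf->Count += N; }

  // Depth-first traversal that builds context strings once, at the end.
  void collect(std::map<std::string, uint64_t> &Out) const {
    walk(&Root, "", Out);
  }

private:
  static void walk(const FrameNode *Node, const std::string &Prefix,
                   std::map<std::string, uint64_t> &Out) {
    for (const auto &[Name, Child] : Node->Children) {
      std::string Ctx = Prefix.empty() ? Name : Prefix + " @ " + Name;
      if (Child->Count)
        Out[Ctx] += Child->Count;
      walk(Child.get(), Ctx, Out);
    }
  }
};
```

In this sketch the per-sample cost is a counter increment on the current leaf, and each context string is built only once per trie node during `collect`, which is the kind of saving the change targets.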

Results:
Our internal branch shows about a 5X speed-up on some large workloads in the SPEC06 benchmark.

Differential Revision: https://reviews.llvm.org/D94110


Compare: https://github.com/llvm/llvm-project/compare/c95c0db2eb68...1714ad233629

