[all-commits] [llvm/llvm-project] eba574: [CSSPGO][llvm-profgen] Reimplement CS profile gene...

ictwanglei via All-commits all-commits at lists.llvm.org
Mon Jun 27 23:30:15 PDT 2022


  Branch: refs/heads/main
  Home:   https://github.com/llvm/llvm-project
  Commit: eba5749262d9f1c6754984034c7a81fcd9bc3de6
      https://github.com/llvm/llvm-project/commit/eba5749262d9f1c6754984034c7a81fcd9bc3de6
  Author: wlei <wlei at fb.com>
  Date:   2022-06-27 (Mon, 27 Jun 2022)

  Changed paths:
    M llvm/include/llvm/Transforms/IPO/SampleContextTracker.h
    M llvm/tools/llvm-profgen/ProfileGenerator.cpp
    M llvm/tools/llvm-profgen/ProfileGenerator.h

  Log Message:
  -----------
  [CSSPGO][llvm-profgen] Reimplement CS profile generator using context trie

Our investigation showed that the ProfileMap key is the memory bottleneck for CS profile generation on some large services. This patch optimizes it by storing the CS function samples in a context trie instead of keying them by the context frame array ref. Parts of the code in `ContextTrieNode` are reused.
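
As a rough illustration of why a trie helps, here is a minimal sketch in which contexts that share a caller prefix share trie nodes, so each frame is stored once rather than once per context key. The `TrieNode` struct and the `addSamples` and `countNodes` helpers are hypothetical simplifications and not LLVM's actual `ContextTrieNode` API; in particular, children are keyed by function name only here, which is an assumption made for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for a context trie node.
// This is NOT LLVM's ContextTrieNode.
struct TrieNode {
  std::string Func;
  uint64_t Samples = 0;
  std::map<std::string, std::unique_ptr<TrieNode>> Children;
};

// Walk (creating nodes as needed) the path for one calling context and add
// the samples to its last node. Frames are ordered from outermost caller
// to leaf function.
TrieNode &addSamples(TrieNode &Root, const std::vector<std::string> &Context,
                     uint64_t Count) {
  TrieNode *Node = &Root;
  for (const std::string &Frame : Context) {
    std::unique_ptr<TrieNode> &Child = Node->Children[Frame];
    if (!Child) {
      Child = std::make_unique<TrieNode>();
      Child->Func = Frame;
    }
    Node = Child.get();
  }
  Node->Samples += Count;
  return *Node;
}

// Count all nodes below the given node (the node itself is not counted).
std::size_t countNodes(const TrieNode &Node) {
  std::size_t N = 0;
  for (const auto &KV : Node.Children)
    N += 1 + countNodes(*KV.second);
  return N;
}
```

With the contexts `main -> foo -> bar` and `main -> foo -> baz`, the `main` and `foo` frames are each stored in a single shared node, whereas a flat map keyed by the full frame array would repeat them in every key.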

Our experiment on one internal service showed that the context key's memory can be reduced from 80GB to 300MB.

To stay compatible with non-CS profiles, the profile writer still needs a ProfileMap as input, so the ProfileMap is rebuilt from the context trie in `postProcessProfiles`.
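
The rebuild step can be pictured as a depth-first walk that reconstructs each full context key from the path to a node. The sketch below is an assumption-laden illustration, not the code in `postProcessProfiles`: `TrieNode`, `getOrCreateChild`, `flatten`, and `buildFlatMap` are hypothetical names, the flat map holds only a sample count, and the `" @ "` separator in the key is chosen for readability.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical simplified trie node; NOT LLVM's ContextTrieNode.
struct TrieNode {
  uint64_t Samples = 0;
  std::map<std::string, std::unique_ptr<TrieNode>> Children;
};

// Helper used to build a small trie by hand.
TrieNode &getOrCreateChild(TrieNode &Parent, const std::string &Func) {
  std::unique_ptr<TrieNode> &Child = Parent.Children[Func];
  if (!Child)
    Child = std::make_unique<TrieNode>();
  return *Child;
}

// Depth-first walk: Path holds the frames from the root to the current node.
// Every node that carries samples becomes one entry in the flat map.
void flatten(const TrieNode &Node, std::vector<std::string> &Path,
             std::map<std::string, uint64_t> &Flat) {
  for (const auto &KV : Node.Children) {
    Path.push_back(KV.first);
    if (KV.second->Samples > 0) {
      std::string Key;
      for (std::size_t I = 0; I < Path.size(); ++I) {
        if (I != 0)
          Key += " @ "; // assumed separator, for illustration only
        Key += Path[I];
      }
      Flat[Key] = KV.second->Samples;
    }
    flatten(*KV.second, Path, Flat);
    Path.pop_back();
  }
}

// Rebuild a flat, context-keyed map from the trie.
std::map<std::string, uint64_t> buildFlatMap(const TrieNode &Root) {
  std::map<std::string, uint64_t> Flat;
  std::vector<std::string> Path;
  flatten(Root, Path, Flat);
  return Flat;
}
```

The full context keys exist only in the rebuilt map, which is why the commit message notes that the map is still large until the contexts are trimmed before this step.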

The optimization is not complete yet. The next step is to reimplement the pre-inliner or profile trimmer; after that, the ProfileMap should be small enough to be written.

Reviewed By: hoy, wenlei

Differential Revision: https://reviews.llvm.org/D125246


  Commit: aa58b7b1e30fbbd9c8c2bf6ba291f1742f53afed
      https://github.com/llvm/llvm-project/commit/aa58b7b1e30fbbd9c8c2bf6ba291f1742f53afed
  Author: wlei <wlei at fb.com>
  Date:   2022-06-27 (Mon, 27 Jun 2022)

  Changed paths:
    M llvm/include/llvm/Transforms/IPO/SampleContextTracker.h
    M llvm/lib/Transforms/IPO/SampleContextTracker.cpp
    M llvm/tools/llvm-profgen/ProfileGenerator.cpp
    M llvm/tools/llvm-profgen/ProfileGenerator.h
    M llvm/tools/llvm-profgen/llvm-profgen.cpp

  Log Message:
  -----------
  [CSSPGO][llvm-profgen] Reimplement computeSummaryAndThreshold using context trie

Follow-up patch to https://reviews.llvm.org/D125246 that adds support for `computeSummaryAndThreshold` based on the context trie.

Reviewed By: hoy, wenlei

Differential Revision: https://reviews.llvm.org/D127026


  Commit: 7e86b13c63f200a5649234647433fc563e1159f5
      https://github.com/llvm/llvm-project/commit/7e86b13c63f200a5649234647433fc563e1159f5
  Author: wlei <wlei at fb.com>
  Date:   2022-06-27 (Mon, 27 Jun 2022)

  Changed paths:
    M llvm/include/llvm/ProfileData/SampleProf.h
    M llvm/include/llvm/Transforms/IPO/SampleContextTracker.h
    M llvm/lib/Transforms/IPO/SampleContextTracker.cpp
    M llvm/lib/Transforms/IPO/SampleProfile.cpp
    M llvm/tools/llvm-profgen/CSPreInliner.cpp
    M llvm/tools/llvm-profgen/CSPreInliner.h
    M llvm/tools/llvm-profgen/ProfileGenerator.cpp
    M llvm/tools/llvm-profgen/ProfiledBinary.cpp
    M llvm/tools/llvm-profgen/ProfiledBinary.h

  Log Message:
  -----------
  [CSSPGO][llvm-profgen] Reimplement SampleContextTracker using context trie

This is the follow-up patch to https://reviews.llvm.org/D125246 for the `SampleContextTracker` part. Previously, the promotion and merging of contexts were based on the SampleContext (the array of frames), which costs a lot of memory. This patch detaches the tracker from the array ref and uses the context trie itself instead. This saves a lot of memory and benefits both the compiler's CS inliner and llvm-profgen's pre-inliner.

One structure that needs special treatment is `FuncToCtxtProfiles`, which is used to get all the FunctionSamples for one function in order to do the merging and promoting. Previously, it searched each function's contexts and traversed the trie to get the node for each context. Now that the context is no longer stored inside the profile, we instead use an auxiliary map, `ProfileToNodeMap`, which is initialized with the FunctionSamples-to-TrieNode relations and kept up to date while nodes are promoted and merged.
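
A minimal sketch of that bookkeeping follows, under loudly stated assumptions: `Profile`, `TrieNode`, `Tracker`, `addContext`, `relink`, and `promoteToRoot` are hypothetical simplifications and not the real `SampleContextTracker` interface. Only the names `FuncToCtxtProfiles` and `ProfileToNodeMap` come from the commit message. Children are stored by value, so moving a node changes its address and the side map has to be re-pointed; merging into an already existing root node is left out.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for FunctionSamples.
struct Profile {
  std::string Func;
  uint64_t TotalSamples = 0;
};

// Hypothetical simplified trie node; children are stored by value.
struct TrieNode {
  std::string Func;
  Profile *Samples = nullptr; // non-owning
  TrieNode *Parent = nullptr;
  std::map<std::string, TrieNode> Children;
};

struct Tracker {
  TrieNode Root;
  // All context profiles of a function, in insertion (deterministic) order.
  std::unordered_map<std::string, std::vector<Profile *>> FuncToCtxtProfiles;
  // Side map: profile -> trie node currently holding it.
  std::unordered_map<const Profile *, TrieNode *> ProfileToNodeMap;

  TrieNode &addContext(const std::vector<std::string> &Context, Profile &P) {
    TrieNode *Node = &Root;
    for (const std::string &Frame : Context) {
      TrieNode &Child = Node->Children[Frame];
      Child.Func = Frame;
      Child.Parent = Node;
      Node = &Child;
    }
    Node->Samples = &P;
    FuncToCtxtProfiles[P.Func].push_back(&P);
    ProfileToNodeMap[&P] = Node;
    return *Node;
  }

  // After a move, fix parent links and re-point the side map for a subtree.
  void relink(TrieNode &Node, TrieNode *Parent) {
    Node.Parent = Parent;
    if (Node.Samples)
      ProfileToNodeMap[Node.Samples] = &Node;
    for (auto &KV : Node.Children)
      relink(KV.second, &Node);
  }

  // Promote the node holding P (with its subtree) to a child of the root.
  // Simplification: assumes no root child with the same name exists yet.
  TrieNode &promoteToRoot(Profile &P) {
    TrieNode *Old = ProfileToNodeMap.at(&P);
    TrieNode *Parent = Old->Parent;
    if (Parent == &Root)
      return *Old;
    std::string Name = Old->Func;
    assert(Root.Children.count(Name) == 0 && "merging is not sketched here");
    TrieNode Moved = std::move(*Old);
    Parent->Children.erase(Name);
    TrieNode &New = Root.Children[Name];
    New = std::move(Moved);
    relink(New, &Root);
    return New;
  }
};
```

The point of the side map is that looking up the node for a given profile no longer needs the frame array: the tracker goes from the profile pointer straight to its node, and every structural change to the trie updates the map in the same step.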

Moreover, I expected the results before and after to remain the same, but I found that the order of `FuncToCtxtProfiles` matters and affects the results. This can happen in the recursive-context case, but the difference should be small. Since we no longer have the context, I just used a vector for the order; the result is still deterministic.

Measured on one huge (12GB) profile from one of our internal services: the profile similarity before and after is 99.999%, the running time is improved by 3x (debug mode), and memory is reduced from 170GB to 90GB.

Reviewed By: hoy, wenlei

Differential Revision: https://reviews.llvm.org/D127031


Compare: https://github.com/llvm/llvm-project/compare/834a38bbcbcf...7e86b13c63f2

