[all-commits] [llvm/llvm-project] 7cc249: profi - a flow-based profile inference algorithm: ...

Wed Dec 1 15:31:09 PST 2021

  Branch: refs/heads/main
  Home:   https://github.com/llvm/llvm-project
  Commit: 7cc2493daaf549f2e8bf899a34e3326884104afe
      https://github.com/llvm/llvm-project/commit/7cc2493daaf549f2e8bf899a34e3326884104afe
  Author: spupyrev <spupyrev at fb.com>
  Date:   2021-12-01 (Wed, 01 Dec 2021)

  Changed paths:
    A llvm/include/llvm/Transforms/Utils/SampleProfileInference.h
    M llvm/include/llvm/Transforms/Utils/SampleProfileLoaderBaseImpl.h
    M llvm/lib/Transforms/IPO/SampleProfile.cpp
    M llvm/lib/Transforms/Utils/CMakeLists.txt
    A llvm/lib/Transforms/Utils/SampleProfileInference.cpp
    M llvm/lib/Transforms/Utils/SampleProfileLoaderBaseUtil.cpp
    A llvm/test/Transforms/SampleProfile/Inputs/profile-inference.prof
    A llvm/test/Transforms/SampleProfile/profile-inference.ll

  Log Message:
  -----------
  profi - a flow-based profile inference algorithm: Part I (out of 3)

The benefits of sampling-based PGO crucially depends on the quality of profile
data. This diff implements a flow-based algorithm, called profi, that helps to
overcome the inaccuracies in a profile after it is collected.

Profi is an extended and significantly re-engineered classic MCMF (min-cost
max-flow) approach suggested by Levin, Newman, and Haber [2008, Complementing
missing and inaccurate profiling using a minimum cost circulation algorithm]. It
models profile inference as an optimization problem on a control-flow graph with
the objectives and constraints capturing the desired properties of profile data.
Three important challenges that are being solved by profi:
- "fixing" errors in profiles caused by sampling;
- converting basic block counts to edge frequencies (branch probabilities);
- dealing with "dangling" blocks having no samples in the profile.

The main implementation (and required docs) are in SampleProfileInference.cpp.
The worst-time complexity is quadratic in the number of blocks in a function,
O(|V|^2). However a careful engineering and extensive evaluation shows that
the running time is (slightly) super-linear. In particular, instances with
1000 blocks are solved within 0.1 second.

The algorithm has been extensively tested internally on prod workloads,
significantly improving the quality of generated profile data and providing
speedups in the range from 0% to 5%. For "smaller" benchmarks (SPEC06/17), it
generally improves the performance (with a few outliers) but extra work in
the compiler might be needed to re-tune existing optimization passes relying on
profile counts.

UPD Dec 1st 2021:
- synced the declaration and definition of the option `SampleProfileUseProfi ` to use type `cl::opt<bool`;
- added `inline` for `SampleProfileInference<BT>::findUnlikelyJumps` and `SampleProfileInference<BT>::isExit` to avoid linking problems on windows.

Reviewed By: wenlei, hoy

Differential Revision: https://reviews.llvm.org/D109860