[llvm-dev] [RFC] New Feature Proposal: De-Optimizing Cold Functions using PGO Info
Min-Yih Hsu via llvm-dev
llvm-dev at lists.llvm.org
Tue Sep 8 17:20:35 PDT 2020
We would like to propose a new feature that disables optimizations on IR
Functions that are considered “cold” by PGO profiles. The primary goal of
this work is to improve code optimization speed (which also improves
compilation and LTO speed) without significantly impacting target code
performance.
The mechanism is pretty simple: in the second phase (i.e. the optimization
phase) of PGO, we add `optnone` attributes to functions that are
considered “cold”, that is, functions with low profiling counts. A similar
approach can be applied to loops. The rationale behind this idea is equally
simple: if a given IR Function will not be frequently executed, we
shouldn’t waste time optimizing it. Similar approaches can be found in
modern JIT compilers for dynamic languages (e.g. JavaScript and Python)
that adopt a multi-tier compilation model: only “hot” functions or
execution traces are brought to higher-tier compilers for aggressive
optimizations.
In addition to de-optimizing functions whose profiling counts are
exactly zero (`-fprofile-deopt-cold`), we also provide a knob
(`-fprofile-deopt-cold-percent=<X percent>`) to adjust the “cold
threshold”. That is, after sorting the profiling counts of all functions,
this knob makes it possible to de-optimize functions whose count values
sit in the lower X percent.
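The selection logic described above can be sketched as follows. This is a
simplified standalone model, not the actual implementation in D87337; the
function name `select_cold_functions` and the plain dict input are
illustrative. (Note that LLVM requires `noinline` to accompany `optnone`.)

```python
def select_cold_functions(profile_counts, cold_percent=0):
    """Pick functions to mark `optnone` (plus `noinline`, which LLVM
    requires to accompany `optnone`).

    profile_counts: dict mapping function name -> profiling count.
    cold_percent=0 models -fprofile-deopt-cold (zero-count only);
    a positive value models -fprofile-deopt-cold-percent=<X>.
    """
    # Functions that were never executed are always considered cold.
    cold = {f for f, c in profile_counts.items() if c == 0}
    if cold_percent > 0:
        # Sort all functions by profiling count and take the lower X percent.
        ranked = sorted(profile_counts, key=profile_counts.get)
        cutoff = int(len(ranked) * cold_percent / 100)
        cold.update(ranked[:cutoff])
    return cold
```

For example, with counts {a: 0, b: 5, c: 100, d: 1000}, the zero-count mode
selects only `a`, while `cold_percent=50` selects the two least-executed
functions, `a` and `b`.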
We evaluated this feature on LLVM Test Suite (the Bitcode, SingleSource,
and MultiSource sub-folders were selected). Both compilation speed and
target program performance are measured by the number of instructions
reported by Linux perf. The table below shows the percentage of compilation
speed improvement and target performance overhead relative to the baseline
that only uses (instrumentation-based) PGO.
Experiment Name          Compile Speedup   Target Overhead
DeOpt Cold Zero Count    5.13%             0.02%
DeOpt Cold 25%           8.06%             0.12%
DeOpt Cold 50%           13.32%            2.38%
DeOpt Cold 75%           17.53%            7.07%
(The “DeOpt Cold Zero Count” experiment only disables optimizations on
functions whose profiling counts are exactly zero. The rest of the
experiments disable optimizations on functions whose profiling counts are
in the lower X%.)
We also ran evaluations with FullLTO; here are the numbers:
Experiment Name          Link Time Speedup   Target Overhead
DeOpt Cold Zero Count    10.87%              1.29%
DeOpt Cold 25%           18.76%              1.50%
DeOpt Cold 50%           30.16%              3.94%
DeOpt Cold 75%           38.71%              8.97%
(The link time presented here includes the LTO and code generation time. We
omitted the compile time numbers since they are not particularly
interesting in an LTO setup.)
From the above experiments we observed that the compilation / link time
improvement scaled linearly with the percentage of cold functions we
skipped. Even if we only skipped functions that never got executed (i.e.
had counter values equal to zero, which is effectively “0%”), we already
got a 5~10% “free ride” on compilation / linking speed with barely any
target performance penalty.
We believe the above numbers justify this patch as useful for improving
build time with little overhead.
Here are the patches for review:
* Modifications on LLVM instrumentation-based PGO:
https://reviews.llvm.org/D87337
* Modifications on Clang driver: https://reviews.llvm.org/D87338
Credit: This project was originally started by Paul Robinson <
paul.robinson at sony.com> and Edward Dawson <Edd.Dawson at sony.com> from Sony
PlayStation compiler team. I picked it up when I was interning there this
summer.
Thank you for reading.
-Min
--
Min-Yih Hsu
Ph.D. Student in the ICS Department, University of California, Irvine (UCI).