[llvm-dev] Current PGO status
Xinliang David Li via llvm-dev
llvm-dev at lists.llvm.org
Tue Feb 6 15:11:16 PST 2018
Victor, thanks for the experiment.
My suspicion is it is due to the remaining issues with block layout --
especially with loop rotation (with PGO). Another problem is that tail dup
is not happening after loop rotation which can limit the effectiveness of
I tried the internal option -mllvm -force-precise-rotation-cost and there
is about a 10% speedup with -fprofile-use. This option turns on a more precise
cost model when computing the rotation strategy, but it is not turned on by
default.
+carrot, who is working on this area.
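
For readers who want to repeat the experiment, here is a rough sketch of the
three-step IR PGO workflow with the flags mentioned in this thread. It only
builds and prints the command lines and does not run them. The file names
(sample.cpp, sample, default.profraw, merged.profdata) and the helper
pgo_commands are placeholders made up for illustration, and the raw profile
paths must be whatever the instrumented run actually wrote.

```python
import shlex


def pgo_commands(source, binary, raw_profiles,
                 merged="merged.profdata", precise_rotation=False):
    """Return the command lines for an IR PGO build as lists of arguments.

    All paths are hypothetical placeholders supplied by the caller.
    """
    # Step 1: build an instrumented binary.
    instrument = ["clang++", "-O3", "-fprofile-generate",
                  source, "-o", binary + ".instr"]
    # (Between steps 1 and 2, run the instrumented binary on a training
    # input so that it writes its raw profile files.)
    # Step 2: merge the raw profiles into one indexed profile.
    merge = ["llvm-profdata", "merge", "-output=" + merged] + list(raw_profiles)
    # Step 3: rebuild using the merged profile.
    optimize = ["clang++", "-O3", "-fprofile-use=" + merged]
    if precise_rotation:
        # Internal option discussed above; it is off unless requested.
        optimize += ["-mllvm", "-force-precise-rotation-cost"]
    optimize += [source, "-o", binary]
    return [instrument, merge, optimize]


def render(commands):
    """Format each command as a shell-quoted string, one per line."""
    return [" ".join(shlex.quote(arg) for arg in cmd) for cmd in commands]


if __name__ == "__main__":
    cmds = pgo_commands("sample.cpp", "sample", ["default.profraw"],
                        precise_rotation=True)
    for line in render(cmds):
        print(line)
```

Building the optimized binary once with precise_rotation=False and once with
True gives the two variants to compare.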
On Tue, Feb 6, 2018 at 1:37 PM, Victor Leschuk <vleschuk at accesssoftek.com>
wrote:
> Hello David, thanks for the detailed response!
> Do you have any tests that you use to measure the PGO effectiveness? I
> have tested clang version 6.0 with the same sample that Jie Chen used in
> 2016 and actually both frontend-based PGO and IR-based make code run
> slower, see the average time:
> clang++ -O3: 3.15 sec
> clang++ -O3 and -fprofile-instr-use: 3.160 sec
> clang++ -O3 and -fprofile-use: 3.180 sec
> g++ (7.3.0) -O3: 3.640 sec
> g++ (7.3.0) -O3 and -fprofile-use: 2.92 sec
> Do you have any idea what can be wrong? Maybe there are some
> recommendations in which cases one should use PGO with clang and when it is
> better not to do it?
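
To put the timings above in relative terms, here is a small sketch that
computes the percent change in run time against each compiler's plain -O3
baseline. The numbers are copied from the message above; the dictionary
layout and function names are only illustrative.

```python
# Average run times in seconds, copied from the measurements quoted above.
timings = {
    "clang++": {
        "-O3": 3.15,
        "-O3 -fprofile-instr-use": 3.160,
        "-O3 -fprofile-use": 3.180,
    },
    "g++ 7.3.0": {
        "-O3": 3.640,
        "-O3 -fprofile-use": 2.92,
    },
}


def relative_change(baseline, measured):
    """Percent change in run time versus the baseline; negative is faster."""
    return (measured - baseline) / baseline * 100.0


def summarize(table):
    """Return (compiler, flags, percent change) for every non-baseline run."""
    rows = []
    for compiler, runs in table.items():
        base = runs["-O3"]
        for flags, seconds in runs.items():
            if flags == "-O3":
                continue
            rows.append((compiler, flags,
                         round(relative_change(base, seconds), 1)))
    return rows


if __name__ == "__main__":
    for compiler, flags, pct in summarize(timings):
        print(f"{compiler} {flags}: {pct:+.1f}% run time vs -O3")
```

On these numbers the two clang PGO builds come out about 0.3% and 1.0%
slower than plain -O3, while the g++ PGO build is about 19.8% faster than
its own -O3 baseline.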
> On 02/05/2018 09:38 AM, Xinliang David Li wrote:
> On Sun, Feb 4, 2018 at 9:59 PM, Victor Leschuk <vleschuk at accesssoftek.com>
> wrote:
>> Hello David!
>> I have recently started getting acquainted with PGO in LLVM/clang and found
>> your e-mail thread:
>> http://lists.llvm.org/pipermail/llvm-dev/2016-May/099395.html . There you
>> posted a nice list of optimizations that use profiling and of those
>> which could use it but don't. However, that thread is about 2 years
>> old. Could you please let me know if there have been any significant
>> changes in this area since then?
> Yes, there have been quite a few changes since then. Here are some of the new
> * LLVM IR based PGO -- this is designed to maximize program performance.
> The option to turn it on is -fprofile-generate/-fprofile-use
> * value profiling support in PGO -- currently supports indirect call target
> profiling and memcpy/memset size profiling and optimizations
> * Profile data is made available for inliner to use (enabled only for the
> new pass manager: -fexperimental-new-pass-manager)
> * Profile aware LICM is available -- implemented via a profile driven code
> sinking pass
> * Partial inlining is made profile aware; Graham Yu also added support
> for multiple region function outlining (with PGO)
> * BB layout heuristics are tuned with PGO
> * hotness driven function layout optimization
> There is pending work in the following areas:
> * profile aware loop vectorization, etc
> * control height reduction optimization (Hiroshi is working on this)
> ThinLTO also works well with PGO.
> Hope this helps.
> > What I can tell you is that there are many missing ones (that can benefit
> > from profile): such as profile aware LICM (patch pending), speculative PRE,
> > loop unrolling, loop peeling, auto vectorization, inlining, function
> > splitting, function layout, function outlining, profile driven size
> > optimization, induction variable optimization/strength reduction, stringOp
> > specialization/optimization/inlining, switch peeling/lowering etc. The
> > biggest profile users today include ralloc, BB layout, ifcvt, shrinkwrapping
> > etc, but there should be room for improvement there too.
>> Thanks in advance!
>> Best Regards,
>> Victor Leschuk | Software Engineer | Access Softek
> Best Regards,
> Victor Leschuk | Software Engineer | Access Softek