[llvm-dev] Current PGO status

Victor Leschuk via llvm-dev llvm-dev at lists.llvm.org
Thu Feb 8 09:42:35 PST 2018


Filed as https://bugs.llvm.org/show_bug.cgi?id=36303. Please let me
know if I can help.


On 02/08/2018 01:22 AM, Xinliang David Li wrote:
> Victor, please file a bug tracking the issue. We can put relevant
> information there including test cases used in the experiment etc.
>
> thanks,
>
> David
>
> On Wed, Feb 7, 2018 at 2:15 PM, Victor Leschuk
> <vleschuk at accesssoftek.com> wrote:
>
>     David, could you please clarify on which code you saw the 10%
>     improvement? I have run numerous tests with and without this
>     option, and it appears to have no effect on performance (to be
>     concrete, I am talking about the old 2016 sample). Maybe we could
>     investigate it together? Just tell me where to start.
>
>
>     On 02/07/2018 02:11 AM, Xinliang David Li wrote:
>>     Victor, thanks for the experiment.
>>
>>     My suspicion is that it is due to the remaining issues with block
>>     layout -- especially with loop rotation (with PGO). Another
>>     problem is that tail duplication does not happen after loop
>>     rotation, which can limit the effectiveness of loop rotation.
>>
>>     I tried the internal option -mllvm -force-precise-rotation-cost
>>     and saw about a 10% speedup with -fprofile-use. This option
>>     turns on a more precise cost model when computing the rotation
>>     strategy, but it is not turned on by default.
>>
>>     +carrot who is working on this area.
>>
>>     thanks,
>>
>>     David
>>
>>     On Tue, Feb 6, 2018 at 1:37 PM, Victor Leschuk
>>     <vleschuk at accesssoftek.com> wrote:
>>
>>         Hello David, thanks for the detailed response!
>>
>>         Do you have any tests that you use to measure PGO
>>         effectiveness? I have tested clang version 6.0 with the same
>>         sample that Jie Chen used in 2016, and both frontend-based
>>         and IR-based PGO actually make the code run slower; see the
>>         average times:
>>
>>         clang++ -O3:                          3.15 sec
>>         clang++ -O3 and -fprofile-instr-use:  3.160 sec
>>         clang++ -O3 and -fprofile-use:        3.180 sec
>>         g++ (7.3.0) -O3:                      3.640 sec
>>         g++ (7.3.0) -O3 and -fprofile-use:    2.92 sec
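To put the averages quoted above on a common scale, here is a minimal
sketch that converts them into percent change relative to each
compiler's plain -O3 build. The numbers are copied from the list above;
the names `timings` and `percent_change` are only illustrative and do
not come from the thread.

```python
# Average run times in seconds, copied from the measurements quoted above.
timings = {
    "clang++ -O3": 3.15,
    "clang++ -O3 -fprofile-instr-use": 3.160,
    "clang++ -O3 -fprofile-use": 3.180,
    "g++ -O3": 3.640,
    "g++ -O3 -fprofile-use": 2.92,
}


def percent_change(baseline, measured):
    """Run-time change relative to baseline, in percent (negative = faster)."""
    return (measured - baseline) / baseline * 100.0


# Compare every PGO build against the -O3 build of the same compiler.
for name, seconds in timings.items():
    compiler = name.split()[0]
    baseline = timings[compiler + " -O3"]
    if seconds != baseline:
        print(f"{name}: {percent_change(baseline, seconds):+.1f}%")
```

With these inputs the two clang PGO builds come out about 0.3% and 1.0%
slower than plain -O3, while the g++ PGO build is about 19.8% faster.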
>>
>>         Do you have any idea what could be wrong? Are there any
>>         recommendations on when PGO with clang is worthwhile and
>>         when it is better avoided?
>>
>>         Thanks!
>>
>>
>>         On 02/05/2018 09:38 AM, Xinliang David Li wrote:
>>>
>>>
>>>         On Sun, Feb 4, 2018 at 9:59 PM, Victor Leschuk
>>>         <vleschuk at accesssoftek.com> wrote:
>>>
>>>             Hello David!
>>>
>>>             I have recently started getting acquainted with PGO in
>>>             LLVM/clang and found your e-mail thread:
>>>             http://lists.llvm.org/pipermail/llvm-dev/2016-May/099395.html
>>>             There you posted a nice list of optimizations that use
>>>             profiling and of those that could use it but don't.
>>>             However, that thread is about 2 years old. Could you
>>>             please let me know if there have been any significant
>>>             changes in this area since then?
>>>
>>>
>>>
>>>         Yes, there have been quite a few changes since then. Here
>>>         are some of the new features:
>>>
>>>         * LLVM IR based PGO -- this is designed to maximize program
>>>         performance. The option to turn it on is
>>>         -fprofile-generate/-fprofile-use
>>>         * value profiling support in PGO -- currently supports
>>>         indirect call target profiling and memcpy/memset size
>>>         profiling and optimizations
>>>         * Profile data is made available for inliner to use (enabled
>>>         only for the new pass manager: -fexperimental-new-pass-manager)
>>>         * Profile aware LICM is available -- implemented via a
>>>         profile driven code sinking pass 
>>>         * Partial inlining is made profile aware;  Graham Yu also
>>>         added support for multiple region function outlining (with PGO)
>>>         * BB layout heuristics are tuned with PGO
>>>         * hotness driven function layout optimization 
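As a rough illustration of how the options named in this thread fit
together, the sketch below only builds and prints the command lines of
an instrument / run / merge / rebuild cycle; it does not execute them.
The helper `pgo_commands`, the file names `sample.cpp` and `sample`,
and the use of LLVM_PROFILE_FILE and `llvm-profdata merge` to produce
the merged profile are assumptions for illustration, as is
-fprofile-instr-generate as the counterpart of -fprofile-instr-use.

```python
def pgo_commands(source, binary, frontend=False, extra_use_flags=()):
    """Return the four command lines of one PGO cycle as argument lists.

    frontend=False selects IR-based PGO (-fprofile-generate/-fprofile-use);
    frontend=True selects frontend-based PGO (-fprofile-instr-*).
    """
    generate = "-fprofile-instr-generate" if frontend else "-fprofile-generate"
    use = "-fprofile-instr-use" if frontend else "-fprofile-use"
    raw = binary + ".profraw"      # hypothetical raw profile file name
    merged = binary + ".profdata"  # hypothetical merged profile file name
    return [
        # 1. Build an instrumented binary.
        ["clang++", "-O3", generate, source, "-o", binary],
        # 2. Run it on a representative workload to collect a raw profile.
        ["env", "LLVM_PROFILE_FILE=" + raw, "./" + binary],
        # 3. Merge the raw profile into the indexed format the compiler reads.
        ["llvm-profdata", "merge", "-output=" + merged, raw],
        # 4. Rebuild using the profile, plus any extra flags under test.
        ["clang++", "-O3", use + "=" + merged, *extra_use_flags,
         source, "-o", binary],
    ]


# Print the IR-based cycle with the internal option mentioned in this thread.
for cmd in pgo_commands("sample.cpp", "sample",
                        extra_use_flags=["-mllvm", "-force-precise-rotation-cost"]):
    print(" ".join(cmd))
```

Passing frontend=True prints the frontend-based variant instead, which
makes it easy to compare the two instrumentation modes on the same
source file.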
>>>
>>>         There is pending work in the following areas:
>>>         * profile aware loop vectorization, etc
>>>         * control height reduction optimization (Hiroshi is working
>>>         on this)
>>>
>>>         ThinLTO also works well with PGO.
>>>
>>>         Hope this helps.
>>>
>>>         David
>>>
>>>         > What I can tell you is that there are many missing ones
>>>         > (that can benefit from profile): such as profile aware
>>>         > LICM (patch pending), speculative PRE, loop unrolling,
>>>         > loop peeling, auto vectorization, inlining, function
>>>         > splitting, function layout, function outlining, profile
>>>         > driven size optimization, induction variable
>>>         > optimization/strength reduction, stringOp
>>>         > specialization/optimization/inlining, switch
>>>         > peeling/lowering etc. The biggest profile users today
>>>         > include ralloc, BB layout, ifcvt, shrinkwrapping etc, but
>>>         > there should be room for improvement there too.
>>>
>>>
>>>             Thanks in advance!
>>>
>>>             --
>>>             Best Regards,
>>>
>>>             Victor Leschuk | Software Engineer | Access Softek
>>>
>>>
>>
>>         -- 
>>         Best Regards,
>>
>>         Victor Leschuk | Software Engineer | Access Softek
>>
>>
>
>     -- 
>     Best Regards,
>
>     Victor Leschuk | Software Engineer | Access Softek
>
>

-- 
Best Regards,

Victor Leschuk | Software Engineer | Access Softek
