[llvm-dev] Current PGO status
Victor Leschuk via llvm-dev
llvm-dev at lists.llvm.org
Thu Feb 8 09:42:35 PST 2018
Bug filed: https://bugs.llvm.org/show_bug.cgi?id=36303
Please let me know if I can help somehow.
On 02/08/2018 01:22 AM, Xinliang David Li wrote:
> Victor, please file a bug to track the issue. We can put the relevant
> information there, including the test cases used in the experiment.
>
> thanks,
>
> David
>
> On Wed, Feb 7, 2018 at 2:15 PM, Victor Leschuk
> <vleschuk at accesssoftek.com> wrote:
>
> David, could you please clarify on which code you saw the 10%
> improvement? I have run numerous tests with and without this option,
> and it appears to have no effect on performance (to be concrete, I am
> talking about the old 2016 sample). Maybe we could investigate it
> together? Just tell me where to start.
>
>
> On 02/07/2018 02:11 AM, Xinliang David Li wrote:
>> Victor, thanks for the experiment.
>>
>> My suspicion is that it is due to the remaining issues with block
>> layout, especially with loop rotation (with PGO). Another
>> problem is that tail duplication is not happening after loop rotation,
>> which can limit the effectiveness of loop rotation.
>>
>> I tried the internal option -mllvm -force-precise-rotation-cost
>> and there is about a 10% speedup with -fprofile-use. This option
>> turns on a more precise cost model when computing the rotation
>> strategy, but it is not turned on by default.
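
For anyone who wants to repeat the comparison, here is a minimal sketch of the IR-based PGO build steps with and without the rotation option. It only assembles the command lines and does not run them. The file and directory names (`sample.cpp`, `pgo-raw`, `sample.profdata`, the output binaries) and the helper `pgo_commands` are hypothetical; the flags are the ones mentioned in this thread, plus `llvm-profdata merge` to convert the raw profile.

```python
def pgo_commands(source="sample.cpp", profile_dir="pgo-raw",
                 profdata="sample.profdata", precise_rotation=False):
    """Return the command lines for an IR-based PGO build (nothing is executed).

    All file and directory names are placeholders for illustration.
    """
    # Step 1: instrumented build; raw profiles will be written to profile_dir.
    instrumented = ["clang++", "-O3", f"-fprofile-generate={profile_dir}",
                    source, "-o", "sample.instr"]
    # Step 2: run the instrumented binary on a representative workload.
    train = ["./sample.instr"]
    # Step 3: merge the raw profiles into an indexed profile.
    merge = ["llvm-profdata", "merge", f"-output={profdata}", profile_dir]
    # Step 4: optimized build that consumes the profile.
    optimized = ["clang++", "-O3", f"-fprofile-use={profdata}"]
    if precise_rotation:
        # Internal option mentioned above; it is off by default.
        optimized += ["-mllvm", "-force-precise-rotation-cost"]
    optimized += [source, "-o", "sample.pgo"]
    return [instrumented, train, merge, optimized]


if __name__ == "__main__":
    for cmd in pgo_commands(precise_rotation=True):
        print(" ".join(cmd))
```

Building once with `precise_rotation=False` and once with `True`, then timing both binaries on the same input, is one way to check whether the option makes a difference for a given sample.
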
>>
>> +carrot who is working on this area.
>>
>> thanks,
>>
>> David
>>
>> On Tue, Feb 6, 2018 at 1:37 PM, Victor Leschuk
>> <vleschuk at accesssoftek.com> wrote:
>>
>> Hello David, thanks for the detailed response!
>>
>> Do you have any tests that you use to measure PGO
>> effectiveness? I have tested clang version 6.0 with the same
>> sample that Jie Chen used in 2016, and both frontend-based
>> and IR-based PGO actually make the code run slower. See the
>> average times:
>>
>> clang++ -O3:                            3.15 sec
>> clang++ -O3 and -fprofile-instr-use:    3.160 sec
>> clang++ -O3 and -fprofile-use:          3.180 sec
>> g++ (7.3.0) -O3:                        3.640 sec
>> g++ (7.3.0) -O3 and -fprofile-use:      2.92 sec
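
To put the numbers above in relative terms, here is a small sketch that computes the change of each PGO build against the plain -O3 build of the same compiler. The timings are copied from the list above; the dictionary layout and the helper name `relative_change` are just for illustration.

```python
# Average run times in seconds, as reported above.
timings = {
    "clang++": {"-O3": 3.15, "-fprofile-instr-use": 3.160, "-fprofile-use": 3.180},
    "g++ 7.3.0": {"-O3": 3.640, "-fprofile-use": 2.92},
}


def relative_change(baseline, measured):
    """Percent change in run time; positive means slower than the baseline."""
    return (measured - baseline) / baseline * 100.0


if __name__ == "__main__":
    for compiler, runs in timings.items():
        base = runs["-O3"]
        for flags, seconds in runs.items():
            if flags != "-O3":
                print(f"{compiler} {flags}: {relative_change(base, seconds):+.1f}%")
```

With these inputs the clang++ PGO builds come out roughly 0.3% and 1.0% slower than -O3, while g++ with -fprofile-use is about 19.8% faster than g++ -O3.
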
>>
>> Do you have any idea what could be wrong? Are there any
>> recommendations on when one should use PGO with clang
>> and when it is better not to?
>>
>> Thanks!
>>
>>
>> On 02/05/2018 09:38 AM, Xinliang David Li wrote:
>>>
>>>
>>> On Sun, Feb 4, 2018 at 9:59 PM, Victor Leschuk
>>> <vleschuk at accesssoftek.com> wrote:
>>>
>>> Hello David!
>>>
>>> I have recently started getting acquainted with PGO in LLVM/clang
>>> and found your e-mail thread:
>>> http://lists.llvm.org/pipermail/llvm-dev/2016-May/099395.html
>>> There you posted a nice list of optimizations that use profiling
>>> and of those that could use it but don't. However, that thread is
>>> about 2 years old. Could you please let me know whether there have
>>> been any significant changes in this area since then?
>>>
>>>
>>>
>>> Yes, there have been quite a few changes since then. Here are some
>>> of the new features:
>>>
>>> * LLVM IR based PGO -- this is designed to maximize program
>>> performance. The option to turn it on is
>>> -fprofile-generate/-fprofile-use
>>> * Value profiling support in PGO -- currently supports
>>> indirect call target profiling and memcpy/memset size
>>> profiling, along with the associated optimizations
>>> * Profile data is made available for inliner to use (enabled
>>> only for the new pass manager: -fexperimental-new-pass-manager)
>>> * Profile aware LICM is available -- implemented via a
>>> profile driven code sinking pass
>>> * Partial inlining is made profile aware; Graham Yu also
>>> added support for multiple region function outlining (with PGO)
>>> * BB layout heuristics are tuned with PGO
>>> * Hotness-driven function layout optimization
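
As an illustration of the value profiling item in the list above, here is a conceptual sketch of what indirect call target profiling enables: once the profile shows one dominant target at a call site, the call can be rewritten as a guarded direct call, with the original indirect call kept as the fallback. This is a Python toy and not LLVM's implementation; the function names and the 90% threshold are made up for the example.

```python
from collections import Counter


def fast_path(x):
    return x + 1


def slow_path(x):
    return x * 2


def hottest_target(observed_targets, threshold=0.9):
    """Return the dominant callee if it accounts for at least `threshold` of calls."""
    counts = Counter(observed_targets)
    target, hits = counts.most_common(1)[0]
    return target if hits / len(observed_targets) >= threshold else None


def make_promoted_call(hot):
    """Build a call site that checks for the hot target before falling back."""
    def call(fn, x):
        if fn is hot:
            return hot(x)   # direct call; a compiler could inline this
        return fn(x)        # original indirect call
    return call


if __name__ == "__main__":
    # Hypothetical profile: 95 of 100 calls at this site went to fast_path.
    profile = [fast_path] * 95 + [slow_path] * 5
    call = make_promoted_call(hottest_target(profile))
    print(call(fast_path, 1), call(slow_path, 1))
```

The results are the same with or without the guard; the benefit in a compiler comes from the direct call being cheaper and open to inlining.
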
>>>
>>> There is pending work in the following areas:
>>> * profile aware loop vectorization, etc
>>> * control height reduction optimization (Hiroshi is working
>>> on this)
>>>
>>> ThinLTO also works well with PGO.
>>>
>>> Hope this helps.
>>>
>>> David
>>>
>>> > What I can tell you is that there are many missing ones (that can
>>> > benefit from profile), such as profile-aware LICM (patch pending),
>>> > speculative PRE, loop unrolling, loop peeling, auto-vectorization,
>>> > inlining, function splitting, function layout, function outlining,
>>> > profile-driven size optimization, induction variable
>>> > optimization/strength reduction, stringOp
>>> > specialization/optimization/inlining, switch peeling/lowering, etc.
>>> > The biggest profile users today include register allocation, BB
>>> > layout, ifcvt, shrink-wrapping, etc., but there should be room for
>>> > improvement there too.
>>>
>>>
>>> Thanks in advance!
>>>
>>> --
>>> Best Regards,
>>>
>>> Victor Leschuk | Software Engineer | Access Softek
--
Best Regards,
Victor Leschuk | Software Engineer | Access Softek