[LLVMdev] FW: Capabilities of Clang's PGO (e.g. improving code density)

Xinliang David Li xinliangli at gmail.com
Wed May 27 12:52:32 PDT 2015


On Wed, May 27, 2015 at 12:40 PM, Randy Chapman <randyc at microsoft.com>
wrote:

>
>
> Hi David!
>
>
>
> Thanks again for your help!  I was wondering if you could clarify one
> thing for me?
>
>  I find mention of “hot arc” optimization (-fprofile-arcs) , but I’m
> unclear if this is the same thing.  Does Clang PGO do block reordering?
>
> It does reordering, but does not do splitting/partitioning.
>
> I take this to mean that PGO does block reordering within the function?  I
> don’t see that the clang driver passes anything to the linker to drive
> function ordering at the linker level as well.  Is there something there
> that I missed, or are you aware of any readily available tools to do so?
> If not, we’ve done some work locally on enabling that which we will
> continue.
>
>
>

OK. There are three reordering-related optimizations:

1) intra-procedural basic block reordering to reduce branch cost, icache
misses and front-end stalls.
2) function splitting/partitioning -- splitting the really cold part of a
function into unlikely.text sections.
3) function reordering based on affinity and hotness -- reordering
functions by the linker/plugin (guided by the compiler annotations).

Clang currently only does 1).
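
As a rough illustration of items 1) and 2) above, here is a minimal C sketch written by hand rather than produced by PGO. The function names are hypothetical. `__builtin_expect` stands in for the branch probabilities a profile would supply, and the `cold`/`noinline` attributes approximate by hand what an automatic function-splitting pass would do; none of this is taken from Clang's implementation.

```c
#include <stdio.h>

/* Hypothetical cold path: rarely executed error handling.
 * 'cold' tells the compiler this function is unlikely to run, and
 * 'noinline' keeps its body out of the hot caller. This is a manual
 * approximation of function splitting (item 2), not an automatic pass. */
__attribute__((noinline, cold))
int report_bad_input(int value)
{
    fprintf(stderr, "unexpected negative input: %d\n", value);
    return -1;
}

/* Hot path: the hint marks the error branch as unlikely, which lets
 * block layout keep the common case as straight-line fall-through
 * code (item 1). With PGO, measured branch counts would play the role
 * of this hand-written hint. */
int process(int value)
{
    if (__builtin_expect(value < 0, 0))
        return report_bad_input(value);
    return value * 2;
}
```

With instrumentation-based PGO as described in the Clang user's manual (build with -fprofile-instr-generate, run the program, merge the raw profile with llvm-profdata, rebuild with -fprofile-instr-use=), the hint in process() would not be needed for layout, but the cold function would still have to be separated by hand since Clang does not do 2).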

Hope this clarifies.

thanks,

David




>  Thanks :)
>
> --randy
>
>
>
> *From:* Xinliang David Li [mailto:xinliangli at gmail.com
> <xinliangli at gmail.com>]
> *Sent:* Wednesday, May 27, 2015 10:21 AM
>
> *To:* Lee Hunt
> *Cc:* llvmdev at cs.uiuc.edu
> *Subject:* Re: [LLVMdev] Capabilities of Clang's PGO (e.g. improving code
> density)
>
>
>
>
>
>
>
> On Wed, May 27, 2015 at 10:11 AM, Lee Hunt <leehu at exchange.microsoft.com>
> wrote:
>
>  Thanks! CIL [LeeHu] for a few comments…
>
>
>
>
>
> *From:* Xinliang David Li [mailto:xinliangli at gmail.com]
> *Sent:* Wednesday, May 27, 2015 9:29 AM
> *To:* Lee Hunt
> *Cc:* llvmdev at cs.uiuc.edu
> *Subject:* Re: [LLVMdev] Capabilities of Clang's PGO (e.g. improving code
> density)
>
>
>
>
>
> On Tue, May 26, 2015 at 8:47 PM, Lee Hunt <leehu at exchange.microsoft.com>
> wrote:
>
>  Hello –
>
>
>
> I’m an Engineer in Microsoft Office looking into possible advantages
> of using PGO for our Android Applications.
>
>
>
> We at Microsoft have deep experience with Visual C++’s Profile Guided
> Optimization
> <https://msdn.microsoft.com/en-us/library/e7k32f4k.aspx>
> and often see 10% or more reduction in the size of application code loaded
> after using PGO for key scenarios (e.g. application launch).
>
>
>
> Yes. This is true for GCC too.  Clang's PGO does not shrink code size
> yet.
>
>
>
> [LeeHu] Note: I’m not talking about shrinking code size, but rather
> reordering it such that only ‘active’ branches within the profiled
> functions are grouped together in ‘hot’ code pages.  This is a very big
> optimization for us in VC++ toolchain in PGO.
>
> We also have the “/LTCG” flag – which is seemingly similar to the “-flto”
> Clang flag -- that **does** shrink code by various means (dead code
> removal, common IL tree collapsing) because it can see all the object code
> for an entire produced target binary (e.g. .exe or .dll).
>
> Does -flto also shrink code?
>
>
>
>
>
> That depends on other options used (e.g., -Os). With LTO, the compiler sees
> a larger scope and performs cross-module inlining and dead function
> elimination. It does have more opportunities to shrink code.
>
>
>
>
>
>
>
>      Making application launch quickly is very important to us, and
> reducing the number of code pages loaded helps with this goal.
>
>
>
> Before we dig into turning it on, I’m wondering if there’s any
> pre-existing research / case studies about possible code page reduction
> seen from other Clang PGO-enabled applications?  It sounds like there are
> some possible instrumented-run performance problems due to counter
> contention, resulting in sluggish performance and perhaps skewed profile
> data: https://groups.google.com/forum/#!topic/llvm-dev/cDqYgnxNEhY
>
>
>
>
> Counter contention is one issue. Redundant counter updates are another
> major issue (due to the early instrumentation). We are working on the latter
> and see great speedups.
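
To make "redundant counter updates" concrete, here is a minimal C sketch of one general way instrumentation can avoid counting what it can derive. It is only an illustration: the thread does not say which technique was being worked on, and the struct and function names are hypothetical, not the symbols Clang's instrumentation emits.

```c
#include <stdint.h>

/* Hypothetical per-function counters for a single if/else.
 * A naive scheme would keep three counters (entry, then, else). */
struct branch_profile {
    uint64_t entry;      /* times the function was entered */
    uint64_t then_taken; /* times the 'then' arm ran */
};

/* Instrumented function: only two counters are updated at run time. */
int clamp_to_zero(int x, struct branch_profile *p)
{
    p->entry++;
    if (x < 0) {
        p->then_taken++;
        return 0;
    }
    /* No counter update here: the 'else' count can be derived. */
    return x;
}

/* Derived after the run: else = entry - then. */
uint64_t else_taken(const struct branch_profile *p)
{
    return p->entry - p->then_taken;
}
```

Dropping the third counter removes one memory update from the else path of every call, which is the kind of saving that adds up in a hot function.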
>
>
>
>
>
>  I’d like an overview of the optimizations that PGO does, but I don’t
> find much from looking at the Clang PGO section:
> http://clang.llvm.org/docs/UsersManual.html#profile-guided-optimization
>
>
>
> Profile data is not used in any IPA passes yet. It is used by the
> post-inline optimizations though -- including block layout, the register
> allocator, etc.
>
>
>
> [LeeHu]: sorry for naïve question, but what is IPA?
>
>
>
>
>
> Inter-procedural analysis/optimizations.
>
>
>
>
>
>    And what post-inline optimizations are currently being done?   We’re
> currently using Clang 3.5 if that matters.
>
>
>
>
>
> For example, from reading different pages on how Clang PGO works, it’s unclear
> if it does “block reordering” (i.e. moving unexecuted code blocks to a
> distant code page, leaving only ‘hot’ executed code packed together for
> greater code density).
>
>
>
> LLVM's block placement uses branch probability and frequency data, but
> there is no function splitting optimization yet.
>
>
>
>   I find mention of “hot arc” optimization (-fprofile-arcs) , but I’m
> unclear if this is the same thing.  Does Clang PGO do block reordering?
>
>
>
> It does reordering, but does not do splitting/partitioning.
>
>
>
> David
>
>
>
>
>
>
>
> Thanks,
>
> --Lee
>
>