[cfe-dev] [llvm-dev] Your help needed: List of LLVM Open Projects 2017

Mon Jan 16 19:41:08 PST 2017

Would it make sense for xray instrumentation be part of -fprofile-generate?
PGO will affect inlining decisions etc for the optimized binary, but the
collected traces during the instrumented build would still have quite a bit
of useful information.

-- Sean Silva

On Mon, Jan 16, 2017 at 4:33 PM, Xinliang David Li <xinliangli at gmail.com>
wrote:

> Google GCC records profile data (dynamic callgraph) in a special named
> section in ELF object file to be consumed by the plugin. Those sections
> will be discarded later by the linker.
>
> There are pros and cons of using xray for layout purpose.  The call trace
> from xray is certainly more powerful for layout purpose, but it adds
> addtional complexity to the optimized build process.  You would need to
> collect xray trace profile on the optimized binary (presumably built with
> PGO already) and rebuild without xray nop insertion and function layout.
>
> David
>
> On Mon, Jan 16, 2017 at 3:40 PM, Sean Silva via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
>>
>>
>> On Mon, Jan 16, 2017 at 3:34 PM, Sean Silva <chisophugis at gmail.com>
>> wrote:
>>
>>>
>>>
>>> On Mon, Jan 16, 2017 at 3:32 PM, Sean Silva <chisophugis at gmail.com>
>>> wrote:
>>>
>>>>
>>>>
>>>> On Mon, Jan 16, 2017 at 2:31 PM, Davide Italiano <davide at freebsd.org>
>>>> wrote:
>>>>
>>>>> On Mon, Jan 16, 2017 at 2:07 PM, Mehdi Amini <mehdi.amini at apple.com>
>>>>> wrote:
>>>>> >
>>>>> > On Jan 16, 2017, at 1:47 PM, Sean Silva <chisophugis at gmail.com>
>>>>> wrote:
>>>>> >
>>>>> >
>>>>> >
>>>>> > On Mon, Jan 16, 2017 at 1:25 PM, Davide Italiano <davide at freebsd.org>
>>>>> wrote:
>>>>> >>
>>>>> >> On Mon, Jan 16, 2017 at 12:31 PM, Sean Silva via llvm-dev
>>>>> >> <llvm-dev at lists.llvm.org> wrote:
>>>>> >> > Do we have any open projects on LLD?
>>>>> >> >
>>>>> >> > I know we usually try to avoid any big "projects" and mainly
>>>>> add/fix
>>>>> >> > things
>>>>> >> > in response to user needs, but just wondering if somebody has any
>>>>> ideas.
>>>>> >> >
>>>>> >>
>>>>> >> I'm not particularly active in lld anymore, but the last big item
>>>>> I'd
>>>>> >> like to see implemented is Pettis-Hansen layout.
>>>>> >> http://perso.ensta-paristech.fr/~bmonsuez/Cours/B6-4/Article
>>>>> s/papers15.pdf
>>>>> >> (mainly because it improves performances of the final executable).
>>>>> >> GCC/gold have an implementation of the algorithm that can be used as
>>>>> >> base. I'll expand if anybody is interested.
>>>>> >> Side note: I'd like to propose a couple of llvm projects as well,
>>>>> I'll
>>>>> >> sit down later today and write them.
>>>>> >
>>>>> >
>>>>> >
>>>>> > I’m not sure, can you confirm that such layout optimization on ELF
>>>>> requires
>>>>> > -ffunction-sections?
>>>>> >
>>>>>
>>>>> For the non-LTO case, I think so.
>>>>>
>>>>> > Also, for clang on OSX the best layout we could get is to order
>>>>> functions in
>>>>> > the order in which they get executed at runtime.
>>>>> >
>>>>>
>>>>> That's what we already do for lld. We collect and order file (run a
>>>>> profiler) and pass that to the linker that lays out functions
>>>>> accordingly.
>>>>> This is to improve startup time for a class of startup-time-sensitive
>>>>> operations. The algorithm proposed by Pettis (allegedly) aims to
>>>>> reduce the TLB misses as it tries to lay out hot functions (or
>>>>> functions that are likely to  be called together near in the final
>>>>> binary).
>>>>>
>>>>
>>>> IIRC from when I looked at the paper a while ago, it is mostly just a
>>>> "huffman tree construction" type algorithm (agglomerating based on highest
>>>> probability) and assumes that if two functions are hot then they are likely
>>>> to be needed together. This is not always the case.
>>>>
>>>> E.g. consider a server that accepts RPC requests and based on those
>>>> requests either does Foo or Bar which are largely disjoint. It's entirely
>>>> possible for the top two functions of the profile to be one in Foo and one
>>>> in Bar, but laying them out near each other doesn't make sense since there
>>>> is never locality (for a given RPC, either Foo or Bar gets run). A static
>>>> call graph analysis can provide the needed signals to handle this case
>>>> better.
>>>>
>>>>
>>> Hence you said "allegedly" :) I know we've talked about this before.
>>> Just wanted to put the backstory of the "allegedly" on the list.
>>>
>>
>> Looks like I remembered this wrong. The algorithm in section 3.2 of the
>> paper is call-graph aware. It does do greedy coalescing like a Huffman tree
>> construction algorithms, but constrains the available coalescing operations
>> at each step by call graph adjacency (in fact, what it is "greedy" about is
>> the hotness of the edges between call graph nodes and not the nodes
>> themselves).
>>
>> -- Sean Silva
>>
>>
>>>
>>> -- Sean Silva
>>>
>>>
>>>> -- Sean Silva
>>>>
>>>>
>>>>>
>>>>> >
>>>>> > For FullLTO it is conceptually pretty easy to get profile data we
>>>>> need for
>>>>> > this, but I'm not sure about the ThinLTO case.
>>>>> >
>>>>> > Teresa, Mehdi,
>>>>> >
>>>>> > Are there any plans (or things already working!) for getting profile
>>>>> data
>>>>> > from ThinLTO in a format that the linker can use for code layout? I
>>>>> assume
>>>>> > that profile data is being used already to guide importing, so it
>>>>> may just
>>>>> > be a matter of siphoning that off.
>>>>> >
>>>>> >
>>>>> > I’m not sure what kind of “profile information” is needed, and what
>>>>> makes it
>>>>> > easier for MonolithicLTO compared to ThinLTO?
>>>>> >
>>>>> > Or maybe that layout code should be inside LLVM; maybe part of the
>>>>> general
>>>>> > LTO interface? It looks like the current gcc plugin calls back into
>>>>> gcc for
>>>>> > the actual layout algorithm itself (function call
>>>>> > find_pettis_hansen_function_layout) rather than the reordering
>>>>> logic living
>>>>> > in the linker:
>>>>> > https://android.googlesource.com/toolchain/gcc/+/3f73d6ef904
>>>>> 58b45bbbb33ef4c2b174d4662a22d/gcc-4.6/function_reordering_pl
>>>>> ugin/function_reordering_plugin.c
>>>>> >
>>>>> >
>>>>> > I was thinking about this: could this be done by reorganizing the
>>>>> module
>>>>> > itself for LTO?
>>>>> >
>>>>> > That wouldn’t help non-LTO and ThinLTO though.
>>>>>
>>>>> This is a dimension that I think can be explored. The fact that it
>>>>> wouldn't help with other modes of operation is completely orthogonal,
>>>>> in particular until it's proven that this kind of optimization makes
>>>>> sense with ThinLTO (and if it doesn't, it can be an optimization ran
>>>>> only during full LTO).
>>>>>
>>>>> --
>>>>> Davide
>>>>>
>>>>> "There are no solved problems; there are only problems that are more
>>>>> or less solved" -- Henri Poincare
>>>>>
>>>>
>>>>
>>>
>>
>> _______________________________________________
>> LLVM Developers mailing list
>> llvm-dev at lists.llvm.org
>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20170116/8c57699e/attachment.html>