[cfe-dev] [llvm-dev] Your help needed: List of LLVM Open Projects 2017

Xinliang David Li via cfe-dev cfe-dev at lists.llvm.org
Mon Jan 16 21:04:50 PST 2017


right -- which is why I said it is complicated :)

David

On Mon, Jan 16, 2017 at 9:02 PM, Mehdi Amini <mehdi.amini at apple.com> wrote:

> That’s a very good point, the X-ray data that would be useful for the
> linker should be collected from a "profile-use” build to be accurate.
>
> IIUC, that’d look like:   build with profile-gen -> run benchmarks ->
> build with profile-use + Xray -> run benchmarks -> build with profil-use
> and feed linker with X-ray data for layout optimization.
>
>> Mehdi
>
> On Jan 16, 2017, at 8:25 PM, Xinliang David Li via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
> Xray data are collected post-inlining, but the problem is that inlining
> decisions are different between profile-gen and profile-use compilations.
> This makes it hard to match call trace data from prof-gen run with
> post-inline callgraph of the profile-use build.
>
> David
>
> On Mon, Jan 16, 2017 at 7:41 PM, Sean Silva <chisophugis at gmail.com> wrote:
>
>> Would it make sense for xray instrumentation be part of
>> -fprofile-generate? PGO will affect inlining decisions etc for the
>> optimized binary, but the collected traces during the instrumented build
>> would still have quite a bit of useful information.
>>
>> -- Sean Silva
>>
>>
>> On Mon, Jan 16, 2017 at 4:33 PM, Xinliang David Li <xinliangli at gmail.com>
>>  wrote:
>>
>>> Google GCC records profile data (dynamic callgraph) in a special named
>>> section in ELF object file to be consumed by the plugin. Those sections
>>> will be discarded later by the linker.
>>>
>>> There are pros and cons of using xray for layout purpose.  The call
>>> trace from xray is certainly more powerful for layout purpose, but it adds
>>> addtional complexity to the optimized build process.  You would need to
>>> collect xray trace profile on the optimized binary (presumably built with
>>> PGO already) and rebuild without xray nop insertion and function layout.
>>>
>>> David
>>>
>>> On Mon, Jan 16, 2017 at 3:40 PM, Sean Silva via llvm-dev <
>>> llvm-dev at lists.llvm.org> wrote:
>>>
>>>>
>>>>
>>>> On Mon, Jan 16, 2017 at 3:34 PM, Sean Silva <chisophugis at gmail.com>
>>>> wrote:
>>>>
>>>>>
>>>>>
>>>>> On Mon, Jan 16, 2017 at 3:32 PM, Sean Silva <chisophugis at gmail.com>
>>>>> wrote:
>>>>>
>>>>>>
>>>>>>
>>>>>> On Mon, Jan 16, 2017 at 2:31 PM, Davide Italiano <davide at freebsd.org>
>>>>>>  wrote:
>>>>>>
>>>>>>> On Mon, Jan 16, 2017 at 2:07 PM, Mehdi Amini <mehdi.amini at apple.com>
>>>>>>> wrote:
>>>>>>> >
>>>>>>> > On Jan 16, 2017, at 1:47 PM, Sean Silva <chisophugis at gmail.com>
>>>>>>> wrote:
>>>>>>> >
>>>>>>> >
>>>>>>> >
>>>>>>> > On Mon, Jan 16, 2017 at 1:25 PM, Davide Italiano <
>>>>>>> davide at freebsd.org> wrote:
>>>>>>> >>
>>>>>>> >> On Mon, Jan 16, 2017 at 12:31 PM, Sean Silva via llvm-dev
>>>>>>> >> <llvm-dev at lists.llvm.org> wrote:
>>>>>>> >> > Do we have any open projects on LLD?
>>>>>>> >> >
>>>>>>> >> > I know we usually try to avoid any big "projects" and mainly
>>>>>>> add/fix
>>>>>>> >> > things
>>>>>>> >> > in response to user needs, but just wondering if somebody has
>>>>>>> any ideas.
>>>>>>> >> >
>>>>>>> >>
>>>>>>> >> I'm not particularly active in lld anymore, but the last big item
>>>>>>> I'd
>>>>>>> >> like to see implemented is Pettis-Hansen layout.
>>>>>>> >> http://perso.ensta-paristech.fr/~bmonsuez/Cours/
>>>>>>> B6-4/Articles/papers15.pdf
>>>>>>> >> (mainly because it improves performances of the final executable).
>>>>>>> >> GCC/gold have an implementation of the algorithm that can be used
>>>>>>> as
>>>>>>> >> base. I'll expand if anybody is interested.
>>>>>>> >> Side note: I'd like to propose a couple of llvm projects as well,
>>>>>>> I'll
>>>>>>> >> sit down later today and write them.
>>>>>>> >
>>>>>>> >
>>>>>>> >
>>>>>>> > I’m not sure, can you confirm that such layout optimization on ELF
>>>>>>> requires
>>>>>>> > -ffunction-sections?
>>>>>>> >
>>>>>>>
>>>>>>> For the non-LTO case, I think so.
>>>>>>>
>>>>>>> > Also, for clang on OSX the best layout we could get is to order
>>>>>>> functions in
>>>>>>> > the order in which they get executed at runtime.
>>>>>>> >
>>>>>>>
>>>>>>> That's what we already do for lld. We collect and order file (run a
>>>>>>> profiler) and pass that to the linker that lays out functions
>>>>>>> accordingly.
>>>>>>> This is to improve startup time for a class of startup-time-sensitive
>>>>>>> operations. The algorithm proposed by Pettis (allegedly) aims to
>>>>>>> reduce the TLB misses as it tries to lay out hot functions (or
>>>>>>> functions that are likely to  be called together near in the final
>>>>>>> binary).
>>>>>>>
>>>>>>
>>>>>> IIRC from when I looked at the paper a while ago, it is mostly just a
>>>>>> "huffman tree construction" type algorithm (agglomerating based on highest
>>>>>> probability) and assumes that if two functions are hot then they are likely
>>>>>> to be needed together. This is not always the case.
>>>>>>
>>>>>> E.g. consider a server that accepts RPC requests and based on those
>>>>>> requests either does Foo or Bar which are largely disjoint. It's entirely
>>>>>> possible for the top two functions of the profile to be one in Foo and one
>>>>>> in Bar, but laying them out near each other doesn't make sense since there
>>>>>> is never locality (for a given RPC, either Foo or Bar gets run). A static
>>>>>> call graph analysis can provide the needed signals to handle this case
>>>>>> better.
>>>>>>
>>>>>>
>>>>> Hence you said "allegedly" :) I know we've talked about this before.
>>>>> Just wanted to put the backstory of the "allegedly" on the list.
>>>>>
>>>>
>>>> Looks like I remembered this wrong. The algorithm in section 3.2 of the
>>>> paper is call-graph aware. It does do greedy coalescing like a Huffman tree
>>>> construction algorithms, but constrains the available coalescing operations
>>>> at each step by call graph adjacency (in fact, what it is "greedy" about is
>>>> the hotness of the edges between call graph nodes and not the nodes
>>>> themselves).
>>>>
>>>> -- Sean Silva
>>>>
>>>>
>>>>>
>>>>> -- Sean Silva
>>>>>
>>>>>
>>>>>> -- Sean Silva
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> >
>>>>>>> > For FullLTO it is conceptually pretty easy to get profile data we
>>>>>>> need for
>>>>>>> > this, but I'm not sure about the ThinLTO case.
>>>>>>> >
>>>>>>> > Teresa, Mehdi,
>>>>>>> >
>>>>>>> > Are there any plans (or things already working!) for getting
>>>>>>> profile data
>>>>>>> > from ThinLTO in a format that the linker can use for code layout?
>>>>>>> I assume
>>>>>>> > that profile data is being used already to guide importing, so it
>>>>>>> may just
>>>>>>> > be a matter of siphoning that off.
>>>>>>> >
>>>>>>> >
>>>>>>> > I’m not sure what kind of “profile information” is needed, and
>>>>>>> what makes it
>>>>>>> > easier for MonolithicLTO compared to ThinLTO?
>>>>>>> >
>>>>>>> > Or maybe that layout code should be inside LLVM; maybe part of the
>>>>>>> general
>>>>>>> > LTO interface? It looks like the current gcc plugin calls back
>>>>>>> into gcc for
>>>>>>> > the actual layout algorithm itself (function call
>>>>>>> > find_pettis_hansen_function_layout) rather than the reordering
>>>>>>> logic living
>>>>>>> > in the linker:
>>>>>>> > https://android.googlesource.com/toolchain/
>>>>>>> gcc/+/3f73d6ef90458b45bbbb33ef4c2b174d4662a22d/gcc-4.6/
>>>>>>> function_reordering_plugin/function_reordering_plugin.c
>>>>>>> >
>>>>>>> >
>>>>>>> > I was thinking about this: could this be done by reorganizing the
>>>>>>> module
>>>>>>> > itself for LTO?
>>>>>>> >
>>>>>>> > That wouldn’t help non-LTO and ThinLTO though.
>>>>>>>
>>>>>>> This is a dimension that I think can be explored. The fact that it
>>>>>>> wouldn't help with other modes of operation is completely orthogonal,
>>>>>>> in particular until it's proven that this kind of optimization makes
>>>>>>> sense with ThinLTO (and if it doesn't, it can be an optimization ran
>>>>>>> only during full LTO).
>>>>>>>
>>>>>>> --
>>>>>>> Davide
>>>>>>>
>>>>>>> "There are no solved problems; there are only problems that are more
>>>>>>> or less solved" -- Henri Poincare
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>> _______________________________________________
>>>> LLVM Developers mailing list
>>>> llvm-dev at lists.llvm.org
>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>>
>>>>
>>>
>>
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20170116/db0d9230/attachment.html>


More information about the cfe-dev mailing list