[llvm-dev] [RFC][ThinLTO] llvm-dis ThinLTO summary dump format

Mon Jul 17 06:12:55 PDT 2017

btw, the YAML format is specified in this patch:
https://reviews.llvm.org/D34063 (you can see it in the test case for the
patch).

On Mon, Jul 17, 2017 at 9:11 AM, Charles Saternos <
charles.saternos at gmail.com> wrote:

> Hey @chandlerc and @dblaikie,
>
> Any updates on this in relation to "[PATCH] D34080: [ThinLTO] Add
> dump-summary command to llvm-lto2 tool"?
>
> Thanks,
> Charles
>
> On Fri, Jun 9, 2017 at 3:58 PM, Charles Saternos <
> charles.saternos at gmail.com> wrote:
>
>> OK, I tested the hotness, and it works.
>>
>> > I'd ask Charles to take it over. I think it just needs a test case and
>> an update to the usage message.
>>
>> Sure - I've added the message and a quick test. The patch is here:
>> https://reviews.llvm.org/D34063
>>
>> Thanks,
>> Charles
>>
>> On Thu, Jun 8, 2017 at 8:01 PM, Peter Collingbourne <peter at pcc.me.uk>
>> wrote:
>>
>>> I'd ask Charles to take it over. I think it just needs a test case and
>>> an update to the usage message.
>>>
>>> Peter
>>>
>>> On Thu, Jun 8, 2017 at 4:55 PM, Teresa Johnson via llvm-dev <
>>> llvm-dev at lists.llvm.org> wrote:
>>>
>>>> Great! For the hotness, try creating a small test case with a very hot
>>>> loop that iterates many times. Let me know if you are still having trouble.
>>>> While the llvm-dis serialization is being discussed, I suppose at the very
>>>> least this can go in with the rest of the existing YAML summary dumping and
>>>> get emitted from llvm-lto2 using the patch Peter attached. Peter - do you
>>>> want to add that to llvm-lto2 so that we have a way of dumping the existing
>>>> YAML supported summary info to stdout, or would you rather have Charles
>>>> take that one over and submit it (probably just needs a test case).
>>>>
>>>> Teresa
>>>>
>>>> On Thu, Jun 8, 2017 at 4:16 PM, Charles Saternos <
>>>> charles.saternos at gmail.com> wrote:
>>>>
>>>>> Hey Teresa,
>>>>>
>>>>> I've updated the YAML to include the names and GUIDs for all
>>>>> functions/global vars/aliases. I've also added the hotness info to the
>>>>> output, but for some reason, none of my tests when running with FDO gave
>>>>> anything besides Unknown. I'll be looking more into this tomorrow.
>>>>>
>>>>> Here's the current format:
>>>>>
>>>>> > ../build/bin/llvm-lto2 dump-summary b.o
>>>>> ---
>>>>> NamedGlobalValueMap:
>>>>>   :
>>>>>     - GUID:            3762489268811518743
>>>>>       Kind:            GlobalVar
>>>>>       Linkage:         PrivateLinkage
>>>>>       NotEligibleToImport: true
>>>>>       Live:            false
>>>>>   cold:
>>>>>     - GUID:            11668175513417606517
>>>>>       Kind:            Function
>>>>>       Linkage:         ExternalLinkage
>>>>>       NotEligibleToImport: true
>>>>>       Live:            false
>>>>>       InstCount:       5
>>>>>       Calls:
>>>>>         - Name:            puts
>>>>>           GUID:            8979701042202144121
>>>>>           Hotness:         Unknown
>>>>>   fib:
>>>>>     - GUID:            8667248078361406812
>>>>>       Kind:            Function
>>>>>       Linkage:         ExternalLinkage
>>>>>       NotEligibleToImport: true
>>>>>       Live:            false
>>>>>       InstCount:       26
>>>>>       Calls:
>>>>>         - Name:            fib
>>>>>           GUID:            8667248078361406812
>>>>>           Hotness:         Unknown
>>>>>   hot:
>>>>>     - GUID:            10177652421713147431
>>>>>       Kind:            Function
>>>>>       Linkage:         ExternalLinkage
>>>>>       NotEligibleToImport: true
>>>>>       Live:            false
>>>>>       InstCount:       14
>>>>>       Calls:
>>>>>         - Name:            fib
>>>>>           GUID:            8667248078361406812
>>>>>           Hotness:         Unknown
>>>>>         - Name:            printf
>>>>>           GUID:            7383291119112528047
>>>>>           Hotness:         Unknown
>>>>>   llvm.used:
>>>>>     - GUID:            15665353970260777610
>>>>>       Kind:            GlobalVar
>>>>>       Linkage:         AppendingLinkage
>>>>>       NotEligibleToImport: true
>>>>>       Live:            true
>>>>> TypeIdMap:
>>>>> WithGlobalValueDeadStripping: false
>>>>> ...
>>>>>
>>>>> Thanks,
>>>>> Charles
>>>>>
>>>>>
>>>>> On Wed, Jun 7, 2017 at 12:38 PM, Teresa Johnson <tejohnson at google.com>
>>>>> wrote:
>>>>>
>>>>>>
>>>>>>
>>>>>> On Wed, Jun 7, 2017 at 8:58 AM, Charles Saternos <
>>>>>> charles.saternos at gmail.com> wrote:
>>>>>>
>>>>>>> Alright, now it outputs YAML in the following format:
>>>>>>>
>>>>>>> ---
>>>>>>> NamedGlobalValueMap:
>>>>>>>   X:
>>>>>>>     - Kind:            GlobalVar
>>>>>>>       Linkage:         ExternalLinkage
>>>>>>>       NotEligibleToImport: false
>>>>>>>       Live:            false
>>>>>>>   a:
>>>>>>>     - Kind:            Alias
>>>>>>>       Linkage:         WeakAnyLinkage
>>>>>>>       NotEligibleToImport: false
>>>>>>>       Live:            false
>>>>>>>       AliaseeGUID:     1881667236089500162
>>>>>>>   afun:
>>>>>>>     - Kind:            Function
>>>>>>>       Linkage:         ExternalLinkage
>>>>>>>       NotEligibleToImport: false
>>>>>>>       Live:            false
>>>>>>>       InstCount:       2
>>>>>>>   testtest:
>>>>>>>     - Kind:            Function
>>>>>>>       Linkage:         ExternalLinkage
>>>>>>>       NotEligibleToImport: false
>>>>>>>       Live:            false
>>>>>>>       InstCount:       2
>>>>>>>       Calls:
>>>>>>>         - Function:        14471680721094503013
>>>>>>> TypeIdMap:
>>>>>>> WithGlobalValueDeadStripping: false
>>>>>>> ...
>>>>>>>
>>>>>>> Any thoughts on the new format?
>>>>>>>
>>>>>>
>>>>>> Thanks, Charles. The main improvement I think we would want is to
>>>>>> output value names instead of the GUID. Can you build up a map from GUID ->
>>>>>> name ahead of time and use those like you were for your initial patch?
>>>>>> Actually, I also think it would be useful to emit both the GUID and the
>>>>>> name, since the combined index will eventually only have the GUID, so this
>>>>>> would give a mapping to use for at least the visual inspection of the
>>>>>> combined index.
>>>>>>
>>>>>> Also, would be good to see an example with FDO, to make sure the
>>>>>> hotness info of the calls is emitted.
>>>>>>
>>>>>> Teresa
>>>>>>
>>>>>>
>>>>>>> Thanks,
>>>>>>> Charles
>>>>>>>
>>>>>>> On Tue, Jun 6, 2017 at 5:21 PM, Mehdi AMINI <joker.eph at gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> 2017-06-06 13:38 GMT-07:00 David Blaikie <dblaikie at gmail.com>:
>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Tue, Jun 6, 2017 at 1:26 PM Mehdi AMINI <joker.eph at gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> 2017-06-05 14:27 GMT-07:00 David Blaikie via llvm-dev <
>>>>>>>>>> llvm-dev at lists.llvm.org>:
>>>>>>>>>>
>>>>>>>>>>> I know there's been a bunch of discussion here already, but I
>>>>>>>>>>> was wondering if perhaps someone (probably Teresa? Peter?) could:
>>>>>>>>>>>
>>>>>>>>>>> 1) summarize the current state
>>>>>>>>>>> 2) describe the end-goal
>>>>>>>>>>> 3) describe what steps (& how this patch relates) are planned to
>>>>>>>>>>> get to (2)
>>>>>>>>>>>
>>>>>>>>>>> My naive thoughts, not being intimately familiar with any of
>>>>>>>>>>> this: Usually bitcode and textual IR support go in together or around the
>>>>>>>>>>> same time, and designed that way from the start (take r211920 for examaple,
>>>>>>>>>>> which added an explicit representation of COMDATs to the IR). This seems to
>>>>>>>>>>> have been an oversight in the implementation of IR summaries (is that an
>>>>>>>>>>> accurate representation/statement?)
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> More or less: it was not an oversight.
>>>>>>>>>> The summaries are not really part of the IR, it is more like an
>>>>>>>>>> "analysis result" that is serialized. It can always be recomputed from the
>>>>>>>>>> IR. This aspect makes it quite "special", it is the only analysis result
>>>>>>>>>> that I know of that we serialize.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> The use list work seems pretty similar in some ways (granted,
>>>>>>>>> can't be recomputed to match, hence the desire to serialize it for test
>>>>>>>>> case implementation).
>>>>>>>>>
>>>>>>>>
>>>>>>>> I see use-list as a leaky implementation detail of the IR that we
>>>>>>>> serialized because it impact the processing of the IR.
>>>>>>>>
>>>>>>>> Summaries are more like serializing the CFG for example.
>>>>>>>>
>>>>>>>>
>>>>>>>>> But it looks like the same is true here to a degree - there are
>>>>>>>>> test cases that exercise the summary handling, so they want summaries for
>>>>>>>>> input (for now, I think, I've seen test cases that run another LLVM tool to
>>>>>>>>> insert/create a summary to then feed that back in for a test), or to test
>>>>>>>>> that the resulting summary is correct.
>>>>>>>>>
>>>>>>>>
>>>>>>>> We have cases were we want summaries as an input and check a
>>>>>>>> combined summary as an output, and for these having the YAML representation
>>>>>>>> will be useful (we didn't have it before).
>>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Can summaries be standalone? I thought they could (that'd be ideal
>>>>>>>>> for the distributed situation - only the summary needs to go to the 'thin
>>>>>>>>> link' step, I think? (currently maybe only the debug info is stripped for
>>>>>>>>> that - but ideally other unused IR wouldn't be shipped there as well, I
>>>>>>>>> would think)
>>>>>>>>>
>>>>>>>>
>>>>>>>> Yes conceptually they can be standalone.
>>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> & now there's an effort to correct that.
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> The main motivation here, I believe, is more to help dev to have
>>>>>>>>>> human readable/understandable dump for ThinLTO bitcodes. Having to inspect
>>>>>>>>>> separately summaries is a pain.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Not sure I quite follow - inspect separately?
>>>>>>>>>
>>>>>>>>
>>>>>>>> llvm-dis does not display summaries today, so you can't just use
>>>>>>>> llvm-dis like a "regular" flow.
>>>>>>>>
>>>>>>>>
>>>>>>>>> How are they inspected today?
>>>>>>>>>
>>>>>>>>
>>>>>>>> llvm-bcanalyzer? And now the YAML dump as well.
>>>>>>>>
>>>>>>>>
>>>>>>>>> & also, I think there are test cases that want to/are currently
>>>>>>>>> testing summary input but do so somewhat awkwardly by using another tool to
>>>>>>>>> produce the summary first. Ideally the test case would have the summary
>>>>>>>>> written in to start, I would think, if that's a codepath worth testing?
>>>>>>>>>
>>>>>>>>
>>>>>>>> The IR already contains all the information, so why repeating it?
>>>>>>>> This makes the test case harder to maintain, in the vast majority, I expect
>>>>>>>> that if a test needs IR then it shouldn't need to include a summary as well
>>>>>>>> (and vice-versa).
>>>>>>>>
>>>>>>>> In the majority of test we have we want to check if the importing
>>>>>>>> does what it is supposed to do, and if the linkage are correctly adjusted.
>>>>>>>> With a YAML (or other) serialization for the summaries this could indeed
>>>>>>>> been done purely with summaries, without any IR involved.
>>>>>>>>
>>>>>>>> --
>>>>>>>> Mehdi
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>> - Dave
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>  --
>>>>>>>>>> Mehdi
>>>>>>>>>>
>>>>>>>>>> So it seems like that would start with a discussion of what the
>>>>>>>>>>> right end-state would be: What the syntax in textual IR should be, then
>>>>>>>>>>> implementing it. I can understand implementing such a thing in steps - it's
>>>>>>>>>>> perhaps more involved than the COMDAT situation. In that case starting on
>>>>>>>>>>> either side seems fine - implementing the emission first (hidden behind a
>>>>>>>>>>> flag, so as not to break round-tripping in the interim) or the parsing
>>>>>>>>>>> first (no need to hide it behind any flags - manually written examples can
>>>>>>>>>>> be used as input tests).
>>>>>>>>>>>
>>>>>>>>>>> (& it sounds like there's some partially implemented
>>>>>>>>>>> functionality using a YAML format that was intended to address how some
>>>>>>>>>>> test cases could be written? & this might be a good basis for the syntax -
>>>>>>>>>>> but seems to me like it might be a bit disjointed/out of place in the
>>>>>>>>>>> textual IR format that's not otherwise YAML-based?)
>>>>>>>>>>>
>>>>>>>>>>> - Dave
>>>>>>>>>>>
>>>>>>>>>>> On Fri, Jun 2, 2017 at 8:46 AM Charles Saternos via llvm-dev <
>>>>>>>>>>> llvm-dev at lists.llvm.org> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hey all,
>>>>>>>>>>>>
>>>>>>>>>>>> Below is the proposed format for the dump of the ThinLTO module
>>>>>>>>>>>> summary in the llvm-dis utility:
>>>>>>>>>>>>
>>>>>>>>>>>> > ../build/bin/llvm-dis t.o && cat t.o.ll
>>>>>>>>>>>> ; ModuleID = '2.o'
>>>>>>>>>>>> source_filename = "2.ll"
>>>>>>>>>>>> target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
>>>>>>>>>>>> target triple = "x86_64-unknown-linux-gnu"
>>>>>>>>>>>>
>>>>>>>>>>>> @X = constant i32 42, section "foo", align 4
>>>>>>>>>>>>
>>>>>>>>>>>> @a = weak alias i32, i32* @X
>>>>>>>>>>>>
>>>>>>>>>>>> define void @afun() {
>>>>>>>>>>>>   %1 = load i32, i32* @a
>>>>>>>>>>>>   ret void
>>>>>>>>>>>> }
>>>>>>>>>>>>
>>>>>>>>>>>> define void @testtest() {
>>>>>>>>>>>>   tail call void @boop()
>>>>>>>>>>>>   ret void
>>>>>>>>>>>> }
>>>>>>>>>>>>
>>>>>>>>>>>> declare void @boop()
>>>>>>>>>>>>
>>>>>>>>>>>> ; Module summary:
>>>>>>>>>>>> ;  testtest (External linkage)
>>>>>>>>>>>> ;    Function (2 instructions)
>>>>>>>>>>>> ;    Calls: boop
>>>>>>>>>>>> ;  X (External linkage)
>>>>>>>>>>>> ;    Global Variable
>>>>>>>>>>>> ;  afun (External linkage)
>>>>>>>>>>>> ;    Function (2 instructions)
>>>>>>>>>>>> ;    Refs:
>>>>>>>>>>>> ;      a
>>>>>>>>>>>> ;  a (Weak any linkage)
>>>>>>>>>>>> ;    Alias (aliasee X)
>>>>>>>>>>>>
>>>>>>>>>>>> I've implemented the above format in the llvm-dis utility,
>>>>>>>>>>>> since there currently isn't really a way of getting ThinLTO summaries in a
>>>>>>>>>>>> human-readable format.
>>>>>>>>>>>>
>>>>>>>>>>>> Let me know what you think of this format, and what information
>>>>>>>>>>>> you think should be added/removed.
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> Charles
>>>>>>>>>>>>
>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>> LLVM Developers mailing list
>>>>>>>>>>>> llvm-dev at lists.llvm.org
>>>>>>>>>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> _______________________________________________
>>>>>>>>>>> LLVM Developers mailing list
>>>>>>>>>>> llvm-dev at lists.llvm.org
>>>>>>>>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Teresa Johnson |  Software Engineer |  tejohnson at google.com |
>>>>>> 408-460-2413 <(408)%20460-2413>
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Teresa Johnson |  Software Engineer |  tejohnson at google.com |
>>>> 408-460-2413 <(408)%20460-2413>
>>>>
>>>> _______________________________________________
>>>> LLVM Developers mailing list
>>>> llvm-dev at lists.llvm.org
>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>>
>>>>
>>>
>>>
>>> --
>>> --
>>> Peter
>>>
>>
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20170717/f07e0cfd/attachment.html>