[llvm-dev] [RFC][ThinLTO] llvm-dis ThinLTO summary dump format
Teresa Johnson via llvm-dev
llvm-dev at lists.llvm.org
Thu Jun 8 16:55:56 PDT 2017
Great! For the hotness, try creating a small test case with a very hot loop
that iterates many times. Let me know if you are still having trouble.
While the llvm-dis serialization is being discussed, I suppose at the very
least this can go in with the rest of the existing YAML summary dumping and
get emitted from llvm-lto2 using the patch Peter attached. Peter - do you
want to add that to llvm-lto2 so that we have a way of dumping the existing
YAML supported summary info to stdout, or would you rather have Charles
take that one over and submit it (probably just needs a test case)?
Teresa
On Thu, Jun 8, 2017 at 4:16 PM, Charles Saternos <charles.saternos at gmail.com> wrote:
> Hey Teresa,
>
> I've updated the YAML to include the names and GUIDs for all
> functions/global vars/aliases. I've also added the hotness info to the
> output, but for some reason, none of my tests when running with FDO gave
> anything besides Unknown. I'll be looking more into this tomorrow.
>
> Here's the current format:
>
> > ../build/bin/llvm-lto2 dump-summary b.o
> ---
> NamedGlobalValueMap:
>   :
>     - GUID: 3762489268811518743
>       Kind: GlobalVar
>       Linkage: PrivateLinkage
>       NotEligibleToImport: true
>       Live: false
>   cold:
>     - GUID: 11668175513417606517
>       Kind: Function
>       Linkage: ExternalLinkage
>       NotEligibleToImport: true
>       Live: false
>       InstCount: 5
>       Calls:
>         - Name: puts
>           GUID: 8979701042202144121
>           Hotness: Unknown
>   fib:
>     - GUID: 8667248078361406812
>       Kind: Function
>       Linkage: ExternalLinkage
>       NotEligibleToImport: true
>       Live: false
>       InstCount: 26
>       Calls:
>         - Name: fib
>           GUID: 8667248078361406812
>           Hotness: Unknown
>   hot:
>     - GUID: 10177652421713147431
>       Kind: Function
>       Linkage: ExternalLinkage
>       NotEligibleToImport: true
>       Live: false
>       InstCount: 14
>       Calls:
>         - Name: fib
>           GUID: 8667248078361406812
>           Hotness: Unknown
>         - Name: printf
>           GUID: 7383291119112528047
>           Hotness: Unknown
>   llvm.used:
>     - GUID: 15665353970260777610
>       Kind: GlobalVar
>       Linkage: AppendingLinkage
>       NotEligibleToImport: true
>       Live: true
> TypeIdMap:
> WithGlobalValueDeadStripping: false
> ...
>
> Thanks,
> Charles
>
>
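To make the "nothing besides Unknown" observation easy to check on a dump like the one above, here is a minimal sketch that tallies the Hotness value on every call edge. The CALLS table is hand-transcribed from the b.o dump in this message, and the helper names are hypothetical; nothing here is part of llvm-lto2.

```python
from collections import Counter

# Hand-transcribed from the b.o dump above: caller -> call edges.
# Each edge is (callee name, callee GUID, hotness) as printed in the YAML.
CALLS = {
    "cold": [("puts", 8979701042202144121, "Unknown")],
    "fib": [("fib", 8667248078361406812, "Unknown")],
    "hot": [
        ("fib", 8667248078361406812, "Unknown"),
        ("printf", 7383291119112528047, "Unknown"),
    ],
}


def hotness_histogram(calls):
    """Count how many call edges carry each hotness value."""
    return Counter(
        hotness for edges in calls.values() for _, _, hotness in edges
    )


def edges_without_profile(calls):
    """List (caller, callee) pairs whose hotness is still Unknown."""
    return [
        (caller, callee)
        for caller, edges in calls.items()
        for callee, _, hotness in edges
        if hotness == "Unknown"
    ]


if __name__ == "__main__":
    print(dict(hotness_histogram(CALLS)))
    for caller, callee in edges_without_profile(CALLS):
        print(f"{caller} -> {callee}: no profile data")
```

Once a profile is applied correctly, one would expect the histogram to contain values other than Unknown for at least the edges inside the hot loop.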
> On Wed, Jun 7, 2017 at 12:38 PM, Teresa Johnson <tejohnson at google.com>
> wrote:
>
>>
>>
>> On Wed, Jun 7, 2017 at 8:58 AM, Charles Saternos <
>> charles.saternos at gmail.com> wrote:
>>
>>> Alright, now it outputs YAML in the following format:
>>>
>>> ---
>>> NamedGlobalValueMap:
>>>   X:
>>>     - Kind: GlobalVar
>>>       Linkage: ExternalLinkage
>>>       NotEligibleToImport: false
>>>       Live: false
>>>   a:
>>>     - Kind: Alias
>>>       Linkage: WeakAnyLinkage
>>>       NotEligibleToImport: false
>>>       Live: false
>>>       AliaseeGUID: 1881667236089500162
>>>   afun:
>>>     - Kind: Function
>>>       Linkage: ExternalLinkage
>>>       NotEligibleToImport: false
>>>       Live: false
>>>       InstCount: 2
>>>   testtest:
>>>     - Kind: Function
>>>       Linkage: ExternalLinkage
>>>       NotEligibleToImport: false
>>>       Live: false
>>>       InstCount: 2
>>>       Calls:
>>>         - Function: 14471680721094503013
>>> TypeIdMap:
>>> WithGlobalValueDeadStripping: false
>>> ...
>>>
>>> Any thoughts on the new format?
>>>
>>
>> Thanks, Charles. The main improvement I think we would want is to output
>> value names instead of the GUID. Can you build up a map from GUID -> name
>> ahead of time and use those like you were for your initial patch? Actually,
>> I also think it would be useful to emit both the GUID and the name, since
>> the combined index will eventually only have the GUID, so this would give a
>> mapping to use for at least the visual inspection of the combined index.
>>
>> Also, would be good to see an example with FDO, to make sure the hotness
>> info of the calls is emitted.
>>
>> Teresa
>>
>>
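The GUID -> name map suggested here can be sketched roughly as follows. This is only an illustration in Python, not the C++ that would live in the dumper: the table reuses the names and GUIDs printed in the b.o dump elsewhere in this thread, and build_guid_to_name, describe and resolve_calls are hypothetical helper names.

```python
# Names and GUIDs exactly as printed in the b.o dump in this thread.
KNOWN_NAMES = {
    "cold": 11668175513417606517,
    "fib": 8667248078361406812,
    "hot": 10177652421713147431,
    "puts": 8979701042202144121,
    "printf": 7383291119112528047,
}

# Call edges that record only the callee GUID, which is all the
# combined index would eventually have.
CALLEE_GUIDS = {
    "cold": [8979701042202144121],
    "fib": [8667248078361406812],
    "hot": [8667248078361406812, 7383291119112528047],
}


def build_guid_to_name(names):
    """Invert name -> GUID ahead of time; the first name seen wins."""
    guid_to_name = {}
    for name, guid in names.items():
        guid_to_name.setdefault(guid, name)
    return guid_to_name


def describe(guid, guid_to_name):
    """Emit both the name and the GUID; fall back when no name is known."""
    name = guid_to_name.get(guid, "<unknown>")
    return f"{name} (GUID {guid})"


def resolve_calls(callee_guids, guid_to_name):
    """Rewrite every GUID-only call edge as 'name (GUID n)'."""
    return {
        caller: [describe(guid, guid_to_name) for guid in guids]
        for caller, guids in callee_guids.items()
    }


if __name__ == "__main__":
    mapping = build_guid_to_name(KNOWN_NAMES)
    for caller, callees in resolve_calls(CALLEE_GUIDS, mapping).items():
        print(caller, "calls", ", ".join(callees))
```

Printing both fields keeps the per-module dump readable while still giving a reader something to match against a combined index that only carries GUIDs.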
>>> Thanks,
>>> Charles
>>>
>>> On Tue, Jun 6, 2017 at 5:21 PM, Mehdi AMINI <joker.eph at gmail.com> wrote:
>>>
>>>>
>>>>
>>>> 2017-06-06 13:38 GMT-07:00 David Blaikie <dblaikie at gmail.com>:
>>>>
>>>>>
>>>>>
>>>>> On Tue, Jun 6, 2017 at 1:26 PM Mehdi AMINI <joker.eph at gmail.com>
>>>>> wrote:
>>>>>
>>>>>> 2017-06-05 14:27 GMT-07:00 David Blaikie via llvm-dev <
>>>>>> llvm-dev at lists.llvm.org>:
>>>>>>
>>>>>>> I know there's been a bunch of discussion here already, but I was
>>>>>>> wondering if perhaps someone (probably Teresa? Peter?) could:
>>>>>>>
>>>>>>> 1) summarize the current state
>>>>>>> 2) describe the end-goal
>>>>>>> 3) describe what steps (& how this patch relates) are planned to get
>>>>>>> to (2)
>>>>>>>
>>>>>>> My naive thoughts, not being intimately familiar with any of this:
>>>>>>> Usually bitcode and textual IR support go in together or around the same
>>>>>>> time, and are designed that way from the start (take r211920 for example,
>>>>>>> which added an explicit representation of COMDATs to the IR). This seems to
>>>>>>> have been an oversight in the implementation of IR summaries (is that an
>>>>>>> accurate representation/statement?)
>>>>>>>
>>>>>>
>>>>>> More or less: it was not an oversight.
>>>>>> The summaries are not really part of the IR, it is more like an
>>>>>> "analysis result" that is serialized. It can always be recomputed from the
>>>>>> IR. This aspect makes it quite "special", it is the only analysis result
>>>>>> that I know of that we serialize.
>>>>>>
>>>>>
>>>>> The use list work seems pretty similar in some ways (granted, can't be
>>>>> recomputed to match, hence the desire to serialize it for test case
>>>>> implementation).
>>>>>
>>>>
>>>> I see use-list as a leaky implementation detail of the IR that we
>>>> serialized because it impacts the processing of the IR.
>>>>
>>>> Summaries are more like serializing the CFG for example.
>>>>
>>>>
>>>>> But it looks like the same is true here to a degree - there are test
>>>>> cases that exercise the summary handling, so they want summaries for input
>>>>> (for now, I think, I've seen test cases that run another LLVM tool to
>>>>> insert/create a summary to then feed that back in for a test), or to test
>>>>> that the resulting summary is correct.
>>>>>
>>>>
>>>> We have cases where we want summaries as an input and check a combined
>>>> summary as an output, and for these having the YAML representation will be
>>>> useful (we didn't have it before).
>>>>
>>>>
>>>>>
>>>>> Can summaries be standalone? I thought they could (that'd be ideal for
>>>>> the distributed situation - only the summary needs to go to the 'thin link'
>>>>> step, I think?). Currently maybe only the debug info is stripped for that -
>>>>> but ideally other unused IR wouldn't be shipped there as well, I would
>>>>> think.
>>>>>
>>>>
>>>> Yes conceptually they can be standalone.
>>>>
>>>>
>>>>>
>>>>>
>>>>>>
>>>>>>
>>>>>>> & now there's an effort to correct that.
>>>>>>>
>>>>>>
>>>>>> The main motivation here, I believe, is more to help developers get a
>>>>>> human-readable/understandable dump of ThinLTO bitcode. Having to inspect
>>>>>> summaries separately is a pain.
>>>>>>
>>>>>
>>>>> Not sure I quite follow - inspect separately?
>>>>>
>>>>
>>>> llvm-dis does not display summaries today, so you can't just use
>>>> llvm-dis like a "regular" flow.
>>>>
>>>>
>>>>> How are they inspected today?
>>>>>
>>>>
>>>> llvm-bcanalyzer? And now the YAML dump as well.
>>>>
>>>>
>>>>> & also, I think there are test cases that want to/are currently
>>>>> testing summary input but do so somewhat awkwardly by using another tool to
>>>>> produce the summary first. Ideally the test case would have the summary
>>>>> written in to start, I would think, if that's a codepath worth testing?
>>>>>
>>>>
>>>> The IR already contains all the information, so why repeat it? That
>>>> makes the test case harder to maintain. In the vast majority of cases, I
>>>> expect that if a test needs IR then it shouldn't need to include a summary
>>>> as well (and vice versa).
>>>>
>>>> In the majority of the tests we have, we want to check that importing does
>>>> what it is supposed to do and that the linkages are correctly adjusted. With
>>>> a YAML (or other) serialization for the summaries, this could indeed be
>>>> done purely with summaries, without any IR involved.
>>>>
>>>> --
>>>> Mehdi
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>>
>>>>> - Dave
>>>>>
>>>>>
>>>>>>
>>>>>> --
>>>>>> Mehdi
>>>>>>
>>>>>>> So it seems like that would start with a discussion of what the right
>>>>>>> end-state would be: What the syntax in textual IR should be, then
>>>>>>> implementing it. I can understand implementing such a thing in steps - it's
>>>>>>> perhaps more involved than the COMDAT situation. In that case starting on
>>>>>>> either side seems fine - implementing the emission first (hidden behind a
>>>>>>> flag, so as not to break round-tripping in the interim) or the parsing
>>>>>>> first (no need to hide it behind any flags - manually written examples can
>>>>>>> be used as input tests).
>>>>>>>
>>>>>>> (& it sounds like there's some partially implemented functionality
>>>>>>> using a YAML format that was intended to address how some test cases could
>>>>>>> be written? & this might be a good basis for the syntax - but seems to me
>>>>>>> like it might be a bit disjointed/out of place in the textual IR format
>>>>>>> that's not otherwise YAML-based?)
>>>>>>>
>>>>>>> - Dave
>>>>>>>
>>>>>>> On Fri, Jun 2, 2017 at 8:46 AM Charles Saternos via llvm-dev <
>>>>>>> llvm-dev at lists.llvm.org> wrote:
>>>>>>>
>>>>>>>> Hey all,
>>>>>>>>
>>>>>>>> Below is the proposed format for the dump of the ThinLTO module
>>>>>>>> summary in the llvm-dis utility:
>>>>>>>>
>>>>>>>> > ../build/bin/llvm-dis t.o && cat t.o.ll
>>>>>>>> ; ModuleID = '2.o'
>>>>>>>> source_filename = "2.ll"
>>>>>>>> target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
>>>>>>>> target triple = "x86_64-unknown-linux-gnu"
>>>>>>>>
>>>>>>>> @X = constant i32 42, section "foo", align 4
>>>>>>>>
>>>>>>>> @a = weak alias i32, i32* @X
>>>>>>>>
>>>>>>>> define void @afun() {
>>>>>>>>   %1 = load i32, i32* @a
>>>>>>>>   ret void
>>>>>>>> }
>>>>>>>>
>>>>>>>> define void @testtest() {
>>>>>>>>   tail call void @boop()
>>>>>>>>   ret void
>>>>>>>> }
>>>>>>>>
>>>>>>>> declare void @boop()
>>>>>>>>
>>>>>>>> ; Module summary:
>>>>>>>> ;   testtest (External linkage)
>>>>>>>> ;     Function (2 instructions)
>>>>>>>> ;       Calls: boop
>>>>>>>> ;   X (External linkage)
>>>>>>>> ;     Global Variable
>>>>>>>> ;   afun (External linkage)
>>>>>>>> ;     Function (2 instructions)
>>>>>>>> ;       Refs:
>>>>>>>> ;         a
>>>>>>>> ;   a (Weak any linkage)
>>>>>>>> ;     Alias (aliasee X)
>>>>>>>>
>>>>>>>> I've implemented the above format in the llvm-dis utility, since
>>>>>>>> there currently isn't really a way of getting ThinLTO summaries in a
>>>>>>>> human-readable format.
>>>>>>>>
>>>>>>>> Let me know what you think of this format, and what information you
>>>>>>>> think should be added/removed.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Charles
>>>>>>>>
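For anyone who wants to experiment with the proposed comment layout, here is a rough sketch that renders the same entries from a small table. It is Python rather than the actual llvm-dis C++ code, the SUMMARY table is hand-written from the example in this proposal, and the two-space nesting is an assumption about the intended indentation.

```python
# Hand-written from the example module in this proposal; field names are
# hypothetical and exist only for this sketch.
SUMMARY = [
    {"name": "testtest", "linkage": "External", "kind": "Function",
     "insts": 2, "calls": ["boop"]},
    {"name": "X", "linkage": "External", "kind": "Global Variable"},
    {"name": "afun", "linkage": "External", "kind": "Function",
     "insts": 2, "refs": ["a"]},
    {"name": "a", "linkage": "Weak any", "kind": "Alias", "aliasee": "X"},
]


def comment(depth, text, indent="  "):
    """Format one IR comment line nested `depth` levels deep."""
    return f"; {indent * depth}{text}"


def render(summary):
    """Render the summary entries as trailing IR comments."""
    lines = [comment(0, "Module summary:")]
    for entry in summary:
        lines.append(comment(1, f"{entry['name']} ({entry['linkage']} linkage)"))
        kind = entry["kind"]
        if kind == "Function":
            lines.append(comment(2, f"Function ({entry['insts']} instructions)"))
            if entry.get("calls"):
                lines.append(comment(3, "Calls: " + ", ".join(entry["calls"])))
            if entry.get("refs"):
                lines.append(comment(3, "Refs:"))
                lines.extend(comment(4, ref) for ref in entry["refs"])
        elif kind == "Alias":
            lines.append(comment(2, f"Alias (aliasee {entry['aliasee']})"))
        else:
            lines.append(comment(2, kind))
    return "\n".join(lines)


if __name__ == "__main__":
    print(render(SUMMARY))
```

Because every line starts with `;`, the output is still ignored by the IR parser, which is also why this form cannot be read back in as a summary.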
>>>>>>>> _______________________________________________
>>>>>>>> LLVM Developers mailing list
>>>>>>>> llvm-dev at lists.llvm.org
>>>>>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>
>>>
>>
>>
>> --
>> Teresa Johnson | Software Engineer | tejohnson at google.com |
>> 408-460-2413
>>
>
>
--
Teresa Johnson | Software Engineer | tejohnson at google.com | 408-460-2413