[llvm-dev] [RFC][ThinLTO] llvm-dis ThinLTO summary dump format
Charles Saternos via llvm-dev
llvm-dev at lists.llvm.org
Fri Jun 9 12:58:51 PDT 2017
OK, I tested the hotness, and it works.
> I'd ask Charles to take it over. I think it just needs a test case and an
> update to the usage message.
Sure - I've added the message and a quick test. The patch is here:
https://reviews.llvm.org/D34063
Thanks,
Charles
On Thu, Jun 8, 2017 at 8:01 PM, Peter Collingbourne <peter at pcc.me.uk> wrote:
> I'd ask Charles to take it over. I think it just needs a test case and an
> update to the usage message.
>
> Peter
>
> On Thu, Jun 8, 2017 at 4:55 PM, Teresa Johnson via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
>> Great! For the hotness, try creating a small test case with a very hot
>> loop that iterates many times. Let me know if you are still having trouble.
>> While the llvm-dis serialization is being discussed, I suppose at the very
>> least this can go in with the rest of the existing YAML summary dumping and
>> get emitted from llvm-lto2 using the patch Peter attached. Peter - do you
>> want to add that to llvm-lto2 so that we have a way of dumping the existing
>> YAML supported summary info to stdout, or would you rather have Charles
>> take that one over and submit it (it probably just needs a test case)?
>>
>> Teresa
>>
>> On Thu, Jun 8, 2017 at 4:16 PM, Charles Saternos <
>> charles.saternos at gmail.com> wrote:
>>
>>> Hey Teresa,
>>>
>>> I've updated the YAML to include the names and GUIDs for all
>>> functions/global vars/aliases. I've also added the hotness info to the
>>> output, but for some reason, none of my tests when running with FDO gave
>>> anything besides Unknown. I'll be looking more into this tomorrow.
>>>
>>> Here's the current format:
>>>
>>> > ../build/bin/llvm-lto2 dump-summary b.o
>>> ---
>>> NamedGlobalValueMap:
>>>   :
>>>     - GUID: 3762489268811518743
>>>       Kind: GlobalVar
>>>       Linkage: PrivateLinkage
>>>       NotEligibleToImport: true
>>>       Live: false
>>>   cold:
>>>     - GUID: 11668175513417606517
>>>       Kind: Function
>>>       Linkage: ExternalLinkage
>>>       NotEligibleToImport: true
>>>       Live: false
>>>       InstCount: 5
>>>       Calls:
>>>         - Name: puts
>>>           GUID: 8979701042202144121
>>>           Hotness: Unknown
>>>   fib:
>>>     - GUID: 8667248078361406812
>>>       Kind: Function
>>>       Linkage: ExternalLinkage
>>>       NotEligibleToImport: true
>>>       Live: false
>>>       InstCount: 26
>>>       Calls:
>>>         - Name: fib
>>>           GUID: 8667248078361406812
>>>           Hotness: Unknown
>>>   hot:
>>>     - GUID: 10177652421713147431
>>>       Kind: Function
>>>       Linkage: ExternalLinkage
>>>       NotEligibleToImport: true
>>>       Live: false
>>>       InstCount: 14
>>>       Calls:
>>>         - Name: fib
>>>           GUID: 8667248078361406812
>>>           Hotness: Unknown
>>>         - Name: printf
>>>           GUID: 7383291119112528047
>>>           Hotness: Unknown
>>>   llvm.used:
>>>     - GUID: 15665353970260777610
>>>       Kind: GlobalVar
>>>       Linkage: AppendingLinkage
>>>       NotEligibleToImport: true
>>>       Live: true
>>> TypeIdMap:
>>> WithGlobalValueDeadStripping: false
>>> ...
>>>
>>> Thanks,
>>> Charles
>>>
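
For readers who want to inspect a dump of this shape programmatically, here is a minimal Python sketch. It assumes the YAML has already been loaded into a nested dict (for example with a YAML library, which is not shown here); the `summary` literal below is transcribed by hand from the fib/hot/cold dump in this message, and the helper name `call_edges` is hypothetical, not part of any LLVM tool.

```python
# Hand-transcribed subset of the NamedGlobalValueMap from the dump in this
# message. Only the fields needed for the call graph are kept.
summary = {
    "cold": [{
        "GUID": 11668175513417606517, "Kind": "Function",
        "Calls": [{"Name": "puts", "GUID": 8979701042202144121,
                   "Hotness": "Unknown"}],
    }],
    "fib": [{
        "GUID": 8667248078361406812, "Kind": "Function",
        "Calls": [{"Name": "fib", "GUID": 8667248078361406812,
                   "Hotness": "Unknown"}],
    }],
    "hot": [{
        "GUID": 10177652421713147431, "Kind": "Function",
        "Calls": [{"Name": "fib", "GUID": 8667248078361406812,
                   "Hotness": "Unknown"},
                  {"Name": "printf", "GUID": 7383291119112528047,
                   "Hotness": "Unknown"}],
    }],
    "llvm.used": [{"GUID": 15665353970260777610, "Kind": "GlobalVar"}],
}


def call_edges(named_map):
    """Return (caller, callee, hotness) tuples for every call edge.

    `call_edges` is a hypothetical helper for illustration only.
    Entries without a "Calls" list (globals, aliases) contribute nothing.
    """
    edges = []
    for caller, entries in named_map.items():
        for entry in entries:
            for call in entry.get("Calls", []):
                edges.append((caller, call["Name"], call["Hotness"]))
    return edges


for caller, callee, hotness in call_edges(summary):
    print(f"{caller} -> {callee} [{hotness}]")
```

Listing the edges this way makes it easy to spot whether any call carries a hotness other than Unknown once profile data is present.
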
>>>
>>> On Wed, Jun 7, 2017 at 12:38 PM, Teresa Johnson <tejohnson at google.com>
>>> wrote:
>>>
>>>>
>>>>
>>>> On Wed, Jun 7, 2017 at 8:58 AM, Charles Saternos <
>>>> charles.saternos at gmail.com> wrote:
>>>>
>>>>> Alright, now it outputs YAML in the following format:
>>>>>
>>>>> ---
>>>>> NamedGlobalValueMap:
>>>>>   X:
>>>>>     - Kind: GlobalVar
>>>>>       Linkage: ExternalLinkage
>>>>>       NotEligibleToImport: false
>>>>>       Live: false
>>>>>   a:
>>>>>     - Kind: Alias
>>>>>       Linkage: WeakAnyLinkage
>>>>>       NotEligibleToImport: false
>>>>>       Live: false
>>>>>       AliaseeGUID: 1881667236089500162
>>>>>   afun:
>>>>>     - Kind: Function
>>>>>       Linkage: ExternalLinkage
>>>>>       NotEligibleToImport: false
>>>>>       Live: false
>>>>>       InstCount: 2
>>>>>   testtest:
>>>>>     - Kind: Function
>>>>>       Linkage: ExternalLinkage
>>>>>       NotEligibleToImport: false
>>>>>       Live: false
>>>>>       InstCount: 2
>>>>>       Calls:
>>>>>         - Function: 14471680721094503013
>>>>> TypeIdMap:
>>>>> WithGlobalValueDeadStripping: false
>>>>> ...
>>>>>
>>>>> Any thoughts on the new format?
>>>>>
>>>>
>>>> Thanks, Charles. The main improvement I think we would want is to
>>>> output value names instead of the GUID. Can you build up a map from GUID ->
>>>> name ahead of time and use those like you were for your initial patch?
>>>> Actually, I also think it would be useful to emit both the GUID and the
>>>> name, since the combined index will eventually only have the GUID, so this
>>>> would give a mapping to use for at least the visual inspection of the
>>>> combined index.
>>>>
>>>> Also, it would be good to see an example with FDO, to make sure the
>>>> hotness info of the calls is emitted.
>>>>
>>>> Teresa
>>>>
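
The GUID -> name map suggested above can be sketched in a few lines of Python. This is only an illustration under a stated assumption: that a GUID is the first 8 bytes of the MD5 digest of the symbol's global identifier, read as a little-endian integer. Check `GlobalValue::getGUID` in the LLVM sources for the authoritative definition (symbols with local linkage use a differently formed identifier). The helper names `guid_for_name` and `build_guid_to_name` are hypothetical.

```python
import hashlib


def guid_for_name(name: str) -> int:
    """ASSUMPTION: GUID = first 8 bytes of MD5(name), little-endian.

    This mirrors what LLVM is believed to do for externally visible
    symbols; it is not taken from the patch under discussion.
    """
    digest = hashlib.md5(name.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "little")


def build_guid_to_name(names):
    """Build the reverse map ahead of time, before emitting the dump.

    A list of names is stored per GUID so that an (unlikely) hash
    collision is kept visible instead of silently overwritten.
    """
    table = {}
    for name in names:
        table.setdefault(guid_for_name(name), []).append(name)
    return table


# Symbol names taken from the example module in this thread.
names = ["X", "a", "afun", "testtest", "boop"]
guid_to_name = build_guid_to_name(names)

# Emit both the name and the GUID for each symbol.
for guid, owners in guid_to_name.items():
    print(f"{'/'.join(owners)}: {guid}")
```

Emitting both fields means a dump of the combined index, which carries only GUIDs, can still be matched back to names by eye.
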
>>>>
>>>>> Thanks,
>>>>> Charles
>>>>>
>>>>> On Tue, Jun 6, 2017 at 5:21 PM, Mehdi AMINI <joker.eph at gmail.com>
>>>>> wrote:
>>>>>
>>>>>>
>>>>>>
>>>>>> 2017-06-06 13:38 GMT-07:00 David Blaikie <dblaikie at gmail.com>:
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Jun 6, 2017 at 1:26 PM Mehdi AMINI <joker.eph at gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> 2017-06-05 14:27 GMT-07:00 David Blaikie via llvm-dev <
>>>>>>>> llvm-dev at lists.llvm.org>:
>>>>>>>>
>>>>>>>>> I know there's been a bunch of discussion here already, but I was
>>>>>>>>> wondering if perhaps someone (probably Teresa? Peter?) could:
>>>>>>>>>
>>>>>>>>> 1) summarize the current state
>>>>>>>>> 2) describe the end-goal
>>>>>>>>> 3) describe what steps (& how this patch relates) are planned to
>>>>>>>>> get to (2)
>>>>>>>>>
>>>>>>>>> My naive thoughts, not being intimately familiar with any of this:
>>>>>>>>> Usually bitcode and textual IR support go in together or around the same
>>>>>>>>> time, and designed that way from the start (take r211920 for example,
>>>>>>>>> which added an explicit representation of COMDATs to the IR). This seems to
>>>>>>>>> have been an oversight in the implementation of IR summaries (is that an
>>>>>>>>> accurate representation/statement?)
>>>>>>>>>
>>>>>>>>
>>>>>>>> More or less: it was not an oversight.
>>>>>>>> The summaries are not really part of the IR, it is more like an
>>>>>>>> "analysis result" that is serialized. It can always be recomputed from the
>>>>>>>> IR. This aspect makes it quite "special", it is the only analysis result
>>>>>>>> that I know of that we serialize.
>>>>>>>>
>>>>>>>
>>>>>>> The use list work seems pretty similar in some ways (granted, can't
>>>>>>> be recomputed to match, hence the desire to serialize it for test case
>>>>>>> implementation).
>>>>>>>
>>>>>>
>>>>>> I see the use-list as a leaky implementation detail of the IR that we
>>>>>> serialized because it impacts the processing of the IR.
>>>>>>
>>>>>> Summaries are more like serializing the CFG for example.
>>>>>>
>>>>>>
>>>>>>> But it looks like the same is true here to a degree - there are test
>>>>>>> cases that exercise the summary handling, so they want summaries for input
>>>>>>> (for now, I think, I've seen test cases that run another LLVM tool to
>>>>>>> insert/create a summary to then feed that back in for a test), or to test
>>>>>>> that the resulting summary is correct.
>>>>>>>
>>>>>>
>>>>>> We have cases where we want summaries as an input and check a combined
>>>>>> summary as an output, and for these having the YAML representation will be
>>>>>> useful (we didn't have it before).
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> Can summaries be standalone? I thought they could (that'd be ideal
>>>>>>> for the distributed situation - only the summary needs to go to the 'thin
>>>>>>> link' step, I think? Currently maybe only the debug info is stripped for
>>>>>>> that - but ideally other unused IR wouldn't be shipped there either, I
>>>>>>> would think).
>>>>>>>
>>>>>>
>>>>>> Yes conceptually they can be standalone.
>>>>>>
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>> & now there's an effort to correct that.
>>>>>>>>>
>>>>>>>>
>>>>>>>> The main motivation here, I believe, is more to help developers get a
>>>>>>>> human-readable dump of ThinLTO bitcode. Having to inspect
>>>>>>>> the summaries separately is a pain.
>>>>>>>>
>>>>>>>
>>>>>>> Not sure I quite follow - inspect separately?
>>>>>>>
>>>>>>
>>>>>> llvm-dis does not display summaries today, so you can't just use
>>>>>> llvm-dis like a "regular" flow.
>>>>>>
>>>>>>
>>>>>>> How are they inspected today?
>>>>>>>
>>>>>>
>>>>>> llvm-bcanalyzer? And now the YAML dump as well.
>>>>>>
>>>>>>
>>>>>>> & also, I think there are test cases that want to/are currently
>>>>>>> testing summary input but do so somewhat awkwardly by using another tool to
>>>>>>> produce the summary first. Ideally the test case would have the summary
>>>>>>> written in to start, I would think, if that's a codepath worth testing?
>>>>>>>
>>>>>>
>>>>>> The IR already contains all the information, so why repeat it?
>>>>>> That makes the test case harder to maintain. In the vast majority of
>>>>>> cases, I expect that if a test needs IR then it shouldn't need to include
>>>>>> a summary as well (and vice versa).
>>>>>>
>>>>>> In the majority of the tests we have, we want to check that importing
>>>>>> does what it is supposed to do and that linkages are correctly adjusted.
>>>>>> With a YAML (or other) serialization for the summaries this could indeed
>>>>>> be done purely with summaries, without any IR involved.
>>>>>>
>>>>>> --
>>>>>> Mehdi
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> - Dave
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Mehdi
>>>>>>>>
>>>>>>>> So it seems like that would start with a discussion of what the
>>>>>>>>> right end-state would be: What the syntax in textual IR should be, then
>>>>>>>>> implementing it. I can understand implementing such a thing in steps - it's
>>>>>>>>> perhaps more involved than the COMDAT situation. In that case starting on
>>>>>>>>> either side seems fine - implementing the emission first (hidden behind a
>>>>>>>>> flag, so as not to break round-tripping in the interim) or the parsing
>>>>>>>>> first (no need to hide it behind any flags - manually written examples can
>>>>>>>>> be used as input tests).
>>>>>>>>>
>>>>>>>>> (& it sounds like there's some partially implemented functionality
>>>>>>>>> using a YAML format that was intended to address how some test cases could
>>>>>>>>> be written? & this might be a good basis for the syntax - but seems to me
>>>>>>>>> like it might be a bit disjointed/out of place in the textual IR format
>>>>>>>>> that's not otherwise YAML-based?)
>>>>>>>>>
>>>>>>>>> - Dave
>>>>>>>>>
>>>>>>>>> On Fri, Jun 2, 2017 at 8:46 AM Charles Saternos via llvm-dev <
>>>>>>>>> llvm-dev at lists.llvm.org> wrote:
>>>>>>>>>
>>>>>>>>>> Hey all,
>>>>>>>>>>
>>>>>>>>>> Below is the proposed format for the dump of the ThinLTO module
>>>>>>>>>> summary in the llvm-dis utility:
>>>>>>>>>>
>>>>>>>>>> > ../build/bin/llvm-dis t.o && cat t.o.ll
>>>>>>>>>> ; ModuleID = '2.o'
>>>>>>>>>> source_filename = "2.ll"
>>>>>>>>>> target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
>>>>>>>>>> target triple = "x86_64-unknown-linux-gnu"
>>>>>>>>>>
>>>>>>>>>> @X = constant i32 42, section "foo", align 4
>>>>>>>>>>
>>>>>>>>>> @a = weak alias i32, i32* @X
>>>>>>>>>>
>>>>>>>>>> define void @afun() {
>>>>>>>>>>   %1 = load i32, i32* @a
>>>>>>>>>>   ret void
>>>>>>>>>> }
>>>>>>>>>>
>>>>>>>>>> define void @testtest() {
>>>>>>>>>>   tail call void @boop()
>>>>>>>>>>   ret void
>>>>>>>>>> }
>>>>>>>>>>
>>>>>>>>>> declare void @boop()
>>>>>>>>>>
>>>>>>>>>> ; Module summary:
>>>>>>>>>> ;   testtest (External linkage)
>>>>>>>>>> ;     Function (2 instructions)
>>>>>>>>>> ;     Calls: boop
>>>>>>>>>> ;   X (External linkage)
>>>>>>>>>> ;     Global Variable
>>>>>>>>>> ;   afun (External linkage)
>>>>>>>>>> ;     Function (2 instructions)
>>>>>>>>>> ;     Refs:
>>>>>>>>>> ;       a
>>>>>>>>>> ;   a (Weak any linkage)
>>>>>>>>>> ;     Alias (aliasee X)
>>>>>>>>>>
>>>>>>>>>> I've implemented the above format in the llvm-dis utility, since
>>>>>>>>>> there currently isn't really a way of getting ThinLTO summaries in a
>>>>>>>>>> human-readable format.
>>>>>>>>>>
>>>>>>>>>> Let me know what you think of this format, and what information
>>>>>>>>>> you think should be added/removed.
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Charles
>>>>>>>>>>
>>>>>>>>>> _______________________________________________
>>>>>>>>>> LLVM Developers mailing list
>>>>>>>>>> llvm-dev at lists.llvm.org
>>>>>>>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Teresa Johnson | Software Engineer | tejohnson at google.com |
>>>> 408-460-2413
>>>>
>>>
>>>
>>
>>
>>
>>
>>
>
>
> --
> Peter
>