[llvm-dev] [RFC][ThinLTO] llvm-dis ThinLTO summary dump format

Wed Jun 7 10:04:59 PDT 2017

2017-06-07 10:01 GMT-07:00 Mehdi AMINI <joker.eph at gmail.com>:

>
>
> 2017-06-07 9:44 GMT-07:00 David Blaikie <dblaikie at gmail.com>:
>
>>
>>
>> On Tue, Jun 6, 2017 at 2:21 PM Mehdi AMINI <joker.eph at gmail.com> wrote:
>>
>>> 2017-06-06 13:38 GMT-07:00 David Blaikie <dblaikie at gmail.com>:
>>>
>>>>
>>>>
>>>> On Tue, Jun 6, 2017 at 1:26 PM Mehdi AMINI <joker.eph at gmail.com> wrote:
>>>>
>>>>> 2017-06-05 14:27 GMT-07:00 David Blaikie via llvm-dev <
>>>>> llvm-dev at lists.llvm.org>:
>>>>>
>>>>>> I know there's been a bunch of discussion here already, but I was
>>>>>> wondering if perhaps someone (probably Teresa? Peter?) could:
>>>>>>
>>>>>> 1) summarize the current state
>>>>>> 2) describe the end-goal
>>>>>> 3) describe what steps (& how this patch relates) are planned to get
>>>>>> to (2)
>>>>>>
>>>>>> My naive thoughts, not being intimately familiar with any of this:
>>>>>> Usually bitcode and textual IR support go in together or around the same
>>>>>> time, and designed that way from the start (take r211920 for example,
>>>>>> which added an explicit representation of COMDATs to the IR). This seems to
>>>>>> have been an oversight in the implementation of IR summaries (is that an
>>>>>> accurate representation/statement?)
>>>>>>
>>>>>
>>>>> More or less: it was not an oversight.
>>>>> The summaries are not really part of the IR; they are more like an
>>>>> "analysis result" that is serialized. It can always be recomputed from the
>>>>> IR. This aspect makes it quite "special": it is the only analysis result
>>>>> that I know of that we serialize.
>>>>>
>>>>
>>>> The use list work seems pretty similar in some ways (granted, can't be
>>>> recomputed to match, hence the desire to serialize it for test case
>>>> implementation).
>>>>
>>>
>>> I see the use-list as a leaky implementation detail of the IR that we
>>> serialized because it impacts the processing of the IR.
>>>
>>> Summaries are more like serializing the CFG for example.
>>>
>>>
>>>> But it looks like the same is true here to a degree - there are test
>>>> cases that exercise the summary handling, so they want summaries for input
>>>> (for now, I think, I've seen test cases that run another LLVM tool to
>>>> insert/create a summary to then feed that back in for a test), or to test
>>>> that the resulting summary is correct.
>>>>
>>>
>>> We have cases where we want summaries as an input and check a combined
>>> summary as an output, and for these having the YAML representation will be
>>> useful (we didn't have it before).
>>>
>>
>> What I'm suggesting is that this is an (optional) IR feature as much as
>> any other
>>
>
> Well, I disagree with this at this point, because I haven't read anything
> that would support it.
> I'd be happy to revise my position if you provided an argument that
> would make this hold in the face of any other analysis result.
>
>
>> - so it seems slightly odd that it'd be YAML rather than something that
>> looked more like the rest of the IR. Though I'm not outright opposed to
>> YAML here - just want to make sure this information is being treated as a
>> first class IR construct (as much as use order, comdats, etc are for rough
>> examples)
>>
>
> YAML was pushed forward as an easy way to get there, IIRC. It wasn't set in
> stone and it was clearly open to changing it to a more integrated format.
> So I'm supportive of anyone who would replace this with a more "textual-IR
> integrated" format. I haven't proposed this in this thread because Teresa
> is interested in getting something readable "quickly". My point was more
> that as an intermediate step, I'd rather reuse the existing YAML
> serialization than create yet another dump.
>
>
>
>>
>>>> Can summaries be standalone? I thought they could (that'd be ideal for
>>>> the distributed situation - only the summary needs to go to the 'thin link'
>>>> step, I think? (currently maybe only the debug info is stripped for that -
>>>> but ideally other unused IR wouldn't be shipped there as well, I would
>>>> think))
>>>>
>>>
>>> Yes conceptually they can be standalone.
>>>
>>
>> This seems to provide the strongest/clear motivation for having summaries
>> as a first class (though optional) IR construct.
>>
>
> No, this provides a strong motivation to have a proper serialization; I
> don't see how you connect this to the rest of the IR.
>

On the topic of how it is / isn't part of the IR: given a Module or
Function you can't get a summary. It lives on the side in the bitcode, but
there isn't any connection in memory: it is purely an analysis result that
is serialized for convenience.

Note that we considered attaching it to the Module; however, it causes
problems around invalidation (like every analysis result, it is a "view"
of the IR at a given point, and changing the IR invalidates it).
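To make the invalidation point concrete, here is a minimal sketch in plain Python (not LLVM code; `Module`, `compute_summary`, and `SummaryCache` are hypothetical names invented for illustration) of a summary treated as a cached, version-stamped view of the IR: every mutation bumps the module's version, and the next request recomputes the summary from the IR instead of returning a stale copy. LLVM's analysis managers handle invalidation through their own mechanism; this only illustrates why a summary stored on the Module would need the same invalidation handling as any other analysis result.

```python
from dataclasses import dataclass, field


@dataclass
class Module:
    """Hypothetical stand-in for an IR module: function name -> list of callees."""
    functions: dict[str, list[str]] = field(default_factory=dict)
    version: int = 0  # bumped on every mutation of the "IR"

    def add_call(self, caller: str, callee: str) -> None:
        self.functions.setdefault(caller, []).append(callee)
        self.functions.setdefault(callee, [])
        self.version += 1


def compute_summary(m: Module) -> dict[str, tuple[str, ...]]:
    """Recompute the hypothetical 'summary' (just call edges) purely from the IR."""
    return {f: tuple(callees) for f, callees in m.functions.items()}


class SummaryCache:
    """Caches a summary together with the module version it was computed from."""

    def __init__(self) -> None:
        self._version = -1
        self._summary: dict[str, tuple[str, ...]] = {}

    def get(self, m: Module) -> dict[str, tuple[str, ...]]:
        if self._version != m.version:  # IR changed: the cached view is stale
            self._summary = compute_summary(m)
            self._version = m.version
        return self._summary
```

Usage: after `m.add_call("foo", "bar")`, `cache.get(m)` returns a freshly recomputed summary that includes the new edge, while a summary obtained before the change still reflects the old IR.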

-- 
Mehdi

>
>
>>
>>>>>> & now there's an effort to correct that.
>>>>>>
>>>>>
>>>>> The main motivation here, I believe, is more to help devs have a human
>>>>> readable/understandable dump for ThinLTO bitcode files. Having to inspect
>>>>> summaries separately is a pain.
>>>>>
>>>>
>>>> Not sure I quite follow - inspect separately?
>>>>
>>>
>>> llvm-dis does not display summaries today, so you can't just use
>>> llvm-dis in the "regular" flow.
>>>
>>>
>>>> How are they inspected today?
>>>>
>>>
>>> llvm-bcanalyzer? And now the YAML dump as well.
>>>
>>>
>>>> & also, I think there are test cases that want to/are currently testing
>>>> summary input but do so somewhat awkwardly by using another tool to produce
>>>> the summary first. Ideally the test case would have the summary written in
>>>> to start, I would think, if that's a codepath worth testing?
>>>>
>>>
>>> The IR already contains all the information, so why repeat it?
>>>
>>
>> For the same reason that it's relevant to test cases which way it's
>> encoded, etc (in the same way that the LLVM IR repeats types of uses, for
>> example - even though they're totally redundant from a "does this have all
>> the semantic information required" perspective) & because it can be standalone.
>>
>
>>
>>> This makes the test case harder to maintain: in the vast majority of cases, I
>>> expect that if a test needs IR then it shouldn't need to include a summary
>>> as well (and vice-versa).
>>>
>>
>> Ah, sorry, I'm not suggesting it should be required - in the same way
>> it's not required in the bitcode. But if you want a summary in the bitcode
>> when assembling a .ll file it seems OK to say you write it in the IR,
>>
>
> No, it does not seem OK to me to write summaries alongside the IR in tests
> in general (outside of specific needs like testing the round-trip, of
> course).
> It is entirely redundant and I don't perceive any benefit. Why would you
> want to do that?
>
>
>
>> and equally if there is a summary in the bitcode it seems reasonable that
>> it be printed in the .ll file by llvm-dis.
>>
>
> I agree and I advocated for this earlier.
>
>
>>
>>
>>> In the majority of tests we have, we want to check if the importing does
>>> what it is supposed to do, and if the linkages are correctly adjusted. With
>>> a YAML (or other) serialization for the summaries this could indeed be
>>> done purely with summaries, without any IR involved.
>>>
>>
>> I'm not sure I understand - you mean for executions of tools that don't
>> need the rest of the IR, there could be a different/separate tool that
>> consumes YAML summaries and produces YAML
>>
>
> It does not have to be a separate tool: a tool that is looking to operate
> purely on summaries should just ask to get the summaries out of the input
> file. The input being textual or bitcode shouldn't matter much at this
> point.
> This is exactly how 'opt' and 'llc' operate.
>
>> summaries and that would be tested - but the "consuming a summary in a
>> bitcode file" would not be?
>>
>
> This is exactly what we're doing with (almost) *all* of the .ll tests: we
> write them as textual IR and read them back as textual IR, not as bitcode.
>
>
>> I'm not sure I understand the benefit of this separation and asymmetry
>> with the bitcode form of the same data.
>>
>
> Have you tried to write a test directly in bitcode? ;)
>
> I'm not sure we're talking about the same thing right now.
>
> --
> Mehdi
>
>
>