[llvm-dev] [RFC][ThinLTO] llvm-dis ThinLTO summary dump format

Wed Jul 19 08:52:54 PDT 2017

On Wed, Jul 19, 2017 at 8:43 AM Teresa Johnson <tejohnson at google.com> wrote:

> On Wed, Jul 19, 2017 at 8:31 AM, David Blaikie <dblaikie at gmail.com> wrote:
>
>>
>>
>> On Mon, Jul 17, 2017 at 5:18 PM Mehdi AMINI <joker.eph at gmail.com> wrote:
>>
>>> 2017-07-17 16:49 GMT-07:00 David Blaikie via llvm-dev <
>>> llvm-dev at lists.llvm.org>:
>>>
>>>>
>>>>
>>>> On Mon, Jul 17, 2017 at 6:11 AM Charles Saternos via llvm-dev <
>>>> llvm-dev at lists.llvm.org> wrote:
>>>>
>>>>> Hey @chandlerc and @dblaikie,
>>>>>
>>>>> Any updates on this in relation to "[PATCH] D34080: [ThinLTO] Add
>>>>> dump-summary command to llvm-lto2 tool"?
>>>>>
>>>>
>>>> Sorry you've kind of got stuck in the middle of this - but I'm still
>>>> hoping to hear/understand the pushback on implementing this as a first
>>>> class .ll construct with serialization and deserialization support.
>>>>
>>>> I think Peter mentioned he didn't think this was the right path forward
>>>> in the long term? If that's the case, I'd like to understand that/reach
>>>> that conclusion for the project now rather than treating this as a stop-gap
>>>> with some idea that in the future someone might implement full
>>>> serialization support (when it's been over a year already, and other stop
>>>> gaps have been implemented (the yaml input support) already).
>>>>
>>>
>>> I'm totally believing we need first class serialization support in .ll,
>>> and I have a path forward for this (just not a lot of time to dedicate to
>>> this).
>>>
>>
>> What's the rough expectation of time/complexity for this path forward?
>>
>>
>>> & if a .ll construct with serialization/deserialization is the path
>>>> forward, understanding the motivation for a something other than going
>>>> straight for that would be helpful -usually bitcode features come with .ll
>>>> support from day 1, not a year later. I'm not clear on what would make this
>>>> feature an exception/more expensive to do this for (& why it would be worth
>>>> deferring that work, and what/when that work will be motivated/done)
>>>>
>>>
>>>
>>> We need a debugging tool for summaries ASAP, and the YAML is *already*
>>> implemented.
>>>
>>
>> I'm not sure I understand why the tradeoff is worthwhile - in terms of
>> needing to add a new feature (even if it's already implemented) and tests,
>> then porting those tests to a first-class .ll construct later. Usually
>> adding .ll formats doesn't seem to be terribly expensive/time intensive.
>>
>
> The main complication I see is defining the behavior when the serialized
> summary is read back in.
> 1) Do we trust that it is correct and consistent with the IR and blindly
> use it? That could cause some issues if someone changes the IR in a .ll
> file for testing and doesn't realize they need to also update the summary
> correspondingly.
>

Generally textual IR is assumed to be bogus and is validated for invariants
like this.

> 2) Do we always want to build the summary from the IR and check it against
> the summary read from the .ll file? In that case, what is even the use of
> building a summary from the serialized form?
>

Isn't that why the YAML support was added in the first place - for reading
in summaries for test cases?

> 3) If we want to allow tweaks to the summary in the .ll that override what
> is in the IR, for testing purposes, how does any checking we do in
>

I'm not sure why we would - what would be tested that way? If the
invariants of the summary are violated presumably the behavior of LLVM is
undefined & so there's nothing to test/no expected/defined/usable behavior
there.

> 2) distinguish between the case in 1) (user error) and 3) (intended
> difference)?
>
> This is why I suggested emitting as comments in the .ll file initially
> (which is useful for debugging purposes, although the YAML works fine for
> that too), while the above are hashed out.
>

>
>>
>>> Making it available through the llvm-lto tool is a no-brainer to me.
>>>
>>> This was *not* an oversight but a deliberate choice to not do this in
>>> the first place. Because summaries are the first bitcode feature I know of
>>> that isn't attached in any way to a Module (you can't get to it from a
>>> Module).
>>>
>>
>> Not sure I quite follow why that difference made the choice/tradeoff here
>> different (which admittedly is a bit easier to see in retrospect maybe -
>> now that there's been a need to build serialization and deserialization).
>> Do you mean it wasn't clear that serialization support was needed when
>> summaries were first implemented, but it is clear now?
>>
>
> Speaking for myself, it wasn't clear to me that serialization support was
> needed for anything other than debugging/testing,
>

The textual IR in its entirety basically exists only for debugging/testing
- but it's a big 'only'. (as has become apparent in this case with
experience, by the looks of it)

(not a criticism of you/your work - just that this should've been caught in
code review, I think)

> since it is redundant with and computed from the IR, and I wasn't sure
> emitting into the .ll file was the right way for debugging. Which is why
> the testing used llvm-bcanalyzer -dump.
>
> Teresa
>
>
>
>>
>> - Dave
>>
>
>
>
> --
> Teresa Johnson |  Software Engineer |  tejohnson at google.com |
> 408-460-2413 <(408)%20460-2413>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20170719/8334f12c/attachment.html>