[llvm-dev] RFC: LLVM Assembly format for ThinLTO Summary

Thu May 3 15:54:48 PDT 2018

Re-sending with trimmed quotation.

On Thu, May 3, 2018 at 3:52 PM, Peter Collingbourne <peter at pcc.me.uk> wrote:

>
>
> On Thu, May 3, 2018 at 3:29 PM, David Blaikie <dblaikie at gmail.com> wrote:
>
>>
>>
>> On Thu, May 3, 2018 at 3:08 PM Peter Collingbourne via llvm-dev <
>> llvm-dev at lists.llvm.org> wrote:
>>
>>> On Thu, May 3, 2018 at 2:44 PM, Mehdi AMINI <joker.eph at gmail.com> wrote:
>>>
>>>>
>>>>
>>>> Le mar. 1 mai 2018 à 16:50, Teresa Johnson <tejohnson at google.com> a
>>>> écrit :
>>>>
>>>>> Hi Mehdi, thanks for the comments, responses and a tweaked proposal
>>>>> below. Teresa
>>>>>
>>>>> On Tue, May 1, 2018 at 11:37 AM, Mehdi AMINI <joker.eph at gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> My main concern is this one:
>>>>>>
>>>>>> > Currently, I am emitting the summary entries at the end, after the
>>>>>> metadata nodes. Note that the ModuleSummaryIndex is not currently
>>>>>> referenced from the Module, and isn’t currently created when parsing the
>>>>>> Module IR bitcode (there is a separate derived class for reading the
>>>>>> ModuleSummaryIndex from bitcode). This is because they are not currently
>>>>>> used at the same time. However, in the future there is no reason why we
>>>>>> couldn’t tag the global values in the Module’s LLVM assembly with the
>>>>>> corresponding summary entry if the ModuleSummaryIndex is available when
>>>>>> printing the Module in the assembly writer. I.e. we could do the following
>>>>>> for “main” from the above example when printing the IR definition (note the
>>>>>> “^3” at the end):
>>>>>>
>>>>>>
>>>>>> I believe the reason that the ModuleSummaryIndex is not attached to
>>>>>> the Module is that it is fundamentally not a piece of IR, but it is
>>>>>> conceptually really an Analysis result.
>>>>>> Usually other analyses don't serialize their result, we happen to
>>>>>> serialize this one for an optimization purpose (reloading it and making the
>>>>>> thin-link faster).
>>>>>>
>>>>>
>>>>> True. My understanding is that the push for having it serialized via
>>>>> assembly is due to the fact that it is emitted into the bitcode. I know
>>>>> there is disagreement on this reasoning, I am hoping to have a proposal
>>>>> that is acceptable to everyone. =)
>>>>>
>>>>>
>>>>>
>>>>>> The fundamental problem is that an analysis result has to be able to
>>>>>> be invalidated with IR changes, attaching this directly to the module
>>>>>> wouldn't achieve this. The risk is that when the IR and the summary get
>>>>>> out-of-sync (`clang -O2 my_module_with_summaries.ll -emit-llvm -o
>>>>>> my_optimized module_with_summaries.ll`) the summaries would be badly wrong.
>>>>>>
>>>>>> Have you looked into what it'd take to make it a "real" analysis in
>>>>>> the pass manager?
>>>>>>
>>>>>
>>>>> Thanks for raising this issue specifically, I hadn't addressed it in
>>>>> my proposal and it is a big one. I am not proposing that we attempt to
>>>>> maintain the summary through optimization passes, and definitely don't
>>>>> think we should do that. IMO deserializing it should be for testing the
>>>>> thin link and the combined summaries in the backends only. To that end, I
>>>>> have an idea (below some background first).
>>>>>
>>>>> Note that in some cases the module summary analysis is an analysis
>>>>> pass. I.e. when invoked by "opt -module-summary=". However, some time ago
>>>>> when Peter added the support for splitting the bitcode (for CFI purposes)
>>>>> and therefore needed to generate a summary in each partition (Thin and
>>>>> Regular), he added the ThinLTOBitcodeWriterPass, which invokes the module
>>>>> summary builder directly (twice). This writer is what gets invoked now when
>>>>> building via "clang -flto=thin", and with "opt -thinlto-bc". So there it is
>>>>> not invoked/maintained as an analysis pass/result. It would be tricky to
>>>>> figure out how to even split rather than recompute the module summary index
>>>>> in that case. Even in the case where we are still invoking as an analysis
>>>>> pass (opt -module-summary), we would need to figure out how to read in the
>>>>> module summary to use as the analysis result when available (so that it
>>>>> could be invalidated and recomputed when stale).
>>>>>
>>>>> Rather than add this plumbing, and just have it discarded if opt does
>>>>> any optimization, I think we should focus at least for the time being on
>>>>> supporting reading the summary from assembly exactly where we currently
>>>>> read in the summary from bitcode:
>>>>> 1) For the thin link (e.g. tools such as llvm-lto2 or llvm-lto, which
>>>>> currently have to be preceded by "opt -module-summary/-thinlto-bc" to
>>>>> generate an index, but could just build it from assembly instead).
>>>>> 2) For the LTO backends (e.g. tools such as llvm-lto which can consume
>>>>> a combined index and invoke the backends, or "clang -fthinlto-index=" for
>>>>> distributed ThinLTO backend testing), where we could build the combined
>>>>> summary index from assembly instead.
>>>>>
>>>>> This greatly simplifies the reading side, as there are no
>>>>> optimizations performed on the IR after the index is read in these cases
>>>>> that would require invalidation. It also simplifies adding the parsing
>>>>> support, since it gets invoked exactly where we expect to build an index
>>>>> currently (i.e. since we don't currently build or store the
>>>>> ModuleSummaryIndex when parsing the Module from bitcode). It doesn't
>>>>> preclude someone from figuring out how to compute the module summary
>>>>> analysis result from the assembly, and invalidating it after optimization,
>>>>> when reading the Module IR via 'opt' in the future.
>>>>>
>>>>> Does this seem like a reasonable proposal to everyone?
>>>>>
>>>>
>>>> Sounds good to me.
>>>>
>>>> That would make .ll files quite convenient during debugging I think? We
>>>> could disassemble, manually change summaries, and re-assemble a bitcode
>>>> file before running the (thin-)link again.
>>>>
>>>
>>> Yes, that seems more reasonable than what I thought you had in mind. If
>>> the only consumer of this information is llvm-as, then the purpose of the
>>> asm summary format is just to provide a way to create a .bc file for
>>> testing purposes, which is certainly a useful capability.
>>>
>>
>> That feels a bit surprising if ".ll -> llvm-as -> .bc -> <some other tool
>> - opt, etc>" is different from ".ll -> <opt, etc>". Is that what we're
>> talking about here? Any chance that can be avoided & feeding a .ll file
>> works (in the sense of does the same thing/tests the same behavior) in all
>> the same places that feeding a .bc file does? (as is the case with
>> non-summary-based IR, to the best of my knowledge)
>>
>
I may be mistaken, but I don't think we have a lot of tools that can read
both .ll and .bc and end up using the summary if it is a .bc file. LTO
can't read .ll, for example. The only one that I can think of is clang and
presumably we could make that use whichever API we would use in llvm-as for
reading the summary from .ll. So the behaviour of most tools would be that
".ll -> llvm-as -> .bc -> <some tool>" vs ".ll -> <some tool>" would end up
being the same in both cases: the summary gets discarded.

Peter
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20180503/ebbc00cb/attachment.html>