[llvm-dev] RFC: LLVM Assembly format for ThinLTO Summary

Teresa Johnson via llvm-dev llvm-dev at lists.llvm.org
Thu May 3 15:02:51 PDT 2018


Ugh, my earlier response is awaiting moderator approval. Apparently with
the response and all the other earlier emails quoted it was too long.
Resending with a bunch of old quoted emails trimmed.
Teresa

On Thu, May 3, 2018 at 2:58 PM, Teresa Johnson <tejohnson at google.com> wrote:

>
>
> On Thu, May 3, 2018 at 2:16 PM, Steven Wu <stevenwu at apple.com> wrote:
>
>>
>>
>> On May 1, 2018, at 4:50 PM, Teresa Johnson via llvm-dev <
>> llvm-dev at lists.llvm.org> wrote:
>>
>> Hi Mehdi, thanks for the comments, responses and a tweaked proposal
>> below. Teresa
>>
>> On Tue, May 1, 2018 at 11:37 AM, Mehdi AMINI <joker.eph at gmail.com> wrote:
>>
>>> Hi,
>>>
>>> My main concern is this one:
>>>
>>> > Currently, I am emitting the summary entries at the end, after the
>>> metadata nodes. Note that the ModuleSummaryIndex is not currently
>>> referenced from the Module, and isn’t currently created when parsing the
>>> Module IR bitcode (there is a separate derived class for reading the
>>> ModuleSummaryIndex from bitcode). This is because they are not currently
>>> used at the same time. However, in the future there is no reason why we
>>> couldn’t tag the global values in the Module’s LLVM assembly with the
>>> corresponding summary entry if the ModuleSummaryIndex is available when
>>> printing the Module in the assembly writer. I.e. we could do the following
>>> for “main” from the above example when printing the IR definition (note the
>>> “^3” at the end):
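[The quoted example was trimmed; a sketch of what the tagged definition and its summary entry might look like, with illustrative field names based on the proposed summary-entry syntax:]

```llvm
; Function definition tagged with a reference to its summary entry:
define i32 @main() ^3 {
entry:
  %call = call i32 @foo()
  ret i32 %call
}

; Corresponding summary entry emitted after the metadata nodes
; (fields illustrative):
^3 = gv: (name: "main", summaries: (function: (module: ^0, insts: 2,
    calls: ((callee: ^1)))))
```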
>>>
>>>
>>> I believe the reason that the ModuleSummaryIndex is not attached to the
>>> Module is that it is fundamentally not a piece of IR, but it is
>>> conceptually really an Analysis result.
>>> Usually other analyses don't serialize their result, we happen to
>>> serialize this one for an optimization purpose (reloading it and making the
>>> thin-link faster).
>>>
>>
>> True. My understanding is that the push for having it serialized via
>> assembly is due to the fact that it is emitted into the bitcode. I know
>> there is disagreement on this reasoning, I am hoping to have a proposal
>> that is acceptable to everyone. =)
>>
>>
>> I totally agree with Mehdi’s concern. It is probably much easier to treat
>> module summary as analysis result, rather than part of the module. Making
>> it part of IR might be overkill just to fix the debugging and readability
>> of the summary.
>>
>>
>>
>>
>>> The fundamental problem is that an analysis result has to be able to be
>>> invalidated with IR changes, attaching this directly to the module wouldn't
>>> achieve this. The risk is that when the IR and the summary get out-of-sync
>>> (`clang -O2 my_module_with_summaries.ll -emit-llvm -o
>>> my_optimized_module_with_summaries.ll`) the summaries would be badly wrong.
>>>
>>> Have you looked into what it'd take to make it a "real" analysis in the
>>> pass manager?
>>>
>>
>> Thanks for raising this issue specifically, I hadn't addressed it in my
>> proposal and it is a big one. I am not proposing that we attempt to
>> maintain the summary through optimization passes, and definitely don't
>> think we should do that. IMO deserializing it should be for testing the
>> thin link and the combined summaries in the backends only. To that end, I
>> have an idea (below some background first).
>>
>> Note that in some cases the module summary analysis is run as an analysis
>> pass, i.e. when invoked by "opt -module-summary=". However, some time ago when
>> Peter added the support for splitting the bitcode (for CFI purposes) and
>> therefore needed to generate a summary in each partition (Thin and
>> Regular), he added the ThinLTOBitcodeWriterPass, which invokes the module
>> summary builder directly (twice). This writer is what gets invoked now when
>> building via "clang -flto=thin", and with "opt -thinlto-bc". So there it is
>> not invoked/maintained as an analysis pass/result. It would be tricky to
>> figure out how to split (rather than recompute) the module summary index
>> in that case. Even in the case where we are still invoking it as an analysis
>> pass (opt -module-summary), we would need to figure out how to read in the
>> module summary to use as the analysis result when available (so that it
>> could be invalidated and recomputed when stale).
>>
>> Rather than add this plumbing, and just have it discarded if opt does any
>> optimization, I think we should focus at least for the time being on
>> supporting reading the summary from assembly exactly where we currently
>> read in the summary from bitcode:
>> 1) For the thin link (e.g. tools such as llvm-lto2 or llvm-lto, which
>> currently have to be preceded by "opt -module-summary/-thinlto-bc" to
>> generate an index, but could just build it from assembly instead).
>> 2) For the LTO backends (e.g. tools such as llvm-lto which can consume a
>> combined index and invoke the backends, or "clang -fthinlto-index=" for
>> distributed ThinLTO backend testing), where we could build the combined
>> summary index from assembly instead.
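>>
>> [For illustration, a standalone module whose per-module summary a thin-link
>> tool could build directly from assembly, instead of requiring a prior
>> "opt -module-summary" step, might look something like this (field names
>> illustrative):]

```llvm
; Minimal module plus its summary entries, parsed together at the point
; where the thin link currently reads the bitcode summary (fields
; illustrative):
define void @foo() {
  ret void
}

^0 = module: (path: "foo.o", hash: (0, 0, 0, 0, 0))
^1 = gv: (name: "foo", summaries: (function: (module: ^0, insts: 1)))
```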
>>
>> This greatly simplifies the reading side, as there are no optimizations
>> performed on the IR after the index is read in these cases that would
>> require invalidation. It also simplifies adding the parsing support, since
>> it gets invoked exactly where we expect to build an index currently (i.e.
>> since we don't currently build or store the ModuleSummaryIndex when parsing
>> the Module from bitcode). It doesn't preclude someone from figuring out how
>> to compute the module summary analysis result from the assembly, and
>> invalidating it after optimization, when reading the Module IR via 'opt' in
>> the future.
>>
>> Does this seem like a reasonable proposal to everyone?
>>
>>
>> I would +1 for making this an analysis pass. How about a
>> ModuleSummaryAnalysis that knows how to serialize its result into bitcode
>> and YAML format? We can also add a module flag to indicate whether the module
>> should contain a module summary.
>>
>> * For ThinLTO compilation, the bitcode writer runs the analysis pass if the
>> module flag is set, and emits the module summary into the bitcode.
>>
> * When reading bitcode with module summary, the summary is parsed into
>> ModuleSummaryAnalysis result.
>>
>
> Note the reason we don't do this now is that we don't ever need to use the
> summary at the same time as the IR. We only read the
> summary for the thin link (no IR involved), or for distributed ThinLTO
> build backends (but only the combined index is read, the summary attached
> to the module is not read as it is not needed at all). Which is why I
> suggested the simplification in my proposal above of only reading the
> summary from assembly where we currently read the bitcode summary (which is
> not when we normally read the IR).
>
> Also, note as I mentioned above that the default path may split the IR
> into regular LTO and ThinLTO partitions, which is why we stopped invoking
> it as an analysis pass on the default path, and instead invoke the index
> builder directly for each partition. Figuring out how to split a single
> module summary analysis result into two parts is not something I think we
> should block this on.
>
>
> * If any transform invalidates the analysis, the module summary will
>> be recomputed automatically when writing bitcode; otherwise, it will
>> serialize out the same result as the input (probably with some auto-upgrade).
>> * For debugging, ModuleSummaryAnalysis can be serialized out to YAML
>> * For testing, ModuleSummaryAnalysis result can be created from YAML
>>
>> I suggested YAML but it can be any other format, including LLVM assembly.
>> For readability, I would prefer YAML.
>>
>
> That is similar to the original proposal from last year, but there was no
> agreement among upstream maintainers on dumping to a format other than
> assembly that could also be serialized back in.
>
> That's why I'm proposing the version above - serialize in as assembly
> exactly where we consume the bitcode version of the module summary index
> today. The advantage is that it doesn't require the analysis pass plumbing, or
> figuring out how to make the analysis work with split LTO partitions.
> At the same time, it doesn't preclude doing that in the future either. I
> think my proposal enables testing of the ThinLTO pipeline via textual
> assembly, which is one of the main goals of being able to serialize it back
> in.
>
> Teresa
>
>
>> Thanks
>>
>> Steven
>>
>>

-- 
Teresa Johnson |  Software Engineer |  tejohnson at google.com |  408-460-2413

