[llvm-dev] RFC: LLVM Assembly format for ThinLTO Summary

Tue May 1 10:48:57 PDT 2018

Hi David,
Thanks for the comments, replies below.
Teresa

On Mon, Apr 30, 2018 at 11:52 AM David Blaikie <dblaikie at gmail.com> wrote:

> Hi Teresa,
>
> Awesome to see - looking forward to it!
>
> On Tue, Apr 24, 2018 at 7:44 AM Teresa Johnson via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
>> Hi everyone,
>>
>> I started working on a long-standing request to have the summary dumped
>> in a readable format to text, and specifically to emit to LLVM assembly.
>> Proposal below, please let me know your thoughts.
>>
>> Thanks,
>> Teresa
>>
>> RFC: LLVM Assembly format for ThinLTO Summary
>> =============================================
>>
>> Background
>> ----------
>>
>> ThinLTO operates on small summaries computed during the compile step
>> (i.e. with “-c -flto=thin”), which are then analyzed and updated during
>> the Thin Link stage, and utilized to perform IR updates during the
>> post-link ThinLTO backends. The summaries are emitted as LLVM Bitcode,
>> however, not currently in the LLVM assembly.
>>
>> There are two ways to generate a bitcode file containing summary records
>> for a module:
>>
>> 1. Compile with “clang -c -flto=thin”
>>
>
> As an aside - I seem to recall that at least internally at Google some
> kind of summary-only bitcode files are used (so that the whole bitcode file
> (especially in builds with debug info) doesn't have to be shipped to the
> node doing the summary merging). How are those summary-only files
> produced? Is that upstream? Or done in a more low-level way (like an
> objcopy, llvm-* tool invocation done as a post-processing step, etc)?
>

This is done upstream, under a special clang option that can be given in
addition to -flto=thin, so that the compile step emits both the full
IR+summary (for the distributed backends) as well as a minimized bitcode
file with summary (for the thin link). Note that the distributed backends
don't actually need the summary with the IR (as they get all the info they
need from the combined summary index written out by the thin link), so we
could theoretically improve this to suppress the summary write for that
first file under that option.

>
>>
>> 2. Build from LLVM assembly using “opt -module-summary”
>>
>> Either of these will result in the ModuleSummaryIndex analysis pass (which
>> builds the summary index in memory for a module) to be added to the
>> pipeline just before bitcode emission.
>>
>> Additionally, a combined index is created by merging all the per-module
>> indexes during the Thin Link, which is optionally emitted as a bitcode
>> file.
>>
>> Currently, the only way to view these records is via “llvm-bcanalyzer
>> -dump”, then manually decoding the raw bitcode dumps.
>>
>> Relatedly, there is YAML reader/writer support for CFI related summary
>> fields (-wholeprogramdevirt-read-summary and
>> -wholeprogramdevirt-write-summary). Last summer, GSOC student Charles
>> Saternos implemented support to dump the summary in YAML from llvm-lto2
>> (D34080), including the rest of the summary fields (D34063), however, there
>> was pushback on the related RFC for dumping via YAML or another format
>> rather than emitting as LLVM assembly.
>>
>> Goals:
>>
>> 1. Define LLVM assembly format for summary index
>> 2. Define interaction between parsing of summary from LLVM assembly and
>>    synthesis of new summary index from IR.
>> 3. Implement printing and parsing of summary index LLVM assembly
>>
>> Proposed LLVM Assembly Format
>> -----------------------------
>>
>> There are several top level data structures within the ModuleSummaryIndex:
>>
>> 1. ModulePathStringTable: Holds the paths to the modules summarized in the
>>    index (only one entry for per-module indexes and multiple in the
>>    combined index), along with their hashes (for incremental builds and
>>    global promotion).
>> 2. GlobalValueMap: A map from global value GUIDs to the corresponding
>>    function/variable/alias summary (or summaries for the combined index
>>    and weak linkage).
>> 3. CFI-related data structures (TypeIdMap, CfiFunctionDefs, and
>>    CfiFunctionDecls)
>>
>> I have a WIP patch to AsmWriter.cpp to print the ModuleSummaryIndex that I
>> was using to play with the format. It currently prints 1 and 2 above. I’ve
>> left the CFI related summary data structures as a TODO for now, until the
>> format is at least conceptually agreed, but from looking at those I don’t
>> see an issue with using the same format (with a note/question for Peter on
>> CFI type test representation below).
>>
>> I modeled the proposed format on metadata, with a few key differences
>> noted below. Like metadata, I propose enumerating the entries with the
>> SlotTracker, and prefixing them with a special character. Avoiding
>> characters already used in some fashion (i.e. “!” for metadata and “#” for
>> attributes), I initially have chosen “^”. Open to suggestions though.
>>
>> Consider the following example:
>>
>> extern void foo();
>> int X;
>> int bar() {
>>   foo();
>>   return X;
>> }
>> void barAlias() __attribute__ ((alias ("bar")));
>> int main() {
>>   barAlias();
>>   return bar();
>> }
>>
>> The proposed format has one entry per ModulePathStringTable entry and one
>> per GlobalValueMap/GUID, and looks like:
>>
>> ^0 = module: {path: testA.o, hash: 5487197307045666224}
>> ^1 = gv: {guid: 1881667236089500162, name: X, summaries: {variable: {module: ^0, flags: {linkage: common, notEligibleToImport: 0, live: 0, dsoLocal: 1}}}}
>> ^2 = gv: {guid: 6699318081062747564, name: foo}
>> ^3 = gv: {guid: 15822663052811949562, name: main, summaries: {function: {module: ^0, flags: {linkage: extern, notEligibleToImport: 1, live: 0, dsoLocal: 1}, insts: 5, funcFlags: {readNone: 0, readOnly: 0, noRecurse: 0, returnDoesNotAlias: 0}, calls: {{callee: ^5, hotness: unknown}, {callee: ^4, hotness: unknown}}}}}
>> ^4 = gv: {guid: 16434608426314478903, name: bar, summaries: {function: {module: ^0, flags: {linkage: extern, notEligibleToImport: 1, live: 0, dsoLocal: 1}, insts: 3, funcFlags: {readNone: 0, readOnly: 0, noRecurse: 0, returnDoesNotAlias: 0}, calls: {{callee: ^2, hotness: unknown}}, refs: {^1}}}}
>> ^5 = gv: {guid: 18040127437030252312, name: barAlias, summaries: {alias: {module: ^0, flags: {linkage: extern, notEligibleToImport: 0, live: 0, dsoLocal: 1}, aliasee: ^4}}}
>>
>
> Syntax seems pretty good to me!
>

Great!
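
For anyone following along, here is a rough sketch (Python, not part of the
WIP patch) of how the slot references in the example can be read back
mechanically. It takes a few entries from the example, with the flags fields
dropped for brevity, and collects which slots each entry refers to. The
regexes and the slot_references helper are purely illustrative names of my
own, not the proposed parser.

```python
import re

# Entries taken from the RFC example; the flags/funcFlags fields are
# dropped here for brevity since they contain no slot references.
EXAMPLE = """\
^0 = module: {path: testA.o, hash: 5487197307045666224}
^1 = gv: {guid: 1881667236089500162, name: X, summaries: {variable: {module: ^0}}}
^2 = gv: {guid: 6699318081062747564, name: foo}
^4 = gv: {guid: 16434608426314478903, name: bar, summaries: {function: {module: ^0, insts: 3, calls: {{callee: ^2, hotness: unknown}}, refs: {^1}}}}
^5 = gv: {guid: 18040127437030252312, name: barAlias, summaries: {alias: {module: ^0, aliasee: ^4}}}
"""

# "^N = kind: body" at the start of a line defines slot N.
ENTRY_RE = re.compile(r"^\^(\d+) = (\w+): (.*)$")
# Any "^N" inside the body is a reference to another slot.
SLOT_RE = re.compile(r"\^(\d+)")


def slot_references(text):
    """Map each defined slot to (kind, sorted list of slots it references)."""
    table = {}
    for line in text.splitlines():
        m = ENTRY_RE.match(line)
        if not m:
            continue
        slot, kind, body = int(m.group(1)), m.group(2), m.group(3)
        refs = sorted({int(s) for s in SLOT_RE.findall(body)})
        table[slot] = (kind, refs)
    return table


if __name__ == "__main__":
    for slot, (kind, refs) in sorted(slot_references(EXAMPLE).items()):
        print(slot, kind, refs)
```

On these entries, bar (^4) refers to ^0, ^1 and ^2, and barAlias (^5) refers
to ^0 and ^4, which matches the module/callee/refs/aliasee fields above.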

>
>
>>
>> Like metadata, the fields are tagged (currently using lower camel case,
>> maybe upper camel case would be preferable).
>>
>> The proposed format has a structure that reflects the data structures in
>> the summary index. For example, consider the entry “^4”. This corresponds
>> to the function “bar”. The entry for that GUID in the GlobalValueMap
>> contains a list of summaries. For per-module summaries such as this, there
>> will be at most one summary (with no summary list for an external function
>> like “foo”). In the combined summary there may be multiple, e.g. in the
>> case of linkonce_odr functions which have definitions in multiple modules.
>> The summary list for bar (“^4”) contains a FunctionSummary, so the summary
>> is tagged “function:”. The FunctionSummary contains both a flags structure
>> (inherited from the base GlobalValueSummary class), and a funcFlags
>> structure (specific to FunctionSummary). It therefore contains a
>> brace-enclosed list of flag tags/values for each.
>>
>> Where a global value summary references another global value summary (e.g.
>> via a call list, reference list, or aliasee), the entry is referenced by
>> its slot. E.g. the alias “barAlias” (“^5”) references its aliasee “bar” as
>> “^4”.
>>
>> Note that in comparison metadata assembly entries tend to be much more
>> decomposed since many metadata fields are themselves metadata (so then
>> entries tend to be shorter with references to other metadata nodes).
>>
>> Currently, I am emitting the summary entries at the end, after the
>> metadata nodes. Note that the ModuleSummaryIndex is not currently
>> referenced from the Module, and isn’t currently created when parsing the
>> Module IR bitcode (there is a separate derived class for reading the
>> ModuleSummaryIndex from bitcode). This is because they are not currently
>> used at the same time. However, in the future there is no reason why we
>> couldn’t tag the global values in the Module’s LLVM assembly with the
>> corresponding summary entry if the ModuleSummaryIndex is available when
>> printing the Module in the assembly writer. I.e. we could do the following
>> for “main” from the above example when printing the IR definition (note
>> the “^3” at the end):
>>
>> define  dso_local i32 @main() #0 !dbg !17 ^3 {
>>
>> For CFI data structures, the format would be similar. It appears that
>> TypeIds are referred to by string name in the top level TypeIdMap
>> (std::map indexed by std::string type identifier), whereas they are
>> referenced by GUID within the FunctionSummary class (i.e. the TypeTests
>> vector and the VFuncId structure). For the LLVM assembly I think there
>> should be a top level entry for each TypeIdMap, which lists both the type
>> identifier string and its GUID (followed by its associated information
>> stored in the map), and the TypeTests/VFuncId references on the
>> FunctionSummary entries can reference it by summary slot number. I.e.
>> something like:
>>
>> ^1 = typeid: {guid: 12345, identifier: name_of_type, …
>> ^2 = gv: {... {function: {.... typeTests: {^1, …
>>
>> Peter - is that correct and does that sound ok?
>>
>> Issues when Parsing of Summaries from Assembly
>> ----------------------------------------------
>>
>> When reading an LLVM assembly file containing module summary entries, a
>> ModuleSummaryIndex will be created from the entries.
>>
>> Things to consider are the behavior when:
>>
>>  - Invoked with “opt -module-summary” (which currently builds a new summary
>>    index from the IR). Options:
>>
>
>>
>>    a) recompute summary and throw away summary in the assembly file
>>
>
> What happens currently if you run `opt -module-summary` on a bitcode file
> that already contains a summary? I feel like the behavior should be the
> same when run on a textual IR file containing a summary, probably?
>

We rebuild the summary. Note that this in part is due to the fact mentioned
above that we have separate readers for the Module IR and the summary. The
opt tool does not even read the summary if present. We currently only read
the summary during the thin link (when building the combined index for
analysis), and in the distributed backends where we read the combined
summary index file emitted for that file by the distributed thin link.

>
>>
>>    b) ignore -module-summary and build the summary from the LLVM assembly
>>    c) give an error
>>    d) compare the two summaries (one created from the assembly and the new
>>       one created by the analysis phase from the IR), and error if they
>>       are different.
>>
>> My opinion is to do a), so that the behavior using -module-summary doesn’t
>> change. We also need a way to force building of a fresh module summary for
>> cases where the user has modified the LLVM assembly of the IR (see below).
>>
>>  - How to handle older LLVM assembly files that don’t contain new summary
>>    fields. Options:
>>
>
> Same thoughts would apply here for "what do we do in the bitcode case" -
> with the option to not support old/difficult textual IR. If there are
> easy/obvious defaults, I'd say it's probably worth baking those in (&
> baking them in even for the existing fields we know about, to make it
> easier to write more terse test cases that don't have to
> verbosily/redundantly specify lots of default values?) to the
> parsing/loading logic?
>

So we do emit an index version in the bitcode, and conservatively
auto-upgrade any field that older versions did not emit. We could presumably
serialize out the version number and handle auto-upgrading from textual
assembly the same way (at least once the version is bumped beyond the
current one). If we want to allow omission of some fields for test
simplicity, we could do a similar thing and apply conservative values where
possible for omitted fields (e.g. the flags). That seems fine to me, in
which case I don't think we need a version number. Although this has
implications for the validator, see below.
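
To make the "conservative values for omitted fields" idea a bit more
concrete, a minimal sketch in Python (illustration only). The field names
are the ones from the example in the RFC; the default values and the
with_defaults helper are my assumptions about what "conservative" could
mean, not something the parser does today.

```python
# Hypothetical conservative defaults for fields omitted from the assembly.
# Assumption: "not eligible to import" and "not dso_local" are the safe
# choices, and no function attribute is claimed unless it is written out.
FLAG_DEFAULTS = {"notEligibleToImport": 1, "dsoLocal": 0}
FUNC_FLAG_DEFAULTS = {
    "readNone": 0,
    "readOnly": 0,
    "noRecurse": 0,
    "returnDoesNotAlias": 0,
}


def with_defaults(parsed, defaults):
    """Return the parsed fields, filling in omitted ones from defaults.

    Fields that were written explicitly always win over the defaults.
    """
    return {**defaults, **parsed}


if __name__ == "__main__":
    # A terse test input that only spells out dsoLocal.
    print(with_defaults({"dsoLocal": 1}, FLAG_DEFAULTS))
    # A function summary with no funcFlags at all.
    print(with_defaults({}, FUNC_FLAG_DEFAULTS))
```

The point is only that explicit values override the defaults, so existing
fully-specified inputs would parse the same way as before.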

>
>>
>>    a) Force the LLVM assembly file to be recreated with a new summary.
>>       I.e. “opt -module-summary -o - | llvm-dis”.
>>    b) Auto-upgrade, by silently creating conservative values for the new
>>       summary entries.
>>
>> I lean towards b) (when possible) for user-friendliness and to reduce
>> required churn on test inputs.
>>
>>  - How to handle partial or incorrect LLVM assembly summary entries. How to
>>    handle partial summaries depends in part on how we answer the prior
>>    question about auto-upgrading. I think the best option, like there, is
>>    to handle it automatically when possible. However, I do think we should
>>    error on glaring errors like obviously missing information. For
>>    example, when there is summary data in the LLVM assembly, but summary
>>    entries are missing for some global values. E.g. if the user modified
>>    the assembly to add a function but forgot to add a corresponding
>>    summary entry. We could still have subtle issues (e.g. user adds a new
>>    call but forgets to update the caller’s summary call list), but it will
>>    be harder to detect those.
>>
>
> I'd be OK with the summary being validated by the IR validator (same way
> other properties of IR are validated & even simple things like if you use
> the wrong IR type to refer to an IR value, you get a parse error, etc) -
> which, I realize, would make it feel like the textual summary was entirely
> redundant
>

It is redundant when the IR is also available, which relates to Peter and
others' objections to serializing this back in. An issue with validation
would be if we allowed omission of some fields and/or auto-upgrading as
discussed above. The applied conservative values might very well not match
the recomputed values. But as I mentioned, here we may just want to validate
only for glaring errors such as missing required info - i.e. I think we
should require that every GV has an associated summary entry.
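
As a sketch of the kind of check I have in mind (Python, illustration only):
given the names of the global values in the IR and the names carried by the
gv entries in the summary, report any global with no entry. The
missing_summary_entries helper and the "baz" function are hypothetical; the
other names come from the example in the RFC.

```python
def missing_summary_entries(ir_globals, summary_names):
    """Names of globals in the IR that have no gv entry in the summary."""
    return sorted(set(ir_globals) - set(summary_names))


# Names carried by the gv entries in the RFC example (note that "foo" has an
# entry even though, as a declaration, it has no summary list).
summary_names = ["X", "foo", "main", "bar", "barAlias"]

# Suppose the user hand-edited the assembly to add a function "baz" but
# forgot to add a matching summary entry.
ir_globals = summary_names + ["baz"]

if __name__ == "__main__":
    print(missing_summary_entries(ir_globals, summary_names))
```

A missing call edge, by contrast, would not be caught by a check like this,
which is the harder-to-detect case mentioned in the RFC.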

> (except in cases of standalone summaries - which I imagine will be the
> common case in tests, because the summary processing should be tested in
> isolation (except for testing things like this validation logic itself,
> etc)).
>

Yes, I suspect the biggest usage in tests would be a standalone combined
summary file that we can use to test the application of the thin link
optimizations on a single IR file in the LTO backend pipeline. I.e. the
input to the test would be one module IR assembly file (no summary) and one
combined index assembly file, it would run just the ThinLTO backend
pipeline, and check the resulting IR via llvm-dis to ensure the
optimization is applied effectively.

> - Dave
>
>
>>
>>
>> --
>> Teresa Johnson |  Software Engineer |  tejohnson at google.com |
>> 408-460-2413 <(408)%20460-2413>
>>
>

-- 
Teresa Johnson |  Software Engineer |  tejohnson at google.com |  408-460-2413