[llvm-dev] RFC: LLVM Assembly format for ThinLTO Summary

Teresa Johnson via llvm-dev llvm-dev at lists.llvm.org
Thu May 3 15:24:18 PDT 2018


On Thu, May 3, 2018 at 3:21 PM, Peter Collingbourne <peter at pcc.me.uk> wrote:

> On Thu, May 3, 2018 at 3:10 PM, Teresa Johnson <tejohnson at google.com>
> wrote:
>
>>
>>
>> On Thu, May 3, 2018 at 2:58 PM, Peter Collingbourne <peter at pcc.me.uk>
>> wrote:
>>
>>> Hi Teresa,
>>>
>>> I have re-read your proposal, and I'm not getting how you plan to
>>> represent combined summaries with this. Unless I'm missing something, there
>>> doesn't seem to be a way to write out summaries that is independent of the
>>> global values that they relate to. Is that something that you plan to
>>> address later?
>>>
>>
>> I envisioned that the combined index assembly files would only contain
>> GUIDs, not GV names, just as we do in the combined index bitcode files.
>> Does that answer your question?
>>
>
> Okay, I get it now. For some reason I got the impression that the
> top-level entities in your proposal were the global values and not the
> summaries.
>

Ok great. Probably it was misleading since I used "gv:" as the tag, but
that was in reference to the GlobalValueSummary structure name.
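
To illustrate why the combined index can get by with GUIDs alone, here is a minimal Python sketch (not LLVM code). It assumes, as far as I understand the current implementation, that the GUID is the low 64 bits of the MD5 hash of the global identifier, read little-endian. For an externally visible symbol the identifier is just its name; local symbols also mix in the source file name, which this sketch ignores. The function name `guid` is made up for the example.

```python
import hashlib


def guid(global_identifier: str) -> int:
    """Return the first 8 bytes of the MD5 digest as a little-endian integer.

    Assumption: this mirrors how the GUID is derived for a symbol with
    external linkage; local symbols are not handled here.
    """
    digest = hashlib.md5(global_identifier.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "little")


if __name__ == "__main__":
    # Names taken from the example module in the RFC.
    for name in ("main", "foo"):
        print(name, guid(name))
```

Under that assumption the values for "main" and "foo" should line up with the guid fields in the example entries of the RFC, so a consumer of a GUID-only combined index can still map a known symbol name to its entry.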

Thanks,
Teresa


> Peter
>
>
>>
>> Thanks,
>> Teresa
>>
>>
>>> Peter
>>>
>>> On Tue, Apr 24, 2018 at 7:43 AM, Teresa Johnson <tejohnson at google.com>
>>> wrote:
>>>
>>>> Hi everyone,
>>>>
>>>> I started working on a long-standing request to have the summary dumped
>>>> in a readable format to text, and specifically to emit to LLVM assembly.
>>>> Proposal below, please let me know your thoughts.
>>>>
>>>> Thanks,
>>>> Teresa
>>>>
>>>> RFC: LLVM Assembly format for ThinLTO Summary
>>>> ========================================
>>>>
>>>> Background
>>>> -----------------
>>>>
>>>> ThinLTO operates on small summaries computed during the compile step
>>>> (i.e. with “-c -flto=thin”), which are then analyzed and updated during
>>>> the Thin Link stage, and utilized to perform IR updates during the
>>>> post-link ThinLTO backends. The summaries are emitted in LLVM bitcode,
>>>> but not currently in LLVM assembly.
>>>>
>>>> There are two ways to generate a bitcode file containing summary records
>>>> for a module:
>>>>
>>>> 1. Compile with “clang -c -flto=thin”
>>>> 2. Build from LLVM assembly using “opt -module-summary”
>>>>
>>>> Either of these will result in the ModuleSummaryIndex analysis pass
>>>> (which builds the summary index in memory for a module) being added to
>>>> the pipeline just before bitcode emission.
>>>>
>>>> Additionally, a combined index is created by merging all the per-module
>>>> indexes during the Thin Link, which is optionally emitted as a bitcode
>>>> file.
>>>>
>>>> Currently, the only way to view these records is via “llvm-bcanalyzer
>>>> -dump”, then manually decoding the raw bitcode dumps.
>>>>
>>>> Relatedly, there is YAML reader/writer support for CFI related summary
>>>> fields (-wholeprogramdevirt-read-summary and
>>>> -wholeprogramdevirt-write-summary). Last summer, GSOC student Charles
>>>> Saternos implemented support to dump the summary in YAML from llvm-lto2
>>>> (D34080), including the rest of the summary fields (D34063). However,
>>>> there was pushback on the related RFC for dumping via YAML or another
>>>> format rather than emitting as LLVM assembly.
>>>>
>>>> Goals:
>>>>
>>>> 1. Define LLVM assembly format for summary index
>>>> 2. Define interaction between parsing of summary from LLVM assembly and
>>>>    synthesis of new summary index from IR.
>>>> 3. Implement printing and parsing of summary index LLVM assembly
>>>>
>>>> Proposed LLVM Assembly Format
>>>> ----------------------------------------------
>>>>
>>>> There are several top level data structures within the
>>>> ModuleSummaryIndex:
>>>>
>>>> 1. ModulePathStringTable: Holds the paths to the modules summarized in
>>>>    the index (only one entry for per-module indexes and multiple in the
>>>>    combined index), along with their hashes (for incremental builds and
>>>>    global promotion).
>>>> 2. GlobalValueMap: A map from global value GUIDs to the corresponding
>>>>    function/variable/alias summary (or summaries for the combined index
>>>>    and weak linkage).
>>>> 3. CFI-related data structures (TypeIdMap, CfiFunctionDefs, and
>>>>    CfiFunctionDecls)
>>>>
>>>> I have a WIP patch to AsmWriter.cpp to print the ModuleSummaryIndex that
>>>> I was using to play with the format. It currently prints 1 and 2 above.
>>>> I’ve left the CFI related summary data structures as a TODO for now,
>>>> until the format is at least conceptually agreed, but from looking at
>>>> those I don’t see an issue with using the same format (with a
>>>> note/question for Peter on CFI type test representation below).
>>>>
>>>> I modeled the proposed format on metadata, with a few key differences
>>>> noted below. Like metadata, I propose enumerating the entries with the
>>>> SlotTracker, and prefixing them with a special character. Avoiding
>>>> characters already used in some fashion (i.e. “!” for metadata and “#”
>>>> for attributes), I initially have chosen “^”. Open to suggestions
>>>> though.
>>>>
>>>> Consider the following example:
>>>>
>>>> extern void foo();
>>>> int X;
>>>> int bar() {
>>>>   foo();
>>>>   return X;
>>>> }
>>>> void barAlias() __attribute__ ((alias ("bar")));
>>>> int main() {
>>>>   barAlias();
>>>>   return bar();
>>>> }
>>>>
>>>> The proposed format has one entry per ModulePathStringTable entry and
>>>> one per GlobalValueMap/GUID, and looks like:
>>>>
>>>> ^0 = module: {path: testA.o, hash: 5487197307045666224}
>>>> ^1 = gv: {guid: 1881667236089500162, name: X, summaries: {variable:
>>>>     {module: ^0, flags: {linkage: common, notEligibleToImport: 0,
>>>>     live: 0, dsoLocal: 1}}}}
>>>> ^2 = gv: {guid: 6699318081062747564, name: foo}
>>>> ^3 = gv: {guid: 15822663052811949562, name: main, summaries: {function:
>>>>     {module: ^0, flags: {linkage: extern, notEligibleToImport: 1,
>>>>     live: 0, dsoLocal: 1}, insts: 5, funcFlags: {readNone: 0,
>>>>     readOnly: 0, noRecurse: 0, returnDoesNotAlias: 0}, calls:
>>>>     {{callee: ^5, hotness: unknown}, {callee: ^4, hotness: unknown}}}}}
>>>> ^4 = gv: {guid: 16434608426314478903, name: bar, summaries: {function:
>>>>     {module: ^0, flags: {linkage: extern, notEligibleToImport: 1,
>>>>     live: 0, dsoLocal: 1}, insts: 3, funcFlags: {readNone: 0,
>>>>     readOnly: 0, noRecurse: 0, returnDoesNotAlias: 0}, calls:
>>>>     {{callee: ^2, hotness: unknown}}, refs: {^1}}}}
>>>> ^5 = gv: {guid: 18040127437030252312, name: barAlias, summaries: {alias:
>>>>     {module: ^0, flags: {linkage: extern, notEligibleToImport: 0,
>>>>     live: 0, dsoLocal: 1}, aliasee: ^4}}}
>>>>
>>>> Like metadata, the fields are tagged (currently using lower camel case,
>>>> maybe upper camel case would be preferable).
>>>>
>>>> The proposed format has a structure that reflects the data structures in
>>>> the summary index. For example, consider the entry “^4”. This
>>>> corresponds to the function “bar”. The entry for that GUID in the
>>>> GlobalValueMap contains a list of summaries. For per-module summaries
>>>> such as this, there will be at most one summary (with no summary list
>>>> for an external function like “foo”). In the combined summary there may
>>>> be multiple, e.g. in the case of linkonce_odr functions which have
>>>> definitions in multiple modules. The summary list for bar (“^4”)
>>>> contains a FunctionSummary, so the summary is tagged “function:”. The
>>>> FunctionSummary contains both a flags structure (inherited from the base
>>>> GlobalValueSummary class), and a funcFlags structure (specific to
>>>> FunctionSummary). It therefore contains a brace-enclosed list of flag
>>>> tags/values for each.
>>>>
>>>> Where a global value summary references another global value summary
>>>> (e.g. via a call list, reference list, or aliasee), the entry is
>>>> referenced by its slot. E.g. the alias “barAlias” (“^5”) references its
>>>> aliasee “bar” as “^4”.
>>>>
>>>> Note that, in comparison, metadata assembly entries tend to be much more
>>>> decomposed, since many metadata fields are themselves metadata (so the
>>>> entries tend to be shorter, with references to other metadata nodes).
>>>>
>>>> Currently, I am emitting the summary entries at the end, after the
>>>> metadata nodes. Note that the ModuleSummaryIndex is not currently
>>>> referenced from the Module, and isn’t currently created when parsing the
>>>> Module IR bitcode (there is a separate derived class for reading the
>>>> ModuleSummaryIndex from bitcode). This is because they are not currently
>>>> used at the same time. However, in the future there is no reason why we
>>>> couldn’t tag the global values in the Module’s LLVM assembly with the
>>>> corresponding summary entry if the ModuleSummaryIndex is available when
>>>> printing the Module in the assembly writer. I.e. we could do the
>>>> following for “main” from the above example when printing the IR
>>>> definition (note the “^3” at the end):
>>>>
>>>> define dso_local i32 @main() #0 !dbg !17 ^3 {
>>>>
>>>> For CFI data structures, the format would be similar. It appears that
>>>> TypeIds are referred to by string name in the top level TypeIdMap
>>>> (std::map indexed by std::string type identifier), whereas they are
>>>> referenced by GUID within the FunctionSummary class (i.e. the TypeTests
>>>> vector and the VFuncId structure). For the LLVM assembly I think there
>>>> should be a top level entry for each TypeIdMap entry, which lists both
>>>> the type identifier string and its GUID (followed by its associated
>>>> information stored in the map), and the TypeTests/VFuncId references on
>>>> the FunctionSummary entries can reference it by summary slot number.
>>>> I.e. something like:
>>>>
>>>> ^1 = typeid: {guid: 12345, identifier: name_of_type, …
>>>> ^2 = gv: {... {function: {.... typeTests: {^1, …
>>>>
>>>> Peter - is that correct and does that sound ok?
>>>>
>>>> Issues when Parsing of Summaries from Assembly
>>>> --------------------------------------------------------------------
>>>>
>>>> When reading an LLVM assembly file containing module summary entries, a
>>>> ModuleSummaryIndex will be created from the entries.
>>>>
>>>> Things to consider are the behavior when:
>>>>
>>>>  - Invoked with “opt -module-summary” (which currently builds a new
>>>>    summary index from the IR). Options:
>>>>
>>>>    a) recompute summary and throw away summary in the assembly file
>>>>    b) ignore -module-summary and build the summary from the LLVM
>>>>       assembly
>>>>    c) give an error
>>>>    d) compare the two summaries (one created from the assembly and the
>>>>       new one created by the analysis phase from the IR), and error if
>>>>       they are different.
>>>>
>>>>    My opinion is to do a), so that the behavior using -module-summary
>>>>    doesn’t change. We also need a way to force building of a fresh
>>>>    module summary for cases where the user has modified the LLVM
>>>>    assembly of the IR (see below).
>>>>
>>>>  - How to handle older LLVM assembly files that don’t contain new
>>>>    summary fields. Options:
>>>>
>>>>    a) Force the LLVM assembly file to be recreated with a new summary.
>>>>       I.e. “opt -module-summary -o - | llvm-dis”.
>>>>    b) Auto-upgrade, by silently creating conservative values for the
>>>>       new summary entries.
>>>>
>>>>    I lean towards b) (when possible) for user-friendliness and to
>>>>    reduce required churn on test inputs.
>>>>
>>>>  - How to handle partial or incorrect LLVM assembly summary entries.
>>>>    How to handle partial summaries depends in part on how we answer the
>>>>    prior question about auto-upgrading. I think the best option, like
>>>>    there, is to handle it automatically when possible. However, I do
>>>>    think we should error on glaring errors like obviously missing
>>>>    information. For example, when there is summary data in the LLVM
>>>>    assembly, but summary entries are missing for some global values.
>>>>    E.g. if the user modified the assembly to add a function but forgot
>>>>    to add a corresponding summary entry. We could still have subtle
>>>>    issues (e.g. user adds a new call but forgets to update the caller’s
>>>>    summary call list), but it will be harder to detect those.
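
To make the slot references concrete, here is a minimal Python sketch (not the WIP AsmWriter patch or any planned parser) that reads entries in the proposed syntax and resolves callee and aliasee references by slot. The entries are the ones from the RFC example with the flags, insts and funcFlags fields trimmed for brevity. The helper names (`parse_slots`, `name_of`, `callees`, `aliasee`) are made up for illustration, and the regex-based parsing is an assumption that only works for this simple one-entry-per-line input.

```python
import re

# Entries from the RFC example, with flag fields trimmed for brevity.
SUMMARY = """\
^0 = module: {path: testA.o, hash: 5487197307045666224}
^1 = gv: {guid: 1881667236089500162, name: X, summaries: {variable: {module: ^0}}}
^2 = gv: {guid: 6699318081062747564, name: foo}
^3 = gv: {guid: 15822663052811949562, name: main, summaries: {function: {module: ^0, calls: {{callee: ^5, hotness: unknown}, {callee: ^4, hotness: unknown}}}}}
^4 = gv: {guid: 16434608426314478903, name: bar, summaries: {function: {module: ^0, calls: {{callee: ^2, hotness: unknown}}, refs: {^1}}}}
^5 = gv: {guid: 18040127437030252312, name: barAlias, summaries: {alias: {module: ^0, aliasee: ^4}}}
"""

# One top-level entry per line: "^<slot> = <tag>: <body>".
ENTRY = re.compile(r"^\^(\d+) = (\w+): (.*)$")


def parse_slots(text):
    """Map each slot number to a (tag, body) pair."""
    slots = {}
    for line in text.splitlines():
        match = ENTRY.match(line)
        if match:
            slots[int(match.group(1))] = (match.group(2), match.group(3))
    return slots


def name_of(slots, slot):
    """Return the 'name:' field of an entry, or None if it has none."""
    _, body = slots[slot]
    match = re.search(r"name: (\w+)", body)
    return match.group(1) if match else None


def callees(slots, slot):
    """Resolve every 'callee: ^N' reference in an entry to a name."""
    _, body = slots[slot]
    return [name_of(slots, int(n)) for n in re.findall(r"callee: \^(\d+)", body)]


def aliasee(slots, slot):
    """Resolve the 'aliasee: ^N' reference of an alias entry, if present."""
    _, body = slots[slot]
    match = re.search(r"aliasee: \^(\d+)", body)
    return name_of(slots, int(match.group(1))) if match else None


if __name__ == "__main__":
    slots = parse_slots(SUMMARY)
    print("main calls:", callees(slots, 3))
    print("barAlias aliases:", aliasee(slots, 5))
```

A real parser would of course go through the LLVM assembly lexer and handle nesting properly; the point here is only that every cross-reference is a slot number that can be looked up among the top-level entries.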
>>>>
>>>> --
>>>> Teresa Johnson |  Software Engineer |  tejohnson at google.com |
>>>>  408-460-2413
>>>>
>>>
>>>
>>>
>>> --
>>> --
>>> Peter
>>>
>>
>>
>>
>> --
>> Teresa Johnson |  Software Engineer |  tejohnson at google.com |
>>  408-460-2413
>>
>
>
>
> --
> --
> Peter
>



-- 
Teresa Johnson |  Software Engineer |  tejohnson at google.com |  408-460-2413