[llvm-dev] RFC: LLVM Assembly format for ThinLTO Summary

Teresa Johnson via llvm-dev llvm-dev at lists.llvm.org
Thu May 10 08:41:54 PDT 2018


On Tue, May 1, 2018 at 11:10 AM Peter Collingbourne <peter at pcc.me.uk> wrote:

>
>
> On Tue, May 1, 2018 at 9:00 AM, Teresa Johnson <tejohnson at google.com>
> wrote:
>
>>
>>
>>>>>>
>>>>>>
>>>>>> *For CFI data structures, the format would be similar. It appears
>>>>>> that TypeIds are referred to by string name in the top level TypeIdMap
>>>>>> (std::map indexed by std::string type identifier), whereas they are
>>>>>> referenced by GUID within the FunctionSummary class (i.e. the TypeTests
>>>>>> vector and the VFuncId structure). For the LLVM assembly I think there
>>>>>> should be a top level entry for each TypeIdMap, which lists both the type
>>>>>> identifier string and its GUID (followed by its associated information
>>>>>> stored in the map), and the TypeTests/VFuncId references on the
>>>>>> FunctionSummary entries can reference it by summary slot number. I.e.
>>>>>> something like:
>>>>>>
>>>>>> ^1 = typeid: {guid: 12345, identifier: name_of_type, …
>>>>>> ^2 = gv: {... {function: {.... typeTests: {^1, …
>>>>>>
>>>>>> Peter - is that correct and does that sound ok?*
>>>>>>
>>>>>
>>>>> I don't think that would work because the purpose of the top-level
>>>>> TypeIdMap is to contain resolutions for each type identifier, and
>>>>> per-module summaries do not contain resolutions (only the combined summary
>>>>> does). What that means in practice is that we would not be able to recover
>>>>> and write out a type identifier name for per-module summaries as part of ^1
>>>>> in your example (well, we could in principle, because the name is stored
>>>>> somewhere in the function's IR, but that could get complicated).
>>>>>
>>>>
>>>> Ah ok. I guess the top-level map then is generated by the regular LTO
>>>> portion of the link (since it presumably requires IR during the Thin Link
>>>> to get into the combined summary)?
>>>>
>>>
>>> Yes, we fill in the map during the LowerTypeTests and WholeProgramDevirt
>>> passes in the regular LTO part of the link, e.g. here:
>>>  http://llvm-cs.pcc.me.uk/lib/Transforms/IPO/LowerTypeTests.cpp#823
>>>
>>>
>>>>> Probably the easiest thing to do is to keep the type identifiers as
>>>>> GUIDs in the function summaries and write out the mapping of type
>>>>> identifiers as a top-level entity.
>>>>>
>>>>
>>>> To confirm, you mean during the compile step create a top-level entity
>>>> that maps GUID -> identifier?
>>>>
>>>
>>> I mean that you could represent this with something like:
>>>
>>> ^typeids = {^1, ^2, ^3}
>>> ^1 = typeid: {identifier: typeid1, ...}
>>> ^2 = typeid: {identifier: typeid2, ...}
>>> ^3 = typeid: {identifier: typeid3, ...}
>>>
>>> There's no need to store the GUIDs here because they can be computed
>>> from the type identifiers. The GUIDs would only be stored in the typeTests
>>> (etc.) fields in each function summary.
>>>
>>
>> I suppose we don't need to store the GUIDs at the top level in the
>> in-memory summary. But I think it would be good to emit the GUIDs in the
>> typeid assembly entries because it makes the association in the assembly
>> much more obvious. I.e. going back to my original example:
>>
>> ^1 = typeid: {guid: 12345, identifier: name_of_type, …
>> ^2 = gv: {... {function: {.... typeTests: {^1, …
>>
>> If we didn't include the GUID in the typeid entry, but rather just the
>> identifier, and put the GUID in the typeTest list in the GV's entry, it
>> wouldn't be obvious at all from the assembly listing which typeid goes with
>> which typeTest. It's also less compact to include the GUID in each
>> typeTests list.
>>
>
> I get that, but my point was that in a per-module summary the TypeIdMap is
> empty, so there will be no names, only GUIDs.
>
> For "making the association more obvious" we might just want to have the
> assembly writer emit the GUID of a name as a comment.
>

This is what I have done in the disassembly for the TypeIdMap in a combined
index. The per-module summaries print the full GUID in the TypeIdInfo entries.
However, since we do have the TypeIdMap in the combined summary, to make
the disassembly more compact and make the association more obvious, I went
ahead and used the associated typeid slot number in the TypeIdInfo entries
instead of the full GUID. You can see how that looks in the change to the
tests in D46700. Let me know what you think.
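
To illustrate the two forms described above, here is a minimal sketch in
Python. It is not the writer code from D46700 and does not reproduce the
exact syntax of that patch; the function names (type_id_guid,
emit_summary) and the output layout are hypothetical and only follow the
examples used in this thread. It also assumes that a type identifier's
GUID is the low 64 bits of the MD5 hash of its name, read little-endian,
which is my understanding of how LLVM derives GUIDs from names.

```python
import hashlib


def type_id_guid(identifier: str) -> int:
    """Assumed GUID scheme: first 8 bytes of MD5(name), little-endian."""
    digest = hashlib.md5(identifier.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "little")


def emit_summary(type_ids, type_tests_by_function):
    """Render illustrative summary lines.

    type_ids: identifier names present in the TypeIdMap (empty for a
        per-module summary, populated for a combined summary).
    type_tests_by_function: maps a function name to the list of type
        identifier GUIDs it tests.
    """
    lines = []
    slot_of_guid = {}
    # Each typeid entry gets a slot number; its GUID is shown as a comment.
    for slot, name in enumerate(type_ids, start=1):
        guid = type_id_guid(name)
        slot_of_guid[guid] = slot
        lines.append(
            f"^{slot} = typeid: {{identifier: {name}, ...}} ; guid = {guid}"
        )
    # References use the slot number when the typeid is known, and fall
    # back to the full GUID otherwise (always the case per-module).
    for func, guids in type_tests_by_function.items():
        refs = ", ".join(
            f"^{slot_of_guid[g]}" if g in slot_of_guid else str(g)
            for g in guids
        )
        lines.append(f"; {func}: typeTests: {{{refs}}}")
    return lines


if __name__ == "__main__":
    tests = {"f": [type_id_guid("typeid2")]}
    print("\n".join(emit_summary([], tests)))  # per-module: full GUID
    print("\n".join(emit_summary(["typeid1", "typeid2"], tests)))  # combined
```

With an empty TypeIdMap the reference is printed as a full GUID; once the
map is populated the same reference is printed as a slot number.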

>
>
>> Or perhaps we are saying the same thing - I can't tell from your above
>> example if the GUID is also emitted in the "typeid:" entries.
>>
>
> No, it wouldn't be.
>
>> I'm not sure there is a need for the:
>> ^typeids = {^1, ^2, ^3}
>> We can just build the typeids list on the fly as " = typeid: " entries
>> are read in.
>>
>
> That's true. Given that nothing actually needs to refer to them, we can
> just represent the typeids as something like
> typeid: {identifier: typeid1, ...} ; guid = 123
> typeid: {identifier: typeid2, ...} ; guid = 456
> typeid: {identifier: typeid3, ...} ; guid = 789
> without an associated number.
>
> Peter
>
>
>
>>
>>
>> Teresa
>>
>>
>>
>>> Peter
>>>
>>>>
>>>> --
Teresa Johnson |  Software Engineer |  tejohnson at google.com |  408-460-2413