[llvm-dev] Question about METADATA_BLOCKs in bitcode

Tue Oct 20 18:18:04 PDT 2015

On Tue, Oct 20, 2015 at 4:51 PM, Duncan P. N. Exon Smith
<dexonsmith at apple.com> wrote:
>
>> On 2015-Oct-20, at 16:02, Teresa Johnson <tejohnson at google.com> wrote:
>>
>> I have a question about module-level METADATA_BLOCKs in the bitcode.
>> There are currently two blocks with the METADATA_BLOCK id at module
>> scope. The first has the module-level metadata values (consisting of
>> some combination of METADATA_* record codes except for METADATA_KIND).
>> The second consists only of METADATA_KIND records. The latter is used
>> only in the METADATA_ATTACHMENT block within function blocks (for
>> metadata attached to instructions).
>>
>> For ThinLTO we want to delay the parsing of module level metadata
>> until all functions have been imported from that module (there is some
>> bookkeeping used to suture it up when we read it during a post-pass).
>> However, I do need the METADATA_KIND records since that is used when
>> parsing the function body being imported. In my ThinLTO prototype I
>> was simply skipping the first METADATA_BLOCK and always parsing the
>> second, since I knew from inspecting the BitcodeWriter that the second
>> was the one with the METADATA_KIND records.
>>
>> When working on the implementation of this handling for the upstream
>> patches, I am reassessing how to do this perhaps more robustly. A few
>> options:
>>
>> 1) Should the two module-level blocks use different block ids since
>> they hold different types of records? I.e. a METADATA_BLOCK_ID
>> (holding the actual metadata) and a new METADATA_KIND_BLOCK_ID (for
>> the metadata kind records). That way it is more obvious what the two
>> separate blocks are for and it is also much more trivial to identify
>> and skip just the former.
>
> This SGTM.  The other options sound fragile (not generally correct).

Ok good, I think this is cleaner too.

>
> Note that the reader will still have to understand the old schema.

I will keep support for reading the old format (ThinLTO won't support
it, but it needs recent bitcode anyway).

Thanks,
Teresa

>
>> 2) Should I simply continue to assume that we will have two
>> METADATA_BLOCKs at module level and that the kinds will always be in
>> the latter?
>>
>> 3) Another option is to peek inside the METADATA_BLOCK to check the
>> first record's type, and if it isn't METADATA_KIND, skip the rest of
>> the block. Looks like EnterSubBlock already optionally returns the
>> size of the block in words, so I could use that to compute the bitcode
>> offset to jump to before reading the first record, in case it is the
>> wrong METADATA_BLOCK and I want to skip it.
>>
>> Thoughts?
>>
>> Thanks,
>> Teresa
>>
>> --
>> Teresa Johnson | Software Engineer | tejohnson at google.com | 408-460-2413
>

-- 
Teresa Johnson | Software Engineer | tejohnson at google.com | 408-460-2413