[llvm-dev] RFC: Adding a string table to the bitcode format

Teresa Johnson via llvm-dev llvm-dev at lists.llvm.org
Tue Apr 4 07:46:21 PDT 2017


On Tue, Apr 4, 2017 at 7:41 AM, Mehdi Amini <mehdi.amini at apple.com> wrote:

>
> On Apr 4, 2017, at 7:37 AM, Teresa Johnson <tejohnson at google.com> wrote:
>
>
>
> On Mon, Apr 3, 2017 at 8:13 PM, Mehdi Amini <mehdi.amini at apple.com> wrote:
>
>>
>> On Apr 3, 2017, at 7:08 PM, Peter Collingbourne <peter at pcc.me.uk> wrote:
>>
>> Hi,
>>
>> As part of PR27551 I want to add a string table to the bitcode format to
>> allow global value and comdat names to be shared with the proposed symbol
>> table (and, as side effects, allow comdat names to be shared with value
>> names, make bitcode files more compressible and make bitcode easier to
>> parse). The format of the string table would be a top-level block
>> containing a blob containing null-terminated strings [0] similar to the
>> string table format used in most object files.
>>
>>
>>
>> I’m in favor of this, but note that strings can currently be encoded with
>> fewer than 8 bits per char in some cases (so there might be some size
>> increase because of this).
>> That said, we already paid this cost with the metadata table in the recent
>> past, for example.
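To put a rough number on the size concern above, here is a minimal sketch that compares the per-name payload when a name is stored inline in a record against the same name stored as a null-terminated entry in a byte blob. It assumes the writer picks Char6 (the bitstream encoding for [a-zA-Z0-9._]), then 7-bit, then 8-bit per character for inline names, and it ignores abbreviation, length, and record overhead. The function name is hypothetical, not an LLVM API.

```python
import string

# Characters representable in the bitstream Char6 encoding: [a-zA-Z0-9._]
CHAR6_ALPHABET = set(string.ascii_letters + string.digits + "._")


def name_payload_bits(name: str) -> dict:
    """Payload bits for one name, inline in a record vs. in a string table.

    Assumption: inline names use 6, 7, or 8 bits per character depending on
    their contents; record and abbreviation overhead is not counted.
    """
    raw = name.encode("utf-8")
    if all(c in CHAR6_ALPHABET for c in name):
        in_record = 6 * len(name)
    elif all(ord(c) < 128 for c in name):
        in_record = 7 * len(name)
    else:
        in_record = 8 * len(raw)
    # String table entry: 8 bits per byte plus one null terminator byte.
    in_strtab = 8 * (len(raw) + 1)
    return {"in_record": in_record, "in_strtab": in_strtab}


print(name_payload_bits("foo"))
print(name_payload_bits("_ZN3fooC1Ev"))
```

This only measures the raw cost of a single copy of each name; any saving from sharing one copy between comdat names, value names, and the symbol table is not modeled.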
>>
>> The format of MODULE_CODE_{FUNCTION,GLOBALVAR,ALIAS,IFUNC,COMDAT}
>> records would change so that their first operand would specify their names
>> with a byte offset into the string table. (To allow for backwards
>> compatibility, I would increment the bitcode version.)
>>
>>
>> I assume you mean the EPOCH?
>>
>> Here is what it would look like as bcanalyzer output:
>>
>> <MODULE_BLOCK>
>>   <VERSION op0=2>
>>   <COMDAT op0=0 ...> ; name = foo
>>   <FUNCTION op0=0 ...> ; name = foo
>>   <GLOBALVAR op0=4 ...> ; name = bar
>>   <ALIAS op0=8 ...> ; name = baz
>>  ; function bodies, etc.
>> </MODULE_BLOCK>
>> <STRTAB_BLOCK>
>>   <STRTAB_BLOB blob="foo\0bar\0baz\0">
>> </STRTAB_BLOCK>
>>
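A minimal sketch of how a reader might resolve the op0 offsets in the dump above against the blob, assuming op0 is a byte offset to the start of a null-terminated name as described in the proposal. The helper name and the record tuples are hypothetical and only mirror the example dump.

```python
def strtab_name(blob: bytes, offset: int) -> str:
    """Return the null-terminated string that starts at byte `offset`."""
    end = blob.index(b"\0", offset)  # raises ValueError if no terminator found
    return blob[offset:end].decode("utf-8")


# Blob and offsets copied from the example bcanalyzer output.
blob = b"foo\0bar\0baz\0"
records = [("COMDAT", 0), ("FUNCTION", 0), ("GLOBALVAR", 4), ("ALIAS", 8)]

names = [(kind, strtab_name(blob, off)) for kind, off in records]
for kind, name in names:
    print(kind, name)
```

Note that the COMDAT and FUNCTION records both use offset 0, which is how the comdat name and the value name end up sharing one copy of "foo".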
>>
>> Why is the string table after the module instead of before?
>>
>>
>> Each STRTAB_BLOCK would apply to all preceding MODULE_BLOCKs. This means
>> that bitcode files can continue to be concatenated with "llvm-cat -b".
>>
> Do you mean "apply to all preceding MODULE_BLOCKs that aren't followed by
> an intervening STRTAB_BLOCK"? I.e. when bitcode files are concatenated you
> presumably don't want to apply a STRTAB_BLOCK to a MODULE_BLOCK from a
> different input bitcode file that has its own STRTAB_BLOCK.
>
>> (Normally bitcode files would contain a single string table, which in
>> multi-module bitcode files would be shared between modules.)
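A minimal sketch of the scoping rule under the interpretation asked about above: each STRTAB_BLOCK covers the MODULE_BLOCKs that precede it and are not already covered by an earlier STRTAB_BLOCK. The function name and the list-of-block-kinds input are hypothetical, and mapping trailing modules with no string table to None is an assumption, not something the proposal specifies.

```python
def assign_string_tables(blocks):
    """Map each MODULE_BLOCK index to the index of the STRTAB_BLOCK covering it.

    `blocks` is the sequence of top-level block kinds in file order.
    """
    owner = {}
    pending = []  # modules seen since the last STRTAB_BLOCK
    for i, kind in enumerate(blocks):
        if kind == "MODULE_BLOCK":
            pending.append(i)
        elif kind == "STRTAB_BLOCK":
            for m in pending:
                owner[m] = i
            pending = []
    # Assumption: modules with no following string table map to None.
    for m in pending:
        owner[m] = None
    return owner


# One single-module file concatenated with a two-module file.
layout = ["MODULE_BLOCK", "STRTAB_BLOCK",
          "MODULE_BLOCK", "MODULE_BLOCK", "STRTAB_BLOCK"]
print(assign_string_tables(layout))
```

With this rule, concatenating files keeps each input's modules tied to that input's own string table, while a multi-module file with a single trailing table shares it across all of its modules.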
>>
>> This *almost* allows us to remove the global (top-level) VST entirely, if
>> not for the function offset in the FNENTRY record. However, this offset is
>> not actually required because we can scan the module's FUNCTION_BLOCK_IDs
>> as we were doing before http://reviews.llvm.org/D12536 (this may have a
>> performance impact, so I'll measure it first).
>>
>> Assuming that performance looks good, does this seem reasonable to folks?
>>
>>
>>
>> I'd rather seek to have a symbol table that entirely replaces the VST, kee.
>> If there is a perf impact with the FNENTRY offset, why can’t it be
>> replicated in the symbol table?
>>
>
> Won't the new symbol table be added before the top-level VST can be
> removed, i.e. you need the linkage types etc right? In that case, can the
> offset just be added to the new symbol table? That would be more analogous
> to object file symbol tables which also have an offset anyway.
>
>
> I’m not sure I read you correctly, isn’t that what I suggested?
>

It is - I'm just wondering why we would consider removing the offset, since
other things have to be moved from the VST to a new symbol table anyway.
I.e., I'm confused by pcc's comment that this "*almost* allows us to remove
the global (top-level) VST entirely, if not for the function offset in the
FNENTRY record" - there are currently other things in the VST that we need
to move to a new symbol table before it can be removed, and I'm not sure why
the function offset is any different.

Teresa


>> Mehdi
>
>


-- 
Teresa Johnson |  Software Engineer |  tejohnson at google.com |  408-460-2413