[llvm-dev] RFC: Adding a string table to the bitcode format
Peter Collingbourne via llvm-dev
llvm-dev at lists.llvm.org
Tue Apr 4 12:21:36 PDT 2017
On Tue, Apr 4, 2017 at 7:37 AM, Teresa Johnson <tejohnson at google.com> wrote:
>
>
> On Mon, Apr 3, 2017 at 8:13 PM, Mehdi Amini <mehdi.amini at apple.com> wrote:
>
>>
>> On Apr 3, 2017, at 7:08 PM, Peter Collingbourne <peter at pcc.me.uk> wrote:
>>
>> Hi,
>>
>> As part of PR27551 I want to add a string table to the bitcode format to
>> allow global value and comdat names to be shared with the proposed symbol
>> table (and, as side effects, allow comdat names to be shared with value
>> names, make bitcode files more compressible and make bitcode easier to
>> parse). The format of the string table would be a top-level block
>> containing a blob containing null-terminated strings [0] similar to the
>> string table format used in most object files.
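Here is a minimal Python sketch of the proposed layout. It assumes names are stored as UTF-8 and that identical names are interned so they share one offset. The class and function names are hypothetical, not LLVM's API. The sketch builds the blob of null-terminated strings and reads a name back from a byte offset:

```python
class StringTableBuilder:
    """Interns names into a blob of null-terminated strings (illustrative only)."""

    def __init__(self) -> None:
        self._blob = bytearray()
        self._offsets: dict[str, int] = {}

    def add(self, name: str) -> int:
        """Return the byte offset of `name`, appending it on first use."""
        if name in self._offsets:
            return self._offsets[name]  # reuse the existing copy
        offset = len(self._blob)
        self._blob += name.encode("utf-8") + b"\0"
        self._offsets[name] = offset
        return offset

    def blob(self) -> bytes:
        return bytes(self._blob)


def lookup(blob: bytes, offset: int) -> str:
    """Decode the null-terminated string that starts at `offset`."""
    end = blob.index(b"\0", offset)
    return blob[offset:end].decode("utf-8")


builder = StringTableBuilder()
# comdat, function, global variable, alias
offsets = [builder.add(n) for n in ["foo", "foo", "bar", "baz"]]
print(offsets)                    # [0, 0, 4, 8]
print(builder.blob())             # b'foo\x00bar\x00baz\x00'
print(lookup(builder.blob(), 4))  # bar
```

Interning is what lets a comdat and a function with the same name point at the same bytes. An object-file style table could also share common suffixes, but this sketch does not attempt that.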
>>
>>
>>
>> I’m in favor of this, but note that currently strings can be encoded with
>> fewer than 8 bits per character in some cases (so there might be some size
>> increase because of this).
>> That said we already paid this with the metadata table in the recent past
>> for example.
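The following rough Python estimate makes the size trade-off concrete. It assumes the existing VST picks the narrowest of the bitstream's char6 (6-bit), 7-bit and 8-bit fixed-width encodings for each name. The string table, by contrast, always spends 8 bits per byte plus a terminator. The estimate ignores array lengths, abbreviation IDs and the savings from storing a shared name only once:

```python
import string

# Bytes representable in the bitstream's 6-bit char6 encoding: [a-zA-Z0-9._]
CHAR6 = frozenset((string.ascii_letters + string.digits + "._").encode("ascii"))


def bits_per_char(name: bytes) -> int:
    """Narrowest fixed width that can hold every byte of `name`."""
    if all(b in CHAR6 for b in name):
        return 6
    if all(b < 128 for b in name):
        return 7
    return 8


def name_cost(name: bytes) -> tuple[int, int]:
    """(bits as an inline VST name, bits in a null-terminated string table)."""
    inline_bits = bits_per_char(name) * len(name)
    strtab_bits = 8 * (len(name) + 1)  # +1 for the terminating null
    return inline_bits, strtab_bits


for n in [b"main", b"_Z3foov", b"foo$bar"]:
    print(n, name_cost(n))
```

Per name, the blob is larger. However, a byte-aligned table of plain strings is likely to compress better than bit-packed 6-bit characters, which is the compressibility side effect mentioned in the RFC.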
>>
>> The format of MODULE_CODE_{FUNCTION,GLOBALVAR,ALIAS,IFUNC,COMDAT}
>> records would change so that their first operand would specify their names
>> with a byte offset into the string table. (To allow for backwards
>> compatibility, I would increment the bitcode version.)
>>
>>
>> I assume you mean the EPOCH?
>>
>> Here is what it would look like as bcanalyzer output:
>>
>> <MODULE_BLOCK>
>> <VERSION op0=2>
>> <COMDAT op0=0 ...> ; name = foo
>> <FUNCTION op0=0 ...> ; name = foo
>> <GLOBALVAR op0=4 ...> ; name = bar
>> <ALIAS op0=8 ...> ; name = baz
>> ; function bodies, etc.
>> </MODULE_BLOCK>
>> <STRTAB_BLOCK>
>> <STRTAB_BLOB blob="foo\0bar\0baz\0">
>> </STRTAB_BLOCK>
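The sketch below shows how a reader might resolve the op0 name offsets in the bcanalyzer example against the STRTAB_BLOB. Records are modeled as hypothetical (code, offset) pairs rather than real bitstream records:

```python
from typing import NamedTuple


class Record(NamedTuple):
    code: str         # record kind, e.g. "FUNCTION"
    name_offset: int  # op0: byte offset of the name in the string table


def resolve_names(records: list[Record], strtab: bytes) -> list[tuple[str, str]]:
    """Replace each record's name offset with the string it points to."""
    resolved = []
    for rec in records:
        end = strtab.index(b"\0", rec.name_offset)  # names are null-terminated
        resolved.append((rec.code, strtab[rec.name_offset:end].decode("utf-8")))
    return resolved


strtab = b"foo\0bar\0baz\0"
records = [Record("COMDAT", 0), Record("FUNCTION", 0),
           Record("GLOBALVAR", 4), Record("ALIAS", 8)]
print(resolve_names(records, strtab))
# [('COMDAT', 'foo'), ('FUNCTION', 'foo'), ('GLOBALVAR', 'bar'), ('ALIAS', 'baz')]
```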
>>
>>
>> Why is the string table after the module instead of before?
>>
>>
>> Each STRTAB_BLOCK would apply to all preceding MODULE_BLOCKs. This means
>> that bitcode files can continue to be concatenated with "llvm-cat -b".
>>
> Do you mean "apply to all preceding MODULE_BLOCKs that aren't followed by
> an intervening STRTAB_BLOCK"? I.e. when bitcode files are concatenated you
> presumably don't want to apply a STRTAB_BLOCK to a MODULE_BLOCK from a
> different input bitcode file that has its own STRTAB_BLOCK.
>
Yes, sorry, that is exactly what I meant.
>> (Normally bitcode files would contain a single string table, which in
>> multi-module bitcode files would be shared between modules.)
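The pairing rule agreed on above can be sketched as a single pass over the top-level blocks. Under this rule, a STRTAB_BLOCK covers the preceding MODULE_BLOCKs that an earlier STRTAB_BLOCK has not already claimed. Block kinds are modeled here as plain strings for illustration:

```python
def assign_strtabs(blocks: list[str]) -> dict[int, int]:
    """Map each MODULE_BLOCK index to the index of the STRTAB_BLOCK it uses."""
    mapping: dict[int, int] = {}
    pending: list[int] = []  # modules not yet claimed by a string table
    for i, kind in enumerate(blocks):
        if kind == "MODULE_BLOCK":
            pending.append(i)
        elif kind == "STRTAB_BLOCK":
            for m in pending:
                mapping[m] = i
            pending.clear()  # a later STRTAB_BLOCK must not reclaim these
    # Modules with no following STRTAB_BLOCK are simply left unmapped.
    return mapping


# Two files concatenated: a two-module file followed by a one-module file.
blocks = ["MODULE_BLOCK", "MODULE_BLOCK", "STRTAB_BLOCK",
          "MODULE_BLOCK", "STRTAB_BLOCK"]
print(assign_strtabs(blocks))  # {0: 2, 1: 2, 3: 4}
```

This is why concatenation with "llvm-cat -b" keeps working: each input's string table still follows its own modules, and a later table never reaches back past an earlier one.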
>>
>> This *almost* allows us to remove the global (top-level) VST entirely, if
>> not for the function offset in the FNENTRY record. However, this offset is
>> not actually required because we can scan the module's FUNCTION_BLOCK_IDs
>> as we were doing before http://reviews.llvm.org/D12536 (this may have a
>> performance impact, so I'll measure it first).
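Here is a rough sketch of the scan-based alternative. For illustration, it assumes that FUNCTION_BLOCKs appear in the same order as the FUNCTION records of functions that have bodies (declarations have none). Without a stored offset, the reader presumably has to walk every sub-block of the module to find the bodies, which is where a performance cost could come from. The offsets in the usage example are made up:

```python
def match_function_bodies(
    function_records: list[tuple[str, bool]],
    sub_blocks: list[tuple[int, str]],
) -> dict[str, int]:
    """Pair defined functions with FUNCTION_BLOCK offsets by scanning.

    function_records: (name, is_declaration) in module order.
    sub_blocks: (bit_offset, block_kind) for each sub-block of the module.
    """
    defined = [name for name, is_decl in function_records if not is_decl]
    # Without a stored offset, every sub-block has to be visited.
    bodies = [off for off, kind in sub_blocks if kind == "FUNCTION_BLOCK"]
    if len(defined) != len(bodies):
        raise ValueError("number of bodies does not match defined functions")
    return dict(zip(defined, bodies))


records = [("foo", False), ("printf", True), ("bar", False)]
sub_blocks = [(100, "CONSTANTS_BLOCK"), (200, "FUNCTION_BLOCK"), (350, "FUNCTION_BLOCK")]
print(match_function_bodies(records, sub_blocks))  # {'foo': 200, 'bar': 350}
```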
>>
>> Assuming that performance looks good, does this seem reasonable to folks?
>>
>>
>>
>> I'd rather have a symbol table that entirely replaces the VST.
>> If there is a perf impact with the FNENTRY offset, why can’t it be
>> replicated in the symbol table?
>>
>
> Won't the new symbol table be added before the top-level VST can be
> removed, i.e. you need the linkage types, etc., right? In that case, can the
> offset just be added to the new symbol table? That would be more analogous
> to object file symbol tables which also have an offset anyway.
>
The VST only stores names (and function offsets). The other attributes are
stored on the MODULE_CODE_{FUNCTION,GLOBALVAR,ALIAS,IFUNC} records. So once
we move the names elsewhere, the VST isn't really storing much data at all.
As I mentioned to Mehdi, we could indeed store the function offset in the
symbol table. That would be done in a separate step to this change, which
is just about string tables.
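If the function offset were later moved into the symbol table, an entry might pair a string table offset with the body offset. The layout below is purely hypothetical and only illustrates the lookup. A real table would presumably be indexed rather than scanned linearly:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SymbolEntry:
    name_offset: int            # byte offset of the name in the string table
    func_offset: Optional[int]  # hypothetical: bit offset of the body, if any


def find_body(symtab: list[SymbolEntry], strtab: bytes, name: str) -> Optional[int]:
    """Return the body offset for `name`, or None if it has no body."""
    for entry in symtab:
        end = strtab.index(b"\0", entry.name_offset)
        if strtab[entry.name_offset:end].decode("utf-8") == name:
            return entry.func_offset
    raise KeyError(name)


strtab = b"foo\0printf\0"
symtab = [SymbolEntry(0, 200), SymbolEntry(4, None)]
print(find_body(symtab, strtab, "foo"))     # 200
print(find_body(symtab, strtab, "printf"))  # None
```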
Thanks,
--
Peter