[llvm-dev] RFC: Adding a string table to the bitcode format

Peter Collingbourne via llvm-dev llvm-dev at lists.llvm.org
Tue Apr 4 11:03:08 PDT 2017


Maybe, and I think it can be done while still allowing symbol names to be
null-terminated. (Basically, we would allow strings in the string table to
be optionally null-terminated, and allow null-terminated strings to stand
in for non-null-terminated strings if they are present in the string table.)

I think this is something that can be done orthogonally, though.
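
To make the idea concrete, here is a minimal sketch (not the actual bitcode
reader) of a table whose entries can be read either as null-terminated names
or as sized byte ranges. The StrRef type and the getSized/getCStr helpers are
hypothetical names invented for illustration, and the sketch assumes C++17
for std::string_view.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical reference into a shared string table: byte offset plus length.
// The names and layout are illustrative, not part of the bitcode format.
struct StrRef {
  std::size_t Offset;
  std::size_t Size;
};

// Resolve a sized reference. The bytes may contain embedded nulls, which is
// what METADATA_STRINGS would need.
inline std::string_view getSized(std::string_view Table, StrRef Ref) {
  assert(Ref.Offset <= Table.size() && Ref.Size <= Table.size() - Ref.Offset);
  return Table.substr(Ref.Offset, Ref.Size);
}

// Resolve a null-terminated name starting at Offset (symbol-name style).
inline std::string_view getCStr(std::string_view Table, std::size_t Offset) {
  assert(Offset < Table.size());
  std::size_t End = Table.find('\0', Offset);
  assert(End != std::string_view::npos && "name is not null-terminated");
  return Table.substr(Offset, End - Offset);
}
```

With a table holding "foo\0bar\0", the sized reference {0, 3} and the
null-terminated lookup at offset 0 both yield "foo", so a metadata string
could reuse the bytes already stored for a symbol name.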

Peter

On Mon, Apr 3, 2017 at 7:50 PM, Duncan P. N. Exon Smith <
dexonsmith at apple.com> wrote:

> Can this be merged somehow with the METADATA_STRINGS record?  (Perhaps, by
> having METADATA_STRINGS refer to this?)  I suspect there is significant
> duplication between the two.  There might be a nice space optimization here.
>
> Note that the strings in METADATA_STRINGS can contain null characters.
> You'd need a different storage format.
>
> On 2017-Apr-03, at 19:08, Peter Collingbourne <peter at pcc.me.uk> wrote:
>
> Hi,
>
> As part of PR27551 I want to add a string table to the bitcode format to
> allow global value and comdat names to be shared with the proposed symbol
> table (and, as side effects, allow comdat names to be shared with value
> names, make bitcode files more compressible and make bitcode easier to
> parse). The format of the string table would be a top-level block
> containing a blob containing null-terminated strings [0] similar to the
> string table format used in most object files.
>
> The format of MODULE_CODE_{FUNCTION,GLOBALVAR,ALIAS,IFUNC,COMDAT}
> records would change so that their first operand would specify their names
> with a byte offset into the string table. (To allow for backwards
> compatibility, I would increment the bitcode version.) Here is what it
> would look like as bcanalyzer output:
>
> <MODULE_BLOCK>
>   <VERSION op0=2>
>   <COMDAT op0=0 ...> ; name = foo
>   <FUNCTION op0=0 ...> ; name = foo
>   <GLOBALVAR op0=4 ...> ; name = bar
>   <ALIAS op0=8 ...> ; name = baz
>  ; function bodies, etc.
> </MODULE_BLOCK>
> <STRTAB_BLOCK>
>   <STRTAB_BLOB blob="foo\0bar\0baz\0">
> </STRTAB_BLOCK>
>
> Each STRTAB_BLOCK would apply to all preceding MODULE_BLOCKs. This means
> that bitcode files can continue to be concatenated with "llvm-cat -b".
> (Normally bitcode files would contain a single string table, which in
> multi-module bitcode files would be shared between modules.)
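
One way to picture the binding rule above is the sketch below, which walks a
simplified list of top-level blocks and resolves each module's name offsets
against the next string table that follows it. The Block struct and
resolveNames function are hypothetical and do not mirror the real reader. The
sketch also assumes that a STRTAB_BLOCK binds only to the modules seen since
the previous STRTAB_BLOCK, which is one reading of the rule that keeps
concatenated files working.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified view of one top-level block in a bitcode file.
struct Block {
  bool IsStrtab;                        // true: STRTAB_BLOCK, false: MODULE_BLOCK
  std::vector<std::size_t> NameOffsets; // op0 of each named record (modules only)
  std::string Blob;                     // string table bytes (STRTAB only)
};

// Returns the resolved names of each module, in file order.
inline std::vector<std::vector<std::string>>
resolveNames(const std::vector<Block> &Blocks) {
  std::vector<std::vector<std::string>> Result;
  std::vector<const Block *> Pending; // modules seen since the last STRTAB_BLOCK
  for (const Block &B : Blocks) {
    if (!B.IsStrtab) {
      Pending.push_back(&B);
      continue;
    }
    // Assumption: this table applies to every module still waiting for one.
    for (const Block *M : Pending) {
      std::vector<std::string> Names;
      for (std::size_t Off : M->NameOffsets) {
        assert(Off < B.Blob.size());
        // Constructing from a char pointer reads up to the next null byte.
        Names.emplace_back(B.Blob.c_str() + Off);
      }
      Result.push_back(std::move(Names));
    }
    Pending.clear();
  }
  return Result;
}
```

For the sequence module, strtab "foo\0bar\0", module, strtab "baz\0", a
module that uses offsets 0 and 4 resolves to foo and bar against the first
table, and the second module resolves offset 0 to baz against its own table.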
>
> This *almost* allows us to remove the global (top-level) VST entirely, if
> not for the function offset in the FNENTRY record. However, this offset is
> not actually required because we can scan the module's FUNCTION_BLOCK_IDs
> as we were doing before http://reviews.llvm.org/D12536 (this may have a
> performance impact, so I'll measure it first).
>
> Assuming that performance looks good, does this seem reasonable to folks?
>
> Thanks,
> --
> Peter
>
> [0] This means that no GlobalValue or comdat name can contain a null, but
> this isn't substantially more restrictive than what we already have. The
> restriction already exists in the form of an assert for value names
> containing nulls (but not comdats) and we reject value and comdat names
> containing nulls in the textual IR parser. The COFF/ELF/MachO object
> formats do not support nulls in symbol or comdat names. The wasm format
> could in principle support nulls in symbol names, but as mentioned we
> already reject nulls early for IR symbols and wasm does not support comdats
> (yet). The missing restriction on comdat names seems to be just a bug, so
> we could assert on comdat names containing nulls as well.
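
The restriction in this footnote could be enforced on the writer side roughly
as follows. This is only a sketch: StrtabBuilder and its add method are
hypothetical names, the builder does no deduplication of repeated names, and
it reports a bad name by returning an empty optional rather than asserting.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical builder for a blob of null-terminated strings.
class StrtabBuilder {
  std::string Blob;

public:
  // Appends Name plus a terminating null and returns its byte offset.
  // Returns std::nullopt if Name itself contains a null byte, since such a
  // name could not be recovered from a null-terminated table.
  std::optional<std::size_t> add(std::string_view Name) {
    if (Name.find('\0') != std::string_view::npos)
      return std::nullopt;
    std::size_t Offset = Blob.size();
    Blob.append(Name.data(), Name.size());
    Blob.push_back('\0');
    return Offset;
  }

  const std::string &blob() const { return Blob; }
};
```

Adding foo, bar and baz in that order yields offsets 0, 4 and 8 and the blob
"foo\0bar\0baz\0", matching the bcanalyzer example earlier in the message.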
>
>
>


-- 
Peter