[llvm-dev] [RFC] Thoughts on a bitcode symbol table
Teresa Johnson via llvm-dev
llvm-dev at lists.llvm.org
Tue May 31 07:27:57 PDT 2016
On Fri, May 27, 2016 at 8:48 AM, Rafael Espíndola <llvm-dev at lists.llvm.org> wrote:
> This is about https://llvm.org/bugs/show_bug.cgi?id=27551.
> Currently there is no easy way to get symbol information out of
> bitcode files. One has to read the module and mangle the names. This
> has a few problems:
This would be great for ThinLTO as well:
> * During lto we have to create the Module earlier.
During the ThinLink step we could avoid creating the Module altogether,
only the parallel backends would need the Module.
> * There is no convenient spot to store flags/summary.
Right now we are duplicating some info like the linkage type into the
summary since it isn't available in the ValueSymbolTable (which I assume
this would subsume?)
> * Simpler tools like llvm-nm have massive dependencies because Object
> depends on MC to find asm defined symbols.
> To fix this I think we need a symbol table. The desired properties are
> * Include the *final* name of symbols (_foo, not foo).
> * Not be compressed so that we can keep StringRefs to the names.
> * Be easy to parse without a LLVMContext.
> * Include names created by inline assembly.
> * Include other information a linker or nm would want: linkage,
> visibility, comdat
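
To illustrate the "final name" property in the list above, here is a minimal sketch of how the name a linker sees can differ from the IR-level name. It is a simplified model, not LLVM's Mangler API: `ObjectFormat`, `globalPrefix` and `finalSymbolName` are hypothetical names introduced for this example, and only two cases are modeled (a Mach-O style leading underscore, and a leading `\1` byte meaning "use the name verbatim").

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Hypothetical, simplified model of name mangling; not LLVM's Mangler API.
enum class ObjectFormat { ELF, MachO };

// Returns the character the target prepends to global names ('\0' for none).
inline char globalPrefix(ObjectFormat Format) {
  return Format == ObjectFormat::MachO ? '_' : '\0';
}

// Computes the "final" symbol name for an IR-level name.
inline std::string finalSymbolName(std::string_view IRName,
                                   ObjectFormat Format) {
  assert(!IRName.empty() && "symbol names must be non-empty");
  // Assumption: a leading '\1' byte marks a name that is emitted verbatim.
  if (IRName.front() == '\1')
    return std::string(IRName.substr(1));
  std::string Result;
  if (char Prefix = globalPrefix(Format))
    Result.push_back(Prefix);
  Result.append(IRName.data(), IRName.size());
  return Result;
}
```

With this model, `finalSymbolName("foo", ObjectFormat::MachO)` yields `_foo` while the ELF variant yields `foo`. A symbol table that stores the final name would let a reader skip this step entirely.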
> The first question is: where should we store it? Some options I thought of:
> * Use the existing support for putting bitcode in a section of a
> native file and use the file's symbol table.
> * Use a custom wrapper over the .bc
> * Encode it with records/blocks in the .bc
> The first option would be a bit annoying as we are sure to want to
> represent more than the native files have. It is also a bit odd for
> cross compiling. Do we create a MachO when the bitcode is for darwin
> and an ELF when it is for Linux? It would also mean that llvm-as would
> depend on a library to create these files.
> The second option is tempting for parsing simplicity, but introduces
> duplication as the names for regular global values would be stored
> twice (once mangled, once not). The symbol table would also use a
> string table, which is a concept I think would improve the .bc format.
> So my current preference is for the last one. Encode the symbol table
> in the .bc. This means that lib/Object will depend on BitReader, but
> not more than that.
> The next issue is what to do with .ll files. One option is to change
> nothing and have llvm-as parse module level inline asm to create symbol
> entries. That would work, but sounds odd. I think we need directives
> in the .ll so that symbols created or used by inline asm can be
> Yet another issue is how to handle a string table in .bc. The problem
> is not with the format, it is with StreamingMemoryObject. We have to
> keep the string table alive while the rest of the file is read, and
> the StreamingMemoryObject can reallocate the buffer.
> I can think of two solutions:
> * Drop it. The one known user is PNaCl and it is moving to subzero, so
> it is not clear if this is still needed.
> * Change the representation so that each read is required to be
> contiguous and not be freed. It would basically store a vector of
> std::pair<offset, char*> and we would make sure the string table is
> read as a blob in a single read.
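
The second solution above could look roughly like the following sketch. It is not StreamingMemoryObject's actual interface: `ChunkedStream`, `appendRead` and `view` are hypothetical names, and each chunk also records its size in addition to the offset/pointer pair described in the quoted text. The point it shows is that each read gets its own allocation that is never moved or freed, so a pointer into an earlier read (for example the string table blob) stays valid while later reads arrive.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>
#include <string_view>
#include <utility>
#include <vector>

// Hypothetical sketch; not the StreamingMemoryObject interface.
class ChunkedStream {
  struct Chunk {
    std::size_t Offset;           // Stream offset where this read starts.
    std::size_t Size;             // Number of bytes in this read.
    std::unique_ptr<char[]> Data; // Own allocation; never moved or freed.
  };
  std::vector<Chunk> Chunks;
  std::size_t TotalSize = 0;

public:
  // Copies one read into its own allocation and returns its stream offset.
  std::size_t appendRead(const char *Data, std::size_t Size) {
    assert((Data != nullptr || Size == 0) && "null data with nonzero size");
    auto Storage = std::make_unique<char[]>(Size);
    if (Size != 0)
      std::memcpy(Storage.get(), Data, Size);
    std::size_t Offset = TotalSize;
    Chunks.push_back({Offset, Size, std::move(Storage)});
    TotalSize += Size;
    return Offset;
  }

  // Returns a view of [Offset, Offset + Size) if the range lies inside a
  // single read; otherwise returns an empty view whose data() is null.
  std::string_view view(std::size_t Offset, std::size_t Size) const {
    for (const Chunk &C : Chunks)
      if (Offset >= C.Offset && Size <= C.Size &&
          Offset - C.Offset <= C.Size - Size)
        return std::string_view(C.Data.get() + (Offset - C.Offset), Size);
    return {};
  }
};
```

The vector of chunk descriptors may itself reallocate as it grows, but the character buffers it points to do not move, which is what keeps views into the string table usable. A range that straddles two reads is rejected, which is why the string table would have to arrive in a single read.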
> With all that sorted, I think the representation can be fairly simple:
> * a top level record stores the string table as a single blob. This
> can be used for any string in the .bc, not just the symbol table.
> * a sub block contains the symbol table with one record per symbol. It
> would include an offset in the string table, the name size, the
> linkage, etc. Being a record makes it easy to extend.
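
As a rough illustration of the representation described above, the sketch below models the string table as one uncompressed blob and each symbol as a small record holding an offset and size into it. The field set, the `Linkage` values and the names `SymbolRecord` and `symbolName` are assumptions made for this example; the actual record layout was not specified in the proposal.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string_view>

// Hypothetical linkage values; the real set would be larger.
enum class Linkage : std::uint8_t { External, Internal, Weak };

// Hypothetical per-symbol record: refers to its name by offset and size.
struct SymbolRecord {
  std::uint32_t NameOffset; // Byte offset of the name in the string table.
  std::uint32_t NameSize;   // Length of the name in bytes.
  Linkage Link;
};

// Returns the symbol's name as a view into the string table, without
// copying, or std::nullopt if the record points outside the blob.
inline std::optional<std::string_view>
symbolName(std::string_view StringTable, const SymbolRecord &Rec) {
  if (Rec.NameOffset > StringTable.size() ||
      Rec.NameSize > StringTable.size() - Rec.NameOffset)
    return std::nullopt;
  return StringTable.substr(Rec.NameOffset, Rec.NameSize);
}

// Usage sketch: two names packed back to back in one blob.
inline bool exampleUsage() {
  std::string_view StringTable = "_foo_bar";
  SymbolRecord Foo{0, 4, Linkage::External};
  std::optional<std::string_view> Name = symbolName(StringTable, Foo);
  assert(Name && *Name == "_foo");
  return Name.has_value();
}
```

Because the blob is not compressed, the returned view points directly into the file's bytes, which matches the earlier requirement of being able to keep StringRefs to the names. No LLVMContext or Module is involved in this lookup.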
Teresa Johnson | Software Engineer | tejohnson at google.com | 408-460-2413