<div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">On Fri, May 27, 2016 at 8:48 AM, Rafael Espíndola <span dir="ltr"><<a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">This is about <a href="https://llvm.org/bugs/show_bug.cgi?id=27551" rel="noreferrer" target="_blank">https://llvm.org/bugs/show_bug.cgi?id=27551</a>.<br>

<br>

Currently there is no easy way to get symbol information out of<br>

bitcode files. One has to read the module and mangle the names. This<br>

has a few problems:<br></blockquote><div><br></div><div>This would be great for ThinLTO as well:</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<br>

* During LTO we have to create the Module earlier.<br></blockquote><div><br></div><div>During the ThinLink step we could avoid creating the Module altogether; only the parallel backends would need the Module.</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

* There is no convenient spot to store flags/summary.<br></blockquote><div><br></div><div>Right now we are duplicating some info like the linkage type into the summary since it isn't available in the ValueSymbolTable (which I assume this would subsume?)</div><div><br></div><div>Thanks,</div><div>Teresa</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

* Simpler tools like llvm-nm have massive dependencies because Object<br>

depends on MC to find asm defined symbols.<br>

<br>

To fix this I think we need a symbol table. The desired properties are:<br>

<br>

* Include the *final* name of symbols (_foo, not foo).<br>

* Not be compressed so that we can keep StringRefs to the names.<br>

* Be easy to parse without a LLVMContext.<br>

* Include names created by inline assembly.<br>

* Include other information a linker or nm would want: linkage,<br>

visibility, comdat.<br>

<br>

The first question is: where should we store it? Some options I thought about:<br>

<br>

* Use the existing support for putting bitcode in a section of a<br>

native file and use the file's symbol table.<br>

* Use a custom wrapper over the .bc<br>

* Encode it with records/blocks in the .bc<br>

<br>

The first option would be a bit annoying as we are sure to want to<br>

represent more than the native files have. It is also a bit odd for<br>

cross compiling. Do we create a MachO when the bitcode is for darwin<br>

and an ELF when it is for Linux? It would also mean that llvm-as would<br>

depend on a library to create these files.<br>

<br>

The second option is tempting for parsing simplicity, but introduces<br>

duplication as the names for regular global values would be stored<br>

twice (once mangled, once not). The symbol table would also use a<br>

string table, which is a concept I think would improve the .bc format.<br>

<br>

So my current preference is for the last one. Encode the symbol table<br>

in the .bc. This means that lib/Object will depend on BitReader, but<br>

not more than that.<br>

<br>

The next issue is what to do with .ll files. One option is to change<br>

nothing and have llvm-as parse module level inline asm to create symbol<br>

entries. That would work, but sounds odd. I think we need directives<br>

in the .ll so that symbols created or used by inline asm can be<br>

declared.</blockquote><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<br>

Yet another issue is how to handle a string table in .bc. The problem<br>

is not with the format, it is with StreamingMemoryObject. We have to<br>

keep the string table alive while the rest of the file is read, and<br>

the StreamingMemoryObject can reallocate the buffer.<br>

<br>

I can think of two solutions:<br>

<br>

* Drop it. The one known user is PNaCl and it is moving to subzero, so<br>

it is not clear if this is still needed.<br>

<br>

* Change the representation so that each read is required to be<br>

contiguous and not be freed. It would basically store a vector of<br>

std::pair&lt;offset, char*&gt; and we would make sure the string table is<br>

read as a blob in a single read.<br>
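<br>
A rough sketch of that second representation, not the actual StreamingMemoryObject code: the class name ChunkedBuffer and its methods are hypothetical, and std::unique_ptr is used in place of a raw char* only to keep ownership simple. Each read gets its own allocation, so pointers handed out earlier stay valid as more data arrives.<br>

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for a streaming buffer that never reallocates
// data it has already handed out.
class ChunkedBuffer {
  // (starting offset in the stream, bytes of that read). Reads are appended
  // in stream order, so the vector is sorted by offset.
  std::vector<std::pair<std::size_t, std::unique_ptr<char[]>>> Chunks;
  std::size_t Size = 0; // total bytes read so far

public:
  // Copies one read into its own allocation. Growing the vector moves only
  // the owning pointers, not the bytes, so earlier chunks keep their address.
  const char *append(const char *Data, std::size_t Len) {
    std::unique_ptr<char[]> Buf(new char[Len]);
    std::memcpy(Buf.get(), Data, Len);
    const char *P = Buf.get();
    Chunks.emplace_back(Size, std::move(Buf));
    Size += Len;
    return P;
  }

  // Returns a pointer to [Offset, Offset + Len) if that range lies inside a
  // single chunk, and nullptr otherwise (for example when it straddles reads).
  const char *getPointer(std::size_t Offset, std::size_t Len) const {
    std::size_t End = Size; // end offset of the chunk being examined
    for (std::size_t I = Chunks.size(); I-- > 0;) {
      std::size_t Begin = Chunks[I].first;
      if (Offset >= Begin) {
        if (Offset > End || Len > End - Offset)
          return nullptr;
        return Chunks[I].second.get() + (Offset - Begin);
      }
      End = Begin;
    }
    return nullptr;
  }
};
```

Under this scheme a string table read with one append call is contiguous, so names inside it can be referenced by pointer for the lifetime of the buffer.<br>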

<br>

With all that sorted, I think the representation can be fairly simple:<br>

<br>

* a top level record stores the string table as a single blob. This<br>

can be used for any string in the .bc, not just the symbol table.<br>

* a sub block contains the symbol table with one record per symbol. It<br>

would include an offset in the string table, the name size, the<br>

linkage, etc. Being a record makes it easy to extend.<br>
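<br>
The layout above could look roughly like the following in memory. This is only an illustration under assumed names: SymbolRecord, NameRef, addSymbol and symbolName are hypothetical, the field widths are arbitrary, and nothing here is an actual bitcode encoding. NameRef plays the role a StringRef would play in LLVM.<br>

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical per-symbol record; the fields are illustrative only.
struct SymbolRecord {
  std::uint32_t NameOffset; // offset of the name in the string table blob
  std::uint32_t NameSize;   // length of the name in bytes
  std::uint8_t Linkage;     // placeholder for linkage, visibility, comdat, ...
};

// Non-owning reference to a name inside the blob (no copy is made).
struct NameRef {
  const char *Data;
  std::size_t Size;
  std::string str() const { return Size ? std::string(Data, Size) : std::string(); }
};

// Appends the final (already mangled) name to the blob and returns a record
// that points at it.
inline SymbolRecord addSymbol(std::string &StrTab, const std::string &Name,
                              std::uint8_t Linkage) {
  SymbolRecord R{static_cast<std::uint32_t>(StrTab.size()),
                 static_cast<std::uint32_t>(Name.size()), Linkage};
  StrTab.append(Name);
  return R;
}

// Looks up a record's name in the blob. Returns an empty reference when the
// record points outside the blob.
inline NameRef symbolName(const std::string &StrTab, const SymbolRecord &R) {
  if (R.NameOffset > StrTab.size() || R.NameSize > StrTab.size() - R.NameOffset)
    return NameRef{nullptr, 0};
  return NameRef{StrTab.data() + R.NameOffset, R.NameSize};
}
```

Because the blob is uncompressed and the records carry offset and size, a reader needs neither an LLVMContext nor a Module to list the names.<br>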

<br>

Cheers,<br>

Rafael<br>


</blockquote></div><br><br clear="all"><div><br></div>-- <br><div><span style="font-family:Times;font-size:medium"><table cellspacing="0" cellpadding="0"><tbody><tr style="color:rgb(85,85,85);font-family:sans-serif;font-size:small"><td nowrap style="border-top-style:solid;border-top-color:rgb(213,15,37);border-top-width:2px">Teresa Johnson |</td><td nowrap style="border-top-style:solid;border-top-color:rgb(51,105,232);border-top-width:2px"> Software Engineer |</td><td nowrap style="border-top-style:solid;border-top-color:rgb(0,153,57);border-top-width:2px"> <a href="mailto:tejohnson@google.com" target="_blank">tejohnson@google.com</a> |</td><td nowrap style="border-top-style:solid;border-top-color:rgb(238,178,17);border-top-width:2px"> <a href="tel:408-460-2413" value="+14084602413" target="_blank">408-460-2413</a></td></tr></tbody></table></span></div>

</div></div>