[PATCH] D32061: [wip] Bitcode: Write the irsymtab to disk.

Tue Apr 18 18:35:49 PDT 2017

mehdi_amini added inline comments.

================
Comment at: llvm/include/llvm/Object/IRSymtab.h:122
+  /// in symbol table format, symbol enumeration order and so on.
+  Str Producer;
+
----------------
pcc wrote:
> mehdi_amini wrote:
> > pcc wrote:
> > > mehdi_amini wrote:
> > > > tejohnson wrote:
> > > > > pcc wrote:
> > > > > > tejohnson wrote:
> > > > > > > Does this mean that the symbol table will be thrown away every time the llvm revision changes? Can we avoid that by having a symbol table version number?
> > > > > > That would be tricky to get right. There are many things that could change the symbol table, even without changing the format.
> > > > > > 
> > > > > > For example, a change such as D31443 would re-order the symbol table, despite not touching the bitcode reader at all.
> > > > > > 
> > > > > > As another example, suppose that we decide to remove an intrinsic from LLVM, and we upgrade existing IR by removing the intrinsic declaration (and changing calls to do something else). That by itself would change the symbol table.
> > > > > > 
> > > > > > So I think the safest option would be to accept only symbol tables from the same revision of LLVM.
> > > > > > For example, a change such as D31443 would re-order the symbol table, despite not touching the bitcode reader at all.
> > > > > 
> > > > > I guess this makes a different for regularLTO where we need to correlate the symbol resolutions built from the symbol table with the symbols in the module. Presumably it shouldn't matter for something like ThinLTO though, where in the thin link we eventually shouldn't have to look at the module ir at all. Not that this helps much since we don't know the use case. Unfortunate, but I can't think of a way to avoid it offhand, without adding some other way to correlate the module symbols to the symbol table symbols.
> > > > >So I think the safest option would be to accept only symbol tables from the same revision of LLVM.
> > > > 
> > > > But that's a design decision. The symbol table could be stored as bitcode with the same exact restriction as all the usual compatibility, and avoid any duplication with the rest of the module (i.e. not being able to regenerate the symbol table from the rest of the bitcode is *not* a bad thing to me, quite the opposite).
> > > > I'm not sure what makes a symbol table "special" here.
> > > > 
> > > Even with a bitcode design that somehow avoided all duplication between the module and the symbol table, I don't see why we wouldn't still have the same set of issues when upgrading old bitcode.
> > I don't follow.
> > 
> The ModuleSymbolTable is the "source of truth" for a module's symbol table, and the irsymtab is effectively a "cache" of the ModuleSymbolTable (an irsymtab is derived from a ModuleSymbolTable). As previously mentioned, a ModuleSymbolTable can change as a result of arbitrary changes to the compiler. The situation we want to avoid is where irsymtab::read gives us a different set of symbols than the ModuleSymbolTable, which could happen if the irsymtab was created using an old version of LLVM. We are robust against such situations by checking the revision number embedded in the bitcode file, and rebuilding the irsymtab if it does not match.
> 
> One possible way of being resistant to changes in the symbol table suggests itself: we could discard resolutions for symbols that do not appear in the ModuleSymbolTable, and make up resolutions for symbols that did not appear in the irsymtab. But that doesn't seem particularly sound, because the linker still needs to see the correct set of symbols at symbol resolution time for correctness. To go back to the example where we remove an intrinsic, if the intrinsic is upgraded by replacing it with a call to a runtime library function, this would result in a new undefined symbol in the symbol table. Although we could invent a resolution for it (non-prevailing + non-local + external would do), this new undefined symbol may require the linker to have different behaviour at symbol resolution time. For example, the new undefined symbol may require additional archive members to be loaded, which the linker may not be prepared to do once it has completed LTO. The simplest way to handle that situation is by rebuilding the irsymtab, which would ensure that the undefined symbol is available at symbol resolution time.
> The ModuleSymbolTable is the "source of truth" for a module's symbol table, and the irsymtab is effectively a "cache" of the ModuleSymbolTable (an irsymtab is derived from a ModuleSymbolTable)

Well that is exactly what I qualified right above:

> But that's a design decision. 

It does not *have to be* this way. 

https://reviews.llvm.org/D32061