[PATCH] D42279: [WebAssembly] Don't duplicate strings in SYM_INFO subsection

Fri Jan 19 10:53:25 PST 2018

ncw added a comment.

Yeah, I can see why it's scary, because it involves changing the way things are actually modelled. But I find it easier to reason about the linker once #5 has landed.

We have Symbols, written out by Clang (WasmObjectWriter) and LLD (with `--relocatable`), and in Clang we have the duplicated-import hack where some symbols are written as both an import and an export - but LLD doesn't write out relocatable objects that hack, so there's disagreement between the tools, so we can't compare the outputs to check that "linking" a single object file with `--relocatable` leaves it unchanged.

And then we have to have special code in LLD (WasmObjectFile) when it reads in the object files to recover the symbol table from the imports and exports - but that's not documented in Linking.md and it doesn't make it clear how the indexes are actually worked out unless I think hard about it. And so Symbols don't have a unique index that's easy to reason about.

It comes down to this confusion between indexes into the array of functions and indexes into the array of symbols - but the symbols in a C file don't match the function bodies in a C file, it's just a lucky fluke for simple files. Once we cleanly separate the concept of function/global bodies from the concept of symbols, the linking model (should) become simpler.

This change is switching from referencing symbols by name to referencing them by index, but we haven't got a straightforward concept yet of symbol index. It's a shame to effectively increase the number of places where we're abusing symbols, which drives up the number of lines of diff in #5 (if it lands after this).

To me, "don't duplicate symbol strings" is very low priority, it just reduces filesize a bit. But getting the concepts straight should benefit the codebase throughout.

That's assuming that #5 does actually conceptually improve things. But #5 isn't even my idea! Having each Symbol represented by one import or export is how it was supposed to be, it's simply that the relocation indexes weren't being calculated correctly - in a sense #5 is conservative change that's just trying to get back to the original, simple, model.

Repository:
  rL LLVM

https://reviews.llvm.org/D42279