[PATCH] D41954: [WebAssembly] Refactor symbol index handling (LLVM)

Nicholas Wilson via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Thu Jan 11 15:24:14 PST 2018


ncw added a comment.

Thanks for the early feedback!

I hadn't quite finished work on it, and was going to add some reviewers tomorrow.

The tests aren't all passing, some more need to be updated.

Regarding the tests - the diffs are going to seem larger & more invasive than I would have liked. Sorry :( Part of the changes are due to (partially) undoing https://reviews.llvm.org/D41472 at the same time.

Splitting it up would be a fiddle, but might be possible. Changing the Symbol index space kind-of forces most of the changes to happen all at once; some of the changes though are stuffed in a bit gratuitously I admit.



================
Comment at: include/llvm/BinaryFormat/Wasm.h:93
 struct WasmFunction {
-  uint32_t Index;
+  uint32_t WasmIndex; // index into code array + num imported fns
   std::vector<WasmLocalDecl> Locals;
----------------
sbc100 wrote:
> How about just:
> 
> ```
> uint32_t Index;   // Function index space
> ```
> 
> There is really only one index that a function can have.
> 
> If we want to be extra clear we could also add new types for `FunctionIndex` and `GlobalIndex` I suppose.
Oops, yes, need to rename it back! I renamed every var with "Index" in to force myself to examine/audit everywhere they were used.

(In my mind there are three possible index spaces for functions: the index into `std::vector<WasmFunction>` Functions, secondly the Wasm function index space, and thirdly the function Symbol index space. They're pretty clear from the context though! This one didn't need renaming.)


================
Comment at: include/llvm/Object/Wasm.h:57
+  // index space consists of the imports followed by the exports (each of
+  // which creates a Symbol).
+  uint32_t SymbolIndex;
----------------
sbc100 wrote:
> Its a single index space right?  How about just "Index into the symbol index space, which consists of imports and exports"
I've made functions/globals have separate Symbol index spaces, just like they have separate Wasm index spaces.

My rationale:
1. Matches Wasm index space(s)
2. Makes range checks easier in dozens of places - no need to check "is < max && is correct type" when the index spaces are separate
3. No chance of type confusion - forgetting to do a type check and then accidentally using a Symbol of the wrong type. I really feel that merging the index spaces would increase the risk of bugs.


================
Comment at: include/llvm/Object/Wasm.h:165
+  uint32_t getNumExportedGlobals() const { return NumExportedGlobals; }
+  uint32_t getNumExportedFunctions() const { return NumExportedFunctions; }
 
----------------
sbc100 wrote:
> Do we need these new methods?
They're short and simple! I find it disappointing to have at least two (redundant) instances of the following code:

```
int NumImportedGlobals = 0;
int NumImportedFunctions = 0;
for (auto Import : Imports) {
  switch (Import.Type) {
  case Global: ++NumImportedGlobals; break;
  case Function: ++NumImportedFunctions; break;
  }
}
```

How verbose! Definitely worth eliminating silly counting loops; the WasmObjectFile is an ideal place to store simple information like how many imports/exports there are, so that other code doesn't have to do the counting.


================
Comment at: include/llvm/Object/Wasm.h:268
+  std::vector<WasmSymbol> FunctionSymbols;
+  std::vector<WasmSymbol> GlobalSymbols;
   std::vector<StringRef> Comdats;
----------------
sbc100 wrote:
> Do these needs to be separate?  Why not just have "Symbols"?
See above. I could be convinced (or told!) to merge them, but for the moment I think it's more robust (and shorter overall) to keep separately-typed objects separate.


================
Comment at: include/llvm/ObjectYAML/WasmYAML.h:133
 struct SymbolInfo {
-  StringRef Name;
+  ExportKind Kind;
+  uint32_t Index;
----------------
This is something I slightly regret. The need for a "Kind" field here is the one downside I can think of for keeping the function/global index space of Symbols separate.

"Index" var below needs comment.


================
Comment at: lib/MC/WasmObjectWriter.cpp:824
   write8(wasm::WASM_OPCODE_I32_CONST);
-  encodeSLEB128(0, getStream());
+  encodeSLEB128(InitialTableOffset, getStream());
   write8(wasm::WASM_OPCODE_END);
----------------
sbc100 wrote:
> Why this change?
Good catch. I wanted to make the intermediate object files as similar as possible to the final output files.

Why the odd differences? When making normal output, LLD started the Table from 1, but for relocatable output, LLD started it at zero, and WasmObjectWriter also started from zero. They should all just agree on what the right convention is, and use it.

I did consider not outputting a table at all from WasmObjectWriter (and for LLD's relocatable output). It's not used... so it's just dead code to generate something redundant. On the other hand, it's "nice" to have the object file mirror the behaviour of the LLD output. I'm still not 100% convinced "nice" is enough to justify dozens of lines of useless code however. Do you have any thoughts?


================
Comment at: lib/Object/WasmObjectFile.cpp:285
+        if (Index >= NumImportedFunctions)
+          Functions[Index].Name = Name; // Override any existing name
       }
----------------
I've decided actually to edit this.

Currently, I've changed it to get rid of the weird/unused DEBUG symbols, and generate a fresh+shiny "names" section in LLD for the browser to use. But, I should actually keep the original vector of names around, just for the time being, for wasm2yaml's sake.


================
Comment at: lib/Object/WasmObjectFile.cpp:926
+         ? GlobalSymbols[Index - FunctionSymbols.size()]
+         : FunctionSymbols[Index];
 }
----------------
sbc100 wrote:
> Again, not sure we need to separate these do we?  Seems like just makes our lives harder.  
Aha, you've pounced on the one line of code where it's extra...


================
Comment at: test/tools/llvm-nm/wasm/weak-symbols.yaml:61
     SymbolInfo:
-      - Name:            weak_global_func
+      - Kind:            FUNCTION
+        Index:           1
----------------
sbc100 wrote:
> It would probably be useful to have the symbol index here too so it clear which symbol the info is for.   I've been going this for other sections too:  https://reviews.llvm.org/D41877
> 
> I guess you would need to rename the existing Index: field then.  Perhaps to WasmIndex?
> 
The index here is already the symbol index (since it's symbol metadata!)

I need to make a corresponding PR in the Linking Conventions repo, to document what I've done (assuming this is eventually merged). Most of the linking metadata uses Symbol index space, just one or two bits of metadata use the wasm index space.


Repository:
  rL LLVM

https://reviews.llvm.org/D41954





More information about the llvm-commits mailing list