[PATCH] D97657: [lld][WebAssembly] Initial support merging string data

Thu Apr 29 17:38:38 PDT 2021

dschuff added a comment.

looks pretty good overall, a lot of this is making sure I understand everything

================
Comment at: lld/wasm/InputChunks.cpp:374
+      return ms->parent->getOffset(ms->getParentOffset(offset));
+    return ms->getParentOffset(offset);
+  }
----------------
This is dead because of the assertion above.

================
Comment at: lld/wasm/InputChunks.cpp:477
+// Note that this function is called from parallelForEach. This must be
+// thread-safe (i.e. no memory allocation from the pools).
+void MergeInputSegment::splitIntoPieces() {
----------------
just to be clear, each thread has a different `this` object? When this function calls `splitStrings`, the emplaceBack will definitely allocate.

================
Comment at: lld/wasm/InputChunks.cpp:490
+
+  // If Offset is not at beginning of a section piece, it is not in the map.
+  // In that case we need to  do a binary search of the original section piece
----------------
is the map checked somewhere else?

================
Comment at: lld/wasm/InputChunks.h:42
+    DataSegment,
+    Merge,
+    MergedSegment,
----------------
Is "Merge" a segment that can be merged, and "MergedSegment" after merging? Would "MergeableSegment" better reflect what it means?

================
Comment at: lld/wasm/InputFiles.cpp:435
+
+  // On a regular link we don't merge sections if -O0 (default is -O1). This
+  // sometimes makes the linker significantly faster, although the output will
----------------
Does clang explicitly pass `-O0` to the linker to opt out, or does the user have to do that with `-Wl`?

================
Comment at: lld/wasm/OutputSegment.cpp:42
+  LLVM_DEBUG(llvm::dbgs() << "finalizeInputSegments: " << name << "\n");
+  std::vector<SyntheticMergedDataSegment *> mergeSegments;
+  std::vector<InputSegment *> newSegments;
----------------
the name `mergeSegments` is slightly confusing given (IIUC) that "merge segments" is used elsewhere to refer to segments before merging and here they are already-merged segments.

================
Comment at: lld/wasm/OutputSegment.cpp:54
+
+    auto i = llvm::find_if(mergeSegments, [=](SyntheticMergedDataSegment *seg) {
+      return seg->flags == ms->flags && seg->alignment == ms->alignment;
----------------
this capture could probably just be `ms` instead of `=`

================
Comment at: llvm/lib/MC/MCObjectFileInfo.cpp:798
+  DwarfStrSection = Ctx->getWasmSection(
+      ".debug_str", SectionKind::getMetadata(), wasm::WASM_SEG_FLAG_STRINGS);
   DwarfLocSection =
----------------
So this sets the section flag as a string section, but we just don't merge them yet?

================
Comment at: llvm/lib/MC/MCParser/WasmAsmParser.cpp:107
       default:
-        return Parser->Error(getTok().getLoc(),
-                             StringRef("Unexepcted section flag: ") + FlagStr);
+        return -1U;
       }
----------------
Why this change? I don't see any new callers of this function, and the error message is worse now.

Edit: actually, does `TokError` print the token that it read? if so, then maybe it's no worse.

================
Comment at: llvm/lib/MC/MCParser/WasmAsmParser.cpp:165
+    if (Flags == -1U)
+      return TokError("unknown flag");

----------------
This should maybe clarify that it's a section flag (rather than the generic "flag")?

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D97657/new/

https://reviews.llvm.org/D97657