[PATCH] D97657: [lld][WebAssembly] Initial support merging string data

Thu Apr 29 18:00:50 PDT 2021

sbc100 added inline comments.

================
Comment at: lld/wasm/Driver.cpp:388
   config->mapFile = args.getLastArgValue(OPT_Map);
-  config->optimize = args::getInteger(args, OPT_O, 0);
+  config->optimize = args::getInteger(args, OPT_O, 1);
   config->outputFile = args.getLastArgValue(OPT_o);
----------------
dschuff wrote:
> this means we optimize by default now?
Sorry I should have mentioned that in the description.    This doesn't change any existing behaviour since there were now users of `config->optimize` until now(!).   And yes we optimize by default, like the ELF linker.   Note that this is different to `-lto-O2` which sets the LTO level.

================
Comment at: lld/wasm/InputChunks.cpp:477
+// Note that this function is called from parallelForEach. This must be
+// thread-safe (i.e. no memory allocation from the pools).
+void MergeInputSegment::splitIntoPieces() {
----------------
dschuff wrote:
> just to be clear, each thread has a different `this` object? When this function calls `splitStrings`, the emplaceBack will definitely allocate.
Again this is just cargo culted over.. I'm not sure what this comments means by "pools" but I think it must be refering to some linker specific pools, as opposed to malloc in general, which splitStrings clearly does.

================
Comment at: lld/wasm/InputChunks.cpp:490
+
+  // If Offset is not at beginning of a section piece, it is not in the map.
+  // In that case we need to  do a binary search of the original section piece
----------------
dschuff wrote:
> is the map checked somewhere else?
This code comes directly from `lld/ELF/InputSection.cpp` ... I honestly didn't spend time trying to grok it its just cargo cults over :-/

================
Comment at: lld/wasm/OutputSegment.cpp:42
+  LLVM_DEBUG(llvm::dbgs() << "finalizeInputSegments: " << name << "\n");
+  std::vector<SyntheticMergedDataSegment *> mergeSegments;
+  std::vector<InputSegment *> newSegments;
----------------
dschuff wrote:
> the name `mergeSegments` is slightly confusing given (IIUC) that "merge segments" is used elsewhere to refer to segments before merging and here they are already-merged segments.
Ah yes, perhaps `mergedSegments` is better.

================
Comment at: lld/wasm/OutputSegment.cpp:42
+  LLVM_DEBUG(llvm::dbgs() << "finalizeInputSegments: " << name << "\n");
+  std::vector<SyntheticMergedDataSegment *> mergeSegments;
+  std::vector<InputSegment *> newSegments;
----------------
sbc100 wrote:
> dschuff wrote:
> > the name `mergeSegments` is slightly confusing given (IIUC) that "merge segments" is used elsewhere to refer to segments before merging and here they are already-merged segments.
> Ah yes, perhaps `mergedSegments` is better.
Actually this also comes from ELF which has `std::vector<MergeSyntheticSection *> mergeSections;`   but they have slightly different class naming going on.

================
Comment at: lld/wasm/OutputSegment.cpp:54
+
+    auto i = llvm::find_if(mergeSegments, [=](SyntheticMergedDataSegment *seg) {
+      return seg->flags == ms->flags && seg->alignment == ms->alignment;
----------------
dschuff wrote:
> this capture could probably just be `ms` instead of `=`
I copied this from `lld/ELF/OutputSections.cpp`.   Is there a runtime difference are you suggesting that for readability?

================
Comment at: llvm/include/llvm/MC/MCSectionWasm.h:44
+  // Bitfield of WasmSegmentFlag
+  unsigned Flags;
+
----------------
dschuff wrote:
> I'm a bit fuzzy now on segments and sections. Is the reason we're calling these "segment flags" here and elswhere, just because we are using segments in the wasm binary to represent them?
Indeed, this flag and the two above apply only to data segments.. maybe I can make that more clear, at least in comments.

================
Comment at: llvm/lib/MC/MCObjectFileInfo.cpp:798
+  DwarfStrSection = Ctx->getWasmSection(
+      ".debug_str", SectionKind::getMetadata(), wasm::WASM_SEG_FLAG_STRINGS);
   DwarfLocSection =
----------------
dschuff wrote:
> So this sets the section flag as a string section, but we just don't merge them yet?
Yes, currently this will not get used anywhere, or written the binary format.  If you prefer I can drop this part of the change and add it in the followup.

================
Comment at: llvm/lib/MC/MCParser/WasmAsmParser.cpp:107
       default:
-        return Parser->Error(getTok().getLoc(),
-                             StringRef("Unexepcted section flag: ") + FlagStr);
+        return -1U;
       }
----------------
dschuff wrote:
> Why this change? I don't see any new callers of this function, and the error message is worse now.
> 
> Edit: actually, does `TokError` print the token that it read? if so, then maybe it's no worse.
Mostly for consistency with ELFAsmParser.cpp that uses this pattern.

================
Comment at: llvm/lib/MC/MCParser/WasmAsmParser.cpp:165
+    if (Flags == -1U)
+      return TokError("unknown flag");

----------------
dschuff wrote:
> This should maybe clarify that it's a section flag (rather than the generic "flag")?
I copied these lines directly from ELFAsmParser.cpp so I'm fine with this as is.   When the user sees this error it also shows them the location on the line where it happens so it pretty obvious.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D97657/new/

https://reviews.llvm.org/D97657