[llvm] [DebugInfo] Add fast path for parsing DW_TAG_compile_unit abbrevs (PR #108757)

David Blaikie via llvm-commits llvm-commits at lists.llvm.org
Tue Sep 17 11:31:44 PDT 2024


================
@@ -34,36 +34,49 @@ bool DWARFDebugInfoEntry::extractFast(const DWARFUnit &U, uint64_t *OffsetPtr,
     return false;
   }
   assert(DebugInfoData.isValidOffset(UEndOffset - 1));
+  AbbrevDecl = nullptr;
+
   uint64_t AbbrCode = DebugInfoData.getULEB128(OffsetPtr);
   if (0 == AbbrCode) {
     // NULL debug tag entry.
-    AbbrevDecl = nullptr;
     return true;
   }
-  const auto *AbbrevSet = U.getAbbreviations();
-  if (!AbbrevSet) {
-    U.getContext().getWarningHandler()(
-        createStringError(errc::invalid_argument,
-                          "DWARF unit at offset 0x%8.8" PRIx64 " "
-                          "contains invalid abbreviation set offset 0x%" PRIx64,
-                          U.getOffset(), U.getAbbreviationsOffset()));
-    // Restore the original offset.
-    *OffsetPtr = Offset;
-    return false;
+
+  // Fast path: parsing the entire abbreviation table is wasteful if we only
+  // need the unit DIE (typically AbbrCode == 1).
+  if (1 == AbbrCode) {
----------------
dwblaikie wrote:

> > or change abbrev parsing to be lazy in general?
> 
> I have considered this, but this would likely pessimize the general case.

Ah, sorry, not /that/ lazy. For instance, we could keep the existing vector of abbrevs, plus an offset just past the end of the last abbrev we parsed (or some sentinel if we've reached the end of the abbrev list, with its 0 marker). Then if an abbrev number is requested that isn't in the vector already, we parse further abbrevs (& put them in the vector, etc.) until we find the one we're looking for.
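A rough, standalone sketch of that idea, to make the shape concrete. This is not LLVM code: `AbbrevDecl`, `LazyAbbrevSet`, `find`, and `parsedCount` are hypothetical names, the declaration type is a simplified stand-in for libDebugInfoDWARF's real one, and a `std::deque` is swapped in for the vector so that pointers handed out earlier stay valid as more abbrevs are parsed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <utility>
#include <vector>

// Simplified stand-in for an abbreviation declaration (hypothetical type,
// not LLVM's DWARFAbbreviationDeclaration).
struct AbbrevDecl {
  uint64_t Code = 0;
  uint64_t Tag = 0;
  bool HasChildren = false;
  std::vector<std::pair<uint64_t, uint64_t>> Attrs; // (attribute, form)
};

// Hypothetical lazily parsed abbreviation set over raw .debug_abbrev bytes.
class LazyAbbrevSet {
public:
  explicit LazyAbbrevSet(std::vector<uint8_t> Bytes) : Data(std::move(Bytes)) {}

  // Return the declaration for Code, parsing only as far as needed.
  const AbbrevDecl *find(uint64_t Code) {
    for (const AbbrevDecl &D : Decls)
      if (D.Code == Code)
        return &D;
    // Not parsed yet: resume from the saved offset until found or exhausted.
    while (!Done && parseNext())
      if (Decls.back().Code == Code)
        return &Decls.back();
    return nullptr;
  }

  std::size_t parsedCount() const { return Decls.size(); }

private:
  // Read one ULEB128 value; returns false if the data is truncated.
  bool readULEB(uint64_t &Out) {
    Out = 0;
    unsigned Shift = 0;
    while (Offset < Data.size()) {
      uint8_t Byte = Data[Offset++];
      if (Shift < 64)
        Out |= uint64_t(Byte & 0x7f) << Shift;
      Shift += 7;
      if (!(Byte & 0x80))
        return true;
    }
    return false;
  }

  // Mark the set as exhausted (0 terminator reached or malformed data).
  bool stop() {
    Done = true;
    return false;
  }

  // Parse one declaration starting at Offset and append it to Decls.
  bool parseNext() {
    assert(Offset <= Data.size());
    AbbrevDecl D;
    if (!readULEB(D.Code) || D.Code == 0)
      return stop();
    if (!readULEB(D.Tag) || Offset >= Data.size())
      return stop();
    D.HasChildren = Data[Offset++] != 0;
    for (;;) {
      uint64_t Attr = 0, Form = 0;
      if (!readULEB(Attr) || !readULEB(Form))
        return stop();
      if (Attr == 0 && Form == 0)
        break;
      // DW_FORM_implicit_const (0x21) carries an SLEB128 value inline; it is
      // skipped here, since only the continuation bits matter for skipping.
      uint64_t Ignored = 0;
      if (Form == 0x21 && !readULEB(Ignored))
        return stop();
      D.Attrs.emplace_back(Attr, Form);
    }
    Decls.push_back(std::move(D));
    return true;
  }

  std::vector<uint8_t> Data;
  std::size_t Offset = 0; // Just past the last declaration parsed so far.
  bool Done = false;      // The "sentinel": nothing left to parse.
  std::deque<AbbrevDecl> Decls;
};
```

With a table whose first entry is the CU abbrev, a lookup of code 1 parses exactly one declaration and leaves the rest untouched until some other code is requested. The linear scan over already-parsed entries is only for brevity; a real version would presumably keep whatever indexing the existing abbrev set uses.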

But I think the only place I can think of that'd benefit from lazy abbrev parsing only needs the CU abbrev too anyway (building an address lookup table when the DWARF doesn't include .debug_aranges: currently that involves parsing all the abbrevs, but it should only need the CU's abbrev).

& also there's some code in llvm-dwp that tries to parse just the first CU to access the dwo_id for DWARFv4. Though I'm not sure that code's using libDebugInfoDWARF at all, since it mostly doesn't need to parse DWARF.

I guess the benefit of more generalized laziness is that it would handle cases where the CU DIE abbrev isn't the first one (like with type units, but those aren't used on macOS, so aren't relevant to ld64 perf).

Why is ld64 reading the CUs anyway? 



https://github.com/llvm/llvm-project/pull/108757

