[PATCH] D137657: [DWARFLibrary] Add support to re-construct cu-index

Wed Nov 9 17:49:23 PST 2022

ayermolo added inline comments.

================
Comment at: llvm/lib/DebugInfo/DWARF/DWARFContext.cpp:819-820
+      // to do anything.
+      if (Header.getVersion() == 4 && type == IndexType::TUIndex)
+        break;
+
----------------
dblaikie wrote:
> dblaikie wrote:
> > ayermolo wrote:
> > > dblaikie wrote:
> > > > ayermolo wrote:
> > > > > dblaikie wrote:
> > > > > > I'm a bit more worried about implementing this for DWARFv4 due to needing to rebuild .debug_abbrev sections together, which is less reliable/guaranteed (there's no guarantee that the abbrev contributions are written in the same order as the .debug_info sections - though it's the case in reality I guess) than .debug_info parsing.
> > > > > > 
> > > > > > Any chance this workaround can be restricted to only DWARFv5?
> > > > > I would prefer to keep it more generic and include DWARF4. Part of the reason is that we still heavily rely on DWARF4.
> > > > > One thing we can do is that if DWOID is "garbage". As in when we populate CU Index and can't find the signature in a map we stop.
> > > > > With the current implementation we can just clear the map and parse CU/TU index as before. If we move implementation of updating contributions entires after parsing we just stop updating further. I think with this approach we can handle the common case of how things are now in reality, and on of chance the assumption fails we are no worse than we were before. Although it does assume that with wrong abbrev. getDWOID doesn't crash.... What do you think?
> > > > maybe a way to make this more robust for DWARFv4 would be to parse the index as-written first, and using the index entries to find the suitable .debug_abbrev contribution for the CU (found via the manual walk, associated via the DWO ID)
> > > > 
> > > > As for the fallback behavior in case of "garbage" DWO IDs - given how tenuous all this already is, perhaps we should stop/error if that happens, rather than producing corrupted data?
> > > Sorry, not sure I follow. When we manually walk there is implicit assumption that .debug_info.dwo section and corresponding .debug_abbrev.dwo are in the same order. Thus we can use abbrev to parse unit die to get DWO ID (since this is DWARF4 and it's one of the attributes). Then use DWO ID to figure out which contribution we need to modify.
> > > 
> > > When we parse Index CUs are in the order in which DWOIDs got hashed into the table. Since .debug_info offsets column is corrupt (at least for some), we still can't map the abbrev section to the corresponding debug section.
> > > 
> > > The sentiment from users is better to have some/most of debug information than none. Which is why we have dwp with over 4GB of debug info even thought in trunk it's an error in llvm-dwp....
> > > 
> > Ah, right, right - circular problem. Sorry.
> > 
> > I'm still a bit worried about the complexity of supporting v4 here... not sure it's the right thing to do.
> I guess we could at least validate that the abbrev offset was probably correct - once we have the DWO ID we could look up the index entry, find the abbrev section offset, and confirm it was the one we used?
We could. For the case where we use wrong abbrev, interpret some random 8 bytes as DWO ID, and they happen to be same ones as a valid signature?

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D137657/new/

https://reviews.llvm.org/D137657