[PATCH] D42937: [DWARF] Make llvm-dwp handle DWARF v5 string offsets tables and indexed strings.

Mon Feb 26 17:53:51 PST 2018

probinson added inline comments.

================
Comment at: string_offsets_mixed.s:40-46
+# With DWARF v5 we need to match up CUs with their corresponding string offsets table
+# contributions (in order to properly process the contribution headers) and hence need
+# to remap and write them unit-by-unit to the output file. The pre-v5 implementation
+# did not need to worry about this because it could just rewrite the input file's string
+# offsets table in one chunk. The test therefore ensures that llvm-dwp rewrites the
+# string offsets table contributions in the order they appear in the input file section,
+# and not in the order in which their corresponding CUs appear in the index table.
----------------
dblaikie wrote:
> wolfgangp wrote:
> > dblaikie wrote:
> > > Oh... I hadn't realized/understood/thought about this.
> > > 
> > > That's kind of awkward - mixing blobs with headers and blobs without headers in the same section (str_offsets) & then having to use the CU/TUs to disambiguate/dictate how to parse those chunks.
> > > 
> > > Can we avoid that? Could we just say v4 and v5 are incompatible? Have a flag or something that checks.
> > > 
> > > Then we could always walk the str_offsets alone, either expecting header'd sections (which I hope are self descriptive - once you know you're reading a v5 str_offsets, you don't need to consult the CU for anything to do that?) or non-header'd sections (where you just treat every word as a string offset without consideration for how those are divided into contributions)
> > > 
> > > 
> > > (Ah, I see you mentioned/highlighted that in the patch description too)
> > > That's kind of awkward - mixing blobs with headers and blobs without headers in the same section (str_offsets) & then having to use the CU/TUs to disambiguate/dictate how to parse those chunks.
> > > 
> > > Can we avoid that? Could we just say v4 and v5 are incompatible? Have a flag or something that checks.
> > > 
> > That would have been the simpler solution, but llvm-dwarfdump is already handling the same thing, so I thought it would be inconsistent if llvm-dwp didn't handle it too. We also would have to reject any dwp input files with mixed units that were created by non-llvm tools.
> > 
> > It would certainly be easy to reject any mixing of v5 and v4 units. 
> > Pity, though, since it's already working...
> I'd be happy to hear some other people's opinions on this (Adrian & Paul?).
> 
> It seems pretty unfortunate to have this split between versions making a bunch of complexity to support like this.
The sticking point for me is whether there are objects in the wild that use GNU-style .debug_str_offsets, and might get linked with proper v5 .debug_str_offsets.  The linker will combine them without a second thought, and it's unreasonable to ask a linker to verify.
I guess... given that you need to enable this stuff explicitly for gcc with DWARF v4, and the point is to reduce relocations which is an irrelevant consideration for fission, it's maybe not so likely that we have objects like this in the wild.  That makes it reasonable to put our collective foot down and say llvm-dwp won't mix-n-match.

I do think it's worthwhile for llvm-dwarfdump to handle the mixed case, partly because it's already done and partly because it's a different kind of tool (diagnostic/analysis rather than production).

https://reviews.llvm.org/D42937