[PATCH] D69847: DWARFDebugLoc(v4): Add an incremental parsing function

Wed Nov 20 10:48:59 PST 2019

dblaikie added a comment.

In D69847#1753190 <https://reviews.llvm.org/D69847#1753190>, @labath wrote:

> I'm sorry, but I'm confused. Are you saying that the `.debug_loc.dwo` (i.e. non-standardized dwarf4 fission) location lists should be terminated with a classical `.debug_loc` double-zero entry ? That doesn't seem right -- the .debug_loc.dwo lists are in reality very different from `.debug_loc` (they use DW_LLE_*** constants and everything), and are really more similar to `.debug_loclists(.dwo)` than .debug_loc (though there are some subtle differences). It also doesn't seem to be what gcc is doing now -- this is a typical location list I get from gcc-9 -gsplit-dwarf (comments added by me):
>
>           .section        .debug_loc.dwo,"e",@progbits
>   .Ldebug_loc0:
>   .LVUS0: # gcc "view" stuff -- ignore that
>           .uleb128 .LVU2
>           .uleb128 0
>   .LLST0: # DW_AT_location points here
>           .byte   0x3                  # DW_LLE_startx_length
>           .uleb128 0x1                 # index
>           .long   .LFE0-.LVL0          # length
>           .value  0x5                  # expression length
>           .byte   0x75
>           .sleb128 0
>           .byte   0x31
>           .byte   0x24
>           .byte   0x9f
>           .byte   0                # DW_LLE_end_of_list
>   .LVUS1: # next "view" starts here
>
>
> In practice, changing the `.byte 0` to two word-sized zeroes probably won't change anything, as the first byte of that will terminate the location list anyway. However, I don't think that's the right thing to do, and I think llvm-dwarfdump is right to display a bunch of end_of_list entries for something like that.
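
To make the point above concrete, here is a minimal Python sketch (not llvm-dwarfdump's code) of a reader for the entry shown in the listing. It assumes only what the listing shows: kind 0x3 is followed by a ULEB128 index, a 4-byte length and a 2-byte expression length, and kind 0 ends the list. The length value 0x10 and all function names are hypothetical; other entry kinds are deliberately unsupported.

```python
DW_LLE_END_OF_LIST = 0x0
DW_LLE_STARTX_LENGTH = 0x3  # value taken from the gcc listing above


def read_uleb128(data, pos):
    """Decode an unsigned LEB128 value; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if byte & 0x80 == 0:
            return result, pos
        shift += 7


def parse_one_list(data, pos):
    """Parse entries up to and including the first end_of_list."""
    entries = []
    while True:
        kind = data[pos]
        pos += 1
        if kind == DW_LLE_END_OF_LIST:
            entries.append(("end_of_list",))
            return entries, pos
        if kind == DW_LLE_STARTX_LENGTH:
            index, pos = read_uleb128(data, pos)
            length = int.from_bytes(data[pos:pos + 4], "little")
            pos += 4
            expr_len = int.from_bytes(data[pos:pos + 2], "little")
            pos += 2
            expr = bytes(data[pos:pos + expr_len])
            pos += expr_len
            entries.append(("startx_length", index, length, expr))
            continue
        raise ValueError(f"entry kind {kind:#x} not handled in this sketch")


def parse_section(data):
    """Parse back-to-back lists until the section bytes are exhausted."""
    lists = []
    pos = 0
    while pos < len(data):
        entries, pos = parse_one_list(data, pos)
        lists.append(entries)
    return lists


# The entry from the listing, without its terminator; 0x10 is a made-up length.
ENTRY = bytes([0x03, 0x01]) + (0x10).to_bytes(4, "little") \
    + (5).to_bytes(2, "little") + bytes([0x75, 0x00, 0x31, 0x24, 0x9F])

one_byte_end = parse_section(ENTRY + b"\x00")
padded_end = parse_section(ENTRY + b"\x00" * 16)  # "two word-sized zeroes"

print(len(one_byte_end), len(padded_end))
```

With the padded terminator, the first list decodes identically, and each remaining zero byte shows up as its own list consisting of a single end_of_list entry, which is the behavior described above.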

Hmm, I'm not seeing these extra ulebs in GCC's output in this example: https://godbolt.org/z/x5RvMY - am I missing something in the reproduction or not reading it accurately?

================
Comment at: llvm/test/tools/dsymutil/X86/basic-lto-linking-x86.test:140-141
 CHECK:      0x00000025:
-CHECK-NEXT:              [0x0000000000000000, 0x000000000000000f): DW_OP_reg5 RDI, DW_OP_piece 0x4
-CHECK-NEXT:              [0x0000000000000019, 0x000000000000001d): DW_OP_reg5 RDI, DW_OP_piece 0x4
+CHECK-NEXT:              (0x0000000000000000, 0x000000000000000f): DW_OP_reg5 RDI, DW_OP_piece 0x4
+CHECK-NEXT:              (0x0000000000000019, 0x000000000000001d): DW_OP_reg5 RDI, DW_OP_piece 0x4

----------------
labath wrote:
> One thing I'm not sure of is the syntax of these "raw" entries. The `()` syntax is sort of similar to the `DW_LLE(x,y)` syntax, but at the same time, it's easy to miss that one is looking at the "raw" dump as the difference is only one character. OTOH, the v4 loclists don't have any fancy encodings, so the "raw" format is nearly the same as the "interpreted" format anyway...
> 
> Maybe we could use curly brackets here or something?
Few thoughts:

* I think I mentioned in a related review that we could use the same thing as the range dumping - that includes the section name/number (though that's only done in verbose mode? And wouldn't help with pure offset-pair/base-relative dumping in the absence of a base address anyway - because there's no way to know the section...)

* {} sounds OK to me (maybe we could change the DW_LLE to that too, since the (), especially with the whitespace between the LL and the '(', is a bit ambiguous too, I think?). But I'm not sure/don't have a strong leaning either way there.

* Could write something like "base + " before the pair in cases where the base is unknown - but I guess that means most traditional location lists (ones that don't use any base addresses/constant zero base address) would print with "base + " which isn't super great

So... "{}" I guess? It'd be good to look at v4 range dumping too & come up with a decision we're OK with for both cases, I think. 

Hmm, I am sort of coming back to the idea that printing these as [x, y) maybe isn't /so/ bad. In the unrelocated object file we would have the section info (because we'd have relocations), so maybe we could print that like the verbose range dumping, but even in non-verbose mode, when section dumping & not having the base address? & we could ignore the base address part... that's how we print v4 range lists currently? Either in objects where the CU has a relocated base address, or in linked executables where the range list no longer contains relocations (but also doesn't contain apparently overlapping ranges all starting at zero - because now they're all relocated). I guess we could print "base + " as a separate line at the start of the list to make it clear that they're all relative to any potential CU base...

Eh, would love to hear other people's ideas/preferences here. How much should the dumping output be unambiguous? How much is skipping things like "base+" useful to us folks who already work on DWARF and understand that the code we're looking at doesn't start at address zero, so of course ranges starting at zero must be either unresolved relocations (well, in that case it's easy to print something nice about the section) or relative to a base address?
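
For comparing the candidates side by side, here is a small Python sketch that renders one offset pair in each style under discussion. The helper name `fmt_pair` and its parameters are made up for illustration; this is not the dumper's code, and the addresses are taken from the first CHECK line in the test above.

```python
def fmt_pair(lo, hi, open_ch, close_ch, prefix=""):
    """Render an address pair with the given delimiters and optional prefix."""
    return f"{prefix}{open_ch}0x{lo:016x}, 0x{hi:016x}{close_ch}"


lo, hi = 0x0, 0xF

candidates = {
    "interpreted, as before": fmt_pair(lo, hi, "[", ")"),
    "raw, as in this patch": fmt_pair(lo, hi, "(", ")"),
    "raw, curly brackets": fmt_pair(lo, hi, "{", "}"),
    "raw, with base marker": fmt_pair(lo, hi, "[", ")", prefix="base + "),
}

for label, text in candidates.items():
    print(f"{label:24} {text}: DW_OP_reg5 RDI, DW_OP_piece 0x4")
```

Printed next to each other, the first two differ by a single character, which is the concern raised in the quoted comment.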

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D69847/new/

https://reviews.llvm.org/D69847