[PATCH] D49214: [DWARF v5] emit DWARF v5 range lists (no support for fission yet)

Tue Jul 17 10:53:11 PDT 2018

probinson added inline comments.

================
Comment at: llvm/trunk/lib/CodeGen/AsmPrinter/DwarfDebug.cpp:2092-2095
+  // FIXME: Generate the offsets table and use DW_FORM_rnglistx with the
+  // DW_AT_ranges attribute. Until then set the number of offsets to 0.
+  Asm->emitInt32(0);
+  Asm->OutStreamer->EmitLabel(RnglistTableBaseSym);
----------------
dblaikie wrote:
> probinson wrote:
> > dblaikie wrote:
> > > probinson wrote:
> > > > dblaikie wrote:
> > > > > wolfgangp wrote:
> > > > > > dblaikie wrote:
> > > > > > > To be honest, I'm not sure why the standard added this offset table - it seems to add extra bytes without any savings I can see (shorter encodings in the debug_info section - using indexes instead of offsets, but it doesn't save relocations or bytes overall, unless I'm missing something?). Meh. (@probinson - any ideas on what the purpose of this indirection is?)
> > > > > > My understanding is that DW_AT_ranges has to either use FORM_sec_offset (which needs to be relocated) or FORM_rnglistx, which is an index into the offset table. So with the offset table and FORM_rnglistx we'd be saving the relocations in .debug_info.
> > > > > Right, this certainly saves relocations - but it's not the only way that could've saved relocations.
> > > > > 
> > > > > Rather than introducing FORM_rnglistx, it could've been specified that a DW_AT_ranges value with a data (not section-relative) form is considered as a byte offset relative to the DW_AT_ranges_base value. That way there would be no need for relocations, and no need for the extra indirection table I think... unless I'm missing something.
> > > > > 
> > > > > (@probinson - any ideas why this is the way it is, with the indirection table?)
> > > > Right, the benefit is eliminating one relocation per DW_AT_ranges (all collapsed into a single relocation per CU for DW_AT_rnglists_base).  Overall it would be a slight size increase in the linked file, as each reference would cost an extra byte or two total.  Reducing the linker's effort (and what, 24 bytes in the .o per relocation?) seemed like the right tradeoff.
> > > Right - I get the desire to remove a reloc per range list reference. But I don't understand why that solution created the indirection table in .debug_rnglists to go from range list indexes to range list offsets. What problem did that indirection solve compared to having range list offsets (relative to DW_AT_ranges_base or whatnot) directly in the DW_AT_ranges attributes?
> > OK I understand the question now.
> > If you want to encode the range list offset directly into DW_AT_ranges, then you must build the encoded range-list table incrementally so you know what offset to use for DW_AT_ranges.  If instead you have an index in DW_AT_ranges, you can allocate a slot in the offset table immediately and build the encoded range-list table later, and fill in the offset table when you do that.   This gives the producer more flexibility.
> > 
> > Now, if you as a producer are in fact building the encoded range-list table incrementally already, then your suggestion makes sense, and we can say that (for example) a constant form such as DW_FORM_data means the value is directly an offset from ranges_base.  This saves relocations and also the offset table in .debug_rnglists.  File it as a proposal on dwarfstd.org if it seems like a worthwhile benefit.
> Thanks for explaining/your perspective there, Paul!
> 
> Yeah - for LLVM at least, we use assembler expressions that compute label differences in many places to produce these sorts of region-relative offsets, so it doesn't strictly require incremental emission. (It sort of does and sort of doesn't: the underlying assembly emission logic already has to handle these temporarily unresolved expressions. We use them, for example, for high_pc constants (end of function minus start of function); actually, maybe that's the only place we use them currently.)
> 
> Once this is all implemented, perhaps we can see how much space the indirection table takes up - but, yeah, I doubt it'll be a pressing issue/top of anyone's list. Just seemed like a strange inefficiency to me. I wonder if any implementation would actually need it and couldn't use the label difference approach. (I guess it could be a tradeoff even if they could use it - maybe it'd be slower or more memory intensive, even if it did result in marginally smaller output.)
Even after all this time I still don't intuitively think of compilers as effectively emitting assembler first, or of the assembler being able to do useful stuff like that. Right, we can emit a difference expression to a uleb directive and it all works out.
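
To make the label-difference idea concrete, here is a minimal C++ sketch. LabelTable and its methods are hypothetical stand-ins for an assembler's symbol table, not LLVM's MC API; encodeULEB128 is just the standard ULEB128 encoding written out by hand. The sketch only models evaluating "Hi - Lo" once both labels are bound. It ignores the fact that a real assembler also has to cope with the ULEB's length depending on the value.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Standard ULEB128: 7 payload bits per byte, high bit set means "more follows".
std::vector<uint8_t> encodeULEB128(uint64_t Value) {
  std::vector<uint8_t> Out;
  do {
    uint8_t Byte = Value & 0x7f;
    Value >>= 7;
    if (Value != 0)
      Byte |= 0x80;
    Out.push_back(Byte);
  } while (Value != 0);
  return Out;
}

// Hypothetical stand-in for an assembler's symbol table (assumption, not LLVM).
struct LabelTable {
  std::map<std::string, uint64_t> Bound;

  // Record the section offset at which a label ended up.
  void bind(const std::string &Name, uint64_t Offset) { Bound[Name] = Offset; }

  // Evaluate Hi - Lo; only valid once both labels have been bound.
  uint64_t difference(const std::string &Hi, const std::string &Lo) const {
    assert(Bound.count(Hi) && Bound.count(Lo) && "label not bound yet");
    return Bound.at(Hi) - Bound.at(Lo);
  }
};
```

For example, a list that lands 300 bytes past the base label encodes as the two bytes 0xAC 0x02, with no relocation involved because both labels live in the same section.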

It's quite possible we just had "index into table" on the brain when we were doing the new encoding for range/location lists, what with debug_str_offsets and debug_addr already in place.  But the rationale I described above is what occurred to me, even if compilers can actually be smarter than that.  :-)
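
The producer-flexibility argument from earlier in the thread (hand out an index now, fill in the offset later) can be sketched roughly as below. RangeListTable, reserveSlot and emitList are hypothetical names for illustration only, not anything in DwarfDebug.cpp. As a simplification, offsets here are measured from the start of the encoded lists rather than from whatever base the real section uses.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical model of one range-list table contribution (not LLVM's API).
struct RangeListTable {
  std::vector<uint32_t> Offsets; // offset table: one slot per range list
  std::vector<uint8_t> Body;     // encoded range lists, appended in any order

  // Hand out an index immediately, before the list itself has been encoded.
  // The DIE can store this index right away.
  uint32_t reserveSlot() {
    Offsets.push_back(0); // placeholder, back-filled by emitList()
    return static_cast<uint32_t>(Offsets.size() - 1);
  }

  // Later: append the encoded bytes and back-fill the reserved slot.
  void emitList(uint32_t Index, const std::vector<uint8_t> &Encoded) {
    assert(Index < Offsets.size() && "slot was never reserved");
    Offsets[Index] = static_cast<uint32_t>(Body.size());
    Body.insert(Body.end(), Encoded.begin(), Encoded.end());
  }
};
```

The point of the sketch is that the order of emitList() calls does not have to match the order of reserveSlot() calls. A producer that stores offsets directly in DW_AT_ranges would instead need either the final layout or a deferred label-difference expression at the time it writes the attribute.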

Repository:
  rL LLVM

https://reviews.llvm.org/D49214