[PATCH] D58952: [llvm] Skip over empty line table entries.

Wed Mar 6 15:12:52 PST 2019

dblaikie added inline comments.

================
Comment at: llvm/lib/DebugInfo/DWARF/DWARFDebugLine.cpp:871
+  // In some cases, e.g. first instruction in a function, the compiler generates
+  // two entries, both with the same address. We want the last one.
   LineTable::RowIter RowPos = std::lower_bound(
----------------
probinson wrote:
> probinson wrote:
> > dblaikie wrote:
> > > aprantl wrote:
> > > > probinson wrote:
> > > > > dblaikie wrote:
> > > > > > probinson wrote:
> > > > > > > aprantl wrote:
> > > > > > > > Doesn't the DWARF spec require all PC values in the line table to be strictly monotonically increasing? How is it possible to have more than one entry at the same address?
> > > > > > > The DWARF spec's model is based on increasing PC values, however the encoding allows advancing PC by 0. So it's entirely possible to have a line table that (for example) advances line by 2 and PC by 0.
> > > > > > I couldn't find specific wording about that - but please do take a look, quite possible I missed it.
> > > > > > 
> > > > > > LLVM's been generating line tables like this for a while (the test updates show evidence of that) & it might be justifiable to fix this code even if it's technically invalid DWARF, given how long we've been generating it. (to my mind, it is wasted DWARF - it doesn't describe any location, so I think it should be correct to remove it if we wanted to, but not a high priority bug (@echristo reckons there might be consumers using these empty ranges to find the start of a function or the like))
> > > > > The non-normative text on the first couple pages of section 6.2 describe the line table as notionally "a large matrix, with one row for each instruction" (p.149 of DWARF 5). However, the normative text describes how special opcodes are formed (section 6.2.5.1, pp. 160-162) without requiring "operation advance" to be nonzero. This permits a special opcode to increment the line number without changing the (address, op_index) tuple.
> > > > > It's skating on thin ice because DWARF doesn't explicitly give a meaning to this, but the encoding allows it.
> > > > That's quite interesting for a completely unrelated reason: If the linetable encoding actually supports more than one entry per PC, we could change the implementation of getMergedLocation to emit both locations and then teach the debugger to disambiguate the locations based on DW_AT_call_* information or something.
> > > > 
> > > > > it might be justifiable to fix this code even if it's technically invalid DWARF, given how long we've been generating it. 
> > > > Agreed.
> > > I don't know/think I'd say "the linetable encoding actually supports more than one entry per PC"
> > > 
> > > I think each pair of addresses in the line table is the half-open range [X, Y), so if you have two lines after each other you have [X, X) - I would argue that is the empty range (it certainly is with C++ iterators, for instance), not X. So it doesn't describe any addresses at all.
> > Not so fast...  We don't encode the line table until deep in the bowels of the assembler.  getMergedLocation is working out what to attach to an IR instruction, which doesn't know how to attach more than one location.  Teach it that first, and getMergedLocation can return you a list.
> > It will confuse the heck out of consumers that aren't expecting it, and this will be especially weird for profilers.
> > I think this particular dark corner of the spec might be exploitable, but it won't be smooth going.
> > 
> > As far as the current patch is concerned, I am uncertain. Having addr2line do something consistent seems likely to do more good than harm, but I can imagine this could cause some confusion in other clients.
> Treating the zero-increment case as an empty range has some appeal. However, it can be confusing.  You tell the debugger to plant a breakpoint on foo(), which the debugger will tell you is on line 12; and yet when you hit said breakpoint, you're told that address is on line 15, because things got folded and optimized and the instruction range for lines 12-14 has gone to nothing.
> Maybe we hand-wave that away as "what do you expect from optimized code" but it's still not a great debugging experience.
> Treating the zero-increment case as an empty range has some appeal. However, it can be confusing. You tell the debugger to plant a breakpoint on foo(), which the debugger will tell you is on line 12; and yet when you hit said breakpoint, you're told that address is on line 15, because things got folded and optimized and the instruction range for lines 12-14 has gone to nothing.
> Maybe we hand-wave that away as "what do you expect from optimized code" but it's still not a great debugging experience.

I'm not sure I follow this - the line table maps instructions to source locations. If there are no instructions there's nothing to map - I'm pretty sure LLVM's weird zero-length line table range is because of this (set the location, emit no instructions, set the location, emit insntructions - so you end up with the two entries and no locations between them). So DWARF's representation here is pretty clear - if nothing in your function causes instructions until 10 lines in - that's where you first break when you enter the function.

If you say "break foo" - the debugger can tell you, "foo is declared at line 3 (from the DW_TAG_subprogram) and the first instruction is on line 10, which is where you'll break".

Pretty sure GCC doesn't do anything like this, and gas... hmm, nope, actually GCC and gas both do do this thing, producing two line entries at the start of functions that have nothing in the prologue.

All the more reason to support parsing them and interpreting them the same way addr2line does - even if we did decide to change Clang's behavior not to generate these.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D58952/new/

https://reviews.llvm.org/D58952