[cfe-dev] Setting breakpoints before assignments or calls

Thu Aug 23 16:37:16 PDT 2018

> On Aug 23, 2018, at 4:13 PM, Vedant Kumar <vsk at apple.com> wrote:
> 
>> On Aug 23, 2018, at 2:03 PM, Jim Ingham <jingham at apple.com> wrote:
>> 
>> I'm not sure we always want to have the debugger ignore the inter-line line-table entries for complex expressions.  In a case like:
>> 
>> 1:     x = foo(5,
>> 2:           bar(),
>> 3:           baz(),
>> 4:           after_baz());
>> 
>> if I want to step into "baz()", it's convenient to break on line 3 and do a step-in.  If we move to the is_stmt line and that's somewhere at the beginning of the function call, then that effort will be thwarted.
> 
> In this example, why would the line containing the call to bar() not have the "is_stmt" flag? I assumed that it would because, as a function call, it's "a recommended breakpoint location".

I was arguing that it should have an is_stmt, because otherwise the algorithm Dave suggested would move a breakpoint on line 3 to the is_stmt marked one on line 1.
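
To make that concrete, here is a rough sketch of the "back up to the nearest previous is_stmt" rule as I understand it.  The Row struct, the addresses, and the two example tables are all made up for illustration; this is not how lldb represents line tables, and it ignores basic-block boundaries entirely:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simplified line-table row; real DWARF rows carry more state.
struct Row {
    std::uint64_t address;  // instruction address, ascending within the table
    unsigned line;          // source line attributed to this instruction
    bool is_stmt;           // producer's "recommended breakpoint location" flag
};

// Sentinel returned when no row is attributed to the requested line.
constexpr std::uint64_t kNoLocation = ~0ull;

// Find the first row for `line`, then walk backwards in instruction order
// (not source order) to the nearest row flagged is_stmt.  If no flagged row
// precedes it, fall back to the row that was asked for.
std::uint64_t resolve_breakpoint(const std::vector<Row>& table, unsigned line) {
    for (std::size_t i = 0; i < table.size(); ++i) {
        if (table[i].line != line) continue;
        std::size_t j = i;
        while (j > 0 && !table[j].is_stmt) --j;
        assert(table[j].address <= table[i].address);  // rows assumed sorted
        return table[j].is_stmt ? table[j].address : table[i].address;
    }
    return kNoLocation;
}

// Made-up table for the foo(5, bar(), baz(), after_baz()) example where only
// the start of the statement (attributed to line 1) is flagged.
std::vector<Row> statement_start_only() {
    return {{0x10, 1, true},  {0x14, 2, false}, {0x18, 3, false},
            {0x1c, 4, false}, {0x20, 1, false}};
}

// Same instructions, but with every argument call flagged as well.
std::vector<Row> every_call_flagged() {
    return {{0x10, 1, true}, {0x14, 2, true}, {0x18, 3, true},
            {0x1c, 4, true}, {0x20, 1, false}};
}
```

With the first table, resolve_breakpoint(statement_start_only(), 3) lands on 0x10, the line 1 row, which is the behavior I'm worried about.  With the second table the same request lands on 0x18, the row for the baz() call.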

> 
> Oh, are you suggesting that locations of call arguments shouldn't be eligible for "is_stmt"? .. This might be a naive question, but is there some relevant standard / source of truth about what constitutes a recommended breakpoint location, or is this just a subjective decision on the part of the compiler?

This is always going to be a heuristic: whatever works well for stepping or for setting breakpoints.  For instance, gcc used to give only one line table entry for a complex multi-line expression like the one above.  When people were making the transition to clang in the early days, we got some bugs asking why "step-over" at line 1 above stops at line 2.  And admittedly it is odd that if you do:

     x = foo(bar(), baz(),
             after_baz());

step-over on this line doesn't stop at bar or baz, but does stop at after_baz.  That's why I was arguing for giving all these function calls "is_stmt" so this would be symmetric.

OTOH, you also get cases in C++ like:

         foo(10,
             20,
             30,
             40);

where when you are stepping along you don't stop at the 20 line, but you do stop at the 30 line because 30 gets converted to something that has a copy constructor, so there is code from the line there.  That just makes the debugger look odd when it's stepping.  This could be made better by using line 0 for this code, or by using is_stmt if we're going to take that more seriously.  
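
For a concrete picture of that, something along these lines would do it.  The Scaled type and this particular foo are invented for the example, and whether a given compiler actually attributes code to each of these lines will vary:

```cpp
// Hypothetical type: its converting constructor is real code, so the
// argument line that triggers it can end up with its own line-table entry.
struct Scaled {
    int value;
    Scaled(int v) : value(v * 2) {}  // runs at the call site for "30"
};

int foo(int a, int b, Scaled c, int d) { return a + b + c.value + d; }

int call_foo() {
    return foo(10,
               20,   // plain constant: the case where stepping does not stop
               30,   // implicit Scaled(30): constructor code sits on this line
               40);
}
```

From the user's point of view the 20 line and the 30 line look identical, which is why stopping on only one of them looks odd.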

Another odd one is why is there sometimes a line table entry for an end bracket and sometimes not?  The user generally can't figure that out, and it is disconcerting not to know where a step is going to go exactly...

Anyway, because the line table information is incomplete, the compiler can't just shove every bit of information it knows in there and let the debugger sort it out.  The debugger doesn't know enough to do that.  And so, the line table ends up being part art whose goal is getting stepping and breaking to feel natural.

But of course the best thing is if lldb knew that there was a nesting, and the debug information represented the nesting, and knew what code was artificial, etc, from the debug information.  Then it would be up to the debugger to make the stepping look right - and we could have a smarter set of gestures to navigate through this sort of code depending on what the user wanted to do.

Jim

> 
> vedant
> 
> 
>> Then you will (a) curse the debugger a bit 'cause it didn't stop where you told it to, and then (b) step over a few times (or use "sif", which nobody knows to use), but with a lot of arguments the former can get annoying.  In this case it seems like we are removing potentially helpful information from the user when setting breakpoints.  And adding a --ignore-is-stmt option doesn't seem like the sort of thing anybody would know to use.  But maybe the compiler could be smarter about when it applies this is_stmt, so it would know to put one on lines 2, 3, & 4 before the function calls, but not on line 2 in Vedant's example?  Not sure how hard that would be.
>> 
>> We would also have to have a way to know when to trust the is_stmt for these purposes.  DWARF should really have a way to say which features are taken seriously by the producer, so the debugger will know whether to trust them or not.  But that's a slightly orthogonal issue...
>> 
>> Jim
>> 
>> 	     
>> 
>>> On Aug 23, 2018, at 1:43 PM, David Blaikie <dblaikie at gmail.com> wrote:
>>> 
>>> 
>>> 
>>> On Thu, Aug 23, 2018 at 1:00 PM Vedant Kumar <vsk at apple.com> wrote:
>>> 
>>>> On Aug 23, 2018, at 12:22 PM, David Blaikie <dblaikie at gmail.com> wrote:
>>>> 
>>>> +all the usual debugger folks
>>>> 
>>>> I reckon adding nops would be a pretty unfortunate way to go in terms of code size, etc, if we could help it. One alternative that might work - and we've tossed it around a bit - is emitting more accurate "is_stmt" flags. In your first example, the is_stmt flag could be set on the bar() call instruction - and any breakpoint set on an instruction that isn't "is_stmt" could instead set a breakpoint on the statement that covers that instruction (the nearest previous is_stmt instruction*)
>>> 
>>> I hadn't considered using "is_stmt" at all, thanks for bringing that up!
>>> 
>>> Given the rule you've outlined, I'm not sure that the debugger could do a good job in the following scenarios:
>>> 
>>> Ah, one thing that might've been confusing is when I say "before" I mean "before in the instruction stream/line table" not "before in the source order".
>>> 
>>> 
>>> 1|  foo =
>>> 2|    bar();
>>> 3|  foo =    //< Given a breakpoint on line 3, would the debugger stop on line 2?
>>> 4|    bar();
>>> 
>>> 
>>> Given the line table (assuming a sort of simple pseudocode):
>>> 
>>> %x1 = call bar(); # line 2
>>> store %x1 -> @foo # line 1
>>> %x2 = call bar(); # line 4
>>> store %x2 -> @foo # line 3
>>> 
>>> We could group lines 1 and 2 as part of the same statement and lines 3 and 4 as part of the same statement - then when the line table is emitted, the lexically (in the assembly, not in the source code) first instruction that's part of that statement gets the "is_stmt" flag. So in this case lines 2 and 4 would have "is_stmt". A debugger, when requested to set a breakpoint on line 3, would look at the line table and say "well, I have this location on line 3, but it's not the start of a statement/not a good place to break, so I'll back up in the instruction stream (within a basic block - so that might get tricky for conditional operators and the like) until I find an instruction marked with 'is_stmt'" - and it would find the second call instruction and break there. The debugger could make the UI nicer (rather than showing line 4 when the user requested line 3) by highlighting the whole statement (all lines attributed to instructions between this "is_stmt" instruction and the next (or the end of the basic block)).
>>> 
>>> Does that make sense?
>>> 
>>> or
>>> 
>>> 1|  if (...)
>>> 2|    foo();
>>> 3|  else
>>> 4|    bar();
>>> 5|  baz =   //< Given a breakpoint on line 5, would the debugger stop on line 2, 4, or possibly either?
>>> 6|    func();
>>> 
>>> Is there another way to apply and interpret "is_stmt" flags to resolve these sorts of ambiguities?
>>> 
>>> 
>>>> The inlining case seems to me like something that could be fixed without changes to the DWARF, purely in the consumer - the consumer has the info that the call occurs on a certain line and when I ask to break on that line it could break on that first instruction in the inlined subroutine (potentially giving me an artificial view that makes it look like I'm at the call site and letting me 'step' (though a no-op) into the inlined function).
>>> 
>>> Oh, that's a great point. Yes, there is a TAG_inlined_subroutine for "inline_me" which contains AT_call_file and AT_call_line. That's enough information to do the right thing on the debugger side.
>>> 
>>> thanks,
>>> vedant
>>> 
>>>> 
>>>> * This is a bit problematic when a statement gets interleaved/mixed up with other statements - DWARF has no equivalent of "ranges" for describing a statement. Likely not a problem at -O0 anyway, though (because little interleaving occurs there).
>>>> 
>>>> On Wed, Aug 22, 2018 at 2:01 PM Vedant Kumar via cfe-dev <cfe-dev at lists.llvm.org> wrote:
>>>> Hello,
>>>> 
>>>> I'd like to improve the situation with setting breakpoints on lines with assignments or inlinable calls. This email outlines problem areas, possible solutions, and why I think emitting extra nops at -O0 might be the best solution.
>>>> 
>>>> # Problem 1: Assignments
>>>> 
>>>> Counter to user expectation, a breakpoint on a line containing an assignment is reached when the assignment happens, not before the r.h.s is evaluated.
>>>> 
>>>> ## Example: Can't step into bar()
>>>> 
>>>> 1| foo = // Set a breakpoint here. Note that it's not possible to step into bar().
>>>> 2|   bar();
>>>> 
>>>> One solution is to set the location of the assignment to the location of the r.h.s (line 2). The problem with this approach is that it becomes impossible to set a breakpoint on line 1.
>>>> 
>>>> Another solution is to emit a nop (on line 1) prior to emitting the r.h.s, and to emit an artificial location on the assignment's store instruction. This makes it possible to step to line 1 before line 2, and prevents users from stepping back to line 1 after line 2.
>>>> 
>>>> # Problem 2: Inlinable calls
>>>> 
>>>> Instructions from an inlined function don't have debug locations within the caller. This can make it impossible to set a breakpoint on a line that contains a call to an inlined function.
>>>> 
>>>> ## Example: Can't set a breakpoint on a call
>>>> 
>>>> It's easier to see the bug via Godbolt: https://godbolt.org/z/scwF20. Note that it's not possible to set a breakpoint on line 9 (on "inline_me"). At the IR-level, we do emit an unconditional branch with a location that's on line 9, but we have to drop that branch during ISel.
>>>> 
>>>> The only solution to this problem (afaik) is to insert a nop before inlinable calls. In this example the nop would be on line 9.
>>>> 
>>>> One alternative I've heard is to make the first inlined instruction look like it's located within the caller, but that actually introduces a bug. You wouldn't be able to set a breakpoint on the relevant location in the inlined callee.
>>>> 
>>>> # Proposal
>>>> 
>>>> As outlined above, I think the best way to address issues with setting breakpoints on assignments and calls is to insert nops with suitable locations at -O0. These nops would lower to a target-specific nop at -O0, and lower to nothing at -O1 or greater (like @llvm.donothing).
>>>> 
>>>> The tentative plan is to introduce an intrinsic (say, @llvm.dbg.nop) to accomplish this.
>>>> 
>>>> I don't anticipate there being a substantial compile-time impact, but haven't actually measured this yet. I'd like to get some feedback before going forward. Let me know what you think!
>>>> 
>>>> thanks,
>>>> vedant
>>>> 
>>>> 