[cfe-dev] Setting breakpoints before assignments or calls

Thu Aug 23 15:55:46 PDT 2018

> On Aug 23, 2018, at 2:53 PM, David Blaikie <dblaikie at gmail.com> wrote:
> 
> 
> 
> On Thu, Aug 23, 2018 at 2:35 PM Jim Ingham <jingham at apple.com> wrote:
> 
> 
> > On Aug 23, 2018, at 2:12 PM, David Blaikie <dblaikie at gmail.com> wrote:
> > 
> > 
> > 
> > On Thu, Aug 23, 2018 at 2:03 PM Jim Ingham <jingham at apple.com> wrote:
> > I'm not sure we always want to have the debugger ignore the inter-line line-table entries for complex expressions.  In a case like:
> > 
> > 1:     x = foo(5,
> > 2:           bar(),
> > 3:           baz(),
> > 4:           after_baz());
> > 
> > if I want to step into "baz()", it's convenient to break on line 3 and do a step-in.  If we move to the is_stmt line and that's somewhere at the beginning of the function call, then that effort will be thwarted.
> > 
> > I'd figure some UI could improve that - pass a flag to break (or hold down shift when you click a line to set a breakpoint on in a GUI) when you specifically want to break on a certain line? Or assume a user setting a breakpoint on something that's not the first source line (not the lowest line number of all lines associated with that statement (all lines from all instructions from the nearest previous is_stmt instruction to the nearest following is_stmt instruction)) they mean to precisely break on that line - but otherwise assume they mean to break before the first instruction on the whole statement.
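A minimal sketch of the second heuristic above, assuming a simplified line table in which only the first instruction of the statement carries is_stmt. The `Row` type, the addresses, and the instruction order are hypothetical and only model the `x = foo(5, bar(), baz(), after_baz())` example. This is not how any existing debugger implements breakpoint resolution.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    address: int   # instruction address (made up for illustration)
    line: int      # source line attributed to the instruction
    is_stmt: bool  # True only on the first instruction of a statement


def statement_bounds(rows, i):
    """Return [start, end) of the rows forming the statement that holds row i."""
    start = i
    while start > 0 and not rows[start].is_stmt:
        start -= 1
    end = i + 1
    while end < len(rows) and not rows[end].is_stmt:
        end += 1
    return start, end


def breakpoint_address(rows, line):
    """Pick an address for a breakpoint requested on `line`, or None."""
    for i, row in enumerate(rows):
        if row.line != line:
            continue
        start, end = statement_bounds(rows, i)
        first_line = min(r.line for r in rows[start:end])
        if line == first_line:
            # The user named the statement's first source line:
            # break before the first instruction of the whole statement.
            return rows[start].address
        # Any other line: assume the user means exactly that line.
        return row.address
    return None


# Assumed instruction order for the four-line call expression.
ROWS = [
    Row(0x00, 2, True),   # call bar()
    Row(0x08, 3, False),  # call baz()
    Row(0x10, 4, False),  # call after_baz()
    Row(0x18, 1, False),  # call foo(...)
    Row(0x20, 1, False),  # store to x
]
```

With this table, a breakpoint on line 1 resolves to the address of the `bar()` call (the start of the statement), while a breakpoint on line 3 resolves to the `baz()` call itself, so a step-in from there still works.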
> 
> The latter seems workable.  Note also, the definition for "is_stmt" says:
> 
> A boolean indicating that the current instruction is a recommended breakpoint location. A recommended breakpoint location is intended to “represent” a line, a statement and/or a semantically distinct subpart of a statement.
> 
> So it seems reasonable if we're going to start doing this to consider nested function calls in a statement semantically distinct subparts of a statement.  Actually this would also be useful when you have:
> 
>       foo (bar(), baz(),
>            after_bar(), after_baz());
> 
> You do get separate column entries for the function calls, but we have no idea what that means and if we just set a breakpoint on every individual line entry as currently emitted by clang we end up with annoyingly many breakpoints at present.  Marking actually interesting subexpressions would help here too.  Anyway, if we started using is_stmt for these subparts then we wouldn't have to fix things up in the debugger.
> 
> I've some apprehension for having the compiler make particularly subjective judgments about "semantically distinct subpart of a statement" - especially around operator overloads, for instance. Not without merit/certainly something to consider, but my gut reaction is to lean away from that - because the judgment might vary & I could imagine different users wanting different experiences at different times/situations/etc - so that it'd be useful for consumers to be making some of those judgments, showing them to the user as options, etc.

Not sure about this.  

Right now, if there are many line table entries that map to a given line lldb will choose the first one by address within a given block.  That's because the line tables are really noisy, and our experience was that if I set a breakpoint location per entry, stepping gets annoyingly jerky and you have to keep hitting step over and over.  That's really too draconian and you can't get back to a really independent subsection of a line.  Note to self - I should try playing around with one location per distinct column - I haven't revisited the breakpoint by line setting since I was told clang was serious about column info.  It would be interesting to see how that works.
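A rough sketch of the two policies mentioned above: one location per block (lowest address wins) versus one location per distinct column. The `LineEntry` type, the `block` id, and the sample table are assumptions made for illustration. This is not lldb's actual code or data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LineEntry:
    address: int  # instruction address (made up for illustration)
    line: int     # source line
    column: int   # source column, 0 if unknown
    block: int    # hypothetical id of the enclosing block


def locations_for_line(entries, line):
    """One location per block: the lowest-address entry that maps to `line`."""
    best = {}
    for e in entries:
        if e.line != line:
            continue
        cur = best.get(e.block)
        if cur is None or e.address < cur.address:
            best[e.block] = e
    return sorted(best.values(), key=lambda e: e.address)


def locations_per_column(entries, line):
    """Variant: one location per distinct (block, column) pair on `line`."""
    best = {}
    for e in entries:
        if e.line != line:
            continue
        key = (e.block, e.column)
        cur = best.get(key)
        if cur is None or e.address < cur.address:
            best[key] = e
    return sorted(best.values(), key=lambda e: e.address)


# A noisy table: line 7 appears four times across two blocks.
TABLE = [
    LineEntry(0x10, 7, 3, 0),
    LineEntry(0x14, 7, 8, 0),
    LineEntry(0x18, 8, 3, 0),
    LineEntry(0x1C, 7, 3, 0),
    LineEntry(0x30, 7, 3, 1),
]
```

On this table the per-block policy yields two locations for line 7 and the per-column policy yields three, which is the trade-off between fewer stops and finer-grained control.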

But if you emitted the inter-line entries you currently do but added "I think these are the important ones" with is_stmt, that wouldn't remove information, and the debugger could still offer different experiences directed by the user.  It would just make the default behavior a little nicer. 

> 
> Given a single line with "foo(bar(), baz())" the user doesn't have the ability to step into the call lines - I'm not sure that wrapping a line should make a huge difference to debuggability (admittedly the inverse is true - taking two statements and writing them on one line does degrade debugging experience) - seems like it'd provide awkward incentives for users to lay out their code to play to these debugging issues.
> 
> Column info - especially for a GUI debugger, could be super helpful - you could set a breakpoint on specific call sites which could be nice.
> 
> 
> Massively long-term: it'd be awesome to be able to encode something like Clang's source ranges into DWARF. Basically attributing source ranges with a "preferred location" (eg: the assignment operator's range covers the whole "x = y" and the preferred location points to the "=" - as with Clang diagnostics) so that users can see the hierarchy of evaluation, etc. I think I remember throwing some ideas for this around with Chandler a few years ago when I corrected a bunch of source location stuff (there's still a bug or two outstanding with that with regards to loops especially - Adrian and I spent some time discussing that - oh, right, some weird things about how/where GCC breaks and doesn't... ). Dunno what that looks like - maybe something sort of like/related to/using/extending Cary's two level line tables to include effectively scopes for expressions and subexpressions. Then a user could choose to step into an expression evaluation or skip over it & the debugger could highlight the source ranges rather than lines which would be more meaningful to the user about where in the expression evaluation the program is at.

Caroline Tice was one of the original lldb authors, and her PhD thesis had a section on expressing the nesting of operations - IIRC she called them atoms.  We talked about this some in the early days but didn't get much traction on the compiler side (at that time we were still using gcc as the front end).  So this ended up being only talk.  But it would be really handy to know the scope and not just the initial point of the expressions and subexpressions in the debug info.

Jim

>  
> 
> Jim
> 
> 
> >  
> > Then you will (a) curse the debugger a bit 'cause it didn't stop where you told it to, and then (b) step over a few times (or use "sif", which nobody knows to use), but with a lot of arguments the former can get annoying.  In this case it seems like we are removing potentially helpful information from the user when setting breakpoints.  And adding a --ignore-is-stmt option doesn't seem like the sort of thing anybody would know to use.  But maybe the compiler could be smarter about when it applies this is_stmt, so it would know to put it on lines 2, 3, & 4 before the function calls, but not on line 2 in Vedant's example?  Not sure how hard that would be.
> > 
> > We would also have to have a way to know when to trust the is_stmt for these purposes.  DWARF should really have a way to say features are taken seriously by the producer, so the debugger will know whether to trust them or not.  But that's a slightly orthogonal issue...
> > 
> > Yeah, it'd be nice to have some flag bits to say "hey, this is the /specific/ meaning of this flag/etc we're guaranteeing to implement in this output". Though I'm not sure that'd be necessary - I think Clang currently just puts is_stmt on everything, so if you implement the advanced behavior in LLDB and give it Clang's current output, it'd just degrade to the behavior we already see from LLDB - and when LLDB sees new/fancy Clang DWARF that is more judicious about is_stmt, it'd get better behavior.
> >  
> > 
> > Jim
> > 
> > 
> > 
> > > On Aug 23, 2018, at 1:43 PM, David Blaikie <dblaikie at gmail.com> wrote:
> > > 
> > > 
> > > 
> > > On Thu, Aug 23, 2018 at 1:00 PM Vedant Kumar <vsk at apple.com> wrote:
> > > 
> > >> On Aug 23, 2018, at 12:22 PM, David Blaikie <dblaikie at gmail.com> wrote:
> > >> 
> > >> +all the usual debugger folks
> > >> 
> > >> I reckon adding nops would be a pretty unfortunate way to go in terms of code size, etc, if we could help it. One alternative that might work and we've tossed it around a bit - is emitting more accurate "is_stmt" flags. In your first example, the is_stmt flag could be set on the bar() call instruction - and any breakpoint set on an instruction that isn't "is_stmt" could instead set a breakpoint on the statement that covers that instruction (the nearest previous is_stmt instruction*)
> > > 
> > > I hadn't considered using "is_stmt" at all, thanks for bringing that up!
> > > 
> > > Given the rule you've outlined, I'm not sure that the debugger could do a good job in the following scenarios:
> > > 
> > > Ah, one thing that might've been confusing is when I say "before" I mean "before in the instruction stream/line table" not "before in the source order".
> > >  
> > > 
> > >   1|  foo =
> > >   2|    bar();
> > >   3|  foo =    //< Given a breakpoint on line 3, would the debugger stop on line 2?
> > >   4|    bar();
> > > 
> > > 
> > > Given the line table (assuming a sort of simple pseudocode):
> > > 
> > >   %x1 = call bar(); # line 2
> > >   store %x1 -> @foo # line 1
> > >   %x2 = call bar(); # line 4
> > >   store %x2 -> @foo # line 3
> > > 
> > > We could group lines 1 and 2 as part of the same statement and lines 3 and 4 as part of the same statement - then when the line table is emitted, the lexically (in the assembly, not in the source code) first instruction that's part of that statement gets the "is_stmt" flag. So in this case lines 2 and 4 would have "is_stmt". A debugger, when requested to set a breakpoint on line 3, would look at the line table and say "well, I have this location on line 3, but it's not the start of a statement/not a good place to break, so I'll back up in the instruction stream (within a basic block - so that might get tricky for conditional operators and the like) until I find an instruction marked with is_stmt" - and it would find the second call instruction and break there. The debugger could make the UI nicer (rather than showing line 4 when the user requested line 3) by highlighting the whole statement (all lines attributed to instructions between this "is_stmt" instruction and the next (or the end of the basic block)).
> > > 
> > > Does that make sense?
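The back-up rule described above can be sketched as follows, assuming the four-instruction pseudocode line table with is_stmt set on the two call instructions. The `Row` type, the addresses, and the `block` field (a stand-in for a basic-block id) are hypothetical. The sketch only looks at the first row attributed to the requested line, which is a simplification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    address: int    # instruction address (made up for illustration)
    line: int       # source line attributed to the instruction
    is_stmt: bool   # recommended breakpoint location
    block: int = 0  # hypothetical basic-block id


def resolve(rows, line):
    """Index of the row to break on for `line`, or None if no row matches."""
    for i, row in enumerate(rows):
        if row.line != line:
            continue
        j = i
        # Back up in the instruction stream, staying inside the basic block,
        # until a row marked is_stmt is found.
        while not rows[j].is_stmt and j > 0 and rows[j - 1].block == row.block:
            j -= 1
        # Fall back to the requested row if the block has no is_stmt row.
        return j if rows[j].is_stmt else i
    return None


def statement_lines(rows, start):
    """Source lines to highlight: from rows[start] up to the next is_stmt row
    or the end of the basic block."""
    lines = {rows[start].line}
    for row in rows[start + 1:]:
        if row.is_stmt or row.block != rows[start].block:
            break
        lines.add(row.line)
    return sorted(lines)


ROWS = [
    Row(0x00, 2, True),   # %x1 = call bar()
    Row(0x08, 1, False),  # store %x1 -> @foo
    Row(0x10, 4, True),   # %x2 = call bar()
    Row(0x18, 3, False),  # store %x2 -> @foo
]
```

Here a breakpoint requested on line 3 resolves to the second call instruction (attributed to line 4), and the statement to highlight covers lines 3 and 4.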
> > >  
> > > or
> > > 
> > >   1|  if (...)
> > >   2|    foo();
> > >   3|  else
> > >   4|    bar();
> > >   5|  baz =   //< Given a breakpoint on line 5, would the debugger stop on line 2, 4, or possibly either?
> > >   6|    func();
> > > 
> > > Is there another way to apply and interpret "is_stmt" flags to resolve these sorts of ambiguities?
> > > 
> > > 
> > >> The inlining case seems to me like something that could be fixed without changes to the DWARF, purely in the consumer - the consumer has the info that the call occurs on a certain line and when I ask to break on that line it could break on that first instruction in the inlined subroutine (potentially giving me an artificial view that makes it look like I'm at the call site and letting me 'step' (though a no-op) into the inlined function).
> > > 
> > > Oh, that's a great point. Yes, there is a TAG_inlined_subroutine for "inline_me" which contains AT_call_file and AT_call_line. That's enough information to do the right thing on the debugger side.
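A small sketch of that consumer-side lookup, assuming the debugger has already parsed each inlined-subroutine entry into a record holding its call file, call line, and first instruction address. The `InlinedSubroutine` type, the file name, and the address are hypothetical. Real entries may describe their code with address ranges rather than a single low address.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InlinedSubroutine:
    name: str       # name of the inlined callee
    call_file: str  # from AT_call_file
    call_line: int  # from AT_call_line
    low_pc: int     # address of the first inlined instruction (assumed known)


def call_site_breakpoints(inlined, file, line):
    """Addresses to break on when the user asks for `file`:`line` and that
    line is the call site of one or more inlined subroutines."""
    return sorted(
        s.low_pc
        for s in inlined
        if s.call_file == file and s.call_line == line
    )


# Hypothetical record for the call to inline_me on line 9.
INLINED = [
    InlinedSubroutine("inline_me", "example.c", 9, 0x4005A0),
]
```

A debugger using this lookup could report the stop as being at the call site on line 9, even though the address belongs to the inlined body.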
> > > 
> > > thanks,
> > > vedant
> > > 
> > >> 
> > >> * This is a bit problematic when a statement gets interleaved/mixed up with other statements - DWARF has no equivalent of "ranges" for describing a statement. Likely not a problem at -O0 anyway, though (because little interleaving occurs there).
> > >> 
> > >> On Wed, Aug 22, 2018 at 2:01 PM Vedant Kumar via cfe-dev <cfe-dev at lists.llvm.org> wrote:
> > >> Hello,
> > >> 
> > >> I'd like to improve the situation with setting breakpoints on lines with assignments or inlinable calls. This email outlines problem areas, possible solutions, and why I think emitting extra nops at -O0 might be the best solution.
> > >> 
> > >> # Problem 1: Assignments
> > >> 
> > >> Counter to user expectation, a breakpoint on a line containing an assignment is reached when the assignment happens, not before the r.h.s is evaluated.
> > >> 
> > >> ## Example: Can't step into bar()
> > >> 
> > >>   1| foo = // Set a breakpoint here. Note that it's not possible to step into bar().
> > >>   2|   bar();
> > >> 
> > >> One solution is to set the location of the assignment to the location of the r.h.s (line 2). The problem with this approach is that it becomes impossible to set a breakpoint on line 1.
> > >> 
> > >> Another solution is to emit a nop (on line 1) prior to emitting the r.h.s, and to emit an artificial location on the assignment's store instruction. This makes it possible to step to line 1 before line 2, and prevents users from stepping back to line 1 after line 2.
> > >> 
> > >> # Problem 2: Inlinable calls
> > >> 
> > >> Instructions from an inlined function don't have debug locations within the caller. This can make it impossible to set a breakpoint on a line that contains a call to an inlined function.
> > >> 
> > >> ## Example: Can't set a breakpoint on a call
> > >> 
> > >> It's easier to see the bug via Godbolt: https://godbolt.org/z/scwF20. Note that it's not possible to set a breakpoint on line 9 (on "inline_me"). At the IR-level, we do emit an unconditional branch with a location that's on line 9, but we have to drop that branch during ISel.
> > >> 
> > >> The only solution to this problem (afaik) is to insert a nop before inlinable calls. In this example the nop would be on line 9.
> > >> 
> > >> One alternative I've heard is to make the first inlined instruction look like it's located within the caller, but that actually introduces a bug. You wouldn't be able to set a breakpoint on the relevant location in the inlined callee.
> > >> 
> > >> # Proposal
> > >> 
> > >> As outlined above, I think the best way to address issues with setting breakpoints on assignments and calls is to insert nops with suitable locations at -O0. These nops would lower to a target-specific nop at -O0, and lower to nothing at -O1 or greater (like @llvm.donothing).
> > >> 
> > >> The tentative plan is to introduce an intrinsic (say, @llvm.dbg.nop) to accomplish this.
> > >> 
> > >> I don't anticipate there being a substantial compile-time impact, but haven't actually measured this yet. I'd like to get some feedback before going forward. Let me know what you think!
> > >> 
> > >> thanks,
> > >> vedant
> > >> 
> > >> 
> > 
>