[cfe-dev] Setting breakpoints before assignments or calls

David Blaikie via cfe-dev cfe-dev at lists.llvm.org
Thu Aug 23 14:53:08 PDT 2018


On Thu, Aug 23, 2018 at 2:35 PM Jim Ingham <jingham at apple.com> wrote:

>
>
> > On Aug 23, 2018, at 2:12 PM, David Blaikie <dblaikie at gmail.com> wrote:
> >
> >
> >
> > On Thu, Aug 23, 2018 at 2:03 PM Jim Ingham <jingham at apple.com> wrote:
> > I'm not sure we always want to have the debugger ignore the inter-line
> line-table entries for complex expressions.  In a case like:
> >
> > 1:     x = foo(5,
> > 2:           bar(),
> > 3:           baz()
> > 4:           after_baz());
> >
> > if I want to step into "baz()", it's convenient to break on the line 3
> and do a step-in.  If we move to the is_stmt line and that's somewhere at
> the beginning of the function call, then that effort will be thwarted.
> >
> > I'd figure some UI could improve that - pass a flag to break (or hold
> down shift when you click a line to set a breakpoint on in a GUI) when you
> specifically want to break on a certain line? Or assume a user setting a
> breakpoint on something that's not the first source line (not the lowest
> line number of all lines associated with that statement (all lines from all
> instructions from the nearest previous is_stmt instruction to the nearest
> following is_stmt instruction)) they mean to precisely break on that line -
> but otherwise assume they mean to break before the first instruction on the
> whole statement.
>
> The latter seems workable.  Note also, the definition for "is_stmt" says:
>
> A boolean indicating that the current instruction is a recommended
> breakpoint location. A recommended breakpoint location is intended to
> “represent” a line, a statement and/or a semantically distinct subpart of a
> statement.
>
> So it seems reasonable if we're going to start doing this to consider
> nested function calls in a statement semantically distinct subparts of a
> statement.  Actually this would also be useful when you have:
>
>       foo (bar(), baz(),
>            after_bar(), after_baz());
>
> You do get separate column entries for the function calls, but we have no
> idea what that means and if we just set a breakpoint on every individual
> line entry as currently emitted by clang we end up with annoyingly many
> breakpoints at present.  Marking actually interesting subexpressions would
> help here too.  Anyway, if we started using is_stmt for these subparts then
> we wouldn't have to fix things up in the debugger.
>

I've some apprehension for having the compiler make particularly subjective
judgments about "semantically distinct subpart of a statement" - especially
around operator overloads, for instance. Not without merit/certainly
something to consider, but my gut reaction is to lean away from that -
because the judgment might vary & I could imagine different users wanting
different expereinces at different times/situations/etc - so that it'd be
useful for consumers to be making some of those judgments, showing them to
the user as options, etc.

Given a single line with "foo(bar(), baz())" the user doesn't have the
ability to step into the call lines - I'm not sure that wrapping a line
should make a huge difference to debuggability (admittedly the inverse is
true - taking two statements and writing them on one line does degrade
debugging experience) - seems like it'd provide awkward incentives for
users to layout their code to play to these debugging issues.

Column info - especially for a GUI debugger, could be super helpful - you
could set a breakpoint on specific calls sites which could be nice.


Massively long-term: it'd be awesome to be able to encode something like
Clang's source ranges into DWARF. Basically attributing source ranges with
a "preferred location" (eg: the assignment operator's range covers the
whole "x = y" and the preferred location points to the "=" - as with Clang
diagnostics) so that users can see the hierarchy of evaluation, etc. I
think I remember throwing some ideas for this around with Chandler a few
years ago when I corrected a bunch of source location stuff (there's still
a bug or two outstanding with that with regards to loops especially -
Adrian and I spent some time discussing that - oh, right, some weird things
about how/where GCC breaks and doesn't... ). Dunno what that looks like -
maybe something sort of like/related to/using/extending Cary's two level
line tables to include effectively scopes for expressions and
subexpressions. Then a user could choose to step into an expression
evaluation or skip over it & the debugger could highlight the source ranges
rather than lines which would be more meaningful to the user about where in
the expression evaluation the program is at.


>
> Jim
>
>
> >
> > Then you will (a) curse the debugger a bit 'cause it didn't stop where
> you told it to, and then (b) and step over a few times, (or uses "sif"
> which nobody knows to use), but with a lot of arguments the former can get
> annoying.  In this case is seems like we are removing potentially helpful
> information from the user when setting breakpoints.  And adding a
> --ignore-is-stmt option doesn't seem like the sort of thing anybody would
> know to use.  But maybe the compiler could be smarter about when it applies
> this is_stmt, so it would know to put on on lines 2,3, & 4 before the
> function calls, but not on line 2 in Vedant's example?  Not sure how hard
> that would be.
> >
> > We would also have to have a way to know when to trust the is_stmt for
> these purposes.  DWARF should really have a way to say features are taken
> seriously by the producer, so the debugger will know whether to trust them
> or not.  But that a slightly orthogonal issue...
> >
> > Yeah, it'd be nice to have some flag bits to say "hey, this is the
> /specific/ meaning of this flag/etc we're guaranteeing to implement in this
> output". Though I'm not sure that'd be necessary - I think Clang currently
> just puts is_stmt on everything, so if you implement the advanced behavior
> in LLDB and give it Clang's current output, it'd just degrade to the
> behavior we already see from LLDB - and when LLDB sees new/fancy Clang
> DWARF that is more judicious about is_stmt, it'd get better behavior.
> >
> >
> > Jim
> >
> >
> >
> > > On Aug 23, 2018, at 1:43 PM, David Blaikie <dblaikie at gmail.com> wrote:
> > >
> > >
> > >
> > > On Thu, Aug 23, 2018 at 1:00 PM Vedant Kumar <vsk at apple.com> wrote:
> > >
> > >> On Aug 23, 2018, at 12:22 PM, David Blaikie <dblaikie at gmail.com>
> wrote:
> > >>
> > >> +all the usual debugger folks
> > >>
> > >> I reckon adding nops would be a pretty unfortunate way to go in terms
> of code size, etc, if we could help it. One alternative that might work and
> we've tossed it around a bit - is emitting more accurate "is_stmt" flags.
> In your first example, the is_stmt flag could be set on the bar() call
> instruction - and any breakpoint set on an instruction that isn't "is_stmt"
> could instead set a breakpoint on the statement that covers that
> instruction (the nearest previous is_stmt instruction*)
> > >
> > > I hadn't considered using "is_stmt" at all, thanks for bringing that
> up!
> > >
> > > Given the rule you've outlined, I'm not sure that the debugger could
> do a good job in the following scenarios:
> > >
> > > Ah, one thing that might've been confusing is when I say "before" I
> mean "before in the instruction stream/line table" not "before in the
> source order".
> > >
> > >
> > >   1|  foo =
> > >   2|    bar();
> > >   3|  foo =    //< Given a breakpoint on line 3, would the debugger
> stop on line 2?
> > >   4|    bar();
> > >
> > >
> > > Given the line table (assuming a sort of simple pseudocode):
> > >
> > >   %x1 = call bar(); # line 2
> > >   store %x1 -> @foo # line 1
> > >   %x2 = call bar(); # line 4
> > >   store %x2 -> @foo # line 3
> > >
> > > We could group lines 1 and 2 as part of the same statement and lines 3
> and 4 as part of the same statement - then when the line table is emitted,
> the lexically (in the assembly, not in the source code) first instruction
> that's part of that statement gets the "is_stmt" flag. So in this case line
> 2 and 4 would have "is_stmt", a debugger, when requested to set a
> breakpoint on line 3 would look at the line table and say "well, I have
> this location on line 3, but it's not the start of a statement/not a good
> place to break, so I'll back up in the instruction stream (within a basic
> block - so that might get tricky for conditional operators and the like)
> until I find an instruction marked with "is_stmt" - and it would find the
> second call instruction and break there. The debugger could make the UI
> nicer (rather than showing line 4 when the user requested line 3) by
> highlighting the whole statement (all lines attributed to instructions
> between this "is_stmt" instruction and the next (Or the end of the basic
> block)).
> > >
> > > Does that make sense?
> > >
> > > or
> > >
> > >   1|  if (...)
> > >   2|    foo();
> > >   3|  else
> > >   4|    bar();
> > >   5|  baz =   //< Given a breakpoint on line 5, would the debugger
> stop on line 2, 4, or possibly either?
> > >   6|    func();
> > >
> > > Is there another way to apply and interpret "is_stmt" flags to resolve
> these sorts of ambiguities?
> > >
> > >
> > >> The inlining case seems to me like something that could be fixed
> without changes to the DWARF, purely in the consumer - the consumer has the
> info that the call occurs on a certain line and when I ask to break on that
> line it could break on that first instruction in the inlined subroutine
> (potentially giving me an artificial view that makes it look like I'm at
> the call site and leting me 'step' (though a no-op) into the inlined
> function).
> > >
> > > Oh, that's a great point. Yes, there is a TAG_inlined_subroutine for
> "inline_me" which contains AT_call_file and AT_call_line. That's enough
> information to do the right thing on the debugger side.
> > >
> > > thanks,
> > > vedant
> > >
> > >>
> > >> * This is a bit problematic when a statement gets interleaved/mixed
> up with other statements - DWARF has no equivalent of "ranges" for
> describing a statement. Likely not a problem at -O0 anyway, though (because
> little interleaving occurs there).
> > >>
> > >> On Wed, Aug 22, 2018 at 2:01 PM Vedant Kumar via cfe-dev <
> cfe-dev at lists.llvm.org> wrote:
> > >> Hello,
> > >>
> > >> I'd like to improve the situation with setting breakpoints on lines
> with assignments or inlinable calls. This email outlines problem areas,
> possible solutions, and why I think emitting extra nops at -O0 might be the
> best solution.
> > >>
> > >> # Problem 1: Assignments
> > >>
> > >> Counter to user expectation, a breakpoint on a line containing an
> assignment is reached when the assignment happens, not before the r.h.s is
> evaluated.
> > >>
> > >> ## Example: Can't step into bar()
> > >>
> > >>   1| foo = // Set a breakpoint here. Note that it's not possible to
> step into bar().
> > >>   2|   bar();
> > >>
> > >> One solution is to set the location of the assignment to the location
> of the r.h.s (line 2). The problem with this approach is that it becomes
> impossible to set a breakpoint on line 1.
> > >>
> > >> Another solution is to emit a nop (on line 1) prior to emitting the
> r.h.s, and to emit an artificial location on the assignment's store
> instruction. This makes it possible to step to line 1 before line 2, and
> prevents users from stepping back to line 1 after line 2.
> > >>
> > >> # Problem 2: Inlinable calls
> > >>
> > >> Instructions from an inlined function don't have debug locations
> within the caller. This can make it impossible to set a breakpoint on a
> line that contains a call to an inlined function.
> > >>
> > >> ## Example: Can't set a breakpoint on a call
> > >>
> > >> It's easier to see the bug via Godbolt: https://godbolt.org/z/scwF20.
> Note that it's not possible to set a breakpoint on line 9 (on "inline_me").
> At the IR-level, we do emit an unconditional branch with a location that's
> on line 9, but we have to drop that branch during ISel.
> > >>
> > >> The only solution to this problem (afaik) is to insert a nop before
> inlinable calls. In this example the nop would be on line 9.
> > >>
> > >> One alternative I've heard is to make the first inlined instruction
> look like it's located within the caller, but that actually introduces a
> bug. You wouldn't be able to set a breakpoint on the relevant location in
> the inlined callee.
> > >>
> > >> # Proposal
> > >>
> > >> As outlined above, I think the best way to address issues with
> setting breakpoints on assignments and calls is to insert nops with
> suitable locations at -O0. These nops would lower to a target-specific nop
> at -O0, and lower to nothing at -O1 or greater (like @llvm.donothing).
> > >>
> > >> The tentative plan is to introduce an intrinsic (say, @llvm.dbg.nop)
> to accomplish this.
> > >>
> > >> I don't anticipate there being a substantial compile-time impact, but
> haven't actually measured this yet. I'd like to get some feedback before
> going forward. Let me know what you think!
> > >>
> > >> thanks,
> > >> vedant
> > >>
> > >>
> > >> _______________________________________________
> > >> cfe-dev mailing list
> > >> cfe-dev at lists.llvm.org
> > >> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
> >
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20180823/6a24b896/attachment.html>


More information about the cfe-dev mailing list