[cfe-dev] Setting breakpoints before assignments or calls

Thu Aug 23 17:21:48 PDT 2018

On Thu, Aug 23, 2018 at 3:55 PM Jim Ingham <jingham at apple.com> wrote:

>
>
> > On Aug 23, 2018, at 2:53 PM, David Blaikie <dblaikie at gmail.com> wrote:
> >
> >
> >
> > On Thu, Aug 23, 2018 at 2:35 PM Jim Ingham <jingham at apple.com> wrote:
> >
> >
> > > On Aug 23, 2018, at 2:12 PM, David Blaikie <dblaikie at gmail.com> wrote:
> > >
> > >
> > >
> > > On Thu, Aug 23, 2018 at 2:03 PM Jim Ingham <jingham at apple.com> wrote:
> > > I'm not sure we always want to have the debugger ignore the inter-line
> line-table entries for complex expressions.  In a case like:
> > >
> > > 1:     x = foo(5,
> > > 2:           bar(),
> > > 3:           baz()
> > > 4:           after_baz());
> > >
> > > if I want to step into "baz()", it's convenient to break on the line 3
> and do a step-in.  If we move to the is_stmt line and that's somewhere at
> the beginning of the function call, then that effort will be thwarted.
> > >
> > > I'd figure some UI could improve that - pass a flag to break (or hold
> down shift when you click a line to set a breakpoint on in a GUI) when you
> specifically want to break on a certain line? Or assume a user setting a
> breakpoint on something that's not the first source line (not the lowest
> line number of all lines associated with that statement (all lines from all
> instructions from the nearest previous is_stmt instruction to the nearest
> following is_stmt instruction)) they mean to precisely break on that line -
> but otherwise assume they mean to break before the first instruction on the
> whole statement.
> >
> > The latter seems workable.  Note also, the definition for "is_stmt" says:
> >
> > A boolean indicating that the current instruction is a recommended
> breakpoint location. A recommended breakpoint location is intended to
> “represent” a line, a statement and/or a semantically distinct subpart of a
> statement.
> >
> > So it seems reasonable if we're going to start doing this to consider
> nested function calls in a statement semantically distinct subparts of a
> statement.  Actually this would also be useful when you have:
> >
> >       foo (bar(), baz(),
> >            after_bar(), after_baz());
> >
> > You do get separate column entries for the function calls, but we have
> no idea what that means and if we just set a breakpoint on every individual
> line entry as currently emitted by clang we end up with annoyingly many
> breakpoints at present.  Marking actually interesting subexpressions would
> help here too.  Anyway, if we started using is_stmt for these subparts then
> we wouldn't have to fix things up in the debugger.
> >
> > I've some apprehension for having the compiler make particularly
> subjective judgments about "semantically distinct subpart of a statement" -
> especially around operator overloads, for instance. Not without
> merit/certainly something to consider, but my gut reaction is to lean away
> from that - because the judgment might vary & I could imagine different
> users wanting different expereinces at different times/situations/etc - so
> that it'd be useful for consumers to be making some of those judgments,
> showing them to the user as options, etc.
>
> Not sure about this.
>
> Right now, if there are many line table entries that map to a given line
> lldb will choose the first one by address within a given block.  That's
> because the line tables are really noisy, and our experience was that if I
> set a breakpoint location per entry, stepping gets annoyingly jerky and you
> have to keep hitting step over and over.  That's really too draconian and
> you can't get back to a really independent subsection of a line.  Note to
> self - I should try playing around with one location per distinct column -
> I haven't revisited the breakpoint by line setting since I was told clang
> was serious about column info.  It would be interesting to see how that
> works.
>

Not sure I quite follow all of that - yeah, having the debugger stop every
time it hits every distinct line table entry for a given source line would
be annoying. But allowing the user to do that if they want in certain
situations could be good... but yeah, mostly comes back to wanting
ranges/hierarchy rather than isolated locations.

> But if you emitted the inter-line entries you currently do but added "I
> think these are the important ones" with is_stmt, that wouldn't remove
> information, and the debugger could still offer different experiences
> directed by the user.  It would just make the default behavior a little
> nicer.
>

*nod* It wouldn't be worse than today, but it would be worse for a consumer
that did want to do only full source level statements - with is_stmt on
more subjective "interesting" things, that'd be baked in by the compiler &
less flexibility for users than if it was per statement & the consumer/user
could tweak as desired from there.

> Given a single line with "foo(bar(), baz())" the user doesn't have the
> ability to step into the call lines - I'm not sure that wrapping a line
> should make a huge difference to debuggability (admittedly the inverse is
> true - taking two statements and writing them on one line does degrade
> debugging experience) - seems like it'd provide awkward incentives for
> users to layout their code to play to these debugging issues.
> >
> > Column info - especially for a GUI debugger, could be super helpful -
> you could set a breakpoint on specific calls sites which could be nice.
> >
> >
> > Massively long-term: it'd be awesome to be able to encode something like
> Clang's source ranges into DWARF. Basically attributing source ranges with
> a "preferred location" (eg: the assignment operator's range covers the
> whole "x = y" and the preferred location points to the "=" - as with Clang
> diagnostics) so that users can see the hierarchy of evaluation, etc. I
> think I remember throwing some ideas for this around with Chandler a few
> years ago when I corrected a bunch of source location stuff (there's still
> a bug or two outstanding with that with regards to loops especially -
> Adrian and I spent some time discussing that - oh, right, some weird things
> about how/where GCC breaks and doesn't... ). Dunno what that looks like -
> maybe something sort of like/related to/using/extending Cary's two level
> line tables to include effectively scopes for expressions and
> subexpressions. Then a user could choose to step into an expression
> evaluation or skip over it & the debugger could highlight the source ranges
> rather than lines which would be more meaningful to the user about where in
> the expression evaluation the program is at.
>
> Caroline Tice (who was one of the original lldb authors)'s PhD thesis had
> a section on expressing the nesting of operations - IIRC she called them
> atoms.  We talked about this some in the early days but didn't get much
> traction on the compiler side (at that time we were still using gcc as the
> front end).  So this ended up being only talk.  But it would be really
> handy to know the scope and not just the initial point of the expressions
> and subexpressions in the debug info.
>

Ah, cool.

- Dave

>
> Jim
>
>
> >
> >
> > Jim
> >
> >
> > >
> > > Then you will (a) curse the debugger a bit 'cause it didn't stop where
> you told it to, and then (b) and step over a few times, (or uses "sif"
> which nobody knows to use), but with a lot of arguments the former can get
> annoying.  In this case is seems like we are removing potentially helpful
> information from the user when setting breakpoints.  And adding a
> --ignore-is-stmt option doesn't seem like the sort of thing anybody would
> know to use.  But maybe the compiler could be smarter about when it applies
> this is_stmt, so it would know to put on on lines 2,3, & 4 before the
> function calls, but not on line 2 in Vedant's example?  Not sure how hard
> that would be.
> > >
> > > We would also have to have a way to know when to trust the is_stmt for
> these purposes.  DWARF should really have a way to say features are taken
> seriously by the producer, so the debugger will know whether to trust them
> or not.  But that a slightly orthogonal issue...
> > >
> > > Yeah, it'd be nice to have some flag bits to say "hey, this is the
> /specific/ meaning of this flag/etc we're guaranteeing to implement in this
> output". Though I'm not sure that'd be necessary - I think Clang currently
> just puts is_stmt on everything, so if you implement the advanced behavior
> in LLDB and give it Clang's current output, it'd just degrade to the
> behavior we already see from LLDB - and when LLDB sees new/fancy Clang
> DWARF that is more judicious about is_stmt, it'd get better behavior.
> > >
> > >
> > > Jim
> > >
> > >
> > >
> > > > On Aug 23, 2018, at 1:43 PM, David Blaikie <dblaikie at gmail.com>
> wrote:
> > > >
> > > >
> > > >
> > > > On Thu, Aug 23, 2018 at 1:00 PM Vedant Kumar <vsk at apple.com> wrote:
> > > >
> > > >> On Aug 23, 2018, at 12:22 PM, David Blaikie <dblaikie at gmail.com>
> wrote:
> > > >>
> > > >> +all the usual debugger folks
> > > >>
> > > >> I reckon adding nops would be a pretty unfortunate way to go in
> terms of code size, etc, if we could help it. One alternative that might
> work and we've tossed it around a bit - is emitting more accurate "is_stmt"
> flags. In your first example, the is_stmt flag could be set on the bar()
> call instruction - and any breakpoint set on an instruction that isn't
> "is_stmt" could instead set a breakpoint on the statement that covers that
> instruction (the nearest previous is_stmt instruction*)
> > > >
> > > > I hadn't considered using "is_stmt" at all, thanks for bringing that
> up!
> > > >
> > > > Given the rule you've outlined, I'm not sure that the debugger could
> do a good job in the following scenarios:
> > > >
> > > > Ah, one thing that might've been confusing is when I say "before" I
> mean "before in the instruction stream/line table" not "before in the
> source order".
> > > >
> > > >
> > > >   1|  foo =
> > > >   2|    bar();
> > > >   3|  foo =    //< Given a breakpoint on line 3, would the debugger
> stop on line 2?
> > > >   4|    bar();
> > > >
> > > >
> > > > Given the line table (assuming a sort of simple pseudocode):
> > > >
> > > >   %x1 = call bar(); # line 2
> > > >   store %x1 -> @foo # line 1
> > > >   %x2 = call bar(); # line 4
> > > >   store %x2 -> @foo # line 3
> > > >
> > > > We could group lines 1 and 2 as part of the same statement and lines
> 3 and 4 as part of the same statement - then when the line table is
> emitted, the lexically (in the assembly, not in the source code) first
> instruction that's part of that statement gets the "is_stmt" flag. So in
> this case line 2 and 4 would have "is_stmt", a debugger, when requested to
> set a breakpoint on line 3 would look at the line table and say "well, I
> have this location on line 3, but it's not the start of a statement/not a
> good place to break, so I'll back up in the instruction stream (within a
> basic block - so that might get tricky for conditional operators and the
> like) until I find an instruction marked with "is_stmt" - and it would find
> the second call instruction and break there. The debugger could make the UI
> nicer (rather than showing line 4 when the user requested line 3) by
> highlighting the whole statement (all lines attributed to instructions
> between this "is_stmt" instruction and the next (Or the end of the basic
> block)).
> > > >
> > > > Does that make sense?
> > > >
> > > > or
> > > >
> > > >   1|  if (...)
> > > >   2|    foo();
> > > >   3|  else
> > > >   4|    bar();
> > > >   5|  baz =   //< Given a breakpoint on line 5, would the debugger
> stop on line 2, 4, or possibly either?
> > > >   6|    func();
> > > >
> > > > Is there another way to apply and interpret "is_stmt" flags to
> resolve these sorts of ambiguities?
> > > >
> > > >
> > > >> The inlining case seems to me like something that could be fixed
> without changes to the DWARF, purely in the consumer - the consumer has the
> info that the call occurs on a certain line and when I ask to break on that
> line it could break on that first instruction in the inlined subroutine
> (potentially giving me an artificial view that makes it look like I'm at
> the call site and leting me 'step' (though a no-op) into the inlined
> function).
> > > >
> > > > Oh, that's a great point. Yes, there is a TAG_inlined_subroutine for
> "inline_me" which contains AT_call_file and AT_call_line. That's enough
> information to do the right thing on the debugger side.
> > > >
> > > > thanks,
> > > > vedant
> > > >
> > > >>
> > > >> * This is a bit problematic when a statement gets interleaved/mixed
> up with other statements - DWARF has no equivalent of "ranges" for
> describing a statement. Likely not a problem at -O0 anyway, though (because
> little interleaving occurs there).
> > > >>
> > > >> On Wed, Aug 22, 2018 at 2:01 PM Vedant Kumar via cfe-dev <
> cfe-dev at lists.llvm.org> wrote:
> > > >> Hello,
> > > >>
> > > >> I'd like to improve the situation with setting breakpoints on lines
> with assignments or inlinable calls. This email outlines problem areas,
> possible solutions, and why I think emitting extra nops at -O0 might be the
> best solution.
> > > >>
> > > >> # Problem 1: Assignments
> > > >>
> > > >> Counter to user expectation, a breakpoint on a line containing an
> assignment is reached when the assignment happens, not before the r.h.s is
> evaluated.
> > > >>
> > > >> ## Example: Can't step into bar()
> > > >>
> > > >>   1| foo = // Set a breakpoint here. Note that it's not possible to
> step into bar().
> > > >>   2|   bar();
> > > >>
> > > >> One solution is to set the location of the assignment to the
> location of the r.h.s (line 2). The problem with this approach is that it
> becomes impossible to set a breakpoint on line 1.
> > > >>
> > > >> Another solution is to emit a nop (on line 1) prior to emitting the
> r.h.s, and to emit an artificial location on the assignment's store
> instruction. This makes it possible to step to line 1 before line 2, and
> prevents users from stepping back to line 1 after line 2.
> > > >>
> > > >> # Problem 2: Inlinable calls
> > > >>
> > > >> Instructions from an inlined function don't have debug locations
> within the caller. This can make it impossible to set a breakpoint on a
> line that contains a call to an inlined function.
> > > >>
> > > >> ## Example: Can't set a breakpoint on a call
> > > >>
> > > >> It's easier to see the bug via Godbolt:
> https://godbolt.org/z/scwF20. Note that it's not possible to set a
> breakpoint on line 9 (on "inline_me"). At the IR-level, we do emit an
> unconditional branch with a location that's on line 9, but we have to drop
> that branch during ISel.
> > > >>
> > > >> The only solution to this problem (afaik) is to insert a nop before
> inlinable calls. In this example the nop would be on line 9.
> > > >>
> > > >> One alternative I've heard is to make the first inlined instruction
> look like it's located within the caller, but that actually introduces a
> bug. You wouldn't be able to set a breakpoint on the relevant location in
> the inlined callee.
> > > >>
> > > >> # Proposal
> > > >>
> > > >> As outlined above, I think the best way to address issues with
> setting breakpoints on assignments and calls is to insert nops with
> suitable locations at -O0. These nops would lower to a target-specific nop
> at -O0, and lower to nothing at -O1 or greater (like @llvm.donothing).
> > > >>
> > > >> The tentative plan is to introduce an intrinsic (say,
> @llvm.dbg.nop) to accomplish this.
> > > >>
> > > >> I don't anticipate there being a substantial compile-time impact,
> but haven't actually measured this yet. I'd like to get some feedback
> before going forward. Let me know what you think!
> > > >>
> > > >> thanks,
> > > >> vedant
> > > >>
> > > >>
> > > >> _______________________________________________
> > > >> cfe-dev mailing list
> > > >> cfe-dev at lists.llvm.org
> > > >> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
> > >
> >
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20180823/bdadfaf0/attachment.html>