[Lldb-commits] [PATCH] D94846: Allow breakpoints to be set on C++11 inline initializers

Tue Jan 19 15:58:14 PST 2021

> On Jan 19, 2021, at 3:34 PM, David Blaikie <dblaikie at gmail.com> wrote:
> 
> On Tue, Jan 19, 2021 at 2:55 PM Jim Ingham <jingham at apple.com> wrote:
>> 
>> 
>> 
>>> On Jan 19, 2021, at 11:40 AM, David Blaikie <dblaikie at gmail.com> wrote:
>>> 
>>> On Tue, Jan 19, 2021 at 10:17 AM Jim Ingham <jingham at apple.com> wrote:
>>>> 
>>>> 
>>>> 
>>>>> On Jan 17, 2021, at 10:47 AM, David Blaikie <dblaikie at gmail.com> wrote:
>>>>> 
>>>>> On Fri, Jan 15, 2021 at 6:22 PM Jim Ingham <jingham at apple.com> wrote:
>>>>>> 
>>>>>> If you set a breakpoint on source lines with no line table entries, lldb slides the actual match forward to the nearest line table entry to the given line number.  That's necessary for instance because if you lay out your function definitions over multiple lines they won't all get line table entries, but we don't want people to have to guess which of them was...  Same is true of more complicated compound statements.
>>>>> 
>>>>> Ah, sure - gdb seems to do that too, totally makes sense - as you say,
>>>>> it'd be pretty hard to know exactly which tokens end up with
>>>>> corresponding entries in the line table and which don't (especially
>>>>> under optimizations and complex compound statements).
>>>> 
>>>> Yeah, I think you either have to have this heuristic, or have a debugger that can display "valid breakpoint locations".  The old Metrowerks debugger used to do it that way.  They only put up breakpoint affordances in the UI for the lines that had line table entries and there was no free-form way to set breakpoints, so you couldn't do an ambiguous thing.  But I don't think we have that option in lldb.
>>> 
>>> Could be nice to have in source printing on the command line and could
>>> be done more fully in IDE integrations like XCode - but yeah, still
>>> would want all this sliding stuff for raw command line usage even if
>>> we had the rest.
>> 
>> If we just cared about lines and not columns, it would be pretty easy to annotate the source line output, maybe just a • before each breakable line.  Marking the column locations as well, while keeping the whole thing readable, is a bit trickier.
>> 
>> The problem with Xcode is that they don't do full compiles as you are editing.  They do run the front end for code completion, brace matching, etc...  But that doesn't generate line tables.  So you would have to support breakpoints in files where you do and don't know the valid line table entries.  If you didn't design for using the compiled code for UI in the IDE, it's not easy to add it in after the fact.
>> 
>>> 
>>>>>> So people like that, but they really hate it when they have:
>>>>>> 
>>>>>> 
>>>>>> #ifdef SOMETHING_NOT_DEFINED
>>>>>> 
>>>>>> int foo() {
>>>>>> 
>>>>>> }
>>>>>> 
>>>>>> #else
>>>>>> int bar() {
>>>>>> 
>>>>>> }
>>>>>> #end
>>>>>> 
>>>>>> but with lots more junk so you can't see the ifdef's and then they set a breakpoint in the "int foo" part and the breakpoint gets moved to bar instead and then doesn't get hit.
>>>>> 
>>>>> Ah, indeed - that is a curious case that could be surprising to a
>>>>> user. Thanks for explaining it - though it seems gdb doesn't special
>>>>> case this & does produce the not-quite-what-the-user-intended
>>>>> behavior.
>>>>> 
>>>>>> You might try to argue that they should have checked where the breakpoint actually landed before coming into your office to yell at you, but it's likely to leave you with fewer friends...
>>>>> 
>>>>> FWIW: I don't have coworkers or friends come into my office to yell at
>>>>> me about anything like this, and I don't really think anyone should -
>>>>> that's not appropriate behavior for a workplace.
>>>>> 
>>>>> Having a nuanced discussion about the tradeoff of features - sure.
>>>> 
>>>> Yeah, IME people get cheesed off at debuggers in ways they don't with compilers.  I have some theories, but more of the after work quality than the public mailing list quality.
>>>> 
>>>>>> So I was trying to detect this case and not move the breakpoint if sliding it crossed over the function start boundary.  That way you'd see that the breakpoint didn't work, and go figure out why.
>>>>> 
>>>>> Neat idea!
>>>>> 
>>>>>> The thinko in the original version was that we were still doing this when we DIDN'T have to slide the breakpoint, when we got an exact match in the line table.  In that case, we shouldn't try to second guess the line table at all.  That's the patch in this fix.
>>>>> 
>>>>> Might this still leave some confusing behavior. Now it'll slide
>>>>> forward into a function, but it won't slide forward into an
>>>>> initializer? (so now the user has to know exactly which lines of the
>>>>> initializer are attributed to the line table, making that somewhat
>>>>> difficult to use?)
>>>> 
>>>> 
>>>> Yes I knew this was a tradeoff of the implementation.  If I did a more in depth search in order to slide this over to function initializer, it would IMO add way more complexity to an already overly fiddly part of the debugger, and the trade-off didn't seem worth it.  Most initializers have a fairly simple structure, as opposed say to the before the body part of a function definition (that's got to have a name, but I don't know it off the top of my head.)  So I think the payoff would be a lot less in this area.  I asked around here and the folks I polled agreed that it wasn't worth the complexity.
>>>> 
>>>>> 
>>>>> Testing some things out:
>>>>> 
>>>>> $ cat break.cpp
>>>>> __attribute__((optnone)) int f1() { return 1; }
>>>>> struct t1 {
>>>>> int     // 3
>>>>>    i   // 4
>>>>>    =   // 5
>>>>>    f1  // 6
>>>>>    ();
>>>>> t1() {
>>>>> }
>>>>> };
>>>>> 
>>>>> t1 v1;
>>>>> int main() {
>>>>> }
>>>>> 
>>>>> The line table:
>>>>> 
>>>>> Address            Line   Column File   ISA Discriminator Flags
>>>>> ------------------ ------ ------ ------ --- ------------- -------------
>>>>> 0x0000000000401140      1      0      1   0             0  is_stmt
>>>>> 0x0000000000401144      1     37      1   0             0  is_stmt prologue_end
>>>>> 0x0000000000401150     13      0      1   0             0  is_stmt
>>>>> 0x0000000000401154     14      1      1   0             0  is_stmt prologue_end
>>>>> 0x0000000000401158     14      1      1   0             0  is_stmt end_sequence
>>>>> 0x0000000000401020      0      0      1   0             0  is_stmt
>>>>> 0x0000000000401024     12      4      1   0             0  is_stmt prologue_end
>>>>> 0x0000000000401040      0      0      1   0             0  is_stmt
>>>>> 0x000000000040104b      0      0      1   0             0  is_stmt end_sequence
>>>>> 0x0000000000401160      8      0      1   0             0  is_stmt
>>>>> 0x0000000000401174      6      7      1   0             0  is_stmt prologue_end
>>>>> 0x000000000040117f      4      7      1   0             0  is_stmt
>>>>> 0x0000000000401181      9      3      1   0             0  is_stmt
>>>>> 0x0000000000401187      9      3      1   0             0  is_stmt end_sequence
>>>>> 
>>>>> Has entries for line 4 and 6 from the initializer (and 8 and 9 from
>>>>> the ctor), but gdb doesn't seem to want to break on any of them. So
>>>>> I'm not sure what gdb's doing, but it seems pretty unhelpful
>>>>> (apparently gdb breaks in to the ctor too late to even let you step
>>>>> into the initializers... guess that's because the initializers are
>>>>> called in the prologue according to the debug info):
>>>>> (gdb) b 1
>>>>> Breakpoint 1 at 0x401144: file break.cpp, line 1.
>>>>> (gdb) b 2
>>>>> Breakpoint 2 at 0x401181: file break.cpp, line 9.
>>>>> (gdb) b 3
>>>>> Note: breakpoint 2 also set at pc 0x401181.
>>>>> Breakpoint 3 at 0x401181: file break.cpp, line 9.
>>>>> (gdb) b 4
>>>>> Note: breakpoints 2 and 3 also set at pc 0x401181.
>>>>> Breakpoint 4 at 0x401181: file break.cpp, line 9.
>>>>> (gdb) b 5
>>>>> Note: breakpoints 2, 3 and 4 also set at pc 0x401181.
>>>>> Breakpoint 5 at 0x401181: file break.cpp, line 9.
>>>>> (gdb) b 6
>>>>> Note: breakpoints 2, 3, 4 and 5 also set at pc 0x401181.
>>>>> Breakpoint 6 at 0x401181: file break.cpp, line 9.
>>>>> (gdb) b 7
>>>>> Note: breakpoints 2, 3, 4, 5 and 6 also set at pc 0x401181.
>>>>> Breakpoint 7 at 0x401181: file break.cpp, line 9.
>>>>> (gdb) b 8
>>>>> Note: breakpoints 2, 3, 4, 5, 6 and 7 also set at pc 0x401181.
>>>>> Breakpoint 8 at 0x401181: file break.cpp, line 9.
>>>>> (gdb) b 9
>>>>> Note: breakpoints 2, 3, 4, 5, 6, 7 and 8 also set at pc 0x401181.
>>>>> Breakpoint 9 at 0x401181: file break.cpp, line 9.
>>>>> 
>>>>> lldb does already seem to let me break on the initializer without this
>>>>> patch - so any idea what part of this already works compared to what's
>>>>> being fixed here? (though lldb breaks on line 6 even when I asked to
>>>>> break on line 7 or 8, but it doesn't break on 6 when I ask to break on
>>>>> 6... )
>>>> 
>>>> 
>>>> Yes.  This is because the line table entry for the "first line" of a function is also imprecise.  For instance, if you do:
>>>> 
>>>> int
>>>> foo()
>>>> {
>>>> 
>>>> }
>>>> 
>>>> the "int" line in the function definition pretty much never gets a line table entry, yet that's a perfectly reasonable place to set a breakpoint on the function.  So the "am I sliding across function definitions" check has a little slop in the comparison.  Just put a handful of blank lines between the "();" of the struct initialization and the definition of the constructor, and then the breakpoint won't take.
>>> 
>>> Ah, right right - I was still a bit confused by why the line was
>>> showing as line 6, but I get it - it's because that's where the first
>>> instruction for function is. Hmm, actually that doesn't quite make
>>> sense to me still - if I set a breakpoint at line 9, why wouldn't I
>>> get a breakpoint on that line? I guess maybe lldb detects that that is
>>> the line where a function starts according to the DW_AT_decl_line in
>>> the DW_TAG_subprogram, and instead of breaking on line 9 per the line
>>> table, it breaks on the start of the function which is line 6? Though
>>> when the program... huh, it does stop at 6, even though I said break 9
>>> (well, I said break 13 in my new example where I've added the
>>> suggested handful of blank lines)
>>> 
>>> Seems a bit surprising/not something I can readily explain:
>>> 
>>> $ cat -n test.cpp
>>>    1  __attribute__((optnone)) int f1() { return 1; }
>>>    2  struct t1 {
>>>    3    int     // 3
>>>    4        i   // 4
>>>    5        =   // 5
>>>    6        f1  // 6
>>>    7        ();
>>>    8
>>>    9
>>>   10
>>>   11
>>>   12
>>>   13    t1() {}
>>>   14  };
>>>   15  t1 v1;
>>>   16  int main() {}
>>> blaikie at blaikie-linux2:~/dev/scratch$ lldb ./a.out
>>> (lldb) target create "./a.out"
>>> Current executable set to
>>> '/usr/local/google/home/blaikie/dev/scratch/a.out' (x86_64).
>>> (lldb) b 13
>>> Breakpoint 1: where = a.out`t1::t1() + 20 at test.cpp:6:7, address =
>>> 0x0000000000401174
>>> (lldb) r
>>> Process 554103 launched:
>>> '/usr/local/google/home/blaikie/dev/scratch/a.out' (x86_64)
>>> Process 554103 stopped
>>> * thread #1, name = 'a.out', stop reason = breakpoint 1.1
>>>   frame #0: 0x0000000000401174 a.out`t1::t1(this=0x000000000040402c)
>>> at test.cpp:6:7
>>>  3      int     // 3
>>>  4          i   // 4
>>>  5          =   // 5
>>> -> 6          f1  // 6
>>>  7          ();
>>>  8
>>>  9
>>> (lldb)
>> 
>> There's another "fiddly" bit of the breakpoint setting that likely explains what you are seeing.  If you set breakpoint (either by name or by file & line) that happens to fall inside the prologue of a function, lldb moves the breakpoint to the first instruction after the prologue.  If we don't do this then when you stop in the function at this breakpoint, argument values are likely to be incorrect, since the debug information generally only gives correct locations after the prologue.  It also makes getting the backtrace right trickier since you then have to understand where in the prologue you are.
>> 
>> Since not very many people actually need to debug the prologue sequence of a function, this avoids a lot of confusion at the cost of a just a smidge more mystery.  gdb does this as well (or at least it did the last time I played with it.)  There is a setting to not skip the prologue, which you need to do for instance if you want to ensure that you stop while the arguments are still all in their ABI specified locations.  But the default is to do the slide.
>> 
>> So if the first thing that happens after the prologue in t1() is the initialization of i, then the sliding of the breakpoint past the prologue would move it from line 13 to line 6, as you saw.  And we don't let the prologue slide cross function boundaries so we don't do a "cross function boundaries" check in that move.
> 
> Hmm, yep, that seems to be it. Which just makes me more confused about
> what gdb's doing then (when I set a breakpoint at the function, it
> seems to be breaking after all the initializers have run - well beyond
> the prologue - perhaps that's what they've done to try to get the user
> to where the member initializers have finished running).

Interesting.  I haven't worked on gdb in decades so I don't have any insight there.

I wouldn't personally move the breakpoint of a constructor past the initializers.  Seems like getting back into the initializers if you wanted to would be tedious.  And if you were counting on it on the first hit you would have already missed the event you were looking for.  But I could see somebody getting annoyed with stepping through all those boring initializers and doing it the other way.

> 
> Thanks for talking me through it.
> 

Thanks for giving this a thorough look!

Jim

> - Dave
> 
>> 
>> If somebody has some spare time on their hands it might be fun to try to build up a little report describing these moves, and then store that in the location and print it out with "break list -v".  That would help when you are puzzling over how your breakpoint got where it did...
>> 
>> Jim
>> 
>> 
>> 
>>> 
>>>>> (lldb) target create "./a.out"
>>>>> Current executable set to
>>>>> '/usr/local/google/home/blaikie/dev/scratch/a.out' (x86_64).
>>>>> (lldb) b 1
>>>>> Breakpoint 1: where = a.out`f1() + 4 at break.cpp:1:37, address =
>>>>> 0x0000000000401144
>>>>> (lldb) b 2
>>>>> Breakpoint 2: no locations (pending).
>>>>> WARNING:  Unable to resolve breakpoint to any actual locations.
>>>>> (lldb) b 3
>>>>> Breakpoint 3: no locations (pending).
>>>>> WARNING:  Unable to resolve breakpoint to any actual locations.
>>>>> (lldb) b 4
>>>>> Breakpoint 4: no locations (pending).
>>>>> WARNING:  Unable to resolve breakpoint to any actual locations.
>>>>> (lldb) b 5
>>>>> Breakpoint 5: no locations (pending).
>>>>> WARNING:  Unable to resolve breakpoint to any actual locations.
>>>>> (lldb) b 6
>>>>> Breakpoint 6: no locations (pending).
>>>>> WARNING:  Unable to resolve breakpoint to any actual locations.
>>>>> (lldb) b 7
>>>>> Breakpoint 7: where = a.out`t1::t1() + 20 at break.cpp:6:7, address =
>>>>> 0x0000000000401174
>>>>> (lldb) b 8
>>>>> Breakpoint 8: where = a.out`t1::t1() + 20 at break.cpp:6:7, address =
>>>>> 0x0000000000401174
>>>>> (lldb) b 9
>>>>> Breakpoint 9: where = a.out`t1::t1() + 33 at break.cpp:9:3, address =
>>>>> 0x0000000000401181
>>>>> (lldb) r
>>>>> 
>>>>>> BTW, the check wouldn't have affected code from .defs files because I only do it if the original breakpoint specification and the "function start" are from the same source file.
>>>>> 
>>>>> Sure enough - no doubt someone out there has some file that #includes
>>>>> itself in some entertaining way, but likely quite rare.
>>>> 
>>>> Given that the penalty you have to pay for such a deed is having to be a little more careful how you set breakpoints, I don't feel the need for super-human efforts in this area.
>>>> 
>>>> Jim
>>>> 
>>>> 
>>>>> 
>>>>>> And we know about inlined blocks so inlining isn't going to fool us either.
>>>>>> 
>>>>>> Jim
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>>> On Jan 15, 2021, at 5:09 PM, David Blaikie <dblaikie at gmail.com> wrote:
>>>>>>> 
>>>>>>> "But because their source lines are outside the function source range"
>>>>>>> 
>>>>>>> Not sure I understand that - the DWARF doesn't describe a function
>>>>>>> source range, right? Only the line a function starts on. And a
>>>>>>> function can have code from source lines in many files/offsets that
>>>>>>> are unrelated to the function start line (LLVM in several places
>>>>>>> #includes .def files into functions to stamp out tables, switches,
>>>>>>> arrays, etc, for instance)
>>>>>>> 
>>>>>>> On Fri, Jan 15, 2021 at 4:47 PM Jim Ingham via Phabricator via
>>>>>>> lldb-commits <lldb-commits at lists.llvm.org> wrote:
>>>>>>>> 
>>>>>>>> jingham created this revision.
>>>>>>>> jingham requested review of this revision.
>>>>>>>> Herald added a project: LLDB.
>>>>>>>> Herald added a subscriber: lldb-commits.
>>>>>>>> 
>>>>>>>> The inline initializers contribute code to the constructor(s).  You will step onto them in the source view as you step through the constructor, for instance.  But because their source lines are outside the function source range, lldb thought a breakpoint on the initializer line was crossing from one function to another, which file & line breakpoints don't allow.  That meant if you tried to set a breakpoint on one of these lines it doesn't create any locations.
>>>>>>>> 
>>>>>>>> This patch fixes that by asserting that if the LineEntry in one of the SymbolContexts that the search produced exactly matches the file & line specifications in the breakpoint, it has to be a valid place to set the breakpoint, and we should just set it.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> Repository:
>>>>>>>> rG LLVM Github Monorepo
>>>>>>>> 
>>>>>>>> https://reviews.llvm.org/D94846
>>>>>>>> 
>>>>>>>> Files:
>>>>>>>> lldb/source/Breakpoint/BreakpointResolverFileLine.cpp
>>>>>>>> lldb/test/API/lang/cpp/break-on-initializers/Makefile
>>>>>>>> lldb/test/API/lang/cpp/break-on-initializers/TestBreakOnCPP11Initializers.py
>>>>>>>> lldb/test/API/lang/cpp/break-on-initializers/main.cpp
>>>>>>>> 
>>>>>>>> _______________________________________________
>>>>>>>> lldb-commits mailing list
>>>>>>>> lldb-commits at lists.llvm.org
>>>>>>>> https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits