[cfe-dev] RFC: Remove uninteresting debug locations at -O0

David Blaikie via cfe-dev cfe-dev at lists.llvm.org
Fri May 1 08:35:55 PDT 2020


On Thu, Apr 30, 2020 at 8:42 PM Adrian Prantl <aprantl at apple.com> wrote:

>
>
> On Apr 29, 2020, at 2:58 PM, David Blaikie <dblaikie at gmail.com> wrote:
>
>
>
> On Wed, Apr 29, 2020 at 2:52 PM Reid Kleckner <rnk at google.com> wrote:
>
>> Sure, I can try to elaborate, but I am assuming a bit about what the
>> user's desired stepping behavior is. I am assuming in this case that the
>> user is doing the equivalent of `s` or `n` in gdb, and they want to get
>> from one semicolon to the next. If that is not the case, if the user wants
>> the debugger to stop at the location of both getFoo() calls in your
>> example, my suggestion isn't helpful.
>>
>> However, since the last dev meeting, I have been thinking about having IR
>> intrinsics that mark the IR position where a statement begins. The
>> intrinsic would carry a source location. The next instruction emitted with
>> that source location would carry the DWARF is_stmt line table flag.
>>
>> Making this idea work with optimizations is harder, since the
>> instructions that make up a statement may all be removed or hoisted. To
>> make this work, we would have to establish which instructions properly
>> belong to the statement. This could be done by adding a level to the
>> DIScope hierarchy. The statement markers would remain in place, and code
>> motion would happen around them.
>>
>
> In this case there would be no intrinsic, just a new kind of "statement"
> DIScope? yeah, that's the best idea I've had/heard of/discussed for
> statement grouping so far. Certainly seems eminently prototype-able & then
> get a sense for just how expensive scoping everything like that would be.
>
>
>> Late in the codegen pipeline, the first instruction belonging to the most
>> recently activated statement will be emitted with the is_stmt flag.
>>
>> On Wed, Apr 29, 2020 at 9:36 AM Adrian Prantl <aprantl at apple.com> wrote:
>>
>>> Thanks to all of you for sharing your perspective! You all brought up
>>> important aspects that I hadn't considered. The argument that really
>>> clicked for me was Pavel's that you might want to change the value before
>>> the load happens.
>>>
>>> One of my worries here is that what I called "less interesting"
>>> locations might delete more interesting ones when instructions and their
>>> locations are merged. But if we indeed consider the stack slot loads to be
>>> interesting then that is really the best we can do. After all, storing more
>>> than one source location per instruction would be a UI design nightmare for a
>>> debugger.
>>>
>>
> Actually I'd kind of love this - I think it'd be really great to describe
> source ranges instead of source locations - probably super size costly
> though (& intermediate compiler memory usage, etc). Being able to describe
> nesting, etc - the same way Clang does with expression/subexpression
> highlighting I think would be wonderful - imagine if every instruction
> weren't just attributed to one location, but to a source range with a
> preferred location (instead of just pointing to the "+" for an add, be able
> to highlight the LHS and RHS of that plus expression, etc).
>
>
> I've been thinking about this, too. An extreme point of view is that
> today's editors are so syntax-aware that they wouldn't need the debugger to
> tell them where an expression ends, as long as the start point is
> unambiguous. For example, LLDB already uses clang for syntax highlighting.
> But having it pre-encoded makes a lot of things easier and unambiguous.
>
> For the "+" infix operator example, you really need a third value in
> addition to the (start, length) that is otherwise sufficient. But in the
> normal case the length would be very short, so we could probably come up
> with a cheap encoding for it.
>

Arguably if you are relying on the syntax-awareness of the debugger/editor,
perhaps you could rely on that to know precedence, grouping, etc. So the
'+' would unambiguously describe the range? I think there are probably a lot
of cases where you'd have to define some very subtle/specific
contract between producer and consumer to say "this position refers to this
conceptual entity, this position refers to this one, etc... " (I'm thinking
of for loops - how to describe different parts of the loop/body/etc -
the sort of thing that's been discussed again on a few bugs lately)
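
A minimal sketch of the (start, length) plus third-value encoding for the
"+" example, assuming column-based positions within a single source line.
The class name InstrRange and its fields are hypothetical, made up purely
for illustration; nothing like this exists in LLVM or DWARF today.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InstrRange:
    """Hypothetical per-instruction source range with a preferred location."""

    start: int      # 0-based column where the expression begins
    length: int     # number of columns the expression covers
    preferred: int  # offset from start of the preferred location (e.g. the operator)

    def underline(self) -> str:
        """Render a marker line: '~' over the range, '^' at the preferred column."""
        marks = ["~"] * self.length
        marks[self.preferred] = "^"
        return " " * self.start + "".join(marks)


line = "x = lhs + rhs;"
expr = "lhs + rhs"

# The add instruction covers the whole "lhs + rhs" expression and
# prefers the "+" itself, rather than pointing at the "+" alone.
add = InstrRange(
    start=line.index(expr),
    length=len(expr),
    preferred=line.index("+") - line.index(expr),
)

print(line)
print(add.underline())
```

The third value is stored as an offset from the start, so in the common
case where the range is short all three numbers stay small, which is what
would make a cheap encoding plausible.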


>
> -- adrian
>
>
>
>
>>
>>>
>>> Reid, I would like to learn more about what you mean by statement
>>> tracking. I'm thinking of statements as the expressions separated by
>>> semicolons, but since my whole example was a single statement I'm assuming
>>> you have something more fine-grained in mind. Perhaps you can post an
>>> example to illustrate what you have in mind?
>>>
>>>
>>> thanks!
>>> -- adrian
>>
>>
>