[cfe-dev] RFC: Remove uninteresting debug locations at -O0

Wed Apr 29 07:59:12 PDT 2020

Broadly, I would rather put effort into statement location tracking than
go down this path. With statement markers, the debugger could step over
the whole statement, if that's the desired stepping behavior.

I think that the source location of loads from allocas is often interesting
in C++, where the majority of class local variables end up being
address-taken and mem2reg / SROA do not fire. Consider
llvm::SmallVectorBase::grow_pod, which makes every SmallVector of
unknowable length address-taken.

If I were to put a watchpoint on the variable, I would want precise source
location information for the load that accesses it.

Instrumentation tools (PGO & ASan) will probably want fine-grained source
location information for cases like this:
  int x;
  escape(&x);
  return someCondition ? x : barFunc();
The load from x is trivial, but it will be the only source location in its
basic block.
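
For anyone who wants to look at the IR for this case, here is a minimal
self-contained sketch of the fragment above. The bodies of escape() and
barFunc() and the wrapper name pick() are placeholders I made up so that it
compiles; only the three lines inside pick() come from the example.

```cpp
// Hypothetical stand-ins so the fragment compiles on its own.
static int *g_escaped = nullptr;

// Takes the address of the caller's variable, so it cannot be promoted
// to a register, and gives it a value so the later load is well-defined.
void escape(int *p) {
  g_escaped = p;
  *p = 7;
}

int barFunc() { return 42; }

int pick(bool someCondition) {
  int x;
  escape(&x);
  // The "true" arm of the conditional contains nothing but the load of x.
  return someCondition ? x : barFunc();
}
```

Building this with clang -O0 -g -S -emit-llvm should show the load of x
sitting in its own block for the true arm of the conditional.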

My interest in tracking statement boundaries has been renewed because we
need it to support the Visual Studio "Set Next Statement" feature:
https://crbug.com/1061084

I wouldn't strongly object to moving in the direction you are proposing if
you go with it. I just wanted to raise an alternative for discussion.

On Tue, Apr 28, 2020 at 1:57 PM Adrian Prantl <aprantl at apple.com> wrote:

> Getting source location information right is tricky and all about finding
> a balance.
> Recently, I was wondering why stepping through this contrived example
>
>    1       struct Foo {
>    2         Foo *getFoo() { return this; }
>    3       };
>    4
>    5       int main(int argc, char **argv) {
>    6         Foo *foo = new Foo();
>    7         foo->getFoo()->getFoo();
>    8         return 0;
>    9       }
>
> LLDB was showing the column marker as
>
>    7         foo->getFoo()->getFoo();
>              ^^^
>
> focusing on foo instead of on the method call getFoo() that I was
> expecting.
>
> In LLVM IR, this code looks like
>
>   %1 = load %struct.Foo*, %struct.Foo** %foo, align 8, !dbg !30
>   %call1 = call %struct.Foo* @_ZN3Foo6getFooEv(%struct.Foo* %1), !dbg !31
>   %call2 = call %struct.Foo* @_ZN3Foo6getFooEv(%struct.Foo* %call1), !dbg !32
>
> or, in x86_64 assembler:
>
>   .loc 1 7 3 is_stmt 1 ## column_info.cpp:7:3
>   movq -24(%rbp), %rdi
>   .loc 1 7 8 is_stmt 0 ## column_info.cpp:7:8
>   callq __ZN3Foo6getFooEv
>   .loc 1 7 18 ## column_info.cpp:7:18
>
> The spurious (7:3) location is attached to an instruction that is the load
> of the variable from the stack slot, fused with moving that value into the
> register the ABI defines for $arg0.
>
> I’m postulating that the source location of the LLVM IR load is
> uninteresting and perhaps even harmful. It is uninteresting, because at
> -O0, the location does not refer to explicit code that the user wrote and
> thus causes unintuitive stepping, and with optimizations enabled, there is
> a high likelihood that the entire instruction is going to be eliminated
> because of mem2reg, so the effect on profiling should be minimal. Since the
> load is from an alloca, it also cannot crash under normal operation. The
> location is harmful, because loads (at least on a CISC instruction set) are
> often fused with other instructions and having conflicting locations will
> cause both locations to be dropped when merged.
>
> Based on all this I would most like to assign a form of “weak” source
> location to loads from allocas generated by the Clang frontend, one that loses
> against any other source location when merged. The closest thing we have to
> this in LLVM IR is attaching no debug location. An instruction without a
> debug location either belongs to the function prologue, or will inherit
> whatever debug location the instruction before it has. In this particular
> case I think that no debug location is preferable over line 0 (which is how
> we usually denote compiler-generated code) because we don’t want the load
> instruction’s source location to erase any source location it may get
> merged with. One thing I need to check is what happens when an instruction
> without a location is the first instruction in a basic block. We may need
> to make an exception for that case.
>
> To summarize, I’m proposing to delete all debug locations from
> instructions generated in the Clang frontend for loads from allocas that
> are holding source variables to improve the debug experience at -O0. This
> will have little effect on optimized code.
>
> Let me know what you think!
> -- adrian