[PATCH] D24632: Preserve the debug location when sinking compare instructions

Dehao Chen via llvm-commits llvm-commits at lists.llvm.org
Mon Sep 26 10:11:17 PDT 2016


Sorry for the late reply, this fell out of my radar...

The patch looks good to me, but I'm not a maintainer of debug info, David,
could you help review/approve this patch?

Thanks,
Dehao

On Mon, Sep 19, 2016 at 5:27 AM, Andrea Di Biagio <andrea.dibiagio at gmail.com
> wrote:

> Hi Dehao,
>
> sorry for the late reply (I was on holiday on friday, so I couldn't
> immediately reply).
>
>
> On Fri, Sep 16, 2016 at 1:10 AM, Dehao Chen <danielcdh at gmail.com> wrote:
>
>> I can reproduce the assembly. But looks to me the test instruction in
>> LBB0_3 should only matter if x == 1. And it will execute at most once every
>> time the function is called.
>>
>
>         .loc    1 3 3 is_stmt 0 discriminator 1 #
> test.cpp:3:3                                cmpl    $1, %edi
>         testb   $1, %dil
>                                             je      .LBB0_18
>         jne     .LBB0_3
>
> The TEST checks if the lower bit of EDI is set. So, we branch based on
> whether 'x' is even or odd, and _not_ if x == 1. The outer loop has been
> specialized for the case where there is a (non-zero) even number of
> iterations (the induction variable is incremented by 2 instead of 1). That
> said, I definitely agree that the above TEST is only performed "at most
> once" per function call, since the instruction is effectively part of the
> outer loop preheader.
>
> However, that is not the only instruction affected by this patch!
> There are two other important places where the debug location (line 5:1)
> was not correctly propagated. When I originally posted this bug in our
> internal bugzilla, I only mentioned this first case. In retrospect, I
> should have mentioned the other two (more important) cases.
>
> If you look carefully at the diff in the assembly before/after this patch,
> you would easily spot other two (very important!) places where we now
> propagate debug location for line 5 (discr. 1).
>
> This is probably the most important case:
>
> Before:
> .LBB0_8:                                # %for.cond1.preheader
>                                         # =>This Loop Header: Depth=1
>                                         #     Child Loop BB0_10 Depth 2
>                                         #     Child Loop BB0_15 Depth 2
>         testl   %esi, %esi
>         movl    $0, %r9d
>         je      .LBB0_13
> -----
>
> After:
>
> .LBB0_8:                                # %for.cond1.preheader
>                                         # =>This Loop Header: Depth=1
>                                         #     Child Loop BB0_10 Depth 2
>                                         #     Child Loop BB0_15 Depth 2
> .Ltmp10:
>         .loc    1 5 28 discriminator 1  # test.cpp:5:28
>         testl   %esi, %esi
>         movl    $0, %r9d
>         je      .LBB0_13
>
> ----
>
> Before this patch, the TEST was not counted against line 5 (discriminator
> 1). So we ended up losing coverage for line 5. To be more specific, every
> time this function was called, we ended up losing `x-1` counts related to
> line 5 discriminator 1.
>
> Note that the other important place is this one:
>
> -----
> Before:
>
> .LBB0_13:                               # %cleanup
>                                         #   in Loop: Header=BB0_8 Depth=1
>         .loc    1 11 12 is_stmt 1       # test.cpp:11:12
>         addl    %eax, %r9d
> .Ltmp12:
>         #DEBUG_VALUE: foo:Result <- %R9D
>         xorl    %eax, %eax
>         testl   %esi, %esi
>         je      .LBB0_17
>
>
> -----
> After:
> .LBB0_13:                               # %cleanup
>                                         #   in Loop: Header=BB0_8 Depth=1
>         .loc    1 11 12 is_stmt 1       # test.cpp:11:12
>         addl    %eax, %r9d
> .Ltmp14:
>         #DEBUG_VALUE: foo:Result <- %R9D
>         xorl    %eax, %eax
> .Ltmp15:
>         .loc    1 5 28 discriminator 1  # test.cpp:5:28
>         testl   %esi, %esi
>         je      .LBB0_17
>
>
> So I'd assume its sample count will always be 0, thus it should not
>> contribute to the max count of that line?
>>
>> Deha
>>
>>
>
> The missing debug location for the three compare instructions is
> definitely going to affect the count. This problem was noticed when
> analyzing the sample distribution obtained from autoFDO (by looking at the
> generated text format). The number of 'hits' on line 5 (discriminator 1)
> was surprisingly very low and it didn't match what I was expecting to see.
> In particular, it was way too small compared to the number of hits on the
> condition of the outer loop. That's how I spotted this issue (which I
> eventually tracked down to CodeGenPrepare).
>

Thanks for the explanation. I'm still confused why line 5 (discriminator 1)
has very small count. Without the patch, it appears in both and BB16, both
of which are in the depth-2 loop. The newly annotated place in this patch
(L13) is actually in the depth-1 loop?

# BB#11:                                #   in Loop: Header=BB0_10 Depth=2
        .loc    1 5 28 discriminator 1  # b.cc:5:28
        incq    %rcx
        .loc    1 5 5 is_stmt 0 discriminator 1 # b.cc:5:5
        cmpl    %esi, %ecx
        jb      .LBB0_10

....

# BB#16:                                #   in Loop: Header=BB0_15 Depth=2
        .loc    1 5 33 is_stmt 1 discriminator 3 # b.cc:5:33
        incl    %ecx
        xorl    %eax, %eax
        .loc    1 5 28 is_stmt 0 discriminator 1 # b.cc:5:28
        cmpl    %esi, %ecx
        jb      .LBB0_15



>
> P.s.:
> Regardless of sample pgo, this patch would be an improvement in terms of
> debuggability of optimized code. It fixes an obvious case where a
> transformation forgot to propagate the debug location on the "optimized"
> instructions.
>
> I hope this helps,
> - Andrea
>
>
>> On Thu, Sep 15, 2016 at 4:01 PM, Pieb, Wolfgang <wolfgang.pieb at sony.com>
>> wrote:
>>
>>> The original test case was the following:
>>>
>>>
>>>
>>> /////
>>>
>>> int foo(unsigned x, unsigned y, int *z, int *h) {
>>>
>>>   int Result = 0;
>>>
>>>   for (unsigned i = 0; i < x; i++) {    // line 3
>>>
>>>     int Val = 0;
>>>
>>>     for (unsigned j = 0; j < y; ++j) {  // line 5
>>>
>>>       if (Val == z[j]) {                // line 6
>>>
>>>         Val += 42;                      // line 7
>>>
>>>         break;
>>>
>>>       }
>>>
>>>     }
>>>
>>>     Result += Val;                      // line 9
>>>
>>>   }
>>>
>>>
>>>
>>>   return Result;
>>>
>>> }
>>>
>>> /////
>>>
>>>
>>>
>>> Quoting Andrea’s investigation:
>>>
>>>
>>>
>>> At –O2, the inner loop at line 5 is rotated. The comparison `j < y` at iteration 0 of the inner loop becomes a comparison between 'y' and zero.
>>>
>>> Since 'y' is a loop invariant (for both loops), the compiler moves the icmp between 'y' and zero in
>>>
>>> the outer loop preheader.
>>>
>>>
>>>
>>> Later on, CodeGenPrepare decides to sink the icmp within the basic block that uses it. So, we end
>>>
>>> up rematerializing the computation in its original place (after loop rotation).
>>>
>>> However, this time, the rematerialized instruction doesn't seem to be associated with any debug info.
>>>
>>>
>>>
>>> Compiling the test case with –O2 –g –s you see assembly code:
>>>
>>>
>>>
>>> .LBB0_3:                                # %for.cond1.preheader.prol
>>>
>>>         movl    $1, %r8d
>>>
>>>         xorl    %eax, %eax
>>>
>>>         testl   %esi, %esi
>>>
>>>         je      .LBB0_7
>>>
>>>
>>>
>>>
>>>
>>> Instead of:
>>>
>>>
>>>
>>> .LBB0_3:                                # %for.cond1.preheader.prol
>>>
>>>         movl    $1, %r8d
>>>
>>>         xorl    %eax, %eax
>>>
>>> .Ltmp4:
>>>
>>>         .loc    1 5 28 is_stmt 1 discriminator 1 # test.cpp:5:28
>>>
>>>         testl   %esi, %esi
>>>
>>>         je      .LBB0_7
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> -- wolfgang
>>>
>>>
>>>
>>>
>>>
>>> *From:* Dehao Chen [mailto:danielcdh at gmail.com]
>>> *Sent:* Thursday, September 15, 2016 3:32 PM
>>> *To:* David Blaikie
>>> *Cc:* reviews+D24632+public+5e33fe097ec6bcd7 at reviews.llvm.org; Pieb,
>>> Wolfgang; aprantl at apple.com; andrea.dibiagio at gmail.com; llvm-commits;
>>> Junbum Lim
>>> *Subject:* Re: [PATCH] D24632: Preserve the debug location when sinking
>>> compare instructions
>>>
>>>
>>>
>>> For AutoFDO, we want to remove(or set 0-line) debug info for the
>>> instructions that could be executed speculatively (i.e. instruction gets
>>> more count that it should). For this patch, it's more like code cloning
>>> without speculation. Basically the cloned code will split the profile
>>> counts among different copies, so its effect will be: make the profiled
>>> count smaller than it should have. When we post-process samples, we intend
>>> to use the max_count we get from all cloned instructions (well, technically
>>> not exactly "max", there is some voting to prevent from speculated case).
>>> As a result, we don't want to lose debug info for clones (as soon as it's
>>> not speculated clones) because it gives us better chance to capture the
>>> "max".
>>>
>>>
>>>
>>> The same applies to debugging. We don't want to see gdb go to speculated
>>> source locations as we don't expect it getting executed. But for clones, as
>>> soon as it's actually logically executed, we want to have the source
>>> location reserved.
>>>
>>>
>>>
>>> I'm wondering what's the source code that has been affected by this.
>>>
>>>
>>>
>>> Dehao
>>>
>>>
>>>
>>> On Thu, Sep 15, 2016 at 2:56 PM, David Blaikie <dblaikie at gmail.com>
>>> wrote:
>>>
>>> This seems like the opposite of the other recent change to remove/zero
>>> out the debug location of an instruction being commoned into a preceeding
>>> basic block.
>>>
>>> Adding Dehao.
>>>
>>> Wouldn't this hurt profile based optimizations by attributing code
>>> passing through other.bb to the entry? Making the Entry seem more
>>> common than it is?
>>>
>>> On Thu, Sep 15, 2016 at 2:52 PM Wolfgang Pieb <wolfgang.pieb at sony.com>
>>> wrote:
>>>
>>> wolfgangp created this revision.
>>> wolfgangp added reviewers: aprantl, dblaikie.
>>> wolfgangp added subscribers: llvm-commits, andreadb.
>>>
>>> When the CodeGenPrepare pass sinks a compare instruction into the basic
>>> block of a user, it should preserve its debug location. Not doing so
>>> negatively affects source line attribution for debugging and AutoFDO.
>>>
>>> Patch by Andrea Di Biagio
>>>
>>> https://reviews.llvm.org/D24632
>>>
>>> Files:
>>>   lib/CodeGen/CodeGenPrepare.cpp
>>>   test/DebugInfo/Generic/sunk-compare.ll
>>>
>>> Index: test/DebugInfo/Generic/sunk-compare.ll
>>> ===================================================================
>>> --- test/DebugInfo/Generic/sunk-compare.ll
>>> +++ test/DebugInfo/Generic/sunk-compare.ll
>>> @@ -0,0 +1,46 @@
>>> +; RUN: opt -S -codegenprepare < %s | FileCheck %s
>>> +;
>>> +; This test case has been generated by hand but is inspired by the
>>> +; observation that compares that are sunk into the basic blocks where
>>> +; their results are used did not retain their debug locs. This caused
>>> +; sample profiling to attribute code to the wrong source lines.
>>> +;
>>> +; We check that the compare instruction retains its debug loc after
>>> +; it is sunk into other.bb by the codegen prepare pass.
>>> +;
>>> +; CHECK:       other.bb:
>>> +; CHECK-NEXT:  icmp{{.*}}%x, 0, !dbg ![[MDHANDLE:[0-9]*]]
>>> +; CHECK:       ![[MDHANDLE]] = !DILocation(line: 2
>>> +;
>>> +target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
>>> +
>>> +define i32 @_Z3fooii(i32 %x, i32 %y) !dbg !5 {
>>> +entry:
>>> +  %cmp17 = icmp sgt i32 %x, 0, !dbg !6
>>> +  br label %other.bb, !dbg !6
>>> +
>>> +other.bb:
>>> +  br i1 %cmp17, label %exit1.bb, label %exit2.bb, !dbg !7
>>> +
>>> +exit1.bb:
>>> +  %0 = add i32 %y, 42, !dbg !8
>>> +  ret i32 %0, !dbg !8
>>> +
>>> +exit2.bb:
>>> +  ret i32 44, !dbg !9
>>> +
>>> +}
>>> +
>>> +!llvm.dbg.cu = !{!0}
>>> +!llvm.module.flags = !{!3, !4}
>>> +
>>> +!0 = distinct !DICompileUnit(language: DW_LANG_C, file: !1, producer:
>>> "clang", isOptimized: true, runtimeVersion: 0, emissionKind: FullDebug,
>>> enums: !2, globals: !2)
>>> +!1 = !DIFile(filename: "test.cpp", directory: "/debuginfo/bug/cgp")
>>> +!2 = !{}
>>> +!3 = !{i32 2, !"Dwarf Version", i32 4}
>>> +!4 = !{i32 2, !"Debug Info Version", i32 3}
>>> +!5 = distinct !DISubprogram(name: "foo", linkageName: "foo", scope: !1,
>>> file: !1, line: 1, isLocal: false, isDefinition: true, scopeLine: 1, flags:
>>> DIFlagPrototyped, isOptimized: false, unit: !0, variables: !2)
>>> +!6 = !DILocation(line: 2, column: 0, scope: !5)
>>> +!7 = !DILocation(line: 3, column: 0, scope: !5)
>>> +!8 = !DILocation(line: 4, column: 0, scope: !5)
>>> +!9 = !DILocation(line: 5, column: 0, scope: !5)
>>> Index: lib/CodeGen/CodeGenPrepare.cpp
>>> ===================================================================
>>> --- lib/CodeGen/CodeGenPrepare.cpp
>>> +++ lib/CodeGen/CodeGenPrepare.cpp
>>> @@ -926,6 +926,8 @@
>>>        InsertedCmp =
>>>            CmpInst::Create(CI->getOpcode(), CI->getPredicate(),
>>>                            CI->getOperand(0), CI->getOperand(1), "",
>>> &*InsertPt);
>>> +      // Propagate the debug info.
>>> +      InsertedCmp->setDebugLoc(CI->getDebugLoc());
>>>      }
>>>
>>>      // Replace a use of the cmp with a use of the new cmp.
>>>
>>>
>>>
>>
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20160926/c129be61/attachment.html>


More information about the llvm-commits mailing list