[PATCH] D24632: Preserve the debug location when sinking compare instructions

Andrea Di Biagio via llvm-commits llvm-commits at lists.llvm.org
Mon Sep 26 10:43:15 PDT 2016


On Mon, Sep 26, 2016 at 6:11 PM, Dehao Chen <danielcdh at gmail.com> wrote:

> Sorry for the late reply, this fell out of my radar...
>
> The patch looks good to me, but I'm not a maintainer of debug info, David,
> could you help review/approve this patch?
>
> Thanks,
> Dehao
>
> On Mon, Sep 19, 2016 at 5:27 AM, Andrea Di Biagio <
> andrea.dibiagio at gmail.com> wrote:
>
>> Hi Dehao,
>>
>> sorry for the late reply (I was on holiday on friday, so I couldn't
>> immediately reply).
>>
>>
>> On Fri, Sep 16, 2016 at 1:10 AM, Dehao Chen <danielcdh at gmail.com> wrote:
>>
>>> I can reproduce the assembly. But looks to me the test instruction in
>>> LBB0_3 should only matter if x == 1. And it will execute at most once every
>>> time the function is called.
>>>
>>
>>         .loc    1 3 3 is_stmt 0 discriminator 1 #
>> test.cpp:3:3                                cmpl    $1, %edi
>>         testb   $1, %dil
>>                                             je      .LBB0_18
>>         jne     .LBB0_3
>>
>> The TEST checks if the lower bit of EDI is set. So, we branch based on
>> whether 'x' is even or odd, and _not_ if x == 1. The outer loop has been
>> specialized for the case where there is a (non-zero) even number of
>> iterations (the induction variable is incremented by 2 instead of 1). That
>> said, I definitely agree that the above TEST is only performed "at most
>> once" per function call, since the instruction is effectively part of the
>> outer loop preheader.
>>
>> However, that is not the only instruction affected by this patch!
>> There are two other important places where the debug location (line 5:1)
>> was not correctly propagated. When I originally posted this bug in our
>> internal bugzilla, I only mentioned this first case. In retrospect, I
>> should have mentioned the other two (more important) cases.
>>
>> If you look carefully at the diff in the assembly before/after this
>> patch, you would easily spot other two (very important!) places where we
>> now propagate debug location for line 5 (discr. 1).
>>
>> This is probably the most important case:
>>
>> Before:
>> .LBB0_8:                                # %for.cond1.preheader
>>                                         # =>This Loop Header: Depth=1
>>                                         #     Child Loop BB0_10 Depth 2
>>                                         #     Child Loop BB0_15 Depth 2
>>         testl   %esi, %esi
>>         movl    $0, %r9d
>>         je      .LBB0_13
>> -----
>>
>> After:
>>
>> .LBB0_8:                                # %for.cond1.preheader
>>                                         # =>This Loop Header: Depth=1
>>                                         #     Child Loop BB0_10 Depth 2
>>                                         #     Child Loop BB0_15 Depth 2
>> .Ltmp10:
>>         .loc    1 5 28 discriminator 1  # test.cpp:5:28
>>         testl   %esi, %esi
>>         movl    $0, %r9d
>>         je      .LBB0_13
>>
>> ----
>>
>> Before this patch, the TEST was not counted against line 5 (discriminator
>> 1). So we ended up losing coverage for line 5. To be more specific, every
>> time this function was called, we ended up losing `x-1` counts related to
>> line 5 discriminator 1.
>>
>> Note that the other important place is this one:
>>
>> -----
>> Before:
>>
>> .LBB0_13:                               # %cleanup
>>                                         #   in Loop: Header=BB0_8 Depth=1
>>         .loc    1 11 12 is_stmt 1       # test.cpp:11:12
>>         addl    %eax, %r9d
>> .Ltmp12:
>>         #DEBUG_VALUE: foo:Result <- %R9D
>>         xorl    %eax, %eax
>>         testl   %esi, %esi
>>         je      .LBB0_17
>>
>>
>> -----
>> After:
>> .LBB0_13:                               # %cleanup
>>                                         #   in Loop: Header=BB0_8 Depth=1
>>         .loc    1 11 12 is_stmt 1       # test.cpp:11:12
>>         addl    %eax, %r9d
>> .Ltmp14:
>>         #DEBUG_VALUE: foo:Result <- %R9D
>>         xorl    %eax, %eax
>> .Ltmp15:
>>         .loc    1 5 28 discriminator 1  # test.cpp:5:28
>>         testl   %esi, %esi
>>         je      .LBB0_17
>>
>>
>> So I'd assume its sample count will always be 0, thus it should not
>>> contribute to the max count of that line?
>>>
>>> Deha
>>>
>>>
>>
>> The missing debug location for the three compare instructions is
>> definitely going to affect the count. This problem was noticed when
>> analyzing the sample distribution obtained from autoFDO (by looking at the
>> generated text format). The number of 'hits' on line 5 (discriminator 1)
>> was surprisingly very low and it didn't match what I was expecting to see.
>> In particular, it was way too small compared to the number of hits on the
>> condition of the outer loop. That's how I spotted this issue (which I
>> eventually tracked down to CodeGenPrepare).
>>
>
> Thanks for the explanation. I'm still confused why line 5 (discriminator
> 1) has very small count. Without the patch, it appears in both and BB16,
> both of which are in the depth-2 loop. The newly annotated place in this
> patch (L13) is actually in the depth-1 loop?
>
> # BB#11:                                #   in Loop: Header=BB0_10 Depth=2
>         .loc    1 5 28 discriminator 1  # b.cc:5:28
>         incq    %rcx
>         .loc    1 5 5 is_stmt 0 discriminator 1 # b.cc:5:5
>         cmpl    %esi, %ecx
>         jb      .LBB0_10
>
> ....
>
> # BB#16:                                #   in Loop: Header=BB0_15 Depth=2
>         .loc    1 5 33 is_stmt 1 discriminator 3 # b.cc:5:33
>         incl    %ecx
>         xorl    %eax, %eax
>         .loc    1 5 28 is_stmt 0 discriminator 1 # b.cc:5:28
>         cmpl    %esi, %ecx
>         jb      .LBB0_15
>
>
Yes, the newly annotated place is now in depth-1 loop.
I understand your confusion, but to be honest, I can't remember all the
details about this bug (it was spotted a while ago). Unfortunately, I can't
remember how many times the inner loop was executed in the original source
code. The only thing I can say is that definitely there was something "odd"
with the counts for that line.

-Andrea


>
>>
>> P.s.:
>> Regardless of sample pgo, this patch would be an improvement in terms of
>> debuggability of optimized code. It fixes an obvious case where a
>> transformation forgot to propagate the debug location on the "optimized"
>> instructions.
>>
>> I hope this helps,
>> - Andrea
>>
>>
>>> On Thu, Sep 15, 2016 at 4:01 PM, Pieb, Wolfgang <wolfgang.pieb at sony.com>
>>> wrote:
>>>
>>>> The original test case was the following:
>>>>
>>>>
>>>>
>>>> /////
>>>>
>>>> int foo(unsigned x, unsigned y, int *z, int *h) {
>>>>
>>>>   int Result = 0;
>>>>
>>>>   for (unsigned i = 0; i < x; i++) {    // line 3
>>>>
>>>>     int Val = 0;
>>>>
>>>>     for (unsigned j = 0; j < y; ++j) {  // line 5
>>>>
>>>>       if (Val == z[j]) {                // line 6
>>>>
>>>>         Val += 42;                      // line 7
>>>>
>>>>         break;
>>>>
>>>>       }
>>>>
>>>>     }
>>>>
>>>>     Result += Val;                      // line 9
>>>>
>>>>   }
>>>>
>>>>
>>>>
>>>>   return Result;
>>>>
>>>> }
>>>>
>>>> /////
>>>>
>>>>
>>>>
>>>> Quoting Andrea’s investigation:
>>>>
>>>>
>>>>
>>>> At –O2, the inner loop at line 5 is rotated. The comparison `j < y` at iteration 0 of the inner loop becomes a comparison between 'y' and zero.
>>>>
>>>> Since 'y' is a loop invariant (for both loops), the compiler moves the icmp between 'y' and zero in
>>>>
>>>> the outer loop preheader.
>>>>
>>>>
>>>>
>>>> Later on, CodeGenPrepare decides to sink the icmp within the basic block that uses it. So, we end
>>>>
>>>> up rematerializing the computation in its original place (after loop rotation).
>>>>
>>>> However, this time, the rematerialized instruction doesn't seem to be associated with any debug info.
>>>>
>>>>
>>>>
>>>> Compiling the test case with –O2 –g –s you see assembly code:
>>>>
>>>>
>>>>
>>>> .LBB0_3:                                # %for.cond1.preheader.prol
>>>>
>>>>         movl    $1, %r8d
>>>>
>>>>         xorl    %eax, %eax
>>>>
>>>>         testl   %esi, %esi
>>>>
>>>>         je      .LBB0_7
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> Instead of:
>>>>
>>>>
>>>>
>>>> .LBB0_3:                                # %for.cond1.preheader.prol
>>>>
>>>>         movl    $1, %r8d
>>>>
>>>>         xorl    %eax, %eax
>>>>
>>>> .Ltmp4:
>>>>
>>>>         .loc    1 5 28 is_stmt 1 discriminator 1 # test.cpp:5:28
>>>>
>>>>         testl   %esi, %esi
>>>>
>>>>         je      .LBB0_7
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> -- wolfgang
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> *From:* Dehao Chen [mailto:danielcdh at gmail.com]
>>>> *Sent:* Thursday, September 15, 2016 3:32 PM
>>>> *To:* David Blaikie
>>>> *Cc:* reviews+D24632+public+5e33fe097ec6bcd7 at reviews.llvm.org; Pieb,
>>>> Wolfgang; aprantl at apple.com; andrea.dibiagio at gmail.com; llvm-commits;
>>>> Junbum Lim
>>>> *Subject:* Re: [PATCH] D24632: Preserve the debug location when
>>>> sinking compare instructions
>>>>
>>>>
>>>>
>>>> For AutoFDO, we want to remove(or set 0-line) debug info for the
>>>> instructions that could be executed speculatively (i.e. instruction gets
>>>> more count that it should). For this patch, it's more like code cloning
>>>> without speculation. Basically the cloned code will split the profile
>>>> counts among different copies, so its effect will be: make the profiled
>>>> count smaller than it should have. When we post-process samples, we intend
>>>> to use the max_count we get from all cloned instructions (well, technically
>>>> not exactly "max", there is some voting to prevent from speculated case).
>>>> As a result, we don't want to lose debug info for clones (as soon as it's
>>>> not speculated clones) because it gives us better chance to capture the
>>>> "max".
>>>>
>>>>
>>>>
>>>> The same applies to debugging. We don't want to see gdb go to
>>>> speculated source locations as we don't expect it getting executed. But for
>>>> clones, as soon as it's actually logically executed, we want to have the
>>>> source location reserved.
>>>>
>>>>
>>>>
>>>> I'm wondering what's the source code that has been affected by this.
>>>>
>>>>
>>>>
>>>> Dehao
>>>>
>>>>
>>>>
>>>> On Thu, Sep 15, 2016 at 2:56 PM, David Blaikie <dblaikie at gmail.com>
>>>> wrote:
>>>>
>>>> This seems like the opposite of the other recent change to remove/zero
>>>> out the debug location of an instruction being commoned into a preceeding
>>>> basic block.
>>>>
>>>> Adding Dehao.
>>>>
>>>> Wouldn't this hurt profile based optimizations by attributing code
>>>> passing through other.bb to the entry? Making the Entry seem more
>>>> common than it is?
>>>>
>>>> On Thu, Sep 15, 2016 at 2:52 PM Wolfgang Pieb <wolfgang.pieb at sony.com>
>>>> wrote:
>>>>
>>>> wolfgangp created this revision.
>>>> wolfgangp added reviewers: aprantl, dblaikie.
>>>> wolfgangp added subscribers: llvm-commits, andreadb.
>>>>
>>>> When the CodeGenPrepare pass sinks a compare instruction into the basic
>>>> block of a user, it should preserve its debug location. Not doing so
>>>> negatively affects source line attribution for debugging and AutoFDO.
>>>>
>>>> Patch by Andrea Di Biagio
>>>>
>>>> https://reviews.llvm.org/D24632
>>>>
>>>> Files:
>>>>   lib/CodeGen/CodeGenPrepare.cpp
>>>>   test/DebugInfo/Generic/sunk-compare.ll
>>>>
>>>> Index: test/DebugInfo/Generic/sunk-compare.ll
>>>> ===================================================================
>>>> --- test/DebugInfo/Generic/sunk-compare.ll
>>>> +++ test/DebugInfo/Generic/sunk-compare.ll
>>>> @@ -0,0 +1,46 @@
>>>> +; RUN: opt -S -codegenprepare < %s | FileCheck %s
>>>> +;
>>>> +; This test case has been generated by hand but is inspired by the
>>>> +; observation that compares that are sunk into the basic blocks where
>>>> +; their results are used did not retain their debug locs. This caused
>>>> +; sample profiling to attribute code to the wrong source lines.
>>>> +;
>>>> +; We check that the compare instruction retains its debug loc after
>>>> +; it is sunk into other.bb by the codegen prepare pass.
>>>> +;
>>>> +; CHECK:       other.bb:
>>>> +; CHECK-NEXT:  icmp{{.*}}%x, 0, !dbg ![[MDHANDLE:[0-9]*]]
>>>> +; CHECK:       ![[MDHANDLE]] = !DILocation(line: 2
>>>> +;
>>>> +target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
>>>> +
>>>> +define i32 @_Z3fooii(i32 %x, i32 %y) !dbg !5 {
>>>> +entry:
>>>> +  %cmp17 = icmp sgt i32 %x, 0, !dbg !6
>>>> +  br label %other.bb, !dbg !6
>>>> +
>>>> +other.bb:
>>>> +  br i1 %cmp17, label %exit1.bb, label %exit2.bb, !dbg !7
>>>> +
>>>> +exit1.bb:
>>>> +  %0 = add i32 %y, 42, !dbg !8
>>>> +  ret i32 %0, !dbg !8
>>>> +
>>>> +exit2.bb:
>>>> +  ret i32 44, !dbg !9
>>>> +
>>>> +}
>>>> +
>>>> +!llvm.dbg.cu = !{!0}
>>>> +!llvm.module.flags = !{!3, !4}
>>>> +
>>>> +!0 = distinct !DICompileUnit(language: DW_LANG_C, file: !1, producer:
>>>> "clang", isOptimized: true, runtimeVersion: 0, emissionKind: FullDebug,
>>>> enums: !2, globals: !2)
>>>> +!1 = !DIFile(filename: "test.cpp", directory: "/debuginfo/bug/cgp")
>>>> +!2 = !{}
>>>> +!3 = !{i32 2, !"Dwarf Version", i32 4}
>>>> +!4 = !{i32 2, !"Debug Info Version", i32 3}
>>>> +!5 = distinct !DISubprogram(name: "foo", linkageName: "foo", scope:
>>>> !1, file: !1, line: 1, isLocal: false, isDefinition: true, scopeLine: 1,
>>>> flags: DIFlagPrototyped, isOptimized: false, unit: !0, variables: !2)
>>>> +!6 = !DILocation(line: 2, column: 0, scope: !5)
>>>> +!7 = !DILocation(line: 3, column: 0, scope: !5)
>>>> +!8 = !DILocation(line: 4, column: 0, scope: !5)
>>>> +!9 = !DILocation(line: 5, column: 0, scope: !5)
>>>> Index: lib/CodeGen/CodeGenPrepare.cpp
>>>> ===================================================================
>>>> --- lib/CodeGen/CodeGenPrepare.cpp
>>>> +++ lib/CodeGen/CodeGenPrepare.cpp
>>>> @@ -926,6 +926,8 @@
>>>>        InsertedCmp =
>>>>            CmpInst::Create(CI->getOpcode(), CI->getPredicate(),
>>>>                            CI->getOperand(0), CI->getOperand(1), "",
>>>> &*InsertPt);
>>>> +      // Propagate the debug info.
>>>> +      InsertedCmp->setDebugLoc(CI->getDebugLoc());
>>>>      }
>>>>
>>>>      // Replace a use of the cmp with a use of the new cmp.
>>>>
>>>>
>>>>
>>>
>>>
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20160926/527308fa/attachment-0001.html>


More information about the llvm-commits mailing list