[PATCH] D24632: Preserve the debug location when sinking compare instructions

Andrea Di Biagio via llvm-commits llvm-commits at lists.llvm.org
Mon Sep 19 05:27:06 PDT 2016


Hi Dehao,

sorry for the late reply (I was on holiday on friday, so I couldn't
immediately reply).


On Fri, Sep 16, 2016 at 1:10 AM, Dehao Chen <danielcdh at gmail.com> wrote:

> I can reproduce the assembly. But looks to me the test instruction in
> LBB0_3 should only matter if x == 1. And it will execute at most once every
> time the function is called.
>

        .loc    1 3 3 is_stmt 0 discriminator 1 #
test.cpp:3:3                                cmpl    $1, %edi
        testb   $1,
%dil
je      .LBB0_18
        jne     .LBB0_3

The TEST checks if the lower bit of EDI is set. So, we branch based on
whether 'x' is even or odd, and _not_ if x == 1. The outer loop has been
specialized for the case where there is a (non-zero) even number of
iterations (the induction variable is incremented by 2 instead of 1). That
said, I definitely agree that the above TEST is only performed "at most
once" per function call, since the instruction is effectively part of the
outer loop preheader.

However, that is not the only instruction affected by this patch!
There are two other important places where the debug location (line 5:1)
was not correctly propagated. When I originally posted this bug in our
internal bugzilla, I only mentioned this first case. In retrospect, I
should have mentioned the other two (more important) cases.

If you look carefully at the diff in the assembly before/after this patch,
you would easily spot other two (very important!) places where we now
propagate debug location for line 5 (discr. 1).

This is probably the most important case:

Before:
.LBB0_8:                                # %for.cond1.preheader
                                        # =>This Loop Header: Depth=1
                                        #     Child Loop BB0_10 Depth 2
                                        #     Child Loop BB0_15 Depth 2
        testl   %esi, %esi
        movl    $0, %r9d
        je      .LBB0_13
-----

After:

.LBB0_8:                                # %for.cond1.preheader
                                        # =>This Loop Header: Depth=1
                                        #     Child Loop BB0_10 Depth 2
                                        #     Child Loop BB0_15 Depth 2
.Ltmp10:
        .loc    1 5 28 discriminator 1  # test.cpp:5:28
        testl   %esi, %esi
        movl    $0, %r9d
        je      .LBB0_13

----

Before this patch, the TEST was not counted against line 5 (discriminator
1). So we ended up losing coverage for line 5. To be more specific, every
time this function was called, we ended up losing `x-1` counts related to
line 5 discriminator 1.

Note that the other important place is this one:

-----
Before:

.LBB0_13:                               # %cleanup
                                        #   in Loop: Header=BB0_8 Depth=1
        .loc    1 11 12 is_stmt 1       # test.cpp:11:12
        addl    %eax, %r9d
.Ltmp12:
        #DEBUG_VALUE: foo:Result <- %R9D
        xorl    %eax, %eax
        testl   %esi, %esi
        je      .LBB0_17


-----
After:
.LBB0_13:                               # %cleanup
                                        #   in Loop: Header=BB0_8 Depth=1
        .loc    1 11 12 is_stmt 1       # test.cpp:11:12
        addl    %eax, %r9d
.Ltmp14:
        #DEBUG_VALUE: foo:Result <- %R9D
        xorl    %eax, %eax
.Ltmp15:
        .loc    1 5 28 discriminator 1  # test.cpp:5:28
        testl   %esi, %esi
        je      .LBB0_17


So I'd assume its sample count will always be 0, thus it should not
> contribute to the max count of that line?
>
> Deha
>
>

The missing debug location for the three compare instructions is definitely
going to affect the count. This problem was noticed when analyzing the
sample distribution obtained from autoFDO (by looking at the generated text
format). The number of 'hits' on line 5 (discriminator 1) was surprisingly
very low and it didn't match what I was expecting to see. In particular, it
was way too small compared to the number of hits on the condition of the
outer loop. That's how I spotted this issue (which I eventually tracked
down to CodeGenPrepare).

P.s.:
Regardless of sample pgo, this patch would be an improvement in terms of
debuggability of optimized code. It fixes an obvious case where a
transformation forgot to propagate the debug location on the "optimized"
instructions.

I hope this helps,
- Andrea


> On Thu, Sep 15, 2016 at 4:01 PM, Pieb, Wolfgang <wolfgang.pieb at sony.com>
> wrote:
>
>> The original test case was the following:
>>
>>
>>
>> /////
>>
>> int foo(unsigned x, unsigned y, int *z, int *h) {
>>
>>   int Result = 0;
>>
>>   for (unsigned i = 0; i < x; i++) {    // line 3
>>
>>     int Val = 0;
>>
>>     for (unsigned j = 0; j < y; ++j) {  // line 5
>>
>>       if (Val == z[j]) {                // line 6
>>
>>         Val += 42;                      // line 7
>>
>>         break;
>>
>>       }
>>
>>     }
>>
>>     Result += Val;                      // line 9
>>
>>   }
>>
>>
>>
>>   return Result;
>>
>> }
>>
>> /////
>>
>>
>>
>> Quoting Andrea’s investigation:
>>
>>
>>
>> At –O2, the inner loop at line 5 is rotated. The comparison `j < y` at iteration 0 of the inner loop becomes a comparison between 'y' and zero.
>>
>> Since 'y' is a loop invariant (for both loops), the compiler moves the icmp between 'y' and zero in
>>
>> the outer loop preheader.
>>
>>
>>
>> Later on, CodeGenPrepare decides to sink the icmp within the basic block that uses it. So, we end
>>
>> up rematerializing the computation in its original place (after loop rotation).
>>
>> However, this time, the rematerialized instruction doesn't seem to be associated with any debug info.
>>
>>
>>
>> Compiling the test case with –O2 –g –s you see assembly code:
>>
>>
>>
>> .LBB0_3:                                # %for.cond1.preheader.prol
>>
>>         movl    $1, %r8d
>>
>>         xorl    %eax, %eax
>>
>>         testl   %esi, %esi
>>
>>         je      .LBB0_7
>>
>>
>>
>>
>>
>> Instead of:
>>
>>
>>
>> .LBB0_3:                                # %for.cond1.preheader.prol
>>
>>         movl    $1, %r8d
>>
>>         xorl    %eax, %eax
>>
>> .Ltmp4:
>>
>>         .loc    1 5 28 is_stmt 1 discriminator 1 # test.cpp:5:28
>>
>>         testl   %esi, %esi
>>
>>         je      .LBB0_7
>>
>>
>>
>>
>>
>>
>>
>> -- wolfgang
>>
>>
>>
>>
>>
>> *From:* Dehao Chen [mailto:danielcdh at gmail.com]
>> *Sent:* Thursday, September 15, 2016 3:32 PM
>> *To:* David Blaikie
>> *Cc:* reviews+D24632+public+5e33fe097ec6bcd7 at reviews.llvm.org; Pieb,
>> Wolfgang; aprantl at apple.com; andrea.dibiagio at gmail.com; llvm-commits;
>> Junbum Lim
>> *Subject:* Re: [PATCH] D24632: Preserve the debug location when sinking
>> compare instructions
>>
>>
>>
>> For AutoFDO, we want to remove(or set 0-line) debug info for the
>> instructions that could be executed speculatively (i.e. instruction gets
>> more count that it should). For this patch, it's more like code cloning
>> without speculation. Basically the cloned code will split the profile
>> counts among different copies, so its effect will be: make the profiled
>> count smaller than it should have. When we post-process samples, we intend
>> to use the max_count we get from all cloned instructions (well, technically
>> not exactly "max", there is some voting to prevent from speculated case).
>> As a result, we don't want to lose debug info for clones (as soon as it's
>> not speculated clones) because it gives us better chance to capture the
>> "max".
>>
>>
>>
>> The same applies to debugging. We don't want to see gdb go to speculated
>> source locations as we don't expect it getting executed. But for clones, as
>> soon as it's actually logically executed, we want to have the source
>> location reserved.
>>
>>
>>
>> I'm wondering what's the source code that has been affected by this.
>>
>>
>>
>> Dehao
>>
>>
>>
>> On Thu, Sep 15, 2016 at 2:56 PM, David Blaikie <dblaikie at gmail.com>
>> wrote:
>>
>> This seems like the opposite of the other recent change to remove/zero
>> out the debug location of an instruction being commoned into a preceeding
>> basic block.
>>
>> Adding Dehao.
>>
>> Wouldn't this hurt profile based optimizations by attributing code
>> passing through other.bb to the entry? Making the Entry seem more common
>> than it is?
>>
>> On Thu, Sep 15, 2016 at 2:52 PM Wolfgang Pieb <wolfgang.pieb at sony.com>
>> wrote:
>>
>> wolfgangp created this revision.
>> wolfgangp added reviewers: aprantl, dblaikie.
>> wolfgangp added subscribers: llvm-commits, andreadb.
>>
>> When the CodeGenPrepare pass sinks a compare instruction into the basic
>> block of a user, it should preserve its debug location. Not doing so
>> negatively affects source line attribution for debugging and AutoFDO.
>>
>> Patch by Andrea Di Biagio
>>
>> https://reviews.llvm.org/D24632
>>
>> Files:
>>   lib/CodeGen/CodeGenPrepare.cpp
>>   test/DebugInfo/Generic/sunk-compare.ll
>>
>> Index: test/DebugInfo/Generic/sunk-compare.ll
>> ===================================================================
>> --- test/DebugInfo/Generic/sunk-compare.ll
>> +++ test/DebugInfo/Generic/sunk-compare.ll
>> @@ -0,0 +1,46 @@
>> +; RUN: opt -S -codegenprepare < %s | FileCheck %s
>> +;
>> +; This test case has been generated by hand but is inspired by the
>> +; observation that compares that are sunk into the basic blocks where
>> +; their results are used did not retain their debug locs. This caused
>> +; sample profiling to attribute code to the wrong source lines.
>> +;
>> +; We check that the compare instruction retains its debug loc after
>> +; it is sunk into other.bb by the codegen prepare pass.
>> +;
>> +; CHECK:       other.bb:
>> +; CHECK-NEXT:  icmp{{.*}}%x, 0, !dbg ![[MDHANDLE:[0-9]*]]
>> +; CHECK:       ![[MDHANDLE]] = !DILocation(line: 2
>> +;
>> +target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
>> +
>> +define i32 @_Z3fooii(i32 %x, i32 %y) !dbg !5 {
>> +entry:
>> +  %cmp17 = icmp sgt i32 %x, 0, !dbg !6
>> +  br label %other.bb, !dbg !6
>> +
>> +other.bb:
>> +  br i1 %cmp17, label %exit1.bb, label %exit2.bb, !dbg !7
>> +
>> +exit1.bb:
>> +  %0 = add i32 %y, 42, !dbg !8
>> +  ret i32 %0, !dbg !8
>> +
>> +exit2.bb:
>> +  ret i32 44, !dbg !9
>> +
>> +}
>> +
>> +!llvm.dbg.cu = !{!0}
>> +!llvm.module.flags = !{!3, !4}
>> +
>> +!0 = distinct !DICompileUnit(language: DW_LANG_C, file: !1, producer:
>> "clang", isOptimized: true, runtimeVersion: 0, emissionKind: FullDebug,
>> enums: !2, globals: !2)
>> +!1 = !DIFile(filename: "test.cpp", directory: "/debuginfo/bug/cgp")
>> +!2 = !{}
>> +!3 = !{i32 2, !"Dwarf Version", i32 4}
>> +!4 = !{i32 2, !"Debug Info Version", i32 3}
>> +!5 = distinct !DISubprogram(name: "foo", linkageName: "foo", scope: !1,
>> file: !1, line: 1, isLocal: false, isDefinition: true, scopeLine: 1, flags:
>> DIFlagPrototyped, isOptimized: false, unit: !0, variables: !2)
>> +!6 = !DILocation(line: 2, column: 0, scope: !5)
>> +!7 = !DILocation(line: 3, column: 0, scope: !5)
>> +!8 = !DILocation(line: 4, column: 0, scope: !5)
>> +!9 = !DILocation(line: 5, column: 0, scope: !5)
>> Index: lib/CodeGen/CodeGenPrepare.cpp
>> ===================================================================
>> --- lib/CodeGen/CodeGenPrepare.cpp
>> +++ lib/CodeGen/CodeGenPrepare.cpp
>> @@ -926,6 +926,8 @@
>>        InsertedCmp =
>>            CmpInst::Create(CI->getOpcode(), CI->getPredicate(),
>>                            CI->getOperand(0), CI->getOperand(1), "",
>> &*InsertPt);
>> +      // Propagate the debug info.
>> +      InsertedCmp->setDebugLoc(CI->getDebugLoc());
>>      }
>>
>>      // Replace a use of the cmp with a use of the new cmp.
>>
>>
>>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20160919/ad0605c5/attachment-0001.html>


More information about the llvm-commits mailing list