<div dir="ltr"><div><div>Hi Dehao,<br><br></div>sorry for the late reply (I was on holiday on friday, so I couldn't immediately reply).<br></div><br><div class="gmail_extra"><br><div class="gmail_quote">On Fri, Sep 16, 2016 at 1:10 AM, Dehao Chen <span dir="ltr"><<a href="mailto:danielcdh@gmail.com" target="_blank">danielcdh@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr">I can reproduce the assembly. But looks to me the test instruction in LBB0_3 should only matter if x == 1. And it will execute at most once every time the function is called.</div></blockquote><div><br> .loc 1 3 3 is_stmt 0 discriminator 1 # test.cpp:3:3 cmpl $1, %edi<br> testb $1, %dil je .LBB0_18<br> jne .LBB0_3<br><br></div><div>The TEST checks if the lower bit of EDI is set. So, we branch based on whether 'x' is even or odd, and _not_ if x == 1. The outer loop has been specialized for the case where there is a (non-zero) even number of iterations (the induction variable is incremented by 2 instead of 1). That said, I definitely agree that the above TEST is only performed "at most once" per function call, since the instruction is effectively part of the outer loop preheader.<br><br></div><div>However, that is not the only instruction affected by this patch!<br>There are two other important places where the debug location (line 5:1) was not correctly propagated. When I originally posted this bug in our internal bugzilla, I only mentioned this first case. In retrospect, I should have mentioned the other two (more important) cases.<br><br></div><div>If you look carefully at the diff in the assembly before/after this patch, you would easily spot other two (very important!) places where we now propagate debug location for line 5 (discr. 1).<br><br></div><div>This is probably the most important case:<br><br></div><div>Before:<br>.LBB0_8: # %for.cond1.preheader<br> # =>This Loop Header: Depth=1<br> # Child Loop BB0_10 Depth 2<br> # Child Loop BB0_15 Depth 2<br> testl %esi, %esi<br> movl $0, %r9d<br> je .LBB0_13<br>-----<br><br></div><div>After:<br><br></div><div>.LBB0_8: # %for.cond1.preheader<br> # =>This Loop Header: Depth=1<br> # Child Loop BB0_10 Depth 2<br> # Child Loop BB0_15 Depth 2<br>.Ltmp10:<br> .loc 1 5 28 discriminator 1 # test.cpp:5:28<br> testl %esi, %esi<br> movl $0, %r9d<br> je .LBB0_13<br></div><div><br>----<br><br></div><div>Before this patch, the TEST was not counted against line 5 (discriminator 1). So we ended up losing coverage for line 5. To be more specific, every time this function was called, we ended up losing `x-1` counts related to line 5 discriminator 1.<br><br></div><div>Note that the other important place is this one:<br><br>-----<br>Before:<br><br>.LBB0_13: # %cleanup<br> # in Loop: Header=BB0_8 Depth=1<br> .loc 1 11 12 is_stmt 1 # test.cpp:11:12<br> addl %eax, %r9d<br>.Ltmp12:<br> #DEBUG_VALUE: foo:Result <- %R9D<br> xorl %eax, %eax<br> testl %esi, %esi<br> je .LBB0_17<br></div><div><br><br>-----<br></div><div>After:<br>.LBB0_13: # %cleanup<br> # in Loop: Header=BB0_8 Depth=1<br> .loc 1 11 12 is_stmt 1 # test.cpp:11:12<br> addl %eax, %r9d<br>.Ltmp14:<br> #DEBUG_VALUE: foo:Result <- %R9D<br> xorl %eax, %eax<br>.Ltmp15:<br> .loc 1 5 28 discriminator 1 # test.cpp:5:28<br> testl %esi, %esi<br> je .LBB0_17<br><br></div><br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr">So I'd assume its sample count will always be 0, thus it should not contribute to the max count of that line?<span class="gmail-HOEnZb"><font color="#888888"><div><br></div><div>Deha<br> </div></font></span></div></blockquote><div><br></div><div>The missing debug location for the three compare instructions is definitely going to affect the count. This problem was noticed when analyzing the sample distribution obtained from autoFDO (by looking at the generated text format). The number of 'hits' on line 5 (discriminator 1) was surprisingly very low and it didn't match what I was expecting to see. In particular, it was way too small compared to the number of hits on the condition of the outer loop. That's how I spotted this issue (which I eventually tracked down to CodeGenPrepare).<br></div><div><br></div><div>P.s.:<br></div><div>Regardless of sample pgo, this patch would be an improvement in terms of debuggability of optimized code. It fixes an obvious case where a transformation forgot to propagate the debug location on the "optimized" instructions.<br></div><div><br></div><div>I hope this helps,<br></div><div>- Andrea<br></div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div class="gmail-HOEnZb"><div class="gmail-h5"><div class="gmail_extra"><br><div class="gmail_quote">On Thu, Sep 15, 2016 at 4:01 PM, Pieb, Wolfgang <span dir="ltr"><<a href="mailto:wolfgang.pieb@sony.com" target="_blank">wolfgang.pieb@sony.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div lang="EN-US">
<div>
<p class="MsoNormal"><span style="font-size:11pt;font-family:"calibri","sans-serif";color:rgb(31,73,125)">The original test case was the following:<u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-size:11pt;font-family:"calibri","sans-serif";color:rgb(31,73,125)"><u></u> <u></u></span></p>
<p class="MsoNormal"><span style="font-size:10pt;font-family:"courier new"">/////<u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-size:10pt;font-family:"courier new"">int foo(unsigned x, unsigned y, int *z, int *h) {<u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-size:10pt;font-family:"courier new""> int Result = 0;<u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-size:10pt;font-family:"courier new""> for (unsigned i = 0; i < x; i++) { // line 3<u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-size:10pt;font-family:"courier new""> int Val = 0;<u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-size:10pt;font-family:"courier new""> for (unsigned j = 0; j < y; ++j) { // line 5<u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-size:10pt;font-family:"courier new""> if (Val == z[j]) { // line 6<u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-size:10pt;font-family:"courier new""> Val += 42; // line 7<u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-size:10pt;font-family:"courier new""> break;<u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-size:10pt;font-family:"courier new""> }<u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-size:10pt;font-family:"courier new""> }<u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-size:10pt;font-family:"courier new""> Result += Val; // line 9<u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-size:10pt;font-family:"courier new""> }<u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-size:10pt;font-family:"courier new""><u></u> <u></u></span></p>
<p class="MsoNormal"><span style="font-size:10pt;font-family:"courier new""> return Result;<u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-size:10pt;font-family:"courier new"">}<u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-size:10pt;font-family:"courier new"">/////<u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-size:11pt;font-family:"calibri","sans-serif";color:rgb(31,73,125)"><u></u> <u></u></span></p>
<p class="MsoNormal"><span style="font-size:11pt;font-family:"calibri","sans-serif";color:rgb(31,73,125)">Quoting Andrea’s investigation:<u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-size:11pt;font-family:"calibri","sans-serif";color:rgb(31,73,125)"><u></u> <u></u></span></p>
<pre><span style="font-size:11pt;font-family:"calibri","sans-serif";color:rgb(31,73,125)">At –O2, the inner loop at line 5 is rotated. </span><span style="font-size:11pt;font-family:"calibri","sans-serif"">The comparison `j < y` at iteration 0 of the inner loop becomes a comparison between 'y' and zero.<u></u><u></u></span></pre>
<pre><span style="font-size:11pt;font-family:"calibri","sans-serif"">Since 'y' is a loop invariant (for both loops), the compiler moves the icmp between 'y' and zero in<u></u><u></u></span></pre>
<pre><span style="font-size:11pt;font-family:"calibri","sans-serif"">the outer loop preheader. <u></u><u></u></span></pre>
<p class="MsoNormal"><span style="font-size:11pt;font-family:"calibri","sans-serif";color:rgb(31,73,125)"><u></u> <u></u></span></p>
<pre>Later on, CodeGenPrepare decides to sink the icmp within the basic block that uses it. So, we end<u></u><u></u></pre>
<pre>up rematerializing the computation in its original place (after loop rotation).<u></u><u></u></pre>
<pre>However, this time, the rematerialized instruction doesn't seem to be associated with any debug info.<u></u><u></u></pre>
<pre><u></u> <u></u></pre>
<pre>Compiling the test case with –O2 –g –s you see assembly code:<u></u><u></u></pre>
<pre><u></u> <u></u></pre>
<pre>.LBB0_3: <wbr> # %for.cond1.preheader.prol<u></u><u></u></pre>
<pre> movl $1, %r8d<u></u><u></u></pre>
<pre> xorl %eax, %eax<u></u><u></u></pre>
<pre> testl %esi, %esi<u></u><u></u></pre>
<pre> je .LBB0_7<u></u><u></u></pre>
<pre><u></u> <u></u></pre>
<pre><u></u> <u></u></pre>
<pre>Instead of:<u></u><u></u></pre>
<pre><u></u> <u></u></pre>
<pre>.LBB0_3: <wbr> # %for.cond1.preheader.prol<u></u><u></u></pre>
<pre> movl $1, %r8d<u></u><u></u></pre>
<pre> xorl %eax, %eax<u></u><u></u></pre>
<pre>.Ltmp4:<u></u><u></u></pre>
<pre> .loc 1 5 28 is_stmt 1 discriminator 1 # test.cpp:5:28<u></u><u></u></pre>
<pre> testl %esi, %esi<u></u><u></u></pre>
<pre> je .LBB0_7<u></u><u></u></pre>
<pre><u></u> <u></u></pre>
<pre><u></u> <u></u></pre>
<pre><u></u> <u></u></pre>
<pre>-- wolfgang<u></u><u></u></pre>
<p class="MsoNormal"><span style="font-size:11pt;font-family:"calibri","sans-serif";color:rgb(31,73,125)"><u></u> <u></u></span></p>
<p class="MsoNormal"><span style="font-size:11pt;font-family:"calibri","sans-serif";color:rgb(31,73,125)"><u></u> <u></u></span></p>
<div style="border-width:medium medium medium 1.5pt;border-style:none none none solid;border-color:-moz-use-text-color -moz-use-text-color -moz-use-text-color blue;padding:0in 0in 0in 4pt">
<div>
<div style="border-width:1pt medium medium;border-style:solid none none;border-color:rgb(181,196,223) -moz-use-text-color -moz-use-text-color;padding:3pt 0in 0in">
<p class="MsoNormal"><b><span style="font-size:10pt;font-family:"tahoma","sans-serif"">From:</span></b><span style="font-size:10pt;font-family:"tahoma","sans-serif""> Dehao Chen [mailto:<a href="mailto:danielcdh@gmail.com" target="_blank">danielcdh@gmail.com</a>]
<br>
<b>Sent:</b> Thursday, September 15, 2016 3:32 PM<br>
<b>To:</b> David Blaikie<br>
<b>Cc:</b> <a href="mailto:reviews%2BD24632%2Bpublic%2B5e33fe097ec6bcd7@reviews.llvm.org" target="_blank">reviews+D24632+public+5e33fe09<wbr>7ec6bcd7@reviews.llvm.org</a>; Pieb, Wolfgang; <a href="mailto:aprantl@apple.com" target="_blank">aprantl@apple.com</a>; <a href="mailto:andrea.dibiagio@gmail.com" target="_blank">andrea.dibiagio@gmail.com</a>; llvm-commits; Junbum Lim<br>
<b>Subject:</b> Re: [PATCH] D24632: Preserve the debug location when sinking compare instructions<u></u><u></u></span></p>
</div>
</div><div><div>
<p class="MsoNormal"><u></u> <u></u></p>
<div>
<div>
<p class="MsoNormal">For AutoFDO, we want to remove(or set 0-line) debug info for the instructions that could be executed speculatively (i.e. instruction gets more count that it should). For this patch, it's more like code cloning without speculation. Basically
the cloned code will split the profile counts among different copies, so its effect will be: make the profiled count smaller than it should have. When we post-process samples, we intend to use the max_count we get from all cloned instructions (well, technically
not exactly "max", there is some voting to prevent from speculated case). As a result, we don't want to lose debug info for clones (as soon as it's not speculated clones) because it gives us better chance to capture the "max".<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"><u></u> <u></u></p>
</div>
<div>
<p class="MsoNormal">The same applies to debugging. We don't want to see gdb go to speculated source locations as we don't expect it getting executed. But for clones, as soon as it's actually logically executed, we want to have the source location reserved.<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"><u></u> <u></u></p>
</div>
<div>
<p class="MsoNormal">I'm wondering what's the source code that has been affected by this.<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"><u></u> <u></u></p>
</div>
<div>
<p class="MsoNormal">Dehao<u></u><u></u></p>
</div>
</div>
<div>
<p class="MsoNormal"><u></u> <u></u></p>
<div>
<p class="MsoNormal">On Thu, Sep 15, 2016 at 2:56 PM, David Blaikie <<a href="mailto:dblaikie@gmail.com" target="_blank">dblaikie@gmail.com</a>> wrote:<u></u><u></u></p>
<div>
<p class="MsoNormal" style="margin-bottom:12pt">This seems like the opposite of the other recent change to remove/zero out the debug location of an instruction being commoned into a preceeding basic block.<br>
<br>
Adding Dehao.<br>
<br>
Wouldn't this hurt profile based optimizations by attributing code passing through
<a href="http://other.bb" target="_blank">other.bb</a> to the entry? Making the Entry seem more common than it is?<u></u><u></u></p>
<div>
<div>
<p class="MsoNormal">On Thu, Sep 15, 2016 at 2:52 PM Wolfgang Pieb <<a href="mailto:wolfgang.pieb@sony.com" target="_blank">wolfgang.pieb@sony.com</a>> wrote:<u></u><u></u></p>
</div>
<blockquote style="border-width:medium medium medium 1pt;border-style:none none none solid;border-color:-moz-use-text-color -moz-use-text-color -moz-use-text-color rgb(204,204,204);padding:0in 0in 0in 6pt;margin-left:4.8pt;margin-right:0in">
<p class="MsoNormal" style="margin-bottom:12pt">wolfgangp created this revision.<br>
wolfgangp added reviewers: aprantl, dblaikie.<br>
wolfgangp added subscribers: llvm-commits, andreadb.<br>
<br>
When the CodeGenPrepare pass sinks a compare instruction into the basic block of a user, it should preserve its debug location. Not doing so negatively affects source line attribution for debugging and AutoFDO.<br>
<br>
Patch by Andrea Di Biagio<br>
<br>
<a href="https://reviews.llvm.org/D24632" target="_blank">https://reviews.llvm.org/D2463<wbr>2</a><br>
<br>
Files:<br>
lib/CodeGen/CodeGenPrepare.cpp<br>
test/DebugInfo/Generic/sunk-co<wbr>mpare.ll<br>
<br>
Index: test/DebugInfo/Generic/sunk-co<wbr>mpare.ll<br>
==============================<wbr>==============================<wbr>=======<br>
--- test/DebugInfo/Generic/sunk-co<wbr>mpare.ll<br>
+++ test/DebugInfo/Generic/sunk-co<wbr>mpare.ll<br>
@@ -0,0 +1,46 @@<br>
+; RUN: opt -S -codegenprepare < %s | FileCheck %s<br>
+;<br>
+; This test case has been generated by hand but is inspired by the<br>
+; observation that compares that are sunk into the basic blocks where<br>
+; their results are used did not retain their debug locs. This caused<br>
+; sample profiling to attribute code to the wrong source lines.<br>
+;<br>
+; We check that the compare instruction retains its debug loc after<br>
+; it is sunk into <a href="http://other.bb" target="_blank">other.bb</a> by the codegen prepare pass.<br>
+;<br>
+; CHECK: <a href="http://other.bb" target="_blank">other.bb</a>:<br>
+; CHECK-NEXT: icmp{{.*}}%x, 0, !dbg ![[MDHANDLE:[0-9]*]]<br>
+; CHECK: ![[MDHANDLE]] = !DILocation(line: 2<br>
+;<br>
+target datalayout = "e-m:e-i64:64-f80:128-n8:16:32<wbr>:64-S128"<br>
+<br>
+define i32 @_Z3fooii(i32 %x, i32 %y) !dbg !5 {<br>
+entry:<br>
+ %cmp17 = icmp sgt i32 %x, 0, !dbg !6<br>
+ br label %<a href="http://other.bb" target="_blank">other.bb</a>, !dbg !6<br>
+<br>
+<a href="http://other.bb" target="_blank">other.bb</a>:<br>
+ br i1 %cmp17, label %<a href="http://exit1.bb" target="_blank">exit1.bb</a>, label %<a href="http://exit2.bb" target="_blank">exit2.bb</a>, !dbg !7<br>
+<br>
+<a href="http://exit1.bb" target="_blank">exit1.bb</a>:<br>
+ %0 = add i32 %y, 42, !dbg !8<br>
+ ret i32 %0, !dbg !8<br>
+<br>
+<a href="http://exit2.bb" target="_blank">exit2.bb</a>:<br>
+ ret i32 44, !dbg !9<br>
+<br>
+}<br>
+<br>
+!<a href="http://llvm.dbg.cu" target="_blank">llvm.dbg.cu</a> = !{!0}<br>
+!llvm.module.flags = !{!3, !4}<br>
+<br>
+!0 = distinct !DICompileUnit(language: DW_LANG_C, file: !1, producer: "clang", isOptimized: true, runtimeVersion: 0, emissionKind: FullDebug, enums: !2, globals: !2)<br>
+!1 = !DIFile(filename: "test.cpp", directory: "/debuginfo/bug/cgp")<br>
+!2 = !{}<br>
+!3 = !{i32 2, !"Dwarf Version", i32 4}<br>
+!4 = !{i32 2, !"Debug Info Version", i32 3}<br>
+!5 = distinct !DISubprogram(name: "foo", linkageName: "foo", scope: !1, file: !1, line: 1, isLocal: false, isDefinition: true, scopeLine: 1, flags: DIFlagPrototyped, isOptimized: false, unit: !0, variables: !2)<br>
+!6 = !DILocation(line: 2, column: 0, scope: !5)<br>
+!7 = !DILocation(line: 3, column: 0, scope: !5)<br>
+!8 = !DILocation(line: 4, column: 0, scope: !5)<br>
+!9 = !DILocation(line: 5, column: 0, scope: !5)<br>
Index: lib/CodeGen/CodeGenPrepare.cpp<br>
==============================<wbr>==============================<wbr>=======<br>
--- lib/CodeGen/CodeGenPrepare.cpp<br>
+++ lib/CodeGen/CodeGenPrepare.cpp<br>
@@ -926,6 +926,8 @@<br>
InsertedCmp =<br>
CmpInst::Create(CI->getOpcode<wbr>(), CI->getPredicate(),<br>
CI->getOperand(0), CI->getOperand(1), "", &*InsertPt);<br>
+ // Propagate the debug info.<br>
+ InsertedCmp->setDebugLoc(CI->g<wbr>etDebugLoc());<br>
}<br>
<br>
// Replace a use of the cmp with a use of the new cmp.<br>
<br>
<u></u><u></u></p>
</blockquote>
</div>
</div>
</div>
<p class="MsoNormal"><u></u> <u></u></p>
</div>
</div></div></div>
</div>
</div>
</blockquote></div><br></div>
</div></div></blockquote></div><br></div></div>