[PATCH] Fix for inlining decision affected by debug intrinsics

Dario Domizioli dario.domizioli at gmail.com
Fri Jan 31 11:51:46 PST 2014


Thanks for the quick review!

I have now split the patch into two diff files, each rebased on r200567.
The "_fix" one contains the fix and the test, the "_debug" one contains the
debug prints.

Let me know if there's any issue, otherwise feel free to commit.

Cheers,
    Dario Domizioli
    SN Systems - Sony Computer Entertainment












On 31 January 2014 19:24, Eric Christopher <echristo at gmail.com> wrote:

> Agreed with what Manman says. Let's split it in two and then either of
> us can commit for you.
>
> Thanks!
>
> -eric
>
> On Fri, Jan 31, 2014 at 11:05 AM, Manman Ren <manman.ren at gmail.com> wrote:
> >
> > LGTM. Thanks for the detailed comments.
> >
> > I noticed a few debugging-related changes, it may be better to separate
> > those.
> > +  DEBUG_PRINT_STAT(Cost);
> > +  DEBUG_PRINT_STAT(Threshold);
> > +  DEBUG_PRINT_STAT(VectorBonus);
> >
> > Manman
> >
> >
> > On Fri, Jan 31, 2014 at 10:55 AM, Dario Domizioli
> > <dario.domizioli at gmail.com> wrote:
> >>
> >> Hello LLVM.
> >>
> >> I have come across a subtle and rare interaction between debug
> >> information, vector instructions, and the cost heuristic used by the
> >> SimpleInliner (i.e. the InlineCost analysis).
> >>
> >> It appears that debug intrinsics are counted in the number of
> instructions
> >> taken into consideration by the heuristic. They are also later
> discounted as
> >> being "simplified", so they do not directly affect the inline Cost, but
> they
> >> are taken into account when evaluating the fraction of instructions
> that are
> >> vector instructions. This fraction is used to determine whether to
> apply a
> >> hidden "VectorBonus" to the inlining Threshold, and therefore there is a
> >> subtle interaction which means that debug intrinsics can affect the
> inlining
> >> decision.
> >>
> >> The attached patch fixes the problem by making the inline cost heuristic
> >> skip debug intrinsics altogether when examining instructions.
> >> I have also added a couple of debug prints to the already existing ones,
> >> which I think are generally useful to debug the inline cost heuristic.
> >>
> >> I have included a test in which I have manufactured a situation where
> >> vector instructions would be most of the function's code but they are
> >> outnumbered by debug intrinsics when debug information is present. The
> test
> >> exposes the problem in the current trunk and passes with my fix.
> >>
> >> Please submit for me if you accept this patch, as I do not have commit
> >> access.
> >>
> >> Cheers,
> >>     Dario Domizioli
> >>     SN Systems - Sony Computer Entertainment
> >>
> >>
> >> _______________________________________________
> >> llvm-commits mailing list
> >> llvm-commits at cs.uiuc.edu
> >> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
> >>
> >
> >
> > _______________________________________________
> > llvm-commits mailing list
> > llvm-commits at cs.uiuc.edu
> > http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
> >
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20140131/b94a4496/attachment.html>
-------------- next part --------------
Index: lib/Analysis/IPA/InlineCost.cpp
===================================================================
--- lib/Analysis/IPA/InlineCost.cpp	(revision 200567)
+++ lib/Analysis/IPA/InlineCost.cpp	(working copy)
@@ -1178,6 +1178,9 @@
   DEBUG_PRINT_STAT(SROACostSavings);
   DEBUG_PRINT_STAT(SROACostSavingsLost);
   DEBUG_PRINT_STAT(ContainsNoDuplicateCall);
+  DEBUG_PRINT_STAT(Cost);
+  DEBUG_PRINT_STAT(Threshold);
+  DEBUG_PRINT_STAT(VectorBonus);
 #undef DEBUG_PRINT_STAT
 }
 #endif
-------------- next part --------------
Index: lib/Analysis/IPA/InlineCost.cpp
===================================================================
--- lib/Analysis/IPA/InlineCost.cpp	(revision 200567)
+++ lib/Analysis/IPA/InlineCost.cpp	(working copy)
@@ -872,6 +872,14 @@
 /// viable, and true if inlining remains viable.
 bool CallAnalyzer::analyzeBlock(BasicBlock *BB) {
   for (BasicBlock::iterator I = BB->begin(), E = BB->end(); I != E; ++I) {
+    // We need to completely ignore debug intrinsics from the calculation. It
+    // is true that they would be considered "simplified" by the later check,
+    // but they must not affect the total number of instructions either,
+    // because NumInstructions has an effect on the inlining threshold via
+    // VectorBonus. It is just simpler (and more correct) to completely ignore
+    // them and do an early 'continue' here.
+    if (isa<DbgInfoIntrinsic>(I))
+      continue;
     ++NumInstructions;
     if (isa<ExtractElementInst>(I) || I->getType()->isVectorTy())
       ++NumVectorInstructions;
Index: test/Transforms/Inline/inline-threshold-unaffected-by-debug-intrinsics.ll
===================================================================
--- test/Transforms/Inline/inline-threshold-unaffected-by-debug-intrinsics.ll	(revision 0)
+++ test/Transforms/Inline/inline-threshold-unaffected-by-debug-intrinsics.ll	(working copy)
@@ -0,0 +1,560 @@
+; RUN: opt < %s -S -inline -inline-threshold=2 | FileCheck %s
+; RUN: opt < %s -S -strip-debug -inline -inline-threshold=2 | FileCheck %s
+
+;
+; This test was generated from the following source:
+;
+; typedef float Vec4 __attribute__((ext_vector_type(4)));
+;
+; Vec4 foo (Vec4 a, Vec4 b)
+; {
+;     Vec4 three = {3.0f, 3.0f, 3.0f, 3.0f};
+;     Vec4 five =  {5.0f, 5.0f, 5.0f, 5.0f};
+;     a = a * three;
+;     b = b * five;
+;     return a + b;
+; }
+;
+; float bar (Vec4 a, Vec4 b)
+; {
+;     Vec4 tmp = foo (a, b);
+;     return tmp[0] + tmp[1] + tmp[2] + tmp[3];
+; }
+;
+;
+; The command used was:
+;   clang -O0 -g -S -emit-llvm foo.cpp -o test.ll
+; And then:
+;   opt -sroa -S test.ll >inline-threshold-unaffected-by-debug-intrinsics.ll
+; (this is to reduce the size of the test case)
+;
+;
+; Check that foo() is inlined both with and without debug information. To
+; do so we run opt with and without -strip-debug and just running the inliner.
+;
+; CHECK: @_Z3barDv4_fS_
+; CHECK-NOT: %call = call <4 x float> @_Z3fooDv4_fS_(
+; CHECK: ret
+;
+; foo() is a function that has a lot of vector operations, which would trigger
+; the application of a large VectorBonus to the inlining threshold. This would
+; allow foo() to be inlined, which is the expected behaviour.
+;
+; However, if debug intrinsics were present, prior to the fix they would be
+; counted in the total number of instructions by the heuristic, causing the
+; relative fraction of vector instructions to fall below the value necessary to
+; be awarded the large VectorBonus.
+;
+; The fix makes sure that debug intrinsics are discounted, so foo() always
+; qualifies for the large VectorBonus.
+;
+; This test assumes that the inline heuristic does not change massively. If it
+; does, this test might start to fail (or pass as a false negative). However,
+; checking the debug traces for the actual values of the Cost, Threshold and
+; VectorBonus would make the test fail for any slight change to the inline
+; heuristic, and I do not think it would make a better test.
+;
+
+; The triple here just enforces a cost model for the instructions, the test does
+; not actually generate x86 code.
+target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
+target triple = "x86_64-unknown-linux-gnu"
+
+define <4 x float> @_Z3fooDv4_fS_(<4 x float> %a, <4 x float> %b) #0 {
+entry:
+  call void @llvm.dbg.value(metadata !{<4 x float> %a}, i64 0, metadata !19), !dbg !20
+  call void @llvm.dbg.value(metadata !{<4 x float> %b}, i64 0, metadata !21), !dbg !20
+  call void @llvm.dbg.value(metadata !22, i64 0, metadata !23), !dbg !24
+  call void @llvm.dbg.value(metadata !25, i64 0, metadata !26), !dbg !27
+  %mul = fmul <4 x float> %a, <float 3.000000e+00, float 3.000000e+00, float 3.000000e+00, float 3.000000e+00>, !dbg !28
+  call void @llvm.dbg.value(metadata !{<4 x float> %mul}, i64 0, metadata !19), !dbg !28
+  %mul1 = fmul <4 x float> %b, <float 5.000000e+00, float 5.000000e+00, float 5.000000e+00, float 5.000000e+00>, !dbg !29
+  call void @llvm.dbg.value(metadata !{<4 x float> %mul1}, i64 0, metadata !21), !dbg !29
+  %add = fadd <4 x float> %mul, %mul1, !dbg !30
+  ret <4 x float> %add, !dbg !30
+}
+
+declare void @llvm.dbg.declare(metadata, metadata) #1
+
+define float @_Z3barDv4_fS_(<4 x float> %a, <4 x float> %b) #0 {
+entry:
+  call void @llvm.dbg.value(metadata !{<4 x float> %a}, i64 0, metadata !31), !dbg !32
+  call void @llvm.dbg.value(metadata !{<4 x float> %b}, i64 0, metadata !33), !dbg !32
+  %call = call <4 x float> @_Z3fooDv4_fS_(<4 x float> %a, <4 x float> %b), !dbg !34
+  call void @llvm.dbg.value(metadata !{<4 x float> %call}, i64 0, metadata !35), !dbg !34
+  %vecext = extractelement <4 x float> %call, i32 0, !dbg !36
+  %vecext1 = extractelement <4 x float> %call, i32 1, !dbg !36
+  %add = fadd float %vecext, %vecext1, !dbg !36
+  %vecext2 = extractelement <4 x float> %call, i32 2, !dbg !36
+  %add3 = fadd float %add, %vecext2, !dbg !36
+  %vecext4 = extractelement <4 x float> %call, i32 3, !dbg !36
+  %add5 = fadd float %add3, %vecext4, !dbg !36
+  ret float %add5, !dbg !36
+}
+
+declare void @llvm.dbg.value(metadata, i64, metadata) #1
+
+attributes #0 = { nounwind uwtable "less-precise-fpmad"="false" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "stack-protector-buffer-size"="8" "unsafe-fp-math"="false" "use-soft-float"="false" }
+attributes #1 = { nounwind readnone }
+
+!llvm.dbg.cu = !{!0}
+!llvm.module.flags = !{!16, !17}
+!llvm.ident = !{!18}
+
+!0 = metadata !{i32 786449, metadata !1, i32 4, metadata !"clang version 3.5 (trunk 189234:200463M) (llvm/trunk 200463)", i1 false, metadata !"", i32 0, metadata !2, metadata !2, metadata !3, metadata !2, metadata !2, metadata !""} ; [ DW_TAG_compile_unit ] [/home/dario/dev/test/o059210/foo.cpp] [DW_LANG_C_plus_plus]
+!1 = metadata !{metadata !"foo.cpp", metadata !"/home/dario/dev/test/o059210"}
+!2 = metadata !{i32 0}
+!3 = metadata !{metadata !4, metadata !13}
+!4 = metadata !{i32 786478, metadata !1, metadata !5, metadata !"foo", metadata !"foo", metadata !"_Z3fooDv4_fS_", i32 5, metadata !6, i1 false, i1 true, i32 0, i32 0, null, i32 256, i1 false, <4 x float> (<4 x float>, <4 x float>)* @_Z3fooDv4_fS_, null, null, metadata !2, i32 6} ; [ DW_TAG_subprogram ] [line 5] [def] [scope 6] [foo]
+!5 = metadata !{i32 786473, metadata !1}          ; [ DW_TAG_file_type ] [/home/dario/dev/test/o059210/foo.cpp]
+!6 = metadata !{i32 786453, i32 0, null, metadata !"", i32 0, i64 0, i64 0, i64 0, i32 0, null, metadata !7, i32 0, null, null, null} ; [ DW_TAG_subroutine_type ] [line 0, size 0, align 0, offset 0] [from ]
+!7 = metadata !{metadata !8, metadata !8, metadata !8}
+!8 = metadata !{i32 786454, metadata !1, null, metadata !"Vec4", i32 2, i64 0, i64 0, i64 0, i32 0, metadata !9} ; [ DW_TAG_typedef ] [Vec4] [line 2, size 0, align 0, offset 0] [from ]
+!9 = metadata !{i32 786433, null, null, metadata !"", i32 0, i64 128, i64 128, i32 0, i32 2048, metadata !10, metadata !11, i32 0, null, null, null} ; [ DW_TAG_array_type ] [line 0, size 128, align 128, offset 0] [vector] [from float]
+!10 = metadata !{i32 786468, null, null, metadata !"float", i32 0, i64 32, i64 32, i64 0, i32 0, i32 4} ; [ DW_TAG_base_type ] [float] [line 0, size 32, align 32, offset 0, enc DW_ATE_float]
+!11 = metadata !{metadata !12}
+!12 = metadata !{i32 786465, i64 0, i64 4}        ; [ DW_TAG_subrange_type ] [0, 3]
+!13 = metadata !{i32 786478, metadata !1, metadata !5, metadata !"bar", metadata !"bar", metadata !"_Z3barDv4_fS_", i32 15, metadata !14, i1 false, i1 true, i32 0, i32 0, null, i32 256, i1 false, float (<4 x float>, <4 x float>)* @_Z3barDv4_fS_, null, null, metadata !2, i32 16} ; [ DW_TAG_subprogram ] [line 15] [def] [scope 16] [bar]
+!14 = metadata !{i32 786453, i32 0, null, metadata !"", i32 0, i64 0, i64 0, i64 0, i32 0, null, metadata !15, i32 0, null, null, null} ; [ DW_TAG_subroutine_type ] [line 0, size 0, align 0, offset 0] [from ]
+!15 = metadata !{metadata !10, metadata !8, metadata !8}
+!16 = metadata !{i32 2, metadata !"Dwarf Version", i32 4}
+!17 = metadata !{i32 1, metadata !"Debug Info Version", i32 1}
+!18 = metadata !{metadata !"clang version 3.5 (trunk 189234:200463M) (llvm/trunk 200463)"}
+!19 = metadata !{i32 786689, metadata !4, metadata !"a", metadata !5, i32 16777221, metadata !8, i32 0, i32 0} ; [ DW_TAG_arg_variable ] [a] [line 5]
+!20 = metadata !{i32 5, i32 0, metadata !4, null}
+!21 = metadata !{i32 786689, metadata !4, metadata !"b", metadata !5, i32 33554437, metadata !8, i32 0, i32 0} ; [ DW_TAG_arg_variable ] [b] [line 5]
+!22 = metadata !{<4 x float> <float 3.000000e+00, float 3.000000e+00, float 3.000000e+00, float 3.000000e+00>}
+!23 = metadata !{i32 786688, metadata !4, metadata !"three", metadata !5, i32 7, metadata !8, i32 0, i32 0} ; [ DW_TAG_auto_variable ] [three] [line 7]
+!24 = metadata !{i32 7, i32 0, metadata !4, null}
+!25 = metadata !{<4 x float> <float 5.000000e+00, float 5.000000e+00, float 5.000000e+00, float 5.000000e+00>}
+!26 = metadata !{i32 786688, metadata !4, metadata !"five", metadata !5, i32 8, metadata !8, i32 0, i32 0} ; [ DW_TAG_auto_variable ] [five] [line 8]
+!27 = metadata !{i32 8, i32 0, metadata !4, null} ; [ DW_TAG_imported_declaration ]
+!28 = metadata !{i32 9, i32 0, metadata !4, null}
+!29 = metadata !{i32 10, i32 0, metadata !4, null}
+!30 = metadata !{i32 11, i32 0, metadata !4, null}
+!31 = metadata !{i32 786689, metadata !13, metadata !"a", metadata !5, i32 16777231, metadata !8, i32 0, i32 0} ; [ DW_TAG_arg_variable ] [a] [line 15]
+!32 = metadata !{i32 15, i32 0, metadata !13, null}
+!33 = metadata !{i32 786689, metadata !13, metadata !"b", metadata !5, i32 33554447, metadata !8, i32 0, i32 0} ; [ DW_TAG_arg_variable ] [b] [line 15]
+!34 = metadata !{i32 17, i32 0, metadata !13, null}
+!35 = metadata !{i32 786688, metadata !13, metadata !"tmp", metadata !5, i32 17, metadata !8, i32 0, i32 0} ; [ DW_TAG_auto_variable ] [tmp] [line 17]
+!36 = metadata !{i32 18, i32 0, metadata !13, null}


More information about the llvm-commits mailing list