[PATCH] Fix for inlining decision affected by debug intrinsics

Dario Domizioli dario.domizioli at gmail.com
Fri Jan 31 10:55:18 PST 2014


Hello LLVM.

I have come across a subtle and rare interaction between debug information,
vector instructions, and the cost heuristic used by the SimpleInliner (i.e.
the InlineCost analysis).

It appears that debug intrinsics are counted in the number of instructions
taken into consideration by the heuristic. They are also later discounted
as being "simplified", so they do not directly affect the inline Cost, but
they are taken into account when evaluating the fraction of instructions
that are vector instructions. This fraction is used to determine whether to
apply a hidden "VectorBonus" to the inlining Threshold, and therefore there
is a subtle interaction which means that debug intrinsics can affect the
inlining decision.

The attached patch fixes the problem by making the inline cost heuristic
skip debug intrinsics altogether when examining instructions.
I have also added a couple of debug prints to the already existing ones,
which I think are generally useful to debug the inline cost heuristic.

I have included a test in which I have manufactured a situation where
vector instructions would be most of the function's code but they are
outnumbered by debug intrinsics when debug information is present. The test
exposes the problem in the current trunk and passes with my fix.

Please submit for me if you accept this patch, as I do not have commit
access.

Cheers,
    Dario Domizioli
    SN Systems - Sony Computer Entertainment
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20140131/ae928944/attachment.html>
-------------- next part --------------
Index: lib/Analysis/IPA/InlineCost.cpp
===================================================================
--- lib/Analysis/IPA/InlineCost.cpp	(revision 200563)
+++ lib/Analysis/IPA/InlineCost.cpp	(working copy)
@@ -872,6 +872,14 @@
 /// viable, and true if inlining remains viable.
 bool CallAnalyzer::analyzeBlock(BasicBlock *BB) {
   for (BasicBlock::iterator I = BB->begin(), E = BB->end(); I != E; ++I) {
+    // We need to completely ignore debug intrinsics from the calculation. It
+    // is true that they would be considered "simplified" by the later check,
+    // but they must not affect the total number of instructions either,
+    // because NumInstructions has an effect on the inlining threshold via
+    // VectorBonus. It is just simpler (and more correct) to completely ignore
+    // them and do an early 'continue' here.
+    if (isa<DbgInfoIntrinsic>(I))
+      continue;
     ++NumInstructions;
     if (isa<ExtractElementInst>(I) || I->getType()->isVectorTy())
       ++NumVectorInstructions;
@@ -1178,6 +1186,9 @@
   DEBUG_PRINT_STAT(SROACostSavings);
   DEBUG_PRINT_STAT(SROACostSavingsLost);
   DEBUG_PRINT_STAT(ContainsNoDuplicateCall);
+  DEBUG_PRINT_STAT(Cost);
+  DEBUG_PRINT_STAT(Threshold);
+  DEBUG_PRINT_STAT(VectorBonus);
 #undef DEBUG_PRINT_STAT
 }
 #endif
Index: test/Transforms/Inline/inline-threshold-unaffected-by-debug-intrinsics.ll
===================================================================
--- test/Transforms/Inline/inline-threshold-unaffected-by-debug-intrinsics.ll	(revision 0)
+++ test/Transforms/Inline/inline-threshold-unaffected-by-debug-intrinsics.ll	(revision 0)
@@ -0,0 +1,140 @@
+; RUN: opt < %s -S -inline -inline-threshold=2 | FileCheck %s
+; RUN: opt < %s -S -strip-debug -inline -inline-threshold=2 | FileCheck %s
+
+;
+; This test was generated from the following source:
+;
+; typedef float Vec4 __attribute__((ext_vector_type(4)));
+;
+; Vec4 foo (Vec4 a, Vec4 b)
+; {
+;     Vec4 three = {3.0f, 3.0f, 3.0f, 3.0f};
+;     Vec4 five =  {5.0f, 5.0f, 5.0f, 5.0f};
+;     a = a * three;
+;     b = b * five;
+;     return a + b;
+; }
+;
+; float bar (Vec4 a, Vec4 b)
+; {
+;     Vec4 tmp = foo (a, b);
+;     return tmp[0] + tmp[1] + tmp[2] + tmp[3];
+; }
+;
+;
+; The command used was:
+;   clang -O0 -g -S -emit-llvm foo.cpp -o test.ll
+; And then:
+;   opt -sroa -S test.ll >inline-threshold-unaffected-by-debug-intrinsics.ll
+; (this is to reduce the size of the test case)
+;
+;
+; Check that foo() is inlined both with and without debug information. To
+; do so we run opt with and without -strip-debug and just running the inliner.
+;
+; CHECK: @_Z3barDv4_fS_
+; CHECK-NOT: %call = call <4 x float> @_Z3fooDv4_fS_(
+; CHECK: ret
+;
+; foo() is a function that has a lot of vector operations, which would trigger
+; the application of a large VectorBonus to the inlining threshold. This would
+; allow foo() to be inlined, which is the expected behaviour.
+;
+; However, if debug intrinsics were present, prior to the fix they would be
+; counted in the total number of instructions by the heuristic, causing the
+; relative fraction of vector instructions to fall below the value necessary to
+; be awarded the large VectorBonus.
+;
+; The fix makes sure that debug intrinsics are discounted, so foo() always
+; qualifies for the large VectorBonus.
+;
+; This test assumes that the inline heuristic does not change massively. If it
+; does, this test might start to fail (or pass as a false negative). However,
+; checking the debug traces for the actual values of the Cost, Threshold and
+; VectorBonus would make the test fail for any slight change to the inline
+; heuristic, and I do not think it would make a better test.
+;
+
+; The triple here just enforces a cost model for the instructions, the test does
+; not actually generate x86 code.
+target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
+target triple = "x86_64-unknown-linux-gnu"
+
+define <4 x float> @_Z3fooDv4_fS_(<4 x float> %a, <4 x float> %b) #0 {
+entry:
+  call void @llvm.dbg.value(metadata !{<4 x float> %a}, i64 0, metadata !19), !dbg !20
+  call void @llvm.dbg.value(metadata !{<4 x float> %b}, i64 0, metadata !21), !dbg !20
+  call void @llvm.dbg.value(metadata !22, i64 0, metadata !23), !dbg !24
+  call void @llvm.dbg.value(metadata !25, i64 0, metadata !26), !dbg !27
+  %mul = fmul <4 x float> %a, <float 3.000000e+00, float 3.000000e+00, float 3.000000e+00, float 3.000000e+00>, !dbg !28
+  call void @llvm.dbg.value(metadata !{<4 x float> %mul}, i64 0, metadata !19), !dbg !28
+  %mul1 = fmul <4 x float> %b, <float 5.000000e+00, float 5.000000e+00, float 5.000000e+00, float 5.000000e+00>, !dbg !29
+  call void @llvm.dbg.value(metadata !{<4 x float> %mul1}, i64 0, metadata !21), !dbg !29
+  %add = fadd <4 x float> %mul, %mul1, !dbg !30
+  ret <4 x float> %add, !dbg !30
+}
+
+declare void @llvm.dbg.declare(metadata, metadata) #1
+
+define float @_Z3barDv4_fS_(<4 x float> %a, <4 x float> %b) #0 {
+entry:
+  call void @llvm.dbg.value(metadata !{<4 x float> %a}, i64 0, metadata !31), !dbg !32
+  call void @llvm.dbg.value(metadata !{<4 x float> %b}, i64 0, metadata !33), !dbg !32
+  %call = call <4 x float> @_Z3fooDv4_fS_(<4 x float> %a, <4 x float> %b), !dbg !34
+  call void @llvm.dbg.value(metadata !{<4 x float> %call}, i64 0, metadata !35), !dbg !34
+  %vecext = extractelement <4 x float> %call, i32 0, !dbg !36
+  %vecext1 = extractelement <4 x float> %call, i32 1, !dbg !36
+  %add = fadd float %vecext, %vecext1, !dbg !36
+  %vecext2 = extractelement <4 x float> %call, i32 2, !dbg !36
+  %add3 = fadd float %add, %vecext2, !dbg !36
+  %vecext4 = extractelement <4 x float> %call, i32 3, !dbg !36
+  %add5 = fadd float %add3, %vecext4, !dbg !36
+  ret float %add5, !dbg !36
+}
+
+declare void @llvm.dbg.value(metadata, i64, metadata) #1
+
+attributes #0 = { nounwind uwtable "less-precise-fpmad"="false" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "stack-protector-buffer-size"="8" "unsafe-fp-math"="false" "use-soft-float"="false" }
+attributes #1 = { nounwind readnone }
+
+!llvm.dbg.cu = !{!0}
+!llvm.module.flags = !{!16, !17}
+!llvm.ident = !{!18}
+
+!0 = metadata !{i32 786449, metadata !1, i32 4, metadata !"clang version 3.5 (trunk 189234:200463M) (llvm/trunk 200463)", i1 false, metadata !"", i32 0, metadata !2, metadata !2, metadata !3, metadata !2, metadata !2, metadata !""} ; [ DW_TAG_compile_unit ] [/home/dario/dev/test/o059210/foo.cpp] [DW_LANG_C_plus_plus]
+!1 = metadata !{metadata !"foo.cpp", metadata !"/home/dario/dev/test/o059210"}
+!2 = metadata !{i32 0}
+!3 = metadata !{metadata !4, metadata !13}
+!4 = metadata !{i32 786478, metadata !1, metadata !5, metadata !"foo", metadata !"foo", metadata !"_Z3fooDv4_fS_", i32 5, metadata !6, i1 false, i1 true, i32 0, i32 0, null, i32 256, i1 false, <4 x float> (<4 x float>, <4 x float>)* @_Z3fooDv4_fS_, null, null, metadata !2, i32 6} ; [ DW_TAG_subprogram ] [line 5] [def] [scope 6] [foo]
+!5 = metadata !{i32 786473, metadata !1}          ; [ DW_TAG_file_type ] [/home/dario/dev/test/o059210/foo.cpp]
+!6 = metadata !{i32 786453, i32 0, null, metadata !"", i32 0, i64 0, i64 0, i64 0, i32 0, null, metadata !7, i32 0, null, null, null} ; [ DW_TAG_subroutine_type ] [line 0, size 0, align 0, offset 0] [from ]
+!7 = metadata !{metadata !8, metadata !8, metadata !8}
+!8 = metadata !{i32 786454, metadata !1, null, metadata !"Vec4", i32 2, i64 0, i64 0, i64 0, i32 0, metadata !9} ; [ DW_TAG_typedef ] [Vec4] [line 2, size 0, align 0, offset 0] [from ]
+!9 = metadata !{i32 786433, null, null, metadata !"", i32 0, i64 128, i64 128, i32 0, i32 2048, metadata !10, metadata !11, i32 0, null, null, null} ; [ DW_TAG_array_type ] [line 0, size 128, align 128, offset 0] [vector] [from float]
+!10 = metadata !{i32 786468, null, null, metadata !"float", i32 0, i64 32, i64 32, i64 0, i32 0, i32 4} ; [ DW_TAG_base_type ] [float] [line 0, size 32, align 32, offset 0, enc DW_ATE_float]
+!11 = metadata !{metadata !12}
+!12 = metadata !{i32 786465, i64 0, i64 4}        ; [ DW_TAG_subrange_type ] [0, 3]
+!13 = metadata !{i32 786478, metadata !1, metadata !5, metadata !"bar", metadata !"bar", metadata !"_Z3barDv4_fS_", i32 15, metadata !14, i1 false, i1 true, i32 0, i32 0, null, i32 256, i1 false, float (<4 x float>, <4 x float>)* @_Z3barDv4_fS_, null, null, metadata !2, i32 16} ; [ DW_TAG_subprogram ] [line 15] [def] [scope 16] [bar]
+!14 = metadata !{i32 786453, i32 0, null, metadata !"", i32 0, i64 0, i64 0, i64 0, i32 0, null, metadata !15, i32 0, null, null, null} ; [ DW_TAG_subroutine_type ] [line 0, size 0, align 0, offset 0] [from ]
+!15 = metadata !{metadata !10, metadata !8, metadata !8}
+!16 = metadata !{i32 2, metadata !"Dwarf Version", i32 4}
+!17 = metadata !{i32 1, metadata !"Debug Info Version", i32 1}
+!18 = metadata !{metadata !"clang version 3.5 (trunk 189234:200463M) (llvm/trunk 200463)"}
+!19 = metadata !{i32 786689, metadata !4, metadata !"a", metadata !5, i32 16777221, metadata !8, i32 0, i32 0} ; [ DW_TAG_arg_variable ] [a] [line 5]
+!20 = metadata !{i32 5, i32 0, metadata !4, null}
+!21 = metadata !{i32 786689, metadata !4, metadata !"b", metadata !5, i32 33554437, metadata !8, i32 0, i32 0} ; [ DW_TAG_arg_variable ] [b] [line 5]
+!22 = metadata !{<4 x float> <float 3.000000e+00, float 3.000000e+00, float 3.000000e+00, float 3.000000e+00>}
+!23 = metadata !{i32 786688, metadata !4, metadata !"three", metadata !5, i32 7, metadata !8, i32 0, i32 0} ; [ DW_TAG_auto_variable ] [three] [line 7]
+!24 = metadata !{i32 7, i32 0, metadata !4, null}
+!25 = metadata !{<4 x float> <float 5.000000e+00, float 5.000000e+00, float 5.000000e+00, float 5.000000e+00>}
+!26 = metadata !{i32 786688, metadata !4, metadata !"five", metadata !5, i32 8, metadata !8, i32 0, i32 0} ; [ DW_TAG_auto_variable ] [five] [line 8]
+!27 = metadata !{i32 8, i32 0, metadata !4, null} ; [ DW_TAG_imported_declaration ]
+!28 = metadata !{i32 9, i32 0, metadata !4, null}
+!29 = metadata !{i32 10, i32 0, metadata !4, null}
+!30 = metadata !{i32 11, i32 0, metadata !4, null}
+!31 = metadata !{i32 786689, metadata !13, metadata !"a", metadata !5, i32 16777231, metadata !8, i32 0, i32 0} ; [ DW_TAG_arg_variable ] [a] [line 15]
+!32 = metadata !{i32 15, i32 0, metadata !13, null}
+!33 = metadata !{i32 786689, metadata !13, metadata !"b", metadata !5, i32 33554447, metadata !8, i32 0, i32 0} ; [ DW_TAG_arg_variable ] [b] [line 15]
+!34 = metadata !{i32 17, i32 0, metadata !13, null}
+!35 = metadata !{i32 786688, metadata !13, metadata !"tmp", metadata !5, i32 17, metadata !8, i32 0, i32 0} ; [ DW_TAG_auto_variable ] [tmp] [line 17]
+!36 = metadata !{i32 18, i32 0, metadata !13, null}


More information about the llvm-commits mailing list