[llvm] r286814 - [InlineCost] Remove skew when calculating call costs

James Molloy via llvm-commits llvm-commits at lists.llvm.org
Wed Nov 16 03:54:26 PST 2016


Hi David, Gerolf, Mikail,

Sorry I’m late to this thread; thanks for bringing my attention to this.

The change itself did gain consensus that it was a good change, in so far as it removed some fairly obvious brokenness. I benchmarked the change fairly heavily, but as it affects the inlining heuristics I did expect quite some fallout. The datapoints you pointed to are very useful.

The intent was always to follow up with a change to the thresholds; in particular to the -Oz threshold, which is currently 25 and should be around zero. I must say that I expected the impact on -Os and -O3 to be much lower - certainly at -O3 I expected almost no impact except for highly repetitive (such as auto generated) code, as the inline threshold is 225. Your data shows that isn’t the case, which is news to me!

It’s clear from the LNT data that we’re doing more inlining, even at -O3, than we were before. Unfortunately it doesn’t look like that extra inlining is actually improving performance, at least on the x86 numbers (the arm64 numbers didn’t have performance data available)!

Annoyingly, it’s very difficult for me to a-priori test my proposed threshold changes on all the architectures people care about, so I have two suggestions on how to move forward:
  1) I back out this change, merge it with a threshold update and ask people to test it if possible., OR
  2) I submit a proposed threshold change for review, submit it when accepted and monitor green dragon to see if the changes bring things back in line. If not, I’ll need to back out both and start from scratch.

Do you (Gerolf, David?) have suggestions on a preferred route forward?

Thanks,

James

On 16 Nov 2016, at 07:19, Xinliang David Li <xinliangli at gmail.com<mailto:xinliangli at gmail.com>> wrote:



On Tue, Nov 15, 2016 at 7:01 PM, Gerolf Hoflehner via llvm-commits <llvm-commits at lists.llvm.org<mailto:llvm-commits at lists.llvm.org>> wrote:
The compiler should certainly inline the test case, but there seems to be a high compile-time/code size cost on average for Os/O2. It would be great to have a clearer picture of the performance gains for the costs. I also hope to get a few eyes on the code review from inliner experts. It is not clear to me that the fix is the correct solution to the problem:

There are 3 uses of call penalty after this commit: when the call cost is estimated it is added, but for the benefits instr cost + call penalty is subtracted. Shouldn’t it be one of instr cost, call penalty or instr cost + call penalty in both cases?

In call cost computation, Instrcost is also added. See analyzeBlock at the callsite where instruction visitor is invoked: if it is not simplified, the cost is added.

David

This is all about raising awareness.  if the consensus is this is this fix is right thing to do cost is just something to be aware of and move on.

Thanks
Gerolf

PS (more out of curiosity):

Also, and this is somewhat orthogonal to the specific change, but I noticed the use of CallPenalty in the following case:

if (TTI.getFPOpCost(I->getType()) == TargetTransformInfo::TCC_Expensive || hasSoftFloatAttr) Cost += InlineConstants::CallPenalty; Why is CallPenalty a good cost estimate under this condition?
On Nov 14, 2016, at 5:48 PM, Mikhail Zolotukhin <mzolotukhin at apple.com<mailto:mzolotukhin at apple.com>> wrote:

Hi James,

Green dragon detected some big regressions, most probably from your change (up to 30% code size and compile time regressions on ARM64 [1] and x86 [2]). Did you plan to tweak the thresholds after the change?

Thanks,
Michael

[1] ARM64, Os: http://104.154.54.203/db_default/v4/nts/13248
[2] x86, O3+flto: http://104.154.54.203/db_default/v4/nts/13245


On Nov 14, 2016, at 3:14 AM, James Molloy via llvm-commits <llvm-commits at lists.llvm.org<mailto:llvm-commits at lists.llvm.org>> wrote:

Author: jamesm
Date: Mon Nov 14 05:14:41 2016
New Revision: 286814

URL: http://llvm.org/viewvc/llvm-project?rev=286814&view=rev
Log:
[InlineCost] Remove skew when calculating call costs

When calculating the cost of a call instruction we were applying a heuristic penalty as well as the cost of the instruction itself.

However, when calculating the benefit from inlining we weren't discounting the equivalent penalty for the call instruction that would be removed! This caused skew in the calculation and meant we wouldn't inline in the following, trivial case:

 int g() {
   h();
 }
 int f() {
   g();
 }

Modified:
   llvm/trunk/lib/Analysis/InlineCost.cpp
   llvm/trunk/test/Transforms/Inline/alloca-bonus.ll
   llvm/trunk/test/Transforms/Inline/inline-cold-callee.ll
   llvm/trunk/test/Transforms/Inline/inline-cold.ll
   llvm/trunk/test/Transforms/Inline/inline-hot-callee.ll
   llvm/trunk/test/Transforms/Inline/inline-hot-callsite.ll
   llvm/trunk/test/Transforms/Inline/inline-optsize.ll
   llvm/trunk/test/Transforms/Inline/inline_unreachable-2.ll
   llvm/trunk/test/Transforms/Inline/optimization-remarks-passed-yaml.ll
   llvm/trunk/test/Transforms/Inline/ptr-diff.ll

Modified: llvm/trunk/lib/Analysis/InlineCost.cpp
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Analysis/InlineCost.cpp?rev=286814&r1=286813&r2=286814&view=diff
==============================================================================
--- llvm/trunk/lib/Analysis/InlineCost.cpp (original)
+++ llvm/trunk/lib/Analysis/InlineCost.cpp Mon Nov 14 05:14:41 2016
@@ -1255,7 +1255,9 @@ bool CallAnalyzer::analyzeCall(CallSite
      Cost -= InlineConstants::InstrCost;
    }
  }
-
+  // The call instruction also disappears after inlining.
+  Cost -= InlineConstants::InstrCost + InlineConstants::CallPenalty;
+
  // If there is only one call of the function, and it has internal linkage,
  // the cost of inlining it drops dramatically.
  bool OnlyOneCallAndLocalLinkage =

Modified: llvm/trunk/test/Transforms/Inline/alloca-bonus.ll
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Transforms/Inline/alloca-bonus.ll?rev=286814&r1=286813&r2=286814&view=diff
==============================================================================
--- llvm/trunk/test/Transforms/Inline/alloca-bonus.ll (original)
+++ llvm/trunk/test/Transforms/Inline/alloca-bonus.ll Mon Nov 14 05:14:41 2016
@@ -22,6 +22,7 @@ define void @inner1(i32 *%ptr) {
  %E = bitcast i32* %ptr to i8*
  %F = select i1 false, i32* %ptr, i32* @glbl
  call void @llvm.lifetime.start(i64 0, i8* %E)
+  call void @extern()
  ret void
}

@@ -42,6 +43,7 @@ define void @inner2(i32 *%ptr) {
  %E = bitcast i32* %ptr to i8*
  %F = select i1 false, i32* %ptr, i32* @glbl
  call void @llvm.lifetime.start(i64 0, i8* %E)
+  call void @extern()
  ret void
}

@@ -56,6 +58,7 @@ define void @outer3() {
define void @inner3(i32 *%ptr, i1 %x) {
  %A = icmp eq i32* %ptr, null
  %B = and i1 %x, %A
+  call void @extern()
  br i1 %A, label %bb.true, label %bb.false
bb.true:
  ; This block musn't be counted in the inline cost.
@@ -97,6 +100,7 @@ define void @outer4(i32 %A) {
define void @inner4(i32 *%ptr, i32 %A) {
  %B = getelementptr inbounds i32, i32* %ptr, i32 %A
  %C = icmp eq i32* %ptr, null
+  call void @extern()
  br i1 %C, label %bb.true, label %bb.false
bb.true:
  ; This block musn't be counted in the inline cost.
@@ -139,6 +143,7 @@ define void @outer5() {
define void @inner5(i1 %flag, i32 *%ptr) {
  %A = load i32, i32* %ptr
  store i32 0, i32* %ptr
+  call void @extern()
  %C = getelementptr inbounds i32, i32* %ptr, i32 0
  br i1 %flag, label %if.then, label %exit

@@ -153,3 +158,4 @@ exit:
  ret void
}

+declare void @extern()

Modified: llvm/trunk/test/Transforms/Inline/inline-cold-callee.ll
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Transforms/Inline/inline-cold-callee.ll?rev=286814&r1=286813&r2=286814&view=diff
==============================================================================
--- llvm/trunk/test/Transforms/Inline/inline-cold-callee.ll (original)
+++ llvm/trunk/test/Transforms/Inline/inline-cold-callee.ll Mon Nov 14 05:14:41 2016
@@ -9,7 +9,7 @@ define i32 @callee1(i32 %x) !prof !21 {
  %x1 = add i32 %x, 1
  %x2 = add i32 %x1, 1
  %x3 = add i32 %x2, 1
-
+  call void @extern()
  ret i32 %x3
}

@@ -18,7 +18,7 @@ define i32 @callee2(i32 %x) !prof !22 {
  %x1 = add i32 %x, 1
  %x2 = add i32 %x1, 1
  %x3 = add i32 %x2, 1
-
+  call void @extern()
  ret i32 %x3
}

@@ -32,6 +32,8 @@ define i32 @caller2(i32 %y1) !prof !22 {
  ret i32 %y3
}

+declare void @extern()
+
!llvm.module.flags = !{!1}
!21 = !{!"function_entry_count", i64 100}
!22 = !{!"function_entry_count", i64 1}

Modified: llvm/trunk/test/Transforms/Inline/inline-cold.ll
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Transforms/Inline/inline-cold.ll?rev=286814&r1=286813&r2=286814&view=diff
==============================================================================
--- llvm/trunk/test/Transforms/Inline/inline-cold.ll (original)
+++ llvm/trunk/test/Transforms/Inline/inline-cold.ll Mon Nov 14 05:14:41 2016
@@ -17,6 +17,7 @@
; Function Attrs: nounwind readnone uwtable
define i32 @simpleFunction(i32 %a) #0 {
entry:
+  call void @extern()
  %a1 = load volatile i32, i32* @a
  %x1 = add i32 %a1,  %a1
  %a2 = load volatile i32, i32* @a
@@ -54,6 +55,7 @@ define i32 @ColdFunction(i32 %a) #1 {
; DEFAULT-LABEL: @ColdFunction
; DEFAULT: ret
entry:
+  call void @extern()
  %a1 = load volatile i32, i32* @a
  %x1 = add i32 %a1,  %a1
  %a2 = load volatile i32, i32* @a
@@ -91,6 +93,7 @@ define i32 @ColdFunction2(i32 %a) #1 {
; DEFAULT-LABEL: @ColdFunction2
; DEFAULT: ret
entry:
+  call void @extern()
  %a1 = load volatile i32, i32* @a
  %x1 = add i32 %a1,  %a1
  %a2 = load volatile i32, i32* @a
@@ -196,5 +199,6 @@ entry:
  ret i32 %add
}

+declare void @extern()
attributes #0 = { nounwind readnone uwtable }
attributes #1 = { nounwind cold readnone uwtable }

Modified: llvm/trunk/test/Transforms/Inline/inline-hot-callee.ll
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Transforms/Inline/inline-hot-callee.ll?rev=286814&r1=286813&r2=286814&view=diff
==============================================================================
--- llvm/trunk/test/Transforms/Inline/inline-hot-callee.ll (original)
+++ llvm/trunk/test/Transforms/Inline/inline-hot-callee.ll Mon Nov 14 05:14:41 2016
@@ -9,7 +9,7 @@ define i32 @callee1(i32 %x) !prof !21 {
  %x1 = add i32 %x, 1
  %x2 = add i32 %x1, 1
  %x3 = add i32 %x2, 1
-
+  call void @extern()
  ret i32 %x3
}

@@ -18,7 +18,7 @@ define i32 @callee2(i32 %x) !prof !22 {
  %x1 = add i32 %x, 1
  %x2 = add i32 %x1, 1
  %x3 = add i32 %x2, 1
-
+  call void @extern()
  ret i32 %x3
}

@@ -32,6 +32,8 @@ define i32 @caller2(i32 %y1) !prof !22 {
  ret i32 %y3
}

+declare void @extern()
+
!llvm.module.flags = !{!1}
!21 = !{!"function_entry_count", i64 300}
!22 = !{!"function_entry_count", i64 1}

Modified: llvm/trunk/test/Transforms/Inline/inline-hot-callsite.ll
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Transforms/Inline/inline-hot-callsite.ll?rev=286814&r1=286813&r2=286814&view=diff
==============================================================================
--- llvm/trunk/test/Transforms/Inline/inline-hot-callsite.ll (original)
+++ llvm/trunk/test/Transforms/Inline/inline-hot-callsite.ll Mon Nov 14 05:14:41 2016
@@ -9,7 +9,7 @@ define i32 @callee1(i32 %x) {
  %x1 = add i32 %x, 1
  %x2 = add i32 %x1, 1
  %x3 = add i32 %x2, 1
-
+  call void @extern()
  ret i32 %x3
}

@@ -18,7 +18,7 @@ define i32 @callee2(i32 %x) {
  %x1 = add i32 %x, 1
  %x2 = add i32 %x1, 1
  %x3 = add i32 %x2, 1
-
+  call void @extern()
  ret i32 %x3
}

@@ -32,6 +32,8 @@ define i32 @caller2(i32 %y1) {
  ret i32 %y3
}

+declare void @extern()
+
!llvm.module.flags = !{!1}
!21 = !{!"branch_weights", i64 300}
!22 = !{!"branch_weights", i64 1}

Modified: llvm/trunk/test/Transforms/Inline/inline-optsize.ll
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Transforms/Inline/inline-optsize.ll?rev=286814&r1=286813&r2=286814&view=diff
==============================================================================
--- llvm/trunk/test/Transforms/Inline/inline-optsize.ll (original)
+++ llvm/trunk/test/Transforms/Inline/inline-optsize.ll Mon Nov 14 05:14:41 2016
@@ -12,6 +12,7 @@
; This function should be larger than the inline threshold for -Oz (25), but
; smaller than the inline threshold for optsize (75).
define i32 @inner() {
+  call void @extern()
  %a1 = load volatile i32, i32* @a
  %x1 = add i32 %a1,  %a1
  %a2 = load volatile i32, i32* @a
@@ -42,3 +43,5 @@ define i32 @outer2() minsize {
   %r = call i32 @inner()
   ret i32 %r
}
+
+declare void @extern()
\ No newline at end of file

Modified: llvm/trunk/test/Transforms/Inline/inline_unreachable-2.ll
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Transforms/Inline/inline_unreachable-2.ll?rev=286814&r1=286813&r2=286814&view=diff
==============================================================================
--- llvm/trunk/test/Transforms/Inline/inline_unreachable-2.ll (original)
+++ llvm/trunk/test/Transforms/Inline/inline_unreachable-2.ll Mon Nov 14 05:14:41 2016
@@ -8,6 +8,7 @@ define void @caller(i32 %a, i1 %b) #0 {
}

define void @callee(i32 %a, i1 %b) {
+  call void @extern()
  call void asm sideeffect "", ""()
  br i1 %b, label %bb1, label %bb2
bb1:
@@ -17,3 +18,5 @@ bb2:
  call void asm sideeffect "", ""()
  ret void
}
+
+declare void @extern()
\ No newline at end of file

Modified: llvm/trunk/test/Transforms/Inline/optimization-remarks-passed-yaml.ll
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Transforms/Inline/optimization-remarks-passed-yaml.ll?rev=286814&r1=286813&r2=286814&view=diff
==============================================================================
--- llvm/trunk/test/Transforms/Inline/optimization-remarks-passed-yaml.ll (original)
+++ llvm/trunk/test/Transforms/Inline/optimization-remarks-passed-yaml.ll Mon Nov 14 05:14:41 2016
@@ -12,7 +12,7 @@
;  4       return foo();
;  5     }

-; CHECK:      remark: /tmp/s.c:4:10: foo can be inlined into bar with cost={{[0-9]+}} (threshold={{[0-9]+}}) (hotness: 30)
+; CHECK:      remark: /tmp/s.c:4:10: foo can be inlined into bar with cost={{[0-9\-]+}} (threshold={{[0-9]+}}) (hotness: 30)
; CHECK-NEXT: remark: /tmp/s.c:4:10: foo inlined into bar (hotness: 30)

; YAML:      --- !Analysis
@@ -28,7 +28,7 @@
; YAML-NEXT:   - Caller: bar
; YAML-NEXT:     DebugLoc:        { File: /tmp/s.c, Line: 3, Column: 0 }
; YAML-NEXT:   - String: ' with cost='
-; YAML-NEXT:   - Cost: '{{[0-9]+}}'
+; YAML-NEXT:   - Cost: '{{[0-9\-]+}}'
; YAML-NEXT:   - String: ' (threshold='
; YAML-NEXT:   - Threshold: '{{[0-9]+}}'
; YAML-NEXT:   - String: ')'

Modified: llvm/trunk/test/Transforms/Inline/ptr-diff.ll
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Transforms/Inline/ptr-diff.ll?rev=286814&r1=286813&r2=286814&view=diff
==============================================================================
--- llvm/trunk/test/Transforms/Inline/ptr-diff.ll (original)
+++ llvm/trunk/test/Transforms/Inline/ptr-diff.ll Mon Nov 14 05:14:41 2016
@@ -4,7 +4,7 @@ target datalayout = "p:32:32-p1:64:64-p2

define i32 @outer1() {
; CHECK-LABEL: @outer1(
-; CHECK-NOT: call
+; CHECK-NOT: call i32
; CHECK: ret i32

  %ptr = alloca i32
@@ -15,6 +15,7 @@ define i32 @outer1() {
}

define i32 @inner1(i32* %begin, i32* %end) {
+  call void @extern()
  %begin.i = ptrtoint i32* %begin to i32
  %end.i = ptrtoint i32* %end to i32
  %distance = sub i32 %end.i, %begin.i
@@ -43,6 +44,7 @@ define i32 @outer2(i32* %ptr) {
}

define i32 @inner2(i32* %begin, i32* %end) {
+  call void @extern()
  %begin.i = ptrtoint i32* %begin to i32
  %end.i = ptrtoint i32* %end to i32
  %distance = sub i32 %end.i, %begin.i
@@ -60,6 +62,7 @@ else:
; The inttoptrs are free since it is a smaller integer to a larger
; pointer size
define i32 @inttoptr_free_cost(i32 %a, i32 %b, i32 %c) {
+  call void @extern()
  %p1 = inttoptr i32 %a to i32 addrspace(1)*
  %p2 = inttoptr i32 %b to i32 addrspace(1)*
  %p3 = inttoptr i32 %c to i32 addrspace(1)*
@@ -73,7 +76,7 @@ define i32 @inttoptr_free_cost(i32 %a, i

define i32 @inttoptr_free_cost_user(i32 %begin, i32 %end) {
; CHECK-LABEL: @inttoptr_free_cost_user(
-; CHECK-NOT: call
+; CHECK-NOT: call i32
  %x = call i32 @inttoptr_free_cost(i32 %begin, i32 %end, i32 9)
  ret i32 %x
}
@@ -81,6 +84,7 @@ define i32 @inttoptr_free_cost_user(i32
; The inttoptrs have a cost since it is a larger integer to a smaller
; pointer size
define i32 @inttoptr_cost_smaller_ptr(i32 %a, i32 %b, i32 %c) {
+  call void @extern()
  %p1 = inttoptr i32 %a to i32 addrspace(2)*
  %p2 = inttoptr i32 %b to i32 addrspace(2)*
  %p3 = inttoptr i32 %c to i32 addrspace(2)*
@@ -94,8 +98,9 @@ define i32 @inttoptr_cost_smaller_ptr(i3

define i32 @inttoptr_cost_smaller_ptr_user(i32 %begin, i32 %end) {
; CHECK-LABEL: @inttoptr_cost_smaller_ptr_user(
-; CHECK: call
+; CHECK: call i32
  %x = call i32 @inttoptr_cost_smaller_ptr(i32 %begin, i32 %end, i32 9)
  ret i32 %x
}

+declare void @extern()
\ No newline at end of file


_______________________________________________
llvm-commits mailing list
llvm-commits at lists.llvm.org<mailto:llvm-commits at lists.llvm.org>
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits



_______________________________________________
llvm-commits mailing list
llvm-commits at lists.llvm.org<mailto:llvm-commits at lists.llvm.org>
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits

IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20161116/51c919b5/attachment.html>


More information about the llvm-commits mailing list