[llvm] r286814 - [InlineCost] Remove skew when calculating call costs

Xinliang David Li via llvm-commits llvm-commits at lists.llvm.org
Wed Nov 16 11:49:12 PST 2016


On Wed, Nov 16, 2016 at 3:54 AM, James Molloy <James.Molloy at arm.com> wrote:

> Hi David, Gerolf, Mikail,
>
> Sorry I’m late to this thread; thanks for bringing my attention to this.
>
> The change itself did gain consensus that it was a good change, in so far
> as it removed some fairly obvious brokenness. I benchmarked the change
> fairly heavily, but as it affects the inlining heuristics I did expect
> quite some fallout. The datapoints you pointed to are very useful.
>
> The intent was always to follow up with a change to the thresholds; in
> particular to the -Oz threshold, which is currently 25 and should be around
> zero. I must say that I expected the impact on -Os and -O3 to be much lower
> - certainly at -O3 I expected almost no impact except for highly repetitive
> (such as auto generated) code, as the inline threshold is 225. Your data
> shows that isn’t the case, which is news to me!
>
> It’s clear from the LNT data that we’re doing more inlining, even at -O3,
> than we were before. Unfortunately it doesn’t look like that extra inlining
> is actually improving performance, at least on the x86 numbers (the arm64
> numbers didn’t have performance data available)!
>
> Annoyingly, it’s very difficult for me to a-priori test my proposed
> threshold changes on all the architectures people care about, so I have two
> suggestions on how to move forward:
>   1) I back out this change, merge it with a threshold update and ask
> people to test it if possible., OR
>   2) I submit a proposed threshold change for review, submit it when
> accepted and monitor green dragon to see if the changes bring things back
> in line. If not, I’ll need to back out both and start from scratch.
>
> Do you (Gerolf, David?) have suggestions on a preferred route forward?
>

Rolling forward seems like a better approach to me.

David



>
> Thanks,
>
> James
>
> On 16 Nov 2016, at 07:19, Xinliang David Li <xinliangli at gmail.com> wrote:
>
>
>
> On Tue, Nov 15, 2016 at 7:01 PM, Gerolf Hoflehner via llvm-commits <
> llvm-commits at lists.llvm.org> wrote:
>
>> The compiler should certainly inline the test case, but there seems to be
>> a high compile-time/code size cost on average for Os/O2. It would be great
>> to have a clearer picture of the performance gains for the costs. I also
>> hope to get a few eyes on the code review from inliner experts. It is not
>> clear to me that the fix is the correct solution to the problem:
>>
>> There are 3 uses of call penalty after this commit: when the call cost is
>> estimated it is added, but for the benefits instr cost + call penalty is
>> subtracted. Shouldn’t it be one of instr cost, call penalty or instr cost +
>> call penalty in both cases?
>>
>
> In call cost computation, Instrcost is also added. See analyzeBlock at the
> callsite where instruction visitor is invoked: if it is not simplified, the
> cost is added.
>
> David
>
>>
>> This is all about raising awareness.  if the consensus is this is this
>> fix is right thing to do cost is just something to be aware of and move on.
>>
>> Thanks
>> Gerolf
>>
>> PS (more out of curiosity):
>>
>> Also, and this is somewhat orthogonal to the specific change, but I
>> noticed the use of CallPenalty in the following case:
>>
>> if (TTI.getFPOpCost(I->getType()) == TargetTransformInfo::TCC_Expensive ||
>> hasSoftFloatAttr) Cost += InlineConstants::CallPenalty; Why is
>> CallPenalty a good cost estimate under this condition?
>>
>> On Nov 14, 2016, at 5:48 PM, Mikhail Zolotukhin <mzolotukhin at apple.com>
>> wrote:
>>
>> Hi James,
>>
>> Green dragon detected some big regressions, most probably from your
>> change (up to 30% code size and compile time regressions on ARM64 [1] and
>> x86 [2]). Did you plan to tweak the thresholds after the change?
>>
>> Thanks,
>> Michael
>>
>> [1] ARM64, Os: http://104.154.54.203/db_default/v4/nts/13248
>> [2] x86, O3+flto: http://104.154.54.203/db_default/v4/nts/13245
>>
>>
>> On Nov 14, 2016, at 3:14 AM, James Molloy via llvm-commits <
>> llvm-commits at lists.llvm.org> wrote:
>>
>> Author: jamesm
>> Date: Mon Nov 14 05:14:41 2016
>> New Revision: 286814
>>
>> URL: http://llvm.org/viewvc/llvm-project?rev=286814&view=rev
>> Log:
>> [InlineCost] Remove skew when calculating call costs
>>
>> When calculating the cost of a call instruction we were applying a
>> heuristic penalty as well as the cost of the instruction itself.
>>
>> However, when calculating the benefit from inlining we weren't
>> discounting the equivalent penalty for the call instruction that would be
>> removed! This caused skew in the calculation and meant we wouldn't inline
>> in the following, trivial case:
>>
>>  int g() {
>>    h();
>>  }
>>  int f() {
>>    g();
>>  }
>>
>> Modified:
>>    llvm/trunk/lib/Analysis/InlineCost.cpp
>>    llvm/trunk/test/Transforms/Inline/alloca-bonus.ll
>>    llvm/trunk/test/Transforms/Inline/inline-cold-callee.ll
>>    llvm/trunk/test/Transforms/Inline/inline-cold.ll
>>    llvm/trunk/test/Transforms/Inline/inline-hot-callee.ll
>>    llvm/trunk/test/Transforms/Inline/inline-hot-callsite.ll
>>    llvm/trunk/test/Transforms/Inline/inline-optsize.ll
>>    llvm/trunk/test/Transforms/Inline/inline_unreachable-2.ll
>>    llvm/trunk/test/Transforms/Inline/optimization-remarks-passed-yaml.ll
>>    llvm/trunk/test/Transforms/Inline/ptr-diff.ll
>>
>> Modified: llvm/trunk/lib/Analysis/InlineCost.cpp
>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/An
>> alysis/InlineCost.cpp?rev=286814&r1=286813&r2=286814&view=diff
>> ============================================================
>> ==================
>> --- llvm/trunk/lib/Analysis/InlineCost.cpp (original)
>> +++ llvm/trunk/lib/Analysis/InlineCost.cpp Mon Nov 14 05:14:41 2016
>> @@ -1255,7 +1255,9 @@ bool CallAnalyzer::analyzeCall(CallSite
>>       Cost -= InlineConstants::InstrCost;
>>     }
>>   }
>> -
>> +  // The call instruction also disappears after inlining.
>> +  Cost -= InlineConstants::InstrCost + InlineConstants::CallPenalty;
>> +
>>   // If there is only one call of the function, and it has internal
>> linkage,
>>   // the cost of inlining it drops dramatically.
>>   bool OnlyOneCallAndLocalLinkage =
>>
>> Modified: llvm/trunk/test/Transforms/Inline/alloca-bonus.ll
>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/T
>> ransforms/Inline/alloca-bonus.ll?rev=286814&r1=286813&r2=286814&view=diff
>> ============================================================
>> ==================
>> --- llvm/trunk/test/Transforms/Inline/alloca-bonus.ll (original)
>> +++ llvm/trunk/test/Transforms/Inline/alloca-bonus.ll Mon Nov 14
>> 05:14:41 2016
>> @@ -22,6 +22,7 @@ define void @inner1(i32 *%ptr) {
>>   %E = bitcast i32* %ptr to i8*
>>   %F = select i1 false, i32* %ptr, i32* @glbl
>>   call void @llvm.lifetime.start(i64 0, i8* %E)
>> +  call void @extern()
>>   ret void
>> }
>>
>> @@ -42,6 +43,7 @@ define void @inner2(i32 *%ptr) {
>>   %E = bitcast i32* %ptr to i8*
>>   %F = select i1 false, i32* %ptr, i32* @glbl
>>   call void @llvm.lifetime.start(i64 0, i8* %E)
>> +  call void @extern()
>>   ret void
>> }
>>
>> @@ -56,6 +58,7 @@ define void @outer3() {
>> define void @inner3(i32 *%ptr, i1 %x) {
>>   %A = icmp eq i32* %ptr, null
>>   %B = and i1 %x, %A
>> +  call void @extern()
>>   br i1 %A, label %bb.true, label %bb.false
>> bb.true:
>>   ; This block musn't be counted in the inline cost.
>> @@ -97,6 +100,7 @@ define void @outer4(i32 %A) {
>> define void @inner4(i32 *%ptr, i32 %A) {
>>   %B = getelementptr inbounds i32, i32* %ptr, i32 %A
>>   %C = icmp eq i32* %ptr, null
>> +  call void @extern()
>>   br i1 %C, label %bb.true, label %bb.false
>> bb.true:
>>   ; This block musn't be counted in the inline cost.
>> @@ -139,6 +143,7 @@ define void @outer5() {
>> define void @inner5(i1 %flag, i32 *%ptr) {
>>   %A = load i32, i32* %ptr
>>   store i32 0, i32* %ptr
>> +  call void @extern()
>>   %C = getelementptr inbounds i32, i32* %ptr, i32 0
>>   br i1 %flag, label %if.then, label %exit
>>
>> @@ -153,3 +158,4 @@ exit:
>>   ret void
>> }
>>
>> +declare void @extern()
>>
>> Modified: llvm/trunk/test/Transforms/Inline/inline-cold-callee.ll
>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/T
>> ransforms/Inline/inline-cold-callee.ll?rev=286814&r1=286813&
>> r2=286814&view=diff
>> ============================================================
>> ==================
>> --- llvm/trunk/test/Transforms/Inline/inline-cold-callee.ll (original)
>> +++ llvm/trunk/test/Transforms/Inline/inline-cold-callee.ll Mon Nov 14
>> 05:14:41 2016
>> @@ -9,7 +9,7 @@ define i32 @callee1(i32 %x) !prof !21 {
>>   %x1 = add i32 %x, 1
>>   %x2 = add i32 %x1, 1
>>   %x3 = add i32 %x2, 1
>> -
>> +  call void @extern()
>>   ret i32 %x3
>> }
>>
>> @@ -18,7 +18,7 @@ define i32 @callee2(i32 %x) !prof !22 {
>>   %x1 = add i32 %x, 1
>>   %x2 = add i32 %x1, 1
>>   %x3 = add i32 %x2, 1
>> -
>> +  call void @extern()
>>   ret i32 %x3
>> }
>>
>> @@ -32,6 +32,8 @@ define i32 @caller2(i32 %y1) !prof !22 {
>>   ret i32 %y3
>> }
>>
>> +declare void @extern()
>> +
>> !llvm.module.flags = !{!1}
>> !21 = !{!"function_entry_count", i64 100}
>> !22 = !{!"function_entry_count", i64 1}
>>
>> Modified: llvm/trunk/test/Transforms/Inline/inline-cold.ll
>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/T
>> ransforms/Inline/inline-cold.ll?rev=286814&r1=286813&r2=286814&view=diff
>> ============================================================
>> ==================
>> --- llvm/trunk/test/Transforms/Inline/inline-cold.ll (original)
>> +++ llvm/trunk/test/Transforms/Inline/inline-cold.ll Mon Nov 14 05:14:41
>> 2016
>> @@ -17,6 +17,7 @@
>> ; Function Attrs: nounwind readnone uwtable
>> define i32 @simpleFunction(i32 %a) #0 {
>> entry:
>> +  call void @extern()
>>   %a1 = load volatile i32, i32* @a
>>   %x1 = add i32 %a1,  %a1
>>   %a2 = load volatile i32, i32* @a
>> @@ -54,6 +55,7 @@ define i32 @ColdFunction(i32 %a) #1 {
>> ; DEFAULT-LABEL: @ColdFunction
>> ; DEFAULT: ret
>> entry:
>> +  call void @extern()
>>   %a1 = load volatile i32, i32* @a
>>   %x1 = add i32 %a1,  %a1
>>   %a2 = load volatile i32, i32* @a
>> @@ -91,6 +93,7 @@ define i32 @ColdFunction2(i32 %a) #1 {
>> ; DEFAULT-LABEL: @ColdFunction2
>> ; DEFAULT: ret
>> entry:
>> +  call void @extern()
>>   %a1 = load volatile i32, i32* @a
>>   %x1 = add i32 %a1,  %a1
>>   %a2 = load volatile i32, i32* @a
>> @@ -196,5 +199,6 @@ entry:
>>   ret i32 %add
>> }
>>
>> +declare void @extern()
>> attributes #0 = { nounwind readnone uwtable }
>> attributes #1 = { nounwind cold readnone uwtable }
>>
>> Modified: llvm/trunk/test/Transforms/Inline/inline-hot-callee.ll
>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/T
>> ransforms/Inline/inline-hot-callee.ll?rev=286814&r1=286813&
>> r2=286814&view=diff
>> ============================================================
>> ==================
>> --- llvm/trunk/test/Transforms/Inline/inline-hot-callee.ll (original)
>> +++ llvm/trunk/test/Transforms/Inline/inline-hot-callee.ll Mon Nov 14
>> 05:14:41 2016
>> @@ -9,7 +9,7 @@ define i32 @callee1(i32 %x) !prof !21 {
>>   %x1 = add i32 %x, 1
>>   %x2 = add i32 %x1, 1
>>   %x3 = add i32 %x2, 1
>> -
>> +  call void @extern()
>>   ret i32 %x3
>> }
>>
>> @@ -18,7 +18,7 @@ define i32 @callee2(i32 %x) !prof !22 {
>>   %x1 = add i32 %x, 1
>>   %x2 = add i32 %x1, 1
>>   %x3 = add i32 %x2, 1
>> -
>> +  call void @extern()
>>   ret i32 %x3
>> }
>>
>> @@ -32,6 +32,8 @@ define i32 @caller2(i32 %y1) !prof !22 {
>>   ret i32 %y3
>> }
>>
>> +declare void @extern()
>> +
>> !llvm.module.flags = !{!1}
>> !21 = !{!"function_entry_count", i64 300}
>> !22 = !{!"function_entry_count", i64 1}
>>
>> Modified: llvm/trunk/test/Transforms/Inline/inline-hot-callsite.ll
>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/T
>> ransforms/Inline/inline-hot-callsite.ll?rev=286814&r1=286813
>> &r2=286814&view=diff
>> ============================================================
>> ==================
>> --- llvm/trunk/test/Transforms/Inline/inline-hot-callsite.ll (original)
>> +++ llvm/trunk/test/Transforms/Inline/inline-hot-callsite.ll Mon Nov 14
>> 05:14:41 2016
>> @@ -9,7 +9,7 @@ define i32 @callee1(i32 %x) {
>>   %x1 = add i32 %x, 1
>>   %x2 = add i32 %x1, 1
>>   %x3 = add i32 %x2, 1
>> -
>> +  call void @extern()
>>   ret i32 %x3
>> }
>>
>> @@ -18,7 +18,7 @@ define i32 @callee2(i32 %x) {
>>   %x1 = add i32 %x, 1
>>   %x2 = add i32 %x1, 1
>>   %x3 = add i32 %x2, 1
>> -
>> +  call void @extern()
>>   ret i32 %x3
>> }
>>
>> @@ -32,6 +32,8 @@ define i32 @caller2(i32 %y1) {
>>   ret i32 %y3
>> }
>>
>> +declare void @extern()
>> +
>> !llvm.module.flags = !{!1}
>> !21 = !{!"branch_weights", i64 300}
>> !22 = !{!"branch_weights", i64 1}
>>
>> Modified: llvm/trunk/test/Transforms/Inline/inline-optsize.ll
>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/T
>> ransforms/Inline/inline-optsize.ll?rev=286814&r1=286813&r2=
>> 286814&view=diff
>> ============================================================
>> ==================
>> --- llvm/trunk/test/Transforms/Inline/inline-optsize.ll (original)
>> +++ llvm/trunk/test/Transforms/Inline/inline-optsize.ll Mon Nov 14
>> 05:14:41 2016
>> @@ -12,6 +12,7 @@
>> ; This function should be larger than the inline threshold for -Oz (25),
>> but
>> ; smaller than the inline threshold for optsize (75).
>> define i32 @inner() {
>> +  call void @extern()
>>   %a1 = load volatile i32, i32* @a
>>   %x1 = add i32 %a1,  %a1
>>   %a2 = load volatile i32, i32* @a
>> @@ -42,3 +43,5 @@ define i32 @outer2() minsize {
>>    %r = call i32 @inner()
>>    ret i32 %r
>> }
>> +
>> +declare void @extern()
>> \ No newline at end of file
>>
>> Modified: llvm/trunk/test/Transforms/Inline/inline_unreachable-2.ll
>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/T
>> ransforms/Inline/inline_unreachable-2.ll?rev=286814&r1=
>> 286813&r2=286814&view=diff
>> ============================================================
>> ==================
>> --- llvm/trunk/test/Transforms/Inline/inline_unreachable-2.ll (original)
>> +++ llvm/trunk/test/Transforms/Inline/inline_unreachable-2.ll Mon Nov 14
>> 05:14:41 2016
>> @@ -8,6 +8,7 @@ define void @caller(i32 %a, i1 %b) #0 {
>> }
>>
>> define void @callee(i32 %a, i1 %b) {
>> +  call void @extern()
>>   call void asm sideeffect "", ""()
>>   br i1 %b, label %bb1, label %bb2
>> bb1:
>> @@ -17,3 +18,5 @@ bb2:
>>   call void asm sideeffect "", ""()
>>   ret void
>> }
>> +
>> +declare void @extern()
>> \ No newline at end of file
>>
>> Modified: llvm/trunk/test/Transforms/Inline/optimization-remarks-passe
>> d-yaml.ll
>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/T
>> ransforms/Inline/optimization-remarks-passed-yaml.ll?rev=
>> 286814&r1=286813&r2=286814&view=diff
>> ============================================================
>> ==================
>> --- llvm/trunk/test/Transforms/Inline/optimization-remarks-passed-yaml.ll
>> (original)
>> +++ llvm/trunk/test/Transforms/Inline/optimization-remarks-passed-yaml.ll
>> Mon Nov 14 05:14:41 2016
>> @@ -12,7 +12,7 @@
>> ;  4       return foo();
>> ;  5     }
>>
>> -; CHECK:      remark: /tmp/s.c:4:10: foo can be inlined into bar with
>> cost={{[0-9]+}} (threshold={{[0-9]+}}) (hotness: 30)
>> +; CHECK:      remark: /tmp/s.c:4:10: foo can be inlined into bar with
>> cost={{[0-9\-]+}} (threshold={{[0-9]+}}) (hotness: 30)
>> ; CHECK-NEXT: remark: /tmp/s.c:4:10: foo inlined into bar (hotness: 30)
>>
>> ; YAML:      --- !Analysis
>> @@ -28,7 +28,7 @@
>> ; YAML-NEXT:   - Caller: bar
>> ; YAML-NEXT:     DebugLoc:        { File: /tmp/s.c, Line: 3, Column: 0 }
>> ; YAML-NEXT:   - String: ' with cost='
>> -; YAML-NEXT:   - Cost: '{{[0-9]+}}'
>> +; YAML-NEXT:   - Cost: '{{[0-9\-]+}}'
>> ; YAML-NEXT:   - String: ' (threshold='
>> ; YAML-NEXT:   - Threshold: '{{[0-9]+}}'
>> ; YAML-NEXT:   - String: ')'
>>
>> Modified: llvm/trunk/test/Transforms/Inline/ptr-diff.ll
>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/T
>> ransforms/Inline/ptr-diff.ll?rev=286814&r1=286813&r2=286814&view=diff
>> ============================================================
>> ==================
>> --- llvm/trunk/test/Transforms/Inline/ptr-diff.ll (original)
>> +++ llvm/trunk/test/Transforms/Inline/ptr-diff.ll Mon Nov 14 05:14:41
>> 2016
>> @@ -4,7 +4,7 @@ target datalayout = "p:32:32-p1:64:64-p2
>>
>> define i32 @outer1() {
>> ; CHECK-LABEL: @outer1(
>> -; CHECK-NOT: call
>> +; CHECK-NOT: call i32
>> ; CHECK: ret i32
>>
>>   %ptr = alloca i32
>> @@ -15,6 +15,7 @@ define i32 @outer1() {
>> }
>>
>> define i32 @inner1(i32* %begin, i32* %end) {
>> +  call void @extern()
>>   %begin.i = ptrtoint i32* %begin to i32
>>   %end.i = ptrtoint i32* %end to i32
>>   %distance = sub i32 %end.i, %begin.i
>> @@ -43,6 +44,7 @@ define i32 @outer2(i32* %ptr) {
>> }
>>
>> define i32 @inner2(i32* %begin, i32* %end) {
>> +  call void @extern()
>>   %begin.i = ptrtoint i32* %begin to i32
>>   %end.i = ptrtoint i32* %end to i32
>>   %distance = sub i32 %end.i, %begin.i
>> @@ -60,6 +62,7 @@ else:
>> ; The inttoptrs are free since it is a smaller integer to a larger
>> ; pointer size
>> define i32 @inttoptr_free_cost(i32 %a, i32 %b, i32 %c) {
>> +  call void @extern()
>>   %p1 = inttoptr i32 %a to i32 addrspace(1)*
>>   %p2 = inttoptr i32 %b to i32 addrspace(1)*
>>   %p3 = inttoptr i32 %c to i32 addrspace(1)*
>> @@ -73,7 +76,7 @@ define i32 @inttoptr_free_cost(i32 %a, i
>>
>> define i32 @inttoptr_free_cost_user(i32 %begin, i32 %end) {
>> ; CHECK-LABEL: @inttoptr_free_cost_user(
>> -; CHECK-NOT: call
>> +; CHECK-NOT: call i32
>>   %x = call i32 @inttoptr_free_cost(i32 %begin, i32 %end, i32 9)
>>   ret i32 %x
>> }
>> @@ -81,6 +84,7 @@ define i32 @inttoptr_free_cost_user(i32
>> ; The inttoptrs have a cost since it is a larger integer to a smaller
>> ; pointer size
>> define i32 @inttoptr_cost_smaller_ptr(i32 %a, i32 %b, i32 %c) {
>> +  call void @extern()
>>   %p1 = inttoptr i32 %a to i32 addrspace(2)*
>>   %p2 = inttoptr i32 %b to i32 addrspace(2)*
>>   %p3 = inttoptr i32 %c to i32 addrspace(2)*
>> @@ -94,8 +98,9 @@ define i32 @inttoptr_cost_smaller_ptr(i3
>>
>> define i32 @inttoptr_cost_smaller_ptr_user(i32 %begin, i32 %end) {
>> ; CHECK-LABEL: @inttoptr_cost_smaller_ptr_user(
>> -; CHECK: call
>> +; CHECK: call i32
>>   %x = call i32 @inttoptr_cost_smaller_ptr(i32 %begin, i32 %end, i32 9)
>>   ret i32 %x
>> }
>>
>> +declare void @extern()
>> \ No newline at end of file
>>
>>
>> _______________________________________________
>> llvm-commits mailing list
>> llvm-commits at lists.llvm.org
>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits
>>
>>
>>
>>
>> _______________________________________________
>> llvm-commits mailing list
>> llvm-commits at lists.llvm.org
>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits
>
>
> IMPORTANT NOTICE: The contents of this email and any attachments are
> confidential and may also be privileged. If you are not the intended
> recipient, please notify the sender immediately and do not disclose the
> contents to any other person, use it for any purpose, or store or copy the
> information in any medium. Thank you.
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20161116/f83044ac/attachment.html>


More information about the llvm-commits mailing list