[llvm-dev] Register Spill Caused by the Reassociation pass
Sanjay Patel via llvm-dev
llvm-dev at lists.llvm.org
Fri Oct 2 16:09:20 PDT 2015
The test case in the bug report exposes at least one problem, but it's not
the presumed problem of spilling.
Reduced example based on the PR attachment:
define double @foo_calls_bar_4_times_and_sums_the_results() {
%a = call double @bar()
%b = call double @bar()
%t0 = fadd double %a, %b
%c = call double @bar()
%t1 = fadd double %t0, %c
%d = call double @bar()
%t2 = fadd double %t1, %d
ret double %t2
}

declare double @bar()
I don't think we're ever going to induce any extra spilling in a case like
this. The default x86-64 (SysV) ABI requires the spilling because no SSE
registers are preserved across function calls: any XMM value that is live
across a call has to go to the stack. So we get 3 spills regardless of any
reassociation of the adds:
$ ./llc -mtriple=x86_64-unknown-unknown -mcpu=x86-64 -mattr=avx -o - 25016.ll
callq bar
vmovsd %xmm0, (%rsp) # 8-byte Spill
callq bar
vaddsd (%rsp), %xmm0, %xmm0 # 8-byte Folded Reload
vmovsd %xmm0, (%rsp) # 8-byte Spill
callq bar
vaddsd (%rsp), %xmm0, %xmm0 # 8-byte Folded Reload
vmovsd %xmm0, (%rsp) # 8-byte Spill
callq bar
vaddsd (%rsp), %xmm0, %xmm0 # 8-byte Folded Reload
If we enable reassociation via -enable-unsafe-fp-math (the same llc
invocation as above, with that flag added), we still have 3 spills:
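$ ./llc -mtriple=x86_64-unknown-unknown -mcpu=x86-64 -mattr=avx -enable-unsafe-fp-math -o - 25016.ll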
callq bar
vmovsd %xmm0, 16(%rsp) # 8-byte Spill
callq bar
vmovsd %xmm0, 8(%rsp) # 8-byte Spill
callq bar
vaddsd 8(%rsp), %xmm0, %xmm0 # 8-byte Folded Reload
vmovsd %xmm0, 8(%rsp) # 8-byte Spill
callq bar
vaddsd 8(%rsp), %xmm0, %xmm0 # 8-byte Folded Reload
vaddsd 16(%rsp), %xmm0, %xmm0 # 8-byte Folded Reload
This looks like what is described in the original problem: the adds got
reassociated for no benefit (and possibly some harm, although that harm may
be out of scope for the MachineCombiner pass).
We wanted to add the results of the first 2 function calls, add the results
of the last 2 function calls, and then add those 2 results to reduce the
critical path - that is, (a + b) + (c + d). Instead, we got:
((b + c) + d) + a
This shows that either the cost calculation in the MachineCombiner is wrong
or the results coming back from MachineTraceMetrics are wrong. Or maybe
MachineCombiner should be bailing out of a situation like this in the first
place - are we even allowed to move instructions around those function
calls?
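For reference, here is my rough paraphrase of the acceptance test in the
MachineCombiner (a sketch from reading MachineCombiner.cpp, not the exact
code; the names below are mine, not the actual API):

  // Sketch: the new instruction sequence is accepted when the new root
  // would finish no later than the old root, after crediting the old
  // root with its slack from MachineTraceMetrics.
  unsigned OldCycleCount = RootDepth + RootLatency + RootSlack;
  unsigned NewCycleCount = NewRootDepth + NewRootLatency;
  bool ShouldReassociate = NewCycleCount <= OldCycleCount;

If the depth or slack inputs are miscomputed around these calls, that
comparison can't make the right decision.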
Here's where it gets worse - if the adds are already arranged to reduce the
critical path:
define double @foo4_reassociated() {
%a = call double @bar()
%b = call double @bar()
%c = call double @bar()
%d = call double @bar()
%t0 = fadd double %a, %b
%t1 = fadd double %c, %d
%t2 = fadd double %t0, %t1
ret double %t2
}

declare double @bar()
The MachineCombiner is *increasing* the critical path by reassociating the
operands:
callq bar
vmovsd %xmm0, 16(%rsp) # 8-byte Spill
callq bar
vmovsd %xmm0, 8(%rsp) # 8-byte Spill
callq bar
vmovsd %xmm0, (%rsp) # 8-byte Spill
callq bar
vaddsd (%rsp), %xmm0, %xmm0 # 8-byte Folded Reload
vaddsd 8(%rsp), %xmm0, %xmm0 # 8-byte Folded Reload
vaddsd 16(%rsp), %xmm0, %xmm0 # 8-byte Folded Reload
(a + b) + (c + d) --> ((d + c) + b) + a
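To put rough numbers on it: assuming a 3-cycle latency for vaddsd (an
illustrative guess, not a measured number), the balanced form has at most
two adds in sequence after the last call returns (~6 cycles), while the
chain we produced has three (~9 cycles). The combiner turned a depth-2 add
tree back into a depth-3 chain.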
I think this is a problem in calculating and/or using the "instruction
slack" in MachineTraceMetrics.
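As I understand it (paraphrasing MachineTraceMetrics, not quoting the
code), the slack of an instruction is the number of cycles it can be
delayed without lengthening the trace:

  Slack(MI) = CriticalPathLength - (Depth(MI) + Height(MI))

If the depth or height values around these call/spill/reload sequences are
off, the slack comes out too large, and the combiner believes it has room
to serialize the adds when it doesn't.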
On Fri, Oct 2, 2015 at 11:40 AM, Gerolf Hoflehner <ghoflehner at apple.com>
wrote:
> This conflict exists with many optimizations, including copy propagation,
> coalescing, hoisting, etc. Each of them could increase register pressure,
> with similar impact. Attempts to control register pressure locally (within
> an optimization pass) tend to get hard to tune and maintain. Would it be a
> better approach to describe, e.g. in metadata, how to undo an optimization?
> Optimizations that attempt to reduce pressure, like splitting or remat,
> could then be hooked up to call an undo routine based on a cost model.
>
> I think there is time to do something longer term. This particular
> instance can only be an issue under -fast-math.
>
> Cheers
> Gerolf
>
> On Oct 1, 2015, at 9:27 AM, Sanjay Patel via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
> Hi Haicheng,
>
> We need to prevent the transform if it causes spilling, but I'm not sure
> yet what mechanism/heuristic we can use to do that.
> Can you file a bug report with a reduced test case?
>
> Thanks!
>
> On Thu, Oct 1, 2015 at 9:10 AM, Haicheng Wu <haicheng at codeaurora.com>
> wrote:
>
>> Hi Sanjay,
>>
>> I observed some extra register spills when applying the reassociation
>> pass to the spec2006 benchmarks, and I would like to ask your advice.
>>
>> For example, function get_new_point_on_quad() of tria_boundary.cc in
>> spec2006/dealII has a sequence of code like this:
>>
>> …
>> X = a + b
>> …
>> Y = X + c
>> …
>> Z = Y + d
>> …
>>
>> There are many other instructions between these float adds. The
>> reassociation pass first swaps a and c when checking the second add, and
>> then swaps a and d when checking the third add. The transformed code
>> looks like:
>>
>> …
>> X = c + b
>> …
>> Y = X + d
>> …
>> Z = Y + a
>>
>> a is pushed all the way down to the bottom, and its live range is much
>> longer now.
>>
>> Best,
>>
>> Haicheng