[llvm-dev] Register Spill Caused by the Reassociation pass
Sanjay Patel via llvm-dev
llvm-dev at lists.llvm.org
Fri Oct 2 16:09:20 PDT 2015
The test case in the bug report exposes at least one problem, but it's not
the presumed problem of spilling.
Reduced example based on the PR attachment:
define double @foo_calls_bar_4_times_and_sums_the_results() {
%a = call double @bar()
%b = call double @bar()
%t0 = fadd double %a, %b
%c = call double @bar()
%t1 = fadd double %t0, %c
%d = call double @bar()
%t2 = fadd double %t1, %d
ret double %t2
}

declare double @bar()
I don't think we're ever going to induce any extra spilling in a case like
this. The default x86-64 (SysV) ABI requires the spilling because no SSE
registers are preserved across function calls: any XMM value that is live
across a call has to go to the stack. So we get 3 spills regardless of any
reassociation of the adds:
$ ./llc -mtriple=x86_64-unknown-unknown -mcpu=x86-64 -mattr=avx -o - 25016.ll
callq bar
vmovsd %xmm0, (%rsp) # 8-byte Spill
callq bar
vaddsd (%rsp), %xmm0, %xmm0 # 8-byte Folded Reload
vmovsd %xmm0, (%rsp) # 8-byte Spill
callq bar
vaddsd (%rsp), %xmm0, %xmm0 # 8-byte Folded Reload
vmovsd %xmm0, (%rsp) # 8-byte Spill
callq bar
vaddsd (%rsp), %xmm0, %xmm0 # 8-byte Folded Reload
If we enable reassociation via -enable-unsafe-fp-math (the same llc
invocation as above, with that flag added), we still have 3 spills:
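$ ./llc -mtriple=x86_64-unknown-unknown -mcpu=x86-64 -mattr=avx -enable-unsafe-fp-math -o - 25016.ll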
callq bar
vmovsd %xmm0, 16(%rsp) # 8-byte Spill
callq bar
vmovsd %xmm0, 8(%rsp) # 8-byte Spill
callq bar
vaddsd 8(%rsp), %xmm0, %xmm0 # 8-byte Folded Reload
vmovsd %xmm0, 8(%rsp) # 8-byte Spill
callq bar
vaddsd 8(%rsp), %xmm0, %xmm0 # 8-byte Folded Reload
vaddsd 16(%rsp), %xmm0, %xmm0 # 8-byte Folded Reload
This looks like what is described in the original problem: the adds got
reassociated for no benefit (and possibly some harm, although that harm may
be out of scope for the MachineCombiner pass).
We wanted to add the results of the first 2 function calls, add the results
of the last 2 function calls, and then add those 2 results to reduce the
critical path - that is, (a + b) + (c + d). Instead, we got:
((b + c) + d) + a
This shows that either the cost calculation in the MachineCombiner is wrong
or the results coming back from MachineTraceMetrics are wrong. Or maybe
MachineCombiner should be bailing out of a situation like this in the first
place - are we even allowed to move instructions around those function
calls?
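For reference, here is my rough paraphrase of the acceptance test in the
MachineCombiner (a sketch from reading MachineCombiner.cpp, not the exact
code; the names below are mine, not the actual API):

  // Sketch: the new instruction sequence is accepted when the new root
  // would finish no later than the old root, after crediting the old
  // root with its slack from MachineTraceMetrics.
  unsigned OldCycleCount = RootDepth + RootLatency + RootSlack;
  unsigned NewCycleCount = NewRootDepth + NewRootLatency;
  bool ShouldReassociate = NewCycleCount <= OldCycleCount;

If the depth or slack inputs are miscomputed around these calls, that
comparison can't make the right decision.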
Here's where it gets worse - if the adds are already arranged to reduce the
critical path:
define double @foo4_reassociated() {
%a = call double @bar()
%b = call double @bar()
%c = call double @bar()
%d = call double @bar()
%t0 = fadd double %a, %b
%t1 = fadd double %c, %d
%t2 = fadd double %t0, %t1
ret double %t2
}

declare double @bar()
The MachineCombiner is *increasing* the critical path by reassociating the
operands:
callq bar
vmovsd %xmm0, 16(%rsp) # 8-byte Spill
callq bar
vmovsd %xmm0, 8(%rsp) # 8-byte Spill
callq bar
vmovsd %xmm0, (%rsp) # 8-byte Spill
callq bar
vaddsd (%rsp), %xmm0, %xmm0 # 8-byte Folded Reload
vaddsd 8(%rsp), %xmm0, %xmm0 # 8-byte Folded Reload
vaddsd 16(%rsp), %xmm0, %xmm0 # 8-byte Folded Reload
(a + b) + (c + d) --> ((d + c) + b) + a
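To put rough numbers on it: assuming a 3-cycle latency for vaddsd (an
illustrative guess, not a measured number), the balanced form has at most
two adds in sequence after the last call returns (~6 cycles), while the
chain we produced has three (~9 cycles). The combiner turned a depth-2 add
tree back into a depth-3 chain.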
I think this is a problem in calculating and/or using the "instruction
slack" in MachineTraceMetrics.
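As I understand it (paraphrasing MachineTraceMetrics, not quoting the
code), the slack of an instruction is the number of cycles it can be
delayed without lengthening the trace:

  Slack(MI) = CriticalPathLength - (Depth(MI) + Height(MI))

If the depth or height values around these call/spill/reload sequences are
off, the slack comes out too large, and the combiner believes it has room
to serialize the adds when it doesn't.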
On Fri, Oct 2, 2015 at 11:40 AM, Gerolf Hoflehner <ghoflehner at apple.com>
wrote:
> This conflict exists with many optimizations, including copy propagation,
> coalescing, hoisting, etc. Each of them could increase register pressure,
> with similar impact. Attempts to control register pressure locally (within
> an optimization pass) tend to get hard to tune and maintain. Would it be a
> better approach to describe, e.g. in metadata, how to undo an optimization?
> Optimizations that attempt to reduce pressure, like splitting or remat,
> could then be hooked up to call an undo routine based on a cost model.
>
> I think there is time to do something longer term. This particular
> instance can only be an issue under -fast-math.
>
> Cheers
> Gerolf
>
> On Oct 1, 2015, at 9:27 AM, Sanjay Patel via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
> Hi Haicheng,
>
> We need to prevent the transform if it causes spilling, but I'm not sure
> yet what mechanism/heuristic we can use to do that.
> Can you file a bug report with a reduced test case?
>
> Thanks!
>
> On Thu, Oct 1, 2015 at 9:10 AM, Haicheng Wu <haicheng at codeaurora.com>
> wrote:
>
>> Hi Sanjay,
>>
>> I observed some extra register spills when applying the reassociation
>> pass to the spec2006 benchmarks, and I would like to ask your advice.
>>
>> For example, function get_new_point_on_quad() of tria_boundary.cc in
>> spec2006/dealII has a sequence of code like this:
>>
>> …
>> X = a + b
>> …
>> Y = X + c
>> …
>> Z = Y + d
>> …
>>
>> There are many other instructions between these float adds. The
>> reassociation pass first swaps a and c when checking the second add, and
>> then swaps a and d when checking the third add. The transformed code
>> looks like:
>>
>> …
>> X = c + b
>> …
>> Y = X + d
>> …
>> Z = Y + a
>>
>> a is pushed all the way down to the bottom, and its live range is much
>> longer now.
>>
>> Best,
>>
>> Haicheng