<html><head><meta http-equiv="Content-Type" content="text/html charset=utf-8"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class="">The machine combiner does not see spills. Perhaps there is a phase ordering issue. From the analysis here I don’t see an explanation for a performance loss (the potential increase in register pressure did make sense to me, though). <div class=""><br class=""></div><div class="">-Gerolf</div><div class=""><br class=""><div><blockquote type="cite" class=""><div class="">On Oct 2, 2015, at 4:09 PM, Sanjay Patel <<a href="mailto:spatel@rotateright.com" class="">spatel@rotateright.com</a>> wrote:</div><br class="Apple-interchange-newline"><div class=""><div dir="ltr" class=""><div class=""><div class="">The test case in the bug report exposes at least one problem, but it's not the presumed problem of spilling.<br class=""><br class=""></div>Reduced example based on the PR attachment:<br class=""><br class=""> define double @foo_calls_bar_4_times_and_sums_the_results() {<br class=""> %a = call double @bar()<br class=""> %b = call double @bar()<br class=""> %t0 = fadd double %a, %b<br class=""> %c = call double @bar()<br class=""> %t1 = fadd double %t0, %c<br class=""> %d = call double @bar()<br class=""> %t2 = fadd double %t1, %d<br class=""> ret double %t2<br class=""> }<br class=""><br class=""></div>I don't think we're ever going to induce any extra spilling in a case like this. The default (any?) x86-64 ABI requires spilling because no SSE registers are preserved across function calls. 
So we get 3 spills regardless of any reassociation of the adds:<br class=""><br class="">$ ./llc -mtriple=x86_64-unknown-unknown -mcpu=x86-64 -mattr=avx -o - 25016.ll<br class=""><div class=""> callq bar<br class=""> vmovsd %xmm0, (%rsp) # 8-byte Spill<br class=""> callq bar<br class=""> vaddsd (%rsp), %xmm0, %xmm0 # 8-byte Folded Reload<br class=""> vmovsd %xmm0, (%rsp) # 8-byte Spill<br class=""> callq bar<br class=""> vaddsd (%rsp), %xmm0, %xmm0 # 8-byte Folded Reload<br class=""> vmovsd %xmm0, (%rsp) # 8-byte Spill<br class=""> callq bar<br class=""> vaddsd (%rsp), %xmm0, %xmm0 # 8-byte Folded Reload<br class=""><br class=""><br class=""></div><div class="">If we enable reassociation via -enable-unsafe-fp-math, we still have 3 spills:<br class=""><br class=""> callq bar<br class=""> vmovsd %xmm0, 16(%rsp) # 8-byte Spill<br class=""> callq bar<br class=""> vmovsd %xmm0, 8(%rsp) # 8-byte Spill<br class=""> callq bar<br class=""> vaddsd 8(%rsp), %xmm0, %xmm0 # 8-byte Folded Reload<br class=""> vmovsd %xmm0, 8(%rsp) # 8-byte Spill<br class=""> callq bar<br class=""> vaddsd 8(%rsp), %xmm0, %xmm0 # 8-byte Folded Reload<br class=""> vaddsd 16(%rsp), %xmm0, %xmm0 # 8-byte Folded Reload<br class=""><br class=""></div><div class="">This looks like what is described in the original problem: the adds got reassociated for no benefit (and possibly some harm, although it may be out-of-scope for the MachineCombiner pass). <br class=""><br class="">We wanted to add the results of the first 2 function calls, add the results of the last 2 function calls, and then add those 2 results to reduce the critical path. Instead, we got:<br class=""><br class="">((b + c) + d) + a <br class=""></div><div class=""><br class=""></div><div class="">This shows that either the cost calculation in the MachineCombiner is wrong or the results coming back from MachineTraceMetrics are wrong. 
Or maybe MachineCombiner should be bailing out of a situation like this in the first place - are we even allowed to move instructions around those function calls?<br class=""><br class=""></div><div class="">Here's where it gets worse - if the adds are already arranged to reduce the critical path:<br class=""><br class=""> define double @foo4_reassociated() {<br class=""> %a = call double @bar()<br class=""> %b = call double @bar()<br class=""> %c = call double @bar()<br class=""> %d = call double @bar()<br class=""> %t0 = fadd double %a, %b<br class=""> %t1 = fadd double %c, %d<br class=""> %t2 = fadd double %t0, %t1<br class=""> ret double %t2<br class=""> }<br class=""><br class=""></div><div class="">The MachineCombiner is *increasing* the critical path by reassociating the operands:<br class=""><br class=""> callq bar<br class=""> vmovsd %xmm0, 16(%rsp) # 8-byte Spill<br class=""> callq bar<br class=""> vmovsd %xmm0, 8(%rsp) # 8-byte Spill<br class=""> callq bar<br class=""> vmovsd %xmm0, (%rsp) # 8-byte Spill<br class=""> callq bar<br class=""> vaddsd (%rsp), %xmm0, %xmm0 # 8-byte Folded Reload<br class=""> vaddsd 8(%rsp), %xmm0, %xmm0 # 8-byte Folded Reload<br class=""> vaddsd 16(%rsp), %xmm0, %xmm0 # 8-byte Folded Reload<br class=""><br class=""></div><div class="">(a + b) + (c + d) --> ((d + c) + b) + a<br class=""></div><div class=""><br class=""></div><div class="">I think this is a problem calculating and/or using the "instruction slack" in MachineTraceMetrics.<br class=""></div><div class=""><br class=""></div><div class="gmail_extra"><br class=""><div class="gmail_quote">On Fri, Oct 2, 2015 at 11:40 AM, Gerolf Hoflehner <span dir="ltr" class=""><<a href="mailto:ghoflehner@apple.com" target="_blank" class="">ghoflehner@apple.com</a>></span> wrote:<br class=""><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div style="word-wrap:break-word" class="">This conflict is with many 
optimizations, incl. copy prop, coalescing, hoisting, etc. Each could increase register pressure, with similar impact. Attempts to control the register pressure locally (within an optimization pass) tend to get hard to tune and maintain. Would a better way be to describe, e.g. in metadata, how to undo an optimization? Optimizations that attempt to reduce pressure, like splitting or remat, could be hooked up to call an undo routine based on a cost model.<div class=""><br class=""></div><div class="">I think there is time to do something longer term. This particular instance can only be an issue under -fast-math.</div><div class=""><br class=""></div><div class="">Cheers</div><div class="">Gerolf</div><div class=""><br class=""><div class=""><blockquote type="cite" class=""><div class=""><div class=""><div class="">On Oct 1, 2015, at 9:27 AM, Sanjay Patel via llvm-dev <<a href="mailto:llvm-dev@lists.llvm.org" target="_blank" class="">llvm-dev@lists.llvm.org</a>> wrote:</div><br class=""></div></div><div class=""><div class=""><div class=""><div dir="ltr" class=""><div class=""><div class=""><div class="">Hi Haicheng,<br class=""><br class=""></div>We need to prevent the transform if it causes spilling, but I'm not sure yet what mechanism/heuristic we can use to do that.<br class=""></div>Can you file a bug report with a reduced test case?<br class=""><br class=""></div>Thanks!<br class=""></div><div class="gmail_extra"><br class=""><div class="gmail_quote">On Thu, Oct 1, 2015 at 9:10 AM, Haicheng Wu <span dir="ltr" class=""><<a href="mailto:haicheng@codeaurora.com" target="_blank" class="">haicheng@codeaurora.com</a>></span> wrote:<br class=""><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div link="#0563C1" vlink="#954F72" lang="EN-US" class=""><div class=""><p class="MsoNormal">Hi Sanjay,<u class=""></u><u class=""></u></p><p class="MsoNormal"><u class=""></u> <u class=""></u></p><p 
class="MsoNormal">I observed some extra register spills when applying the reassociation pass to the spec2006 benchmarks, and I would like to hear your advice. <u class=""></u><u class=""></u></p><p class="MsoNormal"><u class=""></u> <u class=""></u></p><p class="MsoNormal">For example, function get_new_point_on_quad() of tria_boundary.cc in spec2006/dealII has a sequence of code like this:<u class=""></u><u class=""></u></p><p class="MsoNormal"><u class=""></u> <u class=""></u></p><p class="MsoNormal">…<u class=""></u><u class=""></u></p><p class="MsoNormal">X=a+b<u class=""></u><u class=""></u></p><p class="MsoNormal">…<u class=""></u><u class=""></u></p><p class="MsoNormal">Y=X+c<u class=""></u><u class=""></u></p><p class="MsoNormal">…<u class=""></u><u class=""></u></p><p class="MsoNormal">Z=Y+d<u class=""></u><u class=""></u></p><p class="MsoNormal">…<u class=""></u><u class=""></u></p><p class="MsoNormal"><u class=""></u> <u class=""></u></p><p class="MsoNormal">There are many other instructions between these float adds. The reassociation pass first swaps a and c when checking the second add, and then swaps a and d when checking the third add. The transformed code looks like this:<u class=""></u><u class=""></u></p><p class="MsoNormal"><u class=""></u> <u class=""></u></p><p class="MsoNormal">…<u class=""></u><u class=""></u></p><p class="MsoNormal">X=c+b<u class=""></u><u class=""></u></p><p class="MsoNormal">…<u class=""></u><u class=""></u></p><p class="MsoNormal">Y=X+d<u class=""></u><u class=""></u></p><p class="MsoNormal">…<u class=""></u><u class=""></u></p><p class="MsoNormal">Z=Y+a<u class=""></u><u class=""></u></p><p class="MsoNormal"><u class=""></u> <u class=""></u></p><p class="MsoNormal">a is pushed all the way down to the bottom, and its live range is much larger now. 
<u class=""></u><u class=""></u></p><p class="MsoNormal"><u class=""></u> <u class=""></u></p><p class="MsoNormal">Best,<span class=""><font color="#888888" class=""><u class=""></u><u class=""></u></font></span></p><span class=""><font color="#888888" class=""><p class="MsoNormal"><u class=""></u> <u class=""></u></p><p class="MsoNormal">Haicheng<u class=""></u><u class=""></u></p></font></span></div></div></blockquote></div><br class=""></div></div></div>
</div></blockquote></div><br class=""></div></div></blockquote></div><br class=""></div></div>
</div></blockquote></div><br class=""></div></body></html>