[PATCH] D65354: [X86] Let MachineCombiner reassociate adds for ILP

Tue Jul 30 14:32:57 PDT 2019

craig.topper added a comment.

In D65354#1607205 <https://reviews.llvm.org/D65354#1607205>, @spatel wrote:

> In D65354#1606794 <https://reviews.llvm.org/D65354#1606794>, @reames wrote:
>
> > In D65354#1606210 <https://reviews.llvm.org/D65354#1606210>, @spatel wrote:
> >
> > > IIRC, MachineCombiner has the potential to cause spills (it runs without real knowledge of register pressure)...and if we spill, then we have likely killed all hope for a perf improvement due to better throughput of math/logic. This is true for any reassociable opcode, but x86 scalar 'add' has more potential chance for trouble due to prevalence and so few available registers ( (as seen by the number of diffs here?).
> > >
> > > So it's correct that there was no principled reason to leave 'add' out, but the practical concern was spilling. I have not looked at the test diffs here, but if there's evidence of extra instructions, then we're probably going to see perf regressions rather than improvements?
> >
> >
> > I hear the concern here, but not reassociating on adds also means we miss some semi major performance opportunities.  Any suggestions on how to break the impasse with a tractable amount of work?  As in, any suggestions on how to implement a reasonable register pressure heuristic in MachineCombiner?
>
>
> I have no expertise at this level of codegen, so it's probably better to raise this question on llvm-dev for a more informed answer. I see some facility for estimating pressure used in MachineLICM before register allocation:
>
>   // Estimate register pressure during pre-regalloc pass.
>
> ...but I have no idea how much work it would take to wire that up within MachineCombiner.
>
> Before taking on that effort, I think we should ask: what kind of perf gains could we expect if MachineCombiner was working optimally on these patterns? Maybe we can hack a limited solution and get some stats?
>
> I could be completely wrong, but it seems unlikely to me that any recent OOO x86 would benefit much from reassociating scalar adds before it ran out of GPRs.

But we're missing out on building trees to avoid serial dependency chains. With trees we can do adds in parallel.

Repository:
  rL LLVM

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D65354/new/

https://reviews.llvm.org/D65354