[PATCH] D65354: [X86] Let MachineCombiner reassociate adds for ILP

Fri Aug 2 09:50:52 PDT 2019

andreadb added a comment.

In D65354#1611323 <https://reviews.llvm.org/D65354#1611323>, @reames wrote:

> Skimming through the RegisterPressure class and the approach MachineLICM uses, the mechanics of adding register pressure tracking to MachineCombiner don't seem too bad.  I'm fairly confident we can do so without too much work, though I haven't fully worked it through, much less written the code, so that might be a gotcha.  Thanks everyone for the pointers.
>
> The question left is what would we use the register pressure *for*?  As in, what heuristics would make sense to impose on the machine combiner?
>
> Should we only do transforms which don't increase register pressure?  Only increase it by an amount under the register class limit?  The latter is tempting, but then we might inhibit other transforms which also want to increase register pressure. What's the right tradeoff?  I think register pressure preserving is probably too restrictive, but I'm not sure where the sweet spot is.

I agree that preserving register pressure is probably too restrictive. Using register classes to limit register pressure is a good idea. I'm not sure whether the cost heuristic should also account for callee-saved registers, though. Ideally, the machine scheduler that runs before regalloc could try hard(er) to reorder instructions for register pressure. That would potentially give the machine combiner a bit more slack.

We could also use information from the scheduling models (when available) to further limit reassociation.
Let's say that we have a chain of vector adds. If we know that the target processor has only two vector ALU pipes, then we can reorganize the expression tree so that there are never more than two adds that can execute in parallel. More parallelism could not be exploited in practice, and it would only risk increasing register pressure for no good reason.
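
To make the pipe-limit argument concrete, here is a rough standalone sketch, not MachineCombiner code. It assumes unit-latency adds, integer operands (so reassociation never changes the value), and a reassociation shape made of `lanes` independent chains whose partial sums are then combined pairwise. The names `ReassocCost`, `laneLimitedCost` and `sumWithLanes` are hypothetical and exist only for this illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical cost summary for one reassociation shape.
struct ReassocCost {
  unsigned depth;        // adds on the critical path (unit latency assumed)
  unsigned accumulators; // partial sums that are live at the same time
};

// Cost of summing `numOperands` values using `lanes` independent chains,
// followed by a pairwise combine of the partial sums.
ReassocCost laneLimitedCost(unsigned numOperands, unsigned lanes) {
  assert(lanes > 0 && "need at least one lane");
  if (numOperands < 2)
    return {0, numOperands};
  lanes = std::min(lanes, numOperands);
  // The longest chain holds ceil(numOperands / lanes) operands.
  unsigned longestLane = (numOperands + lanes - 1) / lanes;
  unsigned depth = longestLane - 1;
  // Each combine level halves the number of live partial sums.
  for (unsigned live = lanes; live > 1; live = (live + 1) / 2)
    ++depth;
  return {depth, lanes};
}

// The same sum computed with `lanes` accumulators. The value is unchanged
// because integer addition is associative.
int sumWithLanes(const std::vector<int> &ops, unsigned lanes) {
  assert(lanes > 0 && "need at least one lane");
  std::vector<int> acc(lanes, 0);
  for (std::size_t i = 0; i < ops.size(); ++i)
    acc[i % lanes] += ops[i];
  int total = 0;
  for (int partial : acc)
    total += partial;
  return total;
}
```

Under this model, eight operands give a critical path of 7 adds with one lane, 4 with two lanes, and 3 with four lanes. With only two pipes, though, the four first-level adds of the four-lane shape cannot all issue together, so it would also need about 4 cycles while keeping four partial sums live instead of two.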

> Additionally, if we have many combine opportunities within a BB, and running all of them would exceed our reg pressure limit, how do we prioritize which ones to perform?  Is a simple greedy algorithm from the front of the block "good enough"?

Good question. Unfortunately I don't have a good answer...
I suggest experimenting with a greedy algorithm first.
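
As a starting point for that experiment, a front-of-block greedy pass could look roughly like the sketch below. It is a deliberate simplification: register pressure is modelled as a single number for the whole block rather than per program point and per register class, and `Candidate`, `pressureDelta`, `cyclesSaved` and `selectGreedy` are hypothetical names, not existing LLVM APIs.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical description of one combine opportunity, in block order.
struct Candidate {
  int pressureDelta; // extra live registers if the combine is applied
  int cyclesSaved;   // estimated critical-path improvement
};

// Walk the candidates from the front of the block and accept each one that
// is profitable and keeps the running pressure at or below `limit`.
// Returns the indices of the accepted candidates.
std::vector<std::size_t> selectGreedy(const std::vector<Candidate> &cands,
                                      int pressure, int limit) {
  assert(limit >= 0 && "limit is a register count");
  std::vector<std::size_t> accepted;
  for (std::size_t i = 0; i < cands.size(); ++i) {
    const Candidate &c = cands[i];
    if (c.cyclesSaved <= 0)
      continue; // not profitable, skip
    if (pressure + c.pressureDelta > limit)
      continue; // would exceed the budget, skip
    pressure += c.pressureDelta;
    accepted.push_back(i);
  }
  return accepted;
}
```

For example, with a starting pressure of 10, a limit of 13, and candidates `{+2, 3}`, `{+3, 5}`, `{+1, 1}`, the pass accepts the first and third and skips the second, even though the second saves the most cycles. That is the kind of case where the prioritization question matters.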

> This is outside my practical experience, so I'd welcome input anyone might have.

Repository:
  rL LLVM

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D65354/new/

https://reviews.llvm.org/D65354