[PATCH] D65354: [X86] Let MachineCombiner reassociate adds for ILP

Thu Aug 15 12:07:05 PDT 2019

fhahn added a comment.

In D65354#1612516 <https://reviews.llvm.org/D65354#1612516>, @andreadb wrote:

> In D65354#1611323 <https://reviews.llvm.org/D65354#1611323>, @reames wrote:
>
> > Skimming through the RegisterPressure class and the approach MachineLICM uses, the mechanics of adding register pressure tracking to MachineCombiner don't seem too bad.  I'm fairly confident we can do so without too much work, though I haven't fully worked it through, much less written the code, so that might be a gotcha.  Thanks everyone for the pointers.
> >
> > The question left is what would we use the register pressure *for*?  As in, what heuristics would make sense to impose on the machine combiner?
> >
> > Should we only do transforms which don't increase register pressure?  Only increase it by an amount under the register class limit?  The latter is tempting, but then we might inhibit other transforms which also want to increase register pressure. What's the right tradeoff?  I think register pressure preserving is probably too restrictive, but I'm not sure where the sweet spot is.
>
>
> I agree that preserving register pressure is probably too restrictive. Using register classes to limit register pressure is a good idea. Not sure if the cost heuristic should also account for callee-saved registers though. Ideally, the machine scheduler that runs before regalloc could try hard(er) to reorder instructions for register pressure. That would potentially give the machine combiner a bit more slack.

The machine scheduler already schedules quite aggressively for register pressure.

I think we should start out by restricting the combiner to substitutions that do not increase register pressure. We can then loosen the restriction and tune the thresholds.
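
As a rough illustration of that starting point (not code from this patch), the check could look like the sketch below. `PressureChange`, `isPressureSafe` and `Slack` are made-up names, and the per-set deltas are assumed to come from some pressure tracker; real code would go through the RegisterPressure machinery instead of a plain vector.

```cpp
#include <cassert>
#include <vector>

// Hypothetical model: one entry per register pressure set, holding the
// estimated change in maximum pressure if a substitution were applied.
struct PressureChange {
  std::vector<int> Delta;
};

// Returns true if the substitution raises no pressure set by more than
// Slack. Slack == 0 is the "do not increase pressure" starting point;
// larger values correspond to loosening the restriction later.
bool isPressureSafe(const PressureChange &Change, int Slack = 0) {
  assert(Slack >= 0 && "slack must be non-negative");
  for (int D : Change.Delta)
    if (D > Slack)
      return false;
  return true;
}
```

With `Slack` left at 0, a substitution that lowers one set but raises another is still rejected, which is the conservative behaviour we would start from.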

> We could also use information from the scheduling models (when available) to further limit reassociation.
>  Let's say that we have a chain of vector adds. If we know that the target processor has only two vector ALU pipes, then we can reorganize the expression tree so that there are always at most two parallel adds. More parallelism would not be exploited in practice, and it would only lead to a potential increase in register pressure for no good reason.
> 
> > Additionally, if we have many combine opportunities within a BB, and running all of them would exceed our reg pressure limit, how do we prioritize which ones to perform?  Is a simple greedy algorithm from the front of the block "good enough"?
> 
> Good question. Unfortunately I don't have a good answer...
>  I suggest experimenting with a greedy algorithm first.

Initially a greedy approach should be fine: apply candidates in order until the next one would exceed a threshold. If we start by limiting ourselves to substitutions that do not increase pressure, this should not be an issue at all. Doing anything more will require some additional restructuring of the code to compute the set of candidates and rank them first, instead of applying them directly.
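
To make the greedy idea concrete, here is a minimal standalone sketch. `Candidate`, `PressureIncrease`, `selectGreedy` and `Budget` are hypothetical names, not existing MachineCombiner API, and the sketch assumes each candidate comes with a single estimated pressure increase. It also assumes one particular reading of the threshold: a candidate that does not fit is skipped and the walk continues, instead of stopping at the first one that does not fit.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical candidate: estimated increase in register pressure if the
// substitution is applied (zero or negative means no increase).
struct Candidate {
  int PressureIncrease;
};

// Walk the candidates in block order and pick every one that still fits
// within Budget. Returns the indices of the selected candidates.
std::vector<std::size_t> selectGreedy(const std::vector<Candidate> &Cands,
                                      int Budget) {
  assert(Budget >= 0 && "budget must be non-negative");
  std::vector<std::size_t> Applied;
  int Used = 0;
  for (std::size_t I = 0; I < Cands.size(); ++I) {
    // Conservatively ignore decreases: they do not hand budget back.
    int Cost = Cands[I].PressureIncrease > 0 ? Cands[I].PressureIncrease : 0;
    if (Used + Cost > Budget)
      continue; // Skip this candidate, later ones may still fit.
    Used += Cost;
    Applied.push_back(I);
  }
  return Applied;
}
```

With `Budget` set to 0 this degenerates to "only apply substitutions that do not increase pressure", so no prioritization is needed in that configuration.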

I think the case we want to try hard to avoid is doing combines that increase register pressure outside a loop, which in turn cause spills in a loop body. The machine scheduler cannot do much about such cases unfortunately.

Repository:
  rL LLVM

https://reviews.llvm.org/D65354