[PATCH] D59295: [AMDGPU] Pre-allocate WWM registers to reduce VGPR pressure.

Wed Mar 13 11:16:32 PDT 2019

cwabbott added a comment.

In D59295#1427734 <https://reviews.llvm.org/D59295#1427734>, @arsenm wrote:

> I really don't like introducing new, dynamically reserved registers for this. It's going to introduce hell for dealing with any kind of ABI, and reserved registers are generally a bad idea. There's also nothing guaranteeing there are any free registers available to reserve, since you are just grabbing totally unused ones. This is going to just hit some variant of the problem I've been working on solving for handling SGPR->VGPR spills. Can WWM code be moved into a bundle or something?

No, because the problem that both the current pass and this new pass are trying to solve affects register allocation for code that can be arbitrarily far away from the original WWM sequence. For a more detailed explanation, I can't do any better than the comment at the start of SIFixWWMLiveness.cpp. (There will also be problems if RA decides to split a live interval inside a WWM sequence. Bundling the sequence would fix that, but it's a completely different problem.)

While it might seem dangerous, in practice this works out, since the WWM sequences that the frontends currently emit require only a few registers. Hence this pass is guaranteed to succeed, even under very high register pressure. If allocating the registers needed for the WWM sequence fails and RA decides to spill something inside it, then you're pretty much toast anyway, since the same invalid-lane-clobbering concerns would reappear.
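A toy C++ model can make the invalid-lane-clobbering concern concrete. It is a sketch under simplifying assumptions, not LLVM code: a wave has only four lanes, each VGPR is an array of per-lane values, and ordinary writes respect the exec mask while a WWM write updates every lane. All names (`VGPR`, `ExecMask`, `writeMasked`, `writeWWM`, `run`) are hypothetical. The modeled case is `%x = 0; if (lane < 2) { %w = <WWM op>; %x = 1; } use %x`. Inside the branch, `%x` is redefined before any use, so a lane-unaware allocator sees no interference between `%x` and `%w` and may assign them the same register.

```cpp
#include <array>

// Toy model (hypothetical): a wave of four lanes, one value per lane per VGPR.
constexpr int kLanes = 4;
using VGPR = std::array<int, kLanes>;
using ExecMask = std::array<bool, kLanes>;

// Ordinary vector write: only lanes enabled in the exec mask are updated.
void writeMasked(VGPR &dst, const ExecMask &exec, int value) {
  for (int lane = 0; lane < kLanes; ++lane)
    if (exec[lane])
      dst[lane] = value;
}

// WWM write: every lane is updated, whatever the current exec mask says.
void writeWWM(VGPR &dst, int value) {
  for (int lane = 0; lane < kLanes; ++lane)
    dst[lane] = value;
}

// Models:
//   %x = 0                                   (all lanes)
//   if (lane < 2) {                          (lanes 0-1 active)
//     %w = WWM op -> 99                      (writes ALL lanes)
//     %x = 1                                 (lanes 0-1 only)
//   }
//   use %x                                   (all lanes)
// shareRegister == true means %x and %w were given the same physical VGPR.
// Returns the value of %x that lane 3 (which skipped the branch) reads.
int run(bool shareRegister) {
  VGPR regs[2] = {};
  VGPR &regX = regs[0];
  VGPR &regW = shareRegister ? regs[0] : regs[1];

  const ExecMask allLanes = {true, true, true, true};
  const ExecMask thenLanes = {true, true, false, false};

  writeMasked(regX, allLanes, 0);   // %x = 0 in every lane
  writeWWM(regW, 99);               // WWM op also writes inactive lanes 2-3
  writeMasked(regX, thenLanes, 1);  // %x = 1, lanes 0-1 only
  return regX[3];                   // after the if: lane 3 reads %x
}
```

With separate registers, `run(false)` returns 0, which is what lane 3 should see; when the two values share a register, `run(true)` returns 99, the WWM result. Giving the WWM sequence its own pre-allocated registers rules out the sharing case.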

One idea would be to add a way to tell RA that a certain live range absolutely cannot be split (and probably to boost its priority as well, lest we fail to allocate it). We could then pre-allocate one or more of these unsplittable registers for WWM, make every definition in the WWM sequence a partial definition, and add fake definitions of the WWM registers in the closest block with uniform control flow that dominates the WWM sequence, so that definitions whose invalid lanes could be clobbered can't use the WWM registers. This gives RA a little more flexibility and means that some other operations could potentially use the WWM registers, but you still basically wind up preallocating them.

Repository:
  rL LLVM

https://reviews.llvm.org/D59295