[PATCH] D59990: AMDGPU. Divergence driven ISel. Assign register class for cross block values according to the divergence.

Mon Apr 8 08:10:53 PDT 2019

alex-t added a comment.

In D59990#1457613 <https://reviews.llvm.org/D59990#1457613>, @nhaehnle wrote:

>

> After a bit of reflection, I think that in part what's happening here is that the uniform/divergent axis and the register bank axis is getting confused. See @efriedma's question and my comment on `isDivergentRegClass`. I wonder if some parts of this change would not be better expressed in terms of register banks. For example, in the uses of `isDivergentRegClass`, aren't we really looking for another register class in the same bank (SGPR vs. VGPR)? Similarly, maybe `requiresUniformRegister` can be rephrased as returning a required register bank (and nullptr by default)?
> 
> Why do we still need to move PHIs to VALU after this change? (Looking at SIFixSGPRCopies) Shouldn't the PHI be selected with the correct register class already? What's an example where this doesn't happen?

Trying to answer all the above now...    The reason for `isDivergentRegClass` hook and moving PHIs to VALU is the same.
Divergence driven ISel would work fine in case we really have SALU alternative for every VALU instruction. Unfortunately we have no.
This leads to insertion unnecessary moves and v_readfirstlane around the naturally uniform code.

Let's imagine we have a uniform loop that multiplies and adds floating point array elements.
We only have VALU floating point v_fmad  that accepts VGPRs and produce VGPR. Yes all the values are uniform but they still need to be in VGPRs!
If we keep PHI in the loop header uniform we would end up moving SGPRs to VGPRs before v_fma and v_readfirstlane the resulting VGPR to just make it go round the loop to be moved back to VGPR.

So, we need some interface to query if the given register class is considered uniform or divergent in the given target.

Look how we use it:

  const TargetRegisterClass *RC =
    TRI->getAllocatableClass(TII->getRegClass(II, i, TRI, *MF));

  if (i < NumResults && TLI->isTypeLegal(Node->getSimpleValueType(i))) {
    const TargetRegisterClass *VTRC = TLI->getRegClassFor(
        Node->getSimpleValueType(i),
        Node->isDivergent() || (RC && TRI->isDivergentRegClass(RC)));

RC above - register class retrieved from the instruction description. That means we must have VGPR operand in this concrete position.
Thus, the condition in "getRegClassFor" is really "if SDNode is divergent itself or we have to assign VGPR because we only have VALU form of the instruction".

Same for the PHIs.  By the point SIFixSGPRCopies works we already selected everything according the divergence. 
If we have uniform PHI with VGPR input or the uniform PHIs user requires VGPR that means we just have no SALU form for the defining instruction or the user instruction. 
In this case we have to convert PHI back to VALU.

I understand that my solution looking disgusting.  And yes I'm thinking of further changing the getRegClassFor interface to incorporate all the target related hacks to the target specific code.
The main problem is that in several places in LLVM core code the getRegClassFor is called from the context that only have type or register class but has no Value.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D59990/new/

https://reviews.llvm.org/D59990