[PATCH] D53496: AMDGPU: Rewrite SILowerI1Copies to always stay on SALU

Nicolai Hähnle via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Wed Oct 31 05:37:45 PDT 2018


nhaehnle added a comment.

In https://reviews.llvm.org/D53496#1277330, @alex-t wrote:

> It seems like we have to further develop this approach to deal with the scalar comparison instructions.
>  For instance, S_CMP_* does not produce any result but implicitly defines SCC. 
>  Thus, InstrEmitter will insert the copies all the time. 
>  Since DAG operator SETCC produces i1 value there will be the SCC to VReg_1 copies.
>  I not trying to invent a method to lower that copies.
>  First issue:  in case all the uses are not divergent I don't need the V_CND_MASK -1,0 -> V_CMP_NE 0 pair
>  I need S_CSELECT -1, 0 immediately after the definition  (to save SCC) and  S_CMP_NE 0 just before use to rematerialize SCC
>  Second issue: I only need to save/restore if there are SCC defs in between.
>  So, we need to take into account not divergent flow as well.


It's probably not worth trying to preserve SCC across basic block boundaries, at least not in the beginning, unless it happens to be easy. There are just too many SALU instructions that overwrite it -- basically all of them except for move-like instructions. So usually you'd just have an S_CSELECT.

Now if the value is uniform, you really technically only need an S_CSELECT_B32, though for compatibility you may have to use S_CSELECT_B64 instead so that the value is also a valid lane mask. Going with https://reviews.llvm.org/B64 all the time is easier, and has the same performance in terms of number of cycles, though obviously it has higher SGPR pressure. I don't know how difficult it'd be to distinguish those cases. In the worst case we might need an SReg_1 in addition to VReg_1. Haven't really thought about it though.



================
Comment at: lib/Target/AMDGPU/SILowerI1Copies.cpp:517
+      for (unsigned i = 1; i < MI.getNumOperands(); i += 2) {
+        assert(i + 1 < MI.getNumOperands());
+        unsigned IncomingReg = MI.getOperand(i).getReg();
----------------
alex-t wrote:
> How do you suppose this assert can be hit? Given well formed PHI node.
Yeah, that's why it's an assert. Admittedly not the most important one.


================
Comment at: lib/Target/AMDGPU/SILowerI1Copies.cpp:520
+        MachineBasicBlock *IncomingMBB = MI.getOperand(i + 1).getMBB();
+        MachineInstr *IncomingDef = MRI->getUniqueVRegDef(IncomingReg);
+
----------------
alex-t wrote:
> Are you sure that IncomingDef will never be null? What if IncomingReg is the subreg?
> 
Neither of those should ever happen since the input is in machine SSA after isel.


Repository:
  rL LLVM

https://reviews.llvm.org/D53496





More information about the llvm-commits mailing list