[PATCH] D42838: [AMDGPU] added writelane intrinsic

Wed Feb 21 09:10:29 PST 2018

nhaehnle added a comment.

I think there should be two more tests: one where the `vdst_in` comes from a uniform value in an SGPR (e.g. a shader where it comes from an inreg) and one where it's a constant.

================
Comment at: lib/Target/AMDGPU/SIInstrInfo.cpp:3155-3178
+  if (Opc == AMDGPU::V_WRITELANE_B32) {
+    int Src0Idx = AMDGPU::getNamedOperandIdx(Opc, AMDGPU::OpName::src0);
+    MachineOperand &Src0 = MI.getOperand(Src0Idx);
+    const DebugLoc &DL = MI.getDebugLoc();
+    if (Src0.isReg() && RI.isVGPR(MRI, Src0.getReg())) {
+      unsigned Reg = MRI.createVirtualRegister(&AMDGPU::SReg_32_XM0RegClass);
+      BuildMI(*MI.getParent(), MI, DL, get(AMDGPU::V_READFIRSTLANE_B32), Reg)
----------------
arsenm wrote:
> tpr wrote:
> > arsenm wrote:
> > > All of this code should be unnecessary. The instruction is already broken if it gets here. VGPRs are not allowed in src0/src1, and required for src2. The selection pattern should already be respecting these constraints. Since you're using a straightforward tablegen pattern, there shouldn't be an issue. The main purpose of the legalizeOperands* functions is to handle legalizing operand constraints that require looking at the entire instruction, and not just a single operand (i.e. the constant bus restriction)
> > Hi Matt. I tried removing this code, and the verifier falls over due to src1 being a vgpr_32 in the test I am trying. Also I see that readlane has analogous special case code here in legalizeOperandsVOP2.
> The src2 verification shouldn't be necessary.
> 
> The readlane handling is surprising to me, I would expect we would need to handle this case by waterfalling
For context, the GLSL readlane intrinsic is defined to require a uniform lane selection. So, like other intrinsics which have the same requirement, not doing automatic waterfalling is the right thing to do here.

================
Comment at: lib/Target/AMDGPU/SIRegisterInfo.cpp:748
       BuildMI(*MBB, MI, DL,
-              TII->getMCOpcodeFromPseudo(AMDGPU::V_WRITELANE_B32),
+              TII->get(AMDGPU::V_WRITELANE_B32),
               Spill.VGPR)
----------------
tpr wrote:
> arsenm wrote:
> > dstuttard wrote:
> > > I notice that you've left this as the pseudo whereas the original lowered to an actual MCOpcode - I guess it's more consistent to leave that final lowering until later.
> > There is a reason for this, but I don't remember what it was exactly. I think there was some related change between SI and VI. I think you should leave this as a separate change. I know I've tried to fix this before and hit some issue.
> Yes; gfx6 and gfx7 have vop2 and gfx8 has vop3. But I saw that my new use of writelane does not lower the pseudo until later, and my testing with this change (with llpc only, not other users of the backend) passed. But good idea to make it a separate change.  
What is actually breaking without the RegState::Undef?

When we allocate a VGPR for spilling, we add it as LiveIn to all basic blocks, and the V_READLANE_B32 generated for restoring a spilled SGPR should never set the VGPR to killed. So I don't see where the machine verifier would complain. An example would be good.

Repository:
  rL LLVM

https://reviews.llvm.org/D42838