[PATCH] D35524: [AMDGPU] Add support for Whole Wavefront Mode

Wed Aug 2 11:28:52 PDT 2017

cwabbott added inline comments.

================
Comment at: lib/Target/AMDGPU/SIISelLowering.cpp:3300-3301
+    SDValue Src = Op.getOperand(1);
+    return SDValue(DAG.getMachineNode(AMDGPU::WWM, DL, Src.getValueType(), Src),
+                   0);
+  }
----------------
arsenm wrote:
> Shouldn't this have a chain?
No, because it's really just a copy instruction that happens to also enable WWM for its sources -- it doesn't have any side-effects.

================
Comment at: lib/Target/AMDGPU/SIWholeQuadMode.cpp:619-625
+  assert(SaveOrig);
+  MI = BuildMI(MBB, Before, DebugLoc(), TII->get(AMDGPU::COPY), SaveOrig)
+           .addReg(AMDGPU::EXEC);
+  LIS->InsertMachineInstrInMaps(*MI);
+  MI = BuildMI(MBB, Before, DebugLoc(), TII->get(AMDGPU::S_MOV_B64), AMDGPU::EXEC)
+           .addImm(-1);
+  LIS->InsertMachineInstrInMaps(*MI);
----------------
arsenm wrote:
> cwabbott wrote:
> > nhaehnle wrote:
> > > This could be an S_OR_SAVEEXEC_B64.
> > This is a little tricky, since S_OR_SAVEEXEC_B64 clobbers SCC while the S_MOV_B64 doesn't, so we need to be more careful about where we put it. But I didn't like how I was handling prepareInsertion() in the face of WWM anyways,  so I've refactored it to make using S_OR_SAVEEXEC_B64 possible.
> I think this pass should probably not be producing the save exec instructions directly. We re-form these later and this may be a problem with -O0 
The existing pass already uses S_AND_SAVEEXEC_B64, so I don't see a reason not to use it here; I was just trying to be consistent. If we want to let that pass to handle it, it should probably be a separate change. Also, because of that, any problem with -O0 is probably an already-existing bug (although I doubt this pass got much testing with -O0 before), so I don't want to block anything on that. I can try adding an -O0 line to the test though.

================
Comment at: lib/Target/AMDGPU/SIWholeQuadMode.cpp:713
 bool SIWholeQuadMode::runOnMachineFunction(MachineFunction &MF) {
-  if (MF.getFunction()->getCallingConv() != CallingConv::AMDGPU_PS)
-    return false;
----------------
arsenm wrote:
> Why remove this? It should definitely exit early for compute
Because OpenGL compute shaders (and geometry shaders, and tesselation shaders etc.) can all use WWM for doing non-uniform wavefront reductions, so we at least need to scan the program for WWM instructions. ROCm will probably want it at some point too.

https://reviews.llvm.org/D35524