[PATCH] D24216: AMDGPU: Partially fix control flow at -O0

Fri Sep 9 01:21:57 PDT 2016

nhaehnle added inline comments.

================
Comment at: lib/Target/AMDGPU/AMDGPUTargetMachine.cpp:561-564
@@ -560,1 +560,6 @@
+
+  // This must be run immediately after phi elimination and before
+  // TwoAddressInstructions, otherwise the processing of the tied operand of
+  // SI_ELSE will introduce a copy of the tied operand source after the else.
+  insertPass(&PHIEliminationID, &SILowerControlFlowID, false);
 
----------------
arsenm wrote:
> nhaehnle wrote:
> > Since the pass now actually runs even before machine scheduling, wouldn't it be better to add this elsewhere, e.g. addPreRegAlloc? That would also eliminate the duplicate code below.
> No, this needs to be run as part of regalloc to be post-SSA. It's easier to think about the two reg alloc paths as completely different
What gets me is that this is before machine scheduling now, so do people think about machine scheduling as part of register allocation as well? Anyway, it's not a big deal.

================
Comment at: lib/Target/AMDGPU/SIInstrInfo.cpp:853-871
@@ -852,4 +852,21 @@
   default: return AMDGPUInstrInfo::expandPostRAPseudo(MI);
-
+  case AMDGPU::S_MOV_B64_term: {
+    // This is only a terminator to get the correct spill code placement during
+    // register allocation.
+    MI.setDesc(get(AMDGPU::S_MOV_B64));
+    break;
+  }
+  case AMDGPU::S_XOR_B64_term: {
+    // This is only a terminator to get the correct spill code placement during
+    // register allocation.
+    MI.setDesc(get(AMDGPU::S_XOR_B64));
+    break;
+  }
+  case AMDGPU::S_ANDN2_B64_term: {
+    // This is only a terminator to get the correct spill code placement during
+    // register allocation.
+    MI.setDesc(get(AMDGPU::S_ANDN2_B64));
+    break;
+  }
   case AMDGPU::V_MOV_B64_PSEUDO: {
     unsigned Dst = MI.getOperand(0).getReg();
----------------
arsenm wrote:
> nhaehnle wrote:
> > This looks redundant with what's in the exec-optimize pass.
> It's not, these are still required in the no-opt cases when the pass doesn't run
Ah yes, that makes sense. Maybe the code can still be shared somehow? But I have to admit that I don't see a great way to do so...

================
Comment at: lib/Target/AMDGPU/SILowerControlFlow.cpp:212-216
@@ -169,1 +211,7 @@
 
+  // We are running before TwoAddressInstructions, and si_else's operands are
+  // tied. In order to correctly tie the registers, split this into a copy of
+  // the src like it does.
+  BuildMI(MBB, Start, DL, TII->get(AMDGPU::COPY), DstReg)
+    .addOperand(MI.getOperand(1)); // Saved EXEC
+
----------------
arsenm wrote:
> nhaehnle wrote:
> > What is the reason for SI_ELSE's operands being tied? I think I remember why they were tied while lowering happened after regalloc: it was a trick to make sure the live ranges of the register after lowering were subsets of the live ranges before lowering.
> > 
> > But now that the lowering happens before regalloc, that seems unnecessary. Am I missing something?
> I'm not sure, this was just preserving what it already did
Logically speaking, I think SI_ELSE should no longer use tied operands. Does anything break if the tie is removed?


https://reviews.llvm.org/D24216