[PATCH] D24216: AMDGPU: Partially fix control flow at -O0

Thu Sep 8 12:49:49 PDT 2016

arsenm added inline comments.

================
Comment at: lib/Target/AMDGPU/AMDGPUTargetMachine.cpp:561-564
@@ -560,1 +560,6 @@
+
+  // This must be run immediately after phi elimination and before
+  // TwoAddressInstructions, otherwise the processing of the tied operand of
+  // SI_ELSE will introduce a copy of the tied operand source after the else.
+  insertPass(&PHIEliminationID, &SILowerControlFlowID, false);
 
----------------
nhaehnle wrote:
> Since the pass now actually runs even before machine scheduling, wouldn't it be better to add this elsewhere, e.g. addPreRegAlloc? That would also eliminate the duplicate code below.
No, this needs to be run as part of regalloc to be post-SSA. It's easier to think about the two reg alloc paths as completely different

================
Comment at: lib/Target/AMDGPU/SIInstrInfo.cpp:853-871
@@ -852,4 +852,21 @@
   default: return AMDGPUInstrInfo::expandPostRAPseudo(MI);
-
+  case AMDGPU::S_MOV_B64_term: {
+    // This is only a terminator to get the correct spill code placement during
+    // register allocation.
+    MI.setDesc(get(AMDGPU::S_MOV_B64));
+    break;
+  }
+  case AMDGPU::S_XOR_B64_term: {
+    // This is only a terminator to get the correct spill code placement during
+    // register allocation.
+    MI.setDesc(get(AMDGPU::S_XOR_B64));
+    break;
+  }
+  case AMDGPU::S_ANDN2_B64_term: {
+    // This is only a terminator to get the correct spill code placement during
+    // register allocation.
+    MI.setDesc(get(AMDGPU::S_ANDN2_B64));
+    break;
+  }
   case AMDGPU::V_MOV_B64_PSEUDO: {
     unsigned Dst = MI.getOperand(0).getReg();
----------------
nhaehnle wrote:
> This looks redundant with what's in the exec-optimize pass.
It's not, these are still required in the no-opt cases when the pass doesn't run

================
Comment at: lib/Target/AMDGPU/SILowerControlFlow.cpp:212-216
@@ -169,1 +211,7 @@
 
+  // We are running before TwoAddressInstructions, and si_else's operands are
+  // tied. In order to correctly tie the registers, split this into a copy of
+  // the src like it does.
+  BuildMI(MBB, Start, DL, TII->get(AMDGPU::COPY), DstReg)
+    .addOperand(MI.getOperand(1)); // Saved EXEC
+
----------------
nhaehnle wrote:
> What is the reason for SI_ELSE's operands being tied? I think I remember why they were tied while lowering happened after regalloc: it was a trick to make sure the live ranges of the register after lowering were subsets of the live ranges before lowering.
> 
> But now that the lowering happens before regalloc, that seems unnecessary. Am I missing something?
I'm not sure, this was just preserving what it already did


https://reviews.llvm.org/D24216