[PATCH] D24216: AMDGPU: Partially fix control flow at -O0

Tue Sep 6 03:08:49 PDT 2016

nhaehnle added a comment.

Thanks for doing this. I need it to fix up https://reviews.llvm.org/D22195 for current trunk.


================
Comment at: lib/Target/AMDGPU/AMDGPUTargetMachine.cpp:561-564
@@ -560,1 +560,6 @@
+
+  // This must be run immediately after phi elimination and before
+  // TwoAddressInstructions, otherwise the processing of the tied operand of
+  // SI_ELSE will introduce a copy of the tied operand source after the else.
+  insertPass(&PHIEliminationID, &SILowerControlFlowID, false);
 
----------------
Since the pass now actually runs even before machine scheduling, wouldn't it be better to add this elsewhere, e.g. addPreRegAlloc? That would also eliminate the duplicate code below.

================
Comment at: lib/Target/AMDGPU/SIInstrInfo.cpp:853-871
@@ -852,4 +852,21 @@
   default: return AMDGPUInstrInfo::expandPostRAPseudo(MI);
-
+  case AMDGPU::S_MOV_B64_term: {
+    // This is only a terminator to get the correct spill code placement during
+    // register allocation.
+    MI.setDesc(get(AMDGPU::S_MOV_B64));
+    break;
+  }
+  case AMDGPU::S_XOR_B64_term: {
+    // This is only a terminator to get the correct spill code placement during
+    // register allocation.
+    MI.setDesc(get(AMDGPU::S_XOR_B64));
+    break;
+  }
+  case AMDGPU::S_ANDN2_B64_term: {
+    // This is only a terminator to get the correct spill code placement during
+    // register allocation.
+    MI.setDesc(get(AMDGPU::S_ANDN2_B64));
+    break;
+  }
   case AMDGPU::V_MOV_B64_PSEUDO: {
     unsigned Dst = MI.getOperand(0).getReg();
----------------
This looks redundant with what's in the exec-optimize pass.

================
Comment at: lib/Target/AMDGPU/SILowerControlFlow.cpp:212-216
@@ -169,1 +211,7 @@
 
+  // We are running before TwoAddressInstructions, and si_else's operands are
+  // tied. In order to correctly tie the registers, split this into a copy of
+  // the src like it does.
+  BuildMI(MBB, Start, DL, TII->get(AMDGPU::COPY), DstReg)
+    .addOperand(MI.getOperand(1)); // Saved EXEC
+
----------------
What is the reason for SI_ELSE's operands being tied? I think I remember why they were tied while lowering happened after regalloc: it was a trick to make sure the live ranges of the register after lowering were subsets of the live ranges before lowering.

But now that the lowering happens before regalloc, that seems unnecessary. Am I missing something?

================
Comment at: lib/Target/AMDGPU/SILowerControlFlow.cpp:221-224
@@ -172,5 +220,6 @@
   MachineInstr *OrSaveExec =
     BuildMI(MBB, Start, DL, TII->get(AMDGPU::S_OR_SAVEEXEC_B64), DstReg)
-    .addOperand(MI.getOperand(1)); // Saved EXEC
+    .addReg(DstReg);
+
   MachineBasicBlock *DestBB = MI.getOperand(2).getMBB();
 
----------------
Did you run clang-format-diff on this? I think it will complain about the formatting. (I'm personally rather annoyed at the fact that it does, but we should play by the rules...)

================
Comment at: lib/Target/AMDGPU/SIOptimizeExecMasking.cpp:38
@@ +37,3 @@
+  const char *getPassName() const override {
+    return "SI optimize scalar bit operations";
+  }
----------------
That name doesn't match with what's used elsewhere.

================
Comment at: lib/Target/AMDGPU/SIOptimizeExecMasking.cpp:274-281
@@ +273,10 @@
+
+    if (Src0.isReg() && Src0.getReg() == CopyFromExec) {
+      CopyOp = &Src0;
+      OtherOp = &Src1;
+    } else if (Src1.isReg() && Src1.getReg() == CopyFromExec) {
+      CopyOp = &Src1;
+      OtherOp = &Src0;
+    } else
+      llvm_unreachable("unexpected");
+
----------------
ANDN2 and ORN2 are non-commutative, so we have to bail here for those opcodes.


https://reviews.llvm.org/D24216