[PATCH] D16603: AMDGPU/SI: Detect uniform branches and emit s_cbranch instructions

Tue Jan 26 19:14:27 PST 2016

arsenm added a comment.

Thanks for finally doing this. Does this fix infinite loops? I didn't see one in the new tests, and I think there are a couple of XFAILed ones right now.

I think we should put an assert somewhere that the immediate for the branch target isn't overflowing. I'm worried that we could end up  hitting that limit someday.


================
Comment at: lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp:42
@@ +41,3 @@
+static bool isCBranchSCC(const SDNode *N) {
+  assert (N->getOpcode() == ISD::BRCOND);
+  if (!N->hasOneUse())
----------------
Extra space before (

================
Comment at: lib/Target/AMDGPU/SIFixSGPRLiveRanges.cpp:113
@@ +112,3 @@
+                               const SIInstrInfo *TII) {
+    for (MachineBasicBlock::const_iterator I = MBB->getFirstTerminator(),
+                                          E = MBB->end(); I != E; ++I) {
----------------
Indentation looks off

================
Comment at: lib/Target/AMDGPU/SIFixSGPRLiveRanges.cpp:115
@@ +114,3 @@
+                                          E = MBB->end(); I != E; ++I) {
+      if (!TII->isSOPP(I->getOpcode()))
+        return false;
----------------
You should use TII->isSOPP(*I) here.

I think there should be an assert that I->isBranch()

================
Comment at: lib/Target/AMDGPU/SIISelLowering.cpp:1168
@@ -1165,2 +1167,3 @@
 
-  assert(Intr->getOpcode() == ISD::INTRINSIC_W_CHAIN);
+  if (Intr->getOpcode() != ISD::INTRINSIC_W_CHAIN) {
+    // This is a uniform branch so we don't need to leaglize.
----------------
It seems like there should be an assert or something that this is actually a branch intrinsic

================
Comment at: lib/Target/AMDGPU/SIISelLowering.cpp:1169
@@ +1168,3 @@
+  if (Intr->getOpcode() != ISD::INTRINSIC_W_CHAIN) {
+    // This is a uniform branch so we don't need to leaglize.
+    return BRCOND;
----------------
Typo leaglize

================
Comment at: lib/Target/AMDGPU/SIISelLowering.cpp:1172-1177
@@ -1167,1 +1171,8 @@
+  }
+
+  if (SetCC) {
+    assert(SetCC->getConstantOperandVal(1) == 1);
+    assert(cast<CondCodeSDNode>(SetCC->getOperand(2).getNode())->get() ==
+           ISD::SETNE);
+  }
 
----------------
The SetCC in the if should be folded into a single assert, or this whole block be put under an #ifndef NDEBUG block

================
Comment at: lib/Target/AMDGPU/SIInstrInfo.cpp:2913-2914
@@ -2889,1 +2912,4 @@
 
+void SIInstrInfo::addSCBranchUsersToVALUWorklist(MachineInstr *SCCDefInst,
+                              SmallVectorImpl<MachineInstr *> &Worklist) const {
+  // This assumes that all the S_CBRANCH* users of SCC are in the same block
----------------
I think this could use a better name. How about something like
addSCCDefUsersToVALUWorklist?

================
Comment at: lib/Target/AMDGPU/SIInstrInfo.cpp:2915-2916
@@ +2914,4 @@
+                              SmallVectorImpl<MachineInstr *> &Worklist) const {
+  // This assumes that all the S_CBRANCH* users of SCC are in the same block
+  // as the SCC def.
+  for (MachineBasicBlock::iterator I = SCCDefInst,
----------------
We should probably have a verifier check for this

================
Comment at: lib/Target/AMDGPU/SIInstrInfo.cpp:2924-2925
@@ +2923,4 @@
+
+    if (I->getOpcode() == AMDGPU::S_CBRANCH_SCC0 ||
+        I->getOpcode() == AMDGPU::S_CBRANCH_SCC1)
+      Worklist.push_back(I);
----------------
Why is this limited to branches? What if you had another user like a s_cselect? I would expect any user to need processing

================
Comment at: lib/Target/AMDGPU/SIInstrInfo.td:252
@@ +251,3 @@
+  (ops node:$lhs, node:$rhs, node:$cond),
+  (setcc  node:$lhs, node:$rhs, node:$cond), [{
+  for (SDNode *Use : N->uses()) {
----------------
Extra space

================
Comment at: lib/Target/AMDGPU/SIInstrInfo.td:855
@@ -829,2 +854,3 @@
 }
+      //[(set i1:$dst, (setcc P.Src0VT:$src0, P.Src1VT:$src1, cond))]),
 
----------------
Commented out code

================
Comment at: lib/Target/AMDGPU/SIInstructions.td:465
@@ -464,1 +464,3 @@
+  "s_cbranch_vccnz $simm16",
+  []
 >;
----------------
Looks unnecessary

================
Comment at: lib/Target/AMDGPU/SIInstructions.td:3118-3119
@@ -3115,4 +3117,4 @@
   (i64 (sext i32:$src)),
     (REG_SEQUENCE SReg_64, $src, sub0,
-    (S_ASHR_I32 $src, 31), sub1)
+    (i32 (COPY_TO_REGCLASS (S_ASHR_I32 $src, 31), SGPR_32)), sub1)
 >;
----------------
This looks like the workaround for a tablegen bug with multiple outputs to reg_sequence. Maybe note that in a FIXME?

================
Comment at: lib/Target/AMDGPU/SIRegisterInfo.cpp:413
@@ -412,2 +412,3 @@
   static const TargetRegisterClass *const BaseClasses[] = {
+    &AMDGPU::SCC_CLASSRegClass,
     &AMDGPU::VGPR_32RegClass,
----------------
This should be the last in the array. The more common types should be first

================
Comment at: test/CodeGen/AMDGPU/si-annotate-cf.ll:43-46
@@ -41,5 +42,6 @@
 
-define void @phi_cond_outside_loop(i32 %a, i32 %b) {
+define void @phi_cond_outside_loop(i32 %b) {
 entry:
-  %0 = icmp eq i32 %a , 0
+  %tid = call i32 @llvm.amdgcn.mbcnt.lo(i32 -1, i32 0) #0
+  %0 = icmp eq i32 %tid , 0
   br i1 %0, label %if, label %else
----------------
Why this test change and the others like it?

================
Comment at: test/CodeGen/AMDGPU/uniform-cfg.ll:362
@@ +361,3 @@
+}
+
+declare i32 @llvm.r600.read.tidig.x() #0
----------------
I think there should be a test where an icmp is the branch condition in a different block (there might be on here that I missed). One with the icmp used as the local branch's condition and another block's as well.

There are a few other perverse cases that might be nice to test, like branch condition on inline asm output.


http://reviews.llvm.org/D16603