[PATCH] D16603: AMDGPU/SI: Detect uniform branches and emit s_cbranch instructions
Matt Arsenault via llvm-commits
llvm-commits at lists.llvm.org
Tue Jan 26 19:14:27 PST 2016
arsenm added a comment.
Thanks for finally doing this. Does this fix infinite loops? I didn't see one in the new tests, and I think there are a couple of XFAILed ones right now.
I think we should put an assert somewhere that the immediate for the branch target isn't overflowing. I'm worried that we could end up hitting that limit someday.
================
Comment at: lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp:42
@@ +41,3 @@
+static bool isCBranchSCC(const SDNode *N) {
+ assert (N->getOpcode() == ISD::BRCOND);
+ if (!N->hasOneUse())
----------------
Extra space before (
================
Comment at: lib/Target/AMDGPU/SIFixSGPRLiveRanges.cpp:113
@@ +112,3 @@
+ const SIInstrInfo *TII) {
+ for (MachineBasicBlock::const_iterator I = MBB->getFirstTerminator(),
+ E = MBB->end(); I != E; ++I) {
----------------
Indentation looks off
================
Comment at: lib/Target/AMDGPU/SIFixSGPRLiveRanges.cpp:115
@@ +114,3 @@
+ E = MBB->end(); I != E; ++I) {
+ if (!TII->isSOPP(I->getOpcode()))
+ return false;
----------------
You should use TII->isSOPP(*I) here.
I think there should be an assert that I->isBranch()
================
Comment at: lib/Target/AMDGPU/SIISelLowering.cpp:1168
@@ -1165,2 +1167,3 @@
- assert(Intr->getOpcode() == ISD::INTRINSIC_W_CHAIN);
+ if (Intr->getOpcode() != ISD::INTRINSIC_W_CHAIN) {
+ // This is a uniform branch so we don't need to leaglize.
----------------
It seems like there should be an assert or something that this is actually a branch intrinsic
================
Comment at: lib/Target/AMDGPU/SIISelLowering.cpp:1169
@@ +1168,3 @@
+ if (Intr->getOpcode() != ISD::INTRINSIC_W_CHAIN) {
+ // This is a uniform branch so we don't need to leaglize.
+ return BRCOND;
----------------
Typo leaglize
================
Comment at: lib/Target/AMDGPU/SIISelLowering.cpp:1172-1177
@@ -1167,1 +1171,8 @@
+ }
+
+ if (SetCC) {
+ assert(SetCC->getConstantOperandVal(1) == 1);
+ assert(cast<CondCodeSDNode>(SetCC->getOperand(2).getNode())->get() ==
+ ISD::SETNE);
+ }
----------------
The SetCC in the if should be folded into a single assert, or this whole block be put under an #ifndef NDEBUG block
================
Comment at: lib/Target/AMDGPU/SIInstrInfo.cpp:2913-2914
@@ -2889,1 +2912,4 @@
+void SIInstrInfo::addSCBranchUsersToVALUWorklist(MachineInstr *SCCDefInst,
+ SmallVectorImpl<MachineInstr *> &Worklist) const {
+ // This assumes that all the S_CBRANCH* users of SCC are in the same block
----------------
I think this could use a better name. How about something like
addSCCDefUsersToVALUWorklist?
================
Comment at: lib/Target/AMDGPU/SIInstrInfo.cpp:2915-2916
@@ +2914,4 @@
+ SmallVectorImpl<MachineInstr *> &Worklist) const {
+ // This assumes that all the S_CBRANCH* users of SCC are in the same block
+ // as the SCC def.
+ for (MachineBasicBlock::iterator I = SCCDefInst,
----------------
We should probably have a verifier check for this
================
Comment at: lib/Target/AMDGPU/SIInstrInfo.cpp:2924-2925
@@ +2923,4 @@
+
+ if (I->getOpcode() == AMDGPU::S_CBRANCH_SCC0 ||
+ I->getOpcode() == AMDGPU::S_CBRANCH_SCC1)
+ Worklist.push_back(I);
----------------
Why is this limited to branches? What if you had another user like a s_cselect? I would expect any user to need processing
================
Comment at: lib/Target/AMDGPU/SIInstrInfo.td:252
@@ +251,3 @@
+ (ops node:$lhs, node:$rhs, node:$cond),
+ (setcc node:$lhs, node:$rhs, node:$cond), [{
+ for (SDNode *Use : N->uses()) {
----------------
Extra space
================
Comment at: lib/Target/AMDGPU/SIInstrInfo.td:855
@@ -829,2 +854,3 @@
}
+ //[(set i1:$dst, (setcc P.Src0VT:$src0, P.Src1VT:$src1, cond))]),
----------------
Commented out code
================
Comment at: lib/Target/AMDGPU/SIInstructions.td:465
@@ -464,1 +464,3 @@
+ "s_cbranch_vccnz $simm16",
+ []
>;
----------------
Looks unnecessary
================
Comment at: lib/Target/AMDGPU/SIInstructions.td:3118-3119
@@ -3115,4 +3117,4 @@
(i64 (sext i32:$src)),
(REG_SEQUENCE SReg_64, $src, sub0,
- (S_ASHR_I32 $src, 31), sub1)
+ (i32 (COPY_TO_REGCLASS (S_ASHR_I32 $src, 31), SGPR_32)), sub1)
>;
----------------
This looks like the workaround for a tablegen bug with multiple outputs to reg_sequence. Maybe note that in a FIXME?
================
Comment at: lib/Target/AMDGPU/SIRegisterInfo.cpp:413
@@ -412,2 +412,3 @@
static const TargetRegisterClass *const BaseClasses[] = {
+ &AMDGPU::SCC_CLASSRegClass,
&AMDGPU::VGPR_32RegClass,
----------------
This should be the last in the array. The more common types should be first
================
Comment at: test/CodeGen/AMDGPU/si-annotate-cf.ll:43-46
@@ -41,5 +42,6 @@
-define void @phi_cond_outside_loop(i32 %a, i32 %b) {
+define void @phi_cond_outside_loop(i32 %b) {
entry:
- %0 = icmp eq i32 %a , 0
+ %tid = call i32 @llvm.amdgcn.mbcnt.lo(i32 -1, i32 0) #0
+ %0 = icmp eq i32 %tid , 0
br i1 %0, label %if, label %else
----------------
Why this test change and the others like it?
================
Comment at: test/CodeGen/AMDGPU/uniform-cfg.ll:362
@@ +361,3 @@
+}
+
+declare i32 @llvm.r600.read.tidig.x() #0
----------------
I think there should be a test where an icmp is the branch condition in a different block (there might be on here that I missed). One with the icmp used as the local branch's condition and another block's as well.
There are a few other perverse cases that might be nice to test, like branch condition on inline asm output.
http://reviews.llvm.org/D16603
More information about the llvm-commits
mailing list