[llvm] Improve selection of conditional branch on amdgcn.ballot!=0 condition in SelectionDAG. (PR #68714)
via llvm-commits
llvm-commits at lists.llvm.org
Wed Oct 11 11:30:29 PDT 2023
================
@@ -13584,6 +13585,56 @@ SDValue SITargetLowering::performClampCombine(SDNode *N,
   return SDValue(CSrc, 0);
 }
+SDValue SITargetLowering::performBRCondCombine(SDNode *N,
+                                               DAGCombinerInfo &DCI) const {
+  if (!DCI.isAfterLegalizeDAG())
+    return SDValue(N, 0);
+
+  SDValue Cond = N->getOperand(1);
+  if (Cond.getOpcode() == ISD::SETCC &&
+      Cond->getOperand(0)->getOpcode() == AMDGPUISD::SETCC) {
+
+    // %VCMP = i32/i64 AMDGPUISD::SETCC ...
+    // %C = ISD::SETCC %VCMP, 0, setne/seteq
+    // BRCOND %BB, %C
+    // =>
+    // %VCMP = i32/i64 AMDGPUISD::SETCC ...
+    // BRCONDZ %BB, %VCMP, setne/seteq
+
+    auto CC = cast<CondCodeSDNode>(Cond->getOperand(2))->get();
+    auto *CRHS = dyn_cast<ConstantSDNode>(Cond->getOperand(1));
+    if ((CC == ISD::SETEQ || CC == ISD::SETNE) && CRHS && CRHS->isZero()) {
+
+      auto VCMP = Cond->getOperand(0);
+      auto VCMP_CC = cast<CondCodeSDNode>(VCMP.getOperand(2))->get();
+      auto *VCMP_CRHS = dyn_cast<ConstantSDNode>(VCMP.getOperand(1));
+      auto Src = VCMP;
+      if (VCMP_CC == ISD::SETNE && VCMP_CRHS && VCMP_CRHS->isZero()) {
+
+        // Special case for amdgcn.ballot:
+        // %VCMPSrc = ISD::SETCC or a logical combination of ISD::SETCCs
+        // %VCMP = i32/i64 AMDGPUISD::SETCC (ext %VCMPSrc), 0, setne
+        // %C = ISD::SETCC %VCMP, 0, setne/seteq
+        // BRCOND %BB, %C
+        // =>
+        // BRCONDZ %BB, %VCMPSrc, setne/seteq
----------------
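The generic rewrite in the diff comment (BRCOND on SETCC %VCMP, 0, setne/seteq becoming BRCONDZ on %VCMP itself) can be sketched as a toy pattern matcher. This is not LLVM code: Op, CC, Node, make and combineBrCond are hypothetical stand-ins for the ISD/AMDGPUISD opcodes and SDNode, chains and block operands are left out, and the branch target is just an integer. What it shows is that the BRCONDZ operand stays the AMDGPUISD::SETCC node, i.e. the wave-wide mask.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

// Toy model only: these enums are hypothetical stand-ins for ISD::SETCC,
// AMDGPUISD::SETCC, ISD::BRCOND and AMDGPUISD::BRCONDZ, not the LLVM enums.
enum class Op { Constant, SetCC, AMDGPUSetCC, BrCond, BrCondZ, Other };
enum class CC { EQ, NE, LE, Other };

struct Node;
using NodeRef = std::shared_ptr<Node>;

struct Node {
  Op Opcode;
  CC Cond = CC::Other;       // condition code of SETCC-like nodes and BRCONDZ
  int64_t Value = 0;         // constant payload, or branch target id
  std::vector<NodeRef> Ops;  // operands (no chain in this toy model)
};

NodeRef make(Op O, std::vector<NodeRef> Ops = {}, CC C = CC::Other,
             int64_t V = 0) {
  return std::make_shared<Node>(Node{O, C, V, std::move(Ops)});
}

static bool isZeroConst(const NodeRef &N) {
  return N->Opcode == Op::Constant && N->Value == 0;
}

// BRCOND (SETCC %VCMP, 0, seteq/setne), where %VCMP is an AMDGPUSetCC
//   => BRCONDZ %VCMP, seteq/setne
// Returns nullptr when the pattern does not match.
NodeRef combineBrCond(const NodeRef &Br) {
  if (Br->Opcode != Op::BrCond || Br->Ops.size() != 1)
    return nullptr;
  const NodeRef &Cmp = Br->Ops[0];
  if (Cmp->Opcode != Op::SetCC || Cmp->Ops.size() != 2 ||
      Cmp->Ops[0]->Opcode != Op::AMDGPUSetCC)
    return nullptr;
  if ((Cmp->Cond != CC::EQ && Cmp->Cond != CC::NE) ||
      !isZeroConst(Cmp->Ops[1]))
    return nullptr;
  // Keep the AMDGPUSetCC node itself: BRCONDZ compares its lane mask to zero.
  return make(Op::BrCondZ, {Cmp->Ops[0]}, Cmp->Cond, Br->Value);
}
```

Returning nullptr here plays the role of returning an empty SDValue() from a DAG combine, which means "no change".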
vpykhtin wrote:
Thank you, Jay. These are good suggestions, and they showed me that BRCONDZ has vague semantics.
The problem is that when we get rid of AMDGPUISD::SETCC (as in your second pattern), we lose the semantics of a boolean mask across all lanes.
The confusion comes from the description of the BRCONDZ node, which says it can accept an i1 SETCC value or a logical combination of SETCCs. Instead, BRCONDZ should be defined to accept only the result of an AMDGPUISD::SETCC node, meaning that BRCONDZ compares the uniform boolean mask produced by that SETCC to zero.
For example:
```
%Mask = i64 AMDGPUISD::SETCC i32 %c, 42, setle
BRCONDZ i64 %Mask, BB2, setne
=>
v_cmp_le vcc, v0, 42
s_cbranch_vccnz bb2
```
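To make the "uniform boolean mask" semantics concrete, here is a small C++ model of the example above, assuming wave64 (so the mask is an i64 with one bit per lane). vcmpLe, brcondzNE and brcondzEQ are hypothetical names: vcmpLe models v_cmp_le writing vcc with the bits of inactive lanes cleared (an assumption of the model), and the brcondz helpers model BRCONDZ setne/seteq (s_cbranch_vccnz/s_cbranch_vccz) as a single decision for the whole wave.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Toy model only; all names are hypothetical. Assumption: wave64.
constexpr int kWaveSize = 64;
using LaneValues = std::array<int32_t, kWaveSize>;

// Models "v_cmp_le vcc, v0, 42": bit Lane of the result is set iff that lane
// is active in Exec and its v0 value is <= K. Inactive lanes contribute 0.
uint64_t vcmpLe(const LaneValues &V0, int32_t K, uint64_t Exec) {
  uint64_t Mask = 0;
  for (int Lane = 0; Lane < kWaveSize; ++Lane)
    if (((Exec >> Lane) & 1) != 0 && V0[Lane] <= K)
      Mask |= uint64_t{1} << Lane;
  return Mask;
}

// Models "BRCONDZ %Mask, setne" (s_cbranch_vccnz): one branch decision for the
// whole wave, taken iff at least one active lane satisfied the compare.
bool brcondzNE(uint64_t Mask) { return Mask != 0; }

// Models "BRCONDZ %Mask, seteq" (s_cbranch_vccz).
bool brcondzEQ(uint64_t Mask) { return Mask == 0; }
```

Unlike a per-lane i1 value, the mask is one scalar that every lane sees identically, which is what makes the branch on it uniform.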
Now the confusing case with ballot:
```
%c = i1 SETCC ...
%Mask = i64 AMDGPUISD::SETCC i32 (zext i32 %c), 0, setne ; <- this is the lowered ballot
BRCONDZ i64 %Mask, BB2, setne
=>
v_cmp_ vcc, ... ; <- SETCC
s_cmp_ne vcc, 0 ; <- ballot
s_cbranch_scc1 bb2
```
Here we want to remove the second comparison: vcc already holds the required boolean mask across all lanes, produced by the first v_cmp_ instruction, and the branch itself can compare it to zero. In SelectionDAG we know that a SETCC on a divergent value will be selected as a v_cmp instruction whose result we can use directly, which is why the SETCC input of the ballot is passed to BRCONDZ. This is probably wrong.
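Why the s_cmp_ne is redundant can be checked with a similar model, again assuming wave64 and assuming that the v_cmp_ clears the vcc bits of inactive lanes. perLaneLe models the per-lane i1 %c, ballot models amdgcn.ballot of it, vcmpLeVcc models what the selected v_cmp_ writes to vcc, and the two branch helpers model s_cmp_ne + s_cbranch_scc1 versus s_cbranch_vccnz. All names are hypothetical, and the compare is fixed to setle against a constant for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Toy model only; all names are hypothetical. Assumption: wave64.
constexpr int kWaveSize = 64;
using LaneValues = std::array<int32_t, kWaveSize>;
using LaneBools = std::array<bool, kWaveSize>;

static bool isActive(uint64_t Exec, int Lane) {
  return ((Exec >> Lane) & 1) != 0;
}

// %c = i1 SETCC v0, K, setle, evaluated independently in every lane.
LaneBools perLaneLe(const LaneValues &V0, int32_t K) {
  LaneBools C{};
  for (int Lane = 0; Lane < kWaveSize; ++Lane)
    C[Lane] = V0[Lane] <= K;
  return C;
}

// amdgcn.ballot(%c): bit Lane is set iff the lane is active and %c is true.
uint64_t ballot(const LaneBools &C, uint64_t Exec) {
  uint64_t Mask = 0;
  for (int Lane = 0; Lane < kWaveSize; ++Lane)
    if (isActive(Exec, Lane) && C[Lane])
      Mask |= uint64_t{1} << Lane;
  return Mask;
}

// What the selected v_cmp_ writes to vcc (assumed: inactive lanes give 0).
uint64_t vcmpLeVcc(const LaneValues &V0, int32_t K, uint64_t Exec) {
  uint64_t Vcc = 0;
  for (int Lane = 0; Lane < kWaveSize; ++Lane)
    if (isActive(Exec, Lane) && V0[Lane] <= K)
      Vcc |= uint64_t{1} << Lane;
  return Vcc;
}

// Current selection: s_cmp_ne on the ballot sets SCC, then s_cbranch_scc1.
bool branchViaScc(uint64_t Ballot) {
  const bool Scc = Ballot != 0;
  return Scc;
}

// Desired selection: s_cbranch_vccnz directly on the v_cmp_ result.
bool branchViaVccnz(uint64_t Vcc) { return Vcc != 0; }
```

Under these assumptions the ballot and vcc are the same mask for any exec, so both branch forms agree and the scalar compare adds nothing.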
I'm not really sure how to resolve this yet.
https://github.com/llvm/llvm-project/pull/68714