[llvm] AMDGPU/GlobalISel: Fix inst-selection of ballot (PR #109986)

Petar Avramovic via llvm-commits llvm-commits at lists.llvm.org
Mon Sep 30 05:55:01 PDT 2024


================
@@ -1429,34 +1429,129 @@ bool AMDGPUInstructionSelector::selectBallot(MachineInstr &I) const {
   std::optional<ValueAndVReg> Arg =
       getIConstantVRegValWithLookThrough(I.getOperand(2).getReg(), *MRI);
 
-  const auto BuildCopy = [&](Register SrcReg) {
-    if (Size == STI.getWavefrontSize()) {
-      BuildMI(*BB, &I, DL, TII.get(AMDGPU::COPY), DstReg)
-          .addReg(SrcReg);
-      return;
+  const auto getCmpInput = [&]() -> MachineInstr * {
+    MachineInstr *SrcMI = getDefIgnoringCopies(I.getOperand(2).getReg(), *MRI);
+    // Try to fold sgpr compare.
+    if (SrcMI->getOpcode() == AMDGPU::G_TRUNC)
+      SrcMI = MRI->getVRegDef(SrcMI->getOperand(1).getReg());
+
+    if (SrcMI->getOpcode() == AMDGPU::G_ICMP ||
+        SrcMI->getOpcode() == AMDGPU::G_FCMP)
+      return SrcMI;
+    return nullptr;
+  };
+
+  const auto FoldCmp = [&](Register Dst, MachineInstr *CmpMI) -> bool {
+    // Fold ballot of a compare. Active lanes when the ballot is executed need
----------------
petar-avramovic wrote:

We can do better. There is no need to check anything if the compare is used directly.

For example:
```
%bb.entry
%cmp = ...
s_and_saveexec_b32 ...

%bb.divergent_block
 %res = ballot(%cmp)
```
This works because the active lanes in `%bb.divergent_block` are a subset of the active lanes in `%bb.entry`, so sinking the compare into the other block matches the ballot description (put 0 in inactive lanes).
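The subset argument can be checked with a small lane-mask model. This is only a sketch, not code from the patch: it assumes wave32, an arbitrary per-lane predicate (lane id is odd), and that a compare writes 0 for inactive lanes. All names (`compareUnderExec`, `ballot`, and so on) are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical wave32 lane mask: bit i corresponds to lane i.
using LaneMask = std::uint32_t;

// Model of a per-lane compare. Only lanes set in Exec produce a result,
// inactive lanes are assumed to produce 0. The predicate is arbitrary.
LaneMask compareUnderExec(LaneMask Exec) {
  LaneMask Result = 0;
  for (unsigned Lane = 0; Lane < 32; ++Lane) {
    bool Active = ((Exec >> Lane) & 1u) != 0;
    bool Pred = (Lane % 2) == 1;
    if (Active && Pred)
      Result |= LaneMask(1) << Lane;
  }
  return Result;
}

// Model of ballot on a lane-mask input: inactive lanes are forced to 0.
LaneMask ballot(LaneMask Value, LaneMask Exec) { return Value & Exec; }

// Compare defined in the entry block, ballot executed in the divergent block.
LaneMask ballotOfEntryCompare(LaneMask EntryExec, LaneMask DivExec) {
  return ballot(compareUnderExec(EntryExec), DivExec);
}

// The same compare sunk next to the ballot, so it runs under DivExec.
LaneMask sunkCompare(LaneMask DivExec) { return compareUnderExec(DivExec); }

// Both forms agree whenever DivExec is a subset of EntryExec.
void checkSinking() {
  const LaneMask EntryExec = 0xFFFFFFFFu;
  const LaneMask DivExec = 0x0000FFFFu;
  assert((DivExec & ~EntryExec) == 0); // subset precondition
  assert(ballotOfEntryCompare(EntryExec, DivExec) == sunkCompare(DivExec));
}
```

In this model the equality holds for any pair of masks with `DivExec` contained in `EntryExec`, which is the situation after `s_and_saveexec`.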

The only potential problem is the following case:

```
%bb.entry
s_and_saveexec_b32 ...

%bb.divergent_block
 %cmp = ...

; %bb.merge.control.flow
 s_or_b32 %exec.from.bb.divergent_block, ...
 %res = ballot(%cmp)
```
A lane could be inactive in `%bb.divergent_block` but active in `%bb.merge.control.flow`, so the ballot result for that lane could be 1 instead of 0.
However, that case is not written correctly: ballot can't use such a compare directly, it has to go through a phi.

```
%bb.entry
s_and_saveexec_b32 ...

%bb.divergent_block
 %cmp = ...
...
; %bb.merge.control.flow
 s_or_b32 %exec.from.bb.divergent_block, ...
 %phi = phi i1 [ %cmp, %bb.divergent_block ], [ ..., ...]
 %res = ballot(%phi)
```
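A lane-mask sketch of why the phi matters in the merge block. This is an illustration under assumptions, not code from the patch: wave32 is assumed, `PredMask` marks the lanes where the compare would be true if it were evaluated, and `OtherIncoming` stands in for the second phi operand that is elided above. All function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical wave32 lane mask: bit i corresponds to lane i.
using LaneMask = std::uint32_t;

// ballot(%phi) in the merge block. Lanes that went through the divergent
// block take the compare result, the remaining active lanes take the other
// incoming value (an assumption, since that operand is elided in the example).
LaneMask ballotOfPhi(LaneMask PredMask, LaneMask DivExec,
                     LaneMask OtherIncoming, LaneMask MergeExec) {
  LaneMask FromDiv = PredMask & DivExec;
  LaneMask FromOther = OtherIncoming & MergeExec & ~DivExec;
  return (FromDiv | FromOther) & MergeExec;
}

// What would happen if the compare were re-executed in the merge block:
// every lane active there evaluates the predicate.
LaneMask ballotOfResunkCompare(LaneMask PredMask, LaneMask MergeExec) {
  return PredMask & MergeExec;
}

// With a false incoming value from the other edge, lanes 5 and 7 differ:
// they skipped the divergent block but are active again after the merge.
void checkPhiCase() {
  const LaneMask PredMask = 0xAAAAAAAAu; // predicate true on odd lanes
  const LaneMask DivExec = 0x0000000Fu;
  const LaneMask MergeExec = 0x000000FFu;
  assert(ballotOfPhi(PredMask, DivExec, 0u, MergeExec) == 0x0000000Au);
  assert(ballotOfResunkCompare(PredMask, MergeExec) == 0x000000AAu);
}
```

The two results differ exactly in the lanes that are active in the merge block but were inactive in the divergent block, which is why folding is only considered when the ballot input is the compare itself rather than a phi.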

So, in summary, we do the same thing as sdag if the input is a compare (sink it). Otherwise we select an `and` with exec.
This is better than what selection dag does when the input is not a compare (sdag does select 0, 1 (lane mask to vgpr) + compare vgpr with 0).
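The two lowerings for a non-compare input can be compared in a lane-mask model. This is a sketch under assumptions rather than the actual selected code: wave32 is assumed, the "compare vgpr with 0" step is taken to be a not-equal compare, and inactive VGPR lanes are filled with a stale value to show they do not affect the result. Function names are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical wave32 lane mask: bit i corresponds to lane i.
using LaneMask = std::uint32_t;
constexpr unsigned WaveSize = 32;

// Lowering described for GlobalISel: a single scalar AND with exec.
LaneMask ballotViaAndExec(LaneMask Input, LaneMask Exec) {
  return Input & Exec;
}

// Lowering described for sdag: select 0/1 per lane into a VGPR, then
// compare the VGPR against 0 (assumed here to be a != compare).
LaneMask ballotViaSelectAndCompare(LaneMask Input, LaneMask Exec) {
  std::array<std::uint32_t, WaveSize> VGPR;
  VGPR.fill(0xDEADBEEFu); // stale contents in lanes that stay inactive

  // Per-lane select: only active lanes write their VGPR lane.
  for (unsigned Lane = 0; Lane < WaveSize; ++Lane)
    if ((Exec >> Lane) & 1u)
      VGPR[Lane] = ((Input >> Lane) & 1u) ? 1u : 0u;

  // Per-lane compare: inactive lanes contribute 0 to the result mask.
  LaneMask Result = 0;
  for (unsigned Lane = 0; Lane < WaveSize; ++Lane)
    if (((Exec >> Lane) & 1u) && VGPR[Lane] != 0)
      Result |= LaneMask(1) << Lane;
  return Result;
}

// Both lowerings produce the same mask, the first with one scalar operation.
void checkLowerings() {
  const LaneMask Input = 0xF0F0F0F0u;
  const LaneMask Exec = 0x00FFFF00u;
  assert(ballotViaAndExec(Input, Exec) == 0x00F0F000u);
  assert(ballotViaSelectAndCompare(Input, Exec) == 0x00F0F000u);
}
```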

https://github.com/llvm/llvm-project/pull/109986

