[PATCH] D146287: [AMDGPU][GISel] Add inverse ballot intrinsic

Nicolai Hähnle via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Tue Mar 28 12:30:14 PDT 2023


nhaehnle added inline comments.


================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp:2540
+    default:
+      llvm_unreachable_internal("Unsupported size for inverse ballot mask.");
+    }
----------------
arsenm wrote:
> Should let it fall through to fail to select instead
Probably better to error out explicitly, no? A fall-through is easily missed by future changes.


================
Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:4502-4523
+    for (unsigned i = 0; i < NumParts; ++i) {
+      Register DstPart = MRI.createVirtualRegister(&AMDGPU::SReg_32RegClass);
+      // 64-bit masks need to be split into two 32-bit parts.
+      // We need to address the subregs for the readfirstlanes
+      int SubReg = Is32Mask ? 0 : (AMDGPU::sub0 + i * 8);
+      BuildMI(*BB, &MI, DL, TII->get(AMDGPU::V_READFIRSTLANE_B32),
+              Is32Mask ? DstReg : DstPart)
----------------
This entire sequence could probably be replaced by just calling SIInstrInfo::readlaneVGPRToSGPR.


================
Comment at: llvm/lib/Target/AMDGPU/SIInstructions.td:193
 
+let usesCustomInserter = 1, Defs = [VCC] in {
+def S_INVERSE_BALLOT_U32 : SPseudoInstSI <(outs SReg_32:$sdst), (ins SSrc_b32:$mask)>;
----------------
Remove the Defs = [VCC]. Defs is used to designate implicit defs of physical registers, but these instructions don't do that, they only use virtual registers.


================
Comment at: llvm/test/CodeGen/AMDGPU/llvm.amdgcn.inverse.ballot.i32.ll:118-129
+; GISEL-LABEL: inverse_ballot_branch:
+; GISEL:       ; %bb.0: ; %entry
+; GISEL-NEXT:    s_xor_b32 s2, s1, -1
+; GISEL-NEXT:    s_and_saveexec_b32 s1, s2
+; GISEL-NEXT:  ; %bb.1: ; %if
+; GISEL-NEXT:    s_add_i32 s0, s0, 1
+; GISEL-NEXT:  ; %bb.2: ; %endif
----------------
This code is wrong. That's a problem that's orthogonal to your change and has to do with GlobalISel not correctly taking uniformity analysis into account (yet). But please add a comment to make a note of it (same in the corresponding wave64 test).

The SDAG code is correct.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D146287/new/

https://reviews.llvm.org/D146287



More information about the llvm-commits mailing list