[PATCH] D146287: [AMDGPU][GISel] Add inverse ballot intrinsic
Thomas Symalla via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Mon Mar 20 01:18:19 PDT 2023
tsymalla added inline comments.
================
Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:4489
+
+ bool Is32 = (MI.getOpcode() == AMDGPU::S_INVERSE_BALLOT_U32);
+
----------------
Use a better name, like `Is32BitMask`.
================
Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:4504
+ Register DstPart = MRI.createVirtualRegister(&AMDGPU::SReg_32RegClass);
+ int SubReg = Is32 ? 0 : (AMDGPU::sub0 + i * 8);
+ BuildMI(*BB, &MI, DL, TII->get(AMDGPU::V_READFIRSTLANE_B32),
----------------
Add a comment on why `0` is used in 32-bit mode.
================
Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:4513
+ if (!Is32) {
+ BuildMI(*BB, &MI, DL, TII->get(AMDGPU::REG_SEQUENCE), DstReg)
+ .addReg(DstParts[0])
----------------
Could also use a comment.
================
Comment at: llvm/lib/Target/AMDGPU/SIInstructions.td:194
+let usesCustomInserter = 1, Defs = [VCC] in {
+def S_INVERSE_BALLOT_U32 : SPseudoInstSI <(outs SReg_32:$sdst), (ins unknown:$src)>;
+
----------------
Can `unknown` be changed to a more constrained register class instead of allowing arbitrary inputs?
================
Comment at: llvm/lib/Target/AMDGPU/SIInstructions.td:194
+let usesCustomInserter = 1, Defs = [VCC] in {
+def S_INVERSE_BALLOT_U32 : SPseudoInstSI <(outs SReg_32:$sdst), (ins unknown:$src)>;
+
----------------
tsymalla wrote:
> Can `unknown` be changed to a more constrained register class instead of allowing arbitrary inputs?
Rather call `$src` `$mask`.
================
Comment at: llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.inverse.ballot.i32.ll:153
+endif:
+ %input = phi i32 [ %s0_1, %entry], [ %tmp, %if ]
+
----------------
Nit: Missing whitespace after `%entry`
================
Comment at: llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.inverse.ballot.i64.ll:52
+entry:
+ %ballot = call i1 @llvm.amdgcn.inverse.ballot.i64(i64 u0xFFFFFFFFFFFFFFFF)
+ %sel = select i1 %ballot, i64 1, i64 0
----------------
Can you add tests for more complex bitmask inputs?
================
Comment at: llvm/test/CodeGen/AMDGPU/llvm.amdgcn.inverse.ballot.i64.ll:15
+; GFX10-NEXT: v_cndmask_b32_e64 v2, 0, 1, s[0:1]
+; GFX10-NEXT: global_store_dwordx2 v[0:1], v[2:3], off
+; GFX10-NEXT: s_endpgm
----------------
Can be optimized, as s0-s2 and v2-v3 are zero. Probably add another test to show the MIR your code reduces the intrinsic call to
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D146287/new/
https://reviews.llvm.org/D146287
More information about the llvm-commits
mailing list