[PATCH] D65088: [AMDGPU][RFC] New llvm.amdgcn.ballot intrinsic

Matt Arsenault via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Wed Mar 18 08:08:59 PDT 2020


arsenm added a comment.

In D65088#1928542 <https://reviews.llvm.org/D65088#1928542>, @Flakebi wrote:

> Use getCopyFromReg(exec) and rebase on fixed whole-wave-mode.
>
> The remaining exec copy can get fixed in multiple ways:
>
> - Prevent sinking the ctpop into the next basic block. Unfortunately marking the instruction as convergent does not work because the Machine Sinking pass moves it instead.
> - Reuse the copy of exec that gets created by `s_and_saveexec`. This might work with GlobalISel, so it’s probably worth to wait until this gets used.


I was afraid this would be a problem. We probably need a version of COPY with convergent set. If MachineSinking is moving a convergent instruction, that's also a bug



================
Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:8684
+
+      return DCI.DAG.getCopyFromReg(DCI.DAG.getEntryNode(), SDLoc(N), Exec, VT);
+    }
----------------
I didn't think of it before, but reading exec this way could potentially be dangerous due to the fact that we have the terrible operations that modify exec in the middle of IR blocks, and we split them later. We might have to do this fold later


================
Comment at: llvm/lib/Target/AMDGPU/SIInstructions.td:428
+  (i64 (int_amdgcn_ballot i1:$src)),
+  (S_AND_B64 (i64 (COPY_TO_REGCLASS $src, SReg_64)), (i64 EXEC))
+>;
----------------
Is the COPY_TO_REGCLASS just a tablegen workaround?


================
Comment at: llvm/lib/Target/AMDGPU/SIInstructions.td:433
+  (i32 (int_amdgcn_ballot i1:$src)),
+  (S_AND_B32 (i32 (COPY_TO_REGCLASS $src, SReg_32)), (i32 EXEC_LO))
+>;
----------------
Probably need an explicit Src_b32:$src to make this work with GlobalISel


================
Comment at: llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp:3903
+            II->getModule(), Intrinsic::read_register, II->getType());
+        Metadata *MDArgs[] = {MDString::get(II->getContext(), "exec")};
+        MDNode *MD = MDNode::get(II->getContext(), MDArgs);
----------------
What happens for wave32 here?


================
Comment at: llvm/test/Transforms/InstCombine/AMDGPU/amdgcn-intrinsics.ll:2412
+}
+
 ; --------------------------------------------------------------------
----------------
Wave32 tests also


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D65088/new/

https://reviews.llvm.org/D65088





More information about the llvm-commits mailing list