[llvm] [NVVM] Update properties for non-sync variants of the SHFL intrinsics (PR #189615)

via llvm-commits llvm-commits at lists.llvm.org
Tue Mar 31 04:29:20 PDT 2026


llvmbot wrote:


<!--LLVM PR SUMMARY COMMENT-->

@llvm/pr-subscribers-llvm-ir

Author: Prerona Chaudhuri (pchaudhuri-nv)

<details>
<summary>Changes</summary>

Non-sync SHFL variants (shfl without .sync) are pure functions of their SSA operands and the active thread mask. Assign IntrReadMem, IntrInaccessibleMemOnly and IntrWillReturn so that:
- Reading the implicit mask state is modeled for correct ordering with other convergent operations
- Truly dead non-sync shfl code can still be DCE'd

Sync SHFL variants keep IntrInaccessibleMemOnly (no IntrReadMem, no IntrWillReturn) to model synchronization side effects and prevent unsafe DCE/reordering.
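
As a rough illustration of why the two property sets behave differently under DCE, here is a small Python sketch. It is a simplified model, not LLVM code: the helper names `may_write_memory` and `is_removable_when_unused` are hypothetical, and the rule assumed here (an unused call can be removed only if it cannot write memory and is guaranteed to return) is an approximation of how the optimizer treats such calls.

```python
# Property sets mirroring the TableGen lists in this patch (strings only, for illustration).
COMMON = frozenset({"IntrInaccessibleMemOnly", "IntrConvergent", "IntrNoCallback"})
NON_SYNC = COMMON | {"IntrReadMem", "IntrWillReturn"}
SYNC = COMMON


def may_write_memory(props):
    # Assumption: without IntrReadMem or IntrNoMem, the call may write memory.
    return not ({"IntrReadMem", "IntrNoMem"} & props)


def is_removable_when_unused(props):
    # Simplified rule: an unused call is removable if it cannot write
    # memory and is known to return.
    return not may_write_memory(props) and "IntrWillReturn" in props


print("non-sync removable:", is_removable_when_unused(NON_SYNC))
print("sync removable:", is_removable_when_unused(SYNC))
```

Under this model the non-sync set qualifies for removal when the result is unused, while the sync set does not, which matches the intent described above.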

---
Full diff: https://github.com/llvm/llvm-project/pull/189615.diff


2 Files Affected:

- (modified) llvm/include/llvm/IR/IntrinsicsNVVM.td (+13-4) 
- (added) llvm/test/CodeGen/NVPTX/dead-shfl.ll (+19) 


``````````diff
diff --git a/llvm/include/llvm/IR/IntrinsicsNVVM.td b/llvm/include/llvm/IR/IntrinsicsNVVM.td
index b3e0033d005a9..b851fb28ec1d0 100644
--- a/llvm/include/llvm/IR/IntrinsicsNVVM.td
+++ b/llvm/include/llvm/IR/IntrinsicsNVVM.td
@@ -2477,13 +2477,22 @@ def int_nvvm_read_ptx_sreg_dynamic_smem_size :
 //
 // SHUFFLE
 //
-// Generate intrinsics for all variants of shfl instruction.
-let IntrProperties = [IntrInaccessibleMemOnly, IntrConvergent, IntrNoCallback] in {
-  foreach sync = [false, true] in {
+// Non-sync SHFL variants are pure functions of their SSA operands and the active
+// thread mask. Memory properties (IntrReadMem + IntrInaccessibleMemOnly + IntrWillReturn) 
+// model reading the implicit mask state to ensure correct ordering with other convergent
+// operations, preventing unsafe reordering while allowing DCE of truly dead code.
+//
+// Sync shfl variants have synchronization side effects modeled as
+// inaccessible memory accesses to prevent DCE and reordering.
+foreach isSync = [false, true] in {
+  defvar CommonIntrinsicProps = [IntrInaccessibleMemOnly, IntrConvergent, IntrNoCallback];
+  defvar IntrinsicProps = !if(isSync, CommonIntrinsicProps,
+                              !listconcat(CommonIntrinsicProps, [IntrReadMem, IntrWillReturn]));
+  let IntrProperties = IntrinsicProps in {
     foreach mode = ["up", "down", "bfly", "idx"] in {
       foreach type = ["i32", "f32"] in {
         foreach return_pred = [false, true] in {
-          defvar i = SHFL_INFO<sync, mode, type, return_pred>;
+          defvar i = SHFL_INFO<isSync, mode, type, return_pred>;
           if i.withGccBuiltin then
             def i.Name : NVVMBuiltin, Intrinsic<i.RetTy, i.ArgsTy>;
           else
diff --git a/llvm/test/CodeGen/NVPTX/dead-shfl.ll b/llvm/test/CodeGen/NVPTX/dead-shfl.ll
new file mode 100644
index 0000000000000..9819e4be6f7e0
--- /dev/null
+++ b/llvm/test/CodeGen/NVPTX/dead-shfl.ll
@@ -0,0 +1,19 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 6
+; RUN: opt < %s -passes=instsimplify -S -mtriple=nvptx64-nvidia-cuda | FileCheck %s
+
+declare { i32, i1 } @llvm.nvvm.shfl.idx.i32p(i32, i32, i32)
+
+; Test that dead shuffle instructions are eliminated.
+define void @test_dead_shfl_const_zero(i32 %lane) #0 {
+; CHECK-LABEL: define void @test_dead_shfl_const_zero(
+; CHECK-SAME: i32 [[LANE:%.*]]) #[[ATTR1:[0-9]+]] {
+; CHECK-NEXT:    ret void
+;
+  %r0 = call { i32, i1 } @llvm.nvvm.shfl.idx.i32p(i32 0, i32 %lane, i32 31)
+  %r1 = call { i32, i1 } @llvm.nvvm.shfl.idx.i32p(i32 0, i32 %lane, i32 31)
+  %r2 = call { i32, i1 } @llvm.nvvm.shfl.idx.i32p(i32 0, i32 %lane, i32 31)
+  %r3 = call { i32, i1 } @llvm.nvvm.shfl.idx.i32p(i32 0, i32 %lane, i32 31)
+  ret void
+}
+
+attributes #0 = { convergent }

``````````

</details>


https://github.com/llvm/llvm-project/pull/189615

