[llvm] [NVVM] Update properties for non-sync variants of the SHFL intrinsics (PR #189615)
via llvm-commits
llvm-commits at lists.llvm.org
Tue Mar 31 04:29:20 PDT 2026
llvmbot wrote:
<!--LLVM PR SUMMARY COMMENT-->
@llvm/pr-subscribers-llvm-ir
Author: Prerona Chaudhuri (pchaudhuri-nv)
<details>
<summary>Changes</summary>
Non-sync SHFL variants (shfl without .sync) are pure functions of their SSA operands and the active thread mask. Assign IntrReadMem, IntrInaccessibleMemOnly and IntrWillReturn so that:

- Reading the implicit mask state is modeled for correct ordering with other convergent operations
- Truly dead non-sync shfl code can still be DCE'd

Sync SHFL variants keep IntrInaccessibleMemOnly (no IntrReadMem, no IntrWillReturn) to model synchronization side effects and prevent unsafe DCE/reordering.
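
To make the intended contrast concrete, here is a minimal IR sketch. It is not part of this PR. The function names are hypothetical, and the declaration of `@llvm.nvvm.shfl.sync.idx.i32` (mask, value, lane, clamp) is an assumption about the existing sync intrinsic's signature rather than something shown in the diff. The comments describe what the properties above are expected to allow under the same `opt -passes=instsimplify` invocation as the new test. They are not verified output.

```llvm
; Hypothetical companion to dead-shfl.ll; assumed RUN line:
;   opt < %s -passes=instsimplify -S -mtriple=nvptx64-nvidia-cuda

declare { i32, i1 } @llvm.nvvm.shfl.idx.i32p(i32, i32, i32)
; Assumed signature for the sync variant: (membermask, value, lane, clamp).
declare i32 @llvm.nvvm.shfl.sync.idx.i32(i32, i32, i32, i32)

; Non-sync, result unused: read-only + willreturn means the call has no
; side effects, so it is expected to be removable as trivially dead.
define void @dead_nonsync(i32 %val, i32 %lane) #0 {
  %r = call { i32, i1 } @llvm.nvvm.shfl.idx.i32p(i32 %val, i32 %lane, i32 31)
  ret void
}

; Non-sync, result used: the call feeds the return value and must stay.
define i32 @live_nonsync(i32 %val, i32 %lane) #0 {
  %r = call { i32, i1 } @llvm.nvvm.shfl.idx.i32p(i32 %val, i32 %lane, i32 31)
  %v = extractvalue { i32, i1 } %r, 0
  ret i32 %v
}

; Sync, result unused: without IntrReadMem/IntrWillReturn the call may
; write inaccessible memory, so it is expected to be preserved.
define void @unused_sync(i32 %val, i32 %lane) #0 {
  %r = call i32 @llvm.nvvm.shfl.sync.idx.i32(i32 -1, i32 %val, i32 %lane, i32 31)
  ret void
}

attributes #0 = { convergent }
```

The reasoning behind the comments is that LLVM treats an instruction as having side effects if it may write memory, may throw, or is not known to return. Adding IntrReadMem and IntrWillReturn removes the first and third conditions for the non-sync variants only.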
---
Full diff: https://github.com/llvm/llvm-project/pull/189615.diff
2 Files Affected:
- (modified) llvm/include/llvm/IR/IntrinsicsNVVM.td (+13-4)
- (added) llvm/test/CodeGen/NVPTX/dead-shfl.ll (+19)
``````````diff
diff --git a/llvm/include/llvm/IR/IntrinsicsNVVM.td b/llvm/include/llvm/IR/IntrinsicsNVVM.td
index b3e0033d005a9..b851fb28ec1d0 100644
--- a/llvm/include/llvm/IR/IntrinsicsNVVM.td
+++ b/llvm/include/llvm/IR/IntrinsicsNVVM.td
@@ -2477,13 +2477,22 @@ def int_nvvm_read_ptx_sreg_dynamic_smem_size :
//
// SHUFFLE
//
-// Generate intrinsics for all variants of shfl instruction.
-let IntrProperties = [IntrInaccessibleMemOnly, IntrConvergent, IntrNoCallback] in {
- foreach sync = [false, true] in {
+// Non-sync SHFL variants are pure functions of their SSA operands and the active
+// thread mask. Memory properties (IntrReadMem + IntrInaccessibleMemOnly + IntrWillReturn)
+// model reading the implicit mask state to ensure correct ordering with other convergent
+// operations, preventing unsafe reordering while allowing DCE of truly dead code.
+//
+// Sync shfl variants have synchronization side effects modeled as
+// inaccessible memory accesses to prevent DCE and reordering.
+foreach isSync = [false, true] in {
+ defvar CommonIntrinsicProps = [IntrInaccessibleMemOnly, IntrConvergent, IntrNoCallback];
+ defvar IntrinsicProps = !if(isSync, CommonIntrinsicProps,
+ !listconcat(CommonIntrinsicProps, [IntrReadMem, IntrWillReturn]));
+ let IntrProperties = IntrinsicProps in {
foreach mode = ["up", "down", "bfly", "idx"] in {
foreach type = ["i32", "f32"] in {
foreach return_pred = [false, true] in {
- defvar i = SHFL_INFO<sync, mode, type, return_pred>;
+ defvar i = SHFL_INFO<isSync, mode, type, return_pred>;
if i.withGccBuiltin then
def i.Name : NVVMBuiltin, Intrinsic<i.RetTy, i.ArgsTy>;
else
diff --git a/llvm/test/CodeGen/NVPTX/dead-shfl.ll b/llvm/test/CodeGen/NVPTX/dead-shfl.ll
new file mode 100644
index 0000000000000..9819e4be6f7e0
--- /dev/null
+++ b/llvm/test/CodeGen/NVPTX/dead-shfl.ll
@@ -0,0 +1,19 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 6
+; RUN: opt < %s -passes=instsimplify -S -mtriple=nvptx64-nvidia-cuda | FileCheck %s
+
+declare { i32, i1 } @llvm.nvvm.shfl.idx.i32p(i32, i32, i32)
+
+; Test that dead shuffle instructions are eliminated.
+define void @test_dead_shfl_const_zero(i32 %lane) #0 {
+; CHECK-LABEL: define void @test_dead_shfl_const_zero(
+; CHECK-SAME: i32 [[LANE:%.*]]) #[[ATTR1:[0-9]+]] {
+; CHECK-NEXT: ret void
+;
+ %r0 = call { i32, i1 } @llvm.nvvm.shfl.idx.i32p(i32 0, i32 %lane, i32 31)
+ %r1 = call { i32, i1 } @llvm.nvvm.shfl.idx.i32p(i32 0, i32 %lane, i32 31)
+ %r2 = call { i32, i1 } @llvm.nvvm.shfl.idx.i32p(i32 0, i32 %lane, i32 31)
+ %r3 = call { i32, i1 } @llvm.nvvm.shfl.idx.i32p(i32 0, i32 %lane, i32 31)
+ ret void
+}
+
+attributes #0 = { convergent }
``````````
</details>
https://github.com/llvm/llvm-project/pull/189615