[llvm] cuda clang: Fix argument order for __reduce_max_sync (PR #132881)

Austin Schuh via llvm-commits llvm-commits at lists.llvm.org
Mon Mar 24 21:50:59 PDT 2025


https://github.com/AustinSchuh created https://github.com/llvm/llvm-project/pull/132881

The following cuda kernel would crash with an "an illegal instruction was encountered" message.

__global__ void testcode(const float* data, unsigned *max_value) {
    unsigned r = static_cast<unsigned>(data[threadIdx.x]);

    const unsigned mask = __ballot_sync(0xFFFFFFFF, true);

    unsigned mx = __reduce_max_sync(mask, r);
    atomicMax(max_value, mx);
}

Digging into the ptx from both nvcc and clang, I discovered that the arguments for the mask and value were swapped.  This swaps them back.

Fixes: https://github.com/llvm/llvm-project/issues/131415

>From 381ba9f21b4c2a2c4028143b84e9f71c8a20692f Mon Sep 17 00:00:00 2001
From: Austin Schuh <austin.linux at gmail.com>
Date: Mon, 24 Mar 2025 21:42:45 -0700
Subject: [PATCH] cuda clang: Fix argument order for __reduce_max_sync

The following cuda kernel would crash with an "an illegal instruction
was encountered" message.

__global__ void testcode(const float* data, unsigned *max_value) {
    unsigned r = static_cast<unsigned>(data[threadIdx.x]);

    const unsigned mask = __ballot_sync(0xFFFFFFFF, true);

    unsigned mx = __reduce_max_sync(mask, r);
    atomicMax(max_value, mx);
}

Digging into the ptx from both nvcc and clang, I discovered that the
arguments for the mask and value were swapped.  This swaps them back.

Fixes: https://github.com/llvm/llvm-project/issues/131415
Signed-off-by: Austin Schuh <austin.linux at gmail.com>
---
 llvm/lib/Target/NVPTX/NVPTXIntrinsics.td | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/llvm/lib/Target/NVPTX/NVPTXIntrinsics.td b/llvm/lib/Target/NVPTX/NVPTXIntrinsics.td
index b2e05a567b4fe..1943e94c3ee7a 100644
--- a/llvm/lib/Target/NVPTX/NVPTXIntrinsics.td
+++ b/llvm/lib/Target/NVPTX/NVPTXIntrinsics.td
@@ -315,7 +315,7 @@ defm MATCH_ALLP_SYNC_64 : MATCH_ALLP_SYNC<Int64Regs, "b64", int_nvvm_match_all_s
 multiclass REDUX_SYNC<string BinOp, string PTXType, Intrinsic Intrin> {
   def : NVPTXInst<(outs Int32Regs:$dst), (ins Int32Regs:$src, Int32Regs:$mask),
           "redux.sync." # BinOp # "." # PTXType # " $dst, $src, $mask;",
-          [(set i32:$dst, (Intrin i32:$src, Int32Regs:$mask))]>,
+          [(set i32:$dst, (Intrin i32:$mask, Int32Regs:$src))]>,
         Requires<[hasPTX<70>, hasSM<80>]>;
 }
 



More information about the llvm-commits mailing list