[llvm] cuda clang: Fix argument order for __reduce_max_sync (PR #132881)

Durgadoss R via llvm-commits llvm-commits at lists.llvm.org
Tue Mar 25 04:28:16 PDT 2025


================
@@ -315,7 +315,7 @@ defm MATCH_ALLP_SYNC_64 : MATCH_ALLP_SYNC<Int64Regs, "b64", int_nvvm_match_all_s
 multiclass REDUX_SYNC<string BinOp, string PTXType, Intrinsic Intrin> {
   def : NVPTXInst<(outs Int32Regs:$dst), (ins Int32Regs:$src, Int32Regs:$mask),
           "redux.sync." # BinOp # "." # PTXType # " $dst, $src, $mask;",
-          [(set i32:$dst, (Intrin i32:$src, Int32Regs:$mask))]>,
+          [(set i32:$dst, (Intrin i32:$mask, Int32Regs:$src))]>,
----------------
durga4github wrote:

This changes the operand order that the intrinsic definition at the IR level expects. That definition lives in IntrinsicsNVVM.td.

From what I see, the intrinsic definition (in IntrinsicsNVVM.td) and the backend codegen (in NVPTXIntrinsics.td) already agree with each other, and both follow the operand order in the PTX documentation, which is always `src` followed by `membermask`.

https://docs.nvidia.com/cuda/parallel-thread-execution/#parallel-synchronization-and-communication-instructions-redux-sync

I wonder whether the fix belongs instead in the clang builtin-to-intrinsic lowering
(clang/lib/Headers/__clang_cuda_intrinsics.h).
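
To illustrate why the operand order matters, here is a minimal host-side sketch in plain C++. It is not the actual header or backend code. `model_redux_max`, `reduce_max_correct`, `reduce_max_swapped` and `LaneOperands` are hypothetical names, and the model assumes every lane passes the same `membermask`, so it simply reads lane 0's. The CUDA-level `__reduce_max_sync(mask, value)` takes the mask first, while PTX takes `src` first. Both operands are 32-bit integers, so a swap compiles silently.

```cpp
#include <algorithm>
#include <array>
#include <cstdint>

constexpr int kWarpSize = 32;

// Operands that one lane passes to the modeled instruction, in PTX order.
struct LaneOperands {
    std::uint32_t src;
    std::uint32_t membermask;
};

// Hypothetical host-side model of `redux.sync.max.u32 dst, src, membermask`.
// Assumption: all lanes pass the same membermask, so lane 0's copy is used.
std::uint32_t model_redux_max(const std::array<LaneOperands, kWarpSize>& lanes) {
    const std::uint32_t membermask = lanes[0].membermask;
    std::uint32_t result = 0;
    for (int lane = 0; lane < kWarpSize; ++lane) {
        if (membermask & (1u << lane)) {
            result = std::max(result, lanes[lane].src);
        }
    }
    return result;
}

// CUDA-style (mask, value) wrapper that forwards operands in PTX order.
std::uint32_t reduce_max_correct(std::uint32_t mask,
                                 const std::array<std::uint32_t, kWarpSize>& values) {
    std::array<LaneOperands, kWarpSize> lanes{};
    for (int lane = 0; lane < kWarpSize; ++lane) {
        lanes[lane] = {values[lane], mask};  // src = value, membermask = mask
    }
    return model_redux_max(lanes);
}

// Same wrapper with the two operands swapped somewhere along the lowering path.
std::uint32_t reduce_max_swapped(std::uint32_t mask,
                                 const std::array<std::uint32_t, kWarpSize>& values) {
    std::array<LaneOperands, kWarpSize> lanes{};
    for (int lane = 0; lane < kWarpSize; ++lane) {
        lanes[lane] = {mask, values[lane]};  // src = mask, membermask = value
    }
    return model_redux_max(lanes);
}
```

With per-lane values 1..32 and mask 0xF, the correct wrapper returns 4 (the max over lanes 0-3), while the swapped one returns 15: it reduces the mask itself over whichever lanes lane 0's value happens to select.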

Let us wait to hear from @Artem-B.


https://github.com/llvm/llvm-project/pull/132881

