[llvm] [AMDGPU] Extend wave reduce intrinsics for i32 type (PR #126469)
Joseph Huber via llvm-commits
llvm-commits at lists.llvm.org
Thu Mar 27 10:04:47 PDT 2025
jhuber6 wrote:
> I'd like to flag that these - or separate - intrinsics should have a clustered reduce mode, such that you can, say, do "the first 16 lanes get the reduction of their values, the second 16 lanes get the reduction of their values, ...".
>
> Otherwise, higher-level code like LLPC or (in development) MLIR will need to implement that logic itself
That's basically the `width` argument to the CUDA `__shfl` intrinsic, right? We could reasonably add that as an argument to this.
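
For reference, here is a rough host-side sketch of what a clustered reduce would compute. It is only a model, not the intrinsic's lowering or a proposed signature. The names `Wave`, `kWaveSize`, and `clusteredReduceAdd` are hypothetical, and it assumes a 32-lane wave and a power-of-two `width`. Each step pairs a lane with `lane ^ offset`, which stays inside the lane's own `width`-sized cluster, much like a `__shfl_xor`-style butterfly limited by `width`.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Assumption: model a single 32-lane wave on the host.
constexpr std::size_t kWaveSize = 32;
using Wave = std::array<std::uint32_t, kWaveSize>;

// Hypothetical model of a clustered add-reduce. `width` is assumed to be a
// power of two no larger than kWaveSize. On return, every lane holds the sum
// of the values in its own cluster of `width` consecutive lanes.
Wave clusteredReduceAdd(Wave lanes, std::size_t width) {
  // Butterfly exchange: offsets width/2, width/4, ..., 1.
  for (std::size_t offset = width / 2; offset > 0; offset /= 2) {
    Wave next = lanes;
    for (std::size_t lane = 0; lane < kWaveSize; ++lane) {
      // Because offset < width and width is a power of two, lane ^ offset
      // never leaves the lane's cluster.
      next[lane] = lanes[lane] + lanes[lane ^ offset];
    }
    lanes = next;
  }
  return lanes;
}
```

With `width` set to 16, lanes 0 through 15 all end up with the sum of the first 16 inputs and lanes 16 through 31 with the sum of the second 16. With `width` equal to the wave size this reduces over the whole wave, and with `width` of 1 each lane keeps its own value.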
https://github.com/llvm/llvm-project/pull/126469