[llvm] [AMDGPU] Extend wave reduce intrinsics for i32 type (PR #126469)

Joseph Huber via llvm-commits llvm-commits at lists.llvm.org
Thu Mar 27 10:04:47 PDT 2025


jhuber6 wrote:

> I'd like to flag that these - or separate - intrinsics should have a clustered reduce mode, such that you can, say, do "the first 16 lanes get the reduction of their values, the second 16 lanes get the reduction of their values, ...".
> 
> Otherwise, higher-level code like LLPC or (in development) MLIR will need to implement that logic itself.

That's basically the `width` argument to the CUDA `__shfl` intrinsic, right? We could reasonably add that as an argument to these intrinsics.
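
To make the requested semantics concrete, here is a rough host-side C++ model of a clustered add-reduction. It is only a sketch of the behavior being discussed, not the intrinsic's lowering or a proposed signature: the function name `clusteredReduceAdd` and the `cluster` parameter are hypothetical, and the model assumes the cluster size evenly divides the number of lanes.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Host-side model of a clustered wave reduction (illustrative only).
// `lanes` holds one value per lane; `cluster` is a hypothetical cluster
// size that is assumed to divide lanes.size() evenly.
// Every lane receives the sum of the values held by the lanes in its
// own cluster; clusters do not affect each other.
std::vector<uint32_t> clusteredReduceAdd(const std::vector<uint32_t> &lanes,
                                         std::size_t cluster) {
  std::vector<uint32_t> result(lanes.size());
  for (std::size_t base = 0; base < lanes.size(); base += cluster) {
    // Reduce the values of the lanes in [base, base + cluster).
    uint32_t sum = 0;
    for (std::size_t i = 0; i < cluster; ++i)
      sum += lanes[base + i];
    // Broadcast the cluster's result to each lane in that cluster.
    for (std::size_t i = 0; i < cluster; ++i)
      result[base + i] = sum;
  }
  return result;
}
```

With `cluster` equal to the wave size this degenerates to the full-wave reduction; with `cluster = 16` on a 32-lane wave, lanes 0-15 and lanes 16-31 each get their own result, which mirrors how `width` splits a warp into independent sub-groups for the CUDA shuffles.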

https://github.com/llvm/llvm-project/pull/126469

