[Mlir-commits] [mlir] [AMDGPU] Implement gpu.subgroup_reduce with DPP intrinsics on AMD GPUs (PR #133204)
llvmlistbot at llvm.org
Wed Apr 2 16:12:42 PDT 2025
Muzammiluddin-Syed-ECE wrote:
> Specific note: looking at the device libraries, they use the `share(0)` permutation and then specifically `permlanex16` to get the row broadcasts (and might shift left instead of right)
I can't seem to find an op equivalent to `permlanex16` defined in the ROCDL or AMDGPU dialects in MLIR. Should I be using the LLVM intrinsics here instead?
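For readers unfamiliar with the instruction, here is a toy Python model of the cross-row exchange that the quoted note refers to. It is only a sketch under stated assumptions: it assumes `permlanex16` lets each lane read a value from the other 16-lane row of its 32-lane group, and it simplifies the hardware's packed lane-select operands to a plain list of 16 indices. The function names `permlanex16` and `cross_row_sum` are hypothetical helpers for illustration, not ROCDL, AMDGPU, or LLVM APIs.

```python
def permlanex16(values, select):
    """Toy model (assumption): each lane reads from the *other* 16-lane row
    of its 32-lane group, at the position given by select[lane % 16]."""
    assert len(values) % 32 == 0, "model works on whole 32-lane groups"
    assert len(select) == 16 and all(0 <= s < 16 for s in select)
    out = []
    for lane in range(len(values)):
        group_base = lane - lane % 32          # first lane of this 32-lane group
        row = (lane % 32) // 16                # 0 = lower row, 1 = upper row
        other_row_base = group_base + (1 - row) * 16
        out.append(values[other_row_base + select[lane % 16]])
    return out


def cross_row_sum(values):
    """Combine per-row partial sums across the two rows of each group.
    Assumes every lane already holds the partial sum of its own row."""
    swapped = permlanex16(values, list(range(16)))  # identity select: lane i <-> lane i of other row
    return [a + b for a, b in zip(values, swapped)]


# Lower row holds partial sum 3, upper row holds partial sum 5.
lanes = [3] * 16 + [5] * 16
result = cross_row_sum(lanes)
```

After the exchange every lane in the 32-lane group holds the combined value, which is the effect a row broadcast followed by a combine is meant to achieve in a subgroup reduction.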
https://github.com/llvm/llvm-project/pull/133204