[Mlir-commits] [mlir] [AMDGPU] Implement gpu.subgroup_reduce with DPP intrinsics on AMD GPUs (PR #133204)
Krzysztof Drewniak
llvmlistbot at llvm.org
Wed Apr 2 16:32:10 PDT 2025
krzysz00 wrote:
> Ok, so after actually running a few sizes of MatVecs, I see that it runs into the same issue of our "ExpandGPUOps" pass decomposing the subgroup_reduce before it can make it to these passes.
>
> So, in conclusion, why does that pattern work? It doesn't...
I think this means IREE's `ExpandGPUOps` needs to be fixed to run this pattern before the expansion to shuffles, or to register it with a higher benefit, then.
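A minimal sketch of the "higher benefit" option, not IREE's actual `ExpandGPUOps` code. It assumes this PR exposes a populate function along the lines of `populateGpuLowerSubgroupReduceToDPPPatterns` taking a chipset (name and signature are assumptions here); the shuffle expansion is the existing `populateGpuLowerSubgroupReduceToShufflePatterns`:

```cpp
#include "mlir/Dialect/AMDGPU/Utils/Chipset.h"
#include "mlir/Dialect/GPU/Transforms/Passes.h"
#include "mlir/Transforms/GreedyPatternRewriteDriver.h"

using namespace mlir;

// Register the DPP lowering with a higher benefit than the shuffle expansion,
// so the greedy driver tries it first on each gpu.subgroup_reduce.
static LogicalResult lowerSubgroupReduces(Operation *root,
                                          unsigned subgroupSize,
                                          amdgpu::Chipset chipset) {
  RewritePatternSet patterns(root->getContext());
  // Assumed entry point from this PR (name/signature not confirmed here).
  populateGpuLowerSubgroupReduceToDPPPatterns(patterns, subgroupSize, chipset,
                                              /*benefit=*/PatternBenefit(2));
  // Existing fallback: expand to gpu.shuffle ops.
  populateGpuLowerSubgroupReduceToShufflePatterns(
      patterns, subgroupSize, /*shuffleBitwidth=*/32,
      /*benefit=*/PatternBenefit(1));
  return applyPatternsAndFoldGreedily(root, std::move(patterns));
}
```

With both sets in one `RewritePatternSet`, reductions the DPP patterns can't handle still fall back to the shuffle expansion.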
> I can't seem to find an equivalent op to permlanex16 defined in the ROCDL or AMDGPU dialects in MLIR; should I be using the intrinsics from LLVM here instead?
You'll at the very least want to add `rocdl.permlanex16` if it doesn't exist yet, and you may also want an `amdgpu.permlanex16` if bitcasts or splitting up vectors are required.
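To illustrate the bitcast point: the instruction permutes 32-bit lanes, so lowering a reduction on, say, f32 has to reinterpret the value around the op. A rough sketch, where `ROCDL::PermlaneX16Op` is a hypothetical op assumed to mirror the operands of the `llvm.amdgcn.permlanex16` intrinsic (old, src0, src1, src2, fi, bound_control); none of this is the PR's code:

```cpp
#include "mlir/Dialect/LLVMIR/LLVMDialect.h"
#include "mlir/Dialect/LLVMIR/ROCDLDialect.h"
#include "mlir/IR/PatternMatch.h"

using namespace mlir;

// Hypothetical helper: apply a permlanex16 row swap to a 32-bit-wide value,
// bitcasting to/from i32 around the (assumed) ROCDL op.
static Value permlaneX16Swap(OpBuilder &b, Location loc, Value input,
                             uint32_t sel0, uint32_t sel1) {
  Type origType = input.getType();
  Type i32 = b.getI32Type();
  Type i1 = b.getI1Type();
  // The instruction moves 32-bit lanes, so reinterpret non-i32 payloads.
  Value asI32 = b.create<LLVM::BitcastOp>(loc, i32, input);
  Value s0 = b.create<LLVM::ConstantOp>(loc, i32, b.getI32IntegerAttr(sel0));
  Value s1 = b.create<LLVM::ConstantOp>(loc, i32, b.getI32IntegerAttr(sel1));
  Value fi = b.create<LLVM::ConstantOp>(loc, i1, b.getIntegerAttr(i1, 0));
  Value bc = b.create<LLVM::ConstantOp>(loc, i1, b.getIntegerAttr(i1, 0));
  // Assumed op name and operand order; the real op would be added to ROCDL.
  Value swapped = b.create<ROCDL::PermlaneX16Op>(
      loc, i32, /*old=*/asI32, /*src0=*/asI32, s0, s1, fi, bc);
  return b.create<LLVM::BitcastOp>(loc, origType, swapped);
}
```

A higher-level `amdgpu.permlanex16` could hide exactly this bitcasting (and any splitting of wider vectors) from the subgroup_reduce lowering itself.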
https://github.com/llvm/llvm-project/pull/133204