[Mlir-commits] [mlir] [AMDGPU] Implement gpu.subgroup_reduce with DPP intrinsics on AMD GPUs (PR #133204)

Wed Apr 2 16:12:42 PDT 2025

Muzammiluddin-Syed-ECE wrote:

> Specific note: looking at the device libraries, they use the `share(0)` permutation and then specifically `permlanex16` to get the row broadcasts (and might shift left instead of right)

I can't find an op equivalent to `permlanex16` in the ROCDL or AMDGPU dialects in MLIR. Should I use the LLVM intrinsics here instead?

https://github.com/llvm/llvm-project/pull/133204