[Mlir-commits] [mlir] [mlir][gpu] Pattern to promote `gpu.shuffle` to specialized AMDGPU ops (PR #137109)

Fri Apr 25 08:03:03 PDT 2025

kuhar wrote:

> We are decomposing reductions early in Wave so it's just a bunch of `gpu.shuffle xor` + `arith`.
> 
> And in general, `gpu.shuffle` is a fundamental building block, so it would be nice to have a good lowering for it.

Do you have any reductions that cannot be expressed with the `gpu.subgroup_reduce` op semantics (with cluster sizes and offsets)? If so, could some other higher-level op help you?
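To make the comparison concrete, here is a minimal sketch of the pattern described in the quote: a subgroup sum built from `gpu.shuffle xor` steps plus `arith` adds, simulated in Python over a list of lane values. The helper names, the subgroup width of 64, and the use of plain lists to stand in for lanes are illustrative assumptions, not Wave's or MLIR's actual code. With xor offsets 1, 2, ..., cluster_size/2, every lane ends up holding the reduction over its aligned cluster of `cluster_size` lanes, which is the result a clustered `gpu.subgroup_reduce` expresses as a single op.

```python
def shuffle_xor(values, offset):
    """Simulate `gpu.shuffle xor`: lane i reads the value held by lane i ^ offset."""
    return [values[i ^ offset] for i in range(len(values))]


def butterfly_reduce(values, cluster_size, op=lambda a, b: a + b):
    """Reduce within aligned clusters of `cluster_size` lanes using xor shuffles.

    Each step is one shuffle plus one arith-style combine; after
    log2(cluster_size) steps every lane holds its cluster's reduction.
    """
    assert cluster_size > 0 and cluster_size & (cluster_size - 1) == 0, "power of two"
    assert len(values) % cluster_size == 0, "lanes must split into whole clusters"
    result = list(values)
    offset = 1
    while offset < cluster_size:
        partner = shuffle_xor(result, offset)  # stays inside the aligned cluster
        result = [op(a, b) for a, b in zip(result, partner)]
        offset *= 2
    return result


def reference_cluster_reduce(values, cluster_size):
    """Clustered sum computed directly, for comparison with the butterfly version."""
    out = []
    for start in range(0, len(values), cluster_size):
        total = sum(values[start:start + cluster_size])
        out.extend([total] * cluster_size)
    return out


if __name__ == "__main__":
    lanes = list(range(64))  # assumed subgroup (wave) size of 64
    for size in (2, 4, 16, 64):
        assert butterfly_reduce(lanes, size) == reference_cluster_reduce(lanes, size)
    print(butterfly_reduce(lanes, 64)[0])  # 2016, i.e. sum(range(64))
```

Only the shuffle steps in such a chain would be candidates for promotion to specialized AMDGPU ops; the `arith` combines are unaffected.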

> I will probably add dpp support later, so the overall uplifting flow will be shuffle -> swizzle -> dpp

In general, this seems backwards to me with respect to how we usually structure lowerings in MLIR, but the saving grace is that this is not strictly promotion/uplifting, since you go to `amdgpu`, which, AFAIK, doesn't have any 'horizontal' lowering that could undo these promotions.

> there might be `gpu.shuffle` cases that can be lowered more efficiently, perhaps

+1, this is a good way to frame this. My only concern is that this adds complexity and maintenance burden that we may not need if we went the lowering route.

https://github.com/llvm/llvm-project/pull/137109