[Mlir-commits] [mlir] [AMDGPU] Implement gpu.subgroup_reduce with DPP intrinsics on AMD GPUs (PR #133204)

Wed Apr 2 15:32:33 PDT 2025

Muzammiluddin-Syed-ECE wrote:

> ... I've just realized I have a structural comment - that is, darn it, we might want to move the code again. If you look at `LowerGpuOpsToNVVMOps.cpp`, that implements the subgroup reduce lowering as part of the conversion to Nvidia-flavored LLVM IR.
> 
> Can you take a look and see why that pattern works / why we can't just stick this in `LowerGPUOpsToROCDL`?

OK, so after actually running a few MatVec sizes, I see that it runs into the same issue as our "ExpandGPUOps" pass: the subgroup_reduce gets decomposed before it can reach these passes.

So, in conclusion, why does that pattern work? It doesn't.

https://github.com/llvm/llvm-project/pull/133204