[Mlir-commits] [mlir] [AMDGPU] Implement gpu.subgroup_reduce with DPP intrinsics on AMD GPUs (PR #133204)

Wed Apr 16 00:49:50 PDT 2025

================
@@ -372,6 +500,14 @@ void mlir::populateGpuBreakDownSubgroupReducePatterns(
   patterns.add<ScalarizeSingleElementReduce>(patterns.getContext(), benefit);
 }
 
+void mlir::populateGpuLowerSubgroupReduceToDPPPatterns(
+    RewritePatternSet &patterns, unsigned subgroupSize, amdgpu::Chipset chipset,
+    PatternBenefit benefit) {
+  patterns.add<ScalarSubgroupReduceToDPP>(patterns.getContext(), subgroupSize,
+                                          /*matchClustered=*/true, chipset,
+                                          benefit);
+}
+
----------------
andfau-amd wrote:

Thanks for tagging me! I described the motivation in the commit message of https://github.com/llvm/llvm-project/commit/a800ffac4115259a76d803512eda31e4de787570. Basically, for certain backends, you might want to, or have to, apply different lowering strategies for the clustered and non-clustered forms. Off the top of my head, I'm pretty sure I had Vulkan SPIR-V in mind here, because there's a native SPIR-V op for doing a non-clustered reduction, whereas the clustered form would need to be lowered to shuffles.
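
For readers less familiar with the two forms, here is a rough host-side sketch of how their results differ. It is not the lowering from this PR and uses no MLIR API: `simulateSubgroupAdd` is a hypothetical helper, the subgroup is modelled as a plain vector with one element per lane, and clusters are assumed to be contiguous runs of lanes (stride 1).

```cpp
#include <algorithm>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical model, for illustration only: one vector element per lane.
// Returns, for every lane, the sum over all lanes in that lane's cluster.
// clusterSize == std::nullopt models the non-clustered form, where the
// whole subgroup acts as a single cluster.
std::vector<int> simulateSubgroupAdd(const std::vector<int> &lanes,
                                     std::optional<std::size_t> clusterSize) {
  const std::size_t width = clusterSize.value_or(lanes.size());
  std::vector<int> result(lanes.size(), 0);
  if (width == 0)
    return result; // Nothing to reduce (empty subgroup or zero-sized cluster).

  for (std::size_t base = 0; base < lanes.size(); base += width) {
    const std::size_t end = std::min(base + width, lanes.size());

    // Reduce the lanes of this cluster.
    int sum = 0;
    for (std::size_t i = base; i < end; ++i)
      sum += lanes[i];

    // Every lane in the cluster observes the same reduced value.
    for (std::size_t i = base; i < end; ++i)
      result[i] = sum;
  }
  return result;
}
```

With lane values 1 through 8, the non-clustered call gives 36 in every lane, while a cluster size of 4 gives 10 in lanes 0-3 and 26 in lanes 4-7. A backend with a native whole-subgroup reduction can cover the first case directly, and the second is where a separate strategy may be needed.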

https://github.com/llvm/llvm-project/pull/133204