[Mlir-commits] [mlir] [AMDGPU] Implement gpu.subgroup_reduce with DPP intrinsics on AMD GPUs (PR #133204)

Tue Apr 15 21:02:11 PDT 2025

================
@@ -372,6 +500,14 @@ void mlir::populateGpuBreakDownSubgroupReducePatterns(
   patterns.add<ScalarizeSingleElementReduce>(patterns.getContext(), benefit);
 }
 
+void mlir::populateGpuLowerSubgroupReduceToDPPPatterns(
+    RewritePatternSet &patterns, unsigned subgroupSize, amdgpu::Chipset chipset,
+    PatternBenefit benefit) {
+  patterns.add<ScalarSubgroupReduceToDPP>(patterns.getContext(), subgroupSize,
+                                          /*matchClustered=*/true, chipset,
+                                          benefit);
+}
+
----------------
Muzammiluddin-Syed-ECE wrote:

1) You're right, I didn't include a way to set that. I've added the missing function that covers the `/*matchClustered=*/false` case.

2) I think you're asking what the purpose of `matchClustered` is. This is only a guess, but I believe it allows patterns to be included selectively, so that they handle either whole-subgroup reductions or clustered subgroup reductions. For example, if a pass only cares about reductions across entire subgroups, setting `matchClustered` to false will throw an error upon encountering a reduction across groups of 4 lanes. I'm not sure why this is useful, but it seems to have been implemented for the patterns that lower to `gpu.shuffle`, so I decided to carry it over just in case. I'll have to investigate further to arrive at a more insightful explanation.
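
To make the guess above concrete, here is a toy model of the gating I have in mind. It is not the MLIR code: `ToyReduceOp`, `ToyPattern`, and `demoToyGating` are hypothetical names made up for illustration, and the rule itself (a pattern applies only when the presence of a cluster size agrees with `matchClustered`) is my assumption about the intent.

```cpp
#include <cassert>
#include <optional>

// Hypothetical stand-in for a subgroup reduction op. Only the optional
// cluster size matters here; an empty value means "reduce across the
// whole subgroup".
struct ToyReduceOp {
  std::optional<unsigned> clusterSize;
};

// Hypothetical stand-in for a lowering pattern configured with the flag.
struct ToyPattern {
  bool matchClustered;

  // Assumed rule: the pattern applies only when "the op has a cluster size"
  // agrees with the flag. Anything else is rejected.
  bool matches(const ToyReduceOp &op) const {
    return op.clusterSize.has_value() == matchClustered;
  }
};

// Small demonstration of the assumed rule.
inline void demoToyGating() {
  ToyReduceOp wholeSubgroup{std::nullopt}; // no cluster size
  ToyReduceOp clusterOfFour{4u};           // reduction across 4 lanes

  ToyPattern nonClustered{/*matchClustered=*/false};
  ToyPattern clustered{/*matchClustered=*/true};

  assert(nonClustered.matches(wholeSubgroup));
  assert(!nonClustered.matches(clusterOfFour)); // rejected, as described above
  assert(clustered.matches(clusterOfFour));
  assert(!clustered.matches(wholeSubgroup));
}
```

Under this reading, a pass that wants both kinds of reduction handled would need to register the pattern twice, once with each flag value.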

https://github.com/llvm/llvm-project/pull/133204