[Mlir-commits] [mlir] [AMDGPU] Implement gpu.subgroup_reduce with DPP intrinsics on AMD GPUs (PR #133204)

Wed Apr 16 01:11:33 PDT 2025

================
@@ -62,6 +63,18 @@ void populateGpuLowerSubgroupReduceToShufflePatterns(
     RewritePatternSet &patterns, unsigned subgroupSize,
     unsigned shuffleBitwidth = 32, PatternBenefit benefit = 1);
 
+/// Collect a set of patterns to lower `gpu.subgroup_reduce` into `amdgpu.dpp`
+/// ops over scalar types. Assumes that the subgroup has
+/// `subgroupSize` lanes. Applicable only to AMD GPUs.
+void populateGpuLowerSubgroupReduceToDPPPatterns(RewritePatternSet &patterns,
+                                                 unsigned subgroupSize,
+                                                 amdgpu::Chipset chipset,
+                                                 PatternBenefit benefit = 1);
+
+void populateGpuLowerClusteredSubgroupReduceToDPPPatterns(
+    RewritePatternSet &patterns, unsigned subgroupSize, amdgpu::Chipset chipset,
+    PatternBenefit benefit = 1);
+
 /// Disjoint counterpart of `populateGpuLowerSubgroupReduceToShufflePatterns`
----------------
andfau-amd wrote:

I think it would be better if `populateGpuLowerSubgroupReduceToShufflePatterns` and `populateGpuLowerClusteredSubgroupReduceToShufflePatterns` are right next to eachother in this file, rather than having the DPP patterns sandwiched in the middle?

https://github.com/llvm/llvm-project/pull/133204