[Mlir-commits] [mlir] [mlir][AMDGPU] Implement gpu.subgroup_reduce with DPP intrinsics on AMD GPUs (PR #133204)

Wed Apr 16 13:53:13 PDT 2025

================
@@ -362,6 +366,164 @@ struct VectorSubgroupReduceToShuffles final
   unsigned shuffleBitwidth = 0;
   bool matchClustered = false;
 };
+
+std::optional<Value> createSubgroupDPPReduction(OpBuilder &b, Location loc,
+                                                Value input,
+                                                gpu::AllReduceOperation mode,
+                                                const ClusterInfo &ci,
+                                                amdgpu::Chipset chipset) {
+  Value result = input;
+  constexpr int allRows = 0xf;
+  constexpr int allBanks = 0xf;
+  const bool boundCtrl = true;
+  Value lane0 =
+      b.create<arith::ConstantOp>(loc, b.getI32Type(), b.getI32IntegerAttr(0));
+  Value lane32 =
----------------
krzysz00 wrote:

Let's only create these on the branches where they gut used to avoid polluting the IR with random constants

https://github.com/llvm/llvm-project/pull/133204