[Mlir-commits] [mlir] [mlir][AMDGPU] Implement gpu.subgroup_reduce with DPP intrinsics on AMD GPUs (PR #133204)
Krzysztof Drewniak
llvmlistbot at llvm.org
Wed Apr 16 13:53:12 PDT 2025
================
@@ -362,6 +366,164 @@ struct VectorSubgroupReduceToShuffles final
   unsigned shuffleBitwidth = 0;
   bool matchClustered = false;
 };
+
+std::optional<Value> createSubgroupDPPReduction(OpBuilder &b, Location loc,
+                                                Value input,
+                                                gpu::AllReduceOperation mode,
+                                                const ClusterInfo &ci,
+                                                amdgpu::Chipset chipset) {
+  Value result = input;
+  constexpr int allRows = 0xf;
+  constexpr int allBanks = 0xf;
+  const bool boundCtrl = true;
+  Value lane0 =
+      b.create<arith::ConstantOp>(loc, b.getI32Type(), b.getI32IntegerAttr(0));
+  Value lane32 =
+      b.create<arith::ConstantOp>(loc, b.getI32Type(), b.getI32IntegerAttr(32));
+
+  auto dppReduceAcrossLanes = [&](int numLanes,
+                                  Value res) -> std::optional<Value> {
+    Value dppResult, laneVal;
+
+    switch (numLanes) {
+    case 2:
----------------
krzysz00 wrote:
I think the "if >= 2, if >= 4, ..." scheme we had before makes the fallthrough more obvious.
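
To make the suggestion concrete, here is a minimal sketch of the cascading "if >= N" shape, written as plain C++ over a simulated subgroup rather than as MLIR builder code. It is not the code from the PR: `xorStep` and `clusterReduceAdd` are hypothetical names, plain `int` lanes stand in for `Value`s, and an XOR-lane exchange stands in for the actual DPP operations. It assumes the number of lanes is a power of two and at least as large as the cluster size.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for one cross-lane step: every lane adds the value
// held by its partner lane (i ^ offset). Assumes lanes.size() is a power of
// two greater than offset, so the partner index is always in range.
static std::vector<int> xorStep(const std::vector<int> &lanes,
                                std::size_t offset) {
  std::vector<int> out(lanes.size());
  for (std::size_t i = 0; i < lanes.size(); ++i)
    out[i] = lanes[i] + lanes[i ^ offset];
  return out;
}

// Cascading form: each threshold that clusterSize reaches adds one more
// step, so a cluster of 8 visibly runs the steps for 2, 4, and 8 in order.
// No fallthrough annotations are needed, unlike a switch on the size.
std::vector<int> clusterReduceAdd(std::vector<int> lanes, int clusterSize) {
  if (clusterSize >= 2)
    lanes = xorStep(lanes, 1);
  if (clusterSize >= 4)
    lanes = xorStep(lanes, 2);
  if (clusterSize >= 8)
    lanes = xorStep(lanes, 4);
  if (clusterSize >= 16)
    lanes = xorStep(lanes, 8);
  return lanes;
}
```

With 16 lanes holding 1 through 16, calling `clusterReduceAdd(lanes, 4)` leaves every lane in the first cluster of four holding 1 + 2 + 3 + 4, while a cluster size of 1 leaves the input untouched because no threshold is reached.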
https://github.com/llvm/llvm-project/pull/133204