[Mlir-commits] [mlir] [SCFToGPU] Convert scf.parallel+scf.reduce to gpu.all_reduce (PR #122782)
Adam Siemieniuk
llvmlistbot at llvm.org
Tue Jan 14 03:26:47 PST 2025
================
@@ -648,6 +653,30 @@ ParallelToGpuLaunchLowering::matchAndRewrite(ParallelOp parallelOp,
rewriter.setInsertionPointAfter(parent);
leftNestingScope = true;
seenSideeffects = false;
+ } else if (auto reduceOp = dyn_cast<scf::ReduceOp>(op)) {
+ // Convert scf.reduction op
+ auto parentLoop = op->getParentOfType<ParallelOp>();
+ if (!parentLoop || op->getOperands().size() != 1) {
+ return failure();
+ }
+ auto operand = op->getOperands().front();
+ auto newValue = cloningMap.lookupOrNull(operand);
+ if (!newValue) {
+ return failure();
+ }
+ // Replace by gpu.all_reduce.
+ auto gpuRedOp = rewriter.create<gpu::AllReduceOp>(loc, newValue);
+ cloningMap.map(parentLoop->getResult(0), gpuRedOp.getResult());
+ // Copy region.
+ rewriter.inlineRegionBefore(reduceOp.getRegion(0), gpuRedOp.getRegion(),
+ gpuRedOp.getRegion().begin());
----------------
adam-smnk wrote:
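Unlike `gpu.all_reduce`, `scf.reduce` is not `IsolatedFromAbove`, so its region may use values defined outside of it.
The region therefore needs extra validation before it is inlined into `gpu.all_reduce`, for cases like this one: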
```mlir
scf.reduce(%1 : f32) {
^bb0(%arg3: f32, %arg4: f32):
  %2 = arith.addf %arg3, %arg4 : f32
  %r = arith.addf %2, %externalVal : f32
  scf.reduce.return %r : f32
}
```
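The following is a minimal, standalone sketch of the check. It walks the combiner body and collects every value that is used but neither a block argument nor defined inside the body. `Op`, `ReduceBody` and `valuesCapturedFromAbove` are hypothetical stand-ins for illustration, not MLIR classes or functions. In MLIR itself, `mlir::getUsedValuesDefinedAbove` from `mlir/Transforms/RegionUtils.h` collects the same information for a real `Region`.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical, simplified model of an scf.reduce combiner body (not MLIR's
// API). SSA values are identified by name, e.g. "%arg3" or "%2".
struct Op {
  std::string result;                 // value defined by the op ("" if none)
  std::vector<std::string> operands;  // values used by the op
};

struct ReduceBody {
  std::vector<std::string> blockArgs;  // the two combiner arguments
  std::vector<Op> ops;                 // single block, in program order
};

// Returns every value the body uses without defining it itself, i.e. the
// values captured from the enclosing scope. A non-empty result means the
// body cannot be moved as-is into an IsolatedFromAbove region such as the
// one of gpu.all_reduce.
std::set<std::string> valuesCapturedFromAbove(const ReduceBody &body) {
  // The combiner receives the two partial results as block arguments.
  assert(body.blockArgs.size() == 2 && "combiner takes exactly two arguments");
  std::set<std::string> defined(body.blockArgs.begin(), body.blockArgs.end());
  std::set<std::string> captured;
  for (const Op &op : body.ops) {
    for (const std::string &operand : op.operands)
      if (defined.count(operand) == 0)
        captured.insert(operand);
    if (!op.result.empty())
      defined.insert(op.result);
  }
  return captured;
}
```

For the body above, the only captured value is `%externalVal`. In the conversion pattern, a non-empty result is where it would `return failure();` before calling `inlineRegionBefore`.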
https://github.com/llvm/llvm-project/pull/122782