[flang-commits] [flang] [flang][do concurrent] Re-model `reduce` to match reductions are modelled in OpenMP and OpenACC (PR #145837)
via flang-commits
flang-commits at lists.llvm.org
Wed Jun 25 22:36:18 PDT 2025
llvmbot wrote:
<!--LLVM PR SUMMARY COMMENT-->
@llvm/pr-subscribers-flang-fir-hlfir
Author: Kareem Ergawy (ergawy)
<details>
<summary>Changes</summary>
This PR proposes re-modelling `reduce` specifiers to match OpenMP and OpenACC. In particular, this PR includes the following:
* A new `fir` op: `fir.delcare_reduction` which is identical to OpenMP's `omp.declare_reduction` op.
* Updating the `reduce` clause on `fir.do_concurrent.loop` to use the new op.
* Re-uses the `ReductionProcessor` component to emit reductions for `do concurrent` just like we do for OpenMP. To do this, the `ReductionProcessor` had to be refactored to be more generalized.
* Upates mapping `do concurrent` to `fir.loop ... unordered` nests using the new reduction model.
Unfortunately, this is a big PR that would be difficult to divide up in smaller parts because the bottom of the changes are the `fir` table-gen changes to `do concurrent`. However, doing these MLIR changes cascades to the other parts that have to be modified to not break things.
---
Patch is 79.29 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/145837.diff
19 Files Affected:
- (modified) flang/include/flang/Optimizer/Dialect/FIRAttr.td (+1-1)
- (modified) flang/include/flang/Optimizer/Dialect/FIROps.td (+128-8)
- (modified) flang/lib/Lower/Bridge.cpp (+61-24)
- (modified) flang/lib/Lower/OpenMP/ClauseProcessor.cpp (+64-15)
- (modified) flang/lib/Lower/OpenMP/ClauseProcessor.h (-1)
- (modified) flang/lib/Lower/OpenMP/Clauses.h (+1)
- (modified) flang/lib/Lower/OpenMP/OpenMP.cpp (-1)
- (modified) flang/lib/Lower/OpenMP/ReductionProcessor.cpp (+154-128)
- (modified) flang/lib/Lower/OpenMP/ReductionProcessor.h (+16-14)
- (modified) flang/lib/Lower/Support/Utils.cpp (+1-3)
- (modified) flang/lib/Optimizer/CodeGen/CodeGen.cpp (+25-23)
- (modified) flang/lib/Optimizer/Dialect/FIROps.cpp (+199-94)
- (modified) flang/lib/Optimizer/Transforms/SimplifyFIROperations.cpp (+9-1)
- (modified) flang/test/Fir/do_concurrent.fir (+61-2)
- (modified) flang/test/Fir/invalid.fir (+16-2)
- (added) flang/test/Lower/do_concurrent_reduce.f90 (+41)
- (modified) flang/test/Lower/loops.f90 (+1-1)
- (modified) flang/test/Lower/loops3.f90 (+1-1)
- (modified) flang/test/Transforms/do_concurrent-to-do_loop-unodered.fir (+1-1)
``````````diff
diff --git a/flang/include/flang/Optimizer/Dialect/FIRAttr.td b/flang/include/flang/Optimizer/Dialect/FIRAttr.td
index 2845080030b92..7bd96ac3ea631 100644
--- a/flang/include/flang/Optimizer/Dialect/FIRAttr.td
+++ b/flang/include/flang/Optimizer/Dialect/FIRAttr.td
@@ -112,7 +112,7 @@ def fir_ReduceOperationEnum : I32BitEnumAttr<"ReduceOperationEnum",
I32BitEnumAttrCaseBit<"MIN", 7, "min">,
I32BitEnumAttrCaseBit<"IAND", 8, "iand">,
I32BitEnumAttrCaseBit<"IOR", 9, "ior">,
- I32BitEnumAttrCaseBit<"EIOR", 10, "eior">
+ I32BitEnumAttrCaseBit<"IEOR", 10, "ieor">
]> {
let separator = ", ";
let cppNamespace = "::fir";
diff --git a/flang/include/flang/Optimizer/Dialect/FIROps.td b/flang/include/flang/Optimizer/Dialect/FIROps.td
index 8ac847dd7dd0a..e7eed633de710 100644
--- a/flang/include/flang/Optimizer/Dialect/FIROps.td
+++ b/flang/include/flang/Optimizer/Dialect/FIROps.td
@@ -3518,7 +3518,7 @@ def fir_BoxTotalElementsOp
def YieldOp : fir_Op<"yield",
[Pure, ReturnLike, Terminator,
- ParentOneOf<["LocalitySpecifierOp"]>]> {
+ ParentOneOf<["LocalitySpecifierOp", "DeclareReductionOp"]>]> {
let summary = "loop yield and termination operation";
let description = [{
"fir.yield" yields SSA values from a fir dialect op region and
@@ -3662,6 +3662,103 @@ def fir_LocalitySpecifierOp : fir_Op<"local", [IsolatedFromAbove]> {
let hasRegionVerifier = 1;
}
+def fir_DeclareReductionOp : fir_Op<"declare_reduction", [IsolatedFromAbove,
+ Symbol]> {
+ let summary = "declares a reduction kind";
+ let description = [{
+ Note: this operation is adapted from omp::DeclareReductionOp. There is a lot
+ duplication at the moment. TODO Combined both ops into one. See:
+ https://discourse.llvm.org/t/dialect-for-data-locality-sharing-specifiers-clauses-in-openmp-openacc-and-do-concurrent/86108.
+
+ Declares an `do concurrent` reduction. This requires two mandatory and three
+ optional regions.
+
+ 1. The optional alloc region specifies how to allocate the thread-local
+ reduction value. This region should not contain control flow and all
+ IR should be suitable for inlining straight into an entry block. In
+ the common case this is expected to contain only allocas. It is
+ expected to `fir.yield` the allocated value on all control paths.
+ If allocation is conditional (e.g. only allocate if the mold is
+ allocated), this should be done in the initilizer region and this
+ region not included. The alloc region is not used for by-value
+ reductions (where allocation is implicit).
+ 2. The initializer region specifies how to initialize the thread-local
+ reduction value. This is usually the neutral element of the reduction.
+ For convenience, the region has an argument that contains the value
+ of the reduction accumulator at the start of the reduction. If an alloc
+ region is specified, there is a second block argument containing the
+ address of the allocated memory. The initializer region is expected to
+ `fir.yield` the new value on all control flow paths.
+ 3. The reduction region specifies how to combine two values into one, i.e.
+ the reduction operator. It accepts the two values as arguments and is
+ expected to `fir.yield` the combined value on all control flow paths.
+ 4. The atomic reduction region is optional and specifies how two values
+ can be combined atomically given local accumulator variables. It is
+ expected to store the combined value in the first accumulator variable.
+ 5. The cleanup region is optional and specifies how to clean up any memory
+ allocated by the initializer region. The region has an argument that
+ contains the value of the thread-local reduction accumulator. This will
+ be executed after the reduction has completed.
+
+ Note that the MLIR type system does not allow for type-polymorphic
+ reductions. Separate reduction declarations should be created for different
+ element and accumulator types.
+
+ For initializer and reduction regions, the operand to `fir.yield` must
+ match the parent operation's results.
+ }];
+
+ let arguments = (ins SymbolNameAttr:$sym_name,
+ TypeAttr:$type);
+
+ let regions = (region MaxSizedRegion<1>:$allocRegion,
+ AnyRegion:$initializerRegion,
+ AnyRegion:$reductionRegion,
+ AnyRegion:$atomicReductionRegion,
+ AnyRegion:$cleanupRegion);
+
+ let assemblyFormat = "$sym_name `:` $type attr-dict-with-keyword "
+ "( `alloc` $allocRegion^ )? "
+ "`init` $initializerRegion "
+ "`combiner` $reductionRegion "
+ "( `atomic` $atomicReductionRegion^ )? "
+ "( `cleanup` $cleanupRegion^ )? ";
+
+ let extraClassDeclaration = [{
+ mlir::BlockArgument getAllocMoldArg() {
+ auto ®ion = getAllocRegion();
+ return region.empty() ? nullptr : region.getArgument(0);
+ }
+ mlir::BlockArgument getInitializerMoldArg() {
+ return getInitializerRegion().getArgument(0);
+ }
+ mlir::BlockArgument getInitializerAllocArg() {
+ return getAllocRegion().empty() ?
+ nullptr : getInitializerRegion().getArgument(1);
+ }
+ mlir::BlockArgument getReductionLhsArg() {
+ return getReductionRegion().getArgument(0);
+ }
+ mlir::BlockArgument getReductionRhsArg() {
+ return getReductionRegion().getArgument(1);
+ }
+ mlir::BlockArgument getAtomicReductionLhsArg() {
+ auto ®ion = getAtomicReductionRegion();
+ return region.empty() ? nullptr : region.getArgument(0);
+ }
+ mlir::BlockArgument getAtomicReductionRhsArg() {
+ auto ®ion = getAtomicReductionRegion();
+ return region.empty() ? nullptr : region.getArgument(1);
+ }
+ mlir::BlockArgument getCleanupAllocArg() {
+ auto ®ion = getCleanupRegion();
+ return region.empty() ? nullptr : region.getArgument(0);
+ }
+ }];
+
+ let hasRegionVerifier = 1;
+}
+
def fir_DoConcurrentOp : fir_Op<"do_concurrent",
[SingleBlock, AutomaticAllocationScope]> {
let summary = "do concurrent loop wrapper";
@@ -3700,6 +3797,25 @@ def fir_LocalSpecifier {
);
}
+def fir_ReduceSpecifier {
+ dag arguments = (ins
+ Variadic<AnyType>:$reduce_vars,
+ OptionalAttr<DenseBoolArrayAttr>:$reduce_byref,
+
+ // This introduces redundency in how reductions are modelled. In particular,
+ // a single reduction is represented by 2 attributes:
+ //
+ // 1. `$reduce_syms` which is a list of `DeclareReductionOp`s.
+ // 2. `$reduce_attrs` which is an array of `fir::ReduceAttr` values.
+ //
+ // The first makes it easier to map `do concurrent` to parallization models
+ // (e.g. OpenMP and OpenACC) while the second makes it easier to map it to
+ // nests of `fir.do_loop ... unodered` ops.
+ OptionalAttr<SymbolRefArrayAttr>:$reduce_syms,
+ OptionalAttr<ArrayAttr>:$reduce_attrs
+ );
+}
+
def fir_DoConcurrentLoopOp : fir_Op<"do_concurrent.loop",
[AttrSizedOperandSegments, DeclareOpInterfaceMethods<LoopLikeOpInterface,
["getLoopInductionVars"]>,
@@ -3709,7 +3825,7 @@ def fir_DoConcurrentLoopOp : fir_Op<"do_concurrent.loop",
let description = [{
An operation that models a Fortran `do concurrent` loop's header and block.
This is a single-region single-block terminator op that is expected to
- terminate the region of a `omp.do_concurrent` wrapper op.
+ terminate the region of a `fir.do_concurrent` wrapper op.
This op borrows from both `scf.parallel` and `fir.do_loop` ops. Similar to
`scf.parallel`, a loop nest takes 3 groups of SSA values as operands that
@@ -3747,8 +3863,6 @@ def fir_DoConcurrentLoopOp : fir_Op<"do_concurrent.loop",
- `lowerBound`: The group of SSA values for the nest's lower bounds.
- `upperBound`: The group of SSA values for the nest's upper bounds.
- `step`: The group of SSA values for the nest's steps.
- - `reduceOperands`: The reduction SSA values, if any.
- - `reduceAttrs`: Attributes to store reduction operations, if any.
- `loopAnnotation`: Loop metadata to be passed down the compiler pipeline to
LLVM.
}];
@@ -3757,12 +3871,12 @@ def fir_DoConcurrentLoopOp : fir_Op<"do_concurrent.loop",
Variadic<Index>:$lowerBound,
Variadic<Index>:$upperBound,
Variadic<Index>:$step,
- Variadic<AnyType>:$reduceOperands,
- OptionalAttr<ArrayAttr>:$reduceAttrs,
OptionalAttr<LoopAnnotationAttr>:$loopAnnotation
);
- let arguments = !con(opArgs, fir_LocalSpecifier.arguments);
+ let arguments = !con(opArgs,
+ fir_LocalSpecifier.arguments,
+ fir_ReduceSpecifier.arguments);
let regions = (region SizedRegion<1>:$region);
@@ -3783,12 +3897,18 @@ def fir_DoConcurrentLoopOp : fir_Op<"do_concurrent.loop",
getNumLocalOperands());
}
+ mlir::Block::BlockArgListType getRegionReduceArgs() {
+ return getBody()->getArguments().slice(getNumInductionVars()
+ + getNumLocalOperands(),
+ getNumReduceOperands());
+ }
+
/// Number of operands controlling the loop
unsigned getNumControlOperands() { return getLowerBound().size() * 3; }
// Get Number of reduction operands
unsigned getNumReduceOperands() {
- return getReduceOperands().size();
+ return getReduceVars().size();
}
mlir::Operation::operand_range getLocalOperands() {
diff --git a/flang/lib/Lower/Bridge.cpp b/flang/lib/Lower/Bridge.cpp
index 8506b9a984e58..7e8f09a293834 100644
--- a/flang/lib/Lower/Bridge.cpp
+++ b/flang/lib/Lower/Bridge.cpp
@@ -12,6 +12,7 @@
#include "flang/Lower/Bridge.h"
+#include "OpenMP/ReductionProcessor.h"
#include "flang/Lower/Allocatable.h"
#include "flang/Lower/CallInterface.h"
#include "flang/Lower/Coarray.h"
@@ -127,9 +128,8 @@ struct IncrementLoopInfo {
bool isConcurrent;
llvm::SmallVector<const Fortran::semantics::Symbol *> localSymList;
llvm::SmallVector<const Fortran::semantics::Symbol *> localInitSymList;
- llvm::SmallVector<
- std::pair<fir::ReduceOperationEnum, const Fortran::semantics::Symbol *>>
- reduceSymList;
+ llvm::SmallVector<const Fortran::semantics::Symbol *> reduceSymList;
+ llvm::SmallVector<fir::ReduceOperationEnum> reduceOperatorList;
llvm::SmallVector<const Fortran::semantics::Symbol *> sharedSymList;
mlir::Value loopVariable = nullptr;
@@ -1981,7 +1981,7 @@ class FirConverter : public Fortran::lower::AbstractConverter {
case Fortran::parser::ReductionOperator::Operator::Ior:
return fir::ReduceOperationEnum::IOR;
case Fortran::parser::ReductionOperator::Operator::Ieor:
- return fir::ReduceOperationEnum::EIOR;
+ return fir::ReduceOperationEnum::IEOR;
}
llvm_unreachable("illegal reduction operator");
}
@@ -2015,8 +2015,8 @@ class FirConverter : public Fortran::lower::AbstractConverter {
std::get<Fortran::parser::ReductionOperator>(reduceList->t));
for (const Fortran::parser::Name &x :
std::get<std::list<Fortran::parser::Name>>(reduceList->t)) {
- info.reduceSymList.push_back(
- std::make_pair(reduce_operation, x.symbol));
+ info.reduceSymList.push_back(x.symbol);
+ info.reduceOperatorList.push_back(reduce_operation);
}
}
}
@@ -2077,6 +2077,7 @@ class FirConverter : public Fortran::lower::AbstractConverter {
assign.u = Fortran::evaluate::Assignment::BoundsSpec{};
genAssignment(assign);
}
+
for (const Fortran::semantics::Symbol *sym : info.sharedSymList) {
const auto *hostDetails =
sym->detailsIf<Fortran::semantics::HostAssocDetails>();
@@ -2100,6 +2101,46 @@ class FirConverter : public Fortran::lower::AbstractConverter {
}
}
+ llvm::SmallVector<bool> reduceVarByRef;
+ llvm::SmallVector<mlir::Attribute> reductionDeclSymbols;
+ llvm::SmallVector<mlir::Attribute> nestReduceAttrs;
+
+ for (const auto &reduceOp : info.reduceOperatorList) {
+ nestReduceAttrs.push_back(
+ fir::ReduceAttr::get(builder->getContext(), reduceOp));
+ }
+
+ llvm::SmallVector<mlir::Value> reduceVars;
+ Fortran::lower::omp::ReductionProcessor rp;
+ rp.processReductionArguments<fir::DeclareReductionOp>(
+ toLocation(), *this, info.reduceOperatorList, reduceVars,
+ reduceVarByRef, reductionDeclSymbols, info.reduceSymList);
+
+ doConcurrentLoopOp.getReduceVarsMutable().assign(reduceVars);
+ doConcurrentLoopOp.setReduceSymsAttr(
+ reductionDeclSymbols.empty()
+ ? nullptr
+ : mlir::ArrayAttr::get(builder->getContext(),
+ reductionDeclSymbols));
+ doConcurrentLoopOp.setReduceAttrsAttr(
+ nestReduceAttrs.empty()
+ ? nullptr
+ : mlir::ArrayAttr::get(builder->getContext(), nestReduceAttrs));
+ doConcurrentLoopOp.setReduceByrefAttr(
+ reduceVarByRef.empty() ? nullptr
+ : mlir::DenseBoolArrayAttr::get(
+ builder->getContext(), reduceVarByRef));
+
+ for (auto [sym, reduceVar] :
+ llvm::zip_equal(info.reduceSymList, reduceVars)) {
+ auto arg = doConcurrentLoopOp.getRegion().begin()->addArgument(
+ reduceVar.getType(), doConcurrentLoopOp.getLoc());
+ bindSymbol(*sym, hlfir::translateToExtendedValue(
+ reduceVar.getLoc(), *builder, hlfir::Entity{arg},
+ /*contiguousHint=*/true)
+ .first);
+ }
+
// Note that allocatable, types with ultimate components, and type
// requiring finalization are forbidden in LOCAL/LOCAL_INIT (F2023 C1130),
// so no clean-up needs to be generated for these entities.
@@ -2191,6 +2232,12 @@ class FirConverter : public Fortran::lower::AbstractConverter {
}
}
+ // Introduce a `do concurrent` scope to bind symbols corresponding to local,
+ // local_init, and reduce region arguments.
+ if (!incrementLoopNestInfo.empty() &&
+ incrementLoopNestInfo.back().isConcurrent)
+ localSymbols.pushScope();
+
// Increment loop begin code. (Infinite/while code was already generated.)
if (!infiniteLoop && !whileCondition)
genFIRIncrementLoopBegin(incrementLoopNestInfo, doStmtEval.dirs);
@@ -2214,6 +2261,10 @@ class FirConverter : public Fortran::lower::AbstractConverter {
// This call may generate a branch in some contexts.
genFIR(endDoEval, unstructuredContext);
+
+ if (!incrementLoopNestInfo.empty() &&
+ incrementLoopNestInfo.back().isConcurrent)
+ localSymbols.popScope();
}
/// Generate FIR to evaluate loop control values (lower, upper and step).
@@ -2396,19 +2447,6 @@ class FirConverter : public Fortran::lower::AbstractConverter {
info.stepVariable = builder->createTemporary(loc, stepValue.getType());
builder->create<fir::StoreOp>(loc, stepValue, info.stepVariable);
}
-
- if (genDoConcurrent && nestReduceOperands.empty()) {
- // Create DO CONCURRENT reduce operands and attributes
- for (const auto &reduceSym : info.reduceSymList) {
- const fir::ReduceOperationEnum reduceOperation = reduceSym.first;
- const Fortran::semantics::Symbol *sym = reduceSym.second;
- fir::ExtendedValue exv = getSymbolExtendedValue(*sym, nullptr);
- nestReduceOperands.push_back(fir::getBase(exv));
- auto reduceAttr =
- fir::ReduceAttr::get(builder->getContext(), reduceOperation);
- nestReduceAttrs.push_back(reduceAttr);
- }
- }
}
for (auto [info, lowerValue, upperValue, stepValue] :
@@ -2506,11 +2544,10 @@ class FirConverter : public Fortran::lower::AbstractConverter {
builder->setInsertionPointToEnd(loopWrapperOp.getBody());
auto loopOp = builder->create<fir::DoConcurrentLoopOp>(
- loc, nestLBs, nestUBs, nestSts, nestReduceOperands,
- nestReduceAttrs.empty()
- ? nullptr
- : mlir::ArrayAttr::get(builder->getContext(), nestReduceAttrs),
- nullptr, /*local_vars=*/std::nullopt, /*local_syms=*/nullptr);
+ loc, nestLBs, nestUBs, nestSts, nullptr, /*local_vars=*/std::nullopt,
+ /*local_syms=*/nullptr, /*reduce_vars=*/std::nullopt,
+ /*reduce_byref=*/nullptr, /*reduce_syms=*/nullptr,
+ /*reduce_attrs=*/nullptr);
llvm::SmallVector<mlir::Type> loopBlockArgTypes(
incrementLoopNestInfo.size(), builder->getIndexType());
diff --git a/flang/lib/Lower/OpenMP/ClauseProcessor.cpp b/flang/lib/Lower/OpenMP/ClauseProcessor.cpp
index 7bea427099a28..5aebfc901e8ac 100644
--- a/flang/lib/Lower/OpenMP/ClauseProcessor.cpp
+++ b/flang/lib/Lower/OpenMP/ClauseProcessor.cpp
@@ -12,6 +12,7 @@
#include "ClauseProcessor.h"
#include "Clauses.h"
+#include "ReductionProcessor.h"
#include "Utils.h"
#include "flang/Lower/ConvertExprToHLFIR.h"
@@ -25,6 +26,21 @@ namespace Fortran {
namespace lower {
namespace omp {
+using ReductionModifier =
+ Fortran::lower::omp::clause::Reduction::ReductionModifier;
+
+mlir::omp::ReductionModifier translateReductionModifier(ReductionModifier mod) {
+ switch (mod) {
+ case ReductionModifier::Default:
+ return mlir::omp::ReductionModifier::defaultmod;
+ case ReductionModifier::Inscan:
+ return mlir::omp::ReductionModifier::inscan;
+ case ReductionModifier::Task:
+ return mlir::omp::ReductionModifier::task;
+ }
+ return mlir::omp::ReductionModifier::defaultmod;
+}
+
/// Check for unsupported map operand types.
static void checkMapType(mlir::Location location, mlir::Type type) {
if (auto refType = mlir::dyn_cast<fir::ReferenceType>(type))
@@ -1076,6 +1092,18 @@ bool ClauseProcessor::processIf(
});
return found;
}
+
+template <typename T>
+void collectReductionSyms(
+ const T &reduction,
+ llvm::SmallVectorImpl<const semantics::Symbol *> &reductionSyms) {
+ const auto &objectList{std::get<omp::ObjectList>(reduction.t)};
+ for (const Object &object : objectList) {
+ const semantics::Symbol *symbol = object.sym();
+ reductionSyms.push_back(symbol);
+ }
+}
+
bool ClauseProcessor::processInReduction(
mlir::Location currentLocation, mlir::omp::InReductionClauseOps &result,
llvm::SmallVectorImpl<const semantics::Symbol *> &outReductionSyms) const {
@@ -1085,10 +1113,14 @@ bool ClauseProcessor::processInReduction(
llvm::SmallVector<bool> inReduceVarByRef;
llvm::SmallVector<mlir::Attribute> inReductionDeclSymbols;
llvm::SmallVector<const semantics::Symbol *> inReductionSyms;
+ collectReductionSyms(clause, inReductionSyms);
+
ReductionProcessor rp;
- rp.processReductionArguments<omp::clause::InReduction>(
- currentLocation, converter, clause, inReductionVars,
- inReduceVarByRef, inReductionDeclSymbols, inReductionSyms);
+ rp.processReductionArguments<mlir::omp::DeclareReductionOp>(
+ currentLocation, converter,
+ std::get<typename omp::clause::ReductionOperatorList>(clause.t),
+ inReductionVars, inReduceVarByRef, inReductionDeclSymbols,
+ inReductionSyms);
// Copy local lists into the output.
llvm::copy(inReductionVars, std::back_inserter(result.inReductionVars));
@@ -1416,10 +1448,23 @@ bool ClauseProcessor::processReduction(
llvm::SmallVector<bool> reduceVarByRef;
llvm::SmallVector<mlir::Attribute> reductionDeclSymbols;
llvm::SmallVector<const semantics::Symbol *> reductionSyms;
+ collectReductionSyms(clause, reductionSyms);
+
+ auto mod = std::get<std::optional<ReductionModifier>>(clause.t);
+ if (mod.has_value()) {
+ if (mod.value() == ReductionModifier::Task)
+ TODO(currentLocation, "Reduction modifier `task` is not supported");
+ else
+ res...
[truncated]
``````````
</details>
https://github.com/llvm/llvm-project/pull/145837
More information about the flang-commits
mailing list