[flang-commits] [flang] [flang][do concurrent] Re-model `reduce` to match reductions are modelled in OpenMP and OpenACC (PR #145837)
Tom Eccles via flang-commits
flang-commits at lists.llvm.org
Thu Jun 26 07:58:35 PDT 2025
================
@@ -3662,6 +3662,103 @@ def fir_LocalitySpecifierOp : fir_Op<"local", [IsolatedFromAbove]> {
let hasRegionVerifier = 1;
}
+def fir_DeclareReductionOp : fir_Op<"declare_reduction", [IsolatedFromAbove,
+ Symbol]> {
+ let summary = "declares a reduction kind";
+ let description = [{
+ Note: this operation is adapted from omp::DeclareReductionOp. There is a lot
+ of duplication at the moment. TODO: Combine both ops into one. See:
+ https://discourse.llvm.org/t/dialect-for-data-locality-sharing-specifiers-clauses-in-openmp-openacc-and-do-concurrent/86108.
+
+ Declares a `do concurrent` reduction. This requires two mandatory and three
+ optional regions.
+
+ 1. The optional alloc region specifies how to allocate the thread-local
+ reduction value. This region should not contain control flow and all
+ IR should be suitable for inlining straight into an entry block. In
+ the common case this is expected to contain only allocas. It is
+ expected to `fir.yield` the allocated value on all control paths.
+ If allocation is conditional (e.g. only allocate if the mold is
+ allocated), this should be done in the initializer region and this
+ region not included. The alloc region is not used for by-value
+ reductions (where allocation is implicit).
+ 2. The initializer region specifies how to initialize the thread-local
+ reduction value. This is usually the neutral element of the reduction.
+ For convenience, the region has an argument that contains the value
+ of the reduction accumulator at the start of the reduction. If an alloc
+ region is specified, there is a second block argument containing the
+ address of the allocated memory. The initializer region is expected to
+ `fir.yield` the new value on all control flow paths.
+ 3. The reduction region specifies how to combine two values into one, i.e.
+ the reduction operator. It accepts the two values as arguments and is
+ expected to `fir.yield` the combined value on all control flow paths.
+ 4. The atomic reduction region is optional and specifies how two values
+ can be combined atomically given local accumulator variables. It is
+ expected to store the combined value in the first accumulator variable.
+ 5. The cleanup region is optional and specifies how to clean up any memory
+ allocated by the initializer region. The region has an argument that
+ contains the value of the thread-local reduction accumulator. This will
+ be executed after the reduction has completed.
----------------
tblah wrote:
For the OpenMP dialect I had to keep these descriptions very vague so that they were not Flang-dependent. In the FIR dialect we can explain this more clearly by referring to boxes, ALLOCATABLE, etc.
A rough example:
1. The alloc region allocas the box and does nothing else.
2. The init region allocates memory for the box to point to, sets the metadata fields, and initializes the memory with the neutral value for the reduction.
The whole design is complicated somewhat by the presence of by-value reductions. These were originally kept in case there was a performance penalty to by-ref for trivial types, but we haven't seen that penalty show up for privatization, so this adds needless complexity. But I understand this is already a big PR and you may not want to change more things at once.
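As an illustration only (this is not FIR and not part of the patch), the split between the regions described above can be modelled in a few lines of Python. Every name here (`ReductionDecl`, `run_reduction`, the dict standing in for a box) is hypothetical, and each chunk of work stands in for one thread. The atomic region is left out of the sketch.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional, Sequence


@dataclass
class ReductionDecl:
    """Hypothetical stand-in for a declared reduction and its regions."""
    init: Callable[[Any, Any], Any]          # (original, storage or None) -> private value
    combine: Callable[[Any, Any], Any]       # (lhs, rhs) -> combined value
    alloc: Optional[Callable[[], Any]] = None        # by-ref only: create empty storage
    cleanup: Optional[Callable[[Any], None]] = None  # by-ref only: release private storage


def run_reduction(decl: ReductionDecl, original: Any,
                  chunks: Sequence[Sequence[Any]],
                  body: Callable[[Any], Any]) -> Any:
    partials = []
    for chunk in chunks:  # one chunk models one thread
        storage = decl.alloc() if decl.alloc else None
        private = decl.init(original, storage)
        for item in chunk:
            private = decl.combine(private, body(item))
        partials.append(private)
    result = original
    for partial in partials:  # fold the private copies into the original
        result = decl.combine(result, partial)
        if decl.cleanup:
            decl.cleanup(partial)
    return result


# By-value reduction: no alloc or cleanup, init yields the neutral element.
by_value = ReductionDecl(init=lambda orig, _: 0, combine=lambda a, b: a + b)
total = run_reduction(by_value, 10, [[1, 2], [3, 4]], lambda x: x)


# By-ref reduction on a "box": alloc only creates the empty box; init
# allocates the payload (sized from the mold) and fills in the neutral value.
def init_box(mold, box):
    box["data"] = [0] * len(mold["data"])
    return box


def add_boxes(lhs, rhs):
    lhs["data"] = [a + b for a, b in zip(lhs["data"], rhs["data"])]
    return lhs


def free_box(box):
    box["data"] = None


by_ref = ReductionDecl(init=init_box, combine=add_boxes,
                       alloc=lambda: {"data": None}, cleanup=free_box)
boxed = run_reduction(by_ref, {"data": [1, 1]}, [[1, 2], [3]],
                      lambda x: {"data": [x, 2 * x]})
```

In this model the by-value case needs only the init and combine callbacks, while the by-ref case needs all four. That difference is the extra complexity the comment above refers to.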
https://github.com/llvm/llvm-project/pull/145837