[flang-commits] [flang] [mlir] [Flang][OpenMP] Support conditional lastprivate on host (PR #200086)
Sunil Shrestha via flang-commits
flang-commits at lists.llvm.org
Wed May 27 16:58:12 PDT 2026
https://github.com/sshrestha-aa created https://github.com/llvm/llvm-project/pull/200086
This patch lowers lastprivate(conditional:) on the host by leveraging the existing user-defined reduction (UDR) infrastructure. A packed struct is created where each thread tracks, for every lastprivate variable, both the candidate value and the canonical iteration index of its last update. The reduction combiner selects the value from the sequentially later iteration (for do loops) or from the lexically later section (for sections).
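As a rough illustration of the canonical iteration index mentioned above, the following C++ sketch mirrors the formula documented for the patch's computeFlattenedCanonicalIV helper. Each loop dimension is normalised to a 0-based ascending counter, and the counters are flattened in row-major order. The names LoopDim and flattenedCanonicalIndex are hypothetical and exist only for this example; the patch itself emits MLIR arith operations rather than calling a function like this.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical description of one loop dimension: the current induction
// variable plus the inclusive bounds and the step of the loop.
struct LoopDim {
  std::int64_t iv;
  std::int64_t lb;
  std::int64_t ub;
  std::int64_t step;
};

// Flatten all dimensions into one 0-based, ascending iteration number:
//   c0 * (N1*...*Nk) + c1 * (N2*...*Nk) + ... + ck
// with ci = (IVi - LBi) / stepi and Ni = (UBi - LBi) / stepi + 1.
inline std::int64_t flattenedCanonicalIndex(const std::vector<LoopDim> &dims) {
  assert(!dims.empty() && "need at least one loop dimension");
  std::int64_t flat = 0;
  for (const LoopDim &d : dims) {
    assert(d.step != 0 && "loop step must be non-zero");
    std::int64_t canon = (d.iv - d.lb) / d.step;     // 0-based counter
    std::int64_t trips = (d.ub - d.lb) / d.step + 1; // inclusive bounds
    flat = flat * trips + canon;                     // Horner-style flattening
  }
  return flat;
}
```

For `do i = 10, 1, -3`, the iteration with i = 4 maps to index 2, so a descending loop still produces an ascending index and a signed greater-than comparison identifies the sequentially later iteration.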
The implementation locates the enclosing omp.parallel and places the shared struct before it so that all threads in the team reduce into the same storage. For orphaned worksharing constructs — where no enclosing parallel is visible at compile time — a module-scope global of the struct type is used instead. This is correct for a single level of parallelism, but concurrent nested teams executing the same orphaned construct would race on the shared global. This limitation mirrors the current Clang behavior, which also uses a single global and does not support nested parallelism for conditional lastprivate.
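To illustrate why the team needs a single reduction target, here is a sequential C++ model of the scheme: one shared slot is initialised once before the (simulated) parallel region, each thread fills a private slot, and the private slots are merged with the "later iteration wins" rule. This is only a sketch under simplifying assumptions: it tracks a single integer variable, and CondLpSlot, combine, and reduceTeam are hypothetical names, not identifiers from the patch.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical slot for one conditional-lastprivate variable: the candidate
// value and the canonical iteration index of its last update.
// An index of -1 means "no iteration has written this slot yet".
struct CondLpSlot {
  int value = 0;
  std::int64_t index = -1;
};

// Combiner: keep the slot written by the sequentially later iteration.
// Any real index (>= 0) beats the -1 sentinel.
inline void combine(CondLpSlot &into, const CondLpSlot &from) {
  assert(from.index >= -1 && "index is either the sentinel or non-negative");
  if (from.index > into.index)
    into = from;
}

// Sequential stand-in for the team reduction: every thread's private slot is
// merged into the same shared slot, which is initialised before the region.
inline CondLpSlot reduceTeam(const std::vector<CondLpSlot> &perThread) {
  CondLpSlot shared; // one storage location for the whole team
  for (const CondLpSlot &slot : perThread)
    combine(shared, slot);
  return shared;
}
```

If each thread instead reduced into its own copy of `shared`, no copy would be guaranteed to hold the overall winner, which is the situation the module-scope global avoids for orphaned constructs.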
Assisted-by : Claude Opus 4.6
From 3dab401ba9b1cc1aec9ef4978befaa434c5e8718 Mon Sep 17 00:00:00 2001
From: Sunil Shrestha <sunil.shrestha at hpe.com>
Date: Fri, 20 Mar 2026 11:10:15 -0500
Subject: [PATCH] [Flang][OpenMP] Support conditional lastprivate on host
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
This patch lowers lastprivate(conditional:) on the host by leveraging the
existing user-defined reduction (UDR) infrastructure. A packed struct is created
where each thread tracks, for every lastprivate variable, both the candidate
value and the canonical iteration index of its last update. The reduction
combiner selects the value from the sequentially later iteration (for do loops)
or from the lexically later section (for sections).
The implementation locates the enclosing omp.parallel and places the shared
struct before it so that all threads in the team reduce into the same storage.
For orphaned worksharing constructs — where no enclosing parallel is visible at
compile time — a module-scope global of the struct type is used instead. This is
correct for a single level of parallelism, but concurrent nested teams executing
the same orphaned construct would race on the shared global. This limitation
mirrors the current Clang behavior, which also uses a single global and does not
support nested parallelism for conditional lastprivate.
Assisted-by : Claude Opus 4.6
---
.../lib/Lower/OpenMP/DataSharingProcessor.cpp | 15 +-
flang/lib/Lower/OpenMP/DataSharingProcessor.h | 6 +
flang/lib/Lower/OpenMP/OpenMP.cpp | 797 +++++++++++++++++-
.../lib/Lower/Support/ReductionProcessor.cpp | 7 +
.../OpenMP/Todo/lastprivate-conditional.f90 | 12 -
...astprivate-conditional-sections-nowait.f90 | 37 +
...tprivate-conditional-sections-orphaned.f90 | 76 ++
.../lastprivate-conditional-sections.f90 | 80 ++
...stprivate-conditional-wsloop-nested-if.f90 | 36 +
.../lastprivate-conditional-wsloop-nowait.f90 | 38 +
...astprivate-conditional-wsloop-orphaned.f90 | 73 ++
.../OpenMP/lastprivate-conditional-wsloop.f90 | 81 ++
.../OpenMP/OpenMPToLLVMIRTranslation.cpp | 3 +-
13 files changed, 1238 insertions(+), 23 deletions(-)
delete mode 100644 flang/test/Lower/OpenMP/Todo/lastprivate-conditional.f90
create mode 100644 flang/test/Lower/OpenMP/lastprivate-conditional-sections-nowait.f90
create mode 100644 flang/test/Lower/OpenMP/lastprivate-conditional-sections-orphaned.f90
create mode 100644 flang/test/Lower/OpenMP/lastprivate-conditional-sections.f90
create mode 100644 flang/test/Lower/OpenMP/lastprivate-conditional-wsloop-nested-if.f90
create mode 100644 flang/test/Lower/OpenMP/lastprivate-conditional-wsloop-nowait.f90
create mode 100644 flang/test/Lower/OpenMP/lastprivate-conditional-wsloop-orphaned.f90
create mode 100644 flang/test/Lower/OpenMP/lastprivate-conditional-wsloop.f90
diff --git a/flang/lib/Lower/OpenMP/DataSharingProcessor.cpp b/flang/lib/Lower/OpenMP/DataSharingProcessor.cpp
index e392497d30de7..da2b0582e22a3 100644
--- a/flang/lib/Lower/OpenMP/DataSharingProcessor.cpp
+++ b/flang/lib/Lower/OpenMP/DataSharingProcessor.cpp
@@ -278,10 +278,19 @@ void DataSharingProcessor::collectSymbolsForPrivatization() {
explicitlyPrivatizedSymbols);
} else if (const auto &lastPrivateClause =
std::get_if<omp::clause::Lastprivate>(&clause.u)) {
- lastprivateModifierNotSupported(*lastPrivateClause,
- converter.getCurrentLocation());
+ auto &modifier = std::get<
+ std::optional<omp::clause::Lastprivate::LastprivateModifier>>(
+ lastPrivateClause->t);
+
const ObjectList &objects = std::get<ObjectList>(lastPrivateClause->t);
- collectOmpObjectListSymbol(objects, explicitlyPrivatizedSymbols);
+ if (modifier &&
+ *modifier ==
+ omp::clause::Lastprivate::LastprivateModifier::Conditional) {
+ // conditional lastprivate path
+ collectOmpObjectListSymbol(objects, conditionalLastPrivatizedSymbols);
+ } else {
+ collectOmpObjectListSymbol(objects, explicitlyPrivatizedSymbols);
+ }
}
}
diff --git a/flang/lib/Lower/OpenMP/DataSharingProcessor.h b/flang/lib/Lower/OpenMP/DataSharingProcessor.h
index 5dd564d4bbb61..f889adce0f049 100644
--- a/flang/lib/Lower/OpenMP/DataSharingProcessor.h
+++ b/flang/lib/Lower/OpenMP/DataSharingProcessor.h
@@ -97,6 +97,7 @@ class DataSharingProcessor {
llvm::SetVector<const semantics::Symbol *> explicitlyPrivatizedSymbols;
llvm::SetVector<const semantics::Symbol *> defaultSymbols;
llvm::SetVector<const semantics::Symbol *> allPrivatizedSymbols;
+ llvm::SetVector<const semantics::Symbol *> conditionalLastPrivatizedSymbols;
lower::AbstractConverter &converter;
semantics::SemanticsContext &semaCtx;
@@ -193,6 +194,11 @@ class DataSharingProcessor {
void privatizeSymbol(const semantics::Symbol *symToPrivatize,
mlir::omp::PrivateClauseOps *clauseOps,
std::optional<llvm::omp::Directive> dir = std::nullopt);
+
+ const llvm::SetVector<const semantics::Symbol *> &
+ getConditionalLastprivateSymbols() const {
+ return conditionalLastPrivatizedSymbols;
+ }
};
} // namespace omp
diff --git a/flang/lib/Lower/OpenMP/OpenMP.cpp b/flang/lib/Lower/OpenMP/OpenMP.cpp
index 266b06f353675..4f7abaf9d7137 100644
--- a/flang/lib/Lower/OpenMP/OpenMP.cpp
+++ b/flang/lib/Lower/OpenMP/OpenMP.cpp
@@ -52,12 +52,45 @@
#include "mlir/Dialect/OpenMP/OpenMPDialect.h"
#include "mlir/Support/StateStack.h"
#include "mlir/Transforms/RegionUtils.h"
+#include "llvm/ADT/MapVector.h"
#include "llvm/ADT/STLExtras.h"
+#include <atomic>
using namespace Fortran::lower::omp;
using namespace Fortran::common::openmp;
using namespace Fortran::utils::openmp;
+// Forward declarations
+static fir::RecordType buildConditionalLpType(
+ Fortran::lower::AbstractConverter &converter,
+ const llvm::SetVector<const Fortran::semantics::Symbol *> &condLpSyms,
+ mlir::Location loc);
+
+static mlir::omp::DeclareReductionOp buildConditionalLastPrivateReduction(
+ Fortran::lower::AbstractConverter &converter, fir::RecordType lpCondType,
+ const llvm::SetVector<const Fortran::semantics::Symbol *> &condLpSyms);
+
+static void rewriteConditionalLpAssignsInWsLoops(
+ Fortran::lower::AbstractConverter &converter, mlir::omp::WsloopOp wsloopOp,
+ fir::RecordType lpType,
+ const llvm::MapVector<mlir::Value, std::string> &condLpOrigAddrs,
+ mlir::Location loc);
+
+static void rewriteConditionalLpAssignsInSections(
+ Fortran::lower::AbstractConverter &converter,
+ mlir::omp::SectionsOp sectionsOp, fir::RecordType lpType,
+ const llvm::MapVector<mlir::Value, std::string> &condLpOrigAddrs,
+ mlir::Location loc);
+
+static void initConditionalLpStruct(fir::FirOpBuilder &builder,
+ mlir::Location loc,
+ fir::RecordType lpCondType,
+ mlir::Value structRef);
+
+static mlir::Value
+getOrCreateConditionalLpGlobal(Fortran::lower::AbstractConverter &converter,
+ mlir::Location loc, fir::RecordType lpType);
+
//===----------------------------------------------------------------------===//
// Code generation helper functions
//===----------------------------------------------------------------------===//
@@ -415,6 +448,16 @@ static void bindEntryBlockArgs(lower::AbstractConverter &converter,
llvm::SmallVector<const semantics::Symbol *> processedSyms;
for (const Object &object : objects) {
const semantics::Symbol *sym = object.sym();
+ if (!sym) {
+ // Null sentinel: this entry corresponds to a compiler-synthesized
+ // reduction (e.g. the conditional lastprivate struct) that has no
+ // Fortran symbol. We must keep a placeholder so that processedSyms
+ // stays in lock-step with `vars` and `args` — the later
+ // llvm::zip_equal(processedSyms, vars, args) asserts equal lengths.
+ // The matching block argument is silently skipped below.
+ processedSyms.push_back(nullptr);
+ continue;
+ }
if (const auto *commonDet =
sym->detailsIf<semantics::CommonBlockDetails>()) {
llvm::transform(commonDet->objects(), std::back_inserter(processedSyms),
@@ -424,7 +467,9 @@ static void bindEntryBlockArgs(lower::AbstractConverter &converter,
}
}
- for (auto [sym, var, arg] : llvm::zip_equal(processedSyms, vars, args))
+ for (auto [sym, var, arg] : llvm::zip_equal(processedSyms, vars, args)) {
+ if (!sym)
+ continue; // Skip synthetic reduction entries.
converter.bindSymbol(
*sym,
hlfir::translateToExtendedValue(
@@ -432,6 +477,7 @@ static void bindEntryBlockArgs(lower::AbstractConverter &converter,
/*contiguousHint=*/
evaluate::IsSimplyContiguous(*sym, converter.getFoldingContext()))
.first);
+ }
};
// Process in clause name alphabetical order to match block arguments order.
@@ -2645,6 +2691,26 @@ genScanOp(lower::AbstractConverter &converter, lower::SymMap &symTable,
converter.getCurrentLocation(), clauseOps);
}
+// Forward declaration.
+static void
+emitNestedParallelGuardForCondLp(lower::AbstractConverter &converter,
+ mlir::Location loc);
+
+/// Walk up the parent-op chain from the current insertion point and return
+/// the nearest enclosing \c omp::ParallelOp, or \c nullptr if none exists
+/// (i.e. the construct is orphaned). The walk handles intervening ops such
+/// as \c fir::IfOp that may appear between the worksharing construct and its
+/// enclosing parallel region.
+static mlir::omp::ParallelOp
+findEnclosingParallelOp(fir::FirOpBuilder &builder) {
+ for (auto *op = builder.getInsertionBlock()->getParentOp(); op;
+ op = op->getParentOp()) {
+ if (auto parallelOp = mlir::dyn_cast<mlir::omp::ParallelOp>(op))
+ return parallelOp;
+ }
+ return {};
+}
+
static mlir::omp::SectionsOp
genSectionsOp(lower::AbstractConverter &converter, lower::SymMap &symTable,
semantics::SemanticsContext &semaCtx,
@@ -2671,13 +2737,58 @@ genSectionsOp(lower::AbstractConverter &converter, lower::SymMap &symTable,
/*useDelayedPrivatization=*/false, symTable);
dsp.processStep1();
+ // Detect conditional lastprivate symbols for sections.
+ auto &condLpSyms = dsp.getConditionalLastprivateSymbols();
+ fir::RecordType lpType;
+ mlir::Value lpAlloca;
+ if (!condLpSyms.empty()) {
+ lpType = buildConditionalLpType(converter, condLpSyms, loc);
+ mlir::omp::DeclareReductionOp declRedOp =
+ buildConditionalLastPrivateReduction(converter, lpType, condLpSyms);
+
+ // Create the struct alloca outside the parent parallel (if any).
+ // In the orphaned case (no enclosing ParallelOp), use a
+ // module-scope global so that all threads share one reduction target.
+ auto enclosingParallel = findEnclosingParallelOp(builder);
+ bool isOrphaned = !enclosingParallel;
+
+ // Guard against nested parallelism in the orphaned case.
+ // Emit this BEFORE touching the global to avoid racing on it.
+ if (isOrphaned)
+ emitNestedParallelGuardForCondLp(converter, loc);
+
+ if (!isOrphaned) {
+ mlir::OpBuilder::InsertionGuard guard(builder);
+ builder.setInsertionPoint(enclosingParallel);
+ lpAlloca = builder.createTemporary(loc, lpType);
+ initConditionalLpStruct(builder, loc, lpType, lpAlloca);
+ } else {
+ lpAlloca = getOrCreateConditionalLpGlobal(converter, loc, lpType);
+ // The global is shared across all threads. Use omp.single (which
+ // has an implicit barrier at exit) so that exactly one thread
+ // initialises and all threads wait before entering the construct.
+ mlir::omp::SingleOperands initSingleOps;
+ auto singleOp = mlir::omp::SingleOp::create(builder, loc, initSingleOps);
+ mlir::Block *singleBlock = builder.createBlock(&singleOp.getRegion());
+ builder.setInsertionPointToStart(singleBlock);
+ initConditionalLpStruct(builder, loc, lpType, lpAlloca);
+ mlir::omp::TerminatorOp::create(builder, loc);
+ builder.setInsertionPointAfter(singleOp);
+ }
+
+ clauseOps.reductionVars.push_back(lpAlloca);
+ clauseOps.reductionByref.push_back(true);
+ clauseOps.reductionSyms.push_back(
+ mlir::SymbolRefAttr::get(builder.getContext(), declRedOp.getSymName()));
+ reductionObjects.push_back(Object{{nullptr, std::nullopt}});
+ }
+
List<Clause> nonDsaClauses;
List<const clause::Lastprivate *> lastprivates;
for (const Clause &clause : item->clauses) {
if (clause.id == llvm::omp::Clause::OMPC_lastprivate) {
auto &lastp = std::get<clause::Lastprivate>(clause.u);
- lastprivateModifierNotSupported(lastp, converter.getCurrentLocation());
lastprivates.push_back(&lastp);
} else {
switch (clause.id) {
@@ -2732,6 +2843,26 @@ genSectionsOp(lower::AbstractConverter &converter, lower::SymMap &symTable,
sectionQueue, sectionQueue.begin());
}
+ // Capture original addresses and rewrite conditional LP assigns in sections.
+ llvm::MapVector<mlir::Value, std::string> condLpOrigAddrs;
+ if (!condLpSyms.empty()) {
+ for (const auto *sym : condLpSyms) {
+ mlir::Value addr = converter.getSymbolAddress(*sym);
+ if (addr)
+ condLpOrigAddrs[addr] = sym->name().ToString();
+ }
+ rewriteConditionalLpAssignsInSections(converter, sectionsOp, lpType,
+ condLpOrigAddrs, loc);
+ }
+
+ // Collect conditional LP symbols into a set so we can skip them in the
+ // normal lastprivate copy-back (they are handled by the reduction path).
+ llvm::SmallDenseSet<const semantics::Symbol *> condLpSymSet(
+ condLpSyms.begin(), condLpSyms.end());
+
+ // Track whether any non-conditional lastprivate copy-backs were emitted.
+ bool hasNonCondLastprivate = false;
+
if (!lastprivates.empty()) {
mlir::Region &sectionsBody = sectionsOp.getRegion();
assert(sectionsBody.hasOneBlock());
@@ -2750,6 +2881,10 @@ genSectionsOp(lower::AbstractConverter &converter, lower::SymMap &symTable,
const auto &objList = std::get<ObjectList>(lastp->t);
for (const Object &object : objList) {
semantics::Symbol *sym = object.sym();
+ // Skip conditional LP symbols — handled by the reduction path.
+ if (condLpSymSet.count(sym))
+ continue;
+ hasNonCondLastprivate = true;
if (const auto *common =
sym->detailsIf<semantics::CommonBlockDetails>()) {
for (const auto &obj : common->objects())
@@ -2764,12 +2899,45 @@ genSectionsOp(lower::AbstractConverter &converter, lower::SymMap &symTable,
// Perform DataSharingProcessor's step2 out of SECTIONS
builder.setInsertionPointAfter(sectionsOp.getOperation());
dsp.processStep2(sectionsOp, false);
- // Emit implicit barrier to synchronize threads and avoid data
- // races on post-update of lastprivate variables when `nowait`
- // clause is present.
- if (clauseOps.nowait && !lastprivates.empty())
+ // Emit barrier when nowait is present and there are lastprivate copy-backs
+ // (either non-conditional or conditional). The barrier ensures all threads
+ // have completed their work before lastprivate values are read/copied.
+ //
+ // NOTE: The LLVM OpenMP runtime currently imposes an implicit barrier
+ // inside __kmpc_reduce for tree reductions. If the runtime were modified
+ // to release losing threads early when nowait is specified, we could use
+ // the return value from the tree reduction (case 1 = winner) to let the
+ // winner thread perform the copy-back without a separate barrier.
+ if (clauseOps.nowait && (hasNonCondLastprivate || !condLpSyms.empty()))
mlir::omp::BarrierOp::create(builder, loc);
+ // Copy-back: copy winning values from the shared reduction struct to the
+ // original variables. When nowait is absent, the worksharing construct's
+ // implicit end-barrier guarantees all reductions are combined before we
+ // reach this point. When nowait is present, the barrier above ensures
+ // the reduction is fully finalized before reading the struct.
+ // Wrapped in omp.single so exactly one thread performs the stores.
+ if (!condLpSyms.empty()) {
+ mlir::omp::SingleOperands singleClauseOps;
+ auto singleOp = mlir::omp::SingleOp::create(builder, loc, singleClauseOps);
+ mlir::Block *singleBlock = builder.createBlock(&singleOp.getRegion());
+ builder.setInsertionPointToStart(singleBlock);
+
+ for (auto &[origAddr, symName] : condLpOrigAddrs) {
+ unsigned valFieldIdx = lpType.getFieldIndex(symName);
+ mlir::Type valType = lpType.getType(valFieldIdx);
+
+ fir::IntOrValue valFIdx =
+ mlir::IntegerAttr::get(builder.getI32Type(), valFieldIdx);
+ mlir::Value fieldAddr = fir::CoordinateOp::create(
+ builder, loc, builder.getRefType(valType), lpAlloca,
+ llvm::SmallVector<fir::IntOrValue, 1>{valFIdx});
+ mlir::Value val = fir::LoadOp::create(builder, loc, fieldAddr);
+ fir::StoreOp::create(builder, loc, val, origAddr);
+ }
+ mlir::omp::TerminatorOp::create(builder, loc);
+ }
+
return sectionsOp;
}
@@ -3361,6 +3529,154 @@ static mlir::omp::DistributeOp genStandaloneDistribute(
return distributeOp;
}
+/// Zero-initialize the value fields and set index fields to -1 in a
+/// conditional-lastprivate reduction struct.
+///
+/// The struct groups all value fields first, then all index fields:
+/// {val_0, val_1, ..., idx_0, idx_1, ...}
+/// so fields [0, numVars) are value fields and [numVars, 2*numVars) are
+/// the corresponding iteration index fields.
+///
+/// The -1 sentinel on index fields ensures the combiner's "sequentially
+/// last" comparison treats the slot as "no iteration has written yet"
+/// (any real canonical loop IV >= 0 beats -1).
+static void initConditionalLpStruct(fir::FirOpBuilder &builder,
+ mlir::Location loc,
+ fir::RecordType lpCondType,
+ mlir::Value structRef) {
+ auto fields = lpCondType.getTypeList();
+ unsigned numVars = fields.size() / 2;
+ for (unsigned i = 0, e = fields.size(); i < e; ++i) {
+ mlir::Type fieldTy = fields[i].second;
+ fir::IntOrValue idx = mlir::IntegerAttr::get(builder.getI32Type(), i);
+ mlir::Value fieldAddr = fir::CoordinateOp::create(
+ builder, loc, builder.getRefType(fieldTy), structRef,
+ llvm::SmallVector<fir::IntOrValue, 1>{idx});
+ mlir::Value initVal;
+ if (i >= numVars) // index field (second half)
+ initVal = builder.createIntegerConstant(loc, fieldTy, -1);
+ else // value field (first half)
+ initVal = fir::factory::createZeroValue(builder, loc, fieldTy);
+ fir::StoreOp::create(builder, loc, initVal, fieldAddr);
+ }
+}
+
+/// Emit a runtime guard for orphaned conditional-lastprivate worksharing
+/// constructs. The module-scope global used for the reduction struct is
+/// shared across all teams, so concurrent nested teams would race on it.
+/// Clang has a similar limitation for conditional lastprivate due to its
+/// use of a shared global variable.
+///
+/// Emits: if (omp_get_level() > 1) ERROR STOP "<message>"
+static void
+emitNestedParallelGuardForCondLp(lower::AbstractConverter &converter,
+ mlir::Location loc) {
+ fir::FirOpBuilder &builder = converter.getFirOpBuilder();
+ mlir::MLIRContext *ctx = builder.getContext();
+ mlir::Type i32Ty = builder.getI32Type();
+
+ // Declare omp_get_level_() -> i32 if not already present.
+ auto funcTy = mlir::FunctionType::get(ctx, {}, {i32Ty});
+ if (!builder.getNamedFunction("omp_get_level_"))
+ builder.createFunction(loc, "omp_get_level_", funcTy);
+
+ mlir::Value level =
+ fir::CallOp::create(builder, loc,
+ builder.getNamedFunction("omp_get_level_"),
+ mlir::ValueRange{})
+ .getResult(0);
+ mlir::Value one = builder.createIntegerConstant(loc, i32Ty, 1);
+ mlir::Value isNested = mlir::arith::CmpIOp::create(
+ builder, loc, mlir::arith::CmpIPredicate::sgt, level, one);
+
+ auto ifOp = fir::IfOp::create(builder, loc, /*resultTypes=*/{}, isNested,
+ /*withElse=*/false);
+ builder.setInsertionPoint(ifOp.getThenRegion().front().getTerminator());
+
+ // Build a global string constant for the error message.
+ llvm::StringRef msg =
+ "orphaned worksharing construct with lastprivate(conditional:) "
+ "is not supported in nested parallelism";
+ std::string globalName = "_lp_cond_nested_msg";
+ size_t msgLen = msg.size();
+ auto charTy = fir::CharacterType::get(ctx, 1, msgLen);
+ if (!builder.getNamedGlobal(globalName)) {
+ fir::GlobalOp global = builder.createGlobal(
+ loc, charTy, globalName, builder.createInternalLinkage(),
+ /*value=*/mlir::Attribute{}, /*isConst=*/true);
+ mlir::Region &region = global.getRegion();
+ mlir::Block *block = builder.createBlock(&region);
+ builder.setInsertionPointToStart(block);
+ mlir::Value val = fir::StringLitOp::create(builder, loc, charTy, msg);
+ fir::HasValueOp::create(builder, loc, val);
+ builder.setInsertionPoint(ifOp.getThenRegion().front().getTerminator());
+ }
+
+ // Declare _FortranAStopStatementText if not already present.
+ mlir::Type i64Ty = builder.getI64Type();
+ mlir::Type i1Ty = builder.getI1Type();
+ mlir::Type ptrTy = builder.getRefType(builder.getIntegerType(8));
+ auto stopTy = mlir::FunctionType::get(ctx, {ptrTy, i64Ty, i1Ty, i1Ty}, {});
+ if (!builder.getNamedFunction("_FortranAStopStatementText"))
+ builder.createFunction(loc, "_FortranAStopStatementText", stopTy);
+
+ mlir::Value msgAddr =
+ fir::AddrOfOp::create(builder, loc, builder.getRefType(charTy),
+ builder.getSymbolRefAttr(globalName));
+ mlir::Value msgPtr = builder.createConvert(loc, ptrTy, msgAddr);
+ mlir::Value len = builder.createIntegerConstant(loc, i64Ty, msgLen);
+ mlir::Value trueVal = builder.createIntegerConstant(loc, i1Ty, 1);
+ mlir::Value falseVal = builder.createIntegerConstant(loc, i1Ty, 0);
+ fir::CallOp::create(builder, loc,
+ builder.getNamedFunction("_FortranAStopStatementText"),
+ mlir::ValueRange{msgPtr, len, trueVal, falseVal});
+
+ builder.setInsertionPointAfter(ifOp);
+}
+
+/// Return the address of a module-scope global for the conditional-lastprivate
+/// reduction struct. This is used in the *orphaned* worksharing case (sections
+/// or wsloop inside a subroutine called from a parallel region) where the
+/// parent op is a FuncOp, not a ParallelOp.
+///
+/// Because there is no enclosing omp.parallel in the same function, a stack
+/// alloca would give every thread its own private copy and the cross-thread
+/// reduction combine would never merge results. A global provides a single
+/// shared address that all threads in the team can reduce into — the same
+/// semantics a dummy argument provides for ordinary user REDUCTION variables.
+///
+/// Nested parallelism (concurrent teams executing the same orphaned construct)
+/// would race on this global; a runtime guard emitted by
+/// emitNestedParallelGuardForCondLp() aborts in that case.
+static mlir::Value
+getOrCreateConditionalLpGlobal(lower::AbstractConverter &converter,
+ mlir::Location loc, fir::RecordType lpType) {
+ fir::FirOpBuilder &builder = converter.getFirOpBuilder();
+
+ // Derive a unique global name from the RecordType name.
+ // Type name is "_lp_cond_t.lN.M", global becomes "_lp_cond_global.lN.M".
+ llvm::StringRef typeName = lpType.getName();
+ assert(typeName.starts_with("_lp_cond_t") &&
+ "unexpected conditional LP type name prefix");
+ std::string globalName =
+ "_lp_cond_global" +
+ typeName.substr(llvm::StringRef("_lp_cond_t").size()).str();
+
+ // Create the global if it does not already exist.
+ // The global is re-initialized by initConditionalLpStruct before each
+ // worksharing construct invocation (to reset values from prior calls),
+ // so a simple zero-init suffices here.
+ fir::GlobalOp global = builder.getNamedGlobal(globalName);
+ if (!global) {
+ builder.createGlobal(loc, lpType, globalName,
+ builder.createInternalLinkage());
+ global = builder.getNamedGlobal(globalName);
+ }
+ assert(global && "global should have been created");
+ return fir::AddrOfOp::create(builder, loc, global.resultType(),
+ global.getSymbol());
+}
+
static mlir::omp::WsloopOp genStandaloneDo(
lower::AbstractConverter &converter, lower::SymMap &symTable,
lower::StatementContext &stmtCtx, semantics::SemanticsContext &semaCtx,
@@ -3376,6 +3692,61 @@ static mlir::omp::WsloopOp genStandaloneDo(
enableDelayedPrivatization, symTable);
dsp.processStep1(&wsloopClauseOps);
+ // Conditional lastprivate: build struct type, declare_reduction, and
+ // inject a synthetic reduction variable into the wsloop.
+ auto &condLpSyms = dsp.getConditionalLastprivateSymbols();
+ fir::RecordType lpType; // hoisted for post-loop rewrite pass
+ mlir::Value lpAlloca; // hoisted for post-reduction copy-back
+ if (!condLpSyms.empty()) {
+ fir::FirOpBuilder &builder = converter.getFirOpBuilder();
+ lpType = buildConditionalLpType(converter, condLpSyms, loc);
+ mlir::omp::DeclareReductionOp declRedOp =
+ buildConditionalLastPrivateReduction(converter, lpType, condLpSyms);
+
+ // Create the struct alloca OUTSIDE the parent omp.parallel (if any),
+ // so the reduction result persists after the parallel region ends.
+ // In the orphaned case (no enclosing ParallelOp), use a
+ // module-scope global so that all threads share one reduction target.
+ auto enclosingParallel = findEnclosingParallelOp(builder);
+ bool isOrphaned = !enclosingParallel;
+
+ // Guard against nested parallelism in the orphaned case.
+ // Emit this BEFORE touching the global to avoid racing on it.
+ if (isOrphaned)
+ emitNestedParallelGuardForCondLp(converter, loc);
+
+ if (!isOrphaned) {
+ mlir::OpBuilder::InsertionGuard guard(builder);
+ builder.setInsertionPoint(enclosingParallel);
+ lpAlloca = builder.createTemporary(loc, lpType);
+ // Index fields are initialised to -1 so the combiner's "sequentially
+ // last" comparison treats them as "no iteration has written yet"
+ // (any real canonical loop IV >= 0 beats -1).
+ initConditionalLpStruct(builder, loc, lpType, lpAlloca);
+ } else {
+ lpAlloca = getOrCreateConditionalLpGlobal(converter, loc, lpType);
+ // The global is shared across all threads. Use omp.single (which
+ // has an implicit barrier at exit) so that exactly one thread
+ // initialises and all threads wait before entering the construct.
+ mlir::omp::SingleOperands initSingleOps;
+ auto singleOp = mlir::omp::SingleOp::create(builder, loc, initSingleOps);
+ mlir::Block *singleBlock = builder.createBlock(&singleOp.getRegion());
+ builder.setInsertionPointToStart(singleBlock);
+ initConditionalLpStruct(builder, loc, lpType, lpAlloca);
+ mlir::omp::TerminatorOp::create(builder, loc);
+ builder.setInsertionPointAfter(singleOp);
+ }
+
+ // Append to wsloop clause operands.
+ wsloopClauseOps.reductionVars.push_back(lpAlloca);
+ wsloopClauseOps.reductionByref.push_back(true);
+ wsloopClauseOps.reductionSyms.push_back(
+ mlir::SymbolRefAttr::get(builder.getContext(), declRedOp.getSymName()));
+
+ // Use a null-symbol Object as a sentinel — bindPrivateLike will skip it.
+ wsloopReductionObjects.push_back(Object{{nullptr, std::nullopt}});
+ }
+
mlir::omp::LoopNestOperands loopNestClauseOps;
llvm::SmallVector<const semantics::Symbol *> iv;
genLoopNestClauses(converter, semaCtx, eval, item->clauses, loc,
@@ -3389,9 +3760,63 @@ static mlir::omp::WsloopOp genStandaloneDo(
auto wsloopOp = genWrapperOp<mlir::omp::WsloopOp>(
converter, loc, wsloopClauseOps, wsloopArgs);
+ // Capture original addresses of conditional LP symbols
+ // before the loop body is lowered. These SSA values are the addresses
+ // visible at this point (dummy arguments or host variables); they must
+ // not be remapped by privatization since conditional lastprivate
+ // variables are shared (they appear in a reduction clause, not private).
+ llvm::MapVector<mlir::Value, std::string> condLpOrigAddrs;
+ for (const auto *sym : condLpSyms) {
+ mlir::Value addr = converter.getSymbolAddress(*sym);
+ if (addr)
+ condLpOrigAddrs[addr] = sym->name().ToString();
+ }
+
genLoopNestOp(converter, symTable, semaCtx, eval, loc, queue, item,
loopNestClauseOps, iv, {{wsloopOp, wsloopArgs}},
llvm::omp::Directive::OMPD_do, dsp);
+
+ // Rewrite assignments to conditional LP symbols
+ // to store into struct fields + record iteration index.
+ if (!condLpSyms.empty()) {
+ rewriteConditionalLpAssignsInWsLoops(converter, wsloopOp, lpType,
+ condLpOrigAddrs, loc);
+ }
+
+ // Post-reduction copy-back. When nowait is absent, the wsloop's implicit
+ // end-barrier guarantees all reductions are combined. When nowait is
+ // present, an explicit barrier is needed before reading the struct.
+ // Wrapped in omp.single so exactly one thread performs the stores.
+ if (!condLpSyms.empty()) {
+ fir::FirOpBuilder &builder = converter.getFirOpBuilder();
+ mlir::OpBuilder::InsertionGuard guard(builder);
+
+ // Insert right after the wsloop, still inside the parallel body.
+ builder.setInsertionPointAfter(wsloopOp);
+
+ if (wsloopClauseOps.nowait)
+ mlir::omp::BarrierOp::create(builder, loc);
+
+ mlir::omp::SingleOperands singleClauseOps;
+ auto singleOp = mlir::omp::SingleOp::create(builder, loc, singleClauseOps);
+ mlir::Block *singleBlock = builder.createBlock(&singleOp.getRegion());
+ builder.setInsertionPointToStart(singleBlock);
+
+ for (auto &[origAddr, symName] : condLpOrigAddrs) {
+ unsigned valFieldIdx = lpType.getFieldIndex(symName);
+ mlir::Type valType = lpType.getType(valFieldIdx);
+
+ fir::IntOrValue valFIdx =
+ mlir::IntegerAttr::get(builder.getI32Type(), valFieldIdx);
+ mlir::Value fieldAddr = fir::CoordinateOp::create(
+ builder, loc, builder.getRefType(valType), lpAlloca,
+ llvm::SmallVector<fir::IntOrValue, 1>{valFIdx});
+ mlir::Value val = fir::LoadOp::create(builder, loc, fieldAddr);
+ fir::StoreOp::create(builder, loc, val, origAddr);
+ }
+ mlir::omp::TerminatorOp::create(builder, loc);
+ }
+
return wsloopOp;
}
@@ -4297,6 +4722,366 @@ getReductionType(lower::AbstractConverter &converter,
return reductionType;
}
+/// Compute a flattened canonical (0-based, always ascending) iteration number
+/// from all loop IVs. For a single loop, this is simply (IV - LB) / step.
+/// For collapsed loops with dimensions d0..dk, the flattened index is:
+/// c0 * (N1*N2*...*Nk) + c1 * (N2*...*Nk) + ... + ck
+/// where ci = (IVi - LBi) / stepi and Ni = (UBi - LBi) / stepi + 1.
+/// This yields a unique monotonic index regardless of loop direction,
+/// which is essential for the combiner's `sgt` comparison to correctly
+/// identify the sequentially last iteration.
+static mlir::Value
+computeFlattenedCanonicalIV(fir::FirOpBuilder &builder, mlir::Location loc,
+ mlir::omp::LoopNestOp loopNestOp) {
+ mlir::Region &region = loopNestOp.getRegion();
+ auto lbs = loopNestOp.getLoopLowerBounds();
+ auto ubs = loopNestOp.getLoopUpperBounds();
+ auto steps = loopNestOp.getLoopSteps();
+ unsigned numDims = lbs.size();
+
+ // Use i64 for the flattened index to avoid overflow.
+ mlir::Type i64Ty = builder.getI64Type();
+
+ // Compute canonical IV and trip count for each dimension.
+ llvm::SmallVector<mlir::Value> canonIVs(numDims);
+ llvm::SmallVector<mlir::Value> tripCounts(numDims);
+ for (unsigned d = 0; d < numDims; ++d) {
+ mlir::Value iv = region.front().getArgument(d);
+ mlir::Type ivType = iv.getType();
+ mlir::Value lb = lbs[d];
+ mlir::Value ub = ubs[d];
+ mlir::Value step = steps[d];
+ if (lb.getType() != ivType)
+ lb = fir::ConvertOp::create(builder, loc, ivType, lb);
+ if (ub.getType() != ivType)
+ ub = fir::ConvertOp::create(builder, loc, ivType, ub);
+ if (step.getType() != ivType)
+ step = fir::ConvertOp::create(builder, loc, ivType, step);
+
+ mlir::Value diff = mlir::arith::SubIOp::create(builder, loc, iv, lb);
+ mlir::Value ci = mlir::arith::DivSIOp::create(builder, loc, diff, step);
+ canonIVs[d] = fir::ConvertOp::create(builder, loc, i64Ty, ci);
+
+ // Trip count: (UB - LB) / step + 1 (loop bounds are inclusive).
+ mlir::Value range = mlir::arith::SubIOp::create(builder, loc, ub, lb);
+ mlir::Value trips = mlir::arith::DivSIOp::create(builder, loc, range, step);
+ mlir::Value one = builder.createIntegerConstant(loc, ivType, 1);
+ trips = mlir::arith::AddIOp::create(builder, loc, trips, one);
+ tripCounts[d] = fir::ConvertOp::create(builder, loc, i64Ty, trips);
+ }
+
+ // Flatten: result = c0*N1*N2*...*Nk + c1*N2*...*Nk + ... + ck
+ mlir::Value flatIdx = canonIVs[0];
+ for (unsigned d = 1; d < numDims; ++d) {
+ flatIdx = mlir::arith::MulIOp::create(builder, loc, flatIdx, tripCounts[d]);
+ flatIdx = mlir::arith::AddIOp::create(builder, loc, flatIdx, canonIVs[d]);
+ }
+ return flatIdx;
+}
+
+/// Common helper: within a given region, replace uses of original variable
+/// addresses with struct value fields and inject index stores after each
+/// assignment.
+/// \p genIndexVal is called at the point of each assignment to produce the
+/// i64 index value to store (e.g. canonical loop IV or constant section index).
+static void rewriteCondLpInRegion(
+ fir::FirOpBuilder &builder, mlir::Location loc, fir::RecordType lpType,
+ mlir::Value structArg, mlir::Region &region,
+ const llvm::MapVector<mlir::Value, std::string> &condLpOrigAddrs,
+ llvm::function_ref<mlir::Value(fir::FirOpBuilder &, mlir::Location)>
+ genIndexVal) {
+ // Step 1: Replace uses of original addresses with struct value field addrs.
+ llvm::MapVector<mlir::Value, std::string> valAddrToSymName;
+ {
+ mlir::OpBuilder::InsertionGuard guard(builder);
+ builder.setInsertionPointToStart(&region.front());
+
+ for (auto &[origAddrConst, symName] : condLpOrigAddrs) {
+ mlir::Value origAddr = origAddrConst;
+ unsigned valFieldIdx = lpType.getFieldIndex(symName);
+ mlir::Type valType = lpType.getType(valFieldIdx);
+
+ fir::IntOrValue valFIdx =
+ mlir::IntegerAttr::get(builder.getI32Type(), valFieldIdx);
+ mlir::Value valAddr = fir::CoordinateOp::create(
+ builder, loc, builder.getRefType(valType), structArg,
+ llvm::SmallVector<fir::IntOrValue, 1>{valFIdx});
+
+ origAddr.replaceUsesWithIf(valAddr, [&](mlir::OpOperand &use) {
+ return region.isAncestor(use.getOwner()->getParentRegion());
+ });
+
+ valAddrToSymName[valAddr] = symName;
+ }
+ }
+
+ // Step 2: Walk ops that write to struct value fields (hlfir.assign and
+ // fir.store) and inject a store of the index value after each write.
+ // Both forms must be tracked because lowering may produce either depending
+ // on the variable type and context.
+ llvm::SmallVector<mlir::Operation *> toAnnotate;
+ region.walk([&](hlfir::AssignOp assignOp) {
+ if (valAddrToSymName.count(assignOp.getLhs()))
+ toAnnotate.push_back(assignOp);
+ });
+ region.walk([&](fir::StoreOp storeOp) {
+ if (valAddrToSymName.count(storeOp.getMemref()))
+ toAnnotate.push_back(storeOp);
+ });
+
+ // Compute the index value once at the region entry so that it dominates
+ // all write sites (which may be inside nested fir.if blocks).
+ mlir::Value indexVal;
+ if (!toAnnotate.empty()) {
+ mlir::OpBuilder::InsertionGuard guard(builder);
+ builder.setInsertionPointToStart(&region.front());
+ indexVal = genIndexVal(builder, loc);
+ if (indexVal.getType() != builder.getI64Type())
+ indexVal =
+ fir::ConvertOp::create(builder, loc, builder.getI64Type(), indexVal);
+ }
+
+ for (mlir::Operation *writeOp : toAnnotate) {
+ mlir::Value target;
+ if (auto assignOp = mlir::dyn_cast<hlfir::AssignOp>(writeOp))
+ target = assignOp.getLhs();
+ else
+ target = mlir::cast<fir::StoreOp>(writeOp).getMemref();
+
+ std::string symName = valAddrToSymName.lookup(target);
+ unsigned idxFieldIdx = lpType.getFieldIndex("k" + symName);
+ mlir::Type idxType = lpType.getType(idxFieldIdx);
+
+ mlir::OpBuilder::InsertionGuard guard(builder);
+ builder.setInsertionPointAfter(writeOp);
+
+ fir::IntOrValue idxFIdx =
+ mlir::IntegerAttr::get(builder.getI32Type(), idxFieldIdx);
+ mlir::Value idxAddr = fir::CoordinateOp::create(
+ builder, loc, builder.getRefType(idxType), structArg,
+ llvm::SmallVector<fir::IntOrValue, 1>{idxFIdx});
+
+ fir::StoreOp::create(builder, loc, indexVal, idxAddr);
+ }
+}
+
+/// Rewrite references to conditional lastprivate symbols inside the
+/// loop body to use the reduction struct's value fields instead. Then inject
+/// a store of the canonical iteration number into the corresponding index
+/// field after each assignment.
+static void rewriteConditionalLpAssignsInWsLoops(
+ lower::AbstractConverter &converter, mlir::omp::WsloopOp wsloopOp,
+ fir::RecordType lpType,
+ const llvm::MapVector<mlir::Value, std::string> &condLpOrigAddrs,
+ mlir::Location loc) {
+ fir::FirOpBuilder &builder = converter.getFirOpBuilder();
+
+ // Get the struct reduction block arg (last reduction, since we appended it).
+ auto blockArgIface =
+ mlir::cast<mlir::omp::BlockArgOpenMPOpInterface>(*wsloopOp);
+ mlir::Value structArg = blockArgIface.getReductionBlockArgs().back();
+
+ // Get the loop nest op and compute a flattened canonical IV across all
+ // dimensions (handles both single and collapsed loops).
+ auto loopNestOp =
+ mlir::cast<mlir::omp::LoopNestOp>(wsloopOp.getWrappedLoop());
+
+ rewriteCondLpInRegion(
+ builder, loc, lpType, structArg, loopNestOp.getRegion(), condLpOrigAddrs,
+ [&](fir::FirOpBuilder &b, mlir::Location l) -> mlir::Value {
+ return computeFlattenedCanonicalIV(b, l, loopNestOp);
+ });
+}
+
+/// Rewrite conditional lastprivate assignments inside an omp.sections region.
+/// For each omp.section, replace uses of the original variable address with
+/// the struct's value field, and store the compile-time section index into
+/// the index field after each assignment.
+static void rewriteConditionalLpAssignsInSections(
+ lower::AbstractConverter &converter, mlir::omp::SectionsOp sectionsOp,
+ fir::RecordType lpType,
+ const llvm::MapVector<mlir::Value, std::string> &condLpOrigAddrs,
+ mlir::Location loc) {
+ fir::FirOpBuilder &builder = converter.getFirOpBuilder();
+ mlir::Region &sectionsRegion = sectionsOp.getRegion();
+
+ unsigned sectionIdx = 0;
+ for (mlir::Operation &op : sectionsRegion.front()) {
+ auto sectionOp = mlir::dyn_cast<mlir::omp::SectionOp>(op);
+ if (!sectionOp)
+ continue;
+
+ auto sectionArgIface =
+ mlir::cast<mlir::omp::BlockArgOpenMPOpInterface>(*sectionOp);
+ mlir::Value sectionStructArg =
+ sectionArgIface.getReductionBlockArgs().back();
+
+ unsigned idx = sectionIdx++;
+ rewriteCondLpInRegion(
+ builder, loc, lpType, sectionStructArg, sectionOp.getRegion(),
+ condLpOrigAddrs,
+ [idx](fir::FirOpBuilder &b, mlir::Location l) -> mlir::Value {
+ return b.createIntegerConstant(l, b.getI64Type(), idx);
+ });
+ }
+}
+
+static mlir::omp::DeclareReductionOp buildConditionalLastPrivateReduction(
+ lower::AbstractConverter &converter, fir::RecordType lpCondType,
+ const llvm::SetVector<const semantics::Symbol *> &condLpSyms) {
+
+ // Init callback: initialize all fields of the lp_t struct.
+ // Value fields get 0; index fields get -1.
+ //
+ // Returns a null mlir::Value to signal that initialization has already
+ // been performed directly on ompPriv. The reduction infrastructure
+ // (populateByRefInitAndCleanupRegions → initAndCleanupUnboxedDerivedType)
+ // checks for a non-null scalarInitValue before emitting a store, so
+ // returning null here safely skips the redundant store.
+ auto genInitValueCB = [lpCondType](fir::FirOpBuilder &builder,
+ mlir::Location loc, mlir::Type type,
+ mlir::Value ompOrig,
+ mlir::Value ompPriv) -> mlir::Value {
+ initConditionalLpStruct(builder, loc, lpCondType, ompPriv);
+ return mlir::Value{};
+ };
+
+ // Combiner callback: for each (value, index) pair, pick the later iteration.
+ // Fields are arranged as: {val_0, ..., val_{N-1}, idx_0, ..., idx_{N-1}}
+ // where idx field names are "k" + val field name.
+ // If rhs.idx > lhs.idx, copy rhs value and index into lhs.
+ auto genCombinerCB = [lpCondType](fir::FirOpBuilder &builder,
+ mlir::Location loc, mlir::Type type,
+ mlir::Value lhs, mlir::Value rhs,
+ bool isByRef) {
+ fir::RecordType lpType = lpCondType; // non-const copy for getFieldIndex
+ auto fields = lpType.getTypeList();
+ unsigned numVars = fields.size() / 2;
+
+ // Walk the first half (value fields). Index field name = "k" +
+ // value name.
+ for (unsigned i = 0; i < numVars; ++i) {
+ auto [valName, valType] = fields[i];
+ std::string idxName = "k" + valName;
+ unsigned valIdx = lpType.getFieldIndex(valName);
+ unsigned idxIdx = lpType.getFieldIndex(idxName);
+ mlir::Type idxType = lpType.getType(idxIdx);
+
+ // Get addresses of LHS and RHS index fields
+ fir::IntOrValue idxFieldIdx =
+ mlir::IntegerAttr::get(builder.getI32Type(), idxIdx);
+ mlir::Value lhsIdxAddr = fir::CoordinateOp::create(
+ builder, loc, builder.getRefType(idxType), lhs,
+ llvm::SmallVector<fir::IntOrValue, 1>{idxFieldIdx});
+ mlir::Value rhsIdxAddr = fir::CoordinateOp::create(
+ builder, loc, builder.getRefType(idxType), rhs,
+ llvm::SmallVector<fir::IntOrValue, 1>{idxFieldIdx});
+
+ mlir::Value lhsIdx = fir::LoadOp::create(builder, loc, lhsIdxAddr);
+ mlir::Value rhsIdx = fir::LoadOp::create(builder, loc, rhsIdxAddr);
+
+ // Compare: rhs index > lhs index (signed, iteration indices)
+ mlir::Value cmp = mlir::arith::CmpIOp::create(
+ builder, loc, mlir::arith::CmpIPredicate::sgt, rhsIdx, lhsIdx);
+
+ // If RHS comes from a later iteration, copy its value and index to LHS
+ auto ifOp = fir::IfOp::create(builder, loc, cmp, /*else*/ false);
+ builder.setInsertionPointToStart(&ifOp.getThenRegion().front());
+
+ // Copy value field: rhs.val_s → lhs.val_s
+ fir::IntOrValue valFieldIdx =
+ mlir::IntegerAttr::get(builder.getI32Type(), valIdx);
+ mlir::Value rhsValAddr = fir::CoordinateOp::create(
+ builder, loc, builder.getRefType(valType), rhs,
+ llvm::SmallVector<fir::IntOrValue, 1>{valFieldIdx});
+ mlir::Value lhsValAddr = fir::CoordinateOp::create(
+ builder, loc, builder.getRefType(valType), lhs,
+ llvm::SmallVector<fir::IntOrValue, 1>{valFieldIdx});
+ mlir::Value rhsVal = fir::LoadOp::create(builder, loc, rhsValAddr);
+ fir::StoreOp::create(builder, loc, rhsVal, lhsValAddr);
+
+ // Copy index field: rhs.idx_s → lhs.idx_s
+ fir::StoreOp::create(builder, loc, rhsIdx, lhsIdxAddr);
+
+ builder.setInsertionPointAfter(ifOp);
+ }
+
+ // By-ref: yield the accumulator (LHS)
+ mlir::omp::YieldOp::create(builder, loc, lhs);
+ };
+
+ // RecordType is always by-ref
+ bool isByRef = true;
+ mlir::Location loc = converter.getCurrentLocation();
+ mlir::Type redType = fir::ReferenceType::get(lpCondType);
+ std::string reductionName = ReductionProcessor::getReductionName(
+ "lp_cond", converter.getKindMap(), redType, isByRef);
+
+ return ReductionProcessor::createDeclareReductionHelper<
+ mlir::omp::DeclareReductionOp>(converter, reductionName, redType, loc,
+ isByRef, genCombinerCB, genInitValueCB);
+}
+
+/// Build a FIR RecordType for conditional lastprivate reduction.
+/// For symbols {x, y}, creates:
+/// !fir.type<_lp_cond_t.lN.M{x:T_x, y:T_y, kx:i64, ky:i64}>
+/// where N is the source line number and M is a monotonic counter.
+static fir::RecordType buildConditionalLpType(
+ lower::AbstractConverter &converter,
+ const llvm::SetVector<const semantics::Symbol *> &condLpSyms,
+ mlir::Location loc) {
+ fir::FirOpBuilder &builder = converter.getFirOpBuilder();
+ mlir::MLIRContext *context = builder.getContext();
+
+ // Derive a unique suffix from the source location and a monotonic counter.
+ // The line number makes names traceable to source; the counter prevents
+ // collisions when INCLUDE files place directives on identical line numbers.
+ // Use atomic for thread-safety in case flang ever lowers in parallel.
+ static std::atomic<unsigned> counter{0};
+ unsigned line = 0;
+ if (auto fileLoc = mlir::dyn_cast<mlir::FileLineColLoc>(loc))
+ line = fileLoc.getLine();
+ else if (auto fusedLoc = mlir::dyn_cast<mlir::FusedLoc>(loc)) {
+ for (mlir::Location sub : fusedLoc.getLocations()) {
+ if (auto fileSub = mlir::dyn_cast<mlir::FileLineColLoc>(sub)) {
+ line = fileSub.getLine();
+ break;
+ }
+ }
+ }
+ std::string typeName =
+ "_lp_cond_t.l" + std::to_string(line) + "." + std::to_string(counter++);
+
+ auto lpCondType = fir::RecordType::get(context, typeName);
+
+ // If the type has already been finalized, return it; otherwise build it.
+ if (lpCondType.isFinalized())
+ return lpCondType;
+
+ // Build field list: first all value fields, then all index fields.
+ // Grouping values before indices (rather than interleaving value/index
+ // pairs) can reduce padding holes when value types differ from i64.
+ llvm::SmallVector<std::pair<std::string, mlir::Type>> fields;
+
+ // Value fields first.
+ for (const auto *sym : condLpSyms) {
+ std::string symName = sym->name().ToString();
+ mlir::Type symType = converter.genType(*sym);
+ fields.push_back({symName, symType});
+ }
+
+ // Then index fields (i64).
+ for (const auto *sym : condLpSyms) {
+ std::string indexName = "k" + sym->name().ToString();
+ fields.push_back({indexName, builder.getI64Type()});
+ }
+
+ // Finalize the type with the field list
+ lpCondType.finalize({}, fields);
+
+ return lpCondType;
+}
+
// Represent the reduction combiner as a clause, return reference to it.
// If there is a "combiner" clause already present, do nothing. Otherwise
// manufacture a combiner clause from the combiner expression on the reduction
diff --git a/flang/lib/Lower/Support/ReductionProcessor.cpp b/flang/lib/Lower/Support/ReductionProcessor.cpp
index b3a27736d1616..45459ca5c73b7 100644
--- a/flang/lib/Lower/Support/ReductionProcessor.cpp
+++ b/flang/lib/Lower/Support/ReductionProcessor.cpp
@@ -71,6 +71,13 @@ ReductionProcessor::createDeclareReduction<fir::DeclareReductionOp>(
const ReductionIdentifier redId, mlir::Type type, mlir::Location loc,
bool isByRef);
+template mlir::omp::DeclareReductionOp
+ReductionProcessor::createDeclareReductionHelper<mlir::omp::DeclareReductionOp>(
+ AbstractConverter &converter, llvm::StringRef reductionOpName,
+ mlir::Type type, mlir::Location loc, bool isByRef,
+ GenCombinerCBTy genCombinerCB, GenInitValueCBTy genInitValueCB,
+ const semantics::Symbol *sym);
+
ReductionProcessor::ReductionIdentifier ReductionProcessor::getReductionType(
const omp::clause::ProcedureDesignator &pd) {
auto redType = llvm::StringSwitch<std::optional<ReductionIdentifier>>(
diff --git a/flang/test/Lower/OpenMP/Todo/lastprivate-conditional.f90 b/flang/test/Lower/OpenMP/Todo/lastprivate-conditional.f90
deleted file mode 100644
index 2b96093da3a8f..0000000000000
--- a/flang/test/Lower/OpenMP/Todo/lastprivate-conditional.f90
+++ /dev/null
@@ -1,12 +0,0 @@
-! RUN: %not_todo_cmd bbc -emit-fir -fopenmp -fopenmp-version=50 -o - %s 2>&1 | FileCheck %s
-! RUN: %not_todo_cmd %flang_fc1 -emit-fir -fopenmp -fopenmp-version=50 -o - %s 2>&1 | FileCheck %s
-
-! CHECK: not yet implemented: lastprivate clause with CONDITIONAL modifier
-subroutine foo()
- integer :: x, i
- x = 1
- !$omp parallel do lastprivate(conditional: x)
- do i = 1, 100
- x = x + 1
- enddo
-end
diff --git a/flang/test/Lower/OpenMP/lastprivate-conditional-sections-nowait.f90 b/flang/test/Lower/OpenMP/lastprivate-conditional-sections-nowait.f90
new file mode 100644
index 0000000000000..3e19a82f74b9b
--- /dev/null
+++ b/flang/test/Lower/OpenMP/lastprivate-conditional-sections-nowait.f90
@@ -0,0 +1,37 @@
+! Test lowering of `lastprivate(conditional:)` on an omp sections construct
+! with the nowait clause. The lowering must emit an explicit barrier
+! before the copy-back to ensure the reduction is fully finalized.
+
+! RUN: bbc -fopenmp -fopenmp-version=50 -emit-hlfir %s -o - | FileCheck %s
+! RUN: %flang_fc1 -fopenmp -fopenmp-version=50 -emit-hlfir %s -o - | FileCheck %s
+
+subroutine test_conditional_lp_sections_nowait(x)
+ implicit none
+ integer, intent(inout) :: x
+
+ !$omp parallel
+ !$omp sections lastprivate(conditional: x) nowait
+ !$omp section
+ x = 10
+ !$omp section
+ x = 20
+ !$omp end sections nowait
+ !$omp end parallel
+end subroutine
+
+! CHECK-LABEL: func.func @_QPtest_conditional_lp_sections_nowait
+! CHECK: %[[STRUCT:.*]] = fir.alloca !fir.type<_lp_cond_t.{{l[0-9]+\.[0-9]+}}{x:i32,kx:i64}>
+
+! CHECK: omp.parallel {
+! CHECK: omp.sections nowait
+! CHECK-SAME: reduction(byref @lp_cond_byref_rec__lp_cond_t
+! CHECK-SAME: %[[STRUCT]]
+
+! CHECK: omp.barrier
+
+! CHECK: omp.single {
+! CHECK: fir.coordinate_of %[[STRUCT]], x
+! CHECK: fir.load
+! CHECK: fir.store
+! CHECK: omp.terminator
+! CHECK: }
diff --git a/flang/test/Lower/OpenMP/lastprivate-conditional-sections-orphaned.f90 b/flang/test/Lower/OpenMP/lastprivate-conditional-sections-orphaned.f90
new file mode 100644
index 0000000000000..f12ba762d39be
--- /dev/null
+++ b/flang/test/Lower/OpenMP/lastprivate-conditional-sections-orphaned.f90
@@ -0,0 +1,76 @@
+! Test lowering of `lastprivate(conditional:)` on an ORPHANED omp sections
+! construct (sections inside a subroutine called from a parallel region).
+!
+! Because the subroutine has no enclosing omp.parallel, a stack alloca would
+! give each thread a private copy and the cross-thread reduction would never
+! merge. The lowering must therefore place the reduction struct in a
+! module-scope fir.global internal rather than a fir.alloca.
+
+! RUN: %flang_fc1 -fopenmp -fopenmp-version=50 -emit-hlfir %s -o - | FileCheck %s
+
+subroutine test_orphaned_sections(n)
+ implicit none
+ integer, intent(inout) :: n
+
+ !$omp sections lastprivate(conditional: n)
+ !$omp section
+ n = 10
+ !$omp section
+ n = 20
+ !$omp end sections
+end subroutine
+
+! -- declare_reduction for the struct type ------------------------------------
+! CHECK-LABEL: omp.declare_reduction @lp_cond_byref_rec__lp_cond_t
+! CHECK-SAME: : !fir.ref<!fir.type<_lp_cond_t.{{l[0-9]+\.[0-9]+}}{n:i32,kn:i64}>>
+
+! -- Function body: address_of global (no fir.alloca for the struct) ----------
+! CHECK-LABEL: func.func @_QPtest_orphaned_sections
+
+! -- Runtime guard: abort if called from nested parallelism ------------------
+! Guard is emitted BEFORE init to avoid racing on the global.
+! CHECK: %[[LEVEL:.*]] = fir.call @omp_get_level_() {{.*}} : () -> i32
+! CHECK: %[[ONE:.*]] = arith.constant 1 : i32
+! CHECK: %[[NESTED:.*]] = arith.cmpi sgt, %[[LEVEL]], %[[ONE]] : i32
+! CHECK: fir.if %[[NESTED]] {
+! CHECK: fir.call @_FortranAStopStatementText
+! CHECK: }
+
+! CHECK-NOT: fir.alloca !fir.type<_lp_cond_t
+! CHECK: %[[GADDR:.*]] = fir.address_of(@_lp_cond_global.{{l[0-9]+\.[0-9]+}}) : !fir.ref<!fir.type<_lp_cond_t.{{l[0-9]+\.[0-9]+}}{n:i32,kn:i64}>>
+
+! -- Init sentinels written inside omp.single --------------------------------
+! CHECK: omp.single {
+! CHECK: %[[NCOORD:.*]] = fir.coordinate_of %[[GADDR]], n
+! CHECK: %[[C0:.*]] = arith.constant 0 : i32
+! CHECK: fir.store %[[C0]] to %[[NCOORD]]
+! CHECK: %[[KNCOORD:.*]] = fir.coordinate_of %[[GADDR]], kn
+! CHECK: %[[CM1:.*]] = arith.constant -1 : i64
+! CHECK: fir.store %[[CM1]] to %[[KNCOORD]]
+! CHECK: omp.terminator
+! CHECK: }
+
+! -- Sections carries the global address as a by-ref reduction ---------------
+! CHECK: omp.sections
+! CHECK-SAME: reduction(byref @lp_cond_byref_rec__lp_cond_t
+! CHECK-SAME: %[[GADDR]]
+
+! -- Section 0: index constant hoisted to entry, stored after assignment ------
+! CHECK: omp.section {
+! CHECK: %[[IDX0:.*]] = arith.constant 0 : i64
+! CHECK: hlfir.assign
+! CHECK: fir.store %[[IDX0]]
+
+! -- Section 1: index constant hoisted to entry, stored after assignment ------
+! CHECK: omp.section {
+! CHECK: %[[IDX1:.*]] = arith.constant 1 : i64
+! CHECK: hlfir.assign
+! CHECK: fir.store %[[IDX1]]
+
+! -- Copy-back: load winning value from global and store to dummy arg ---------
+! CHECK: fir.coordinate_of %[[GADDR]], n
+! CHECK: fir.load
+! CHECK: fir.store
+
+! -- Module-level global declared at end of module (not a stack alloca) -------
+! CHECK: fir.global internal @_lp_cond_global.{{l[0-9]+\.[0-9]+}} : !fir.type<_lp_cond_t.{{l[0-9]+\.[0-9]+}}{n:i32,kn:i64}>
diff --git a/flang/test/Lower/OpenMP/lastprivate-conditional-sections.f90 b/flang/test/Lower/OpenMP/lastprivate-conditional-sections.f90
new file mode 100644
index 0000000000000..502b74d14c8e4
--- /dev/null
+++ b/flang/test/Lower/OpenMP/lastprivate-conditional-sections.f90
@@ -0,0 +1,80 @@
+! Test lowering of `lastprivate(conditional:)` on an omp sections construct
+! with multiple variables. The lowering must:
+! 1. Build a packed struct type {val, val, ..., idx, idx, ...}
+! 2. Create an omp.declare_reduction with identity 0 / -1
+! 3. Inject the struct as a by-ref reduction variable on the sections
+! 4. Rewrite assignments to use struct value fields + store constant section
+! index (0, 1, ...) into the index fields
+! 5. Copy back the winning values after the sections
+
+! RUN: bbc -fopenmp -fopenmp-version=50 -emit-hlfir %s -o - | FileCheck %s
+! RUN: %flang_fc1 -fopenmp -fopenmp-version=50 -emit-hlfir %s -o - | FileCheck %s
+
+subroutine test_conditional_lp_sections(x, y)
+ implicit none
+ integer, intent(inout) :: x, y
+
+ !$omp parallel sections lastprivate(conditional: x, y)
+ !$omp section
+ x = 10
+ y = 20
+
+ !$omp section
+ x = 30
+ y = 40
+ !$omp end parallel sections
+end subroutine
+
+! -- declare_reduction with struct type containing value/index pairs ----------
+! CHECK-LABEL: omp.declare_reduction @lp_cond_byref_rec__lp_cond_t
+! CHECK-SAME: : !fir.ref<!fir.type<_lp_cond_t.{{l[0-9]+\.[0-9]+}}{x:i32,y:i32,kx:i64,ky:i64}>>
+
+! -- Init region: value fields = 0, index fields = -1 ------------------------
+! CHECK: init {
+! CHECK-DAG: arith.constant 0 : i32
+! CHECK-DAG: arith.constant -1 : i64
+! CHECK: }
+
+! -- Combiner: sgt on i64 index fields, two pairs ----------------------------
+! CHECK: combiner {
+! CHECK: arith.cmpi sgt, %{{.*}}, %{{.*}} : i64
+! CHECK: fir.if
+! CHECK: arith.cmpi sgt, %{{.*}}, %{{.*}} : i64
+! CHECK: fir.if
+! CHECK: omp.yield
+
+! -- Struct alloca + init before parallel -------------------------------------
+! CHECK-LABEL: func.func @_QPtest_conditional_lp_sections
+! CHECK: %[[STRUCT:.*]] = fir.alloca !fir.type<_lp_cond_t.{{l[0-9]+\.[0-9]+}}{x:i32,y:i32,kx:i64,ky:i64}>
+! CHECK-DAG: arith.constant 0 : i32
+! CHECK-DAG: arith.constant -1 : i64
+
+! -- Sections carries the struct as a by-ref reduction ------------------------
+! CHECK: omp.parallel {
+! CHECK: omp.sections
+! CHECK-SAME: reduction(byref @lp_cond_byref_rec__lp_cond_t
+! CHECK-SAME: %[[STRUCT]]
+
+! -- Section 0: index constant hoisted to entry, stored after each assign -----
+! CHECK: omp.section {
+! CHECK: %[[IDX0:.*]] = arith.constant 0 : i64
+! CHECK: hlfir.assign
+! CHECK: fir.store %[[IDX0]]
+! CHECK: hlfir.assign
+! CHECK: fir.store %[[IDX0]]
+
+! -- Section 1: index constant hoisted to entry, stored after each assign -----
+! CHECK: omp.section {
+! CHECK: %[[IDX1:.*]] = arith.constant 1 : i64
+! CHECK: hlfir.assign
+! CHECK: fir.store %[[IDX1]]
+! CHECK: hlfir.assign
+! CHECK: fir.store %[[IDX1]]
+
+! -- Copy-back after sections -------------------------------------------------
+! CHECK: fir.coordinate_of %[[STRUCT]], {{[xy]}}
+! CHECK: fir.load
+! CHECK: fir.store
+! CHECK: fir.coordinate_of %[[STRUCT]], {{[xy]}}
+! CHECK: fir.load
+! CHECK: fir.store
diff --git a/flang/test/Lower/OpenMP/lastprivate-conditional-wsloop-nested-if.f90 b/flang/test/Lower/OpenMP/lastprivate-conditional-wsloop-nested-if.f90
new file mode 100644
index 0000000000000..a80c7e45a8315
--- /dev/null
+++ b/flang/test/Lower/OpenMP/lastprivate-conditional-wsloop-nested-if.f90
@@ -0,0 +1,36 @@
+! Test that lastprivate(conditional:) correctly identifies the enclosing
+! omp.parallel even when the worksharing construct is inside a Fortran IF
+! block (which lowers to fir.if). The struct alloca must be placed before
+! the omp.parallel — NOT treated as orphaned.
+
+! RUN: bbc -fopenmp -fopenmp-version=50 -emit-hlfir %s -o - | FileCheck %s
+! RUN: %flang_fc1 -fopenmp -fopenmp-version=50 -emit-hlfir %s -o - | FileCheck %s
+
+subroutine test_nested_if(n, a, x, flag)
+ implicit none
+ integer, intent(in) :: n, flag
+ integer, intent(in) :: a(n)
+ integer, intent(inout) :: x
+ integer :: k
+
+ !$omp parallel do lastprivate(conditional: x)
+ do k = 1, n
+ if (a(k) < 150) then
+ x = k + 1
+ end if
+ end do
+ !$omp end parallel do
+end subroutine
+
+! -- The struct is stack-allocated (fir.alloca), not a global -----------------
+! CHECK-LABEL: func.func @_QPtest_nested_if
+! CHECK: fir.alloca !fir.type<_lp_cond_t.{{l[0-9]+\.[0-9]+}}{x:i32,kx:i64}>
+! CHECK-NOT: fir.address_of(@_lp_cond_global
+
+! -- No nesting guard emitted (this is not orphaned) -------------------------
+! CHECK-NOT: fir.call @omp_get_level_
+
+! -- omp.parallel with the struct as reduction --------------------------------
+! CHECK: omp.parallel
+! CHECK: omp.wsloop
+! CHECK-SAME: reduction(byref @lp_cond_byref_rec__lp_cond_t
diff --git a/flang/test/Lower/OpenMP/lastprivate-conditional-wsloop-nowait.f90 b/flang/test/Lower/OpenMP/lastprivate-conditional-wsloop-nowait.f90
new file mode 100644
index 0000000000000..e27a4ebe0c301
--- /dev/null
+++ b/flang/test/Lower/OpenMP/lastprivate-conditional-wsloop-nowait.f90
@@ -0,0 +1,38 @@
+! Test lowering of `lastprivate(conditional:)` on a worksharing do loop
+! with the nowait clause. The lowering must emit an explicit barrier
+! before the copy-back to ensure the reduction is fully finalized.
+
+! RUN: bbc -fopenmp -fopenmp-version=50 -emit-hlfir %s -o - | FileCheck %s
+! RUN: %flang_fc1 -fopenmp -fopenmp-version=50 -emit-hlfir %s -o - | FileCheck %s
+
+subroutine test_conditional_lp_nowait(n, x)
+ implicit none
+ integer, intent(in) :: n
+ integer, intent(inout) :: x
+ integer :: k
+
+ !$omp parallel
+ !$omp do lastprivate(conditional: x) nowait
+ do k = 1, n
+ x = k
+ end do
+ !$omp end do nowait
+ !$omp end parallel
+end subroutine
+
+! CHECK-LABEL: func.func @_QPtest_conditional_lp_nowait
+! CHECK: %[[STRUCT:.*]] = fir.alloca !fir.type<_lp_cond_t.{{l[0-9]+\.[0-9]+}}{x:i32,kx:i64}>
+
+! CHECK: omp.parallel {
+! CHECK: omp.wsloop nowait
+! CHECK-SAME: reduction(byref @lp_cond_byref_rec__lp_cond_t
+! CHECK-SAME: %[[STRUCT]]
+
+! CHECK: omp.barrier
+
+! CHECK: omp.single {
+! CHECK: fir.coordinate_of %[[STRUCT]], x
+! CHECK: fir.load
+! CHECK: fir.store
+! CHECK: omp.terminator
+! CHECK: }
diff --git a/flang/test/Lower/OpenMP/lastprivate-conditional-wsloop-orphaned.f90 b/flang/test/Lower/OpenMP/lastprivate-conditional-wsloop-orphaned.f90
new file mode 100644
index 0000000000000..b8cb9b3159d7c
--- /dev/null
+++ b/flang/test/Lower/OpenMP/lastprivate-conditional-wsloop-orphaned.f90
@@ -0,0 +1,73 @@
+! Test lowering of `lastprivate(conditional:)` on an ORPHANED omp do loop
+! (wsloop inside a subroutine called from a parallel region).
+!
+! Because the subroutine has no enclosing omp.parallel, a stack alloca would
+! give each thread a private copy and the cross-thread reduction would never
+! merge. The lowering must therefore place the reduction struct in a
+! module-scope fir.global internal rather than a fir.alloca.
+
+! RUN: %flang_fc1 -fopenmp -fopenmp-version=50 -emit-hlfir %s -o - | FileCheck %s
+
+subroutine test_orphaned_wsloop(n, x)
+ implicit none
+ integer, intent(in) :: n
+ integer, intent(inout) :: x
+ integer :: k
+
+ !$omp do lastprivate(conditional: x)
+ do k = 1, n
+ x = k
+ end do
+ !$omp end do
+end subroutine
+
+! -- declare_reduction for the struct type ------------------------------------
+! CHECK-LABEL: omp.declare_reduction @lp_cond_byref_rec__lp_cond_t
+! CHECK-SAME: : !fir.ref<!fir.type<_lp_cond_t.{{l[0-9]+\.[0-9]+}}{x:i32,kx:i64}>>
+
+! -- Function body: address_of global (no fir.alloca for the struct) ----------
+! CHECK-LABEL: func.func @_QPtest_orphaned_wsloop
+
+! -- Runtime guard: abort if called from nested parallelism ------------------
+! Guard is emitted BEFORE init to avoid racing on the global.
+! CHECK: %[[LEVEL:.*]] = fir.call @omp_get_level_() {{.*}} : () -> i32
+! CHECK: %[[ONE:.*]] = arith.constant 1 : i32
+! CHECK: %[[NESTED:.*]] = arith.cmpi sgt, %[[LEVEL]], %[[ONE]] : i32
+! CHECK: fir.if %[[NESTED]] {
+! CHECK: fir.call @_FortranAStopStatementText
+! CHECK: }
+
+! CHECK-NOT: fir.alloca !fir.type<_lp_cond_t
+! CHECK: %[[GADDR:.*]] = fir.address_of(@_lp_cond_global.{{l[0-9]+\.[0-9]+}}) : !fir.ref<!fir.type<_lp_cond_t.{{l[0-9]+\.[0-9]+}}{x:i32,kx:i64}>>
+
+! -- Init sentinels written inside omp.single --------------------------------
+! CHECK: omp.single {
+! CHECK: %[[XCOORD:.*]] = fir.coordinate_of %[[GADDR]], x
+! CHECK: %[[C0:.*]] = arith.constant 0 : i32
+! CHECK: fir.store %[[C0]] to %[[XCOORD]]
+! CHECK: %[[KXCOORD:.*]] = fir.coordinate_of %[[GADDR]], kx
+! CHECK: %[[CM1:.*]] = arith.constant -1 : i64
+! CHECK: fir.store %[[CM1]] to %[[KXCOORD]]
+! CHECK: omp.terminator
+! CHECK: }
+
+! -- Wsloop carries the global address as a by-ref reduction -----------------
+! CHECK: omp.wsloop
+! CHECK-SAME: reduction(byref @lp_cond_byref_rec__lp_cond_t
+! CHECK-SAME: %[[GADDR]]
+! CHECK-SAME: -> %[[SARG:.*]] :
+
+! -- Loop body: struct value and index fields updated -------------------------
+! CHECK: omp.loop_nest
+! CHECK: fir.coordinate_of %[[SARG]], x
+! CHECK: hlfir.assign
+! CHECK: fir.coordinate_of %[[SARG]], kx
+! CHECK: fir.store
+
+! -- Copy-back: load winning value from global and store to dummy arg ---------
+! CHECK: fir.coordinate_of %[[GADDR]], x
+! CHECK: fir.load
+! CHECK: fir.store
+
+! -- Module-level global declared at end of module (not a stack alloca) -------
+! CHECK: fir.global internal @_lp_cond_global.{{l[0-9]+\.[0-9]+}} : !fir.type<_lp_cond_t.{{l[0-9]+\.[0-9]+}}{x:i32,kx:i64}>
diff --git a/flang/test/Lower/OpenMP/lastprivate-conditional-wsloop.f90 b/flang/test/Lower/OpenMP/lastprivate-conditional-wsloop.f90
new file mode 100644
index 0000000000000..ae2216447ff18
--- /dev/null
+++ b/flang/test/Lower/OpenMP/lastprivate-conditional-wsloop.f90
@@ -0,0 +1,81 @@
+! Test lowering of `lastprivate(conditional:)` on a worksharing do loop
+! with multiple variables. The lowering must:
+! 1. Build a packed struct type {val, val, ..., idx, idx, ...}
+! 2. Create an omp.declare_reduction with identity 0 / -1
+! 3. Inject the struct as a by-ref reduction variable on the wsloop
+! 4. Rewrite assignments to use struct value fields + store canonical IV
+! 5. Copy back the winning values after the wsloop
+
+! RUN: bbc -fopenmp -fopenmp-version=50 -emit-hlfir %s -o - | FileCheck %s
+! RUN: %flang_fc1 -fopenmp -fopenmp-version=50 -emit-hlfir %s -o - | FileCheck %s
+
+subroutine test_conditional_lp(n, a, x, y)
+ implicit none
+ integer, intent(in) :: n
+ integer, intent(in) :: a(n)
+ integer, intent(inout) :: x, y
+ integer :: k
+
+ !$omp parallel do lastprivate(conditional: x, y)
+ do k = 1, n
+ if (a(k) < 150) then
+ x = k + 1
+ end if
+ if (a(k) < 100) then
+ y = k
+ end if
+ end do
+ !$omp end parallel do
+end subroutine
+
+! -- declare_reduction with struct type containing value/index pairs ----------
+! CHECK-LABEL: omp.declare_reduction @lp_cond_byref_rec__lp_cond_t
+! CHECK-SAME: : !fir.ref<!fir.type<_lp_cond_t.{{l[0-9]+\.[0-9]+}}{x:i32,y:i32,kx:i64,ky:i64}>>
+
+! -- Init region: value fields = 0, index fields = -1 ------------------------
+! CHECK: init {
+! CHECK-DAG: arith.constant 0 : i32
+! CHECK-DAG: arith.constant -1 : i64
+! CHECK: }
+
+! -- Combiner: sgt on i64 index fields, two pairs ----------------------------
+! CHECK: combiner {
+! CHECK: arith.cmpi sgt, %{{.*}}, %{{.*}} : i64
+! CHECK: fir.if
+! CHECK: arith.cmpi sgt, %{{.*}}, %{{.*}} : i64
+! CHECK: fir.if
+! CHECK: omp.yield
+
+! -- Struct alloca + init (0 / -1) before parallel ----------------------------
+! CHECK-LABEL: func.func @_QPtest_conditional_lp
+! CHECK: %[[STRUCT:.*]] = fir.alloca !fir.type<_lp_cond_t.{{l[0-9]+\.[0-9]+}}{x:i32,y:i32,kx:i64,ky:i64}>
+! CHECK-DAG: arith.constant 0 : i32
+! CHECK-DAG: arith.constant -1 : i64
+
+! -- Wsloop carries the struct as a by-ref reduction --------------------------
+! CHECK: omp.parallel {
+! CHECK: omp.wsloop
+! CHECK-SAME: reduction(byref @lp_cond_byref_rec__lp_cond_t
+! CHECK-SAME: %[[STRUCT]]
+! CHECK-SAME: -> %[[SARG:.*]] :
+
+! -- Loop body: struct value fields used, canonical IV stored to index fields -
+! CHECK: omp.loop_nest (%{{.*}}) : i32
+! CHECK-DAG: fir.coordinate_of %[[SARG]], x
+! CHECK-DAG: fir.coordinate_of %[[SARG]], y
+! CHECK: fir.if
+! CHECK: hlfir.assign
+! CHECK: fir.coordinate_of %[[SARG]], kx
+! CHECK: fir.store %{{.*}} : !fir.ref<i64>
+! CHECK: fir.if
+! CHECK: hlfir.assign
+! CHECK: fir.coordinate_of %[[SARG]], ky
+! CHECK: fir.store %{{.*}} : !fir.ref<i64>
+
+! -- Copy-back after wsloop ---------------------------------------------------
+! CHECK: fir.coordinate_of %[[STRUCT]], {{[xy]}}
+! CHECK: fir.load
+! CHECK: fir.store
+! CHECK: fir.coordinate_of %[[STRUCT]], {{[xy]}}
+! CHECK: fir.load
+! CHECK: fir.store
diff --git a/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp b/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
index f0511bb4be7dd..c623acd742fce 100644
--- a/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
+++ b/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
@@ -3881,8 +3881,7 @@ convertOmpWsloop(Operation &opInst, llvm::IRBuilderBase &builder,
// Process the reductions if required.
if (failed(createReductionsAndCleanup(
wsloopOp, builder, moduleTranslation, allocaIP, reductionDecls,
- privateReductionVariables, isByRef, wsloopOp.getNowait(),
- /*isTeamsReduction=*/false)))
+ privateReductionVariables, isByRef, wsloopOp.getNowait())))
return failure();
return cleanupPrivateVars(wsloopOp, builder, moduleTranslation,
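
For readers following the patch, here is a rough standalone sketch of the two ideas it relies on: the flattened canonical iteration index described in the doc comment of computeFlattenedCanonicalIV, and the "later index wins" rule used by the reduction combiner. It is plain C++ rather than MLIR builder code, and the names LoopDim, Candidate, flattenedCanonicalIndex and combine are hypothetical, introduced only for illustration. It assumes inclusive loop bounds and a non-zero step, and it models a single lastprivate variable rather than the packed struct.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model of one loop dimension; bounds are inclusive and the
// step is assumed to be non-zero (it may be negative).
struct LoopDim {
  std::int64_t lb, ub, step;
};

// Follows the formula from the patch's doc comment:
//   ci = (IVi - LBi) / stepi,  Ni = (UBi - LBi) / stepi + 1
//   flat = (...((c0 * N1 + c1) * N2 + c2)...) * Nk + ck
// The result is 0-based and ascending in sequential iteration order,
// regardless of the direction of each loop.
std::int64_t flattenedCanonicalIndex(const std::vector<LoopDim> &dims,
                                     const std::vector<std::int64_t> &ivs) {
  assert(!dims.empty() && dims.size() == ivs.size());
  std::int64_t flat = (ivs[0] - dims[0].lb) / dims[0].step;
  for (std::size_t d = 1; d < dims.size(); ++d) {
    std::int64_t trips = (dims[d].ub - dims[d].lb) / dims[d].step + 1;
    std::int64_t ci = (ivs[d] - dims[d].lb) / dims[d].step;
    flat = flat * trips + ci;
  }
  return flat;
}

// Hypothetical per-thread record for one variable: the candidate value and
// the index of the iteration that last wrote it (-1 means "never written",
// matching the init value used for index fields in the patch).
struct Candidate {
  int value = 0;
  std::int64_t index = -1;
};

// Combiner: keep whichever candidate came from the later iteration
// (signed greater-than on the index, as in the patch's sgt comparison).
void combine(Candidate &lhs, const Candidate &rhs) {
  if (rhs.index > lhs.index)
    lhs = rhs;
}
```

Because every thread starts from index -1, a thread that never assigns the variable can never win the comparison, so the original value is only overwritten when at least one iteration actually performed an assignment.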