[flang-commits] [flang] [flang][OpenMP] Fix crash when a sliced array is specified in a forall within a workshare construct (PR #170913)

Fri Dec 5 12:09:31 PST 2025

https://github.com/ceseo created https://github.com/llvm/llvm-project/pull/170913

This is a fix for two problems that caused a crash:

1. Thread-local variables sometimes are required to be parallelized. Added a special case to handle this in `LowerWorkshare.cpp:isSafeToParallelize`.
2. Race condition caused by a `nowait` added to the `omp.workshare` if it is the last operation in a block. This allowed multiple threads to execute the `omp.workshare` region concurrently. Since _FortranAPushValue modifies a shared stack, this concurrent access causes a crash. Disable the addition of `nowait` and rely on the implicit barrier at the the of the `omp.workshare` region.

Fixes #143330

>From f3d74aeafc4b41d8f63f6dde68397c6e2e6ab3ab Mon Sep 17 00:00:00 2001
From: Carlos Seo <carlos.seo at linaro.org>
Date: Fri, 5 Dec 2025 19:55:22 +0000
Subject: [PATCH] [flang][OpenMP] Fix crash when a sliced array is specified in
 forall construct within workshare construct

This is fix for two problems:

1. Thread-local variables sometimes are required to be paralellized. Added a
   special case to handle this in LowerWorkshare.cpp:isSafeToParallelize.
2. Race condition caused by a `nowait` added to the omp.workshare if it is the
   last operation in a block. This allowed multiple threads to execute the
   omp.workshare region concurrently. Since _FortranAPushValue modifies a
   shared stack, this concurrent access causes a crash. Disable the addition of
   `nowait` and rely on the implicit barrier at the the of the omp.workshare
   region.

Fixes #143330
---
 flang/lib/Optimizer/OpenMP/LowerWorkshare.cpp | 49 ++++++++++++++++++-
 1 file changed, 47 insertions(+), 2 deletions(-)

diff --git a/flang/lib/Optimizer/OpenMP/LowerWorkshare.cpp b/flang/lib/Optimizer/OpenMP/LowerWorkshare.cpp
index f6af684f06edb..fc1ed9761d0c9 100644
--- a/flang/lib/Optimizer/OpenMP/LowerWorkshare.cpp
+++ b/flang/lib/Optimizer/OpenMP/LowerWorkshare.cpp
@@ -136,8 +136,48 @@ static bool mustParallelizeOp(Operation *op) {
 }
 
 static bool isSafeToParallelize(Operation *op) {
-  return isa<hlfir::DeclareOp>(op) || isa<fir::DeclareOp>(op) ||
-         isMemoryEffectFree(op);
+  if (isa<hlfir::DeclareOp>(op) || isa<fir::DeclareOp>(op) ||
+      isMemoryEffectFree(op))
+    return true;
+
+  // Special case: Thread-local variables allocated in the OpenMP parallel
+  // region (using fir.alloca) are private to each thread and thus safe (and
+  // sometimes required) to parallelize. If the compiler wraps such load/stores
+  // in an omp.workshare block, only one thread updates its local copy, while
+  // all other threads read uninitialized data (see issue #143330).
+  if (fir::StoreOp store = dyn_cast<fir::StoreOp>(op)) {
+    Value mem = store.getMemref();
+    while (fir::DeclareOp declare = mem.getDefiningOp<fir::DeclareOp>())
+      mem = declare.getMemref();
+    while (hlfir::DeclareOp declare = mem.getDefiningOp<hlfir::DeclareOp>())
+      mem = declare.getMemref();
+
+    if (fir::AllocaOp alloca = mem.getDefiningOp<fir::AllocaOp>()) {
+      if (mlir::omp::ParallelOp parallelOp =
+              alloca->getParentOfType<mlir::omp::ParallelOp>()) {
+        if (op->getParentOfType<mlir::omp::ParallelOp>() == parallelOp)
+          return true;
+      }
+    }
+  }
+
+  if (fir::LoadOp load = dyn_cast<fir::LoadOp>(op)) {
+    Value mem = load.getMemref();
+    while (fir::DeclareOp declare = mem.getDefiningOp<fir::DeclareOp>())
+      mem = declare.getMemref();
+    while (hlfir::DeclareOp declare = mem.getDefiningOp<hlfir::DeclareOp>())
+      mem = declare.getMemref();
+
+    if (fir::AllocaOp alloca = mem.getDefiningOp<fir::AllocaOp>()) {
+      if (mlir::omp::ParallelOp parallelOp =
+              alloca->getParentOfType<mlir::omp::ParallelOp>()) {
+        if (op->getParentOfType<mlir::omp::ParallelOp>() == parallelOp)
+          return true;
+      }
+    }
+  }
+
+  return false;
 }
 
 /// Simple shallow copies suffice for our purposes in this pass, so we implement
@@ -335,6 +375,11 @@ static void parallelizeRegion(Region &sourceRegion, Region &targetRegion,
 
     for (auto [i, opOrSingle] : llvm::enumerate(regions)) {
       bool isLast = i + 1 == regions.size();
+      // Make sure shared runtime calls are synchronized: disable `nowait`
+      // insertion, and rely on the implicit barrier at the end of the
+      // omp.workshare block.
+      if (isa<fir::DoLoopOp>(block.getParentOp()))
+        isLast = false;
       if (std::holds_alternative<SingleRegion>(opOrSingle)) {
         OpBuilder singleBuilder(sourceRegion.getContext());
         Block *singleBlock = new Block();