[Mlir-commits] [mlir] [mlir][OpenMP] don't add compiler-generated barrier in single threaded code (PR #174105)

Wed Dec 31 08:36:59 PST 2025

llvmbot wrote:




@llvm/pr-subscribers-mlir

Author: Tom Eccles (tblah)

<details>
<summary>Changes</summary>

We add barriers to the firstprivate copy region when they are required to avoid a race condition with the lastprivate clause.

The problem is that these barriers are added by the compiler not implied by user code so it is the compiler's problem to avoid deadlock.

I came across a testcase whilst working on taskloop support that looks a bit like this
```
!$omp parallel
  !$omp single
    !$omp taskloop firstprivate(a) lastprivate(a)
      ...
  !$omp end single
!$omp end parallel
```

This is so that there are multiple threads for the generated tasks to be distributed over, but we don't generate the tasks afresh in every thread.

The problem comes when the taskloop requires a barrier to prevent the datarace between firstprivate and lastprivate. This barrier will then be generated inside of SINGLE and so only one thread will encounter the barrier: leading to a deadlock.

This patch works around the problem by detecting this situation statically and then not generating the barrier. There are cases where we cannot detect this statically (e.g. if the TASKLOOP is inside a function call inside of SINGLE). The program will still deadlock in this case after my patch. I'm unsure what the solution would be for that case. I want to fix this simple case in LLVM 22 before engaging in a longer discussion as to whether there is a better way to handle the more general case.

Testing using wsloop because I want to land this (or not) independently of taskloop. Note that for wsloop it would be up to the programmer to remember to use the nowait clause, but nowait cannot be used to control generation of this barrier because it refers to the barrier after the construct not after firstprivate copyin (before the construct execution).

---
Full diff: https://github.com/llvm/llvm-project/pull/174105.diff


2 Files Affected:

- (modified) mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp (+20-1) 
- (added) mlir/test/Target/LLVMIR/openmp-private-barrier-single.mlir (+43) 


``````````diff

diff --git a/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp b/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
index 03d67a52853f6..8025d9ea64483 100644
--- a/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
+++ b/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
@@ -1678,6 +1678,25 @@ allocatePrivateVars(llvm::IRBuilderBase &builder,
   return afterAllocas;
 }
 
+// This can't always be determined statically, but when we can, it is good to
+// avoid generating compiler-added barriers which will deadlock the program.
+static bool opIsInSingleThread(mlir::Operation *op) {
+  while (mlir::Operation *parent = op->getParentOp()) {
+    if (mlir::isa<omp::SingleOp, omp::CriticalOp>(parent))
+      return true;
+
+    // e.g.
+    // omp.single {
+    //   omp.parallel {
+    //     op
+    //   }
+    // }
+    if (mlir::isa<omp::ParallelOp>(parent))
+      return false;
+  }
+  return false;
+}
+
 static LogicalResult copyFirstPrivateVars(
     mlir::Operation *op, llvm::IRBuilderBase &builder,
     LLVM::ModuleTranslation &moduleTranslation,
@@ -1731,7 +1750,7 @@ static LogicalResult copyFirstPrivateVars(
     moduleTranslation.forgetMapping(copyRegion);
   }
 
-  if (insertBarrier) {
+  if (insertBarrier && !opIsInSingleThread(op)) {
     llvm::OpenMPIRBuilder *ompBuilder = moduleTranslation.getOpenMPBuilder();
     llvm::OpenMPIRBuilder::InsertPointOrErrorTy res =
         ompBuilder->createBarrier(builder.saveIP(), llvm::omp::OMPD_barrier);
diff --git a/mlir/test/Target/LLVMIR/openmp-private-barrier-single.mlir b/mlir/test/Target/LLVMIR/openmp-private-barrier-single.mlir
new file mode 100644
index 0000000000000..1b7b1eeac6227
--- /dev/null
+++ b/mlir/test/Target/LLVMIR/openmp-private-barrier-single.mlir
@@ -0,0 +1,43 @@
+// RUN: mlir-translate -mlir-to-llvmir %s | FileCheck %s
+
+// tests that no privatization barrier is added for operations inside of SINGLE
+
+omp.private {type = private} @_QFwsloop_privateEi_private_ref_i32 : i32
+
+llvm.func @foo_free(!llvm.ptr)
+
+omp.private {type = firstprivate} @_QFwsloop_privateEc_firstprivate_ref_c8 : !llvm.array<1 x i8> copy {
+^bb0(%arg0: !llvm.ptr, %arg1: !llvm.ptr):
+  %0 = llvm.load %arg0 : !llvm.ptr -> !llvm.array<1 x i8>
+  llvm.store %0, %arg1 : !llvm.array<1 x i8>, !llvm.ptr
+  omp.yield(%arg1 : !llvm.ptr)
+} dealloc {
+^bb0(%arg0: !llvm.ptr):
+  llvm.call @foo_free(%arg0) : (!llvm.ptr) -> ()
+  omp.yield
+}
+
+llvm.func @wsloop_private_(%arg0: !llvm.ptr {fir.bindc_name = "y"}) attributes {fir.internal_name = "_QPwsloop_private", frame_pointer = #llvm.framePointerKind<all>, target_cpu = "x86-64"} {
+  %0 = llvm.mlir.constant(1 : i64) : i64
+  %1 = llvm.alloca %0 x f32 {bindc_name = "x"} : (i64) -> !llvm.ptr
+  %3 = llvm.alloca %0 x i32 {bindc_name = "i"} : (i64) -> !llvm.ptr
+  %5 = llvm.alloca %0 x !llvm.array<1 x i8> {bindc_name = "c"} : (i64) -> !llvm.ptr
+  %6 = llvm.mlir.constant(1 : i32) : i32
+  %7 = llvm.mlir.constant(10 : i32) : i32
+  %8 = llvm.mlir.constant(0 : i32) : i32
+  omp.parallel {
+    omp.single {
+      omp.wsloop private(@_QFwsloop_privateEc_firstprivate_ref_c8 %5 -> %arg1, @_QFwsloop_privateEi_private_ref_i32 %3 -> %arg2 : !llvm.ptr, !llvm.ptr) private_barrier {
+  // CHECK: omp.private.copy:
+  // CHECK-NOT: __kmpc_barrier
+  // CHECK: br label
+        omp.loop_nest (%arg4) : i32 = (%8) to (%7) inclusive step (%6) {
+          omp.yield
+        }
+      }
+      omp.terminator
+    }
+    omp.terminator
+  }
+  llvm.return
+}

``````````

</details>


https://github.com/llvm/llvm-project/pull/174105