[llvm-branch-commits] [mlir] [mlir][OpenMP] Translate explicit task in_reduction (PR #202611)
Sairudra More via llvm-branch-commits
llvm-branch-commits at lists.llvm.org
Tue Jun 9 06:33:08 PDT 2026
https://github.com/Saieiei created https://github.com/llvm/llvm-project/pull/202611
This patch lowers `in_reduction` on explicit `omp.task` operations to LLVM IR.
Inside the outlined task body, the lowering now obtains the executing thread's gtid and calls `__kmpc_task_reduction_get_th_data` with a null descriptor and the original reduction variable address. This lets the runtime walk the enclosing taskgroup reduction registrations and return the per-task private reduction storage. The `in_reduction` block arguments are then remapped to that private storage, so updates in the explicit task body target the task-private reduction copy rather than the original shared variable.
This complements the existing `task_reduction` support for `omp.taskgroup` and builds on the task reduction infrastructure introduced in PR #199565 for `omp.taskloop` reductions, reusing the same runtime model and `__kmpc_task_reduction_get_th_data`-based translation path.
Unsupported cases remain guarded:
- byref `in_reduction`
- richer `declare_reduction` forms such as two-argument initializers, cleanup regions, and missing combiners
---
**Note for reviewers:** This is a *stacked* PR on top of #199670. Its base branch is the upstream user branch `users/saieiei/taskloop-reduction` (the head of #199670), not `main`, so the diff here shows only the explicit-task `in_reduction` change. Please review it after / together with #199670, which it depends on. Once #199670 merges, this branch will be rebased onto `main` and retargeted accordingly.
>From c43410e95ba24223b05e03073539621709dc8dc3 Mon Sep 17 00:00:00 2001
From: Sairudra More <sairudra60 at gmail.com>
Date: Tue, 9 Jun 2026 02:44:30 -0500
Subject: [PATCH] [mlir][OpenMP] Translate explicit task in_reduction
Lower in_reduction on explicit omp.task. Inside the outlined task body,
look up the task-private reduction storage with
__kmpc_task_reduction_get_th_data using a null descriptor, matching the
runtime model where the enclosing taskgroup owns the task_reduction
registration.
The in_reduction block arguments are remapped to the returned private
storage so task-body updates target the private reduction copy instead
of the original shared variable.
Byref in_reduction remains guarded by checkImplementationStatus, and
unsupported declare_reduction forms remain rejected.
---
.../OpenMP/OpenMPToLLVMIRTranslation.cpp | 53 ++++++++-
.../LLVMIR/openmp-task-in-reduction.mlir | 104 ++++++++++++++++++
mlir/test/Target/LLVMIR/openmp-todo.mlir | 6 +-
3 files changed, 159 insertions(+), 4 deletions(-)
create mode 100644 mlir/test/Target/LLVMIR/openmp-task-in-reduction.mlir
diff --git a/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp b/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
index 03cc2505cbbd8..cf1ae35ccd5e0 100644
--- a/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
+++ b/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp
@@ -441,7 +441,7 @@ static LogicalResult checkImplementationStatus(Operation &op) {
})
.Case([&](omp::TaskOp op) {
checkAllocate(op, result);
- checkInReduction(op, result);
+ checkInReductionByref(op, result);
})
.Case([&](omp::TaskgroupOp op) {
checkAllocate(op, result);
@@ -3069,6 +3069,22 @@ convertOmpTaskOp(omp::TaskOp taskOp, llvm::IRBuilderBase &builder,
if (failed(buildAffinityData(taskOp, builder, moduleTranslation, ad)))
return llvm::failure();
+ // Resolve and validate in_reduction declarations. Byref in_reduction has
+ // already been rejected by checkImplementationStatus; the helper rejects the
+ // remaining richer declare_reduction shapes (two-argument initializer,
+ // cleanup region, missing combiner). This is pure MLIR symbol-table work and
+ // emits no IR. The matching task_reduction descriptor is registered by an
+ // enclosing taskgroup; here we only look the per-task storage up at runtime.
+ SmallVector<omp::DeclareReductionOp> inRedDecls;
+ if (failed(collectAndValidateTaskloopRedDecls(
+ taskOp.getOperation(), taskOp.getInReductionSyms(), "omp.task",
+ "in_reduction", inRedDecls)))
+ return failure();
+ SmallVector<llvm::Value *> inRedOrigPtrs;
+ inRedOrigPtrs.reserve(inRedDecls.size());
+ for (Value v : taskOp.getInReductionVars())
+ inRedOrigPtrs.push_back(moduleTranslation.lookupValue(v));
+
// Set up for call to createTask()
builder.SetInsertPoint(taskStartBlock);
@@ -3138,6 +3154,41 @@ convertOmpTaskOp(omp::TaskOp taskOp, llvm::IRBuilderBase &builder,
moduleTranslation.mapValue(blockArg, llvmPrivateVar);
}
+ // Map in_reduction block arguments to the per-task private storage returned
+ // by __kmpc_task_reduction_get_th_data. This call must be emitted inside
+ // the to-be-outlined task body so that it returns the *executing* thread's
+ // gtid (not the encountering thread's). The descriptor is NULL: the runtime
+ // walks up enclosing taskgroups to find the matching task_reduction
+ // registration for `origPtr`. The original pointers are auto-captured into
+ // the task shareds aggregate by CodeExtractor during
+ // OpenMPIRBuilder::finalize.
+ if (!inRedDecls.empty()) {
+ auto iface = cast<omp::BlockArgOpenMPOpInterface>(taskOp.getOperation());
+ llvm::OpenMPIRBuilder &ompB = *moduleTranslation.getOpenMPBuilder();
+ llvm::Module *m = moduleTranslation.getLLVMModule();
+ llvm::LLVMContext &llvmCtx = m->getContext();
+ llvm::OpenMPIRBuilder::LocationDescription bodyLoc(builder);
+ uint32_t srcLocSize;
+ llvm::Constant *srcLocStr =
+ ompB.getOrCreateSrcLocStr(bodyLoc, srcLocSize);
+ llvm::Value *bodyIdent = ompB.getOrCreateIdent(srcLocStr, srcLocSize);
+ // Align OpenMPIRBuilder's internal IRBuilder with `builder` so the gtid
+ // call lands inside the to-be-outlined task body.
+ ompB.updateToLocation(bodyLoc);
+ llvm::Value *bodyGtid = ompB.getOrCreateThreadID(bodyIdent);
+ llvm::FunctionCallee getThData = ompB.getOrCreateRuntimeFunction(
+ *m, llvm::omp::OMPRTL___kmpc_task_reduction_get_th_data);
+ llvm::Type *ptrTy = llvm::PointerType::getUnqual(llvmCtx);
+ llvm::Value *nullDesc = llvm::ConstantPointerNull::get(ptrTy);
+ ArrayRef<BlockArgument> inRedBlockArgs = iface.getInReductionBlockArgs();
+ for (auto [blockArg, origPtr] :
+ llvm::zip_equal(inRedBlockArgs, inRedOrigPtrs)) {
+ llvm::Value *priv = builder.CreateCall(
+ getThData, {bodyGtid, nullDesc, origPtr}, "omp.inred.priv");
+ moduleTranslation.mapValue(blockArg, priv);
+ }
+ }
+
auto continuationBlockOrError = convertOmpOpRegions(
taskOp.getRegion(), "omp.task.region", builder, moduleTranslation);
if (failed(handleError(continuationBlockOrError, *taskOp)))
diff --git a/mlir/test/Target/LLVMIR/openmp-task-in-reduction.mlir b/mlir/test/Target/LLVMIR/openmp-task-in-reduction.mlir
new file mode 100644
index 0000000000000..2ed81f3d42389
--- /dev/null
+++ b/mlir/test/Target/LLVMIR/openmp-task-in-reduction.mlir
@@ -0,0 +1,104 @@
+// RUN: mlir-translate -mlir-to-llvmir -split-input-file %s | FileCheck %s
+
+// in_reduction on an explicit omp.task. Unlike taskgroup task_reduction, the
+// task does not register a reduction; it participates in a reduction declared
+// by an enclosing taskgroup. The lowering must, inside the outlined task body:
+// 1. Obtain the executing thread's gtid via __kmpc_global_thread_num;
+// 2. Look up the per-task private storage via
+// __kmpc_task_reduction_get_th_data(gtid, null, orig) -- the NULL
+// descriptor makes the runtime walk up enclosing taskgroups to find the
+// matching task_reduction registration for `orig`;
+// 3. Use the returned private pointer for all updates in the task body, never
+// the original shared variable.
+
+omp.declare_reduction @add_i32 : i32
+init {
+^bb0(%arg0: i32):
+ %c0 = llvm.mlir.constant(0 : i32) : i32
+ omp.yield(%c0 : i32)
+}
+combiner {
+^bb0(%arg0: i32, %arg1: i32):
+ %s = llvm.add %arg0, %arg1 : i32
+ omp.yield(%s : i32)
+}
+
+llvm.func @task_in_reduction_single(%x : !llvm.ptr) {
+ omp.task in_reduction(@add_i32 %x -> %prv : !llvm.ptr) {
+ %v = llvm.load %prv : !llvm.ptr -> i32
+ %c1 = llvm.mlir.constant(1 : i32) : i32
+ %s = llvm.add %v, %c1 : i32
+ llvm.store %s, %prv : i32, !llvm.ptr
+ omp.terminator
+ }
+ llvm.return
+}
+
+// The encountering function must NOT register a reduction: no taskgroup, no
+// descriptor array, and no __kmpc_taskred_init for in_reduction.
+// CHECK-LABEL: define void @task_in_reduction_single(
+// CHECK-NOT: @__kmpc_taskred_init
+// CHECK-NOT: @__kmpc_taskgroup
+
+// Outlined task body looks up per-task storage via the runtime with a NULL
+// descriptor, and updates that private storage (not the original pointer).
+// CHECK-LABEL: define internal void @task_in_reduction_single..omp_par(
+// CHECK: %[[BODY_ORIG:.+]] = load ptr, ptr %{{.+}}, align {{.+}}, !align
+// CHECK: %[[BODY_GTID:.+]] = call i32 @__kmpc_global_thread_num(
+// CHECK: %[[PRIV:.+]] = call ptr @__kmpc_task_reduction_get_th_data(i32 %[[BODY_GTID]], ptr null, ptr %[[BODY_ORIG]])
+// CHECK: %[[LD:.+]] = load i32, ptr %[[PRIV]]
+// CHECK: %[[ADD:.+]] = add i32 %[[LD]], 1
+// CHECK: store i32 %[[ADD]], ptr %[[PRIV]]
+
+// -----
+
+// Multiple in_reduction items: the body issues one
+// __kmpc_task_reduction_get_th_data per item, each with a NULL descriptor.
+
+omp.declare_reduction @add_i32 : i32
+init {
+^bb0(%arg0: i32):
+ %c0 = llvm.mlir.constant(0 : i32) : i32
+ omp.yield(%c0 : i32)
+}
+combiner {
+^bb0(%arg0: i32, %arg1: i32):
+ %s = llvm.add %arg0, %arg1 : i32
+ omp.yield(%s : i32)
+}
+
+llvm.func @task_in_reduction_multi(%x : !llvm.ptr, %y : !llvm.ptr) {
+ omp.task in_reduction(@add_i32 %x -> %px, @add_i32 %y -> %py : !llvm.ptr, !llvm.ptr) {
+ %vx = llvm.load %px : !llvm.ptr -> i32
+ %c1 = llvm.mlir.constant(1 : i32) : i32
+ %sx = llvm.add %vx, %c1 : i32
+ llvm.store %sx, %px : i32, !llvm.ptr
+ %vy = llvm.load %py : !llvm.ptr -> i32
+ %c2 = llvm.mlir.constant(2 : i32) : i32
+ %sy = llvm.add %vy, %c2 : i32
+ llvm.store %sy, %py : i32, !llvm.ptr
+ omp.terminator
+ }
+ llvm.return
+}
+
+// CHECK-LABEL: define internal void @task_in_reduction_multi..omp_par(
+// CHECK: call ptr @__kmpc_task_reduction_get_th_data(i32 %{{.+}}, ptr null, ptr %{{.+}})
+// CHECK: call ptr @__kmpc_task_reduction_get_th_data(i32 %{{.+}}, ptr null, ptr %{{.+}})
+
+// -----
+
+// Regression: a plain omp.task with no in_reduction must not emit any
+// __kmpc_task_reduction_get_th_data call.
+
+llvm.func @task_plain(%x : !llvm.ptr) {
+ omp.task {
+ %c1 = llvm.mlir.constant(1 : i32) : i32
+ llvm.store %c1, %x : i32, !llvm.ptr
+ omp.terminator
+ }
+ llvm.return
+}
+
+// CHECK-LABEL: define void @task_plain(
+// CHECK-NOT: @__kmpc_task_reduction_get_th_data
diff --git a/mlir/test/Target/LLVMIR/openmp-todo.mlir b/mlir/test/Target/LLVMIR/openmp-todo.mlir
index 5c22f7f081bb5..3d760f95c7ebc 100644
--- a/mlir/test/Target/LLVMIR/openmp-todo.mlir
+++ b/mlir/test/Target/LLVMIR/openmp-todo.mlir
@@ -262,10 +262,10 @@ atomic {
llvm.atomicrmw fadd %arg2, %2 monotonic : !llvm.ptr, f32
omp.yield
}
-llvm.func @task_in_reduction(%x : !llvm.ptr) {
- // expected-error at below {{not yet implemented: Unhandled clause in_reduction in omp.task operation}}
+llvm.func @task_in_reduction_byref(%x : !llvm.ptr) {
+ // expected-error at below {{not yet implemented: Unhandled clause in_reduction with byref modifier in omp.task operation}}
// expected-error at below {{LLVM Translation failed for operation: omp.task}}
- omp.task in_reduction(@add_f32 %x -> %prv : !llvm.ptr) {
+ omp.task in_reduction(byref @add_f32 %x -> %prv : !llvm.ptr) {
omp.terminator
}
llvm.return
More information about the llvm-branch-commits
mailing list