[Mlir-commits] [llvm] [mlir] [OMPIRBuilder] - Fix emitTargetTaskProxyFunc to not generate empty functions (PR #126958)
llvmlistbot at llvm.org
llvmlistbot at llvm.org
Wed Feb 12 11:38:44 PST 2025
llvmbot wrote:
<!--LLVM PR SUMMARY COMMENT-->
@llvm/pr-subscribers-mlir-llvm
Author: Pranav Bhandarkar (bhandarkar-pranav)
<details>
<summary>Changes</summary>
This is a fix for https://github.com/llvm/llvm-project/issues/126949 There are two issues being fixed here. First, in some cases, OMPIRBuilder generates empty target task proxy functions. This happens when the target kernel doesn't use any stack-allocated data (either no data or only globals). The second problem is encountered when the target task i.e the code that makes the target call spans a single basic block. This usually happens when we do not generate a target or device kernel launch and instead fall back to the host. In such cases, we end up not outlining the target task entirely. This can cause us to call target kernel twice - once via the target task proxy function and a second time via the host fallback
This PR fixes both of these problems and updates some tests to catch these problems should this patch fail.
---
Full diff: https://github.com/llvm/llvm-project/pull/126958.diff
3 Files Affected:
- (modified) llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp (+28-7)
- (modified) mlir/test/Target/LLVMIR/omptarget-depend-host-only.mlir (+21)
- (modified) mlir/test/Target/LLVMIR/omptarget-nowait-host-only.mlir (+20)
``````````diff
diff --git a/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp b/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
index 91fc16e54c88f..69c76f7400904 100644
--- a/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
+++ b/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
@@ -7022,9 +7022,10 @@ static Function *emitTargetTaskProxyFunction(OpenMPIRBuilder &OMPBuilder,
Function *KernelLaunchFunction = StaleCI->getCalledFunction();
// StaleCI is the CallInst which is the call to the outlined
- // target kernel launch function. If there are values that the
- // outlined function uses then these are aggregated into a structure
- // which is passed as the second argument. If not, then there's
+ // target kernel launch function. If there are local live-in values
+ // that the outlined function uses then these are aggregated into a structure
+ // which is passed as the second argument. If there are no local live-in valluess
+ // or if all values used by the outlined kernel are global variables, then there's
// only one argument, the threadID. So, StaleCI can be
//
// %structArg = alloca { ptr, ptr }, align 8
@@ -7063,6 +7064,8 @@ static Function *emitTargetTaskProxyFunction(OpenMPIRBuilder &OMPBuilder,
// host and device.
assert((!HasShareds || (StaleCI->arg_size() == 2)) &&
"StaleCI with shareds should have exactly two arguments.");
+
+ Value *ThreadId = ProxyFn->getArg(0);
if (HasShareds) {
auto *ArgStructAlloca = dyn_cast<AllocaInst>(StaleCI->getArgOperand(1));
assert(ArgStructAlloca &&
@@ -7073,7 +7076,6 @@ static Function *emitTargetTaskProxyFunction(OpenMPIRBuilder &OMPBuilder,
AllocaInst *NewArgStructAlloca =
Builder.CreateAlloca(ArgStructType, nullptr, "structArg");
Value *TaskT = ProxyFn->getArg(1);
- Value *ThreadId = ProxyFn->getArg(0);
Value *SharedsSize =
Builder.getInt64(M.getDataLayout().getTypeStoreSize(ArgStructType));
@@ -7086,7 +7088,9 @@ static Function *emitTargetTaskProxyFunction(OpenMPIRBuilder &OMPBuilder,
LoadShared->getPointerAlignment(M.getDataLayout()), SharedsSize);
Builder.CreateCall(KernelLaunchFunction, {ThreadId, NewArgStructAlloca});
- }
+ } else
+ Builder.CreateCall(KernelLaunchFunction, {ThreadId});
+
Builder.CreateRetVoid();
return ProxyFn;
}
@@ -7229,11 +7233,28 @@ OpenMPIRBuilder::InsertPointOrErrorTy OpenMPIRBuilder::emitTargetTask(
Builder, AllocaIP, ToBeDeleted, TargetTaskAllocaIP, "global.tid", false));
Builder.restoreIP(TargetTaskBodyIP);
-
if (Error Err = TaskBodyCB(DeviceID, RTLoc, TargetTaskAllocaIP))
return Err;
- OI.ExitBB = Builder.saveIP().getBlock();
+ // The outliner (CodeExtractor) extract a sequence or vector of blocks that
+ // it is given. These blocks are enumerated by
+ // OpenMPIRBuilder::OutlineInfo::collectBlocks which expects the OI.ExitBlock
+ // to be outside the region. In other words, OI.ExitBlock is expected to be
+ // the start of the region after the outlining. We used to set OI.ExitBlock
+ // to the InsertBlock after TaskBodyCB is done. This is fine in most cases
+ // except when the task body is a single basic block. In that case,
+ // OI.ExitBlock is set to the single task body block and will get left out of
+ // the outlining process. So, simply create a new empty block to which we
+ // uncoditionally branch from where TaskBodyCB left off
+ BasicBlock *TargetTaskContBlock =
+ BasicBlock::Create(Builder.getContext(), "target.task.cont");
+
+ auto *CurFn = Builder.GetInsertBlock()->getParent();
+ emitBranch(TargetTaskContBlock);
+ emitBlock(TargetTaskContBlock, CurFn, /*IsFinished=*/true);
+
+ OI.ExitBB = TargetTaskContBlock;
+
OI.PostOutlineCB = [this, ToBeDeleted, Dependencies, HasNoWait,
DeviceID](Function &OutlinedFn) mutable {
assert(OutlinedFn.getNumUses() == 1 &&
diff --git a/mlir/test/Target/LLVMIR/omptarget-depend-host-only.mlir b/mlir/test/Target/LLVMIR/omptarget-depend-host-only.mlir
index 621a206e18053..ece32bb5419c6 100644
--- a/mlir/test/Target/LLVMIR/omptarget-depend-host-only.mlir
+++ b/mlir/test/Target/LLVMIR/omptarget-depend-host-only.mlir
@@ -25,8 +25,29 @@ module attributes {omp.is_target_device = false} {
// CHECK: define void @omp_target_depend_()
// CHECK-NOT: define {{.*}} @
// CHECK-NOT: call i32 @__tgt_target_kernel({{.*}})
+// CHECK: call void @__kmpc_omp_task_begin_if0
+// CHECK-NEXT: call void @.omp_target_task_proxy_func
+// CHECK: call void @__kmpc_omp_task_complete_if0
+// https://github.com/llvm/llvm-project/issues/126949 exposes two issues
+// 1. Empty target task proxy functions
+// 2. When 1 fixed, it leads to a second problem of calling the omp target kernel twice
+// Once via the target task proxy function and a second time after the target task is done.
+// The following checks check problem #2.
+// functions. The following checks tests the fix for this issue.
+// CHECK-NEXT: br label %[[BLOCK_AFTER_OUTLINED_TARGET_TASK_BODY:.*]]
+// CHECK:[[BLOCK_AFTER_OUTLINED_TARGET_TASK_BODY]]:
+// CHECK-NEXT: ret void
+
+// CHECK: define internal void @omp_target_depend_..omp_par
// CHECK: call void @__omp_offloading_[[DEV:.*]]_[[FIL:.*]]_omp_target_depend__l[[LINE:.*]](ptr {{.*}})
+// CHECK-NEXT: br label %[[BLOCK_AFTER_TARGET_TASK_BODY:.*]]
+// CHECK: [[BLOCK_AFTER_TARGET_TASK_BODY]]:
// CHECK-NEXT: ret void
+
// CHECK: define internal void @__omp_offloading_[[DEV]]_[[FIL]]_omp_target_depend__l[[LINE]](ptr %[[ADDR_A:.*]])
// CHECK: store i32 100, ptr %[[ADDR_A]], align 4
+
+// The following check test for the fix of problem #1 as described in https://github.com/llvm/llvm-project/issues/126949
+// CHECK: define internal void @.omp_target_task_proxy_func
+// CHECK: call void @omp_target_depend_..omp_par
diff --git a/mlir/test/Target/LLVMIR/omptarget-nowait-host-only.mlir b/mlir/test/Target/LLVMIR/omptarget-nowait-host-only.mlir
index 6b634226a3568..94d8d052d087e 100644
--- a/mlir/test/Target/LLVMIR/omptarget-nowait-host-only.mlir
+++ b/mlir/test/Target/LLVMIR/omptarget-nowait-host-only.mlir
@@ -20,10 +20,30 @@ module attributes {omp.is_target_device = false} {
// CHECK: define void @omp_target_nowait_()
// CHECK-NOT: define {{.*}} @
// CHECK-NOT: call ptr @__kmpc_omp_target_task_alloc({{.*}})
+// CHECK: call void @__kmpc_omp_task_begin_if0
+// CHECK-NEXT: call void @.omp_target_task_proxy_func
+// CHECK: call void @__kmpc_omp_task_complete_if0
+// https://github.com/llvm/llvm-project/issues/126949 exposes two issues
+// 1. Empty target task proxy functions
+// 2. When 1 fixed, it leads to a second problem of calling the omp target kernel twice
+// Once via the target task proxy function and a second time after the target task is done.
+// The following checks check problem #2.
+// functions. The following checks tests the fix for this issue.
+// CHECK-NEXT: br label %[[BLOCK_AFTER_OUTLINED_TARGET_TASK_BODY:.*]]
+// CHECK:[[BLOCK_AFTER_OUTLINED_TARGET_TASK_BODY]]:
+// CHECK-NEXT: ret void
+
// Verify that we directly emit a call to the "target" region's body from the
// parent function of the the `omp.target` op.
+// CHECK: define internal void @omp_target_nowait_..omp_par
// CHECK: call void @__omp_offloading_[[DEV:.*]]_[[FIL:.*]]_omp_target_nowait__l[[LINE:.*]](ptr {{.*}})
+// CHECK-NEXT: br label %[[BLOCK_AFTER_TARGET_TASK_BODY:.*]]
+// CHECK: [[BLOCK_AFTER_TARGET_TASK_BODY]]:
// CHECK-NEXT: ret void
// CHECK: define internal void @__omp_offloading_[[DEV]]_[[FIL]]_omp_target_nowait__l[[LINE]](ptr %[[ADDR_X:.*]])
// CHECK: store float 5{{.*}}, ptr %[[ADDR_X]], align 4
+
+// The following check test for the fix of problem #1 as described in https://github.com/llvm/llvm-project/issues/126949
+// CHECK: define internal void @.omp_target_task_proxy_func
+// CHECK: call void @omp_target_nowait_..omp_par
``````````
</details>
https://github.com/llvm/llvm-project/pull/126958
More information about the Mlir-commits
mailing list