[Mlir-commits] [llvm] [mlir] [OMPIRBuilder] - Make offloading input data persist for deferred target tasks (PR #133499)
llvmlistbot at llvm.org
Fri Mar 28 11:10:34 PDT 2025
llvmbot wrote:
@llvm/pr-subscribers-mlir-llvm
Author: Pranav Bhandarkar (bhandarkar-pranav)
<details>
<summary>Changes</summary>
When we offload to the target, the pointers to the data used by the kernel are passed in arrays created by `OMPIRBuilder`. These arrays of pointers are allocated on the host's stack. This is fine for the most part because, absent the `nowait` clause, target tasks are included tasks by default. That is, the host is blocked until the offloaded target kernel is done, so the host's stack frame is intact and accessing the arrays of pointers when offloading is safe. However, when `nowait` is used on the `!$omp target` construct, the target task is a deferred task, meaning that the generating task on the host does not have to wait for the target kernel to finish. In such cases, the stack frame of the function invoking the target call has very likely been unwound by the time the target task runs, leading to memory access errors as shown below.
```
AMDGPU error: Error in hsa_amd_memory_pool_allocate: HSA_STATUS_ERROR_INVALID_ALLOCATION: The requested allocation is not valid.
AMDGPU error: Error in hsa_amd_memory_pool_allocate: HSA_STATUS_ERROR_INVALID_ALLOCATION: The requested allocation is not valid.
"PluginInterface" error: Failure to allocate device memory: Failed to allocate from memory manager
fort.cod.out: /llvm/llvm-project/offload/plugins-nextgen/common/src/PluginInterface.cpp:1434: Error llvm::omp::target::plugin::PinnedAllocationMapTy::lockMappedHostBuffer(void *, size_t): Assertion `HstPtr && "Invalid pointer"' failed.
Aborted (core dumped)
```
This PR implements support in `OMPIRBuilder` for storing these arrays of pointers in the task structure that is passed to the target task, thereby ensuring that they are available to the target task when it is eventually scheduled.
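The lifetime issue and the fix can be illustrated outside of LLVM with a minimal C++ sketch. Everything here is hypothetical: `OffloadArrays`, `DeferredTasks`, `launchDeferred`, and `runDeferredTasks` are stand-ins for the offloading arrays and the OpenMP runtime's deferred-task queue, not actual `OMPIRBuilder` or runtime APIs. The point is only that a deferred task must own a copy of the arrays rather than point into the generating function's stack frame.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical stand-in for the offloading arrays (base pointers, pointers,
// sizes). These are not the OMPIRBuilder types.
struct OffloadArrays {
  std::array<void *, 2> BasePtrs;
  std::array<void *, 2> Ptrs;
  std::array<std::int64_t, 2> Sizes;
};

// Hypothetical deferred-task queue standing in for the OpenMP runtime.
std::vector<std::function<std::int64_t()>> DeferredTasks;

// Models a `nowait` target region: the task is queued and this function
// returns before the task runs.
void launchDeferred(int *A, int *B) {
  constexpr std::int64_t ElemSize = sizeof(int);
  OffloadArrays Local{{{A, B}}, {{A, B}}, {{ElemSize, ElemSize}}};
  // Capturing `Local` by value copies the arrays into the task object, so
  // they outlive this stack frame. Capturing by reference would leave the
  // task with dangling pointers once this function returns.
  DeferredTasks.push_back([Local]() -> std::int64_t {
    return Local.Sizes[0] + Local.Sizes[1] +
           (Local.BasePtrs[0] == Local.Ptrs[0] ? 1 : 0);
  });
}

// Runs the queued tasks later, after the generating frames are gone.
std::int64_t runDeferredTasks() {
  std::int64_t Sum = 0;
  for (auto &Task : DeferredTasks)
    Sum += Task();
  DeferredTasks.clear();
  return Sum;
}
```

In this analogy, the by-value capture plays the role of the privates struct embedded in the task allocation, and the by-reference capture corresponds to the pre-patch behavior of referencing host stack allocas.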
---
Patch is 31.32 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/133499.diff
4 Files Affected:
- (modified) llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h (+1-1)
- (modified) llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp (+254-68)
- (modified) mlir/test/Target/LLVMIR/omptarget-depend.mlir (+2-1)
- (modified) mlir/test/Target/LLVMIR/omptargetdata-nowait-llvm.mlir (+15-30)
``````````diff
diff --git a/llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h b/llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h
index 28909cef4748d..ac814defbf071 100644
--- a/llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h
+++ b/llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h
@@ -2481,7 +2481,7 @@ class OpenMPIRBuilder {
TargetTaskBodyCallbackTy TaskBodyCB, Value *DeviceID, Value *RTLoc,
OpenMPIRBuilder::InsertPointTy AllocaIP,
const SmallVector<llvm::OpenMPIRBuilder::DependData> &Dependencies,
- bool HasNoWait);
+ TargetDataRTArgs &RTArgs, bool HasNoWait);
/// Emit the arguments to be passed to the runtime library based on the
/// arrays of base pointers, pointers, sizes, map types, and mappers. If
diff --git a/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp b/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
index 2e5ce5308eea5..9bb9dd65594bf 100644
--- a/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
+++ b/llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp
@@ -6603,7 +6603,7 @@ OpenMPIRBuilder::InsertPointOrErrorTy OpenMPIRBuilder::createTargetData(
/*TargetTaskAllocaIP=*/{}));
else
cantFail(emitTargetTask(TaskBodyCB, DeviceID, SrcLocInfo, AllocaIP,
- /*Dependencies=*/{}, Info.HasNoWait));
+ /*Dependencies=*/{}, RTArgs, Info.HasNoWait));
} else {
Function *BeginMapperFunc = getOrCreateRuntimeFunctionPtr(
omp::OMPRTL___tgt_target_data_begin_mapper);
@@ -7051,9 +7051,25 @@ static Expected<Function *> createOutlinedFunction(
/// void @.omp_target_task_proxy_func(i32 %thread.id, ptr %task)
/// This function is called from emitTargetTask once the
/// code to launch the target kernel has been outlined already.
-static Function *emitTargetTaskProxyFunction(OpenMPIRBuilder &OMPBuilder,
- IRBuilderBase &Builder,
- CallInst *StaleCI) {
+static Function *emitTargetTaskProxyFunction(
+ OpenMPIRBuilder &OMPBuilder, IRBuilderBase &Builder, CallInst *StaleCI,
+ StructType *PrivatesTy, StructType *TaskWithPrivatesTy,
+ const size_t NumOffloadingArrays, const int SharedArgsOperandNo) {
+
+ // NumOffloadingArrays is the number of offloading arrays that we need to copy
+ // into the task structure so that the deferred target task can access this
+ // data even after the stack frame of the generating task has been rolled
+ // back. Offloading arrays contain base pointers, pointers, sizes etc
+ // of the data that the target kernel will access. In other words, the
+ // arrays of pointers held by OpenMPIRBuilder::TargetDataRTArgs
+ // The number of arrays and the size of each array depends on the specifics of
+ // the target call. These arrays are copied into a struct whose type is
+ // PrivatesTy. So, if NumOffloadingArrays is non-zero, PrivatesTy better
+ // not be nullptr
+ assert((!NumOffloadingArrays || PrivatesTy) &&
+ "PrivatesTy cannot be nullptr when there are offloadingArrays"
+ "to privatize");
+
Module &M = OMPBuilder.M;
// KernelLaunchFunction is the target launch function, i.e.
// the function that sets up kernel arguments and calls
@@ -7080,10 +7096,13 @@ static Function *emitTargetTaskProxyFunction(OpenMPIRBuilder &OMPBuilder,
// call void @_QQmain..omp_par.1(i32 %global.tid.val6)
OpenMPIRBuilder::InsertPointTy IP(StaleCI->getParent(),
StaleCI->getIterator());
+
LLVMContext &Ctx = StaleCI->getParent()->getContext();
+
Type *ThreadIDTy = Type::getInt32Ty(Ctx);
Type *TaskPtrTy = OMPBuilder.TaskPtr;
Type *TaskTy = OMPBuilder.Task;
+
auto ProxyFnTy =
FunctionType::get(Builder.getVoidTy(), {ThreadIDTy, TaskPtrTy},
/* isVarArg */ false);
@@ -7093,21 +7112,33 @@ static Function *emitTargetTaskProxyFunction(OpenMPIRBuilder &OMPBuilder,
ProxyFn->getArg(0)->setName("thread.id");
ProxyFn->getArg(1)->setName("task");
+ bool HasShareds = SharedArgsOperandNo > 0;
+ bool HasOffloadingArrays = NumOffloadingArrays > 0;
BasicBlock *EntryBB =
BasicBlock::Create(Builder.getContext(), "entry", ProxyFn);
Builder.SetInsertPoint(EntryBB);
- bool HasShareds = StaleCI->arg_size() > 1;
- // TODO: This is a temporary assert to prove to ourselves that
- // the outlined target launch function is always going to have
- // atmost two arguments if there is any data shared between
- // host and device.
- assert((!HasShareds || (StaleCI->arg_size() == 2)) &&
- "StaleCI with shareds should have exactly two arguments.");
-
Value *ThreadId = ProxyFn->getArg(0);
+ Value *TaskWithPrivates = ProxyFn->getArg(1);
+
+ SmallVector<Value *> KernelLaunchArgs;
+ KernelLaunchArgs.reserve(StaleCI->arg_size());
+ KernelLaunchArgs.push_back(ThreadId);
+
+ if (HasOffloadingArrays) {
+ assert(TaskTy != TaskWithPrivatesTy &&
+ "If there are offloading arrays to pass to the target"
+ "TaskTy cannot be the same as TaskWithPrivatesTy");
+ Value *Privates =
+ Builder.CreateStructGEP(TaskWithPrivatesTy, TaskWithPrivates, 1);
+ for (unsigned int i = 0; i < NumOffloadingArrays; ++i)
+ KernelLaunchArgs.push_back(
+ Builder.CreateStructGEP(PrivatesTy, Privates, i));
+ }
+
if (HasShareds) {
- auto *ArgStructAlloca = dyn_cast<AllocaInst>(StaleCI->getArgOperand(1));
+ auto *ArgStructAlloca =
+ dyn_cast<AllocaInst>(StaleCI->getArgOperand(SharedArgsOperandNo));
assert(ArgStructAlloca &&
"Unable to find the alloca instruction corresponding to arguments "
"for extracted function");
@@ -7115,27 +7146,76 @@ static Function *emitTargetTaskProxyFunction(OpenMPIRBuilder &OMPBuilder,
AllocaInst *NewArgStructAlloca =
Builder.CreateAlloca(ArgStructType, nullptr, "structArg");
- Value *TaskT = ProxyFn->getArg(1);
+
Value *SharedsSize =
Builder.getInt64(M.getDataLayout().getTypeStoreSize(ArgStructType));
- Value *Shareds = Builder.CreateStructGEP(TaskTy, TaskT, 0);
+ Value *TaskT =
+ Builder.CreateStructGEP(TaskWithPrivatesTy, TaskWithPrivates, 0);
+ Value *Shareds = TaskT;
+ // TaskWithPrivatesTy can be
+ // %struct.task_with_privates = type { %struct.kmp_task_ompbuilder_t,
+ // %struct.privates }
+ // OR
+ // %struct.kmp_task_ompbuilder_t ;; This is simply TaskTy
+ // In the former case, that is when TaskWithPrivatesTy is not the same as
+ // TaskTy, then its first member has to be the task descriptor. TaskTy is
+ // the type of the task descriptor. TaskT is the pointer to the task
+ // descriptor. Loading the first member of TaskT, gives us the pointer to
+ // shared data.
+ if (TaskWithPrivatesTy != TaskTy)
+ Shareds = Builder.CreateStructGEP(TaskTy, TaskT, 0);
LoadInst *LoadShared =
Builder.CreateLoad(PointerType::getUnqual(Ctx), Shareds);
Builder.CreateMemCpy(
NewArgStructAlloca, NewArgStructAlloca->getAlign(), LoadShared,
LoadShared->getPointerAlignment(M.getDataLayout()), SharedsSize);
-
- Builder.CreateCall(KernelLaunchFunction, {ThreadId, NewArgStructAlloca});
- } else {
- Builder.CreateCall(KernelLaunchFunction, {ThreadId});
+ KernelLaunchArgs.push_back(NewArgStructAlloca);
}
-
+ Builder.CreateCall(KernelLaunchFunction, KernelLaunchArgs);
Builder.CreateRetVoid();
return ProxyFn;
}
+// This function returns a struct that has at most two members.
+// The first member is always %struct.kmp_task_ompbuilder_t, that is the task
+// descriptor. The second member, if needed, is a struct containing arrays
+// that need to be passed to the offloaded target kernel. For example,
+// if .offload_baseptrs, .offload_ptrs and .offload_sizes have to be passed to
+// the target kernel and their types are [3 x ptr], [3 x ptr] and [3 x i64]
+// respectively, then the types created by this function are
+//
+// %struct.privates = type { [3 x ptr], [3 x ptr], [3 x i64] }
+// %struct.task_with_privates = type { %struct.kmp_task_ompbuilder_t,
+// %struct.privates }
+// %struct.task_with_privates is returned by this function.
+// If there aren't any offloading arrays to pass to the target kernel,
+// %struct.kmp_task_ompbuilder_t is returned.
+static StructType *
+createTaskWithPrivatesTy(Type *Task,
+ ArrayRef<Value *> OffloadingArraysToPrivatize) {
+
+ if (OffloadingArraysToPrivatize.empty())
+ return static_cast<StructType *>(Task);
+
+ SmallVector<Type *, 4> StructFieldTypes;
+ for (auto &V : OffloadingArraysToPrivatize) {
+ assert(V->getType()->isPointerTy() &&
+ "Expected pointer to array to privatize. Got a non-pointer value "
+ "instead");
+ if (auto *GEP = dyn_cast<GetElementPtrInst>(V))
+ StructFieldTypes.push_back(GEP->getSourceElementType());
+ else if (auto *Alloca = dyn_cast<AllocaInst>(V))
+ StructFieldTypes.push_back(Alloca->getAllocatedType());
+ else
+ llvm_unreachable("Unhandled Instruction type");
+ }
+ StructType *PrivatesStructTy =
+ StructType::create(StructFieldTypes, "struct.privates");
+ return StructType::create({Task, PrivatesStructTy},
+ "struct.task_with_privates");
+}
static Error emitTargetOutlinedFunction(
OpenMPIRBuilder &OMPBuilder, IRBuilderBase &Builder, bool IsOffloadEntry,
TargetRegionEntryInfo &EntryInfo,
@@ -7161,7 +7241,7 @@ OpenMPIRBuilder::InsertPointOrErrorTy OpenMPIRBuilder::emitTargetTask(
TargetTaskBodyCallbackTy TaskBodyCB, Value *DeviceID, Value *RTLoc,
OpenMPIRBuilder::InsertPointTy AllocaIP,
const SmallVector<llvm::OpenMPIRBuilder::DependData> &Dependencies,
- bool HasNoWait) {
+ TargetDataRTArgs &RTArgs, bool HasNoWait) {
// The following explains the code-gen scenario for the `target` directive. A
// similar scneario is followed for other device-related directives (e.g.
@@ -7171,27 +7251,30 @@ OpenMPIRBuilder::InsertPointOrErrorTy OpenMPIRBuilder::emitTargetTask(
// When we arrive at this function, the target region itself has been
// outlined into the function OutlinedFn.
// So at ths point, for
- // --------------------------------------------------
+ // --------------------------------------------------------------
// void user_code_that_offloads(...) {
- // omp target depend(..) map(from:a) map(to:b, c)
- // a = b + c
+ // omp target depend(..) map(from:a) map(to:b) private(i)
+ // do i = 1, 10
+ // a(i) = b(i) + n
// }
//
- // --------------------------------------------------
+ // --------------------------------------------------------------
//
// we have
//
- // --------------------------------------------------
+ // --------------------------------------------------------------
//
// void user_code_that_offloads(...) {
- // %.offload_baseptrs = alloca [3 x ptr], align 8
- // %.offload_ptrs = alloca [3 x ptr], align 8
- // %.offload_mappers = alloca [3 x ptr], align 8
+ // %.offload_baseptrs = alloca [2 x ptr], align 8
+ // %.offload_ptrs = alloca [2 x ptr], align 8
+ // %.offload_mappers = alloca [2 x ptr], align 8
// ;; target region has been outlined and now we need to
// ;; offload to it via a target task.
// }
- // void outlined_device_function(ptr a, ptr b, ptr c) {
- // *a = *b + *c
+ // void outlined_device_function(ptr a, ptr b, ptr n) {
+ // n = *n_ptr;
+ // do i = 1, 10
+ // a(i) = b(i) + n
// }
//
// We have to now do the following
@@ -7204,33 +7287,58 @@ OpenMPIRBuilder::InsertPointOrErrorTy OpenMPIRBuilder::emitTargetTask(
// (iii) Create a task with the task entry point created in (ii)
//
// That is we create the following
- //
+ // struct task_with_privates {
+ // struct kmp_task_ompbuilder_t;
+ // struct privates {
+ // [2 x ptr], ; baseptrs
+ // [2 x ptr] ; ptrs
+ // [2 x i64] ; sizes
+ // }
+ // }
// void user_code_that_offloads(...) {
- // %.offload_baseptrs = alloca [3 x ptr], align 8
- // %.offload_ptrs = alloca [3 x ptr], align 8
- // %.offload_mappers = alloca [3 x ptr], align 8
+ // %.offload_baseptrs = alloca [2 x ptr], align 8
+ // %.offload_ptrs = alloca [2 x ptr], align 8
+ // %.offload_sizes = alloca [2 x i64], align 8
//
// %structArg = alloca { ptr, ptr, ptr }, align 8
- // %strucArg[0] = %.offload_baseptrs
- // %strucArg[1] = %.offload_ptrs
- // %strucArg[2] = %.offload_mappers
- // proxy_target_task = @__kmpc_omp_task_alloc(...,
- // @.omp_target_task_proxy_func)
- // memcpy(proxy_target_task->shareds, %structArg, sizeof(structArg))
+ // %strucArg[0] = a
+ // %strucArg[1] = b
+ // %strucArg[2] = &n
+ //
+ // target_task_with_privates = @__kmpc_omp_target_task_alloc(...,
+ // sizeof(kmp_task_ompbuilder_t),
+ // sizeof(structArg),
+ // @.omp_target_task_proxy_func,
+ // ...)
+ // memcpy(target_task->shareds, %structArg, sizeof(structArg))
+ // memcpy(target_task->privates->baseptrs,
+ // offload_baseptrs, sizeof(offload_baseptrs)
+ // memcpy(target_task->privates->ptrs,
+ // offload_ptrs, sizeof(offload_ptrs)
+ // memcpy(target_task->privates->sizes,
+ // offload_sizes, sizeof(offload_sizes)
// dependencies_array = ...
// ;; if nowait not present
// call @__kmpc_omp_wait_deps(..., dependencies_array)
// call @__kmpc_omp_task_begin_if0(...)
// call @ @.omp_target_task_proxy_func(i32 thread_id, ptr
- // %proxy_target_task) call @__kmpc_omp_task_complete_if0(...)
+ // %target_task_with_privates)
+ // call @__kmpc_omp_task_complete_if0(...)
// }
//
// define internal void @.omp_target_task_proxy_func(i32 %thread.id,
// ptr %task) {
// %structArg = alloca {ptr, ptr, ptr}
- // %shared_data = load (getelementptr %task, 0, 0)
- // mempcy(%structArg, %shared_data, sizeof(structArg))
- // kernel_launch_function(%thread.id, %structArg)
+ // %task_ptr = getelementptr(%task, 0, 0)
+ // %shared_data = load (getelementptr %task_ptr, 0, 0)
+ // mempcy(%structArg, %shared_data, sizeof(%structArg))
+ //
+ // %offloading_arrays = getelementptr(%task, 0, 1)
+ // %offload_baseptrs = getelementptr(%offloading_arrays, 0, 0)
+ // %offload_ptrs = getelementptr(%offloading_arrays, 0, 1)
+ // %offload_sizes = getelementptr(%offloading_arrays, 0, 2)
+ // kernel_launch_function(%thread.id, %offload_baseptrs, %offload_ptrs,
+ // %offload_sizes, %structArg)
// }
//
// We need the proxy function because the signature of the task entry point
@@ -7238,21 +7346,21 @@ OpenMPIRBuilder::InsertPointOrErrorTy OpenMPIRBuilder::emitTargetTask(
// that of the kernel_launch function.
//
// kernel_launch_function is generated by emitKernelLaunch and has the
- // always_inline attribute.
- // void kernel_launch_function(thread_id,
- // structArg) alwaysinline {
+ // always_inline attribute. For this example, it'll look like so
+ // void kernel_launch_function(%thread_id, %offload_baseptrs, %offload_ptrs,
+ // %offload_sizes, %structArg) alwaysinline {
// %kernel_args = alloca %struct.__tgt_kernel_arguments, align 8
- // offload_baseptrs = load(getelementptr structArg, 0, 0)
- // offload_ptrs = load(getelementptr structArg, 0, 1)
- // offload_mappers = load(getelementptr structArg, 0, 2)
+ // ; load aggregated data from %structArg
// ; setup kernel_args using offload_baseptrs, offload_ptrs and
- // ; offload_mappers
+ // ; offload_sizes
// call i32 @__tgt_target_kernel(...,
// outlined_device_function,
// ptr %kernel_args)
// }
- // void outlined_device_function(ptr a, ptr b, ptr c) {
- // *a = *b + *c
+ // void outlined_device_function(ptr a, ptr b, ptr n) {
+ // n = *n_ptr;
+ // do i = 1, 10
+ // a(i) = b(i) + n
// }
//
BasicBlock *TargetTaskBodyBB =
@@ -7273,6 +7381,7 @@ OpenMPIRBuilder::InsertPointOrErrorTy OpenMPIRBuilder::emitTargetTask(
OI.ExcludeArgsFromAggregate.push_back(createFakeIntVal(
Builder, AllocaIP, ToBeDeleted, TargetTaskAllocaIP, "global.tid", false));
+ // Generate the task body which will subsequently be outlined.
Builder.restoreIP(TargetTaskBodyIP);
if (Error Err = TaskBodyCB(DeviceID, RTLoc, TargetTaskAllocaIP))
return Err;
@@ -7291,15 +7400,56 @@ OpenMPIRBuilder::InsertPointOrErrorTy OpenMPIRBuilder::emitTargetTask(
emitBlock(OI.ExitBB, Builder.GetInsertBlock()->getParent(),
/*IsFinished=*/true);
- OI.PostOutlineCB = [this, ToBeDeleted, Dependencies, HasNoWait,
- DeviceID](Function &OutlinedFn) mutable {
+ SmallVector<Value *, 2> OffloadingArraysToPrivatize;
+ if (DeviceID && HasNoWait) {
+ for (auto *V :
+ {RTArgs.BasePointersArray, RTArgs.PointersArray, RTArgs.MappersArray,
+ RTArgs.MapNamesArray, RTArgs.MapTypesArray, RTArgs.MapTypesArrayEnd,
+ RTArgs.SizesArray}) {
+ if (V && !isa<ConstantPointerNull>(V) && !isa<GlobalVariable>(V)) {
+ OffloadingArraysToPrivatize.push_back(V);
+ OI.ExcludeArgsFromAggregate.push_back(V);
+ }
+ }
+ }
+ OI.PostOutlineCB = [this, ToBeDeleted, Dependencies, HasNoWait, DeviceID,
+ OffloadingArraysToPrivatize](
+ Function &OutlinedFn) mutable {
assert(OutlinedFn.getNumUses() == 1 &&
"there must be a single user for the outlined function");
CallInst *StaleCI = cast<CallInst>(OutlinedFn.user_back());
- bool HasShareds = StaleCI->arg_size() > 1;
- Function *ProxyFn = emitTargetTaskProxyFunction(*this, Builder, StaleCI);
+ // The first argument of StaleCI is always the thread id.
+ // The next few arguments are the pointers to offloading arrays
+ // if any. (See OffloadingArraysToPrivatize)
+ // Finally, all other local values that are live-in into the outlined region
+ // end up in a structure whose pointer is passed as the last argument. This
+ // piece of data is passed in the "shared" field of the task structure. So,
+ // we know we have to pass shareds to the task if the number of arguments is
+ // greater than OffloadingArraysToPrivatize.size() + 1 The 1 is for the
+ // thread id. Further, for safety, we assert that the number of arguments of
+ // StaleCI is exactly OffloadingArraysToPrivatize.size() + 2
+ const unsigned int NumStaleCIArgs = StaleCI->arg_size();
+ bool HasShareds = NumStaleCIArgs > OffloadingArraysToPrivatize.size() + 1;
+ assert(
+ !HasShareds ||
+ NumStaleCIArgs == (OffloadingArraysToPrivatize.size() + 2) &&
+ "Wrong number of arguments for StaleCI when shareds are present");
+ int SharedArgOperandNo =
+ HasShareds ? OffloadingArraysToPrivatize.size() + 1 : 0;
+
+ StructType *TaskWithPrivatesTy =
+ createTaskWithPrivatesTy(Task, OffloadingArraysToPrivatize);
+ StructType *PrivatesTy = nullptr;
+
+ if (OffloadingArraysToPrivatize.size())
+ PrivatesTy =
+ static_cast<StructType *>(TaskWithPrivatesTy->getElementType(1));
+
+ Function *ProxyFn = emitTargetTaskProxyFunction(
+ *this, Builder, StaleCI, PrivatesTy, TaskWithPrivatesTy,
+ OffloadingArraysToPrivatize.size(), SharedArgOperandNo);
LLVM_DEBUG(dbgs() << "Proxy task entry function created: " << *ProxyFn
<< "\n");
@@ -7330,17 +7480,19 @@ OpenMPIRBuilder::InsertPointOrErrorTy OpenMPIRBuilder::emitTargetTask(
// Argument - `sizeof_kmp_task_t` (TaskSize)
// Tasksize refers to the size in bytes of kmp_task_t data...
[truncated]
``````````
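As a reading aid for the visible part of the patch, here is a rough C++ mirror of the task layout and the operand arithmetic it describes. This is a sketch under stated assumptions, not code from the patch: `TaskDescriptor`, `Privates`, `TaskWithPrivates`, `LaunchArgs`, `sharedArgOperandNo`, and `unpackTask` are hypothetical names, only the first field of the task descriptor is modeled, and the array sizes follow the `[2 x ptr]` example in the comments. The real proxy function also copies the shared data into a fresh alloca, which is omitted here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical mirror of the task descriptor; only its first field (the
// pointer to shared data) is modeled.
struct TaskDescriptor {
  void *Shareds;
};

// Mirrors %struct.privates = type { [2 x ptr], [2 x ptr], [2 x i64] }.
struct Privates {
  void *BasePtrs[2];
  void *Ptrs[2];
  std::int64_t Sizes[2];
};

// Mirrors %struct.task_with_privates: descriptor first, privates second.
struct TaskWithPrivates {
  TaskDescriptor Task;
  Privates Priv;
};

// The arguments the proxy forwards to the kernel launch function.
struct LaunchArgs {
  void **BasePtrs;
  void **Ptrs;
  std::int64_t *Sizes;
  void *Shareds;
};

// Same arithmetic as in the PostOutlineCB: operand 0 is the thread id, then
// one operand per privatized array, then optionally the shareds struct.
// A result of 0 means there are no shareds.
int sharedArgOperandNo(unsigned NumStaleCIArgs,
                       std::size_t NumOffloadingArrays) {
  bool HasShareds = NumStaleCIArgs > NumOffloadingArrays + 1;
  return HasShareds ? static_cast<int>(NumOffloadingArrays + 1) : 0;
}

// Conceptually what the proxy does: address the arrays inside the task
// allocation rather than on the generating task's stack.
LaunchArgs unpackTask(TaskWithPrivates &T) {
  return {T.Priv.BasePtrs, T.Priv.Ptrs, T.Priv.Sizes, T.Task.Shareds};
}
```

For example, with three privatized arrays and a shareds struct, the stale call has five operands and the shareds pointer is operand 4; with no privatized arrays it remains operand 1, matching the pre-patch layout.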
</details>
https://github.com/llvm/llvm-project/pull/133499