[llvm-branch-commits] [llvm] [mlir] [OpenMP][OMPIRBuilder] Use device shared memory for arg structures (PR #150925)
Michael Kruse via llvm-branch-commits
llvm-branch-commits at lists.llvm.org
Thu Aug 14 06:36:38 PDT 2025
================
@@ -1614,6 +1650,50 @@ OpenMPIRBuilder::InsertPointOrErrorTy OpenMPIRBuilder::createParallel(
IfCondition, NumThreads, PrivTID, PrivTIDAddr,
ThreadID, ToBeDeletedVec);
};
+
+ std::optional<omp::OMPTgtExecModeFlags> ExecMode =
+ getTargetKernelExecMode(*OuterFn);
+
+ // If OuterFn is not a Generic kernel, skip custom allocation. This causes
+ // the CodeExtractor to follow its default behavior. Otherwise, we need to
+ // use device shared memory to allocate argument structures.
+ if (ExecMode && *ExecMode & OMP_TGT_EXEC_MODE_GENERIC) {
+ OI.CustomArgAllocatorCB = [this,
+ EntryBB](BasicBlock *, BasicBlock::iterator,
+ Type *ArgTy, const Twine &Name) {
+ // Instead of using the insertion point provided by the CodeExtractor,
+ // here we need to use the block that eventually calls the outlined
+ // function for the `parallel` construct.
+ //
+ // The reason is that the explicit deallocation call will be inserted
+ // within the outlined function, whereas the alloca insertion point
+ // might actually be located somewhere else in the caller. This becomes
+ // a problem when e.g. `parallel` is inside of a `distribute` construct,
+ // because the deallocation would be executed multiple times and the
+ // allocation just once (outside of the loop).
+ //
+ // TODO: Ideally, we'd want to do the allocation and deallocation
+ // outside of the `parallel` outlined function, hence using here the
+ // insertion point provided by the CodeExtractor. We can't do this at
+ // the moment because there is currently no way of passing an eligible
+ // insertion point for the explicit deallocation to the CodeExtractor,
+ // as that block is created (at least when nested inside of
+ // `distribute`) sometime after createParallel() completed, so it can't
+ // be stored in the OutlineInfo structure here.
----------------
Meinersbur wrote:
This was meant as an open question, since I do not fully understand the problem. The idea with the temporary block was to create an unconnected BB and later connect/move it to the expected location, e.g. in the finalize() method or by the caller of `createParallel`, though I do not know how they would know where to insert it. That temporary BB could also be created by the caller and passed to createParallel (e.g. as `deallocIP`), making it the caller's responsibility to connect it. It sounds like you have roughly the same thing in mind.
OK to defer this to a later point.
https://github.com/llvm/llvm-project/pull/150925