[llvm-branch-commits] [llvm] [mlir] [OpenMP][OMPIRBuilder] Use device shared memory for arg structures (PR #150925)
Michael Kruse via llvm-branch-commits
llvm-branch-commits at lists.llvm.org
Thu Jul 31 05:49:34 PDT 2025
================
@@ -1614,6 +1650,50 @@ OpenMPIRBuilder::InsertPointOrErrorTy OpenMPIRBuilder::createParallel(
IfCondition, NumThreads, PrivTID, PrivTIDAddr,
ThreadID, ToBeDeletedVec);
};
+
+ std::optional<omp::OMPTgtExecModeFlags> ExecMode =
+ getTargetKernelExecMode(*OuterFn);
+
+ // If OuterFn is not a Generic kernel, skip custom allocation. This causes
+ // the CodeExtractor to follow its default behavior. Otherwise, we need to
+ // use device shared memory to allocate argument structures.
+ if (ExecMode && *ExecMode & OMP_TGT_EXEC_MODE_GENERIC) {
+ OI.CustomArgAllocatorCB = [this,
+ EntryBB](BasicBlock *, BasicBlock::iterator,
+ Type *ArgTy, const Twine &Name) {
+ // Instead of using the insertion point provided by the CodeExtractor,
+ // here we need to use the block that eventually calls the outlined
+ // function for the `parallel` construct.
+ //
+ // The reason is that the explicit deallocation call will be inserted
+ // within the outlined function, whereas the alloca insertion point
+ // might actually be located somewhere else in the caller. This becomes
+ // a problem when e.g. `parallel` is inside of a `distribute` construct,
+ // because the deallocation would be executed multiple times and the
+ // allocation just once (outside of the loop).
+ //
+ // TODO: Ideally, we'd want to do the allocation and deallocation
+ // outside of the `parallel` outlined function, hence using here the
+ // insertion point provided by the CodeExtractor. We can't do this at
+ // the moment because there is currently no way of passing an eligible
+ // insertion point for the explicit deallocation to the CodeExtractor,
+ // as that block is created (at least when nested inside of
+ // `distribute`) sometime after createParallel() has completed, so it can't
+ // be stored in the OutlineInfo structure here.
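The comment in this hunk argues that allocating at the CodeExtractor's insertion point would unbalance allocation and deallocation when `parallel` sits inside `distribute`. Below is a minimal C++ model of that argument. It is not LLVM code: `SharedMemCounts`, `allocHoisted` and `allocAtCallSite` are hypothetical names, and plain counters stand in for the device shared-memory allocation and deallocation calls. It compares an allocation hoisted out of the loop with one placed in the block that calls the outlined function, which is what the patch does.

```cpp
// Hypothetical model: counters standing in for the device shared-memory
// allocation and deallocation of the argument structure.
struct SharedMemCounts {
  int Allocs = 0;
  int Deallocs = 0;
};

// Allocation at the CodeExtractor's alloca insertion point, which (for
// `parallel` nested in `distribute`) may lie outside the loop, while the
// deallocation runs inside the outlined function on every iteration.
constexpr SharedMemCounts allocHoisted(int Iterations) {
  SharedMemCounts C{};
  ++C.Allocs; // allocated once, before the loop
  for (int I = 0; I < Iterations; ++I)
    ++C.Deallocs; // freed by the outlined `parallel` function each time
  return C;
}

// Allocation in the block that calls the outlined function: one
// allocation per call, matched by one deallocation per call.
constexpr SharedMemCounts allocAtCallSite(int Iterations) {
  SharedMemCounts C{};
  for (int I = 0; I < Iterations; ++I) {
    ++C.Allocs;   // allocated right before the call
    ++C.Deallocs; // freed by the outlined `parallel` function
  }
  return C;
}

// Compile-time checks of the two placements for four loop iterations.
static_assert(allocHoisted(4).Allocs == 1 && allocHoisted(4).Deallocs == 4,
              "hoisted allocation is freed once per iteration");
static_assert(allocAtCallSite(4).Allocs == allocAtCallSite(4).Deallocs,
              "call-site allocation stays balanced");
```

The model deliberately ignores what the outlined function does with the structure. It only tracks how often each call executes, which is the property the comment is concerned with.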
----------------
Meinersbur wrote:
Could a temporary block be created that is then connected to the CFG later?
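One possible reading of this suggestion is sketched below on a toy CFG rather than on LLVM's IR classes. At `createParallel()` time, the deallocation would be emitted into a detached placeholder block and recorded for later use. Once the enclosing construct (for example `distribute`) has created the right location, the placeholder would be spliced into the function. Every type and function here (`Block`, `Function`, `PendingDealloc`, `makePlaceholder`, `connectLater`) is hypothetical. LLVM does allow a `BasicBlock` to be created without a parent function and inserted later, so the idea seems plausible. The sketch does not address whether the deallocation's operands would still be valid at the final location.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Toy CFG model (hypothetical; not LLVM's API).
struct Block {
  std::string Name;
  std::vector<std::string> Insts;
  std::vector<Block *> Succs;
};

struct Function {
  std::vector<std::unique_ptr<Block>> Blocks;
  Block *create(std::string Name) {
    Blocks.push_back(std::make_unique<Block>(Block{std::move(Name), {}, {}}));
    return Blocks.back().get();
  }
};

// Stand-in for an OutlineInfo field: owns a block not yet in any function.
struct PendingDealloc {
  std::unique_ptr<Block> Placeholder;
};

// At "createParallel" time the final location is unknown, so the dealloc
// goes into a detached block.
PendingDealloc makePlaceholder() {
  PendingDealloc P;
  P.Placeholder =
      std::make_unique<Block>(Block{"omp.par.dealloc", {"dealloc args"}, {}});
  return P;
}

// Later, once the right spot exists, move the placeholder into the function
// and splice it between Pred and Pred's previous successors.
Block *connectLater(Function &F, PendingDealloc &P, Block *Pred) {
  Block *BB = P.Placeholder.get();
  BB->Succs = Pred->Succs; // placeholder inherits Pred's old edges
  Pred->Succs = {BB};      // Pred now falls through to the placeholder
  F.Blocks.push_back(std::move(P.Placeholder));
  return BB;
}
```

For example, calling `connectLater(F, P, Exit)` on a block `Exit` that branched to `Ret` leaves `Exit -> omp.par.dealloc -> Ret`, and the pending record no longer owns the block.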
https://github.com/llvm/llvm-project/pull/150925