[llvm] [OpenMP][Offload] Handle for non-memberof present/to/from entries irrespective of order. (PR #165494)
Abhinav Gaba via llvm-commits
llvm-commits at lists.llvm.org
Tue Oct 28 16:45:50 PDT 2025
https://github.com/abhinavgaba created https://github.com/llvm/llvm-project/pull/165494
For cases like:
```c
map(alloc: x) map(to: x)
```
If the entry of `map(to: x)` is encountered after the entry for `map(alloc:x)`, we still want to do a data-transfer even though the ref-count of `x` was already 0, because the new allocation for `x` happened as part of the current directive.
Similarly, for:
```c
... map(alloc: x) map(from: x)
```
If the entry for `map(from:x)` is encountered before the entry for `map(alloc:x)`, we want to do a data-transfer even though the ref-count was not 0 when looking at the `from` entry, because by the end of the directive, the ref-count of `x` will go down to zero.
And for:
```c
... map(from : x) map(alloc, present: x)
```
If the "present" entry is encountered after the "from" entry, then it becomes a no-op, as the "from" entry will do an allocation if no match was found.
In this PR, these are handled by the runtime via the following:
* For `to` and `present`, we also look-up in the existing table where we tracked new allocations when making the decision for the entry.
* For `from`, we keep track of any deferred data transfers and when the ref-count of a pointer goes to zero, see if there were any previously deferred `from` transfers for that pointer.
This can be done in the compiler, and that would avoid any runtime overhead, but it would require creating two separate offload struct entries for the entry and exit mappings (even for the `target` construct), with properly decayed maps, and either:
**(1)** Sorted in the following order:
* `present > to/tofrom > ...` for the implied `target enter data`; and
* `from/tofrom > ...` for the `target exit data` e.g.
```c
#pragma omp target map(to: x) map(present, alloc: x) map(always, from: x)
// has to be broken into:
// from becomes alloc on entry:
// #pragma omp target enter data map(present, alloc: x)
// map(to: x)
// map(alloc: x)
//
// "present" and "to" just "decay" into "alloc"
// #pragma omp target exit data map(always, from: x)
// map(alloc: x)
// map(alloc: x)
```
Or,
**(2)** Merged into one entry each on the `target enter/exit data` directives.
```c
#pragma omp target map(to: x) map(present, alloc: x) map(always, from: x)
// has to be broken into:
// from becomes alloc on entry:
// #pragma omp target enter data map(present, to: x)
//
// "present" and "to" just "decay" into "alloc"
// #pragma omp target exit data map(always, from: x)
```
The number of entries on the two would need to stay the same on the two to avoid ref-count mismatch.
(1) would be simpler, but won't likely work for cases like:
```c
... map(delete: x) map(from:x)
```
as there is no clear "winner" between the two. So, for such cases, the compiler would likely have to do (2), which is the cleanest solution, but will take longer to implement. For EXPR comparisons, it can build-upon the `AttachPtrExprComparator` that was implemented as part of #153683, but that should probably wait for the PR to be merged to avoid conflicts.
Another alternative is to sort the entries in the runtime, which may be slower than on-demand lookups/updates that this PR does, because we always would be doing this sorting even when not needed, but may be faster in others where the constant-time overhead of map/set insertions/lookups becomes too large because of the number of maps. But that will still have to worry about the `from` + `delete` case.
>From 8f583973a9692e00122cee93cb79a3cc730a8f6a Mon Sep 17 00:00:00 2001
From: Abhinav Gaba <abhinav.gaba at intel.com>
Date: Tue, 28 Oct 2025 14:16:23 -0700
Subject: [PATCH] [OpenMP][Offload] Handle for non-memberof present/to/from
entries irrespective of order.
For cases like:
```c
map(alloc: x) map(to: x)
```
If the entry of `map(to: x)` is encountered after the entry for
`map(alloc:x)`, we still want to do a data-transfer even though the
ref-count of `x` was already 0, because the new allocation for `x`
happened as part of the current directive.
Similarly, for:
```c
... map(alloc: x) map(from: x)
```
If the entry for `map(from:x)` is encountered before the entry for
`map(alloc:x)`, we want to do a data-transfer even though the
ref-count was not 0 when looking at the `from` entry, because by the end of
the directive, the ref-count of `x` will go down to zero.
And for:
```c
... map(from : x) map(alloc, present: x)
```
If the "present" entry is encountered after the "from" entry, then it becomes
a no-op, as the "from" entry will do an allocation if no match was found.
In this PR, these are handled by the runtime via the following:
* For `to` and `present`, we also look-up in the existing table where we tracked
new allocations when making the decision for the entry.
* For `from`, we keep track of any deferred data transfers and when the
ref-count of a pointer goes to zero, see if there were any previously
deferred `from` transfers for that pointer.
This can be done in the compiler, and that would avoid any runtime
overhead, but it would require creating two separate offload struct entries
for the entry and exit mappings (even for the `target` construct),
with properly decayed maps, and either:
(1) sorted in order of:
* `present > to > ...` for the implied `target enter data`; and
* `from > ...` for the `target exit data`
e.g.
```c
#pragma omp target map(to: x) map(present, alloc: x) map(always, from: x)
// has to be broken into:
// from becomes alloc on entry:
// #pragma omp target enter data map(present, alloc: x)
// map(to: x)
// map(alloc: x)
//
// "present" and "to" just "decay" into "alloc"
// #pragma omp target exit data map(always, from: x)
// map(alloc: x)
// map(alloc: x)
```
Or,
(2) Merged into one entry each on the `target enter/exit data`
directives.
```c
#pragma omp target map(to: x) map(present, alloc: x) map(always, from: x)
// has to be broken into:
// from becomes alloc on entry:
// #pragma omp target enter data map(present, to: x)
//
// "present" and "to" just "decay" into "alloc"
// #pragma omp target exit data map(always, from: x)
```
The number of entries on the two would need to stay the same on the two to avoid
ref-count mismatch.
(1) would be simpler, but won't likely work for cases like:
```c
... map(delete: x) map(from:x)
```
as there is no clear "winner" between the two. So, for such cases, the compiler
would likely have to do (2), which is the cleanest solution, but will take
longer to implement. For EXPR comparisons, it can build-upon the
`AttachPtrExprComparator` that was implemented as part of #153683,
but that should probably wait for the PR to be merged to avoid
conflicts.
Another alternative is to sort the entries in the runtime, which may be
slower than on-demand lookups/updates that this PR does, because we
always would be doing this sorting even when not needed, but may be
faster in others where the constant-time overhead of map/set
insertions/lookups becomes too large because of the number of maps. But
that will still have to worry about the `from` + `delete` case.
---
offload/include/OpenMP/Mapping.h | 34 +++--
offload/libomptarget/OpenMP/Mapping.cpp | 15 +-
offload/libomptarget/interface.cpp | 17 ++-
offload/libomptarget/omptarget.cpp | 137 +++++++++++++-----
.../mapping/map_ordering_tgt_alloc_from_to.c | 14 ++
.../map_ordering_tgt_alloc_present_tofrom.c | 27 ++++
.../mapping/map_ordering_tgt_alloc_tofrom.c | 14 ++
.../map_ordering_tgt_data_alloc_from.c | 14 ++
.../map_ordering_tgt_data_alloc_to_from.c | 17 +++
.../map_ordering_tgt_data_alloc_tofrom.c | 17 +++
10 files changed, 249 insertions(+), 57 deletions(-)
create mode 100644 offload/test/mapping/map_ordering_tgt_alloc_from_to.c
create mode 100644 offload/test/mapping/map_ordering_tgt_alloc_present_tofrom.c
create mode 100644 offload/test/mapping/map_ordering_tgt_alloc_tofrom.c
create mode 100644 offload/test/mapping/map_ordering_tgt_data_alloc_from.c
create mode 100644 offload/test/mapping/map_ordering_tgt_data_alloc_to_from.c
create mode 100644 offload/test/mapping/map_ordering_tgt_data_alloc_tofrom.c
diff --git a/offload/include/OpenMP/Mapping.h b/offload/include/OpenMP/Mapping.h
index 45bd9c6e7da8b..517f6c0a99244 100644
--- a/offload/include/OpenMP/Mapping.h
+++ b/offload/include/OpenMP/Mapping.h
@@ -484,20 +484,26 @@ struct AttachMapInfo {
MapType(Type), Pointername(Name) {}
};
-/// Structure to track ATTACH entries and new allocations across recursive calls
-/// (for handling mappers) to targetDataBegin for a given construct.
-struct AttachInfoTy {
- /// ATTACH map entries for deferred processing.
+/// Structure to track new allocations, ATTACH entries and deferred data
+/// transfer information for a given construct, across recursive calls (for
+/// handling mappers) to targetDataBegin/targetDataEnd.
+struct StateInfoTy {
+ /// ATTACH map entries for deferred processing until all other maps are done.
llvm::SmallVector<AttachMapInfo> AttachEntries;
+ /// Host pointers for which new memory was allocated.
/// Key: host pointer, Value: allocation size.
llvm::DenseMap<void *, int64_t> NewAllocations;
- AttachInfoTy() = default;
+ /// Host pointers that had a FROM entry, but for which a data transfer didn't
+ /// occur due to the ref-count not being zero.
+ llvm::SmallSet<void *, 32> DeferredFromPtrs;
+
+ StateInfoTy() = default;
// Delete copy constructor and copy assignment operator to prevent copying
- AttachInfoTy(const AttachInfoTy &) = delete;
- AttachInfoTy &operator=(const AttachInfoTy &) = delete;
+ StateInfoTy(const StateInfoTy &) = delete;
+ StateInfoTy &operator=(const StateInfoTy &) = delete;
};
// Function pointer type for targetData* functions (targetDataBegin,
@@ -505,7 +511,7 @@ struct AttachInfoTy {
typedef int (*TargetDataFuncPtrTy)(ident_t *, DeviceTy &, int32_t, void **,
void **, int64_t *, int64_t *,
map_var_info_t *, void **, AsyncInfoTy &,
- AttachInfoTy *, bool);
+ StateInfoTy *, bool);
void dumpTargetPointerMappings(const ident_t *Loc, DeviceTy &Device,
bool toStdOut = false);
@@ -514,24 +520,22 @@ int targetDataBegin(ident_t *Loc, DeviceTy &Device, int32_t ArgNum,
void **ArgsBase, void **Args, int64_t *ArgSizes,
int64_t *ArgTypes, map_var_info_t *ArgNames,
void **ArgMappers, AsyncInfoTy &AsyncInfo,
- AttachInfoTy *AttachInfo = nullptr,
- bool FromMapper = false);
+ StateInfoTy *StateInfo = nullptr, bool FromMapper = false);
int targetDataEnd(ident_t *Loc, DeviceTy &Device, int32_t ArgNum,
void **ArgBases, void **Args, int64_t *ArgSizes,
int64_t *ArgTypes, map_var_info_t *ArgNames,
void **ArgMappers, AsyncInfoTy &AsyncInfo,
- AttachInfoTy *AttachInfo = nullptr, bool FromMapper = false);
+ StateInfoTy *StateInfo = nullptr, bool FromMapper = false);
int targetDataUpdate(ident_t *Loc, DeviceTy &Device, int32_t ArgNum,
void **ArgsBase, void **Args, int64_t *ArgSizes,
int64_t *ArgTypes, map_var_info_t *ArgNames,
void **ArgMappers, AsyncInfoTy &AsyncInfo,
- AttachInfoTy *AttachInfo = nullptr,
- bool FromMapper = false);
+ StateInfoTy *StateInfo = nullptr, bool FromMapper = false);
// Process deferred ATTACH map entries collected during targetDataBegin.
-int processAttachEntries(DeviceTy &Device, AttachInfoTy &AttachInfo,
+int processAttachEntries(DeviceTy &Device, StateInfoTy &StateInfo,
AsyncInfoTy &AsyncInfo);
struct MappingInfoTy {
@@ -572,7 +576,7 @@ struct MappingInfoTy {
bool HasFlagTo, bool HasFlagAlways, bool IsImplicit, bool UpdateRefCount,
bool HasCloseModifier, bool HasPresentModifier, bool HasHoldModifier,
AsyncInfoTy &AsyncInfo, HostDataToTargetTy *OwnedTPR = nullptr,
- bool ReleaseHDTTMap = true);
+ bool ReleaseHDTTMap = true, StateInfoTy *StateInfo = nullptr);
/// Return the target pointer for \p HstPtrBegin in \p HDTTMap. The accessor
/// ensures exclusive access to the HDTT map.
diff --git a/offload/libomptarget/OpenMP/Mapping.cpp b/offload/libomptarget/OpenMP/Mapping.cpp
index 9b3533895f2a6..a3f634bc0a9eb 100644
--- a/offload/libomptarget/OpenMP/Mapping.cpp
+++ b/offload/libomptarget/OpenMP/Mapping.cpp
@@ -202,7 +202,8 @@ TargetPointerResultTy MappingInfoTy::getTargetPointer(
int64_t TgtPadding, int64_t Size, map_var_info_t HstPtrName, bool HasFlagTo,
bool HasFlagAlways, bool IsImplicit, bool UpdateRefCount,
bool HasCloseModifier, bool HasPresentModifier, bool HasHoldModifier,
- AsyncInfoTy &AsyncInfo, HostDataToTargetTy *OwnedTPR, bool ReleaseHDTTMap) {
+ AsyncInfoTy &AsyncInfo, HostDataToTargetTy *OwnedTPR, bool ReleaseHDTTMap,
+ StateInfoTy *StateInfo) {
LookupResult LR = lookupMapping(HDTTMap, HstPtrBegin, Size, OwnedTPR);
LR.TPR.Flags.IsPresent = true;
@@ -324,8 +325,18 @@ TargetPointerResultTy MappingInfoTy::getTargetPointer(
// If the target pointer is valid, and we need to transfer data, issue the
// data transfer.
+ auto WasNewlyAllocatedOnCurrentConstruct = [&]() {
+ if (!StateInfo)
+ return false;
+ return StateInfo->NewAllocations.contains(HstPtrBegin);
+ };
+
+ // Even if this isn't a new entry, we still need to do a data-transfer if
+ // the pointer was newly allocated previously on the same construct.
if (LR.TPR.TargetPointer && !LR.TPR.Flags.IsHostPointer && HasFlagTo &&
- (LR.TPR.Flags.IsNewEntry || HasFlagAlways) && Size != 0) {
+ (LR.TPR.Flags.IsNewEntry || HasFlagAlways ||
+ WasNewlyAllocatedOnCurrentConstruct()) &&
+ Size != 0) {
// If we have something like:
// #pragma omp target map(to: s.myarr[0:10]) map(to: s.myarr[0:10])
diff --git a/offload/libomptarget/interface.cpp b/offload/libomptarget/interface.cpp
index fe18289765906..ac03546860740 100644
--- a/offload/libomptarget/interface.cpp
+++ b/offload/libomptarget/interface.cpp
@@ -167,19 +167,22 @@ targetData(ident_t *Loc, int64_t DeviceId, int32_t ArgNum, void **ArgsBase,
int Rc = OFFLOAD_SUCCESS;
- // Only allocate AttachInfo for targetDataBegin
- std::unique_ptr<AttachInfoTy> AttachInfo;
- if (TargetDataFunction == targetDataBegin)
- AttachInfo = std::make_unique<AttachInfoTy>();
+ // Allocate StateInfo for targetDataBegin and targetDataEnd to track
+ // allocations, pointer attachments and deferred transfers.
+ // This is not needed for targetDataUpdate.
+ std::unique_ptr<StateInfoTy> StateInfo;
+ if (TargetDataFunction == targetDataBegin ||
+ TargetDataFunction == targetDataEnd)
+ StateInfo = std::make_unique<StateInfoTy>();
Rc = TargetDataFunction(Loc, *DeviceOrErr, ArgNum, ArgsBase, Args, ArgSizes,
ArgTypes, ArgNames, ArgMappers, AsyncInfo,
- AttachInfo.get(), /*FromMapper=*/false);
+ StateInfo.get(), /*FromMapper=*/false);
if (Rc == OFFLOAD_SUCCESS) {
// Process deferred ATTACH entries BEFORE synchronization
- if (AttachInfo && !AttachInfo->AttachEntries.empty())
- Rc = processAttachEntries(*DeviceOrErr, *AttachInfo, AsyncInfo);
+ if (StateInfo && !StateInfo->AttachEntries.empty())
+ Rc = processAttachEntries(*DeviceOrErr, *StateInfo, AsyncInfo);
if (Rc == OFFLOAD_SUCCESS)
Rc = AsyncInfo.synchronize();
diff --git a/offload/libomptarget/omptarget.cpp b/offload/libomptarget/omptarget.cpp
index 69725e77bae00..bef1488b2956f 100644
--- a/offload/libomptarget/omptarget.cpp
+++ b/offload/libomptarget/omptarget.cpp
@@ -294,7 +294,7 @@ int targetDataMapper(ident_t *Loc, DeviceTy &Device, void *ArgBase, void *Arg,
int64_t ArgSize, int64_t ArgType, map_var_info_t ArgNames,
void *ArgMapper, AsyncInfoTy &AsyncInfo,
TargetDataFuncPtrTy TargetDataFunction,
- AttachInfoTy *AttachInfo = nullptr) {
+ StateInfoTy *StateInfo = nullptr) {
DP("Calling the mapper function " DPxMOD "\n", DPxPTR(ArgMapper));
// The mapper function fills up Components.
@@ -325,7 +325,7 @@ int targetDataMapper(ident_t *Loc, DeviceTy &Device, void *ArgBase, void *Arg,
MapperArgsBase.data(), MapperArgs.data(),
MapperArgSizes.data(), MapperArgTypes.data(),
MapperArgNames.data(), /*arg_mappers*/ nullptr,
- AsyncInfo, AttachInfo, /*FromMapper=*/true);
+ AsyncInfo, StateInfo, /*FromMapper=*/true);
return Rc;
}
@@ -509,12 +509,12 @@ int targetDataBegin(ident_t *Loc, DeviceTy &Device, int32_t ArgNum,
void **ArgsBase, void **Args, int64_t *ArgSizes,
int64_t *ArgTypes, map_var_info_t *ArgNames,
void **ArgMappers, AsyncInfoTy &AsyncInfo,
- AttachInfoTy *AttachInfo, bool FromMapper) {
- assert(AttachInfo && "AttachInfo must be available for targetDataBegin for "
- "handling ATTACH map-types.");
+ StateInfoTy *StateInfo, bool FromMapper) {
+ assert(StateInfo && "StateInfo must be available for targetDataBegin for "
+ "handling ATTACH and TO/TOFROM map-types.");
// process each input.
for (int32_t I = 0; I < ArgNum; ++I) {
- // Ignore private variables and arrays - there is no mapping for them.
+ // Ignore private variables and arrays - there is no mapping for t.attahem.
if ((ArgTypes[I] & OMP_TGT_MAPTYPE_LITERAL) ||
(ArgTypes[I] & OMP_TGT_MAPTYPE_PRIVATE))
continue;
@@ -529,7 +529,7 @@ int targetDataBegin(ident_t *Loc, DeviceTy &Device, int32_t ArgNum,
map_var_info_t ArgName = (!ArgNames) ? nullptr : ArgNames[I];
int Rc = targetDataMapper(Loc, Device, ArgsBase[I], Args[I], ArgSizes[I],
ArgTypes[I], ArgName, ArgMappers[I], AsyncInfo,
- targetDataBegin, AttachInfo);
+ targetDataBegin, StateInfo);
if (Rc != OFFLOAD_SUCCESS) {
REPORT("Call to targetDataBegin via targetDataMapper for custom mapper"
@@ -556,7 +556,7 @@ int targetDataBegin(ident_t *Loc, DeviceTy &Device, int32_t ArgNum,
// similar to firstprivate (PRIVATE | TO) entries by
// PrivateArgumentManager.
if (!IsCorrespondingPointerInit)
- AttachInfo->AttachEntries.emplace_back(
+ StateInfo->AttachEntries.emplace_back(
/*PointerBase=*/HstPtrBase, /*PointeeBegin=*/HstPtrBegin,
/*PointerSize=*/DataSize, /*MapType=*/ArgTypes[I],
/*PointeeName=*/HstPtrName);
@@ -633,7 +633,7 @@ int targetDataBegin(ident_t *Loc, DeviceTy &Device, int32_t ArgNum,
// Track new allocation, for eventual use in attachment decision-making.
if (PointerTpr.Flags.IsNewEntry && !IsHostPtr)
- AttachInfo->NewAllocations[HstPtrBase] = sizeof(void *);
+ StateInfo->NewAllocations[HstPtrBase] = sizeof(void *);
DP("There are %zu bytes allocated at target address " DPxMOD " - is%s new"
"\n",
@@ -654,7 +654,8 @@ int targetDataBegin(ident_t *Loc, DeviceTy &Device, int32_t ArgNum,
auto TPR = Device.getMappingInfo().getTargetPointer(
HDTTMap, HstPtrBegin, HstPtrBase, TgtPadding, DataSize, HstPtrName,
HasFlagTo, HasFlagAlways, IsImplicit, UpdateRef, HasCloseModifier,
- HasPresentModifier, HasHoldModifier, AsyncInfo, PointerTpr.getEntry());
+ HasPresentModifier, HasHoldModifier, AsyncInfo, PointerTpr.getEntry(),
+ /*ReleaseHDTTMap=*/true, StateInfo);
void *TgtPtrBegin = TPR.TargetPointer;
IsHostPtr = TPR.Flags.IsHostPointer;
// If data_size==0, then the argument could be a zero-length pointer to
@@ -664,11 +665,30 @@ int targetDataBegin(ident_t *Loc, DeviceTy &Device, int32_t ArgNum,
HasPresentModifier ? "'present' map type modifier"
: "device failure or illegal mapping");
return OFFLOAD_FAIL;
+ } else if (TgtPtrBegin && HasPresentModifier &&
+ StateInfo->NewAllocations.contains(HstPtrBegin)) {
+ // For "PRESENT" entries, we may have cases like the following:
+ // map(alloc: p[0]) map(present, alloc: p[0])
+ // If the compiler does not merge these entries, then the "PRESENT" entry
+ // may be encountered after a previous entry allocated new storage for it.
+ // To catch such cases, we should also look at any existing allocations
+ // and error out if we have one matching the pointer. We don't need to
+ // worry about cases like:
+ // map(alloc: p[1:10]) map(present, alloc: p[2:5])
+ // as the list-items share storage, but are not identical, which is a
+ // user error as per OpenMP.
+ MESSAGE("device mapping required by 'present' map type modifier does not "
+ "exist for host address " DPxMOD " (%" PRId64 " bytes)\n",
+ DPxPTR(HstPtrBegin), DataSize);
+ REPORT("Pointer " DPxMOD
+ " was not present on the device upon entry to the region.\n",
+ DPxPTR(HstPtrBegin));
+ return OFFLOAD_FAIL;
}
- // Track new allocation, for eventual use in attachment decision-making.
+ // Track new allocation, for eventual use in attachment/to decision-making.
if (TPR.Flags.IsNewEntry && !IsHostPtr && TgtPtrBegin)
- AttachInfo->NewAllocations[HstPtrBegin] = DataSize;
+ StateInfo->NewAllocations[HstPtrBegin] = DataSize;
DP("There are %" PRId64 " bytes allocated at target address " DPxMOD
" - is%s new\n",
@@ -751,29 +771,29 @@ int targetDataBegin(ident_t *Loc, DeviceTy &Device, int32_t ArgNum,
///
/// For this purpose, we insert a data_fence before the first
/// pointer-attachment, (3), to ensure that all pending transfers finish first.
-int processAttachEntries(DeviceTy &Device, AttachInfoTy &AttachInfo,
+int processAttachEntries(DeviceTy &Device, StateInfoTy &StateInfo,
AsyncInfoTy &AsyncInfo) {
// Report all tracked allocations from both main loop and ATTACH processing
- if (!AttachInfo.NewAllocations.empty()) {
+ if (!StateInfo.NewAllocations.empty()) {
DP("Tracked %u total new allocations:\n",
- (unsigned)AttachInfo.NewAllocations.size());
- for ([[maybe_unused]] const auto &Alloc : AttachInfo.NewAllocations) {
+ (unsigned)StateInfo.NewAllocations.size());
+ for ([[maybe_unused]] const auto &Alloc : StateInfo.NewAllocations) {
DP(" Host ptr: " DPxMOD ", Size: %" PRId64 " bytes\n",
DPxPTR(Alloc.first), Alloc.second);
}
}
- if (AttachInfo.AttachEntries.empty())
+ if (StateInfo.AttachEntries.empty())
return OFFLOAD_SUCCESS;
DP("Processing %zu deferred ATTACH map entries\n",
- AttachInfo.AttachEntries.size());
+ StateInfo.AttachEntries.size());
int Ret = OFFLOAD_SUCCESS;
bool IsFirstPointerAttachment = true;
- for (size_t EntryIdx = 0; EntryIdx < AttachInfo.AttachEntries.size();
+ for (size_t EntryIdx = 0; EntryIdx < StateInfo.AttachEntries.size();
++EntryIdx) {
- const auto &AttachEntry = AttachInfo.AttachEntries[EntryIdx];
+ const auto &AttachEntry = StateInfo.AttachEntries[EntryIdx];
void **HstPtr = reinterpret_cast<void **>(AttachEntry.PointerBase);
@@ -792,7 +812,7 @@ int processAttachEntries(DeviceTy &Device, AttachInfoTy &AttachInfo,
// Lambda to check if a pointer was newly allocated
auto WasNewlyAllocated = [&](void *Ptr, const char *PtrName) {
bool IsNewlyAllocated =
- llvm::any_of(AttachInfo.NewAllocations, [&](const auto &Alloc) {
+ llvm::any_of(StateInfo.NewAllocations, [&](const auto &Alloc) {
void *AllocPtr = Alloc.first;
int64_t AllocSize = Alloc.second;
return Ptr >= AllocPtr &&
@@ -1009,7 +1029,9 @@ int targetDataEnd(ident_t *Loc, DeviceTy &Device, int32_t ArgNum,
void **ArgBases, void **Args, int64_t *ArgSizes,
int64_t *ArgTypes, map_var_info_t *ArgNames,
void **ArgMappers, AsyncInfoTy &AsyncInfo,
- AttachInfoTy *AttachInfo, bool FromMapper) {
+ StateInfoTy *StateInfo, bool FromMapper) {
+ assert(StateInfo && "StateInfo is required for targetDataEnd for handling "
+ "FROM data transfers");
int Ret = OFFLOAD_SUCCESS;
auto *PostProcessingPtrs = new SmallVector<PostProcessingInfo>();
// process each input.
@@ -1037,7 +1059,7 @@ int targetDataEnd(ident_t *Loc, DeviceTy &Device, int32_t ArgNum,
map_var_info_t ArgName = (!ArgNames) ? nullptr : ArgNames[I];
Ret = targetDataMapper(Loc, Device, ArgBases[I], Args[I], ArgSizes[I],
ArgTypes[I], ArgName, ArgMappers[I], AsyncInfo,
- targetDataEnd);
+ targetDataEnd, StateInfo);
if (Ret != OFFLOAD_SUCCESS) {
REPORT("Call to targetDataEnd via targetDataMapper for custom mapper"
@@ -1106,8 +1128,28 @@ int targetDataEnd(ident_t *Loc, DeviceTy &Device, int32_t ArgNum,
// Move data back to the host
const bool HasAlways = ArgTypes[I] & OMP_TGT_MAPTYPE_ALWAYS;
const bool HasFrom = ArgTypes[I] & OMP_TGT_MAPTYPE_FROM;
- if (HasFrom && (HasAlways || TPR.Flags.IsLast) &&
- !TPR.Flags.IsHostPointer && DataSize != 0) {
+ const bool IsMemberOf = ArgTypes[I] & OMP_TGT_MAPTYPE_MEMBER_OF;
+ // Lambda to check if there was a previously deferred FROM for this pointer
+ // due to its ref-count not being zero.
+ auto HasDeferredMapFrom = [&]() -> bool {
+ if (!StateInfo->DeferredFromPtrs.contains(HstPtrBegin))
+ return false;
+ DP("Found previously deferred FROM transfer for HstPtr=" DPxMOD "\n",
+ DPxPTR(HstPtrBegin));
+ // Remove it so we don't look at it again
+ StateInfo->DeferredFromPtrs.erase(HstPtrBegin);
+ return true;
+ };
+
+ bool IsMapFromOnNonHostNonZeroData =
+ HasFrom && !TPR.Flags.IsHostPointer && DataSize != 0;
+ bool IsLastOrHasAlways = TPR.Flags.IsLast || HasAlways;
+
+ if ((IsMapFromOnNonHostNonZeroData && IsLastOrHasAlways) ||
+ // Even if are not looking at an entry with FROM map-type, if there were
+ // any previously deferred FROM transfers for this pointer, we should
+ // do them when the ref-count goes down to zero.
+ (TPR.Flags.IsLast && HasDeferredMapFrom())) {
DP("Moving %" PRId64 " bytes (tgt:" DPxMOD ") -> (hst:" DPxMOD ")\n",
DataSize, DPxPTR(TgtPtrBegin), DPxPTR(HstPtrBegin));
TIMESCOPE_WITH_DETAILS_AND_IDENT(
@@ -1137,6 +1179,30 @@ int targetDataEnd(ident_t *Loc, DeviceTy &Device, int32_t ArgNum,
OFFLOAD_SUCCESS)
return OFFLOAD_FAIL;
}
+ } else if (IsMapFromOnNonHostNonZeroData && !IsLastOrHasAlways &&
+ !IsMemberOf) {
+ // We can have cases like the following:
+ // map(alloc: p[0:1]) map(from: p[0:1])
+ //
+ // For such cases, if we have different entries for the two maps, we
+ // may not see the ref-count go down to zero when handling the From entry.
+ //
+ // So, we defer the FROM data-transfer until the ref-count goes down to
+ // zero (if it does).
+ //
+ // This should be limited to non-member-of entries because for member-of,
+ // their ref-count should go down only once as part of the parent.
+ //
+ // Also, we don't need to worry about cases like:
+ // map(alloc: p[0:10]) map(from: p[0:1])
+ //
+ // because that is not OpenMP 6.0 compliant, so we can just save the
+ // pointer without saving the size, and assume that the size for the
+ // "alloc" map will match that of "from".
+ StateInfo->DeferredFromPtrs.insert(HstPtrBegin);
+ DP("Deferring FROM map transfer for HstPtr=" DPxMOD ", Size=%" PRId64
+ "\n",
+ DPxPTR(HstPtrBegin), DataSize);
}
// Add pointer to the buffer for post-synchronize processing.
@@ -1315,7 +1381,7 @@ int targetDataUpdate(ident_t *Loc, DeviceTy &Device, int32_t ArgNum,
void **ArgsBase, void **Args, int64_t *ArgSizes,
int64_t *ArgTypes, map_var_info_t *ArgNames,
void **ArgMappers, AsyncInfoTy &AsyncInfo,
- AttachInfoTy *AttachInfo, bool FromMapper) {
+ StateInfoTy *StateInfo, bool FromMapper) {
// process each input.
for (int32_t I = 0; I < ArgNum; ++I) {
if ((ArgTypes[I] & OMP_TGT_MAPTYPE_LITERAL) ||
@@ -1806,21 +1872,21 @@ static int processDataBefore(ident_t *Loc, int64_t DeviceId, void *HostPtr,
if (!DeviceOrErr)
FATAL_MESSAGE(DeviceId, "%s", toString(DeviceOrErr.takeError()).c_str());
- // Create AttachInfo for tracking any ATTACH entries, or new-allocations
+ // Create StateInfo for tracking any ATTACH entries, new allocations,
// when handling the "begin" mapping for a target constructs.
- AttachInfoTy AttachInfo;
+ StateInfoTy StateInfo;
int Ret = targetDataBegin(Loc, *DeviceOrErr, ArgNum, ArgBases, Args, ArgSizes,
ArgTypes, ArgNames, ArgMappers, AsyncInfo,
- &AttachInfo, false /*FromMapper=*/);
+ &StateInfo, false /*FromMapper=*/);
if (Ret != OFFLOAD_SUCCESS) {
REPORT("Call to targetDataBegin failed, abort target.\n");
return OFFLOAD_FAIL;
}
// Process collected ATTACH entries
- if (!AttachInfo.AttachEntries.empty()) {
- Ret = processAttachEntries(*DeviceOrErr, AttachInfo, AsyncInfo);
+ if (!StateInfo.AttachEntries.empty()) {
+ Ret = processAttachEntries(*DeviceOrErr, StateInfo, AsyncInfo);
if (Ret != OFFLOAD_SUCCESS) {
REPORT("Failed to process ATTACH entries.\n");
return OFFLOAD_FAIL;
@@ -1987,9 +2053,14 @@ static int processDataAfter(ident_t *Loc, int64_t DeviceId, void *HostPtr,
if (!DeviceOrErr)
FATAL_MESSAGE(DeviceId, "%s", toString(DeviceOrErr.takeError()).c_str());
+ // Create StateInfo for tracking map(from)s for which ref-count is non-zero
+ // when the entry is encountered.
+ StateInfoTy StateInfo;
+
// Move data from device.
- int Ret = targetDataEnd(Loc, *DeviceOrErr, ArgNum, ArgBases, Args, ArgSizes,
- ArgTypes, ArgNames, ArgMappers, AsyncInfo);
+ int Ret =
+ targetDataEnd(Loc, *DeviceOrErr, ArgNum, ArgBases, Args, ArgSizes,
+ ArgTypes, ArgNames, ArgMappers, AsyncInfo, &StateInfo);
if (Ret != OFFLOAD_SUCCESS) {
REPORT("Call to targetDataEnd failed, abort target.\n");
return OFFLOAD_FAIL;
diff --git a/offload/test/mapping/map_ordering_tgt_alloc_from_to.c b/offload/test/mapping/map_ordering_tgt_alloc_from_to.c
new file mode 100644
index 0000000000000..67c88e7238842
--- /dev/null
+++ b/offload/test/mapping/map_ordering_tgt_alloc_from_to.c
@@ -0,0 +1,14 @@
+// RUN: %libomptarget-compile-run-and-check-generic
+
+#include <stdio.h>
+
+int main() {
+ int x = 111;
+#pragma omp target map(alloc : x) map(from : x) map(to : x) map(alloc : x)
+ {
+ printf("%d\n", x); // CHECK: 111
+ x = x + 111;
+ }
+
+ printf("%d\n", x); // CHECK: 222
+}
diff --git a/offload/test/mapping/map_ordering_tgt_alloc_present_tofrom.c b/offload/test/mapping/map_ordering_tgt_alloc_present_tofrom.c
new file mode 100644
index 0000000000000..f8f397efc4acd
--- /dev/null
+++ b/offload/test/mapping/map_ordering_tgt_alloc_present_tofrom.c
@@ -0,0 +1,27 @@
+// RUN: %libomptarget-compile-generic
+// RUN: %libomptarget-run-fail-generic 2>&1 \
+// RUN: | %fcheck-generic
+
+#include <stdio.h>
+
+int main() {
+ // CHECK: addr=0x[[#%x,HOST_ADDR:]], size=[[#%u,SIZE:]]
+ int x = 111;
+ fprintf(stderr, "addr=%p, size=%ld\n", &x, sizeof(x));
+// CHECK: omptarget message: device mapping required by 'present' map type
+// modifier does not exist for host address 0x{{0*}}[[#HOST_ADDR]] ([[#SIZE]]
+// bytes)
+// CHECK: omptarget error: Pointer 0x{{0*}}[[#HOST_ADDR]] was not present
+// on the device upon entry to the region.
+// ('present' map type modifier).
+// CHECK: omptarget error: Call to targetDataBegin failed, abort target.
+// CHECK: omptarget error: Failed to process data before launching the kernel.
+// CHECK: omptarget fatal error 1: failure of target construct while offloading
+// is mandatory
+#pragma omp target map(alloc : x) map(present, alloc : x) map(tofrom : x)
+ {
+ printf("%d\n", x);
+ }
+
+ return 0;
+}
diff --git a/offload/test/mapping/map_ordering_tgt_alloc_tofrom.c b/offload/test/mapping/map_ordering_tgt_alloc_tofrom.c
new file mode 100644
index 0000000000000..c76e2b4bafa1a
--- /dev/null
+++ b/offload/test/mapping/map_ordering_tgt_alloc_tofrom.c
@@ -0,0 +1,14 @@
+// RUN: %libomptarget-compile-run-and-check-generic
+
+#include <stdio.h>
+
+int main() {
+ int x = 111;
+#pragma omp target map(alloc : x) map(tofrom : x) map(alloc : x)
+ {
+ printf("%d\n", x); // CHECK: 111
+ x = x + 111;
+ }
+
+ printf("%d\n", x); // CHECK: 222
+}
diff --git a/offload/test/mapping/map_ordering_tgt_data_alloc_from.c b/offload/test/mapping/map_ordering_tgt_data_alloc_from.c
new file mode 100644
index 0000000000000..e5905460bea19
--- /dev/null
+++ b/offload/test/mapping/map_ordering_tgt_data_alloc_from.c
@@ -0,0 +1,14 @@
+// RUN: %libomptarget-compile-run-and-check-generic
+
+#include <stdio.h>
+
+int main() {
+ int x = 111;
+#pragma omp target data map(alloc : x) map(from : x) map(alloc : x)
+ {
+#pragma omp target map(present, alloc : x)
+ x = 222;
+ }
+
+ printf("%d\n", x); // CHECK: 222
+}
diff --git a/offload/test/mapping/map_ordering_tgt_data_alloc_to_from.c b/offload/test/mapping/map_ordering_tgt_data_alloc_to_from.c
new file mode 100644
index 0000000000000..1ed41200cecde
--- /dev/null
+++ b/offload/test/mapping/map_ordering_tgt_data_alloc_to_from.c
@@ -0,0 +1,17 @@
+// RUN: %libomptarget-compile-run-and-check-generic
+
+#include <stdio.h>
+
+int main() {
+ int x = 111;
+#pragma omp target data map(alloc : x) map(to : x) map(from : x) map(alloc : x)
+ {
+#pragma omp target map(present, alloc : x)
+ {
+ printf("%d\n", x); // CHECK: 111
+ x = x + 111;
+ }
+ }
+
+ printf("%d\n", x); // CHECK: 222
+}
diff --git a/offload/test/mapping/map_ordering_tgt_data_alloc_tofrom.c b/offload/test/mapping/map_ordering_tgt_data_alloc_tofrom.c
new file mode 100644
index 0000000000000..6db30d2aa7f9d
--- /dev/null
+++ b/offload/test/mapping/map_ordering_tgt_data_alloc_tofrom.c
@@ -0,0 +1,17 @@
+// RUN: %libomptarget-compile-run-and-check-generic
+
+#include <stdio.h>
+
+int main() {
+ int x = 111;
+#pragma omp target data map(alloc : x) map(tofrom : x) map(alloc : x)
+ {
+#pragma omp target map(present, alloc : x)
+ {
+ printf("%d\n", x); // CHECK: 111
+ x = x + 111;
+ }
+ }
+
+ printf("%d\n", x); // CHECK: 222
+}
More information about the llvm-commits
mailing list