[Openmp-commits] [PATCH] D65001: [OpenMP][libomptarget] Add support for unified memory for regular maps

Jonas Hahnfeld via Phabricator via Openmp-commits openmp-commits at lists.llvm.org
Wed Jul 31 00:44:14 PDT 2019

Hahnfeld added a comment.

In D65001#1607199 <https://reviews.llvm.org/D65001#1607199>, @gtbercea wrote:

> @Hahnfeld I cannot replace the usage of the requires pragma with the call to the registration function directly because the registration function is the first function that gets called so if I explicitely invoke it will only be the 2nd call to that function. A subsequent call with a different set of flags will lead to a mismatch in requires clauses error. (first implicit call is without unified_shared_memory and the second call is with unified_shared_memory).

Yes, but you can have both, the `requires` directive and the manual call. I understand that this is not needed for newer Clang versions, but right now the test will fail for older releases that don't support `requires`.


For the tests, I think we should keep `requires.c` in `test/offloading/` because it's not specific to `unified_shared_memory`. Can you also rename `requires_unified_shared_memory_local.c` to `shared_update.c` (or something like that) to make the name more concise?

Comment at: libomptarget/src/api.cpp:118-123
+  // Under unified memory the host pointer can be returned by the
+  // getTgtPtrBegin() function which means that there is no device
+  // corresponding point for ptr. This function should return false
+  // in that situation.
+  if (Device.RTLRequiresFlags & OMP_REQ_UNIFIED_SHARED_MEMORY)
+    rc = (TgtPtr != ptr);
gtbercea wrote:
> Hahnfeld wrote:
> > gtbercea wrote:
> > > Hahnfeld wrote:
> > > > The spec says `This routine returns non-zero if the specified pointer would be found present on device device_num by a map clause; otherwise, it returns zero.` Programs with `unified_shared_memory` could imply that all pointers are present on all devices, so I would have expected the routine to always return true. What's the reasoning for checking that the pointers differ?
> > > I'm checking if the pointer is on the actual device. If unified memory is used then the pointers will match and the device present test will return false.
> > > I have now refactored this check to make it more precise: if the host pointer is used then we have a flag for that.
> > Okay, I think I get what you're saying: You want to check that this particular pointer has been `map`ped to that specific `device_num`, right?
> > 
> > But with unified shared memory, this shouldn't matter, right? Because all pointers can be accessed from all devices, no?
> Employing the API functions allows the user to bypass the unified memory behaviour and these functions allow the user to manage device pointers explicitly.
I'm still not sure about this, what do others think?

Comment at: libomptarget/src/device.cpp:290-291
 int DeviceTy::deallocTgtPtr(void *HstPtrBegin, int64_t Size, bool ForceDelete) {
+    return OFFLOAD_SUCCESS;
   // Check if the pointer is contained in any sub-nodes.
gtbercea wrote:
> Hahnfeld wrote:
> > If we can still alloc memory in `getOrAllocTgtPtr`, then this is wrong.
> Fixed.
Is `deallocTgtPtr` still called under `unified_shared_memory`?

Comment at: libomptarget/src/device.cpp:174-176
+      (IsImplicit || !((lr.Flags.IsContained ||
+                        lr.Flags.ExtendsBefore ||
+                        lr.Flags.ExtendsAfter) && !Size))) {
grokos wrote:
> I'm confused with this condition. What do we want to test?
> The condition can be simplified to:
> `IsImplicit || !lr.Flags.IsContained || Size`
> because if Size==0, then `lookupMapping` cannot return `ExtendsBefore/After`, it may only return `IsContained`. So we want a mapping that's either
>   # implicit or
>   # explicit and it is either not contained or has Size>0.
> The latter doesn't make sense to me... For instance, what if Size>0 and the mapping explicitly extends after? The condition will evaluate to true but this is invalid use no matter whether we use unified shared memory or not.
I agree with @grokos here.

Comment at: libomptarget/src/device.cpp:178
     auto &HT = *lr.Entry;
-    IsNew = false;
-    if (UpdateRefCount)
-      ++HT.RefCount;
-    uintptr_t tp = HT.TgtPtrBegin + ((uintptr_t)HstPtrBegin - HT.HstPtrBegin);
-    DP("Mapping exists%s with HstPtrBegin=" DPxMOD ", TgtPtrBegin=" DPxMOD ", "
-        "Size=%ld,%s RefCount=%s\n", (IsImplicit ? " (implicit)" : ""),
-        DPxPTR(HstPtrBegin), DPxPTR(tp), Size,
-        (UpdateRefCount ? " updated" : ""),
-        (CONSIDERED_INF(HT.RefCount)) ? "INF" :
-            std::to_string(HT.RefCount).c_str());
-    rc = (void *)tp;
-  } else if ((lr.Flags.ExtendsBefore || lr.Flags.ExtendsAfter) && !IsImplicit) {
-    // Explicit extension of mapped data - not allowed.
-    DP("Explicit extension of mapping is not allowed.\n");
-  } else if (Size) {
-    // If it is not contained and Size > 0 we should create a new entry for it.
-    IsNew = true;
-    uintptr_t tp = (uintptr_t)RTL->data_alloc(RTLDeviceID, Size, HstPtrBegin);
-    DP("Creating new map entry: HstBase=" DPxMOD ", HstBegin=" DPxMOD ", "
-        "HstEnd=" DPxMOD ", TgtBegin=" DPxMOD "\n", DPxPTR(HstPtrBase),
-        DPxPTR(HstPtrBegin), DPxPTR((uintptr_t)HstPtrBegin + Size), DPxPTR(tp));
-    HostDataToTargetMap.push_front(HostDataToTargetTy((uintptr_t)HstPtrBase,
-        (uintptr_t)HstPtrBegin, (uintptr_t)HstPtrBegin + Size, tp));
+    uintptr_t tp = HT.HstPtrBegin + ((uintptr_t)HstPtrBegin - HT.HstPtrBegin);
+    DP("Return HstPtrBegin " DPxMOD " Size=%ld RefCount=%s\n", DPxPTR(tp),
gtbercea wrote:
> grokos wrote:
> > This evaluates to just `HstPtrBegin`, the two `HT.HstPtrBegin` cancel out, so you can skip defining `tp` altogether.
> I tried and it leads to failing tests when handling structs.
Again I agree with the (static) analysis by @grokos. If that leads to failing tests, we need to understand why. What was the idea behind the calculation of `tp`?

Comment at: libomptarget/src/device.cpp:207-213
+      uintptr_t tp = (uintptr_t)NULL;
+      // If the use_device_ptr clause is used, we create a copy of the variable
+      // on the device even in the unified shared memory case.
+      // The explicit usage of the use_device_ptr clause forces the variable to
+      // have a device side version.
+      if (!(RTLRequiresFlags & OMP_REQ_UNIFIED_SHARED_MEMORY))
+        tp = (uintptr_t)RTL->data_alloc(RTLDeviceID, Size, HstPtrBegin);
`use_device_ptr` is no more, so this change shouldn't be needed anymore?

Comment at: libomptarget/src/omptarget.cpp:409-418
+          int rt;
+          if (Device.RTLRequiresFlags & OMP_REQ_UNIFIED_SHARED_MEMORY &&
+              TgtPtrBegin == HstPtrBegin)
+            rt = OFFLOAD_SUCCESS;
+          else
+            rt = Device.data_retrieve(HstPtrBegin, TgtPtrBegin, data_size);
           if (rt != OFFLOAD_SUCCESS) {
We can just move the error check into the branch, or even better move the check for `unified_shared_memory` to the outer condition.

Comment at: libomptarget/src/omptarget.cpp:511-512
       for (ShadowPtrListTy::iterator it = Device.ShadowPtrMap.begin();
           it != Device.ShadowPtrMap.end(); ++it) {
         void **ShadowHstPtrAddr = (void**) it->first;
Do we need the shadow pointers under `unified_shared_memory` and if `TgtPtrBegin == HstPtrBegin`? If not, we can just move the check to the outer condition.

Comment at: libomptarget/src/omptarget.cpp:544-545
+      // Skip shadow pointer loop for the unified shared memory case.
+      if (!(Device.RTLRequiresFlags & OMP_REQ_UNIFIED_SHARED_MEMORY)) {
+        uintptr_t lb = (uintptr_t) HstPtrBegin;
As above, but I think this needs to check `TgtPtrBegin == HstPtrBegin`.

I think we can just do the following around line 490:
} else if (Device.RTLRequiresFlags & OMP_REQ_UNIFIED_SHARED_MEMORY &&
                TgtPtrBegin == HstPtrBegin) {
  DP("hst data:" DPxMOD " unified and shared, becomes a noop\n", DPxPTR(HstPtrBegin));

Comment at: libomptarget/src/omptarget.cpp:693-703
+        int rt;
+        if (Device.RTLRequiresFlags & OMP_REQ_UNIFIED_SHARED_MEMORY &&
+            TgtPtrBegin == HstPtrBegin)
+          rt = OFFLOAD_SUCCESS;
+        else
+          rt = Device.data_submit(TgtPtrBegin, &Pointer_TgtPtrBegin,
+                                  sizeof(void *));
Can move the check `Device.RTLRequiresFlags & OMP_REQ_UNIFIED_SHARED_MEMORY && TgtPtrBegin == HstPtrBegin` to line 690, in a similar `else` branch behind `!Pointer_TgtPtrBegin`.

Comment at: libomptarget/test/unified_shared_memory/api.c:14
+void init(int A[], int B[], int C[]) {
+  for(int i=0; i<N; ++i) {
+    A[i] = 0;
Please format the test.

  rOMP OpenMP



More information about the Openmp-commits mailing list