[Openmp-commits] [PATCH] D65001: [OpenMP][libomptarget] Add support for unified memory for regular maps

Jonas Hahnfeld via Phabricator via Openmp-commits openmp-commits at lists.llvm.org
Tue Jul 30 00:01:46 PDT 2019

Hahnfeld added inline comments.

Comment at: libomptarget/src/omptarget.cpp:247-249
+    // TODO: Check if this is correct
+    bool IsInUseDevicePtrClause = arg_types[i] & OMP_TGT_MAPTYPE_TARGET_PARAM &&
+        arg_types[i] & OMP_TGT_MAPTYPE_RETURN_PARAM;
AlexEichenberger wrote:
> Hahnfeld wrote:
> > gtbercea wrote:
> > > Hahnfeld wrote:
> > > > gtbercea wrote:
> > > > > Hahnfeld wrote:
> > > > > > grokos wrote:
> > > > > > > This is correct, with one little exception. Although the OpenMP standard does not mandate it, upstream clang supports `use_device_ptr` on pointers which are struct members. Because they are struct members, they are not marked with `TARGET_PARAM` (only the combined entry is considered a target parameter, not the individual members). On the other hand, they are marked with `PTR_AND_OBJ` and have some value in the `MEMBER_OF` bits.
> > > > > > > 
> > > > > > > Once again, it's a non-standard extension so we are free to decide whether to support it or not in the unified shared memory scenario.
> > > > > > Can we please first answer my question why we need to care about the existence of `use_device_ptr`? Why does it make a difference for unified shared memory?
> > > > > Because use_device_ptr implies the use of a true device pointer and again that needs to be respected even unified memory is used.
> > > > Where is that in the spec?
> > > The pointers returned by omp_target_alloc and accessed through use_device_ptr are guaranteed to be pointer values that can support pointer arithmetic while still being native device pointers. (Section 2.4 page 61 bottom).
> > With `unified_shared_memory` we get: (Section 2.4, page 62, lines 4:7)
> > 
> > > Additionally, memory in the device data environment of any device visible to OpenMP, including but not limited to the host, is considered part of the device data environment of all devices accessible through OpenMP except as noted below. Every device address allocated through OpenMP device memory routines is a valid host pointer.
> > 
> > From my understanding, this implies all shared memory can be referenced by a host pointer. This guarantees pointer arithmetic per the C / C++ standard.
> Experience from our users show a significant performance degradation when every data is mapped to host memory under unified. Overwhelmingly, users want specific data to be allocated on device & mapped, while at the same time they want generic memory used by the device to be host memory (e.g. for linked list structures).
> Recognizing this important use case, the specs has introduced the "close" modifier to the map for users to indicate that the data being mapped should be set "close" to the device, aka for us allocated on the device. Similarly,  if the user went through the additional exercise of using target alloc / free, we want to respect this explicit request of the user by allocating the data on the device.
> This is why, even under unified memory model, some data is still mapped with duplicated copies between host and device. Thus requiring the maps & use_device_ptr to be tested and enforced when data was actually mapped.
> Fortunately for the overhead, we know that only a very small subset of all data accessed will be labeled with "close" or allocated with target_alloc, so the cost of performing this mapping will be very significantly reduced.
Alex, I fully agree with all of your points: Yes, we need means to allocate memory on the device (`omp_target_alloc` and `close`) and this must be reflected by `map`s and `use_device_ptr`. I'm not arguing against this, the behavior of the two methods is pretty clear in the spec.

What I'm asking about is something different, though related: If we have "generic" memory (as you call it, so normal memory on the host that is "shared" with all devices), do we need to allocate device memory if the user specifies `use_device_ptr`? Because that is what the current patch will do in `getOrAllocTgtPtr` if `IsInUseDevicePtrClause = true`.

Let's discuss a concrete example: (assume that the TU has a `requires unified_shared_memory`)
int A[N];
// init ...
int *Aptr = &A[0];

#pragma omp target data use_device_ptr(A)
  // What is the value of A?

I think we should have `A == Aptr`, but this patch will allocate disjoint device memory. (And while writing this, I wonder how libomptarget knows about the size of `A`? According to the spec, the clause expects a list of pointers, not array sections. So how will this work with opaque pointers passed to an orphaned function?)

  rOMP OpenMP



More information about the Openmp-commits mailing list