[Openmp-commits] [PATCH] D44186: [OpenMP] New clang/libomptarget map interface: remove translation code

Fri Mar 9 14:48:35 PST 2018

grokos added inline comments.

================
Comment at: libomptarget/src/omptarget.cpp:197-199
+    // Adjust for proper alignment if this is a combined entry (for structs).
+    // Look at the next argument - if that is MEMBER_OF this one, then this one
+    // is a combined entry.
----------------
Hahnfeld wrote:
> RaviNarayanaswamy wrote:
> > grokos wrote:
> > > Hahnfeld wrote:
> > > > grokos wrote:
> > > > > Hahnfeld wrote:
> > > > > > I thought this is now done in the compiler?
> > > > > No, it's not. The compiler could take care of this issue, but it's not its job. The compiler should just inform the runtime that we requested a mapping starting from some address. If `CUDA memcpy`, for instance, has some requirements regarding the alignment of addresses, that's not the compiler's business. The compiler doesn't and shouldn't care about what happens at the plugin level of libomptarget.
> > > > If that's specific to CUDA, why does it happen in the plugin agnostic part of libomptarget?
> > > That's a good point. A more elegant solution would be to extend the plugin interface (`__tgt_rtl_*` functions) with a new function which the agnostic library can query in order to get any alignment requirements. I'm in favour of this approach, but I need to ask other people what they think. In any case, implementing this potential change is not part of this patch. Thoughts?
> > I am not sure what you are trying to do here.  For structure members, the compiler should generate the begin address, offset and size.  The code generated for  the target is using the beginning of the struct to access the field.  So cannot just pad the field member.
> Good point, this needs clarification. Maybe @grokos could share a code example where this padding is needed and point to documentation where it says that `cuMemcpy` can only handle aligned pointers?
> 
> My guess which might be completely wrong: Maybe the begin address is just for transfer and the target code will use the subsequent entries which point to the member directly?
@RaviNarayanaswamy : What you refer to is the `Base` address, which is the starting address of the struct. And you are right, the target code uses this address to access members of the struct and we cannot modify it. What I am padding is the `Begin` address, which is the address of the first mapped member.

I was mistaken by the debug output, padding is not needed for memory transfers, but for the kernel execution itself. It ensures that the alignment of each mapped field remains what it should be. E.g.
```
struct S {
  int a;   // 4-aligned
  int b;   // 4-aligned
  int *p;  // 8-aligned
} s1;
...
#pragma omp target map(tofrom: s1.b, s1.p[0:N])
{
  s1.b = 5;
  for (int i...) s1.p[i] = ...;
}
```
In this example we are mapping `s1` starting from member `b`. So, `BaseAddress=&s1=&s1.a` and `BeginAddress=&s1.b`. Let's assume that the struct begins at address `0x100`. Then `&s1.a=0x100`, `&s1.b=0x104`, `&s1.p=0x108`. Each member obeys the alignment requirements for its type.

Now, when we allocate memory on the device, in CUDA's case `cuMemAlloc()` returns an address which is at least 256-aligned. This means that the chunk of the struct on the device will start at a 256-aligned address, let's say `0x200`. Then the address of `b` will be `0x200` and address of `p` will be a misaligned `0x204` (on the host there was no need to add padding between `b` and `p`, so `p` comes exactly 4 bytes after `b`). If the device kernel tries to access `s1.p`, a `misaligned address` error occurs (as reported by the CUDA plugin).

By padding the begin address down to a multiple of 8 and extending the size of the allocated chuck accordingly, the chuck on the device will start at `0x200` with the padding (4 bytes), then `&s1.b=0x204` and `&s1.p=0x208`, as they should be to satisfy the alignment requirements.

Repository:
  rOMP OpenMP

https://reviews.llvm.org/D44186