[Openmp-commits] [PATCH] D68100: [OpenMP 5.0] declare mapper runtime implementation

Fri Jul 24 19:26:21 PDT 2020

lildmh added a comment.

In D68100#2173703 <https://reviews.llvm.org/D68100#2173703>, @grokos wrote:

> In D68100#2173612 <https://reviews.llvm.org/D68100#2173612>, @lildmh wrote:
>
> > The first combined entry comes from mapping the whole structure. I think because of the alignment, the structure is actually 16 bytes. The 2nd combined entry is the real entry emitted to map the structure. Why it looks like there are 2 of them, because at the beginning of a mapper function, it maps the whole structure no matter what, which generate the 1st combined entry you saw here. Then we generate detailed mapping entry, which generates the 2nd combined entry you saw here. They are not necessarily the same. It happens to be similar in this example.
>
>
> I assure you that's not how structs are mapped.
>
> You don't map "the whole struct", you only map what is needed. For this you emit a combined entry which has a size large enough to encompass all members we are interested in. Then entries for individual members follow. One combined entry + as many member entries as needed. The first entry which "maps the whole struct" should not be there and is plain wrong.

This is an optimization brought up by Deepak. I guess you were in that meeting too but forgot. It could be quite useful when you map an array of struct/class. Assume you map 1000 of this structure, with this optimization most memory allocation can be done in a single allocation, instead of allocation 12 bytes memory 1000 times.

Thinking about it, it's actually important for correctness too. Assume you map `C a[2]`. If you map separately, a[0] and a[1] could be mapped to not contiguous locations, and it will cause error/segfault when GPU kernel access this array. If you allocate the whole array `a[2]` together, such problem won't happen.

> In D68100#2173612 <https://reviews.llvm.org/D68100#2173612>, @lildmh wrote:
> 
>> It indeed increases RefCount after I checked the code and you are right. I think it should not cause any problem? Because RefCount will be reduced before to 0 at exit (It looks like the combined entry's mapped twice, it should also be 'deleted' twice when the target region exits).
> 
> 
> The problem is that when individual members are processed in `target_data_end`, RefCount = 2 so these members will not be copied back to the host. RefCount must be 1 for data motion to take place and in this case it's not.
> 
> Anyway, I modified libomptarget locally to ignore the 16-byte combined entry and now all tests pass. Can you please submit a clang patch which removes the first combined entry?

I believe RefCount should be reduced to 1 when want to copy it back in `target_data_end`, could you post the whole trace of debug output how RefCount changes in `target_data_end`?

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D68100/new/

https://reviews.llvm.org/D68100