[Openmp-commits] [PATCH] D68100: [OpenMP 5.0] declare mapper runtime implementation
Lingda Li via Phabricator via Openmp-commits
openmp-commits at lists.llvm.org
Fri Jul 24 16:43:37 PDT 2020
lildmh added a comment.
In D68100#2173541 <https://reviews.llvm.org/D68100#2173541>, @grokos wrote:
> In D68100#2173460 <https://reviews.llvm.org/D68100#2173460>, @lildmh wrote:
>
> > Thanks George for looking into this, and sorry for the late response.
> >
> > I believe this is not a bug, it's a design choice we made early. The design choice is we map the whole structure at the beginning for one piece so we don't map the individual parts of them separately, which may cause a lot of memcpy.
> >
> > For the RefCount, when the runtime check the 2nd component in your example, it will find it's already mapped and will not increase the RefCount, so I think it's not a bug and the behavior is correct.
>
>
> No, this is not related to our design choices. Here we are mapping the whole struct twice for no reason. The entries should be:
>
> 1) combined entry (i.e. the entry that maps the whole struct
> base = &c, begin = &c.a, size = sizeof(class S), type = TARGET_PARAM
> 2) member entry for c.a[0:NUM]
> base = &c.a, begin = &c.a[0], size = NUM*sizeof(int), type = MEMBER_OF(1) | PTR_AND_OBJ | TO | FROM
> 3) member entry for c.onHost
> base = &c, begin = &c.onHost, size = sizeof(int), type = MEMBER_OF(1) | TO | FROM
>
>
> But what happens now is that the combined entry is emitted twice, so `MapperComponents` looks like this:
>
> <combined entry>, <combined entry>, <entry for c.a[0:NUM]>, <entry for c.onHost>
>
>
> instead of
>
> <combined entry>, <entry for c.a[0:NUM]>, <entry for c.onHost>
>
>
> And what's more, the first combined entry has size=16 whereas the second combined entry has size=12. Where does this 16 come from? The size of the struct is 12 bytes (a pointer + an int). This also explains why the MEMBER_OF field is set to 2, because the second element in the list of arguments is also the combined entry.
The first combined entry comes from mapping the whole structure. I think because of the alignment, the structure is actually 16 bytes. The 2nd combined entry is the real entry emitted to map the structure. Why it looks like there are 2 of them, because at the beginning of a mapper function, it maps the whole structure no matter what, which generate the 1st combined entry you saw here. Then we generate detailed mapping entry, which generates the 2nd combined entry you saw here. They are not necessarily the same. It happens to be similar in this example.
> There is no rationale behind emitting the combined entry twice, on the contrary it leads to errors because the RefCount is indeed incremented when it shouldn't.
>
> This is libomptarget's debug output from the provided example upon entering the target region:
>
> Libomptarget --> Entering target region with entry point 0x0000000000401409 and device Id -1
> Libomptarget --> Checking whether device 0 is ready.
> Libomptarget --> Is the device 0 (local ID 0) initialized? 1
> Libomptarget --> Device 0 is ready to use.
> Libomptarget --> Entry 0: Base=0x00007ffc24203a68, Begin=0x00007ffc24203a68, Size=16, Type=0x23
> Libomptarget --> Calling target_data_mapper for the 0th argument
> Libomptarget --> Calling the mapper function 0x0000000000400e50
> Libomptarget --> __tgt_push_mapper_component(Handle=0x00007ffc24203368) adds an entry (Base=0x00007ffc24203a68, Begin=0x00007ffc24203a68, Size=16, Type=0x20).
> Libomptarget --> __tgt_mapper_num_components(Handle=0x00007ffc24203368) returns 1
> Libomptarget --> __tgt_push_mapper_component(Handle=0x00007ffc24203368) adds an entry (Base=0x00007ffc24203a68, Begin=0x00007ffc24203a68, Size=12, Type=0x20).
> Libomptarget --> __tgt_push_mapper_component(Handle=0x00007ffc24203368) adds an entry (Base=0x00007ffc24203a68, Begin=0x0000000000d89ff0, Size=4096, Type=0x2000000000013).
> Libomptarget --> __tgt_push_mapper_component(Handle=0x00007ffc24203368) adds an entry (Base=0x00007ffc24203a68, Begin=0x00007ffc24203a70, Size=4, Type=0x2000000000003).
> Libomptarget --> Looking up mapping(HstPtrBegin=0x00007ffc24203a68, Size=16)...
> Libomptarget --> Creating new map entry: HstBase=0x00007ffc24203a68, HstBegin=0x00007ffc24203a68, HstEnd=0x00007ffc24203a78, TgtBegin=0x00007fa582400000
> Libomptarget --> There are 16 bytes allocated at target address 0x00007fa582400000 - is new
> Libomptarget --> Looking up mapping(HstPtrBegin=0x00007ffc24203a68, Size=12)...
> Libomptarget --> Mapping exists with HstPtrBegin=0x00007ffc24203a68, TgtPtrBegin=0x00007fa582400000, Size=12, updated RefCount=2
> Libomptarget --> There are 12 bytes allocated at target address 0x00007fa582400000 - is not new
> Libomptarget --> Has a pointer entry:
> Libomptarget --> Looking up mapping(HstPtrBegin=0x00007ffc24203a68, Size=8)...
> Libomptarget --> Mapping exists with HstPtrBegin=0x00007ffc24203a68, TgtPtrBegin=0x00007fa582400000, Size=8, RefCount=2
> Libomptarget --> There are 8 bytes allocated at target address 0x00007fa582400000 - is not new
> Libomptarget --> Looking up mapping(HstPtrBegin=0x0000000000d89ff0, Size=4096)...
> Libomptarget --> Creating new map entry: HstBase=0x0000000000d89ff0, HstBegin=0x0000000000d89ff0, HstEnd=0x0000000000d8aff0, TgtBegin=0x00007fa582400200
> Libomptarget --> There are 4096 bytes allocated at target address 0x00007fa582400200 - is new
> Libomptarget --> Moving 4096 bytes (hst:0x0000000000d89ff0) -> (tgt:0x00007fa582400200)
> Libomptarget --> Update pointer (0x00007fa582400000) -> [0x00007fa582400200]
> Libomptarget --> Looking up mapping(HstPtrBegin=0x00007ffc24203a70, Size=4)...
> Libomptarget --> Mapping exists with HstPtrBegin=0x00007ffc24203a70, TgtPtrBegin=0x00007fa582400008, Size=4, RefCount=2
> Libomptarget --> There are 4 bytes allocated at target address 0x00007fa582400008 - is not new
> Libomptarget --> DeviceTy::getMapEntry: requested entry found
> Libomptarget --> Looking up mapping(HstPtrBegin=0x00007ffc24203a68, Size=16)...
> Libomptarget --> Mapping exists with HstPtrBegin=0x00007ffc24203a68, TgtPtrBegin=0x00007fa582400000, Size=16, RefCount=2
> Libomptarget --> Obtained target argument 0x00007fa582400000 from host pointer 0x00007ffc24203a68
> Libomptarget --> loop trip count is 1024.
> Libomptarget --> Launching target execution __omp_offloading_801_1ee0443_main_l28 with pointer 0x0000000001427dc0 (index=0).
>
>
>
> When we process the 16-byte combined entry we allocate space for the struct and RefCount=1, then we process the 12-byte combined entry and RefCount is incremented to 2.
It indeed increases RefCount after I checked the code and you are right. I think it should not cause any problem? Because RefCount will be reduced before to 0 at exit (It looks like the combined entry's mapped twice, it should also be 'deleted' twice when the target region exits).
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D68100/new/
https://reviews.llvm.org/D68100
More information about the Openmp-commits
mailing list