[Openmp-commits] [PATCH] D86119: [OPENMP50]Allow overlapping mapping in target constrcuts.
Alexey Bataev via Phabricator via Openmp-commits
openmp-commits at lists.llvm.org
Mon Dec 7 11:04:52 PST 2020
ABataev added inline comments.
================
Comment at: openmp/libomptarget/src/omptarget.cpp:233
MapperComponents
- .Components[target_data_function == targetDataEnd ? I : E - I - 1];
+ .Components[target_data_function == targetDataEnd ? E - I - 1 : I];
MapperArgsBase[I] = C.Base;
----------------
ye-luo wrote:
> ABataev wrote:
> > ye-luo wrote:
> > > ABataev wrote:
> > > > ye-luo wrote:
> > > > > ABataev wrote:
> > > > > > ye-luo wrote:
> > > > > > > ye-luo wrote:
> > > > > > > > ye-luo wrote:
> > > > > > > > > ABataev wrote:
> > > > > > > > > > ABataev wrote:
> > > > > > > > > > > ye-luo wrote:
> > > > > > > > > > > > ye-luo wrote:
> > > > > > > > > > > > > ye-luo wrote:
> > > > > > > > > > > > > > ABataev wrote:
> > > > > > > > > > > > > > > ye-luo wrote:
> > > > > > > > > > > > > > > > ABataev wrote:
> > > > > > > > > > > > > > > > > ABataev wrote:
> > > > > > > > > > > > > > > > > > ye-luo wrote:
> > > > > > > > > > > > > > > > > > > ye-luo wrote:
> > > > > > > > > > > > > > > > > > > > ABataev wrote:
> > > > > > > > > > > > > > > > > > > > > ye-luo wrote:
> > > > > > > > > > > > > > > > > > > > > > ABataev wrote:
> > > > > > > > > > > > > > > > > > > > > > > grokos wrote:
> > > > > > > > > > > > > > > > > > > > > > > > What is the current status of the order of the arguments clang emits? Is it still necessary to traverse arguments in reverse order here?
> > > > > > > > > > > > > > > > > > > > > > > Yes, still required
> > > > > > > > > > > > > > > > > > > > > > Based on the conversation in
> > > > > > > > > > > > > > > > > > > > > > https://reviews.llvm.org/D85216
> > > > > > > > > > > > > > > > > > > > > > This line of code neither before nor after the change plays well.
> > > > > > > > > > > > > > > > > > > > > > Shall we fix the order in targetDataEnd first?
> > > > > > > > > > > > > > > > > > > > > This change is part of this patch and cannot be committed separately.
> > > > > > > > > > > > > > > > > > > > I mean could you fix that issue as a parent of this patch?
> > > > > > > > > > > > > > > > > > > > This change is part of this patch and cannot be committed separately.
> > > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > > If fixing the reordering is part of this patch, I should have seen "target_data_function == targetDataEnd ?" branches disappear.
> > > > > > > > > > > > > > > > > > Nope, just with this patch. It reorders the maps and need to change the cleanup order too.
> > > > > > > > > > > > > > > > > It works just like constructors/destructors: allocate in direct order, deallocate in reversed to correctly handle map order.
> > > > > > > > > > > > > > > > The description says that "present and alloc mappings are processed first and then all others."
> > > > > > > > > > > > > > > > Why the order of arguments in targetDataBegin, targetDataEnd and targetDataUpdate all get reversed.
> > > > > > > > > > > > > > > Because this is for mappers. Mapper maps are ordered by the compiler in the direct order (alloc, maps, delete) but when we need to do exit, we need to release the data in reversed order (deletes, maps, allocs).
> > > > > > > > > > > > > > I was not making the question clear. My question about "reverse" is not about having a reverse order for targetDataBegin. My question was about "reversing" from the the old code. Your change put the opposite order for targetDataBegin, targetDataEnd and targetDataUpdate cases.
> > > > > > > > > > > > > > I was not making the question clear. My question about "reverse" is not about having a reverse order for targetDataBegin. My question was about "reversing" from the the old code. Your change put the opposite order for targetDataBegin, targetDataEnd and targetDataUpdate cases.
> > > > > > > > > > > > >
> > > > > > > > > > > > > typo correction
> > > > > > > > > > > > > I was not making the question clear. My question about "reverse" is not about having a reverse order for **targetDataEnd**. My question was about "reversing" from the the old code. Your change put the opposite order for targetDataBegin, targetDataEnd and targetDataUpdate cases.
> > > > > > > > > > > > My separate question specifically for targetDataEnd is the following.
> > > > > > > > > > > >
> > > > > > > > > > > > In target(), we call
> > > > > > > > > > > > ```
> > > > > > > > > > > > targetDataBegin(args)
> > > > > > > > > > > > { // forward order
> > > > > > > > > > > > for (int32_t i = 0; i < arg_num; ++i) { ... }
> > > > > > > > > > > > }
> > > > > > > > > > > > launch_kernels
> > > > > > > > > > > > targetDataEnd(args)
> > > > > > > > > > > > { // reverse order
> > > > > > > > > > > > for (int32_t I = ArgNum - 1; I >= 0; --I) { }
> > > > > > > > > > > > }
> > > > > > > > > > > > ```
> > > > > > > > > > > >
> > > > > > > > > > > > At a mapper,
> > > > > > > > > > > > ```
> > > > > > > > > > > > targetDataMapper
> > > > > > > > > > > > {
> > > > > > > > > > > > // generate args_reverse in reverse order for targetDataEnd
> > > > > > > > > > > > targetDataEnd(args_reverse)
> > > > > > > > > > > > }
> > > > > > > > > > > > ```
> > > > > > > > > > > > Are we actually getting the original forward order due to one reverse in targetDataMapper and second reverse in targetDataEnd? Is this the desired behavior? This part confused me. Do I miss something? Could you explain a bit?
> > > > > > > > > > > Yes, something like this. targetDataEnd reverses the order of mapping arrays. But mapper generator always generates mapping arrays in the direct order (it fills mapping arrays that later processed by the targetDataEnd function). We could fix this by passing extra Boolean flag to the generator function but it means the redesign of the mappers. That's why we have to reverse it in the libomptarget.
> > > > > > > > > > You can check it yourself. Apply the patch, restore the original behavior in libomptarget and run libomptarget tests. Mapper related tests will crash.
> > > > > > > > > Stick with mapper generator always generating mapping arrays in the direct order. The targetDataMapper reverse the mapping array and then passes args_reverse into targetDataEnd. Inside targetDataEnd, mapping
> > > > > > > > > Yes, something like this. targetDataEnd reverses the order of mapping arrays. But mapper generator always generates mapping arrays in the direct order (it fills mapping arrays that later processed by the targetDataEnd function). We could fix this by passing extra Boolean flag to the generator function but it means the redesign of the mappers. That's why we have to reverse it in the libomptarget.
> > > > > > > >
> > > > > > > > Stick with mapper generator always generating mapping arrays in the direct order.
> > > > > > > >
> > > > > > > > In the targetDataBegin case, targetDataMapper keep direct order args and calls targetDataBegin(args) and targetDataBegin process args in direct order.
> > > > > > > >
> > > > > > > > In the targetDataEnd case, targetDataMapper reverses the mapping array and then passes args_reverse into targetDataEnd. Inside targetDataEnd, args_reverse are processed in reverse order. So targetDataEnd is actually processing the args in original direct order. This seems contradictory to the constructor/deconstructor like behavior that all the mappings must be processed in the actual reverse order in targetDataEnd.
> > > > > > > >
> > > > > > > > This is my understanding. The current code should be wrong but obviously the current code is working. So why the current code is working? what is inconsistent in my analysis. Could you point out the missing piece.
> > > > > > > > You can check it yourself. Apply the patch, restore the original behavior in libomptarget and run libomptarget tests. Mapper related tests will crash.
> > > > > > >
> > > > > > > For sure without this line, tests would crash and that is why you included this line of change in the patch. Since you made the change, you could explain why, right?
> > > > > > I changed and simplified codegen for the mapper generator without changing its interface. I could do this because of the new ordering, before we could not rely on it. But it also requires a change in the runtime.
> > > > > > targetDataEnd calls targetDataMapper and targetDataMapper fills the array in the direct order, but targetDataEnd processes them in the reverse order, but mapper generator does not know about it. It also has to generate the data in the reverse order, just like targetDataEnd does.
> > > > > >
> > > > > > Before this patch mapper generator tried to do some ordering but it was not always correct. It was not expecting something like map(alloc:s) map(s.a) because it was not allowed by the compiler. That's why it worked before and won't work with this patch.
> > > > > > PS. The change in the mapper generator is also required and cannot be separated. Without this mappers tests won't work.
> > > > > I played a bit with your patch.
> > > > > ```
> > > > > #pragma omp target exit data map(from: c.a[0:NUM], c.b[0:NU2M]) map(delete: c)
> > > > > ```
> > > > > I put NUM=1024 and NU2M = 2048.
> > > > > LIBOMPTARGET_DEBUG reports
> > > > > ```
> > > > > Libomptarget --> Entry 0: Base=0x00007fff064080a8, Begin=0x00007fff064080a8, Size=16, Type=0x0, Name=(null)
> > > > > Libomptarget --> Entry 1: Base=0x00007fff064080a8, Begin=0x0000000000f9cbd0, Size=4096, Type=0x1000000000012, Name=(null)
> > > > > Libomptarget --> Entry 2: Base=0x00007fff064080b0, Begin=0x0000000000f86e10, Size=8192, Type=0x1000000000012, Name=(null)
> > > > > Libomptarget --> Entry 3: Base=0x00007fff064080a8, Begin=0x00007fff064080a8, Size=16, Type=0x1000000000008, Name=(null)
> > > > > ```
> > > > > Since targetDataEnd internally reverse the processing order, could you confirm that the frontend was emitting entries 3,2,1,0?
> > > > > I'm wondering if the frontend could emit 3, 0, 1, 2 so the processing order is 2,1,0,3? The spec requires struct element processed before the struct in "target exit data"
> > > > No, the frontend emits in the order 0, 1, 2, 3. targetDataEnd process in reversed order 3, 2, 1, 0, but the mapper does not know about it and still emits the data in the order 0, 1, 2, 3.
> > > > And it is only for mappers!
> > > > So, say you have an extra map something like map(a).
> > > > ```
> > > > map (a) map(mapper(id), tofrom: c)
> > > > ```
> > > > where mapper for с does something like you wrote.
> > > >
> > > > In this case the order would be 0, 1, 2, 3, 4, where 0 is mapping of a and 1-4 is mapping of c.
> > > > When we need to delete the data, the mapper still would generate 1,2,3,4 + 0 for mapping of a, but targetDataEnd expects 4,3,2,1,0. That's why we have to reverse the mapping data, produced by the mapper generator for targetDataEnd.
> > > Let us leave the mapper case aside which has extra mess.
> > > Double checked that "Libomptarget --> Entry 0" is printed at the __tgt_target_data_end_mapper. So the order is as you said 0, 1, 2, 3 from the frontend. What is the "Entry 0"? more specifically is the difference between entry 0 and 3? Entry 0 seems to be an implicit map while 3 is explicit.
> > >
> > > #pragma omp target exit data map(from: c.a[0:NUM], c.b[0:NU2M]) map(delete: c)
> > > the "map(delete: c)" has some state machine to protect the delete due to ordering.
> > > So I'm wondering why the frontend must issue both 0 and 3. Can the front end fuse 0 and 3?
> > > I mean the frontend generates 3, 1, 2 and the runtime processing 2,1,3 without the deleting issue?
> > Entry 0 is the address that should be passed to the kernel (for the captured variable, that's why it is marked as TGT_TARGET_PARAM - target kernel argument), entries 1-3 are the actual mappings. Yes, I assume the frontend can fuse these entries in many cases, but it is different problem that should be addressed in a separate patch.
> There is no kernel associated enter/exit data...
>
> Thanks for answering all my puzzles. If you think some of my questions can be better answered by some documentation, please point me if there are any.
>
> I think both the frontend and runtime needs to be further consolidated to reduce side effects in future patches. The current patch in the runtime library part looks good to me for the current need.
>
> Please ping appropriate reviewers for the frontend change, so we can keep this patch moving.
No problem!
Yes, there is no kernel, missed it, but still it is the reason. The codegen you see is not directly related to this patch, this is just how the mapping currently works, it is just not quite optimal. Sure, it can be improved in many cases but just like I said it is different problem that should be addressed in separate patch(es). Also, I believe some of the optimizations can be implemented in OpenMPOpt pass.
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D86119/new/
https://reviews.llvm.org/D86119
More information about the Openmp-commits
mailing list