[PATCH] D79972: [OpenMP5.0] map item can be non-contiguous for target update
Chi Chun Chen via Phabricator via cfe-commits
cfe-commits at lists.llvm.org
Tue Jun 2 16:29:06 PDT 2020
cchen added a comment.
In D79972#2069435 <https://reviews.llvm.org/D79972#2069435>, @ABataev wrote:
> In D79972#2069366 <https://reviews.llvm.org/D79972#2069366>, @cchen wrote:
>
> > In D79972#2069358 <https://reviews.llvm.org/D79972#2069358>, @ABataev wrote:
> >
> > > In D79972#2069322 <https://reviews.llvm.org/D79972#2069322>, @cchen wrote:
> > >
> > > > In D79972#2068976 <https://reviews.llvm.org/D79972#2068976>, @ABataev wrote:
> > > >
> > > > > Still: Did you think about implementing it in the compiler instead of the runtime?
> > > >
> > > >
> > > > I'm not sure I understand your question; which part of the code are you asking about?
> > > > The main work the compiler needs to do is to send the {offset, count, stride} struct to the runtime.
> > >
> > >
> > > I mean did you think about calling `__tgt_target_data_update` function in a loop in the compiler-generated code instead of putting it into the runtime?
> >
> >
> > Oh, I would prefer to call `__tgt_target_data_update` once in the compiler, and that is what I'm doing now.
>
>
> I was not quite correct. What I mean is to generate the array with the array sections as a VLA in the compiler, and fill it in a loop generated by the compiler for non-contiguous sections, rather than doing this in the runtime.
> Say, we have the code:
>
> int arr[3][3];
> ...
> #pragma omp target update to(arr[1:2][1:2])
>
>
>
> In this case, we're going to transfer the next elements:
>
> 000
> 0xx
> 0xx
>
>
> In the compiler-generated code we emit something like this:
>
> void *bptr[<n>];
> void *ptr[<n>];
> int64 sizes[<n>];
> int64 maptypes[<n>];
> for (int i = 0; i < <n>; ++i) {
> bptr[i] = &arr[1+i][1];
> ptr[i] = &arr[1+i][1];
> sizes[i] = ...;
> maptypes[i] = ...;
> }
> call void @__tgt_target_data_update(i64 -1, i32 <n>, bptr, ptr, sizes, maptypes);
>
>
> With this solution, you won't need to modify the runtime and add a new mapping flag.
We discussed my current implementation in the bi-weekly meeting several weeks back, and the general consensus was that it is an acceptable approach.
The major advantage of sending a descriptor to the runtime is illustrated by the following example:
#define N 10000
int a[N][2];
…
#pragma omp target update to (a[0:N][0:1])
With the compiler-generated loop, this would require passing O(N) entries, here 10000, to the `__tgt_target_data_update` call. The current implementation only requires a descriptor with 2 entries. I think this could be a real concern: splitting out the transfers in compiler-generated code results in a list containing one entry per non-contiguous chunk (easily hitting scaling issues), while the descriptor approach is bounded by the number of dimensions.
That seems like a pretty compelling reason to use the descriptor: it is much more space efficient.
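To make the entry counts above concrete, here is a minimal sketch in C. The `dim_desc` struct and both function names are hypothetical; they only mirror the {offset, count, stride} triple mentioned earlier in this thread and are not the layout or API used by the patch. The sketch assumes a row-major array and counts how many contiguous chunks a section splits into, which is the number of list entries the per-chunk approach would need.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-dimension descriptor entry (not the patch's actual
 * layout): one {offset, count, stride} triple per array dimension. */
typedef struct {
  uint64_t offset; /* first selected index in this dimension */
  uint64_t count;  /* number of selected indices */
  uint64_t stride; /* step between selected indices */
} dim_desc;

/* Number of list entries needed if every contiguous chunk of the section
 * gets its own entry. dim_size[i] is the full extent of dimension i;
 * the array is assumed to be row-major. */
uint64_t chunk_list_entries(const dim_desc *d, const uint64_t *dim_size,
                            size_t ndims) {
  assert(d != NULL && dim_size != NULL && ndims > 0);
  /* Trailing dimensions selected in full merge into the enclosing chunk. */
  size_t i = ndims;
  while (i > 0 && d[i - 1].offset == 0 && d[i - 1].stride == 1 &&
         d[i - 1].count == dim_size[i - 1])
    --i;
  if (i == 0)
    return 1; /* the whole array is one contiguous chunk */
  /* Dimension i-1 is the innermost partially selected one. With unit
   * stride its selected indices form a single run; otherwise each
   * selected index starts its own chunk. */
  size_t outer = (d[i - 1].stride == 1) ? i - 1 : i;
  uint64_t n = 1;
  for (size_t k = 0; k < outer; ++k)
    n *= d[k].count;
  return n;
}

/* The descriptor approach needs one entry per dimension. */
uint64_t descriptor_entries(size_t ndims) { return (uint64_t)ndims; }
```

For `a[0:N][0:1]` on `int a[N][2]` with N = 10000, `chunk_list_entries` yields 10000 while `descriptor_entries` yields 2. For the earlier `arr[1:2][1:2]` example on a 3x3 array, the chunk count is 2, one per selected row.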
Also, the descriptor idea is very similar to how Cray supported Fortran dope vectors for years (we send in a pointer to a dope vector rather than a pointer to the data, and a flag to indicate it’s a dope vector, and the runtime library handles it as a dope vector).
I think the runtime library changes will not be very extensive or difficult at all, and we’re very willing to implement the runtime support for non-contiguous transfers.
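As a rough illustration of what the runtime side of a descriptor could involve, the sketch below walks a per-dimension {offset, count, stride} descriptor and copies the selected elements. Everything here is an assumption made for illustration: `dim_desc`, `copy_section`, and the `pitch` array are hypothetical names, `memcpy` stands in for a host-to-device transfer, and this is not the libomptarget implementation. It copies one element at a time for simplicity, whereas a real runtime would likely merge contiguous runs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical per-dimension descriptor entry (illustration only). */
typedef struct {
  uint64_t offset; /* first selected index in this dimension */
  uint64_t count;  /* number of selected indices */
  uint64_t stride; /* step between selected indices */
} dim_desc;

/* Copies the section described by d[dim..ndims) from src to dst at the
 * same byte offsets. pitch[i] is the size in bytes of one index step in
 * dimension i (for the innermost dimension, the element size). base is
 * the byte offset accumulated from the outer dimensions. Returns the
 * number of individual copies issued. */
uint64_t copy_section(char *dst, const char *src, const dim_desc *d,
                      const uint64_t *pitch, size_t ndims, size_t dim,
                      uint64_t base) {
  assert(dst != NULL && src != NULL && d != NULL && pitch != NULL);
  assert(dim < ndims);
  uint64_t copies = 0;
  for (uint64_t j = 0; j < d[dim].count; ++j) {
    uint64_t off = base + (d[dim].offset + j * d[dim].stride) * pitch[dim];
    if (dim + 1 == ndims) {
      /* Innermost dimension: transfer a single element. */
      memcpy(dst + off, src + off, (size_t)pitch[dim]);
      ++copies;
    } else {
      /* Recurse into the next dimension from this row's offset. */
      copies += copy_section(dst, src, d, pitch, ndims, dim + 1, off);
    }
  }
  return copies;
}
```

For `int arr[3][3]` and the section `arr[1:2][1:2]`, the caller would pass `pitch = {3 * sizeof(int), sizeof(int)}` and a two-entry descriptor `{1, 2, 1}, {1, 2, 1}`, starting at `dim = 0` and `base = 0`; only the four selected elements are copied.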
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D79972/new/
https://reviews.llvm.org/D79972