[PATCH] D51875: [OPENMP][NVPTX] Add support for lastprivates/reductions handling in SPMD constructs with lightweight runtime.

Fri Sep 28 08:20:18 PDT 2018

ABataev added a comment.

In https://reviews.llvm.org/D51875#1249159, @Hahnfeld wrote:

> In https://reviews.llvm.org/D51875#1249153, @ABataev wrote:
>
> > In https://reviews.llvm.org/D51875#1249136, @ABataev wrote:
> >
> > > In https://reviews.llvm.org/D51875#1249122, @Hahnfeld wrote:
> > >
> > > > In https://reviews.llvm.org/D51875#1249092, @ABataev wrote:
> > > >
> > > > > In https://reviews.llvm.org/D51875#1249088, @ABataev wrote:
> > > > >
> > > > > >
> > > > >
> > > >
> > > >
> > > > > I don't see why the distribute loop cares which thread actually executes the last iteration of the `for` loop; that is only relevant inside the outlined parallel region.
> > >
> > >
> > > > Because `distribute` marks as lastprivate not the last loop chunk executed by the last thread, but the whole set of loop chunks executed by the last team. So when you write the lastprivate value after the distribute loop, you get multiple writes from different threads, each with a different lastprivate value.
> >
> >
> > > Say the last distribute chunk is `[L, U]`. The inner `for` directive splits it into `[L, U1], [U1 + 1, U2], ..., [Un-1 + 1, U]`. `Distribute` marks all of these chunks as last, not just the final one, `[Un-1 + 1, U]`.
>
>
> I got that. This is why the outer `distribute` only passes the global address for its last chunk. Then the inner `for` decides which thread executes `[Un-1 + 1, U]` and writes the lastprivate value.

Yes, that's right! You got it.
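
To make the scheme concrete, here is a minimal host-side sketch of the chunking discussed above. It is only a model under stated assumptions, not the code in this patch or in the NVPTX runtime: it assumes one contiguous `distribute` chunk per team, one contiguous `for` sub-chunk per thread, and a loop body equivalent to `x = i * 10`. The names `Write`, `simulate` and `innerDecides` are made up for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical model: one record per write to the shared lastprivate location.
struct Write {
  int team;
  int thread;
  long value;
};

// Models `distribute` over [0, n) with one contiguous chunk [L, U] per team,
// and an inner `for` that splits [L, U] into one sub-chunk per thread.
// innerDecides == false: every thread of the last team writes (the problem).
// innerDecides == true:  only the thread whose sub-chunk ends at U writes.
std::vector<Write> simulate(long n, int teams, int threads, bool innerDecides) {
  assert(n > 0 && teams > 0 && threads > 0);
  std::vector<Write> writes;
  long perTeam = (n + teams - 1) / teams;
  for (int team = 0; team < teams; ++team) {
    long L = team * perTeam;
    long U = std::min(n, L + perTeam) - 1;
    if (L > U)
      continue; // this team received no iterations
    bool lastDistributeChunk = (U == n - 1);
    long perThread = (U - L + 1 + threads - 1) / threads;
    for (int thread = 0; thread < threads; ++thread) {
      long lo = L + thread * perThread;
      long hi = std::min(U, lo + perThread - 1);
      if (lo > hi)
        continue; // this thread received no iterations
      // Private copy after this thread's last iteration (assumed body: x = i * 10).
      long priv = hi * 10;
      bool lastForChunk = (hi == U); // this thread executes [Un-1 + 1, U]
      if (lastDistributeChunk && (!innerDecides || lastForChunk))
        writes.push_back({team, thread, priv});
    }
  }
  return writes;
}
```

With 16 iterations, 2 teams and 4 threads per team, `simulate(16, 2, 4, false)` records four conflicting writes from the last team, while `simulate(16, 2, 4, true)` records a single write from the thread that ran the final sub-chunk.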

Repository:
  rL LLVM

https://reviews.llvm.org/D51875