[Openmp-commits] [PATCH] D51875: [OPENMP][NVPTX] Add support for lastprivates/reductions handling in SPMD constructs with lightweight runtime.
Alexey Bataev via Phabricator via Openmp-commits
openmp-commits at lists.llvm.org
Fri Sep 28 08:27:18 PDT 2018
ABataev added a comment.
In https://reviews.llvm.org/D51875#1249164, @Hahnfeld wrote:
> In https://reviews.llvm.org/D51875#1249162, @ABataev wrote:
> > In https://reviews.llvm.org/D51875#1249159, @Hahnfeld wrote:
> > > In https://reviews.llvm.org/D51875#1249153, @ABataev wrote:
> > >
> > > > Say the last distribute chunk is `[L, U]`. In the inner `for` directive it is split into `[L, U1], [U1 + 1, U2], ..., [Un-1 + 1, U]`. `distribute` marks all of these chunks as last, not just the final chunk `[Un-1 + 1, U]`.
> > >
> > >
> > > I got that. This is why the outer `distribute` only passes the global address for its last chunk. Then the inner `for` decides which thread executes `[Un-1 + 1, U]` and writes the lastprivate value.
> > Yes, that's right! You got it.
> So now you are agreeing to "my" solution, which is different from what Clang currently does - I'm confused.
No, I do not agree with your solution; I thought you agreed with the implemented one. You said you understood that the inner `for` loop decides which chunk is actually the last one. Because of that, we need to share the `distribute` private copy of the lastprivate variable, so that all the threads in the inner `parallel` region can modify it.
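
To make the chunk argument concrete, here is a minimal sequential sketch of the last `distribute` chunk being split by the inner `for`. It is only a model of the reasoning above, not the Clang codegen or the NVPTX runtime: `Chunk`, `splitChunk`, `runLastDistributeChunk`, the chunk size and the `i * 10` loop body are all hypothetical names and values chosen for illustration, and the "threads" are simulated by an ordinary loop.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical model of an iteration range with inclusive bounds.
struct Chunk {
  int lower;
  int upper;
};

// Split the distribute chunk [lower, upper] into inner `for` chunks of at
// most `size` iterations: [L, U1], [U1 + 1, U2], ..., [Un-1 + 1, U].
// Assumes size > 0.
std::vector<Chunk> splitChunk(Chunk outer, int size) {
  std::vector<Chunk> chunks;
  for (int lo = outer.lower; lo <= outer.upper; lo += size)
    chunks.push_back({lo, std::min(lo + size - 1, outer.upper)});
  return chunks;
}

// Simulate the inner `parallel for` over the last distribute chunk.
// Each inner chunk stands for one thread with its own private copy.
int runLastDistributeChunk(Chunk outer, int size) {
  // The distribute-level private copy of the lastprivate variable.
  // In this model it is visible to every simulated inner thread.
  int distributeCopy = -1;

  for (const Chunk &c : splitChunk(outer, size)) {
    int threadPrivate = -1; // per-thread private copy
    for (int i = c.lower; i <= c.upper; ++i)
      threadPrivate = i * 10; // placeholder loop body

    // `distribute` flags every inner chunk of its last chunk as "last".
    const bool distributeSaysLast = true;
    // Only the inner `for` knows which chunk really ends at U.
    const bool innerSaysLast = (c.upper == outer.upper);

    // Whichever thread owns [Un-1 + 1, U] writes the final value, so it
    // needs access to the distribute-level copy.
    if (distributeSaysLast && innerSaysLast)
      distributeCopy = threadPrivate;
  }
  return distributeCopy;
}
```

For example, `runLastDistributeChunk({8, 15}, 3)` splits `[8, 15]` into `[8, 10], [11, 13], [14, 15]`. All three chunks carry the "last" flag from `distribute`, but only the thread that owns `[14, 15]` stores its value. Which thread that is depends on the inner schedule, which is why the copy it writes to cannot be private to a single thread.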