[Openmp-commits] [PATCH] D62393: [OPENMP][NVPTX]Mark parallel level counter as volatile.

Wed Jun 19 01:29:45 PDT 2019

jdoerfert added a comment.

In D62393#1548388 <https://reviews.llvm.org/D62393#1548388>, @ABataev wrote:

> In D62393#1543858 <https://reviews.llvm.org/D62393#1543858>, @jdoerfert wrote:
>
> > In D62393#1542731 <https://reviews.llvm.org/D62393#1542731>, @ABataev wrote:
> >
> > > In D62393#1542638 <https://reviews.llvm.org/D62393#1542638>, @jdoerfert wrote:
> > >
> > > > In D62393#1542513 <https://reviews.llvm.org/D62393#1542513>, @ABataev wrote:
> > > >
> > > > > In D62393#1542471 <https://reviews.llvm.org/D62393#1542471>, @jdoerfert wrote:
> > > > >
> > > > > > I want to investigate the racy accesses further and make sure it is not a miscompile inside LLVM.
> > > > >
> > > > >
> > > > > > This is not a problem inside LLVM. The problem appears after optimizations performed by the ptxas tool (when it compiles PTX to SASS) at O3 with the inlined runtime.
> > > > >
> > > > > > I extracted the test case (see below) but I was not seeing the `ERROR`. How did you run the test case to see a different value for `Count`?
> > > > >
> > > > > > You need to compile it with the inlined runtime at O2 or O3.
> > > >
> > > >
> > > > When I run 
> > > >  `./bin/clang -fopenmp-targets=nvptx64-nvidia-cuda -O3 -fopenmp --cuda-path=/soft/compilers/cuda/cuda-9.1.85  -Xopenmp-target -march=sm_70  -fopenmp=libomp  test.c -o test.ll -emit-llvm -S`
> > > >  I get
> > > >
> > > >   https://gist.github.com/jdoerfert/4376a251d98171326d625f2fb67b5259
> > > >
> > > > which shows the inlined and optimized libomptarget.
> > > >
> > > > > And you need the latest version of the libomptarget
> > > >
> > > > My version is from today Jun 13 15:24:11 2019, git: 3bc6e2a7aa3853b06045c42e81af094647c48676 <https://reviews.llvm.org/rG3bc6e2a7aa3853b06045c42e81af094647c48676>
> > >
> > >
> > > > We see the problem with CUDA 8, at least for arch sm_35.
> >
> >
> > I couldn't get that version to run properly so I asked someone who had a system set up. 
> >  Unfortunately, the test.c [1] did not trigger the problem. In test.c we run the new test part in `spmd_parallel_regions.cpp` 1000 times and check the result each time.
> >  It was run with Cuda 8.0 for sm_35, sm_37, and sm_70.
> >
> > Could you share more information on how the system has to look to trigger the problem?
> >  Could you take a look at the test case we run and make sure it triggers the problem on your end?
> >
> > [1] https://gist.github.com/jdoerfert/d2b18ca8bb5c3443cc1d26b23236866f
>
>
> You need to apply the patch D62318 <https://reviews.llvm.org/D62318> to reproduce the problem for sure.

This means the problem does not exist in the tree as of right now, without D62318 applied, correct?
If so, which part of the D62318 <https://reviews.llvm.org/D62318> patch causes the problem?

Does the `test.c` that I posted earlier expose the problem then, or do I need a different test case?
What configuration are you running? Is it reproducible with CUDA 9/10 and sm_70?

Repository:
  rOMP OpenMP

https://reviews.llvm.org/D62393