[Openmp-commits] [PATCH] D62393: [OPENMP][NVPTX]Mark parallel level counter as volatile.

Fri Jun 14 13:08:35 PDT 2019

ABataev added a comment.

In D62393#1543858 <https://reviews.llvm.org/D62393#1543858>, @jdoerfert wrote:

> In D62393#1542731 <https://reviews.llvm.org/D62393#1542731>, @ABataev wrote:
>
> > In D62393#1542638 <https://reviews.llvm.org/D62393#1542638>, @jdoerfert wrote:
> >
> > > In D62393#1542513 <https://reviews.llvm.org/D62393#1542513>, @ABataev wrote:
> > >
> > > > In D62393#1542471 <https://reviews.llvm.org/D62393#1542471>, @jdoerfert wrote:
> > > >
> > > > > I want to investigate the racy accesses further and make sure it is not a miscompile inside LLVM.
> > > >
> > > >
> > > > This is not a problem inside LLVM. The problem appears after optimizations performed by the ptxas tool (when it compiles PTX to SASS) at O3 with the inlined runtime.
> > > >
> > > > > I extracted the test case (see below) but I was not seeing the `ERROR`. How did you run the test case to see a different value for `Count`?
> > > >
> > > > You need to compile it with the inlined runtime at O2 or O3.
> > >
> > >
> > > When I run 
> > >  `./bin/clang -fopenmp-targets=nvptx64-nvida-cuda -O3 -fopenmp --cuda-path=/soft/compilers/cuda/cuda-9.1.85  -Xopenmp-target -march=sm_70  -fopenmp=libomp  test.c -o test.ll -emit-llvm -S`
> > >  I get
> > >
> > >   https://gist.github.com/jdoerfert/4376a251d98171326d625f2fb67b5259
> > >
> > > which shows the inlined and optimized libomptarget.
> > >
> > > > And you need the latest version of the libomptarget
> > >
> > > My version is from today Jun 13 15:24:11 2019, git: 3bc6e2a7aa3853b06045c42e81af094647c48676 <https://reviews.llvm.org/rG3bc6e2a7aa3853b06045c42e81af094647c48676>
> >
> >
> > We have problems in Cuda 8, at least, for arch sm_35
>
>
> I couldn't get that version to run properly so I asked someone who had a system set up. 
>  Unfortunately, the test.c [1] did not trigger the problem. In test.c we run the new test part in `spmd_parallel_regions.cpp` 1000 times and check the result each time.
>  It was run with Cuda 8.0 for sm_35, sm_37, and sm_70.
>
> Could you share more information on how the system has to look to trigger the problem?
>  Could you take a look at the test case we run and make sure it triggers the problem on your end?
>
> [1] https://gist.github.com/jdoerfert/d2b18ca8bb5c3443cc1d26b23236866f

Will provide additional info on Tuesday. Most probably, this simplified test does not trigger the problem in your configuration. Will send the original complex test that triggers the problem.

Repository:
  rOMP OpenMP

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D62393/new/

https://reviews.llvm.org/D62393