[Openmp-dev] Target CUDA RTL --> The primary context is inactive, set its flags to CU_CTX_SCHED_BLOCKING_SYNC

Ye Luo via Openmp-dev openmp-dev at lists.llvm.org
Mon Sep 28 15:44:53 PDT 2020


Could you provide the output of
`which nvcc`
`nvcc --version`
`ldd /p/project/cjzam11/kitayama1/opt/clang/current/lib/libomptarget.rtl.cuda.so`
and `nvidia-smi`?
Ye

===================
Ye Luo, Ph.D.
Computational Science Division & Leadership Computing Facility
Argonne National Laboratory
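The diagnostics requested above can be collected in one pass with a small POSIX shell script. This is a sketch and was not posted in the thread: the `run` helper and the `LIBOMPTARGET_CUDA` variable are hypothetical names, and the default plugin path is the one from the `ldd` request, which will differ on other installations. A tool that is missing from `PATH` is reported instead of aborting the script.

```shell
#!/bin/sh
# Sketch: gather the information requested on the list into one report.
# LIBOMPTARGET_CUDA is a hypothetical variable; set it to your own plugin path.
LIBOMPTARGET_CUDA="${LIBOMPTARGET_CUDA:-/p/project/cjzam11/kitayama1/opt/clang/current/lib/libomptarget.rtl.cuda.so}"

# Print a header, then run the command (stdout and stderr together),
# or note that the tool is not installed.
run() {
  echo "=== $* ==="
  if command -v "$1" >/dev/null 2>&1; then
    "$@" 2>&1
  else
    echo "$1: not found in PATH"
  fi
  echo
}

run which nvcc
run nvcc --version
run ldd "$LIBOMPTARGET_CUDA"
run nvidia-smi
```

Sending the whole report, rather than individual outputs, keeps the toolkit version, the libraries the plugin resolves, and the driver version together in one message.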


On Mon, Sep 28, 2020 at 5:11 PM Itaru Kitayama via Openmp-dev <openmp-dev at lists.llvm.org> wrote:

> This happens unpredictably, even though I launch the app the same way
> every time.
>
> On Mon, Sep 28, 2020 at 7:34 AM Itaru Kitayama <itaru.kitayama at gmail.com> wrote:
> >
> > No, I take that back. Here's the backtrace:
> >
> > (gdb) where
> > #0  0x00002aaaaaacd6c2 in clock_gettime ()
> > #1  0x00002aaaabd167fd in clock_gettime () from /usr/lib64/libc.so.6
> > #2  0x00002aaaac97837e in ?? ()
> >    from /usr/local/software/jureca/Stages/2019a/software/nvidia/driver/lib64/libcuda.so.1
> > #3  0x00002aaaaca3c4f7 in ?? ()
> >    from /usr/local/software/jureca/Stages/2019a/software/nvidia/driver/lib64/libcuda.so.1
> > #4  0x00002aaaac87240a in ?? ()
> >    from /usr/local/software/jureca/Stages/2019a/software/nvidia/driver/lib64/libcuda.so.1
> > #5  0x00002aaaac91bfbe in ?? ()
> >    from /usr/local/software/jureca/Stages/2019a/software/nvidia/driver/lib64/libcuda.so.1
> > #6  0x00002aaaac91e0d7 in ?? ()
> >    from /usr/local/software/jureca/Stages/2019a/software/nvidia/driver/lib64/libcuda.so.1
> > #7  0x00002aaaac848719 in ?? ()
> >    from /usr/local/software/jureca/Stages/2019a/software/nvidia/driver/lib64/libcuda.so.1
> > #8  0x00002aaaac9ba15e in cuDevicePrimaryCtxRetain ()
> >    from /usr/local/software/jureca/Stages/2019a/software/nvidia/driver/lib64/libcuda.so.1
> > #9  0x00002aaaac514757 in __tgt_rtl_init_device ()
> >    from /p/project/cjzam11/kitayama1/opt/clang/current/lib/libomptarget.rtl.cuda.so
> > #10 0x00002aaaab9b88bb in DeviceTy::init() ()
> >    from /p/project/cjzam11/kitayama1/opt/clang/current/lib/libomptarget.so
> > #11 0x00002aaaac279348 in std::__1::__call_once(unsigned long volatile&, void*, void (*)(void*)) ()
> >    from /p/project/cjzam11/kitayama1/opt/clang/current/lib/libc++.so.1
> > #12 0x00002aaaab9b8d88 in device_is_ready(int) ()
> >    from /p/project/cjzam11/kitayama1/opt/clang/current/lib/libomptarget.so
> > #13 0x00002aaaab9c5296 in CheckDeviceAndCtors(long) ()
> >    from /p/project/cjzam11/kitayama1/opt/clang/current/lib/libomptarget.so
> > #14 0x00002aaaab9bbead in __tgt_target_data_begin_mapper ()
> >    from /p/project/cjzam11/kitayama1/opt/clang/current/lib/libomptarget.so
> > #15 0x00002aaaaabfaa58 in nest::SimulationManager::initialize() (this=0x5d3290)
> >     at /p/project/cjzam11/kitayama1/projects/nest-simulator/nestkernel/simulation_manager.cpp:76
> > #16 0x00002aaaaabf2c69 in nest::KernelManager::initialize() (this=0x5d3190)
> >     at /p/project/cjzam11/kitayama1/projects/nest-simulator/nestkernel/kernel_manager.cpp:88
> > #17 0x0000000000405769 in neststartup(int*, char***, SLIInterpreter&) (
> >     argc=argc@entry=0x7fffffff0a84, argv=argv@entry=0x7fffffff0a88, engine=...)
> >     at /p/project/cjzam11/kitayama1/projects/nest-simulator/nest/neststartup.cpp:87
> > #18 0x0000000000405650 in main (argc=<optimized out>, argv=<optimized out>)
> >     at /p/project/cjzam11/kitayama1/projects/nest-simulator/nest/main.cpp:42
> >
> > On Mon, Sep 28, 2020 at 5:22 AM Itaru Kitayama <itaru.kitayama at gmail.com> wrote:
> > >
> > > I was able to reproduce the crash without a Spack environment.
> > >
> > > On Sun, Sep 27, 2020 at 1:13 PM Itaru Kitayama <itaru.kitayama at gmail.com> wrote:
> > > >
> > > > (gdb) where
> > > > #0  0x00002aaaaaacd6c2 in clock_gettime ()
> > > > #1  0x00002aaaabd347fd in clock_gettime () from /usr/lib64/libc.so.6
> > > > #2  0x00002aaaac98737e in ?? ()
> > > >    from /usr/local/software/jureca/Stages/2019a/software/nvidia/driver/lib64/libcuda.so.1
> > > > #3  0x00002aaaaca4b4f7 in ?? ()
> > > >    from /usr/local/software/jureca/Stages/2019a/software/nvidia/driver/lib64/libcuda.so.1
> > > > #4  0x00002aaaac88140a in ?? ()
> > > >    from /usr/local/software/jureca/Stages/2019a/software/nvidia/driver/lib64/libcuda.so.1
> > > > #5  0x00002aaaac92afbe in ?? ()
> > > >    from /usr/local/software/jureca/Stages/2019a/software/nvidia/driver/lib64/libcuda.so.1
> > > > #6  0x00002aaaac92d0d7 in ?? ()
> > > >    from /usr/local/software/jureca/Stages/2019a/software/nvidia/driver/lib64/libcuda.so.1
> > > > #7  0x00002aaaac857719 in ?? ()
> > > >    from /usr/local/software/jureca/Stages/2019a/software/nvidia/driver/lib64/libcuda.so.1
> > > > #8  0x00002aaaac9c915e in cuDevicePrimaryCtxRetain ()
> > > >    from /usr/local/software/jureca/Stages/2019a/software/nvidia/driver/lib64/libcuda.so.1
> > > > #9  0x00002aaaac523757 in __tgt_rtl_init_device ()
> > > >    from /p/project/cjzam11/kitayama1/opt/clang/current/lib/libomptarget.rtl.cuda.so
> > > > #10 0x00002aaaaaca28bb in DeviceTy::init() ()
> > > >    from /p/project/cjzam11/kitayama1/opt/clang/current/lib/libomptarget.so
> > > > #11 0x00002aaaac297348 in std::__1::__call_once(unsigned long volatile&, void*, void (*)(void*)) ()
> > > >    from /p/project/cjzam11/kitayama1/opt/clang/current/lib/libc++.so.1
> > > > #12 0x00002aaaaaca2d88 in device_is_ready(int) ()
> > > >    from /p/project/cjzam11/kitayama1/opt/clang/current/lib/libomptarget.so
> > > > #13 0x00002aaaaacaf296 in CheckDeviceAndCtors(long) ()
> > > >    from /p/project/cjzam11/kitayama1/opt/clang/current/lib/libomptarget.so
> > > > #14 0x00002aaaaaca5ead in __tgt_target_data_begin_mapper ()
> > > >    from /p/project/cjzam11/kitayama1/opt/clang/current/lib/libomptarget.so
> > > > #15 0x00002aaaab3a4958 in nest::SimulationManager::initialize() (this=0x5d3480)
> > > >     at /p/project/cjzam11/kitayama1/projects/nest-simulator/nestkernel/simulation_manager.cpp:76
> > > > #16 0x00002aaaab39cbb9 in nest::KernelManager::initialize() (this=0x5d3380)
> > > >     at /p/project/cjzam11/kitayama1/projects/nest-simulator/nestkernel/kernel_manager.cpp:88
> > > > #17 0x0000000000405769 in neststartup(int*, char***, SLIInterpreter&) (
> > > >     argc=argc@entry=0x7ffffffee554, argv=argv@entry=0x7ffffffee558, engine=...)
> > > >     at /p/project/cjzam11/kitayama1/projects/nest-simulator/nest/neststartup.cpp:87
> > > > #18 0x0000000000405650 in main (argc=<optimized out>, argv=<optimized out>)
> > > >     at /p/project/cjzam11/kitayama1/projects/nest-simulator/nest/main.cpp:42
> > > >
> > > > On Sun, Sep 27, 2020 at 12:55 PM Itaru Kitayama
> > > > <itaru.kitayama at gmail.com> wrote:
> > > > >
> > > > > And when this happens, the process does not respond to signals
> > > > > right away.
> > > > >
> > > > > On Sun, Sep 27, 2020 at 12:52 PM Itaru Kitayama
> > > > > <itaru.kitayama at gmail.com> wrote:
> > > > > >
> > > > > > I often see this when running my work-in-progress offloading app
> > > > > > on x86 with an older NVIDIA GPU (sm_35). Can someone help me
> > > > > > understand it so I can fix it quickly?
> > > > > >
> > > > > > Thanks,
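The subject line is a debug message printed by the libomptarget CUDA plugin (`Target CUDA RTL`): when a device's primary context is not yet active, the plugin sets its flags to `CU_CTX_SCHED_BLOCKING_SYNC` before retaining it, and both backtraces show the thread inside `cuDevicePrimaryCtxRetain`, called from `__tgt_rtl_init_device`. The sketch below only decodes the scheduling bits of a context-flags value (for example one reported by `cuDevicePrimaryCtxGetState`) and models the policy the message describes; it does not call the CUDA driver. The numeric values are copied from the `CUctx_flags` enum in `cuda.h`, `sched_policy_name` and `flags_to_request` are hypothetical helpers, and leaving an already active context's flags unchanged is an assumption not stated in the thread.

```shell
#!/bin/sh
# Sketch: decode the scheduling policy stored in the low bits of a CUDA
# context's flags. Values copied from the CUctx_flags enum in cuda.h;
# nothing here talks to the CUDA driver.
CU_CTX_SCHED_MASK=7            # 0x07
CU_CTX_SCHED_BLOCKING_SYNC=4   # 0x04

# Print the CU_CTX_SCHED_* name for a flags value (decimal or 0x-hex).
sched_policy_name() {
  case $(( $1 & CU_CTX_SCHED_MASK )) in
    0) echo CU_CTX_SCHED_AUTO ;;
    1) echo CU_CTX_SCHED_SPIN ;;
    2) echo CU_CTX_SCHED_YIELD ;;
    4) echo CU_CTX_SCHED_BLOCKING_SYNC ;;
    *) echo invalid ;;
  esac
}

# Hypothetical model of the logged policy.
# $1 = 1 if the primary context is already active, else 0; $2 = its flags.
# An inactive context gets CU_CTX_SCHED_BLOCKING_SYNC; keeping an active
# context's flags unchanged is an assumption.
flags_to_request() {
  if [ "$1" -eq 1 ]; then
    echo "$2"
  else
    echo "$CU_CTX_SCHED_BLOCKING_SYNC"
  fi
}

sched_policy_name "$(flags_to_request 0 0)"   # inactive context, default flags
sched_policy_name "$(flags_to_request 1 1)"   # active context created with SPIN
```

Run as-is, it prints `CU_CTX_SCHED_BLOCKING_SYNC` and then `CU_CTX_SCHED_SPIN`. Per the CUDA driver documentation, `CU_CTX_SCHED_BLOCKING_SYNC` makes the host thread block on a synchronization primitive while waiting for the GPU, whereas `CU_CTX_SCHED_SPIN` busy-waits and `CU_CTX_SCHED_YIELD` yields the thread.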