[Openmp-dev] [EXTERNAL] Re: OpenMP offloading app gets in unresponsive

Ye Luo via Openmp-dev openmp-dev at lists.llvm.org
Wed Sep 23 20:33:25 PDT 2020


I'm not aware of a way to find out which header was used.
I think it is worth trying to use the same CUDA toolkit for building
clang+libomptarget and your app.
Ye
===================
Ye Luo, Ph.D.
Computational Science Division & Leadership Computing Facility
Argonne National Laboratory


On Wed, Sep 23, 2020 at 10:00 PM Itaru Kitayama <itaru.kitayama at gmail.com>
wrote:

> Hi Ye,
> How do I check the header consistency in libomptarget?
>
> On Thu, Sep 24, 2020 at 11:48 AM Ye Luo <xw111luoye at gmail.com> wrote:
> >
> > 1. Please show full call stack.
> > 2. Are you able to run a very simple OpenMP code, like just an empty "omp
> > target" region?
> > 3. My current feeling is that when you build the libomptarget plugins, the
> > cuda.h used may not be consistent with
> > /usr/local/software/jureca/Stages/2019a/software/nvidia/driver/lib64/libcuda.so.1
> > Ye
> > ===================
> > Ye Luo, Ph.D.
> > Computational Science Division & Leadership Computing Facility
> > Argonne National Laboratory
> >
> >
> > On Wed, Sep 23, 2020 at 9:22 PM Itaru Kitayama via Openmp-dev <
> openmp-dev at lists.llvm.org> wrote:
> >>
> >> If I run it with CUDA-gdb I get:
> >>
> >> Target CUDA RTL --> Init requires flags to 1
> >> Target CUDA RTL --> Getting device 0
> >> Target CUDA RTL --> The primary context is inactive, set its flags to
> >> CU_CTX_SCHED_BLOCKING_SYNC
> >> [New Thread 0x2aaaae5e3700 (LWP 4154)]
> >> ^C
> >> Thread 1 "nest" received signal SIGINT, Interrupt.
> >> 0x00002aaaad2e5a1c in cuVDPAUCtxCreate ()
> >>    from
> /usr/local/software/jureca/Stages/2019a/software/nvidia/driver/lib64/libcuda.so.1
> >>
> >> On Thu, Sep 24, 2020 at 8:56 AM Itaru Kitayama <
> itaru.kitayama at gmail.com> wrote:
> >> >
> >> > With trunk Clang and CUDA Toolkit 10.1.105 on JURECA at JSC, I
> >> > started seeing a hang:
> >> >
> >> > Libomptarget --> Call to omp_get_num_devices returning 1
> >> > Libomptarget --> Default TARGET OFFLOAD policy is now mandatory
> >> > (devices were found)
> >> > Libomptarget --> Entering data begin region for device -1 with 1 mappings
> >> > Libomptarget --> Use default device id 0
> >> > Libomptarget --> Checking whether device 0 is ready.
> >> > Libomptarget --> Is the device 0 (local ID 0) initialized? 0
> >> > Target CUDA RTL --> Init requires flags to 1
> >> > Target CUDA RTL --> Getting device 0
> >> > Target CUDA RTL --> The primary context is inactive, set its flags to
> >> > CU_CTX_SCHED_BLOCKING_SYNC
> >> >
> >> > Getting back to the prompt takes a long time, or I need to hit Ctrl+C
> >> > or Ctrl+Z many times.
> >> >
> >> > On Thu, Sep 24, 2020 at 7:31 AM Itaru Kitayama <
> itaru.kitayama at gmail.com> wrote:
> >> > >
> >> > > Should I back off from my ThunderX2 while a fix is being developed?
> >> > >
> >> > > On Thu, Sep 24, 2020 at 5:36 AM Johannes Doerfert
> >> > > <johannesdoerfert at gmail.com> wrote:
> >> > > >
> >> > > > This could be a side effect of something else, namely the runtime
> >> > > > unloading order.
> >> > > > @Jon @Shilei where are we with fixing those issues?
> >> > > >
> >> > > > On 9/23/20 2:58 AM, Itaru Kitayama wrote:
> >> > > > > I think I was running my offloading app against the CUDA Toolkit
> >> > > > > I loaded via Spack, but the app itself was built with Clang (plus
> >> > > > > the CUDA Toolkit the local admins provide via modules).
> >> > > > >
> >> > > > > However, should the effect really be this drastic, i.e. locking
> >> > > > > up a ThunderX2 node completely?
> >> > > > >
> >> > > > > On Mon, Sep 21, 2020 at 9:58 PM Huber, Joseph <huberjn at ornl.gov>
> wrote:
> >> > > > >> The runtime library just calls abort() immediately after
> >> > > > >> printing that last "Failure while offloading was mandatory"
> >> > > > >> message. I'm not sure what would be causing the process to hang
> >> > > > >> after that if SIGABRT isn't being caught.
> >> > > > >> ________________________________
> >> > > > >> From: Itaru Kitayama <itaru.kitayama at gmail.com>
> >> > > > >> Sent: Saturday, September 19, 2020 2:52 AM
> >> > > > >> To: Johannes Doerfert <johannesdoerfert at gmail.com>
> >> > > > >> Cc: openmp-dev <openmp-dev at lists.llvm.org>; Huber, Joseph <
> huberjn at ornl.gov>
> >> > > > >> Subject: [EXTERNAL] Re: [Openmp-dev] OpenMP offloading app
> gets in unresponsive
> >> > > > >>
> >> > > > >> I mean: the kernel gets aborted and I get back to a session
> >> > > > >> prompt on JURECA at JSC.
> >> > > > >>
> >> > > > >> On Sat, Sep 19, 2020 at 3:38 PM Itaru Kitayama <
> itaru.kitayama at gmail.com> wrote:
> >> > > > >>> While I observed it on the ThunderX2 system with V100s, I don't
> >> > > > >>> see it on JURECA (with GPUs).
> >> > > > >>>
> >> > > > >>> On Sat, Sep 19, 2020 at 1:21 PM Johannes Doerfert
> >> > > > >>> <johannesdoerfert at gmail.com> wrote:
> >> > > > >>>> I don't think so.
> >> > > > >>>> The only thing that comes to mind is that we switched to
> >> > > > >>>> `abort` instead of `exit` after the fatal error message.
> >> > > > >>>> Though, I'm not sure why that would cause the program to hang,
> >> > > > >>>> except if SIGABRT is somehow caught.
> >> > > > >>>>
> >> > > > >>>> ~ Johannes
> >> > > > >>>>
> >> > > > >>>> On 9/18/20 9:35 PM, Itaru Kitayama via Openmp-dev wrote:
> >> > > > >>>>> [...]
> >> > > > >>>>> Libomptarget error: Failed to synchronize device.
> >> > > > >>>>> Libomptarget error: Call to targetDataEnd failed, abort target.
> >> > > > >>>>> Libomptarget error: Failed to process data after launching the kernel.
> >> > > > >>>>> Libomptarget error: run with env LIBOMPTARGET_INFO>1 to dump
> >> > > > >>>>> host-target pointer maps
> >> > > > >>>>> Libomptarget fatal error 1: failure of target construct while
> >> > > > >>>>> offloading is mandatory
> >> > > > >>>>>
> >> > > > >>>>> After this point, the process becomes unresponsive and
> >> > > > >>>>> doesn't respond to signals from the user. Is this due to a
> >> > > > >>>>> new feature of LLVM?
> >> > > > >>>>> _______________________________________________
> >> > > > >>>>> Openmp-dev mailing list
> >> > > > >>>>> Openmp-dev at lists.llvm.org
> >> > > > >>>>> https://lists.llvm.org/cgi-bin/mailman/listinfo/openmp-dev
>

