[Openmp-dev] [EXTERNAL] Re: OpenMP offloading app gets in unresponsive

Itaru Kitayama via Openmp-dev openmp-dev at lists.llvm.org
Wed Sep 23 16:56:36 PDT 2020


With trunk Clang and CUDA Toolkit 10.1.105 on JURECA at JSC, I started
seeing a hang:

Libomptarget --> Call to omp_get_num_devices returning 1
Libomptarget --> Default TARGET OFFLOAD policy is now mandatory
(devices were found)
Libomptarget --> Entering data begin region for device -1 with 1 mappings
Libomptarget --> Use default device id 0
Libomptarget --> Checking whether device 0 is ready.
Libomptarget --> Is the device 0 (local ID 0) initialized? 0
Target CUDA RTL --> Init requires flags to 1
Target CUDA RTL --> Getting device 0
Target CUDA RTL --> The primary context is inactive, set its flags to
CU_CTX_SCHED_BLOCKING_SYNC

Getting back to the prompt took a long time, or I had to hit Ctrl+C or
Ctrl+Z many times.
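
One way to test the "SIGABRT is somehow caught" idea quoted below is to
ask the process what is currently installed for SIGABRT. This is only a
minimal sketch, assuming a POSIX system; the names query_disposition and
report_sigabrt are made up for illustration and are not part of
libomptarget or the CUDA plugin:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stdio.h>

/* Illustrative names only; none of this is part of libomptarget. */
enum sig_disp {
    SIG_DISP_ERROR = -1,
    SIG_DISP_DEFAULT,
    SIG_DISP_IGNORED,
    SIG_DISP_HANDLER
};

/* Query the current disposition of signo without changing it. */
enum sig_disp query_disposition(int signo) {
    struct sigaction sa;
    /* Passing NULL as the new action makes sigaction() read-only. */
    if (sigaction(signo, NULL, &sa) != 0)
        return SIG_DISP_ERROR;
    if (sa.sa_handler == SIG_DFL)
        return SIG_DISP_DEFAULT;
    if (sa.sa_handler == SIG_IGN)
        return SIG_DISP_IGNORED;
    return SIG_DISP_HANDLER; /* some code installed its own handler */
}

/* Print a one-line summary of the SIGABRT disposition to out. */
void report_sigabrt(FILE *out) {
    assert(out != NULL);
    switch (query_disposition(SIGABRT)) {
    case SIG_DISP_DEFAULT:
        fprintf(out, "SIGABRT: default action\n");
        break;
    case SIG_DISP_IGNORED:
        fprintf(out, "SIGABRT: ignored\n");
        break;
    case SIG_DISP_HANDLER:
        fprintf(out, "SIGABRT: handler installed\n");
        break;
    default:
        fprintf(out, "SIGABRT: query failed\n");
        break;
    }
}
```

Calling report_sigabrt(stderr) once before and once after the first
target region would show whether something loaded in between installed
a handler. Note that POSIX abort() still terminates the process when a
handler returns, and it overrides blocking or ignoring SIGABRT, so only
a handler that never returns could explain a hang on its own.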

On Thu, Sep 24, 2020 at 7:31 AM Itaru Kitayama <itaru.kitayama at gmail.com> wrote:
>
> Should I back off from my ThunderX2 while a fix is being developed?
>
> On Thu, Sep 24, 2020 at 5:36 AM Johannes Doerfert
> <johannesdoerfert at gmail.com> wrote:
> >
> > This could be a side effect of something else, namely the runtime
> > unloading order.
> > @Jon @Shilei where are we with fixing those issues?
> >
> > On 9/23/20 2:58 AM, Itaru Kitayama wrote:
> > > I think I was running my offloading app with a CUDA Toolkit that
> > > I had loaded via Spack, but the app itself was built with Clang
> > > (plus the CUDA Toolkit that the local admins provide via modules).
> > >
> > > Still, would the effect be this drastic, I mean completely locking
> > > up a ThunderX2 node?
> > >
> > > On Mon, Sep 21, 2020 at 9:58 PM Huber, Joseph <huberjn at ornl.gov> wrote:
> > >> The runtime library just calls abort() immediately after printing that last "Failure while offloading was mandatory" message. I'm not sure what would be causing the process to hang after that if SIGABRT isn't being caught.
> > >> ________________________________
> > >> From: Itaru Kitayama <itaru.kitayama at gmail.com>
> > >> Sent: Saturday, September 19, 2020 2:52 AM
> > >> To: Johannes Doerfert <johannesdoerfert at gmail.com>
> > >> Cc: openmp-dev <openmp-dev at lists.llvm.org>; Huber, Joseph <huberjn at ornl.gov>
> > >> Subject: [EXTERNAL] Re: [Openmp-dev] OpenMP offloading app gets in unresponsive
> > >>
> > >> I mean, the kernel gets aborted and I see a session prompt on JURECA at JSC.
> > >>
> > >> On Sat, Sep 19, 2020 at 3:38 PM Itaru Kitayama <itaru.kitayama at gmail.com> wrote:
> > >>> While it was observed on a ThunderX2 system with V100 GPUs, I don't
> > >>> see it on JURECA (with GPUs).
> > >>>
> > >>> On Sat, Sep 19, 2020 at 1:21 PM Johannes Doerfert
> > >>> <johannesdoerfert at gmail.com> wrote:
> > >>>> I don't think so.
> > >>>> The only thing that comes to mind is that we switched to `abort` instead
> > >>>> of `exit` after the fatal error message.
> > >>>> Though, I'm not sure why that would cause the program to hang, except if
> > >>>> SIGABRT is somehow caught.
> > >>>>
> > >>>> ~ Johannes
> > >>>>
> > >>>> On 9/18/20 9:35 PM, Itaru Kitayama via Openmp-dev wrote:
> > >>>>> [...]
> > >>>>> Libomptarget error: Failed to synchronize device.
> > >>>>> Libomptarget error: Call to targetDataEnd failed, abort target.
> > >>>>> Libomptarget error: Failed to process data after launching the kernel.
> > >>>>> Libomptarget error: run with env LIBOMPTARGET_INFO>1 to dump
> > >>>>> host-targetpointer maps
> > >>>>> Libomptarget fatal error 1: failure of target construct while
> > >>>>> offloading is mandatory
> > >>>>>
> > >>>>> After this point, the process becomes unresponsive and does not
> > >>>>> react to signals from the user. Is this due to a new feature of
> > >>>>> LLVM?

