[Openmp-dev] Libomptarget fatal error 1: '#pragma omp requires unified_shared_memory' not used consistently!

Itaru Kitayama via Openmp-dev openmp-dev at lists.llvm.org
Mon Mar 1 14:41:24 PST 2021


OK, thanks. But given that there are USM tests in C, I wonder why no one has
noticed these errors so far?
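
For reference, here is a minimal sketch of the workaround discussed below,
assuming the requires directive is simply dropped so that the explicit map
clause is honored (as Joachim reports further down). The helper name
fill_on_target is hypothetical and not from the thread; the rest mirrors
Joachim's reduced example.

```cpp
#include <cassert>

// No `#pragma omp requires unified_shared_memory` here, so the map clause
// below is honored and the array section is copied to and from the device.
constexpr int N = 1024;

// Hypothetical helper (not from the thread): fills a[0..n) with `value`
// inside a target region, relying only on the explicit map clause.
void fill_on_target(int *a, int n, int value) {
  assert(a != nullptr && n >= 0);
#pragma omp target map(tofrom : a[0:n])
  {
    for (int i = 0; i < n; i++) {
      a[i] = value;
    }
  }
}
```

Calling fill_on_target(a, N, 123) on a stack array `int a[N]` should leave
every element set to 123 on the host afterwards. Without offloading support
the pragma is ignored or the region falls back to the host, so the sketch
only shows the mapping pattern, not a fix for USM itself.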

On Tue, Mar 2, 2021 at 7:39 Johannes Doerfert <johannesdoerfert at gmail.com>
wrote:

> Don't use the USM requires directive for now, I would assume.
>
> On 3/1/21 4:35 PM, Itaru Kitayama wrote:
> > I’m on JURECA and some nodes are attached to A100 GPUs.
> >
> > On Tue, Mar 2, 2021 at 7:34 Itaru Kitayama <itaru.kitayama at gmail.com> wrote:
> >
> >> Hi all,
> >> In the meantime, what do I do?
> >>
> >> On Tue, Mar 2, 2021 at 3:23 Johannes Doerfert <johannesdoerfert at gmail.com>
> >> wrote:
> >>
> >>> I think that is it. I heard of problems with our USM before.
> >>> We need to use the managed allocators if USM is active, they are
> >>> about to be upstreamed (I hope).
> >>>
> >>>
> >>> On 3/1/21 12:15 PM, Alexey.Bataev wrote:
> >>>> Looks like this example is for Explicit USM, and I assume that if you
> >>>> allocate the memory for `a` in managed memory explicitly, the OpenMP
> >>>> example should also work.
> >>>>
> >>>> There are other USM modes though, where the memory is shared implicitly
> >>>> between the host and the devices. It looks like the current LLVM
> >>>> implementation relies on this feature
> >>>>
> >>>> https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#um-system-allocator
> >>>>
> >>>> where Implicit USM is supported.
> >>>>
> >>>> -------------
> >>>> Best regards,
> >>>> Alexey Bataev
> >>>>
> >>>> On 3/1/2021 at 1:04 PM, Joachim Protze wrote:
> >>>>> Are the kernel/hardware requirements LLVM-specific?
> >>>>>
> >>>>> I can compile and execute the add_grid.cu example successfully:
> >>>>> https://developer.nvidia.com/blog/unified-memory-cuda-beginners/
> >>>>>
> >>>>> So, I would expect that an OpenMP program should also run successfully.
> >>>>>
> >>>>> - Joachim
> >>>>>
> >>>>>
> >>>>> On 01.03.21 at 18:49, Alexey.Bataev wrote:
> >>>>>> Hi, are you sure that your system supports Unified Shared Memory? As far
> >>>>>> as I know, it requires a special Linux kernel, and the hardware must
> >>>>>> support it. If your system does not support it, the code will crash at
> >>>>>> runtime for sure.
> >>>>>>
> >>>>>> In this mode, IIRC, we just ignore map clauses since the accelerator
> >>>>>> devices can access the host memory directly without the need for
> >>>>>> allocating the device-specific memory.
> >>>>>>
> >>>>>>
> >>>>>> -------------
> >>>>>> Best regards,
> >>>>>> Alexey Bataev
> >>>>>>
> >>>>>> On 3/1/2021 at 12:41 PM, Joachim Protze wrote:
> >>>>>>> Hi all,
> >>>>>>>
> >>>>>>> an even simpler example segfaults when the requires directive is
> >>>>>>> there:
> >>>>>>> #include <iostream>
> >>>>>>> #include <omp.h>
> >>>>>>> #include <stdio.h>
> >>>>>>>
> >>>>>>> #pragma omp requires unified_shared_memory
> >>>>>>> #define N 1024
> >>>>>>>
> >>>>>>> int main() {
> >>>>>>>     int a[N];
> >>>>>>>     printf("a=%p\n", a);
> >>>>>>> #pragma omp target map(tofrom : a[0:N])
> >>>>>>>     {
> >>>>>>>       printf("a=%p\n", a);
> >>>>>>>       for (int i = 0; i < 1024; i++) {
> >>>>>>>         a[i] = 123;
> >>>>>>>       }
> >>>>>>>     }
> >>>>>>>     printf("a[0]=%i, a[%i]=%i\n", a[0], N/2, a[N/2]);
> >>>>>>> }
> >>>>>>>
> >>>>>>> The code runs successfully when the requires directive is removed,
> >>>>>>> because the mapping of `a` is explicitly specified.
> >>>>>>>
> >>>>>>> For this code to run successfully, would it be necessary to allocate
> >>>>>>> `a` specially as CUDA managed memory? I don't see any special
> >>>>>>> treatment of `a` in the LLVM IR. As I understand the OpenMP spec, the
> >>>>>>> requires directive should lead to a compile error if clang fails to
> >>>>>>> generate such code.
> >>>>>>>
> >>>>>>> The requires example from the OpenMP Examples also fails with the same
> >>>>>>> runtime error:
> >>>>>>>
> >>>>>>> https://github.com/OpenMP/Examples/blob/main/sources/Example_requires.1.cpp
> >>>>>>>
> >>>>>>> - Joachim
> >>>>>>>
> >>>>>>> On 28.02.21 at 11:12, Itaru Kitayama via Openmp-dev wrote:
> >>>>>>>> This is the code:
> >>>>>>>>
> >>>>>>>> #include <iostream>
> >>>>>>>> #include <omp.h>
> >>>>>>>>
> >>>>>>>> #pragma omp requires unified_shared_memory
> >>>>>>>> #define N 1024
> >>>>>>>>
> >>>>>>>> int main() {
> >>>>>>>>     int a[N] = {0};
> >>>>>>>>     int *device_data =  new int[N];
> >>>>>>>> #pragma omp target map(tofrom : device_data[0:N])
> >>>>>>>>     {
> >>>>>>>>       device_data = &a[0];
> >>>>>>>>       for (int i = 0; i < 1024; i++) {
> >>>>>>>>         device_data[i] = 123;
> >>>>>>>>       }
> >>>>>>>>     }
> >>>>>>>>     std::cout << a[0] << std::endl;
> >>>>>>>> }
> >>>>>>>>
> >>>>>>>> On Sun, Feb 28, 2021 at 1:34 PM Johannes Doerfert
> >>>>>>>> <johannesdoerfert at gmail.com> wrote:
> >>>>>>>>> You have an illegal memory access, some memory is not properly
> >>>>>>>>> mapped.
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> On 2/27/21 7:47 PM, Itaru Kitayama wrote:
> >>>>>>>>>> Removed the internal function, but I get:
> >>>>>>>>>>
> >>>>>>>>>> CUDA device 0 info: Device supports up to 65536 CUDA blocks and 1024 threads with a warp size of 32
> >>>>>>>>>> CUDA device 0 info: Launching kernel __omp_offloading_34_8009dd23_main_l12 with 1 blocks and 33 threads in Generic mode
> >>>>>>>>>> CUDA error: Error when synchronizing stream. stream = 0x0000000001d22ae0, async info ptr = 0x00007ffe73ea2728
> >>>>>>>>>> CUDA error: an illegal memory access was encountered
> >>>>>>>>>> Libomptarget error: Failed to synchronize device.
> >>>>>>>>>> Libomptarget error: Call to targetDataEnd failed, abort target.
> >>>>>>>>>> Libomptarget error: Failed to process data after launching the kernel.
> >>>>>>>>>> Libomptarget error: Source location information not present. Compile with -g or -gline-tables-only.
> >>>>>>>>>> Libomptarget fatal error 1: failure of target construct while offloading is mandatory
> >>>>>>>>>> /var/spool/parastation/jobs/8941317: line 23: 20812 Aborted (core dumped) ./a.out
> >>>>>>>>>>
> >>>>>>>>>> On Sun, Feb 28, 2021 at 10:35 AM Alexey Bataev <a.bataev at hotmail.com> wrote:
> >>>>>>>>>>> Do not call __tgt_register_requires directly; it is an internal
> >>>>>>>>>>> function called by a global constructor, and its argument value
> >>>>>>>>>>> depends on #pragma omp requires. Just use the pragma.
> >>>>>>>>>>>
> >>>>>>>>>>> Best regards,
> >>>>>>>>>>> Alexey Bataev
> >>>>>>>>>>>
> >>>>>>>>>>>> On Feb 27, 2021, at 20:28, Itaru Kitayama via Openmp-dev <openmp-dev at lists.llvm.org> wrote:
> >>>>>>>>>>>> I'm trying to build a test C++ code that uses part of
> >>>>>>>>>>>> unified_shared_memory/shared_update.c
> >>>>>>>>>>>>
> >>>>>>>>>>>>> On Sun, Feb 28, 2021 at 10:25 AM Johannes Doerfert
> >>>>>>>>>>>>> <johannesdoerfert at gmail.com> wrote:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> I don't see this test, nor do I understand what you are trying to say.
> >>>>>>>>>>>>> Is the test failing? If so, which test is this?
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> ~ Johannes
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>> On 2/27/21 7:17 PM, Itaru Kitayama via Openmp-dev wrote:
> >>>>>>>>>>>>>> The C++ code below builds, but the executable fails at runtime.
> >>>>>>>>>>>>>> (It is taken from the C code under the libomptarget subdir's test directory.)
> >>>>>>>>>>>>>> #include <omp.h>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> #pragma omp requires unified_shared_memory
> >>>>>>>>>>>>>> #define N 1024
> >>>>>>>>>>>>>> extern "C" void __tgt_register_requires(int64_t);
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> int main() {
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>      int a[N] = {0};
> >>>>>>>>>>>>>>      int b[N] = {0};
> >>>>>>>>>>>>>>      int *device_data;
> >>>>>>>>>>>>>>      __tgt_register_requires(1);
> >>>>>>>>>>>>>> #pragma omp target map(tofrom : device_data)
> >>>>>>>>>>>>>>      {
> >>>>>>>>>>>>>>        device_data = &a[0];
> >>>>>>>>>>>>>>        for (int i = 0; i < 1024; i++) {
> >>>>>>>>>>>>>>          a[i] += 1;
> >>>>>>>>>>>>>>        }
> >>>>>>>>>>>>>>      }
> >>>>>>>>>>>>>> }
> >>>>>>>>>>>>>> _______________________________________________
> >>>>>>>>>>>>>> Openmp-dev mailing list
> >>>>>>>>>>>>>> Openmp-dev at lists.llvm.org
> >>>>>>>>>>>>>> https://lists.llvm.org/cgi-bin/mailman/listinfo/openmp-dev
>