[Openmp-dev] Libomptarget fatal error 1: '#pragma omp requires unified_shared_memory' not used consistently!

Ye Luo via Openmp-dev openmp-dev at lists.llvm.org
Mon Mar 1 14:49:01 PST 2021


Are you sure your machine/OS supports USM? Summit-like nodes (P9+V100) are
the only NVIDIA GPU architecture I'm aware of that supports USM.
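
One way to check what the hardware actually reports, for what it's worth
(a rough, untested sketch using the CUDA runtime's cudaDeviceGetAttribute,
assuming device 0; build with nvcc or link against -lcudart):

#include <cstdio>
#include <cuda_runtime.h>

int main() {
  int managed = 0, pageable = 0, concurrent = 0;
  // Explicit USM: can the device use cudaMallocManaged memory at all?
  cudaDeviceGetAttribute(&managed, cudaDevAttrManagedMemory, 0);
  // Implicit USM: can the device coherently access ordinary malloc/new/stack
  // memory (ATS/HMM), as on Summit-like P9+V100 nodes?
  cudaDeviceGetAttribute(&pageable, cudaDevAttrPageableMemoryAccess, 0);
  cudaDeviceGetAttribute(&concurrent, cudaDevAttrConcurrentManagedAccess, 0);
  printf("managedMemory=%d pageableMemoryAccess=%d concurrentManagedAccess=%d\n",
         managed, pageable, concurrent);
  return 0;
}

If pageableMemoryAccess comes back as 0, I would not expect the implicit
unified_shared_memory path to work on that node.
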
Ye
===================
Ye Luo, Ph.D.
Computational Science Division & Leadership Computing Facility
Argonne National Laboratory


On Mon, Mar 1, 2021 at 4:35 PM Itaru Kitayama via Openmp-dev <
openmp-dev at lists.llvm.org> wrote:

> I’m on JURECA and some nodes are attached to A100 GPUs.
>
> On Tue, Mar 2, 2021 at 7:34 Itaru Kitayama <itaru.kitayama at gmail.com>
> wrote:
>
>> Hi all,
>> In the meantime, what do I do?
>>
>> On Tue, Mar 2, 2021 at 3:23 Johannes Doerfert <johannesdoerfert at gmail.com>
>> wrote:
>>
>>> I think that is it. I have heard of problems with our USM support before.
>>> We need to use the managed allocators if USM is active; they are
>>> about to be upstreamed (I hope).
>>>
>>>
>>> On 3/1/21 12:15 PM, Alexey.Bataev wrote:
>>> > Looks like this example is for Explicit USM, and I assume that if you
>>> > allocate the memory for `a` in managed memory explicitly, the OpenMP
>>> > example should also work.
>>> >
>>> > There are other USM modes though, where the memory is shared implicitly
>>> > between the host and the devices. It looks like the current LLVM
>>> > implementation relies on the system allocator support described in
>>> > https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#um-system-allocator,
>>> > i.e. on Implicit USM being supported.
>>> >
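>>> > For the explicit route, a rough, untested sketch of what "allocate `a`
>>> > in managed memory" could look like (assuming cudaMallocManaged from the
>>> > CUDA runtime, with the CUDA include/lib paths and -lcudart added to the
>>> > usual clang offloading flags):
>>> >
>>> > #include <cstdio>
>>> > #include <cuda_runtime.h>
>>> > #include <omp.h>
>>> >
>>> > #pragma omp requires unified_shared_memory
>>> > #define N 1024
>>> >
>>> > int main() {
>>> >   int *a = nullptr;
>>> >   // Explicit USM: the data lives in CUDA managed memory instead of on
>>> >   // the host stack, so both host and device can dereference it.
>>> >   cudaMallocManaged(&a, N * sizeof(int));
>>> >   printf("a=%p\n", (void *)a);
>>> >   // The pointer itself is firstprivate in the target region; no map of
>>> >   // the pointee should be needed under unified_shared_memory.
>>> > #pragma omp target
>>> >   {
>>> >     printf("a=%p\n", (void *)a);
>>> >     for (int i = 0; i < N; i++)
>>> >       a[i] = 123;
>>> >   }
>>> >   printf("a[0]=%i, a[%i]=%i\n", a[0], N / 2, a[N / 2]);
>>> >   cudaFree(a);
>>> > }
>>> >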
>>> > -------------
>>> > Best regards,
>>> > Alexey Bataev
>>> >
>>> > 3/1/2021 1:04 PM, Joachim Protze wrote:
>>> >> Are the kernel/hardware requirements LLVM-specific?
>>> >>
>>> >> I can compile and execute the add_grid.cu example successfully:
>>> >> https://developer.nvidia.com/blog/unified-memory-cuda-beginners/
>>> >>
>>> >> So, I would expect that an OpenMP program should also run successfully.
>>> >>
>>> >> - Joachim
>>> >>
>>> >>
>>> >> On 01.03.21 at 18:49, Alexey.Bataev wrote:
>>> >>> Hi, are you sure that your system supports Unified Shared Memory? As
>>> >>> far as I know it requires a special Linux kernel and the hardware must
>>> >>> support it. If your system does not support it, the code will certainly
>>> >>> crash at runtime.
>>> >>>
>>> >>> In this mode, IIRC, we just ignore map clauses since the accelerator
>>> >>> devices can access the host memory directly, without the need to
>>> >>> allocate device-specific memory.
>>> >>>
>>> >>>
>>> >>> -------------
>>> >>> Best regards,
>>> >>> Alexey Bataev
>>> >>>
>>> >>> 3/1/2021 12:41 PM, Joachim Protze wrote:
>>> >>>> Hi all,
>>> >>>>
>>> >>>> even a simpler example segfaults when the requires directive is
>>> >>>> present:
>>> >>>>
>>> >>>> #include <iostream>
>>> >>>> #include <omp.h>
>>> >>>> #include <stdio.h>
>>> >>>>
>>> >>>> #pragma omp requires unified_shared_memory
>>> >>>> #define N 1024
>>> >>>>
>>> >>>> int main() {
>>> >>>>    int a[N];
>>> >>>>    printf("a=%p\n", a);
>>> >>>> #pragma omp target map(tofrom : a[0:N])
>>> >>>>    {
>>> >>>>      printf("a=%p\n", a);
>>> >>>>      for (int i = 0; i < 1024; i++) {
>>> >>>>        a[i] = 123;
>>> >>>>      }
>>> >>>>    }
>>> >>>>    printf("a[0]=%i, a[%i]=%i\n", a[0], N/2, a[N/2]);
>>> >>>> }
>>> >>>>
>>> >>>> The code runs successfully when the requires directive is removed,
>>> >>>> because the mapping of `a` is explicitly specified.
>>> >>>>
>>> >>>> For this code to run successfully, would it be necessary to allocate
>>> >>>> `a` specially as CUDA managed memory? I don't see any special
>>> >>>> treatment of `a` in the LLVM IR. As I understand the OpenMP spec, the
>>> >>>> requires directive should lead to a compile error if clang fails to
>>> >>>> generate such code.
>>> >>>>
>>> >>>> The requires example from the OpenMP Examples also fails with the
>>> >>>> same runtime error:
>>> >>>>
>>> >>>> https://github.com/OpenMP/Examples/blob/main/sources/Example_requires.1.cpp
>>> >>>>
>>> >>>> - Joachim
>>> >>>>
>>> >>>>> On 28.02.21 at 11:12, Itaru Kitayama via Openmp-dev wrote:
>>> >>>>> This is the code:
>>> >>>>>
>>> >>>>> #include <iostream>
>>> >>>>> #include <omp.h>
>>> >>>>>
>>> >>>>> #pragma omp requires unified_shared_memory
>>> >>>>> #define N 1024
>>> >>>>>
>>> >>>>> int main() {
>>> >>>>>    int a[N] = {0};
>>> >>>>>    int *device_data =  new int[N];
>>> >>>>> #pragma omp target map(tofrom : device_data[0:N])
>>> >>>>>    {
>>> >>>>>      device_data = &a[0];
>>> >>>>>      for (int i = 0; i < 1024; i++) {
>>> >>>>>        device_data[i] = 123;
>>> >>>>>      }
>>> >>>>>    }
>>> >>>>>    std::cout << a[0] << std::endl;
>>> >>>>> }
>>> >>>>>
>>> >>>>> On Sun, Feb 28, 2021 at 1:34 PM Johannes Doerfert
>>> >>>>> <johannesdoerfert at gmail.com> wrote:
>>> >>>>>> You have an illegal memory access; some memory is not properly
>>> >>>>>> mapped.
>>> >>>>>>
>>> >>>>>>
>>> >>>>>> On 2/27/21 7:47 PM, Itaru Kitayama wrote:
>>> >>>>>>> Removed the internal function, but I get:
>>> >>>>>>>
>>> >>>>>>> CUDA device 0 info: Device supports up to 65536 CUDA blocks and
>>> >>>>>>> 1024 threads with a warp size of 32
>>> >>>>>>> CUDA device 0 info: Launching kernel
>>> >>>>>>> __omp_offloading_34_8009dd23_main_l12 with 1 blocks and 33 threads
>>> >>>>>>> in Generic mode
>>> >>>>>>> CUDA error: Error when synchronizing stream. stream =
>>> >>>>>>> 0x0000000001d22ae0, async info ptr = 0x00007ffe73ea2728
>>> >>>>>>> CUDA error: an illegal memory access was encountered
>>> >>>>>>> Libomptarget error: Failed to synchronize device.
>>> >>>>>>> Libomptarget error: Call to targetDataEnd failed, abort target.
>>> >>>>>>> Libomptarget error: Failed to process data after launching the
>>> >>>>>>> kernel.
>>> >>>>>>> Libomptarget error: Source location information not present.
>>> >>>>>>> Compile with -g or -gline-tables-only.
>>> >>>>>>> Libomptarget fatal error 1: failure of target construct while
>>> >>>>>>> offloading is mandatory
>>> >>>>>>> /var/spool/parastation/jobs/8941317: line 23: 20812 Aborted
>>> >>>>>>> (core dumped) ./a.out
>>> >>>>>>>
>>> >>>>>>> On Sun, Feb 28, 2021 at 10:35 AM Alexey Bataev
>>> >>>>>>> <a.bataev at hotmail.com> wrote:
>>> >>>>>>>> Do not call __tgt_register_requires directly; it is an internal
>>> >>>>>>>> function called by a global constructor, and its argument value
>>> >>>>>>>> depends on #pragma omp requires. Just use the pragma.
>>> >>>>>>>>
>>> >>>>>>>> Best regards,
>>> >>>>>>>> Alexey Bataev
>>> >>>>>>>>
>>> >>>>>>>>> On Feb 27, 2021, at 20:28, Itaru Kitayama via Openmp-dev
>>> >>>>>>>>> <openmp-dev at lists.llvm.org> wrote:
>>> >>>>>>>>>
>>> >>>>>>>>> I'm trying to build a test C++ code that uses part of
>>> >>>>>>>>> unified_shared_memory/shared_update.c
>>> >>>>>>>>>
>>> >>>>>>>>>> On Sun, Feb 28, 2021 at 10:25 AM Johannes Doerfert
>>> >>>>>>>>>> <johannesdoerfert at gmail.com> wrote:
>>> >>>>>>>>>>
>>> >>>>>>>>>> I don't see this test, nor do I understand what you are
>>> trying to say.
>>> >>>>>>>>>> Is the test failing? If so, which test is this?
>>> >>>>>>>>>>
>>> >>>>>>>>>> ~ Johannes
>>> >>>>>>>>>>
>>> >>>>>>>>>>
>>> >>>>>>>>>>> On 2/27/21 7:17 PM, Itaru Kitayama via Openmp-dev wrote:
>>> >>>>>>>>>>> The below C++ code builds, but the executable fails at
>>> runtime.
>>> >>>>>>>>>>> (It is taken from the C code under the libomptarget subdir's
>>> test directory)
>>> >>>>>>>>>>>
>>> >>>>>>>>>>> #include <omp.h>
>>> >>>>>>>>>>>
>>> >>>>>>>>>>> #pragma omp requires unified_shared_memory
>>> >>>>>>>>>>> #define N 1024
>>> >>>>>>>>>>> extern "C" void __tgt_register_requires(int64_t);
>>> >>>>>>>>>>>
>>> >>>>>>>>>>> int main() {
>>> >>>>>>>>>>>
>>> >>>>>>>>>>>     int a[N] = {0};
>>> >>>>>>>>>>>     int b[N] = {0};
>>> >>>>>>>>>>>     int *device_data;
>>> >>>>>>>>>>>     __tgt_register_requires(1);
>>> >>>>>>>>>>> #pragma omp target map(tofrom : device_data)
>>> >>>>>>>>>>>     {
>>> >>>>>>>>>>>       device_data = &a[0];
>>> >>>>>>>>>>>       for (int i = 0; i < 1024; i++) {
>>> >>>>>>>>>>>         a[i] += 1;
>>> >>>>>>>>>>>       }
>>> >>>>>>>>>>>     }
>>> >>>>>>>>>>> }
>>> >>>>>
>>> >
>>>
>> _______________________________________________
> Openmp-dev mailing list
> Openmp-dev at lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/openmp-dev
>