[Openmp-dev] Libomptarget fatal error 1: '#pragma omp requires unified_shared_memory' not used consistently!

Mon Mar 1 14:50:33 PST 2021

I’ll ask the JURECA’s admins at JSC.

On Tue, Mar 2, 2021 at 7:49 Ye Luo <xw111luoye at gmail.com> wrote:

> Are you sure your machine/OS supports USM? Summit like nodes P9+V100 are
> the only architecture with NVIDIA GPU which supports USM that I'm aware of.
> Ye
> ===================
> Ye Luo, Ph.D.
> Computational Science Division & Leadership Computing Facility
> Argonne National Laboratory
>
>
> On Mon, Mar 1, 2021 at 4:35 PM Itaru Kitayama via Openmp-dev <
> openmp-dev at lists.llvm.org> wrote:
>
>> I’m on JURECA and some nodes are attached to A100 GPUs.
>>
>> On Tue, Mar 2, 2021 at 7:34 Itaru Kitayama <itaru.kitayama at gmail.com>
>> wrote:
>>
>>> Hi all,
>>> In the mean time, what do I do?
>>>
>>> On Tue, Mar 2, 2021 at 3:23 Johannes Doerfert <
>>> johannesdoerfert at gmail.com> wrote:
>>>
>>>> I think that is it. I heard of problems with our USM before.
>>>> We need to use the managed allocators if USM is active, they are
>>>> about to be upstreamed (I hope).
>>>>
>>>>
>>>> On 3/1/21 12:15 PM, Alexey.Bataev wrote:
>>>> > Looks like this example is for Explicit USM and I assume if you
>>>> allocate
>>>> > the memory for a in managed memory explicitly, the OpenMP example also
>>>> > should work.
>>>> >
>>>> > There are other USM modes though, where the memory is shared
>>>> implicitly
>>>> > between the host and the devices. Looks like currently LLVM
>>>> > implementation relies on this thing
>>>> >
>>>> https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#um-system-allocator
>>>> > where Implicit USM is supported.
>>>> >
>>>> > -------------
>>>> > Best regards,
>>>> > Alexey Bataev
>>>> >
>>>> > 3/1/2021 1:04 PM, Joachim Protze пишет:
>>>> >> Are the Kernel/Hardware requirements llvm specific?
>>>> >>
>>>> >> I can compile and execute the add_grid.cu example sucessfully:
>>>> >> https://developer.nvidia.com/blog/unified-memory-cuda-beginners/
>>>> >>
>>>> >> So, I would expect that an OpenMP program should also run
>>>> sucessfully.
>>>> >>
>>>> >> - Joachim
>>>> >>
>>>> >>
>>>> >> Am 01.03.21 um 18:49 schrieb Alexey.Bataev:
>>>> >>> Hi, I you sure that you system supports Unified Shared Memory? As
>>>> far as
>>>> >>> I know it requires special linux kernel and the hardware must
>>>> support
>>>> >>> it. If you system does not support it, the code will crash for sure
>>>> at
>>>> >>> the runtime.
>>>> >>>
>>>> >>> In this mode, IIRC, we just ignore map clauses since the accelerator
>>>> >>> devices can access the host memory directly without the need for
>>>> >>> allocating the device-specific memory.
>>>> >>>
>>>> >>>
>>>> >>> -------------
>>>> >>> Best regards,
>>>> >>> Alexey Bataev
>>>> >>>
>>>> >>> 3/1/2021 12:41 PM, Joachim Protze пишет:
>>>> >>>> Hi all,
>>>> >>>>
>>>> >>>> even a more simple example segfaults, when the requires directive
>>>> is there:
>>>> >>>>
>>>> >>>> #include <iostream>
>>>> >>>> #include <omp.h>
>>>> >>>> #include <stdio.h>
>>>> >>>>
>>>> >>>> #pragma omp requires unified_shared_memory
>>>> >>>> #define N 1024
>>>> >>>>
>>>> >>>> int main() {
>>>> >>>>    int a[N];
>>>> >>>>    printf("a=%p\n", a);
>>>> >>>> #pragma omp target map(tofrom : a[0:N])
>>>> >>>>    {
>>>> >>>>      printf("a=%p\n", a);
>>>> >>>>      for (int i = 0; i < 1024; i++) {
>>>> >>>>        a[i] = 123;
>>>> >>>>      }
>>>> >>>>    }
>>>> >>>>    printf("a[0]=%i, a[%i]=%i\n", a[0], N/2, a[N/2]);
>>>> >>>> }
>>>> >>>>
>>>> >>>> The code runs sucessfully when the requires directive is removed
>>>> because
>>>> >>>> the mapping of `a` is explicitly specified.
>>>> >>>>
>>>> >>>> For this code to run successfully, would it be necessary to
>>>> allocate `a`
>>>> >>>> specially as cuda managed memory? I don't see any special
>>>> treatment of
>>>> >>>> `a` in llvm ir. As I understand the OpenMP spec, the requires
>>>> directive
>>>> >>>> should lead to a compile error if clang fails to generate such
>>>> code.
>>>> >>>>
>>>> >>>> The requires example from the OpenMP Examples also fails with the
>>>> same
>>>> >>>> runtime error:
>>>> >>>>
>>>> >>>>
>>>> https://github.com/OpenMP/Examples/blob/main/sources/Example_requires.1.cpp
>>>> >>>>
>>>> >>>> - Joachim
>>>> >>>>
>>>> >>>> Am 28.02.21 um 11:12 schrieb Itaru Kitayama via Openmp-dev:
>>>> >>>>> This is the code:
>>>> >>>>>
>>>> >>>>> #include <iostream>
>>>> >>>>> #include <omp.h>
>>>> >>>>>
>>>> >>>>> #pragma omp requires unified_shared_memory
>>>> >>>>> #define N 1024
>>>> >>>>>
>>>> >>>>> int main() {
>>>> >>>>>    int a[N] = {0};
>>>> >>>>>    int *device_data =  new int[N];
>>>> >>>>> #pragma omp target map(tofrom : device_data[0:N])
>>>> >>>>>    {
>>>> >>>>>      device_data = &a[0];
>>>> >>>>>      for (int i = 0; i < 1024; i++) {
>>>> >>>>>        device_data[i] = 123;
>>>> >>>>>      }
>>>> >>>>>    }
>>>> >>>>>    std::cout << a[0] << std::endl;
>>>> >>>>> }
>>>> >>>>>
>>>> >>>>> On Sun, Feb 28, 2021 at 1:34 PM Johannes Doerfert
>>>> >>>>> <johannesdoerfert at gmail.com> wrote:
>>>> >>>>>> You have an illegal memory access, some memory is not properly
>>>> >>>>>> mapped.
>>>> >>>>>>
>>>> >>>>>>
>>>> >>>>>> On 2/27/21 7:47 PM, Itaru Kitayama wrote:
>>>> >>>>>>> Removed the internal function, but I get:
>>>> >>>>>>>
>>>> >>>>>>> CUDA device 0 info: Device supports up to 65536 CUDA blocks and
>>>> 1024
>>>> >>>>>>> threads with a warp size of 32
>>>> >>>>>>> CUDA device 0 info: Launching kernel
>>>> >>>>>>> __omp_offloading_34_8009dd23_main_l12 with 1 blocks and 33
>>>> threads in
>>>> >>>>>>> Generic mode
>>>> >>>>>>> CUDA error: Error when synchronizing stream. stream =
>>>> >>>>>>> 0x0000000001d22ae0, async info ptr = 0x00007ffe73ea2728
>>>> >>>>>>> CUDA error: an illegal memory access was encountered
>>>> >>>>>>> Libomptarget error: Failed to synchronize device.
>>>> >>>>>>> Libomptarget error: Call to targetDataEnd failed, abort target.
>>>> >>>>>>> Libomptarget error: Failed to process data after launching the
>>>> kernel.
>>>> >>>>>>> Libomptarget error: Source location information not present.
>>>> Compile
>>>> >>>>>>> with -g or -gline-tables-only.
>>>> >>>>>>> Libomptarget fatal error 1: failure of target construct while
>>>> >>>>>>> offloading is mandatory
>>>> >>>>>>> /var/spool/parastation/jobs/8941317: line 23: 20812 Aborted
>>>> >>>>>>>        (core dumped) ./a.out
>>>> >>>>>>>
>>>> >>>>>>> On Sun, Feb 28, 2021 at 10:35 AM Alexey Bataev <
>>>> a.bataev at hotmail.com> wrote:
>>>> >>>>>>>> Do not call __tgt_register_requires directly, this is the
>>>> internal function called by global constructor and its arg value depends on
>>>> #pragma omp requires. Use just this pragma.
>>>> >>>>>>>>
>>>> >>>>>>>> Best regards,
>>>> >>>>>>>> Alexey Bataev
>>>> >>>>>>>>
>>>> >>>>>>>>> 27 февр. 2021 г., в 20:28, Itaru Kitayama via Openmp-dev <
>>>> openmp-dev at lists.llvm.org> написал(а):
>>>> >>>>>>>>>
>>>> >>>>>>>>> I'm trying to build a test C++ code that uses part of
>>>> >>>>>>>>> unified_shared_memory/shared_update.c
>>>> >>>>>>>>>
>>>> >>>>>>>>>> On Sun, Feb 28, 2021 at 10:25 AM Johannes Doerfert
>>>> >>>>>>>>>> <johannesdoerfert at gmail.com> wrote:
>>>> >>>>>>>>>>
>>>> >>>>>>>>>> I don't see this test, nor do I understand what you are
>>>> trying to say.
>>>> >>>>>>>>>> Is the test failing? If so, which test is this?
>>>> >>>>>>>>>>
>>>> >>>>>>>>>> ~ Johannes
>>>> >>>>>>>>>>
>>>> >>>>>>>>>>
>>>> >>>>>>>>>>> On 2/27/21 7:17 PM, Itaru Kitayama via Openmp-dev wrote:
>>>> >>>>>>>>>>> The below C++ code builds, but the executable fails at
>>>> runtime.
>>>> >>>>>>>>>>> (It is taken from the C code under the libomptarget
>>>> subdir's test directory)
>>>> >>>>>>>>>>>
>>>> >>>>>>>>>>> #include <omp.h>
>>>> >>>>>>>>>>>
>>>> >>>>>>>>>>> #pragma omp requires unified_shared_memory
>>>> >>>>>>>>>>> #define N 1024
>>>> >>>>>>>>>>> extern "C" void __tgt_register_requires(int64_t);
>>>> >>>>>>>>>>>
>>>> >>>>>>>>>>> int main() {
>>>> >>>>>>>>>>>
>>>> >>>>>>>>>>>     int a[N] = {0};
>>>> >>>>>>>>>>>     int b[N] = {0};
>>>> >>>>>>>>>>>     int *device_data;
>>>> >>>>>>>>>>>     __tgt_register_requires(1);
>>>> >>>>>>>>>>> #pragma omp target map(tofrom : device_data)
>>>> >>>>>>>>>>>     {
>>>> >>>>>>>>>>>       device_data = &a[0];
>>>> >>>>>>>>>>>       for (int i = 0; i < 1024; i++) {
>>>> >>>>>>>>>>>         a[i] += 1;
>>>> >>>>>>>>>>>       }
>>>> >>>>>>>>>>>     }
>>>> >>>>>>>>>>> }
>>>> >>>>>>>>>>> _______________________________________________
>>>> >>>>>>>>>>> Openmp-dev mailing list
>>>> >>>>>>>>>>> Openmp-dev at lists.llvm.org
>>>> >>>>>>>>>>> https://lists.llvm.org/cgi-bin/mailman/listinfo/openmp-dev
>>>> >>>>>>>>> _______________________________________________
>>>> >>>>>>>>> Openmp-dev mailing list
>>>> >>>>>>>>> Openmp-dev at lists.llvm.org
>>>> >>>>>>>>> https://lists.llvm.org/cgi-bin/mailman/listinfo/openmp-dev
>>>> >>>>> _______________________________________________
>>>> >>>>> Openmp-dev mailing list
>>>> >>>>> Openmp-dev at lists.llvm.org
>>>> >>>>> https://lists.llvm.org/cgi-bin/mailman/listinfo/openmp-dev
>>>> >>>>>
>>>> >
>>>>
>>> _______________________________________________
>> Openmp-dev mailing list
>> Openmp-dev at lists.llvm.org
>> https://lists.llvm.org/cgi-bin/mailman/listinfo/openmp-dev
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/openmp-dev/attachments/20210302/0c0577e7/attachment-0001.html>