[Openmp-dev] Libomptarget fatal error 1: '#pragma omp requires unified_shared_memory' not used consistently!

Johannes Doerfert via Openmp-dev openmp-dev at lists.llvm.org
Mon Mar 1 14:39:46 PST 2021


Don't use required USM for now, I would suggest.
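
E.g., a minimal sketch of the pattern that was reported to work in this
thread: drop the requires directive and rely on the explicit map instead.

#include <cstdio>

#define N 1024

int main() {
  int a[N] = {0};
  // Without `#pragma omp requires unified_shared_memory`, the explicit
  // map clause copies `a` to the device and back.
#pragma omp target map(tofrom : a[0:N])
  {
    for (int i = 0; i < N; i++)
      a[i] = 123;
  }
  std::printf("a[0]=%d a[%d]=%d\n", a[0], N / 2, a[N / 2]);
  return 0;
}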

On 3/1/21 4:35 PM, Itaru Kitayama wrote:
> I'm on JURECA, and some nodes have A100 GPUs attached.
>
> On Tue, Mar 2, 2021 at 7:34 Itaru Kitayama <itaru.kitayama at gmail.com> wrote:
>
>> Hi all,
>> In the meantime, what do I do?
>>
>> On Tue, Mar 2, 2021 at 3:23 Johannes Doerfert <johannesdoerfert at gmail.com>
>> wrote:
>>
>>> I think that is it. I have heard of problems with our USM support before.
>>> We need to use the managed allocators if USM is active; they are
>>> about to be upstreamed (I hope).
>>>
>>>
>>> On 3/1/21 12:15 PM, Alexey.Bataev wrote:
>>>> Looks like this example is for Explicit USM, and I assume that if you
>>>> allocate the memory for `a` in managed memory explicitly, the OpenMP
>>>> example should also work.
>>>>
>>>> There are other USM modes, though, where the memory is shared implicitly
>>>> between the host and the devices. Looks like the current LLVM
>>>> implementation relies on the system allocator described at
>>>> https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#um-system-allocator
>>>> i.e., on Implicit USM being supported.
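>>>>
>>>> For Explicit USM, something like this might work as an interim
>>>> workaround (an untested sketch; it assumes the CUDA headers are
>>>> available and that you link against cudart):
>>>>
>>>> #include <cstdio>
>>>> #include <cuda_runtime.h>
>>>>
>>>> #pragma omp requires unified_shared_memory
>>>> #define N 1024
>>>>
>>>> int main() {
>>>>   int *a = nullptr;
>>>>   // Allocate `a` in CUDA managed memory so host and device can
>>>>   // dereference the same pointer.
>>>>   cudaMallocManaged(&a, N * sizeof(int));
>>>> #pragma omp target
>>>>   for (int i = 0; i < N; i++)
>>>>     a[i] = 123;
>>>>   std::printf("a[0]=%d\n", a[0]);
>>>>   cudaFree(a);
>>>>   return 0;
>>>> }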
>>>>
>>>> -------------
>>>> Best regards,
>>>> Alexey Bataev
>>>>
>>>> On 3/1/2021 1:04 PM, Joachim Protze wrote:
>>>>> Are the Kernel/Hardware requirements llvm specific?
>>>>>
>>>>> I can compile and execute the add_grid.cu example successfully:
>>>>> https://developer.nvidia.com/blog/unified-memory-cuda-beginners/
>>>>>
>>>>> So, I would expect that an OpenMP program should also run successfully.
>>>>>
>>>>> - Joachim
>>>>>
>>>>>
>>>>> On 01.03.21 at 18:49, Alexey.Bataev wrote:
>>>>>> Hi, are you sure that your system supports Unified Shared Memory? As
>>>>>> far as I know, it requires a special Linux kernel, and the hardware
>>>>>> must support it. If your system does not support it, the code will
>>>>>> crash at runtime for sure.
>>>>>>
>>>>>> In this mode, IIRC, we just ignore map clauses, since the accelerator
>>>>>> devices can access the host memory directly without the need to
>>>>>> allocate device-specific memory.
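>>>>>>
>>>>>> To check what the system actually supports, here is a quick sketch
>>>>>> using the CUDA runtime API (pageableMemoryAccess is the property the
>>>>>> linked docs tie full USM to; managedMemory covers explicitly managed
>>>>>> allocations):
>>>>>>
>>>>>> #include <cstdio>
>>>>>> #include <cuda_runtime.h>
>>>>>>
>>>>>> int main() {
>>>>>>   int pageable = 0, managed = 0;
>>>>>>   // Non-zero pageableMemoryAccess means the GPU can access ordinary
>>>>>>   // host allocations directly, i.e. Implicit USM.
>>>>>>   cudaDeviceGetAttribute(&pageable, cudaDevAttrPageableMemoryAccess, 0);
>>>>>>   cudaDeviceGetAttribute(&managed, cudaDevAttrManagedMemory, 0);
>>>>>>   std::printf("pageableMemoryAccess=%d managedMemory=%d\n",
>>>>>>               pageable, managed);
>>>>>>   return 0;
>>>>>> }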
>>>>>>
>>>>>>
>>>>>> -------------
>>>>>> Best regards,
>>>>>> Alexey Bataev
>>>>>>
>>>>>> On 3/1/2021 12:41 PM, Joachim Protze wrote:
>>>>>>> Hi all,
>>>>>>>
>>>>>>> even a simpler example segfaults when the requires directive is there:
>>>>>>> #include <iostream>
>>>>>>> #include <omp.h>
>>>>>>> #include <stdio.h>
>>>>>>>
>>>>>>> #pragma omp requires unified_shared_memory
>>>>>>> #define N 1024
>>>>>>>
>>>>>>> int main() {
>>>>>>>     int a[N];
>>>>>>>     printf("a=%p\n", a);
>>>>>>> #pragma omp target map(tofrom : a[0:N])
>>>>>>>     {
>>>>>>>       printf("a=%p\n", a);
>>>>>>>       for (int i = 0; i < 1024; i++) {
>>>>>>>         a[i] = 123;
>>>>>>>       }
>>>>>>>     }
>>>>>>>     printf("a[0]=%i, a[%i]=%i\n", a[0], N/2, a[N/2]);
>>>>>>> }
>>>>>>>
>>>>>>> The code runs successfully when the requires directive is removed,
>>>>>>> because the mapping of `a` is explicitly specified.
>>>>>>>
>>>>>>> For this code to run successfully, would it be necessary to allocate
>>>>>>> `a` specially, as CUDA managed memory? I don't see any special
>>>>>>> treatment of `a` in the LLVM IR. As I understand the OpenMP spec, the
>>>>>>> requires directive should lead to a compile error if clang fails to
>>>>>>> generate such code.
>>>>>>>
>>>>>>> The requires example from the OpenMP Examples also fails with the
>>>>>>> same runtime error:
>>>>>>>
>>>>>>> https://github.com/OpenMP/Examples/blob/main/sources/Example_requires.1.cpp
>>>>>>> - Joachim
>>>>>>>
>>>>>>> On 28.02.21 at 11:12, Itaru Kitayama via Openmp-dev wrote:
>>>>>>>> This is the code:
>>>>>>>>
>>>>>>>> #include <iostream>
>>>>>>>> #include <omp.h>
>>>>>>>>
>>>>>>>> #pragma omp requires unified_shared_memory
>>>>>>>> #define N 1024
>>>>>>>>
>>>>>>>> int main() {
>>>>>>>>     int a[N] = {0};
>>>>>>>>     int *device_data =  new int[N];
>>>>>>>> #pragma omp target map(tofrom : device_data[0:N])
>>>>>>>>     {
>>>>>>>>       device_data = &a[0];
>>>>>>>>       for (int i = 0; i < 1024; i++) {
>>>>>>>>         device_data[i] = 123;
>>>>>>>>       }
>>>>>>>>     }
>>>>>>>>     std::cout << a[0] << std::endl;
>>>>>>>> }
>>>>>>>>
>>>>>>>> On Sun, Feb 28, 2021 at 1:34 PM Johannes Doerfert
>>>>>>>> <johannesdoerfert at gmail.com> wrote:
>>>>>>>>> You have an illegal memory access; some memory is not properly
>>>>>>>>> mapped.
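>>>>>>>>>
>>>>>>>>> One way to avoid that (an illustrative sketch, without the requires
>>>>>>>>> directive): write through the mapped pointer, and do not re-point it
>>>>>>>>> at another host array inside the region.
>>>>>>>>>
>>>>>>>>> #include <iostream>
>>>>>>>>>
>>>>>>>>> #define N 1024
>>>>>>>>>
>>>>>>>>> int main() {
>>>>>>>>>   int *device_data = new int[N];
>>>>>>>>> // The map clause covers this allocation; writes inside the region
>>>>>>>>> // go to the mapped copy and are transferred back at region end.
>>>>>>>>> #pragma omp target map(tofrom : device_data[0:N])
>>>>>>>>>   {
>>>>>>>>>     for (int i = 0; i < N; i++)
>>>>>>>>>       device_data[i] = 123;
>>>>>>>>>   }
>>>>>>>>>   std::cout << device_data[0] << std::endl;
>>>>>>>>>   delete[] device_data;
>>>>>>>>> }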
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 2/27/21 7:47 PM, Itaru Kitayama wrote:
>>>>>>>>>> Removed the internal function, but I get:
>>>>>>>>>>
>>>>>>>>>> CUDA device 0 info: Device supports up to 65536 CUDA blocks and
>>>>>>>>>> 1024 threads with a warp size of 32
>>>>>>>>>> CUDA device 0 info: Launching kernel
>>>>>>>>>> __omp_offloading_34_8009dd23_main_l12 with 1 blocks and 33
>>>>>>>>>> threads in Generic mode
>>>>>>>>>> CUDA error: Error when synchronizing stream. stream =
>>>>>>>>>> 0x0000000001d22ae0, async info ptr = 0x00007ffe73ea2728
>>>>>>>>>> CUDA error: an illegal memory access was encountered
>>>>>>>>>> Libomptarget error: Failed to synchronize device.
>>>>>>>>>> Libomptarget error: Call to targetDataEnd failed, abort target.
>>>>>>>>>> Libomptarget error: Failed to process data after launching the
>>>>>>>>>> kernel.
>>>>>>>>>> Libomptarget error: Source location information not present.
>>>>>>>>>> Compile with -g or -gline-tables-only.
>>>>>>>>>> Libomptarget fatal error 1: failure of target construct while
>>>>>>>>>> offloading is mandatory
>>>>>>>>>> /var/spool/parastation/jobs/8941317: line 23: 20812 Aborted
>>>>>>>>>> (core dumped) ./a.out
>>>>>>>>>>
>>>>>>>>>> On Sun, Feb 28, 2021 at 10:35 AM Alexey Bataev <a.bataev at hotmail.com> wrote:
>>>>>>>>>>> Do not call __tgt_register_requires directly; it is an internal
>>>>>>>>>>> function called by a global constructor, and its argument value
>>>>>>>>>>> depends on #pragma omp requires. Just use the pragma.
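>>>>>>>>>>>
>>>>>>>>>>> For reference, a rough sketch of what the compiler emits for the
>>>>>>>>>>> pragma (names and the flag value are from memory, so treat them as
>>>>>>>>>>> illustrative):
>>>>>>>>>>>
>>>>>>>>>>> #include <stdint.h>
>>>>>>>>>>>
>>>>>>>>>>> extern "C" void __tgt_register_requires(int64_t);
>>>>>>>>>>>
>>>>>>>>>>> // Generated global constructor, sketched as plain C++. IIRC
>>>>>>>>>>> // OMP_REQ_UNIFIED_SHARED_MEMORY is 8 in the LLVM sources, so a
>>>>>>>>>>> // manual call with 1 registers a conflicting value, which is the
>>>>>>>>>>> // "not used consistently" fatal error from the subject line.
>>>>>>>>>>> __attribute__((constructor))
>>>>>>>>>>> static void omp_requires_reg() {
>>>>>>>>>>>   __tgt_register_requires(8);
>>>>>>>>>>> }
>>>>>>>>>>>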
>>>>>>>>>>> Best regards,
>>>>>>>>>>> Alexey Bataev
>>>>>>>>>>>
>>>>>>>>>>> On Feb 27, 2021 at 20:28, Itaru Kitayama via Openmp-dev <openmp-dev at lists.llvm.org> wrote:
>>>>>>>>>>>> I'm trying to build a C++ test that uses part of
>>>>>>>>>>>> unified_shared_memory/shared_update.c
>>>>>>>>>>>>
>>>>>>>>>>>>> On Sun, Feb 28, 2021 at 10:25 AM Johannes Doerfert
>>>>>>>>>>>>> <johannesdoerfert at gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> I don't see this test, nor do I understand what you are trying
>>>>>>>>>>>>> to say.
>>>>>>>>>>>>> Is the test failing? If so, which test is this?
>>>>>>>>>>>>>
>>>>>>>>>>>>> ~ Johannes
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 2/27/21 7:17 PM, Itaru Kitayama via Openmp-dev wrote:
>>>>>>>>>>>>>> The C++ code below builds, but the executable fails at
>>>>>>>>>>>>>> runtime. (It is taken from the C code under the libomptarget
>>>>>>>>>>>>>> subdir's test directory.)
>>>>>>>>>>>>>> #include <omp.h>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> #pragma omp requires unified_shared_memory
>>>>>>>>>>>>>> #define N 1024
>>>>>>>>>>>>>> extern "C" void __tgt_register_requires(int64_t);
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> int main() {
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>      int a[N] = {0};
>>>>>>>>>>>>>>      int b[N] = {0};
>>>>>>>>>>>>>>      int *device_data;
>>>>>>>>>>>>>>      __tgt_register_requires(1);
>>>>>>>>>>>>>> #pragma omp target map(tofrom : device_data)
>>>>>>>>>>>>>>      {
>>>>>>>>>>>>>>        device_data = &a[0];
>>>>>>>>>>>>>>        for (int i = 0; i < 1024; i++) {
>>>>>>>>>>>>>>          a[i] += 1;
>>>>>>>>>>>>>>        }
>>>>>>>>>>>>>>      }
>>>>>>>>>>>>>> }
>>>>>>>>

