[Openmp-dev] Libomptarget fatal error 1: '#pragma omp requires unified_shared_memory' not used consistently!

Joachim Protze via Openmp-dev openmp-dev at lists.llvm.org
Mon Mar 1 09:41:11 PST 2021


Hi all,

An even simpler example segfaults when the requires directive is present:

#include <iostream>
#include <omp.h>
#include <stdio.h>

#pragma omp requires unified_shared_memory
#define N 1024

int main() {
  int a[N];
  printf("a=%p\n", a);
#pragma omp target map(tofrom : a[0:N])
  {
    printf("a=%p\n", a);
    for (int i = 0; i < N; i++) {
      a[i] = 123;
    }
  }
  printf("a[0]=%i, a[%i]=%i\n", a[0], N/2, a[N/2]);
}

The code runs successfully when the requires directive is removed, because
the mapping of `a` is explicitly specified.

For this code to run successfully, would it be necessary to allocate `a`
specially as CUDA managed memory? I don't see any special treatment of
`a` in the LLVM IR. As I understand the OpenMP spec, the requires directive
should lead to a compile error if clang fails to generate such code.
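
For comparison, here is a minimal sketch of what the directive is meant to
allow according to the spec: a target region that dereferences a host
pointer without any map clause. The helper names `fill_on_device` and
`make_filled` are hypothetical and not taken from libomptarget or the
Examples repository. Whether this actually runs on a given GPU depends on
the compiler, runtime, and driver, so treat it as an illustration rather
than a verified fix. Without offloading support the pragmas are ignored and
the loop simply runs on the host.

```cpp
#include <cassert>
#include <vector>

// Ask the implementation for one address space shared by host and device.
// That a particular toolchain and GPU honour this is an assumption.
#pragma omp requires unified_shared_memory

// Hypothetical helper: writes `value` into buf[0..n).
// No map clause is given; under unified_shared_memory the device is
// expected to dereference the host pointer directly.
void fill_on_device(int *buf, int n, int value) {
#pragma omp target
  {
    for (int i = 0; i < n; i++) {
      buf[i] = value;
    }
  }
}

// Hypothetical helper: allocates the buffer on the heap (instead of the
// stack array used in the reproducer), fills it, and returns it.
std::vector<int> make_filled(int n, int value) {
  assert(n >= 0);
  std::vector<int> data(n, 0);
  fill_on_device(data.data(), n, value);
  return data;
}
```

Calling `make_filled(1024, 123)` should give a vector whose elements are all
123 if the runtime really provides unified shared memory; on a system where
ordinary host allocations are not visible to the device, I would expect the
same illegal memory access as above.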

The requires example from the OpenMP Examples also fails with the same
runtime error:

https://github.com/OpenMP/Examples/blob/main/sources/Example_requires.1.cpp

- Joachim

On 28.02.21 at 11:12, Itaru Kitayama via Openmp-dev wrote:
> This is the code:
> 
> #include <iostream>
> #include <omp.h>
> 
> #pragma omp requires unified_shared_memory
> #define N 1024
> 
> int main() {
>   int a[N] = {0};
>   int *device_data =  new int[N];
> #pragma omp target map(tofrom : device_data[0:N])
>   {
>     device_data = &a[0];
>     for (int i = 0; i < 1024; i++) {
>       device_data[i] = 123;
>     }
>   }
>   std::cout << a[0] << std::endl;
> }
> 
> On Sun, Feb 28, 2021 at 1:34 PM Johannes Doerfert
> <johannesdoerfert at gmail.com> wrote:
>>
>> You have an illegal memory access, some memory is not properly
>> mapped.
>>
>>
>> On 2/27/21 7:47 PM, Itaru Kitayama wrote:
>>> Removed the internal function, but I get:
>>>
>>> CUDA device 0 info: Device supports up to 65536 CUDA blocks and 1024
>>> threads with a warp size of 32
>>> CUDA device 0 info: Launching kernel
>>> __omp_offloading_34_8009dd23_main_l12 with 1 blocks and 33 threads in
>>> Generic mode
>>> CUDA error: Error when synchronizing stream. stream =
>>> 0x0000000001d22ae0, async info ptr = 0x00007ffe73ea2728
>>> CUDA error: an illegal memory access was encountered
>>> Libomptarget error: Failed to synchronize device.
>>> Libomptarget error: Call to targetDataEnd failed, abort target.
>>> Libomptarget error: Failed to process data after launching the kernel.
>>> Libomptarget error: Source location information not present. Compile
>>> with -g or -gline-tables-only.
>>> Libomptarget fatal error 1: failure of target construct while
>>> offloading is mandatory
>>> /var/spool/parastation/jobs/8941317: line 23: 20812 Aborted
>>>       (core dumped) ./a.out
>>>
>>> On Sun, Feb 28, 2021 at 10:35 AM Alexey Bataev <a.bataev at hotmail.com> wrote:
>>>> Do not call __tgt_register_requires directly; it is an internal function called by a global constructor, and its argument value depends on #pragma omp requires. Use just the pragma.
>>>>
>>>> Best regards,
>>>> Alexey Bataev
>>>>
>>>>> On Feb 27, 2021, at 20:28, Itaru Kitayama via Openmp-dev <openmp-dev at lists.llvm.org> wrote:
>>>>>
>>>>> I'm trying to build a test C++ code that uses part of
>>>>> unified_shared_memory/shared_update.c
>>>>>
>>>>>> On Sun, Feb 28, 2021 at 10:25 AM Johannes Doerfert
>>>>>> <johannesdoerfert at gmail.com> wrote:
>>>>>>
>>>>>> I don't see this test, nor do I understand what you are trying to say.
>>>>>> Is the test failing? If so, which test is this?
>>>>>>
>>>>>> ~ Johannes
>>>>>>
>>>>>>
>>>>>>> On 2/27/21 7:17 PM, Itaru Kitayama via Openmp-dev wrote:
>>>>>>> The below C++ code builds, but the executable fails at runtime.
>>>>>>> (It is taken from the C code under the libomptarget subdir's test directory)
>>>>>>>
>>>>>>> #include <omp.h>
>>>>>>>
>>>>>>> #pragma omp requires unified_shared_memory
>>>>>>> #define N 1024
>>>>>>> extern "C" void __tgt_register_requires(int64_t);
>>>>>>>
>>>>>>> int main() {
>>>>>>>
>>>>>>>    int a[N] = {0};
>>>>>>>    int b[N] = {0};
>>>>>>>    int *device_data;
>>>>>>>    __tgt_register_requires(1);
>>>>>>> #pragma omp target map(tofrom : device_data)
>>>>>>>    {
>>>>>>>      device_data = &a[0];
>>>>>>>      for (int i = 0; i < 1024; i++) {
>>>>>>>        a[i] += 1;
>>>>>>>      }
>>>>>>>    }
>>>>>>> }
>>>>>>> _______________________________________________
>>>>>>> Openmp-dev mailing list
>>>>>>> Openmp-dev at lists.llvm.org
>>>>>>> https://lists.llvm.org/cgi-bin/mailman/listinfo/openmp-dev
> 


