[Openmp-dev] Libomptarget fatal error 1: '#pragma omp requires unified_shared_memory' not used consistently!

Mon Mar 1 14:34:35 PST 2021

Hi all,
In the meantime, what should I do?

On Tue, Mar 2, 2021 at 3:23 Johannes Doerfert <johannesdoerfert at gmail.com>
wrote:

> I think that is it. I heard of problems with our USM before.
> We need to use the managed allocators if USM is active, they are
> about to be upstreamed (I hope).
>
>
> On 3/1/21 12:15 PM, Alexey.Bataev wrote:
> > Looks like this example is for Explicit USM, and I assume that if you
> > allocate the memory for `a` in managed memory explicitly, the OpenMP
> > example should also work.
> >
> > There are other USM modes, though, where the memory is shared implicitly
> > between the host and the devices. It looks like the LLVM implementation
> > currently relies on this one:
> > https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#um-system-allocator
> > where Implicit USM is supported.
> >
> > -------------
> > Best regards,
> > Alexey Bataev
> >
> > On 3/1/2021 1:04 PM, Joachim Protze wrote:
> >> Are the kernel/hardware requirements LLVM-specific?
> >>
> >> I can compile and execute the add_grid.cu example successfully:
> >> https://developer.nvidia.com/blog/unified-memory-cuda-beginners/
> >>
> >> So I would expect that an OpenMP program should also run successfully.
> >>
> >> - Joachim
> >>
> >>
> >> On 01.03.21 at 18:49, Alexey.Bataev wrote:
> >>> Hi, are you sure that your system supports Unified Shared Memory? As far
> >>> as I know, it requires a special Linux kernel, and the hardware must
> >>> support it. If your system does not support it, the code will certainly
> >>> crash at runtime.
> >>>
> >>> In this mode, IIRC, we just ignore map clauses since the accelerator
> >>> devices can access the host memory directly without the need for
> >>> allocating the device-specific memory.
> >>>
> >>>
> >>> -------------
> >>> Best regards,
> >>> Alexey Bataev
> >>>
> >>> On 3/1/2021 12:41 PM, Joachim Protze wrote:
> >>>> Hi all,
> >>>>
> >>>> an even simpler example segfaults when the requires directive is
> >>>> present:
> >>>>
> >>>> #include <iostream>
> >>>> #include <omp.h>
> >>>> #include <stdio.h>
> >>>>
> >>>> #pragma omp requires unified_shared_memory
> >>>> #define N 1024
> >>>>
> >>>> int main() {
> >>>>    int a[N];
> >>>>    printf("a=%p\n", a);
> >>>> #pragma omp target map(tofrom : a[0:N])
> >>>>    {
> >>>>      printf("a=%p\n", a);
> >>>>      for (int i = 0; i < 1024; i++) {
> >>>>        a[i] = 123;
> >>>>      }
> >>>>    }
> >>>>    printf("a[0]=%i, a[%i]=%i\n", a[0], N/2, a[N/2]);
> >>>> }
> >>>>
> >>>> The code runs successfully when the requires directive is removed,
> >>>> because the mapping of `a` is explicitly specified.
> >>>>
> >>>> For this code to run successfully, would it be necessary to allocate
> >>>> `a` specially as CUDA managed memory? I don't see any special treatment
> >>>> of `a` in the LLVM IR. As I understand the OpenMP spec, the requires
> >>>> directive should lead to a compile error if clang fails to generate
> >>>> such code.
> >>>>
> >>>> The requires example from the OpenMP Examples also fails with the same
> >>>> runtime error:
> >>>>
> >>>>
> >>>> https://github.com/OpenMP/Examples/blob/main/sources/Example_requires.1.cpp
> >>>>
> >>>> - Joachim
> >>>>
> >>>> On 28.02.21 at 11:12, Itaru Kitayama via Openmp-dev wrote:
> >>>>> This is the code:
> >>>>>
> >>>>> #include <iostream>
> >>>>> #include <omp.h>
> >>>>>
> >>>>> #pragma omp requires unified_shared_memory
> >>>>> #define N 1024
> >>>>>
> >>>>> int main() {
> >>>>>    int a[N] = {0};
> >>>>>    int *device_data =  new int[N];
> >>>>> #pragma omp target map(tofrom : device_data[0:N])
> >>>>>    {
> >>>>>      device_data = &a[0];
> >>>>>      for (int i = 0; i < 1024; i++) {
> >>>>>        device_data[i] = 123;
> >>>>>      }
> >>>>>    }
> >>>>>    std::cout << a[0] << std::endl;
> >>>>> }
> >>>>>
> >>>>> On Sun, Feb 28, 2021 at 1:34 PM Johannes Doerfert
> >>>>> <johannesdoerfert at gmail.com> wrote:
> >>>>>> You have an illegal memory access, some memory is not properly
> >>>>>> mapped.
> >>>>>>
> >>>>>>
> >>>>>> On 2/27/21 7:47 PM, Itaru Kitayama wrote:
> >>>>>>> Removed the internal function, but I get:
> >>>>>>>
> >>>>>>> CUDA device 0 info: Device supports up to 65536 CUDA blocks and 1024 threads with a warp size of 32
> >>>>>>> CUDA device 0 info: Launching kernel __omp_offloading_34_8009dd23_main_l12 with 1 blocks and 33 threads in Generic mode
> >>>>>>> CUDA error: Error when synchronizing stream. stream = 0x0000000001d22ae0, async info ptr = 0x00007ffe73ea2728
> >>>>>>> CUDA error: an illegal memory access was encountered
> >>>>>>> Libomptarget error: Failed to synchronize device.
> >>>>>>> Libomptarget error: Call to targetDataEnd failed, abort target.
> >>>>>>> Libomptarget error: Failed to process data after launching the kernel.
> >>>>>>> Libomptarget error: Source location information not present. Compile with -g or -gline-tables-only.
> >>>>>>> Libomptarget fatal error 1: failure of target construct while offloading is mandatory
> >>>>>>> /var/spool/parastation/jobs/8941317: line 23: 20812 Aborted (core dumped) ./a.out
> >>>>>>>
> >>>>>>> On Sun, Feb 28, 2021 at 10:35 AM Alexey Bataev <a.bataev at hotmail.com> wrote:
> >>>>>>>> Do not call __tgt_register_requires directly. It is an internal
> >>>>>>>> function called by a global constructor, and its argument value
> >>>>>>>> depends on #pragma omp requires. Just use the pragma.
> >>>>>>>>
> >>>>>>>> Best regards,
> >>>>>>>> Alexey Bataev
> >>>>>>>>
> >>>>>>>>> On Feb 27, 2021, at 20:28, Itaru Kitayama via Openmp-dev <openmp-dev at lists.llvm.org> wrote:
> >>>>>>>>>
> >>>>>>>>> I'm trying to build a test C++ code that uses part of
> >>>>>>>>> unified_shared_memory/shared_update.c
> >>>>>>>>>
> >>>>>>>>>> On Sun, Feb 28, 2021 at 10:25 AM Johannes Doerfert
> >>>>>>>>>> <johannesdoerfert at gmail.com> wrote:
> >>>>>>>>>>
> >>>>>>>>>> I don't see this test, nor do I understand what you are trying to
> >>>>>>>>>> say.
> >>>>>>>>>> Is the test failing? If so, which test is this?
> >>>>>>>>>>
> >>>>>>>>>> ~ Johannes
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>> On 2/27/21 7:17 PM, Itaru Kitayama via Openmp-dev wrote:
> >>>>>>>>>>> The C++ code below builds, but the executable fails at runtime.
> >>>>>>>>>>> (It is taken from the C code under the libomptarget subdir's test
> >>>>>>>>>>> directory.)
> >>>>>>>>>>>
> >>>>>>>>>>> #include <omp.h>
> >>>>>>>>>>>
> >>>>>>>>>>> #pragma omp requires unified_shared_memory
> >>>>>>>>>>> #define N 1024
> >>>>>>>>>>> extern "C" void __tgt_register_requires(int64_t);
> >>>>>>>>>>>
> >>>>>>>>>>> int main() {
> >>>>>>>>>>>
> >>>>>>>>>>>     int a[N] = {0};
> >>>>>>>>>>>     int b[N] = {0};
> >>>>>>>>>>>     int *device_data;
> >>>>>>>>>>>     __tgt_register_requires(1);
> >>>>>>>>>>> #pragma omp target map(tofrom : device_data)
> >>>>>>>>>>>     {
> >>>>>>>>>>>       device_data = &a[0];
> >>>>>>>>>>>       for (int i = 0; i < 1024; i++) {
> >>>>>>>>>>>         a[i] += 1;
> >>>>>>>>>>>       }
> >>>>>>>>>>>     }
> >>>>>>>>>>> }
> >>>>>>>>>>> _______________________________________________
> >>>>>>>>>>> Openmp-dev mailing list
> >>>>>>>>>>> Openmp-dev at lists.llvm.org
> >>>>>>>>>>> https://lists.llvm.org/cgi-bin/mailman/listinfo/openmp-dev
> >>>>>
> >
>