<div dir="auto">Ok, thanks. But given that there are USM tests in C, I wonder: has no one noticed the errors so far?</div><div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Tue, Mar 2, 2021 at 7:39 Johannes Doerfert <<a href="mailto:johannesdoerfert@gmail.com">johannesdoerfert@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Don't use `requires` USM for now, I would assume.<br>

<br>

On 3/1/21 4:35 PM, Itaru Kitayama wrote:<br>

> I’m on JURECA and some nodes are attached to A100 GPUs.<br>

><br>

> On Tue, Mar 2, 2021 at 7:34 Itaru Kitayama <<a href="mailto:itaru.kitayama@gmail.com" target="_blank">itaru.kitayama@gmail.com</a>> wrote:<br>

><br>

>> Hi all,<br>

>> In the meantime, what should I do?<br>

>><br>

>> On Tue, Mar 2, 2021 at 3:23 Johannes Doerfert <<a href="mailto:johannesdoerfert@gmail.com" target="_blank">johannesdoerfert@gmail.com</a>><br>

>> wrote:<br>

>><br>

>>> I think that is it. I heard of problems with our USM before.<br>

>>> We need to use the managed allocators if USM is active, they are<br>

>>> about to be upstreamed (I hope).<br>

>>><br>

>>><br>

>>> On 3/1/21 12:15 PM, Alexey.Bataev wrote:<br>

>>>> Looks like this example is for Explicit USM and I assume if you allocate<br>

>>>> the memory for a in managed memory explicitly, the OpenMP example also<br>

>>>> should work.<br>

>>>><br>

>>>> There are other USM modes though, where the memory is shared implicitly<br>

>>>> between the host and the devices. It looks like the LLVM implementation<br>

>>>> currently relies on this mechanism:<br>

>>>> <a href="https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#um-system-allocator" rel="noreferrer" target="_blank">https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#um-system-allocator</a><br>

>>>> where Implicit USM is supported.<br>

>>>><br>

>>>> -------------<br>

>>>> Best regards,<br>

>>>> Alexey Bataev<br>

>>>><br>

>>>> On 3/1/2021 1:04 PM, Joachim Protze wrote:<br>

>>>>> Are the Kernel/Hardware requirements llvm specific?<br>

>>>>><br>

>>>>> I can compile and execute the add_grid.cu example successfully:<br>

>>>>> <a href="https://developer.nvidia.com/blog/unified-memory-cuda-beginners/" rel="noreferrer" target="_blank">https://developer.nvidia.com/blog/unified-memory-cuda-beginners/</a><br>

>>>>><br>

>>>>> So, I would expect that an OpenMP program should also run successfully.<br>

>>>>><br>

>>>>> - Joachim<br>

>>>>><br>

>>>>><br>

>>>>> On 01.03.21 at 18:49, Alexey.Bataev wrote:<br>

>>>>>> Hi, are you sure that your system supports Unified Shared Memory? As far as<br>

>>>>>> I know, it requires a special Linux kernel, and the hardware must support<br>

>>>>>> it. If your system does not support it, the code will certainly crash at<br>

>>>>>> runtime.<br>

>>>>>><br>

>>>>>> In this mode, IIRC, we just ignore map clauses since the accelerator<br>

>>>>>> devices can access the host memory directly without the need for<br>

>>>>>> allocating the device-specific memory.<br>

>>>>>><br>

>>>>>><br>

>>>>>> -------------<br>

>>>>>> Best regards,<br>

>>>>>> Alexey Bataev<br>

>>>>>><br>

>>>>>> On 3/1/2021 12:41 PM, Joachim Protze wrote:<br>

>>>>>>> Hi all,<br>

>>>>>>><br>

>>>>>>> even a simpler example segfaults when the requires directive is there:<br>

>>>>>>> #include <iostream><br>

>>>>>>> #include <omp.h><br>

>>>>>>> #include <stdio.h><br>

>>>>>>><br>

>>>>>>> #pragma omp requires unified_shared_memory<br>

>>>>>>> #define N 1024<br>

>>>>>>><br>

>>>>>>> int main() {<br>

>>>>>>>     int a[N];<br>

>>>>>>>     printf("a=%p\n", a);<br>

>>>>>>> #pragma omp target map(tofrom : a[0:N])<br>

>>>>>>>     {<br>

>>>>>>>       printf("a=%p\n", a);<br>

>>>>>>>       for (int i = 0; i < 1024; i++) {<br>

>>>>>>>         a[i] = 123;<br>

>>>>>>>       }<br>

>>>>>>>     }<br>

>>>>>>>     printf("a[0]=%i, a[%i]=%i\n", a[0], N/2, a[N/2]);<br>

>>>>>>> }<br>

>>>>>>><br>

>>>>>>> The code runs successfully when the requires directive is removed because<br>

>>>>>>> the mapping of `a` is explicitly specified.<br>
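For reference, the working variant described above can be sketched as follows. This is a minimal sketch, not code from the thread: the helper name `fill_on_target` and its parameters are made up for illustration, and the `requires unified_shared_memory` directive is deliberately left out so that the `map(tofrom : ...)` clause alone moves the data. If no offload device is available, or OpenMP is disabled, the region is expected to simply run on the host.

```cpp
#include <cstdio>

// Hypothetical helper (not from the thread): fills a[0..n) inside a target
// region. There is no `#pragma omp requires unified_shared_memory` in this
// file, so the explicit map clause is what transfers `a` to the device and
// back again.
void fill_on_target(int *a, int n, int value) {
#pragma omp target map(tofrom : a[0:n])
  {
    for (int i = 0; i < n; i++) {
      a[i] = value;
    }
  }
}

// Prints two elements in the same format as the reproducer above.
void print_samples(const int *a, int n) {
  std::printf("a[0]=%i, a[%i]=%i\n", a[0], n / 2, a[n / 2]);
}
```

A caller would declare `int a[1024] = {0};`, call `fill_on_target(a, 1024, 123);` and then `print_samples(a, 1024);`. Passing the length as a parameter keeps the mapped array section and the loop bound in sync, which the original reproducers handled with a mix of `N` and a literal 1024.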

>>>>>>><br>

>>>>>>> For this code to run successfully, would it be necessary to allocate `a`<br>

>>>>>>> specially as CUDA managed memory? I don't see any special treatment of<br>

>>>>>>> `a` in the LLVM IR. As I understand the OpenMP spec, the requires directive<br>

>>>>>>> should lead to a compile error if clang fails to generate such code.<br>

>>>>>>><br>

>>>>>>> The requires example from the OpenMP Examples also fails with the same<br>

>>>>>>> runtime error:<br>

>>>>>>><br>

>>>>>>> <a href="https://github.com/OpenMP/Examples/blob/main/sources/Example_requires.1.cpp" rel="noreferrer" target="_blank">https://github.com/OpenMP/Examples/blob/main/sources/Example_requires.1.cpp</a><br>

>>>>>>> - Joachim<br>

>>>>>>><br>

>>>>>>> On 28.02.21 at 11:12, Itaru Kitayama via Openmp-dev wrote:<br>

>>>>>>>> This is the code:<br>

>>>>>>>><br>

>>>>>>>> #include <iostream><br>

>>>>>>>> #include <omp.h><br>

>>>>>>>><br>

>>>>>>>> #pragma omp requires unified_shared_memory<br>

>>>>>>>> #define N 1024<br>

>>>>>>>><br>

>>>>>>>> int main() {<br>

>>>>>>>>     int a[N] = {0};<br>

>>>>>>>>     int *device_data =  new int[N];<br>

>>>>>>>> #pragma omp target map(tofrom : device_data[0:N])<br>

>>>>>>>>     {<br>

>>>>>>>>       device_data = &a[0];<br>

>>>>>>>>       for (int i = 0; i < 1024; i++) {<br>

>>>>>>>>         device_data[i] = 123;<br>

>>>>>>>>       }<br>

>>>>>>>>     }<br>

>>>>>>>>     std::cout << a[0] << std::endl;<br>

>>>>>>>> }<br>

>>>>>>>><br>

>>>>>>>> On Sun, Feb 28, 2021 at 1:34 PM Johannes Doerfert<br>

>>>>>>>> <<a href="mailto:johannesdoerfert@gmail.com" target="_blank">johannesdoerfert@gmail.com</a>> wrote:<br>

>>>>>>>>> You have an illegal memory access, some memory is not properly<br>

>>>>>>>>> mapped.<br>

>>>>>>>>><br>

>>>>>>>>><br>

>>>>>>>>> On 2/27/21 7:47 PM, Itaru Kitayama wrote:<br>

>>>>>>>>>> Removed the internal function, but I get:<br>

>>>>>>>>>><br>

>>>>>>>>>> CUDA device 0 info: Device supports up to 65536 CUDA blocks and 1024 threads with a warp size of 32<br>

>>>>>>>>>> CUDA device 0 info: Launching kernel __omp_offloading_34_8009dd23_main_l12 with 1 blocks and 33 threads in Generic mode<br>

>>>>>>>>>> CUDA error: Error when synchronizing stream. stream = 0x0000000001d22ae0, async info ptr = 0x00007ffe73ea2728<br>

>>>>>>>>>> CUDA error: an illegal memory access was encountered<br>

>>>>>>>>>> Libomptarget error: Failed to synchronize device.<br>

>>>>>>>>>> Libomptarget error: Call to targetDataEnd failed, abort target.<br>

>>>>>>>>>> Libomptarget error: Failed to process data after launching the kernel.<br>

>>>>>>>>>> Libomptarget error: Source location information not present. Compile with -g or -gline-tables-only.<br>

>>>>>>>>>> Libomptarget fatal error 1: failure of target construct while offloading is mandatory<br>

>>>>>>>>>> /var/spool/parastation/jobs/8941317: line 23: 20812 Aborted (core dumped) ./a.out<br>

>>>>>>>>>><br>

>>>>>>>>>> On Sun, Feb 28, 2021 at 10:35 AM Alexey Bataev <<a href="mailto:a.bataev@hotmail.com" target="_blank">a.bataev@hotmail.com</a>> wrote:<br>

>>>>>>>>>>> Do not call __tgt_register_requires directly; it is an internal function called by a global constructor, and its argument value depends on<br>

>>>>>>>>>>> #pragma omp requires. Use just this pragma.<br>

>>>>>>>>>>> Best regards,<br>

>>>>>>>>>>> Alexey Bataev<br>

>>>>>>>>>>><br>

>>>>>>>>>>>> On Feb 27, 2021, at 20:28, Itaru Kitayama via Openmp-dev <<a href="mailto:openmp-dev@lists.llvm.org" target="_blank">openmp-dev@lists.llvm.org</a>> wrote:<br>

>>>>>>>>>>>> I'm trying to build a test C++ code that uses part of<br>

>>>>>>>>>>>> unified_shared_memory/shared_update.c<br>

>>>>>>>>>>>><br>

>>>>>>>>>>>>> On Sun, Feb 28, 2021 at 10:25 AM Johannes Doerfert<br>

>>>>>>>>>>>>> <<a href="mailto:johannesdoerfert@gmail.com" target="_blank">johannesdoerfert@gmail.com</a>> wrote:<br>

>>>>>>>>>>>>><br>

>>>>>>>>>>>>> I don't see this test, nor do I understand what you are trying to say.<br>

>>>>>>>>>>>>> Is the test failing? If so, which test is this?<br>

>>>>>>>>>>>>><br>

>>>>>>>>>>>>> ~ Johannes<br>

>>>>>>>>>>>>><br>

>>>>>>>>>>>>><br>

>>>>>>>>>>>>>> On 2/27/21 7:17 PM, Itaru Kitayama via Openmp-dev wrote:<br>

>>>>>>>>>>>>>> The C++ code below builds, but the executable fails at runtime.<br>

>>>>>>>>>>>>>> (It is taken from the C code under the libomptarget subdir's test directory)<br>

>>>>>>>>>>>>>> #include <omp.h><br>

>>>>>>>>>>>>>><br>

>>>>>>>>>>>>>> #pragma omp requires unified_shared_memory<br>

>>>>>>>>>>>>>> #define N 1024<br>

>>>>>>>>>>>>>> extern "C" void __tgt_register_requires(int64_t);<br>

>>>>>>>>>>>>>><br>

>>>>>>>>>>>>>> int main() {<br>

>>>>>>>>>>>>>><br>

>>>>>>>>>>>>>>      int a[N] = {0};<br>

>>>>>>>>>>>>>>      int b[N] = {0};<br>

>>>>>>>>>>>>>>      int *device_data;<br>

>>>>>>>>>>>>>>      __tgt_register_requires(1);<br>

>>>>>>>>>>>>>> #pragma omp target map(tofrom : device_data)<br>

>>>>>>>>>>>>>>      {<br>

>>>>>>>>>>>>>>        device_data = &a[0];<br>

>>>>>>>>>>>>>>        for (int i = 0; i < 1024; i++) {<br>

>>>>>>>>>>>>>>          a[i] += 1;<br>

>>>>>>>>>>>>>>        }<br>

>>>>>>>>>>>>>>      }<br>

>>>>>>>>>>>>>> }<br>

>>>>>>>>>>>>>> _______________________________________________<br>

>>>>>>>>>>>>>> Openmp-dev mailing list<br>

>>>>>>>>>>>>>> <a href="mailto:Openmp-dev@lists.llvm.org" target="_blank">Openmp-dev@lists.llvm.org</a><br>

>>>>>>>>>>>>>> <a href="https://lists.llvm.org/cgi-bin/mailman/listinfo/openmp-dev" rel="noreferrer" target="_blank">https://lists.llvm.org/cgi-bin/mailman/listinfo/openmp-dev</a><br>


>>>>>>>><br>

</blockquote></div></div>