<div dir="auto">Ok, thanks. But I wonder given there are USM tests in C; no one noticed the errors so far?</div><div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Tue, Mar 2, 2021 at 7:39 Johannes Doerfert <<a href="mailto:johannesdoerfert@gmail.com">johannesdoerfert@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Don't use required USM for now I would assume.<br>
<br>
On 3/1/21 4:35 PM, Itaru Kitayama wrote:<br>
> I’m on JURECA and some nodes are attached to A100 GPUs.<br>
><br>
> On Tue, Mar 2, 2021 at 7:34 Itaru Kitayama <<a href="mailto:itaru.kitayama@gmail.com" target="_blank">itaru.kitayama@gmail.com</a>> wrote:<br>
><br>
>> Hi all,<br>
>> In the mean time, what do I do?<br>
>><br>
>> On Tue, Mar 2, 2021 at 3:23 Johannes Doerfert <<a href="mailto:johannesdoerfert@gmail.com" target="_blank">johannesdoerfert@gmail.com</a>><br>
>> wrote:<br>
>><br>
>>> I think that is it. I heard of problems with our USM before.<br>
>>> We need to use the managed allocators if USM is active, they are<br>
>>> about to be upstreamed (I hope).<br>
>>><br>
>>><br>
>>> On 3/1/21 12:15 PM, Alexey.Bataev wrote:<br>
>>>> Looks like this example is for Explicit USM and I assume if you allocate<br>
>>>> the memory for a in managed memory explicitly, the OpenMP example also<br>
>>>> should work.<br>
>>>><br>
>>>> There are other USM modes though, where the memory is shared implicitly<br>
>>>> between the host and the devices. Looks like currently LLVM<br>
>>>> implementation relies on this thing<br>
>>>><br>
>>> <a href="https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#um-system-allocator" rel="noreferrer" target="_blank">https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#um-system-allocator</a><br>
>>>> where Implicit USM is supported.<br>
>>>><br>
>>>> -------------<br>
>>>> Best regards,<br>
>>>> Alexey Bataev<br>
>>>><br>
>>>> 3/1/2021 1:04 PM, Joachim Protze пишет:<br>
>>>>> Are the Kernel/Hardware requirements llvm specific?<br>
>>>>><br>
>>>>> I can compile and execute the <a href="http://add_grid.cu" rel="noreferrer" target="_blank">add_grid.cu</a> example sucessfully:<br>
>>>>> <a href="https://developer.nvidia.com/blog/unified-memory-cuda-beginners/" rel="noreferrer" target="_blank">https://developer.nvidia.com/blog/unified-memory-cuda-beginners/</a><br>
>>>>><br>
>>>>> So, I would expect that an OpenMP program should also run sucessfully.<br>
>>>>><br>
>>>>> - Joachim<br>
>>>>><br>
>>>>><br>
>>>>> Am 01.03.21 um 18:49 schrieb Alexey.Bataev:<br>
>>>>>> Hi, I you sure that you system supports Unified Shared Memory? As far<br>
>>> as<br>
>>>>>> I know it requires special linux kernel and the hardware must support<br>
>>>>>> it. If you system does not support it, the code will crash for sure at<br>
>>>>>> the runtime.<br>
>>>>>><br>
>>>>>> In this mode, IIRC, we just ignore map clauses since the accelerator<br>
>>>>>> devices can access the host memory directly without the need for<br>
>>>>>> allocating the device-specific memory.<br>
>>>>>><br>
>>>>>><br>
>>>>>> -------------<br>
>>>>>> Best regards,<br>
>>>>>> Alexey Bataev<br>
>>>>>><br>
>>>>>> 3/1/2021 12:41 PM, Joachim Protze пишет:<br>
>>>>>>> Hi all,<br>
>>>>>>><br>
>>>>>>> even a more simple example segfaults, when the requires directive is<br>
>>> there:<br>
>>>>>>> #include <iostream><br>
>>>>>>> #include <omp.h><br>
>>>>>>> #include <stdio.h><br>
>>>>>>><br>
>>>>>>> #pragma omp requires unified_shared_memory<br>
>>>>>>> #define N 1024<br>
>>>>>>><br>
>>>>>>> int main() {<br>
>>>>>>> int a[N];<br>
>>>>>>> printf("a=%p\n", a);<br>
>>>>>>> #pragma omp target map(tofrom : a[0:N])<br>
>>>>>>> {<br>
>>>>>>> printf("a=%p\n", a);<br>
>>>>>>> for (int i = 0; i < 1024; i++) {<br>
>>>>>>> a[i] = 123;<br>
>>>>>>> }<br>
>>>>>>> }<br>
>>>>>>> printf("a[0]=%i, a[%i]=%i\n", a[0], N/2, a[N/2]);<br>
>>>>>>> }<br>
>>>>>>><br>
>>>>>>> The code runs sucessfully when the requires directive is removed<br>
>>> because<br>
>>>>>>> the mapping of `a` is explicitly specified.<br>
>>>>>>><br>
>>>>>>> For this code to run successfully, would it be necessary to allocate<br>
>>> `a`<br>
>>>>>>> specially as cuda managed memory? I don't see any special treatment<br>
>>> of<br>
>>>>>>> `a` in llvm ir. As I understand the OpenMP spec, the requires<br>
>>> directive<br>
>>>>>>> should lead to a compile error if clang fails to generate such code.<br>
>>>>>>><br>
>>>>>>> The requires example from the OpenMP Examples also fails with the<br>
>>> same<br>
>>>>>>> runtime error:<br>
>>>>>>><br>
>>>>>>><br>
>>> <a href="https://github.com/OpenMP/Examples/blob/main/sources/Example_requires.1.cpp" rel="noreferrer" target="_blank">https://github.com/OpenMP/Examples/blob/main/sources/Example_requires.1.cpp</a><br>
>>>>>>> - Joachim<br>
>>>>>>><br>
>>>>>>> Am 28.02.21 um 11:12 schrieb Itaru Kitayama via Openmp-dev:<br>
>>>>>>>> This is the code:<br>
>>>>>>>><br>
>>>>>>>> #include <iostream><br>
>>>>>>>> #include <omp.h><br>
>>>>>>>><br>
>>>>>>>> #pragma omp requires unified_shared_memory<br>
>>>>>>>> #define N 1024<br>
>>>>>>>><br>
>>>>>>>> int main() {<br>
>>>>>>>> int a[N] = {0};<br>
>>>>>>>> int *device_data = new int[N];<br>
>>>>>>>> #pragma omp target map(tofrom : device_data[0:N])<br>
>>>>>>>> {<br>
>>>>>>>> device_data = &a[0];<br>
>>>>>>>> for (int i = 0; i < 1024; i++) {<br>
>>>>>>>> device_data[i] = 123;<br>
>>>>>>>> }<br>
>>>>>>>> }<br>
>>>>>>>> std::cout << a[0] << std::endl;<br>
>>>>>>>> }<br>
>>>>>>>><br>
>>>>>>>> On Sun, Feb 28, 2021 at 1:34 PM Johannes Doerfert<br>
>>>>>>>> <<a href="mailto:johannesdoerfert@gmail.com" target="_blank">johannesdoerfert@gmail.com</a>> wrote:<br>
>>>>>>>>> You have an illegal memory access, some memory is not properly<br>
>>>>>>>>> mapped.<br>
>>>>>>>>><br>
>>>>>>>>><br>
>>>>>>>>> On 2/27/21 7:47 PM, Itaru Kitayama wrote:<br>
>>>>>>>>>> Removed the internal function, but I get:<br>
>>>>>>>>>><br>
>>>>>>>>>> CUDA device 0 info: Device supports up to 65536 CUDA blocks and<br>
>>> 1024<br>
>>>>>>>>>> threads with a warp size of 32<br>
>>>>>>>>>> CUDA device 0 info: Launching kernel<br>
>>>>>>>>>> __omp_offloading_34_8009dd23_main_l12 with 1 blocks and 33<br>
>>> threads in<br>
>>>>>>>>>> Generic mode<br>
>>>>>>>>>> CUDA error: Error when synchronizing stream. stream =<br>
>>>>>>>>>> 0x0000000001d22ae0, async info ptr = 0x00007ffe73ea2728<br>
>>>>>>>>>> CUDA error: an illegal memory access was encountered<br>
>>>>>>>>>> Libomptarget error: Failed to synchronize device.<br>
>>>>>>>>>> Libomptarget error: Call to targetDataEnd failed, abort target.<br>
>>>>>>>>>> Libomptarget error: Failed to process data after launching the<br>
>>> kernel.<br>
>>>>>>>>>> Libomptarget error: Source location information not present.<br>
>>> Compile<br>
>>>>>>>>>> with -g or -gline-tables-only.<br>
>>>>>>>>>> Libomptarget fatal error 1: failure of target construct while<br>
>>>>>>>>>> offloading is mandatory<br>
>>>>>>>>>> /var/spool/parastation/jobs/8941317: line 23: 20812 Aborted<br>
>>>>>>>>>> (core dumped) ./a.out<br>
>>>>>>>>>><br>
>>>>>>>>>> On Sun, Feb 28, 2021 at 10:35 AM Alexey Bataev <<br>
>>> <a href="mailto:a.bataev@hotmail.com" target="_blank">a.bataev@hotmail.com</a>> wrote:<br>
>>>>>>>>>>> Do not call __tgt_register_requires directly, this is the<br>
>>> internal function called by global constructor and its arg value depends on<br>
>>> #pragma omp requires. Use just this pragma.<br>
>>>>>>>>>>> Best regards,<br>
>>>>>>>>>>> Alexey Bataev<br>
>>>>>>>>>>><br>
>>>>>>>>>>>> 27 февр. 2021 г., в 20:28, Itaru Kitayama via Openmp-dev <<br>
>>> <a href="mailto:openmp-dev@lists.llvm.org" target="_blank">openmp-dev@lists.llvm.org</a>> написал(а):<br>
>>>>>>>>>>>> I'm trying to build a test C++ code that uses part of<br>
>>>>>>>>>>>> unified_shared_memory/shared_update.c<br>
>>>>>>>>>>>><br>
>>>>>>>>>>>>> On Sun, Feb 28, 2021 at 10:25 AM Johannes Doerfert<br>
>>>>>>>>>>>>> <<a href="mailto:johannesdoerfert@gmail.com" target="_blank">johannesdoerfert@gmail.com</a>> wrote:<br>
>>>>>>>>>>>>><br>
>>>>>>>>>>>>> I don't see this test, nor do I understand what you are trying<br>
>>> to say.<br>
>>>>>>>>>>>>> Is the test failing? If so, which test is this?<br>
>>>>>>>>>>>>><br>
>>>>>>>>>>>>> ~ Johannes<br>
>>>>>>>>>>>>><br>
>>>>>>>>>>>>><br>
>>>>>>>>>>>>>> On 2/27/21 7:17 PM, Itaru Kitayama via Openmp-dev wrote:<br>
>>>>>>>>>>>>>> The below C++ code builds, but the executable fails at<br>
>>> runtime.<br>
>>>>>>>>>>>>>> (It is taken from the C code under the libomptarget subdir's<br>
>>> test directory)<br>
>>>>>>>>>>>>>> #include <omp.h><br>
>>>>>>>>>>>>>><br>
>>>>>>>>>>>>>> #pragma omp requires unified_shared_memory<br>
>>>>>>>>>>>>>> #define N 1024<br>
>>>>>>>>>>>>>> extern "C" void __tgt_register_requires(int64_t);<br>
>>>>>>>>>>>>>><br>
>>>>>>>>>>>>>> int main() {<br>
>>>>>>>>>>>>>><br>
>>>>>>>>>>>>>> int a[N] = {0};<br>
>>>>>>>>>>>>>> int b[N] = {0};<br>
>>>>>>>>>>>>>> int *device_data;<br>
>>>>>>>>>>>>>> __tgt_register_requires(1);<br>
>>>>>>>>>>>>>> #pragma omp target map(tofrom : device_data)<br>
>>>>>>>>>>>>>> {<br>
>>>>>>>>>>>>>> device_data = &a[0];<br>
>>>>>>>>>>>>>> for (int i = 0; i < 1024; i++) {<br>
>>>>>>>>>>>>>> a[i] += 1;<br>
>>>>>>>>>>>>>> }<br>
>>>>>>>>>>>>>> }<br>
>>>>>>>>>>>>>> }<br>
>>>>>>>>>>>>>> _______________________________________________<br>
>>>>>>>>>>>>>> Openmp-dev mailing list<br>
>>>>>>>>>>>>>> <a href="mailto:Openmp-dev@lists.llvm.org" target="_blank">Openmp-dev@lists.llvm.org</a><br>
>>>>>>>>>>>>>> <a href="https://lists.llvm.org/cgi-bin/mailman/listinfo/openmp-dev" rel="noreferrer" target="_blank">https://lists.llvm.org/cgi-bin/mailman/listinfo/openmp-dev</a><br>
>>>>>>>>>>>> _______________________________________________<br>
>>>>>>>>>>>> Openmp-dev mailing list<br>
>>>>>>>>>>>> <a href="mailto:Openmp-dev@lists.llvm.org" target="_blank">Openmp-dev@lists.llvm.org</a><br>
>>>>>>>>>>>> <a href="https://lists.llvm.org/cgi-bin/mailman/listinfo/openmp-dev" rel="noreferrer" target="_blank">https://lists.llvm.org/cgi-bin/mailman/listinfo/openmp-dev</a><br>
>>>>>>>> _______________________________________________<br>
>>>>>>>> Openmp-dev mailing list<br>
>>>>>>>> <a href="mailto:Openmp-dev@lists.llvm.org" target="_blank">Openmp-dev@lists.llvm.org</a><br>
>>>>>>>> <a href="https://lists.llvm.org/cgi-bin/mailman/listinfo/openmp-dev" rel="noreferrer" target="_blank">https://lists.llvm.org/cgi-bin/mailman/listinfo/openmp-dev</a><br>
>>>>>>>><br>
</blockquote></div></div>