<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
<body>
<p>@Alexey Why do you think it is a CUDA error and not a race in
libomptarget?</p>
<p>@Ye Can we run this on a different system too?<br>
</p>
<div class="moz-cite-prefix">On 6/9/20 8:19 AM, Ye Luo via
Openmp-dev wrote:<br>
</div>
<blockquote type="cite"
cite="mid:CACiEoHk=bpJOPKEcwE9iq7OZg+Fk=sny_OhKjyW1BKNFn8UKdw@mail.gmail.com">
<pre class="moz-quote-pre" wrap="">It is on the Summit supercomputer. I will ask the administrators for help.
Ye
===================
Ye Luo, Ph.D.
Computational Science Division &amp; Leadership Computing Facility
Argonne National Laboratory
On Tue, Jun 9, 2020 at 6:02 AM Alexey.Bataev <a class="moz-txt-link-rfc2396E" href="mailto:a.bataev@outlook.com">&lt;a.bataev@outlook.com&gt;</a> wrote:
</pre>
<blockquote type="cite">
<pre class="moz-quote-pre" wrap="">Hi, most probably there is something wrong with the CUDA installation or the GPU
configuration. Try reinstalling CUDA first.
-------------
Best regards,
Alexey Bataev
08.06.2020 10:50 PM, Ye Luo via Openmp-dev wrote:
Hi all,
Hopefully I can get some insights from the wider community.
My application runs fine on x86-64 + CUDA.
When I built the same version of clang and the application on Power9+V100, I
got "CUDA error is: invalid device ordinal". It seems that the CUDA plugin
got device 0 but failed to create a context. I have pasted the debug + nvprof
output at the end of this email.
I used the same compiler to build a small test program, and it runs fine.
What could be causing this CUDA error?
Ye
Libomptarget --> Call to omp_get_num_devices returning 1
Libomptarget --> Default TARGET OFFLOAD policy is now mandatory (devices were found)
Libomptarget --> Entering data begin region for device -1 with 1 mappings
Libomptarget --> Use default device id 0
Libomptarget --> Checking whether device 0 is ready.
Libomptarget --> Is the device 0 (local ID 0) initialized? 0
Target CUDA RTL --> Init requires flags to 1
Target CUDA RTL --> Getting device 0
Target CUDA RTL --> Error returned from cuCtxCreate
Target CUDA RTL --> CUDA error is: invalid device ordinal
Libomptarget --> Failed to init device 0
Libomptarget --> Device 0 is not ready.
Libomptarget --> Failed to get device 0 ready
Libomptarget fatal error 1: failure of target construct while offloading is mandatory
==176195== Profiling application: ../../../../bin/qmcpack qmc_short_vmcbatch.in.xml
Libomptarget --> Unloading target library!
Libomptarget --> Image 0x00000000107b6470 is compatible with RTL 0x000000003b329020!
Libomptarget --> Unregistered image 0x00000000107b6470 from RTL 0x000000003b329020!
Libomptarget --> Done unregistering images!
Libomptarget --> Removing translation table for descriptor 0x0000000010900318
Libomptarget --> Done unregistering library!
Libomptarget --> Deinit target library!
==176195== Profiling result:
No kernels were profiled.
      Type  Time(%)      Time  Calls       Avg       Min       Max  Name
API calls:   87.10%  1.75034s      7  250.05ms  250.00ms  250.28ms  cudaFree
             12.02%  241.59ms      1  241.59ms  241.59ms  241.59ms  cuDevicePrimaryCtxRelease
              0.42%  8.4971ms      1  8.4971ms  8.4971ms  8.4971ms  cuCtxCreate
              0.31%  6.1826ms      3  2.0609ms  827.87us  3.7271ms  cuModuleUnload
              0.08%  1.5932ms     97  16.424us     241ns  652.53us  cuDeviceGetAttribute
              0.05%  1.0525ms      1  1.0525ms  1.0525ms  1.0525ms  cuDeviceTotalMem
              0.01%  209.36us      1  209.36us  209.36us  209.36us  cuDeviceGetName
              0.00%  73.862us      7  10.551us  4.6310us  28.909us  cudaSetDevice
              0.00%  4.3990us      3  1.4660us     543ns  2.6840us  cuDeviceGet
              0.00%  3.9920us      1  3.9920us  3.9920us  3.9920us  cuDeviceGetPCIBusId
              0.00%  3.0740us      1  3.0740us  3.0740us  3.0740us  cudaGetDeviceCount
              0.00%  3.0000us      4     750ns     407ns  1.2090us  cuDeviceGetCount
              0.00%  2.1410us      1  2.1410us  2.1410us  2.1410us  cuInit
              0.00%  2.1080us      1  2.1080us  2.1080us  2.1080us  cuDriverGetVersion
              0.00%  1.9570us      1  1.9570us  1.9570us  1.9570us  cuGetErrorString
              0.00%  1.2870us      1  1.2870us  1.2870us  1.2870us  cuCtxSetCurrent
              0.00%     393ns      1     393ns     393ns     393ns  cuDeviceGetUuid
===================
Ye Luo, Ph.D.
Computational Science Division &amp; Leadership Computing Facility
Argonne National Laboratory
_______________________________________________
Openmp-dev mailing list
<a class="moz-txt-link-abbreviated" href="mailto:Openmp-dev@lists.llvm.org">Openmp-dev@lists.llvm.org</a>
<a class="moz-txt-link-freetext" href="https://lists.llvm.org/cgi-bin/mailman/listinfo/openmp-dev">https://lists.llvm.org/cgi-bin/mailman/listinfo/openmp-dev</a>
</pre>
</blockquote>
<pre class="moz-quote-pre" wrap="">_______________________________________________
Openmp-dev mailing list
<a class="moz-txt-link-abbreviated" href="mailto:Openmp-dev@lists.llvm.org">Openmp-dev@lists.llvm.org</a>
<a class="moz-txt-link-freetext" href="https://lists.llvm.org/cgi-bin/mailman/listinfo/openmp-dev">https://lists.llvm.org/cgi-bin/mailman/listinfo/openmp-dev</a>
</pre>
</blockquote>
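<p>As a rough aid for reading logs like the one quoted above, here is a
minimal Python sketch that pulls out the first failing CUDA call and its
error string. It assumes the "Target CUDA RTL --> " prefix and the
"Error returned from" / "CUDA error is:" wording seen in that log. The
helper name first_rtl_error is made up for illustration and is not part
of libomptarget.</p>
<pre>
```python
# Hypothetical helper for illustration only; not part of libomptarget.
# The marker strings below are copied from the quoted debug output and are
# assumed to stay the same across runs.
RTL_PREFIX = "Target CUDA RTL --> "
ERROR_MARK = "Error returned from "
REASON_MARK = "CUDA error is: "


def first_rtl_error(log_text):
    """Return (failing_call, reason) for the first CUDA RTL error, or None.

    reason is None when the log names a failing call but no error string
    follows it.
    """
    failing_call = None
    for line in log_text.splitlines():
        # Only the CUDA plugin lines are of interest here.
        if not line.startswith(RTL_PREFIX):
            continue
        message = line[len(RTL_PREFIX):].strip()
        if failing_call is None and message.startswith(ERROR_MARK):
            failing_call = message[len(ERROR_MARK):]
        elif failing_call is not None and message.startswith(REASON_MARK):
            return failing_call, message[len(REASON_MARK):]
    if failing_call is not None:
        return failing_call, None
    return None


# A few lines taken from the quoted log.
SAMPLE = """\
Libomptarget --> Checking whether device 0 is ready.
Target CUDA RTL --> Getting device 0
Target CUDA RTL --> Error returned from cuCtxCreate
Target CUDA RTL --> CUDA error is: invalid device ordinal
Libomptarget --> Failed to init device 0
"""

print(first_rtl_error(SAMPLE))
```
</pre>
<p>On the sample lines this reports cuCtxCreate together with "invalid
device ordinal". That only identifies the failing call; it does not by
itself say whether the cause is the CUDA setup or libomptarget.</p>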
</body>
</html>