<div dir="ltr"><div>Hi all,</div><div>I am hoping to get some insight from the wider community.</div><div>My application runs fine on x86-64 + CUDA.</div><div>When I built the same versions of clang and the application on Power9 + V100, I got "CUDA error is: invalid device ordinal". It seems that the CUDA plugin got device 0 but failed to create a context. I have pasted the debug and nvprof output at the end of this email.<br></div><div>I used the same compiler to build a small test program, and it runs fine.</div><div>What could be causing this CUDA error?<br></div><div>Ye<br></div><div><br></div><div>Libomptarget --> Call to omp_get_num_devices returning 1</div>Libomptarget --> Default TARGET OFFLOAD policy is now mandatory (devices were found)<br>Libomptarget --> Entering data begin region for device -1 with 1 mappings<br>Libomptarget --> Use default device id 0<br>Libomptarget --> Checking whether device 0 is ready.<br>Libomptarget --> Is the device 0 (local ID 0) initialized? 0<br>Target CUDA RTL --> Init requires flags to 1<br><span style="background-color:rgb(255,0,0)">Target CUDA RTL --> Getting device 0<br>Target CUDA RTL --> Error returned from cuCtxCreate<br>Target CUDA RTL --> CUDA error is: invalid device ordinal</span><br>Libomptarget --> Failed to init device 0<br>Libomptarget --> Device 0 is not ready.<br>Libomptarget --> Failed to get device 0 ready<br>Libomptarget fatal error 1: failure of target construct while offloading is mandatory<br>==176195== Profiling application: ../../../../bin/qmcpack qmc_short_vmcbatch.in.xml<br>Libomptarget --> Unloading target library!<br>Libomptarget --> Image 0x00000000107b6470 is compatible with RTL 0x000000003b329020!<br>Libomptarget --> Unregistered image 0x00000000107b6470 from RTL 0x000000003b329020!<br>Libomptarget --> Done unregistering images!<br>Libomptarget --> Removing translation table for descriptor 0x0000000010900318<br>Libomptarget --> Done unregistering library!<br>Libomptarget --> Deinit target 
library!<br>==176195== Profiling result:<br>No kernels were profiled.<br>
<pre>
            Type  Time(%)      Time     Calls       Avg       Min       Max  Name
      API calls:   87.10%  1.75034s         7  250.05ms  250.00ms  250.28ms  cudaFree
                   12.02%  241.59ms         1  241.59ms  241.59ms  241.59ms  cuDevicePrimaryCtxRelease
                    0.42%  8.4971ms         1  8.4971ms  8.4971ms  8.4971ms  cuCtxCreate
                    0.31%  6.1826ms         3  2.0609ms  827.87us  3.7271ms  cuModuleUnload
                    0.08%  1.5932ms        97  16.424us     241ns  652.53us  cuDeviceGetAttribute
                    0.05%  1.0525ms         1  1.0525ms  1.0525ms  1.0525ms  cuDeviceTotalMem
                    0.01%  209.36us         1  209.36us  209.36us  209.36us  cuDeviceGetName
                    0.00%  73.862us         7  10.551us  4.6310us  28.909us  cudaSetDevice
                    0.00%  4.3990us         3  1.4660us     543ns  2.6840us  cuDeviceGet
                    0.00%  3.9920us         1  3.9920us  3.9920us  3.9920us  cuDeviceGetPCIBusId
                    0.00%  3.0740us         1  3.0740us  3.0740us  3.0740us  cudaGetDeviceCount
                    0.00%  3.0000us         4     750ns     407ns  1.2090us  cuDeviceGetCount
                    0.00%  2.1410us         1  2.1410us  2.1410us  2.1410us  cuInit
                    0.00%  2.1080us         1  2.1080us  2.1080us  2.1080us  cuDriverGetVersion
                    0.00%  1.9570us         1  1.9570us  1.9570us  1.9570us  cuGetErrorString
                    0.00%  1.2870us         1  1.2870us  1.2870us  1.2870us  cuCtxSetCurrent
                    0.00%     393ns         1     393ns     393ns     393ns  cuDeviceGetUuid
</pre>
<div><div dir="ltr" class="gmail_signature" data-smartmail="gmail_signature"><div dir="ltr"><div><div dir="ltr">===================<br>
Ye Luo, Ph.D.<br>Computational Science Division & Leadership Computing Facility<br>
Argonne National Laboratory</div></div></div></div></div></div>