[Openmp-dev] CUDA error is: invalid device ordinal
Johannes Doerfert via Openmp-dev
openmp-dev at lists.llvm.org
Tue Jun 9 07:32:23 PDT 2020
@Alexey Why do you think it is a CUDA error and not a race in
libomptarget?
@Ye Can we run this on a different system too?
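
When comparing runs across systems, it may help to pull the failing driver call and its error string out of the debug output automatically. Below is a minimal Python sketch that scans libomptarget debug text like the log quoted further down. The function name first_failure and the regular expression are illustrative assumptions, not part of libomptarget; the only log formats assumed are the "Error returned from ..." and "CUDA error is: ..." lines that appear in the quoted output.

```python
import re

# Matches debug lines such as
#   "Target CUDA RTL --> Error returned from cuCtxCreate"
# with or without leading ">" quote markers from an email.
# Assumption: only the two prefixes seen in the quoted log are handled.
LINE = re.compile(
    r"^[>\s]*(?P<src>Libomptarget|Target CUDA RTL) --> (?P<msg>.*)$"
)

CALL_PREFIX = "Error returned from "
ERROR_PREFIX = "CUDA error is: "


def first_failure(log_text):
    """Return (failing_call, cuda_error) for the first plugin error.

    Returns None if the CUDA plugin reported no error. cuda_error is
    None if a failing call was logged without an error string.
    """
    failing_call = None
    for raw in log_text.splitlines():
        match = LINE.match(raw)
        if match is None or match.group("src") != "Target CUDA RTL":
            continue
        msg = match.group("msg").strip()
        if msg.startswith(CALL_PREFIX):
            failing_call = msg[len(CALL_PREFIX):]
        elif msg.startswith(ERROR_PREFIX) and failing_call is not None:
            return failing_call, msg[len(ERROR_PREFIX):]
    if failing_call is not None:
        return failing_call, None
    return None


if __name__ == "__main__":
    sample = """\
>> Target CUDA RTL --> Getting device 0
>> Target CUDA RTL --> Error returned from cuCtxCreate
>> Target CUDA RTL --> CUDA error is: invalid device ordinal
>> Libomptarget --> Failed to init device 0
"""
    print(first_failure(sample))
```

Running the same script on the debug output from the x86-64 system and from the Power9 system would show whether the plugin fails at the same call on both. It does not by itself distinguish a CUDA configuration problem from a race in libomptarget.
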
On 6/9/20 8:19 AM, Ye Luo via Openmp-dev wrote:
> It is on the Summit supercomputer. I will ask the administrators for help.
> Ye
> ===================
> Ye Luo, Ph.D.
> Computational Science Division & Leadership Computing Facility
> Argonne National Laboratory
>
>
> On Tue, Jun 9, 2020 at 6:02 AM Alexey.Bataev <a.bataev at outlook.com> wrote:
>
>> Hi, most probably there is something wrong with the CUDA installation or
>> the GPU configuration. Try reinstalling CUDA first.
>>
>> -------------
>> Best regards,
>> Alexey Bataev
>>
>> 08.06.2020 10:50 PM, Ye Luo via Openmp-dev wrote:
>>
>> Hi all,
>> Hopefully I can get some insights from the wider community.
>> My application runs fine on x86-64 + CUDA.
>> When I built the same version of clang and the application on Power9 + V100,
>> I got "CUDA error is: invalid device ordinal". It seems that the CUDA plugin
>> got device 0 but failed to create a context. I have pasted the debug and
>> nvprof output at the end of this email.
>> I used the same compiler to build a small test program, and it runs fine.
>> What could be causing this CUDA error?
>> Ye
>>
>> Libomptarget --> Call to omp_get_num_devices returning 1
>> Libomptarget --> Default TARGET OFFLOAD policy is now mandatory (devices were found)
>> Libomptarget --> Entering data begin region for device -1 with 1 mappings
>> Libomptarget --> Use default device id 0
>> Libomptarget --> Checking whether device 0 is ready.
>> Libomptarget --> Is the device 0 (local ID 0) initialized? 0
>> Target CUDA RTL --> Init requires flags to 1
>> Target CUDA RTL --> Getting device 0
>> Target CUDA RTL --> Error returned from cuCtxCreate
>> Target CUDA RTL --> CUDA error is: invalid device ordinal
>> Libomptarget --> Failed to init device 0
>> Libomptarget --> Device 0 is not ready.
>> Libomptarget --> Failed to get device 0 ready
>> Libomptarget fatal error 1: failure of target construct while offloading is mandatory
>> ==176195== Profiling application: ../../../../bin/qmcpack qmc_short_vmcbatch.in.xml
>> Libomptarget --> Unloading target library!
>> Libomptarget --> Image 0x00000000107b6470 is compatible with RTL 0x000000003b329020!
>> Libomptarget --> Unregistered image 0x00000000107b6470 from RTL 0x000000003b329020!
>> Libomptarget --> Done unregistering images!
>> Libomptarget --> Removing translation table for descriptor 0x0000000010900318
>> Libomptarget --> Done unregistering library!
>> Libomptarget --> Deinit target library!
>> ==176195== Profiling result:
>> No kernels were profiled.
>>       Type  Time(%)      Time  Calls       Avg       Min       Max  Name
>> API calls:   87.10%  1.75034s      7  250.05ms  250.00ms  250.28ms  cudaFree
>>              12.02%  241.59ms      1  241.59ms  241.59ms  241.59ms  cuDevicePrimaryCtxRelease
>>               0.42%  8.4971ms      1  8.4971ms  8.4971ms  8.4971ms  cuCtxCreate
>>               0.31%  6.1826ms      3  2.0609ms  827.87us  3.7271ms  cuModuleUnload
>>               0.08%  1.5932ms     97  16.424us     241ns  652.53us  cuDeviceGetAttribute
>>               0.05%  1.0525ms      1  1.0525ms  1.0525ms  1.0525ms  cuDeviceTotalMem
>>               0.01%  209.36us      1  209.36us  209.36us  209.36us  cuDeviceGetName
>>               0.00%  73.862us      7  10.551us  4.6310us  28.909us  cudaSetDevice
>>               0.00%  4.3990us      3  1.4660us     543ns  2.6840us  cuDeviceGet
>>               0.00%  3.9920us      1  3.9920us  3.9920us  3.9920us  cuDeviceGetPCIBusId
>>               0.00%  3.0740us      1  3.0740us  3.0740us  3.0740us  cudaGetDeviceCount
>>               0.00%  3.0000us      4     750ns     407ns  1.2090us  cuDeviceGetCount
>>               0.00%  2.1410us      1  2.1410us  2.1410us  2.1410us  cuInit
>>               0.00%  2.1080us      1  2.1080us  2.1080us  2.1080us  cuDriverGetVersion
>>               0.00%  1.9570us      1  1.9570us  1.9570us  1.9570us  cuGetErrorString
>>               0.00%  1.2870us      1  1.2870us  1.2870us  1.2870us  cuCtxSetCurrent
>>               0.00%     393ns      1     393ns     393ns     393ns  cuDeviceGetUuid
>> ===================
>> Ye Luo, Ph.D.
>> Computational Science Division & Leadership Computing Facility
>> Argonne National Laboratory
>>
>> _______________________________________________
>> Openmp-dev mailing list
>> Openmp-dev at lists.llvm.org
>> https://lists.llvm.org/cgi-bin/mailman/listinfo/openmp-dev
>>
>>
>