<div dir="ltr"><div>Setting LIBOMPTARGET_DEBUG=1 on POWER8 with P100 GPUs, I get:</div><div><br></div>$ ./a.out<br>Libomptarget --> Loading RTLs...<br>Libomptarget --> Loading library 'libomptarget.rtl.ppc64.so'...<br>Libomptarget --> Successfully loaded library 'libomptarget.rtl.ppc64.so'!<br>Libomptarget --> Registering RTL libomptarget.rtl.ppc64.so supporting 4 devices!<br>Libomptarget --> Loading library 'libomptarget.rtl.x86_64.so'...<br>Libomptarget --> Unable to load library 'libomptarget.rtl.x86_64.so': libomptarget.rtl.x86_64.so: cannot open shared object file: No such file or directory!<br>Libomptarget --> Loading library 'libomptarget.rtl.cuda.so'...<br>Target CUDA RTL --> Start initializing CUDA<br>Libomptarget --> Successfully loaded library 'libomptarget.rtl.cuda.so'!<br>Libomptarget --> Registering RTL libomptarget.rtl.cuda.so supporting 1 devices!<br>Libomptarget --> Loading library 'libomptarget.rtl.aarch64.so'...<br>Libomptarget --> Unable to load library 'libomptarget.rtl.aarch64.so': libomptarget.rtl.aarch64.so: cannot open shared object file: No such file or directory!<br>Libomptarget --> RTLs loaded!<br>Libomptarget --> Image 0x0000000010001300 is NOT compatible with RTL libomptarget.rtl.ppc64.so!<br>Libomptarget --> Image 0x0000000010001300 is compatible with RTL libomptarget.rtl.cuda.so!<br>Libomptarget --> RTL 0x0000010001b6d860 has index 0!<br>Libomptarget --> Registering image 0x0000000010001300 with RTL libomptarget.rtl.cuda.so!<br>Libomptarget --> Done registering entries!<br>Libomptarget --> New requires flags 8 compatible with existing 8!<br>Libomptarget --> Call to omp_get_num_devices returning 1<br>Libomptarget --> Default TARGET OFFLOAD policy is now mandatory (devices were found)<br>Libomptarget --> Entering target region with entry point 0x0000000010001110 and device Id -1<br>Libomptarget --> Checking whether device 0 is ready.<br>Libomptarget --> Is the device 0 (local ID 0) initialized? 0<br>Target CUDA RTL --> Init requires flags to 8<br>Target CUDA RTL --> Getting device 0<br>Target CUDA RTL --> Max CUDA blocks per grid 2147483647 exceeds the hard team limit 65536, capping at the hard limit<br>Target CUDA RTL --> Using 1024 CUDA threads per block<br>Target CUDA RTL --> Using warp size 32<br>Target CUDA RTL --> Max number of CUDA blocks 65536, threads 1024 & warp size 32<br>Target CUDA RTL --> Default number of teams set according to library's default 128<br>Target CUDA RTL --> Default number of threads set according to library's default 128<br>Libomptarget --> Device 0 is ready to use.<br>Target CUDA RTL --> Load data from image 0x0000000010001300<br>Target CUDA RTL --> CUDA module successfully loaded!<br>Target CUDA RTL --> Entry point 0x0000000000000000 maps to __omp_offloading_46_804afcb6_main_l41 (0x0000110000350fd0)<br>Target CUDA RTL --> Entry point 0x0000000000000001 maps to __omp_offloading_46_804afcb6_main_l89 (0x0000110000361810)<br>Target CUDA RTL --> Sending global device environment data 4 bytes<br>Libomptarget --> Entry 0: Base=0x00003ffff55df0b0, Begin=0x00003ffff55df0b0, Size=8, Type=0x23<br>Libomptarget --> Entry 1: Base=0x00003ffff55de0a8, Begin=0x00003ffff55de0a8, Size=4096, Type=0x223<br>Libomptarget 
--> Entry 2: Base=0x00003ffff55df0c0, Begin=0x00003ffff55df0c0, Size=8, Type=0x23<br>Libomptarget --> Entry 3: Base=0x0000010001bd1a80, Begin=0x0000010001bd1a80, Size=0, Type=0x220<br>Libomptarget --> Looking up mapping(HstPtrBegin=0x00003ffff55df0b0, Size=8)...<br>Libomptarget --> Return HstPtrBegin 0x00003ffff55df0b0 Size=8 RefCount= updated<br>Libomptarget --> There are 8 bytes allocated at target address 0x00003ffff55df0b0 - is not new<br>Libomptarget --> Looking up mapping(HstPtrBegin=0x00003ffff55de0a8, Size=4096)...<br>Libomptarget --> Return HstPtrBegin 0x00003ffff55de0a8 Size=4096 RefCount= updated<br>Libomptarget --> There are 4096 bytes allocated at target address 0x00003ffff55de0a8 - is not new<br>Libomptarget --> Looking up mapping(HstPtrBegin=0x00003ffff55df0c0, Size=8)...<br>Libomptarget --> Return HstPtrBegin 0x00003ffff55df0c0 Size=8 RefCount= updated<br>Libomptarget --> There are 8 bytes allocated at target address 0x00003ffff55df0c0 - is not new<br>Libomptarget --> Looking up mapping(HstPtrBegin=0x0000010001bd1a80, Size=0)...<br>Libomptarget --> There are 0 bytes allocated at target address 0x0000000000000000 - is not new<br>Libomptarget --> Looking up mapping(HstPtrBegin=0x00003ffff55df0b0, Size=8)...<br>Libomptarget --> Get HstPtrBegin 0x00003ffff55df0b0 Size=8 RefCount=<br>Libomptarget --> Obtained target argument 0x00003ffff55df0b0 from host pointer 0x00003ffff55df0b0<br>Libomptarget --> Looking up mapping(HstPtrBegin=0x00003ffff55de0a8, Size=4096)...<br>Libomptarget --> Get HstPtrBegin 0x00003ffff55de0a8 Size=4096 RefCount=<br>Libomptarget --> Obtained target argument 0x00003ffff55de0a8 from host pointer 0x00003ffff55de0a8<br>Libomptarget --> Looking up mapping(HstPtrBegin=0x00003ffff55df0c0, Size=8)...<br>Libomptarget --> Get HstPtrBegin 0x00003ffff55df0c0 Size=8 RefCount=<br>Libomptarget --> Obtained target argument 0x00003ffff55df0c0 from host pointer 0x00003ffff55df0c0<br>Libomptarget --> Looking up 
mapping(HstPtrBegin=0x0000010001bd1a80, Size=0)...<br>Libomptarget --> Get HstPtrBegin 0x0000010001bd1a80 Size=0 RefCount=<br>Libomptarget --> Obtained target argument 0x0000010001bd1a80 from host pointer 0x0000010001bd1a80<br>Libomptarget --> Launching target execution __omp_offloading_46_804afcb6_main_l41 with pointer 0x0000110000322840 (index=0).<br>Target CUDA RTL --> Setting CUDA threads per block to requested 1<br>Target CUDA RTL --> Adding master warp: +32 threads<br>Target CUDA RTL --> Using requested number of teams 1<br>Target CUDA RTL --> Launch kernel with 1 blocks and 33 threads<br>Target CUDA RTL --> Launch of entry point at 0x0000110000322840 successful!<br>Libomptarget --> Looking up mapping(HstPtrBegin=0x0000010001bd1a80, Size=0)...<br>Libomptarget --> Get HstPtrBegin 0x0000010001bd1a80 Size=0 RefCount= updated<br>Libomptarget --> There are 0 bytes allocated at target address 0x0000010001bd1a80 - is not last<br>Libomptarget --> Looking up mapping(HstPtrBegin=0x00003ffff55df0c0, Size=8)...<br>Libomptarget --> Get HstPtrBegin 0x00003ffff55df0c0 Size=8 RefCount= updated<br>Libomptarget --> There are 8 bytes allocated at target address 0x00003ffff55df0c0 - is not last<br>Libomptarget --> Looking up mapping(HstPtrBegin=0x00003ffff55de0a8, Size=4096)...<br>Libomptarget --> Get HstPtrBegin 0x00003ffff55de0a8 Size=4096 RefCount= updated<br>Libomptarget --> There are 4096 bytes allocated at target address 0x00003ffff55de0a8 - is not last<br>Libomptarget --> Looking up mapping(HstPtrBegin=0x00003ffff55df0b0, Size=8)...<br>Libomptarget --> Get HstPtrBegin 0x00003ffff55df0b0 Size=8 RefCount= updated<br>Libomptarget --> There are 8 bytes allocated at target address 0x00003ffff55df0b0 - is not last<br>Target CUDA RTL --> Error when synchronizing stream. 
stream = 0x00001100002cd7c0, async info ptr = 0x00003ffff55dddf8<br>Target CUDA RTL --> CUDA error is: an illegal memory access was encountered<br>Libomptarget fatal error 1: failure of target construct while offloading is mandatory<br>Target CUDA RTL --> Error returned from cuStreamDestroy<br>Target CUDA RTL --> CUDA error is: an illegal memory access was encountered<br>Target CUDA RTL --> Error returned from cuStreamDestroy<br>Target CUDA RTL --> CUDA error is: an illegal memory access was encountered<br>Target CUDA RTL --> Error returned from cuStreamDestroy<br>Target CUDA RTL --> CUDA error is: an illegal memory access was encountered<br>Target CUDA RTL --> Error returned from cuStreamDestroy<br>Target CUDA RTL --> CUDA error is: an illegal memory access was encountered<br>Target CUDA RTL --> Error returned from cuStreamDestroy<br>Target CUDA RTL --> CUDA error is: an illegal memory access was encountered<br>Target CUDA RTL --> Error returned from cuStreamDestroy<br>Target CUDA RTL --> CUDA error is: an illegal memory access was encountered<br>Target CUDA RTL --> Error returned from cuStreamDestroy<br>Target CUDA RTL --> CUDA error is: an illegal memory access was encountered<br>Target CUDA RTL --> Error returned from cuStreamDestroy<br>Target CUDA RTL --> CUDA error is: an illegal memory access was encountered<br>Target CUDA RTL --> Error returned from cuStreamDestroy<br>Target CUDA RTL --> CUDA error is: an illegal memory access was encountered<br>Target CUDA RTL --> Error returned from cuStreamDestroy<br>Target CUDA RTL --> CUDA error is: an illegal memory access was encountered<br>Target CUDA RTL --> Error returned from cuStreamDestroy<br>Target CUDA RTL --> CUDA error is: an illegal memory access was encountered<br>Target CUDA RTL --> Error returned from cuStreamDestroy<br>Target CUDA RTL --> CUDA error is: an illegal memory access was encountered<br>Target CUDA RTL --> Error returned from cuStreamDestroy<br>Target CUDA RTL --> CUDA error is: an illegal 
memory access was encountered<br>Target CUDA RTL --> Error returned from cuStreamDestroy<br>Target CUDA RTL --> CUDA error is: an illegal memory access was encountered<br>Target CUDA RTL --> Error returned from cuStreamDestroy<br>Target CUDA RTL --> CUDA error is: an illegal memory access was encountered<br>Target CUDA RTL --> Error returned from cuStreamDestroy<br>Target CUDA RTL --> CUDA error is: an illegal memory access was encountered<br>Target CUDA RTL --> Error returned from cuStreamDestroy<br>Target CUDA RTL --> CUDA error is: an illegal memory access was encountered<br>Target CUDA RTL --> Error returned from cuStreamDestroy<br>Target CUDA RTL --> CUDA error is: an illegal memory access was encountered<br>Target CUDA RTL --> Error returned from cuStreamDestroy<br>Target CUDA RTL --> CUDA error is: an illegal memory access was encountered<br>Target CUDA RTL --> Error returned from cuStreamDestroy<br>Target CUDA RTL --> CUDA error is: an illegal memory access was encountered<br>Target CUDA RTL --> Error returned from cuStreamDestroy<br>Target CUDA RTL --> CUDA error is: an illegal memory access was encountered<br>Target CUDA RTL --> Error returned from cuStreamDestroy<br>Target CUDA RTL --> CUDA error is: an illegal memory access was encountered<br>Target CUDA RTL --> Error returned from cuStreamDestroy<br>Target CUDA RTL --> CUDA error is: an illegal memory access was encountered<br>Target CUDA RTL --> Error returned from cuStreamDestroy<br>Target CUDA RTL --> CUDA error is: an illegal memory access was encountered<br>Target CUDA RTL --> Error returned from cuStreamDestroy<br>Target CUDA RTL --> CUDA error is: an illegal memory access was encountered<br>Target CUDA RTL --> Error returned from cuStreamDestroy<br>Target CUDA RTL --> CUDA error is: an illegal memory access was encountered<br>Target CUDA RTL --> Error returned from cuStreamDestroy<br>Target CUDA RTL --> CUDA error is: an illegal memory access was encountered<br>Target CUDA RTL --> Error returned 
from cuStreamDestroy<br>Target CUDA RTL --> CUDA error is: an illegal memory access was encountered<br>Target CUDA RTL --> Error returned from cuStreamDestroy<br>Target CUDA RTL --> CUDA error is: an illegal memory access was encountered<br>Target CUDA RTL --> Error returned from cuStreamDestroy<br>Target CUDA RTL --> CUDA error is: an illegal memory access was encountered<br>Target CUDA RTL --> Error returned from cuStreamDestroy<br>Target CUDA RTL --> CUDA error is: an illegal memory access was encountered<br>Target CUDA RTL --> Error returned from cuStreamDestroy<br>Target CUDA RTL --> CUDA error is: an illegal memory access was encountered<br>Target CUDA RTL --> Error returned from cuModuleUnload<br>Target CUDA RTL --> CUDA error is: an illegal memory access was encountered<br>Libomptarget --> Unloading target library!<br>Libomptarget --> Image 0x0000000010001300 is compatible with RTL 0x0000010001b6d860!<br>Libomptarget --> Unregistered image 0x0000000010001300 from RTL 0x0000010001b6d860!<br>Libomptarget --> Done unregistering images!<br>Libomptarget --> Removing translation table for descriptor 0x00000000100193e8<br>Libomptarget --> Done unregistering library!<br>Libomptarget --> Deinit target library!<br></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Sat, May 2, 2020 at 12:55 PM Itaru Kitayama <<a href="mailto:itaru.kitayama@gmail.com">itaru.kitayama@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr">deviceQuery returns:<div><br></div><div> CUDA Device Query (Runtime API) version (CUDART static linking)<br><br>Detected 1 CUDA Capable device(s)<br><br>Device 0: "Tesla P100-SXM2-16GB"<br> CUDA Driver Version / Runtime Version 10.1 / 8.0<br> CUDA Capability Major/Minor version number: 6.0<br> Total amount of global memory: 16281 MBytes (17071734784 bytes)<br> (56) Multiprocessors, ( 64) CUDA Cores/MP: 3584 CUDA Cores<br> 
GPU Max Clock rate: 1481 MHz (1.48 GHz)<br> Memory Clock rate: 715 Mhz<br> Memory Bus Width: 4096-bit<br> L2 Cache Size: 4194304 bytes<br> Maximum Texture Dimension Size (x,y,z) 1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)<br> Maximum Layered 1D Texture Size, (num) layers 1D=(32768), 2048 layers<br> Maximum Layered 2D Texture Size, (num) layers 2D=(32768, 32768), 2048 layers<br> Total amount of constant memory: 65536 bytes<br> Total amount of shared memory per block: 49152 bytes<br> Total number of registers available per block: 65536<br> Warp size: 32<br> Maximum number of threads per multiprocessor: 2048<br> Maximum number of threads per block: 1024<br> Max dimension size of a thread block (x,y,z): (1024, 1024, 64)<br> Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)<br> Maximum memory pitch: 2147483647 bytes<br> Texture alignment: 512 bytes<br> Concurrent copy and kernel execution: Yes with 5 copy engine(s)<br> Run time limit on kernels: No<br> Integrated GPU sharing Host Memory: No<br> Support host page-locked memory mapping: Yes<br> Alignment requirement for Surfaces: Yes<br> Device has ECC support: Enabled<br> Device supports Unified Addressing (UVA): Yes<br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Sat, May 2, 2020 at 10:31 AM Itaru Kitayama <<a href="mailto:itaru.kitayama@gmail.com" target="_blank">itaru.kitayama@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr">Executing shared_update.c on P100 results in errors;<div><br></div><div>==130340== NVPROF is profiling process 130340, command: ./a.out<br>Libomptarget fatal error 1: failure of target construct while offloading is mandatory<br>==130340== Profiling application: ./a.out<br>==130340== Warning: 1 records have invalid timestamps due to insufficient device buffer space. 
You can configure the buffer space using the option --device-buffer-size.<br>==130340== Profiling result:<br> Type Time(%) Time Calls Avg Min Max Name<br> GPU activities: 89.68% 40.950us 2 20.475us 18.103us 22.847us [CUDA memcpy DtoH]<br> 10.32% 4.7100us 1 4.7100us 4.7100us 4.7100us [CUDA memcpy HtoD]<br> API calls: 69.95% 400.85ms 1 400.85ms 400.85ms 400.85ms cuCtxCreate<br> 15.17% 86.932ms 1 86.932ms 86.932ms 86.932ms cuStreamSynchronize<br> 12.11% 69.398ms 1 69.398ms 69.398ms 69.398ms cuCtxDestroy<br> 2.68% 15.375ms 1 15.375ms 15.375ms 15.375ms cuModuleLoadDataEx<br> 0.06% 363.13us 32 11.347us 754ns 171.53us cuStreamCreate<br> 0.01% 48.938us 2 24.469us 19.581us 29.357us cuMemcpyDtoH<br> 0.00% 22.184us 1 22.184us 22.184us 22.184us cuLaunchKernel<br> 0.00% 7.6760us 1 7.6760us 7.6760us 7.6760us cuMemcpyHtoD<br> 0.00% 4.7430us 32 148ns 113ns 520ns cuStreamDestroy<br> 0.00% 2.9060us 3 968ns 562ns 1.5750us cuModuleGetGlobal<br> 0.00% 2.8940us 2 1.4470us 336ns 2.5580us cuModuleGetFunction<br> 0.00% 2.8250us 3 941ns 181ns 2.2050us cuDeviceGetCount<br> 0.00% 2.6040us 2 1.3020us 965ns 1.6390us cuDeviceGet<br> 0.00% 2.4200us 5 484ns 137ns 882ns cuCtxSetCurrent<br> 0.00% 1.6450us 6 274ns 117ns 671ns cuDeviceGetAttribute<br> 0.00% 804ns 1 804ns 804ns 804ns cuFuncGetAttribute<br> 0.00% 296ns 1 296ns 296ns 296ns cuModuleUnload<br>======== Error: Application returned non-zero code 1<br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Sat, May 2, 2020 at 8:24 AM Itaru Kitayama <<a href="mailto:itaru.kitayama@gmail.com" target="_blank">itaru.kitayama@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr">Doru,<div>What's the current way of enabling SM_60 CUDA architecture support for unified addressing?</div><div>It's been modified since we exchanged the message.</div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On 
Thu, Nov 7, 2019 at 4:05 AM Gheorghe-Teod Bercea <<a href="mailto:Gheorghe-Teod.Bercea@ibm.com" target="_blank">Gheorghe-Teod.Bercea@ibm.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><span style="font-size:10pt;font-family:sans-serif">Hi Itaru,</span><br><br><span style="font-size:10pt;font-family:sans-serif">We did not test
those features on an sm_60 machine like a Pascal GPU, so I can't guarantee
it will work. I suggest you enable it locally and see how it performs.</span><br><span style="font-size:10pt;font-family:sans-serif">You only need
to make a small change in "void CGOpenMPRuntimeNVPTX::checkArchForUnifiedAddressing(const
OMPRequiresDecl *D)" to allow sm_60 to be accepted as a valid target.</span><br><br><span style="font-size:10pt;font-family:sans-serif">Thanks,</span><br><br><span style="font-size:10pt;font-family:sans-serif">--Doru<br></span><br><br><br><br><span style="font-size:9pt;color:rgb(95,95,95);font-family:sans-serif">From:
</span><span style="font-size:9pt;font-family:sans-serif">Itaru
Kitayama via Openmp-dev <<a href="mailto:openmp-dev@lists.llvm.org" target="_blank">openmp-dev@lists.llvm.org</a>></span><br><span style="font-size:9pt;color:rgb(95,95,95);font-family:sans-serif">To:
</span><span style="font-size:9pt;font-family:sans-serif">Alexey
Bataev <<a href="mailto:a.bataev@outlook.com" target="_blank">a.bataev@outlook.com</a>></span><br><span style="font-size:9pt;color:rgb(95,95,95);font-family:sans-serif">Cc:
</span><span style="font-size:9pt;font-family:sans-serif">openmp-dev
<<a href="mailto:openmp-dev@lists.llvm.org" target="_blank">openmp-dev@lists.llvm.org</a>></span><br><span style="font-size:9pt;color:rgb(95,95,95);font-family:sans-serif">Date:
</span><span style="font-size:9pt;font-family:sans-serif">11/05/2019
06:04 PM</span><br><span style="font-size:9pt;color:rgb(95,95,95);font-family:sans-serif">Subject:
</span><span style="font-size:9pt;font-family:sans-serif">[EXTERNAL]
Re: [Openmp-dev] Target architecture does not support unified
addressing</span><br><span style="font-size:9pt;color:rgb(95,95,95);font-family:sans-serif">Sent
by: </span><span style="font-size:9pt;font-family:sans-serif">"Openmp-dev"
<<a href="mailto:openmp-dev-bounces@lists.llvm.org" target="_blank">openmp-dev-bounces@lists.llvm.org</a>></span><br><hr noshade><br><br><br><span style="font-size:12pt">Can you briefly explain why sm_60,
while it is capable of handling unified addressing, is not supported in
Clang?</span><br><br><span style="font-size:12pt">On Wed, Nov 6, 2019 at 7:56 AM Alexey
Bataev <</span><a href="mailto:a.bataev@outlook.com" target="_blank"><span style="font-size:12pt;color:blue"><u>a.bataev@outlook.com</u></span></a><span style="font-size:12pt">>
wrote:</span><br><span style="font-size:12pt">Yes, it is enforced in clang.<br></span><br><span style="font-size:12pt">Best regards, </span><br><span style="font-size:12pt">Alexey Bataev</span><br><br><span style="font-size:12pt">On 5 Nov 2019, at 17:38, Itaru
Kitayama <</span><a href="mailto:itaru.kitayama@gmail.com" target="_blank"><span style="font-size:12pt;color:blue"><u>itaru.kitayama@gmail.com</u></span></a><span style="font-size:12pt">>
wrote:<br></span><br><span style="font-size:12pt"> </span><br><span style="font-size:12pt">Thank you, Alexey. Now I am seeing: </span><br><br><span style="font-size:12pt">$ clang++ -fopenmp -fopenmp-targets=nvptx64
tmp.cpp</span><br><span style="font-size:12pt">tmp.cpp:1:22: error: Target architecture
sm_60 does not support unified addressing</span><br><span style="font-size:12pt">#pragma omp requires unified_shared_memory</span><br><span style="font-size:12pt">
^</span><br><span style="font-size:12pt">1 error generated.</span><br><br><span style="font-size:12pt">The P100 is an sm_60 device, but it supports unified
memory. Is a requirement of sm_70 or greater</span><br><span style="font-size:12pt">enforced in Clang? </span><br><br><span style="font-size:12pt">On Wed, Nov 6, 2019 at 5:07 AM Alexey
Bataev <</span><a href="mailto:a.bataev@outlook.com" target="_blank"><span style="font-size:12pt;color:blue"><u>a.bataev@outlook.com</u></span></a><span style="font-size:12pt">>
wrote:</span><br><span style="font-size:12pt">Most probably, you are using the default architecture,
i.e. sm_35. You need to build clang with sm_35, sm_70, ... as supported archs.
Plus, your system must support unified memory.</span><br><span style="font-size:12pt">I updated the error message in the compiler;
now it says which target architecture you are using.</span><br><tt><span style="font-size:12pt">-------------<br>Best regards,<br>Alexey Bataev</span></tt><br><span style="font-size:12pt">05.11.2019 3:01 PM, Itaru Kitayama wrote:</span><br><span style="font-size:12pt">I’ve been building trunk Clang locally
targeting the P100 device attached to the host. Should I check the toolchain?</span><br><br><span style="font-size:12pt">On Tue, Nov 5, 2019 at 23:47 Alexey Bataev
<</span><a href="mailto:a.bataev@outlook.com" target="_blank"><span style="font-size:12pt;color:blue"><u>a.bataev@outlook.com</u></span></a><span style="font-size:12pt">>
wrote:</span><br><span style="font-size:12pt">You're building your code for an architecture
that does not support unified memory, say sm_35. Unified memory is only supported
for architectures >= sm_70. </span><br><tt><span style="font-size:12pt">-------------<br>Best regards,<br>Alexey Bataev</span></tt><br><span style="font-size:12pt">05.11.2019 3:16 AM, Itaru Kitayama via
Openmp-dev wrote:</span><br><span style="font-size:12pt">Hi, </span><br><span style="font-size:12pt">Using a pragma like below:</span><br><br><span style="font-size:12pt">$ cat tmp.cpp</span><br><span style="font-size:12pt">#pragma omp requires unified_shared_memory</span><br><br><span style="font-size:12pt">int main() {</span><br><span style="font-size:12pt">}</span><br><br><span style="font-size:12pt">produces an error on a POWER8-based system
with P100 devices (that support unified memory).</span><br><br><span style="font-size:12pt">$ clang++ -fopenmp -fopenmp-targets=nvptx64
tmp.cpp</span><br><span style="font-size:12pt">tmp.cpp:1:22: error: Target architecture
does not support unified addressing</span><br><span style="font-size:12pt">#pragma omp requires unified_shared_memory</span><br><span style="font-size:12pt">
^</span><br><span style="font-size:12pt">1 error generated.</span><br><br><span style="font-size:12pt">Clang is built locally and natively
with the appropriate capability, so </span><br><span style="font-size:12pt">what does this error mean?</span><br><br><br><tt><span style="font-size:12pt">_______________________________________________<br>Openmp-dev mailing list<br></span></tt><a href="mailto:Openmp-dev@lists.llvm.org" target="_blank"><tt><span style="font-size:12pt;color:blue"><u>Openmp-dev@lists.llvm.org</u></span></tt></a><tt><span style="font-size:12pt"><br></span></tt><a href="https://lists.llvm.org/cgi-bin/mailman/listinfo/openmp-dev" target="_blank"><tt><span style="font-size:12pt;color:blue"><u>https://lists.llvm.org/cgi-bin/mailman/listinfo/openmp-dev</u></span></tt></a><tt><span style="font-size:12pt"><br></span></tt><br><br><br>
</blockquote></div>
</blockquote></div>
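[Editor's note] On the build side, whether sm_60 device code can be generated at all depends on how clang and libomptarget were configured, as Alexey notes. A hedged example of the LLVM 10-era CMake knobs (variable names may have changed in later releases; verify against your checkout):

```shell
# Illustrative configuration only -- variable names reflect the LLVM ~10 era
# and may differ in newer trees. The capability list tells libomptarget's
# NVPTX device runtime which SM versions to build for.
cmake -G Ninja ../llvm \
  -DLLVM_ENABLE_PROJECTS="clang;openmp" \
  -DCLANG_OPENMP_NVPTX_DEFAULT_ARCH=sm_60 \
  -DLIBOMPTARGET_NVPTX_COMPUTE_CAPABILITIES=35,60,70
```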
</blockquote></div>
</blockquote></div>
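[Editor's note] Putting the thread's pieces together, a typical reproduce-and-trace session looks roughly like this (illustrative command fragment; -Xopenmp-target -march=sm_60 assumes a clang build that accepts sm_60 offloading):

```shell
# Compile with an explicit device architecture (illustrative flags).
clang -fopenmp -fopenmp-targets=nvptx64 -Xopenmp-target -march=sm_60 \
      shared_update.c -o a.out

# LIBOMPTARGET_DEBUG=1 enables the libomptarget runtime trace quoted above.
LIBOMPTARGET_DEBUG=1 ./a.out
```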