[Openmp-dev] Target architecture does not support unified addressing
Itaru Kitayama via Openmp-dev
openmp-dev at lists.llvm.org
Fri May 1 20:55:47 PDT 2020
deviceQuery returns:
CUDA Device Query (Runtime API) version (CUDART static linking)
Detected 1 CUDA Capable device(s)
Device 0: "Tesla P100-SXM2-16GB"
CUDA Driver Version / Runtime Version 10.1 / 8.0
CUDA Capability Major/Minor version number: 6.0
Total amount of global memory: 16281 MBytes (17071734784 bytes)
(56) Multiprocessors, ( 64) CUDA Cores/MP: 3584 CUDA Cores
GPU Max Clock rate: 1481 MHz (1.48 GHz)
Memory Clock rate: 715 Mhz
Memory Bus Width: 4096-bit
L2 Cache Size: 4194304 bytes
Maximum Texture Dimension Size (x,y,z) 1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
Maximum Layered 1D Texture Size, (num) layers 1D=(32768), 2048 layers
Maximum Layered 2D Texture Size, (num) layers 2D=(32768, 32768), 2048 layers
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per multiprocessor: 2048
Maximum number of threads per block: 1024
Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Concurrent copy and kernel execution: Yes with 5 copy engine(s)
Run time limit on kernels: No
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Enabled
Device supports Unified Addressing (UVA): Yes
On Sat, May 2, 2020 at 10:31 AM Itaru Kitayama <itaru.kitayama at gmail.com>
wrote:
> Executing shared_update.c on P100 results in errors;
>
> ==130340== NVPROF is profiling process 130340, command: ./a.out
> Libomptarget fatal error 1: failure of target construct while offloading
> is mandatory
> ==130340== Profiling application: ./a.out
> ==130340== Warning: 1 records have invalid timestamps due to insufficient
> device buffer space. You can configure the buffer space using the option
> --device-buffer-size.
> ==130340== Profiling result:
> Type             Time(%)  Time      Calls  Avg       Min       Max       Name
> GPU activities:  89.68%   40.950us      2  20.475us  18.103us  22.847us  [CUDA memcpy DtoH]
>                  10.32%   4.7100us      1  4.7100us  4.7100us  4.7100us  [CUDA memcpy HtoD]
> API calls:       69.95%   400.85ms      1  400.85ms  400.85ms  400.85ms  cuCtxCreate
>                  15.17%   86.932ms      1  86.932ms  86.932ms  86.932ms  cuStreamSynchronize
>                  12.11%   69.398ms      1  69.398ms  69.398ms  69.398ms  cuCtxDestroy
>                   2.68%   15.375ms      1  15.375ms  15.375ms  15.375ms  cuModuleLoadDataEx
>                   0.06%   363.13us     32  11.347us  754ns     171.53us  cuStreamCreate
>                   0.01%   48.938us      2  24.469us  19.581us  29.357us  cuMemcpyDtoH
>                   0.00%   22.184us      1  22.184us  22.184us  22.184us  cuLaunchKernel
>                   0.00%   7.6760us      1  7.6760us  7.6760us  7.6760us  cuMemcpyHtoD
>                   0.00%   4.7430us     32  148ns     113ns     520ns     cuStreamDestroy
>                   0.00%   2.9060us      3  968ns     562ns     1.5750us  cuModuleGetGlobal
>                   0.00%   2.8940us      2  1.4470us  336ns     2.5580us  cuModuleGetFunction
>                   0.00%   2.8250us      3  941ns     181ns     2.2050us  cuDeviceGetCount
>                   0.00%   2.6040us      2  1.3020us  965ns     1.6390us  cuDeviceGet
>                   0.00%   2.4200us      5  484ns     137ns     882ns     cuCtxSetCurrent
>                   0.00%   1.6450us      6  274ns     117ns     671ns     cuDeviceGetAttribute
>                   0.00%   804ns         1  804ns     804ns     804ns     cuFuncGetAttribute
>                   0.00%   296ns         1  296ns     296ns     296ns     cuModuleUnload
> ======== Error: Application returned non-zero code 1
>
> On Sat, May 2, 2020 at 8:24 AM Itaru Kitayama <itaru.kitayama at gmail.com>
> wrote:
>
>> Doru,
>> What's the current way of enabling unified-addressing support for the
>> sm_60 CUDA architecture?
>> The code has been modified since we last exchanged messages.
>>
>> On Thu, Nov 7, 2019 at 4:05 AM Gheorghe-Teod Bercea <
>> Gheorghe-Teod.Bercea at ibm.com> wrote:
>>
>>> Hi Itaru,
>>>
>>> We did not test those features on an sm_60 machine like a Pascal GPU so
>>> I can't guarantee it will work. I suggest you enable it locally and see how
>>> it performs.
>>> You only need to make a small change in "void
>>> CGOpenMPRuntimeNVPTX::checkArchForUnifiedAddressing(const OMPRequiresDecl
>>> *D)" to allow sm_60 to be accepted as a valid target.
>>>
>>> Thanks,
>>>
>>> --Doru
>>>
>>>
>>>
>>>
>>> From: Itaru Kitayama via Openmp-dev <openmp-dev at lists.llvm.org>
>>> To: Alexey Bataev <a.bataev at outlook.com>
>>> Cc: openmp-dev <openmp-dev at lists.llvm.org>
>>> Date: 11/05/2019 06:04 PM
>>> Subject: [EXTERNAL] Re: [Openmp-dev] Target architecture does
>>> not support unified addressing
>>> Sent by: "Openmp-dev" <openmp-dev-bounces at lists.llvm.org>
>>> ------------------------------
>>>
>>>
>>>
>>> Can you briefly explain why sm_60, though capable of handling unified
>>> addressing, is not supported in Clang?
>>>
>>> On Wed, Nov 6, 2019 at 7:56 AM Alexey Bataev <a.bataev at outlook.com> wrote:
>>> Yes, it is enforced in clang.
>>>
>>> Best regards,
>>> Alexey Bataev
>>>
>>> On Nov 5, 2019, at 17:38, Itaru Kitayama <itaru.kitayama at gmail.com> wrote:
>>>
>>>
>>> Thank you, Alexey. Now I am seeing:
>>>
>>> $ clang++ -fopenmp -fopenmp-targets=nvptx64 tmp.cpp
>>> tmp.cpp:1:22: error: Target architecture sm_60 does not support unified
>>> addressing
>>> #pragma omp requires unified_shared_memory
>>> ^
>>> 1 error generated.
>>>
>>> The P100 is an sm_60 device but supports unified memory. Is a requirement
>>> of sm_70 or greater enforced in Clang?
>>>
>>> On Wed, Nov 6, 2019 at 5:07 AM Alexey Bataev <a.bataev at outlook.com> wrote:
>>> Most probably you are using the default architecture, i.e. sm_35. You need
>>> to build clang with sm_35, sm_70, ... among the supported archs. Also, your
>>> system must support unified memory.
>>> I updated the error message in the compiler; it now says which target
>>> architecture you are using.
>>> -------------
>>> Best regards,
>>> Alexey Bataev
>>> On 05.11.2019 3:01 PM, Itaru Kitayama wrote:
>>> I've been building trunk Clang locally, targeting the P100 device
>>> attached to the host. Should I check the toolchain?
>>>
>>> On Tue, Nov 5, 2019 at 23:47 Alexey Bataev <a.bataev at outlook.com> wrote:
>>> You're building your code for an architecture that does not support
>>> unified memory, say sm_35. Unified memory is only supported for
>>> architectures >= sm_70.
>>> -------------
>>> Best regards,
>>> Alexey Bataev
>>> On 05.11.2019 3:16 AM, Itaru Kitayama via Openmp-dev wrote:
>>> Hi,
>>> Using a pragma like below:
>>>
>>> $ cat tmp.cpp
>>> #pragma omp requires unified_shared_memory
>>>
>>> int main() {
>>> }
>>>
>>> produces an error on a POWER8-based system with P100 devices (which
>>> support unified memory).
>>>
>>> $ clang++ -fopenmp -fopenmp-targets=nvptx64 tmp.cpp
>>> tmp.cpp:1:22: error: Target architecture does not support unified
>>> addressing
>>> #pragma omp requires unified_shared_memory
>>> ^
>>> 1 error generated.
>>>
>>> Clang was built locally and natively with the appropriate capability, so
>>> what does this error mean?
>>>
>>>
>>> _______________________________________________
>>> Openmp-dev mailing list
>>> Openmp-dev at lists.llvm.org
>>> https://lists.llvm.org/cgi-bin/mailman/listinfo/openmp-dev