[Openmp-dev] Target architecture does not support unified addressing

Itaru Kitayama via Openmp-dev openmp-dev at lists.llvm.org
Fri May 1 20:55:47 PDT 2020


deviceQuery returns:

 CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "Tesla P100-SXM2-16GB"
  CUDA Driver Version / Runtime Version          10.1 / 8.0
  CUDA Capability Major/Minor version number:    6.0
  Total amount of global memory:                 16281 MBytes (17071734784
bytes)
  (56) Multiprocessors, ( 64) CUDA Cores/MP:     3584 CUDA Cores
  GPU Max Clock rate:                            1481 MHz (1.48 GHz)
  Memory Clock rate:                             715 Mhz
  Memory Bus Width:                              4096-bit
  L2 Cache Size:                                 4194304 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(131072), 2D=(131072,
65536), 3D=(16384, 16384, 16384)
  Maximum Layered 1D Texture Size, (num) layers  1D=(32768), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(32768, 32768), 2048
layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 65536
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  2048
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 5 copy engine(s)
  Run time limit on kernels:                     No
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Enabled
  Device supports Unified Addressing (UVA):      Yes

On Sat, May 2, 2020 at 10:31 AM Itaru Kitayama <itaru.kitayama at gmail.com>
wrote:

> Executing shared_update.c on P100 results in errors;
>
> ==130340== NVPROF is profiling process 130340, command: ./a.out
> Libomptarget fatal error 1: failure of target construct while offloading
> is mandatory
> ==130340== Profiling application: ./a.out
> ==130340== Warning: 1 records have invalid timestamps due to insufficient
> device buffer space. You can configure the buffer space using the option
> --device-buffer-size.
> ==130340== Profiling result:
>             Type  Time(%)      Time     Calls       Avg       Min
> Max  Name
>  GPU activities:   89.68%  40.950us         2  20.475us  18.103us
>  22.847us  [CUDA memcpy DtoH]
>                    10.32%  4.7100us         1  4.7100us  4.7100us
>  4.7100us  [CUDA memcpy HtoD]
>       API calls:   69.95%  400.85ms         1  400.85ms  400.85ms
>  400.85ms  cuCtxCreate
>                    15.17%  86.932ms         1  86.932ms  86.932ms
>  86.932ms  cuStreamSynchronize
>                    12.11%  69.398ms         1  69.398ms  69.398ms
>  69.398ms  cuCtxDestroy
>                     2.68%  15.375ms         1  15.375ms  15.375ms
>  15.375ms  cuModuleLoadDataEx
>                     0.06%  363.13us        32  11.347us     754ns
>  171.53us  cuStreamCreate
>                     0.01%  48.938us         2  24.469us  19.581us
>  29.357us  cuMemcpyDtoH
>                     0.00%  22.184us         1  22.184us  22.184us
>  22.184us  cuLaunchKernel
>                     0.00%  7.6760us         1  7.6760us  7.6760us
>  7.6760us  cuMemcpyHtoD
>                     0.00%  4.7430us        32     148ns     113ns
> 520ns  cuStreamDestroy
>                     0.00%  2.9060us         3     968ns     562ns
>  1.5750us  cuModuleGetGlobal
>                     0.00%  2.8940us         2  1.4470us     336ns
>  2.5580us  cuModuleGetFunction
>                     0.00%  2.8250us         3     941ns     181ns
>  2.2050us  cuDeviceGetCount
>                     0.00%  2.6040us         2  1.3020us     965ns
>  1.6390us  cuDeviceGet
>                     0.00%  2.4200us         5     484ns     137ns
> 882ns  cuCtxSetCurrent
>                     0.00%  1.6450us         6     274ns     117ns
> 671ns  cuDeviceGetAttribute
>                     0.00%     804ns         1     804ns     804ns
> 804ns  cuFuncGetAttribute
>                     0.00%     296ns         1     296ns     296ns
> 296ns  cuModuleUnload
> ======== Error: Application returned non-zero code 1
>
> On Sat, May 2, 2020 at 8:24 AM Itaru Kitayama <itaru.kitayama at gmail.com>
> wrote:
>
>> Doru,
>> What's the current way of enabling unified-addressing support for the
>> sm_60 CUDA architecture? The relevant code has been modified since we
>> last exchanged messages.
>>
>> On Thu, Nov 7, 2019 at 4:05 AM Gheorghe-Teod Bercea <
>> Gheorghe-Teod.Bercea at ibm.com> wrote:
>>
>>> Hi Itaru,
>>>
>>> We did not test those features on an sm_60 machine such as a Pascal GPU,
>>> so I can't guarantee it will work. I suggest you enable it locally and see
>>> how it performs.
>>> You only need to make a small change in "void
>>> CGOpenMPRuntimeNVPTX::checkArchForUnifiedAddressing(const OMPRequiresDecl
>>> *D)" to allow sm_60 to be accepted as a valid target.
>>>
>>> Thanks,
>>>
>>> --Doru
>>>
>>>
>>>
>>>
>>> From:        Itaru Kitayama via Openmp-dev <openmp-dev at lists.llvm.org>
>>> To:        Alexey Bataev <a.bataev at outlook.com>
>>> Cc:        openmp-dev <openmp-dev at lists.llvm.org>
>>> Date:        11/05/2019 06:04 PM
>>> Subject:        [EXTERNAL] Re: [Openmp-dev] Target architecture does
>>> not support unified addressing
>>> Sent by:        "Openmp-dev" <openmp-dev-bounces at lists.llvm.org>
>>> ------------------------------
>>>
>>>
>>>
>>> Can you briefly explain why sm_60, while capable of handling unified
>>> addressing, is not supported in Clang?
>>>
>>> On Wed, Nov 6, 2019 at 7:56 AM Alexey Bataev <a.bataev at outlook.com>
>>> wrote:
>>> Yes, it is enforced in clang.
>>>
>>> Best regards,
>>> Alexey Bataev
>>>
>>> On Nov 5, 2019, at 17:38, Itaru Kitayama <itaru.kitayama at gmail.com>
>>> wrote:
>>>
>>>
>>> Thank you, Alexey. Now I am seeing:
>>>
>>> $ clang++ -fopenmp -fopenmp-targets=nvptx64 tmp.cpp
>>> tmp.cpp:1:22: error: Target architecture sm_60 does not support unified
>>> addressing
>>> #pragma omp requires unified_shared_memory
>>>                      ^
>>> 1 error generated.
>>>
>>> The P100 is an sm_60 device, but it does support unified memory. Is a
>>> requirement of sm_70 or greater enforced in Clang?
>>>
>>> On Wed, Nov 6, 2019 at 5:07 AM Alexey Bataev <a.bataev at outlook.com>
>>> wrote:
>>> Most probably, you are using the default architecture, i.e. sm_35. You
>>> need to build Clang with sm_35, sm_70, ... among its supported archs. In
>>> addition, your system must support unified memory.
>>> I have updated the error message in the compiler; it now says which
>>> target architecture you are using.
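
[Editor's note: for reference, one way to carry out Alexey's suggestion. The
CMake cache variable names below are from LLVM builds of that era and may have
changed since; treat them as assumptions, not a definitive recipe.]

```shell
# Configure LLVM/Clang so the OpenMP NVPTX toolchain defaults to sm_60 and
# libomptarget's device runtime is also built for Pascal.
cmake -G Ninja ../llvm \
  -DLLVM_ENABLE_PROJECTS="clang;openmp" \
  -DCLANG_OPENMP_NVPTX_DEFAULT_ARCH=sm_60 \
  -DLIBOMPTARGET_NVPTX_COMPUTE_CAPABILITIES=35,60,70

# Alternatively, select the device architecture per compilation:
clang++ -fopenmp -fopenmp-targets=nvptx64 \
        -Xopenmp-target -march=sm_60 tmp.cpp
```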
>>> -------------
>>> Best regards,
>>> Alexey Bataev
>>> On 05.11.2019 3:01 PM, Itaru Kitayama wrote:
>>> I’ve been building trunk Clang locally, targeting the P100 device
>>> attached to the host. Should I check the toolchain?
>>>
>>> On Tue, Nov 5, 2019 at 23:47 Alexey Bataev <a.bataev at outlook.com>
>>> wrote:
>>> You're building your code for an architecture that does not support
>>> unified memory, e.g. sm_35. Unified memory is only supported for
>>> architectures >= sm_70.
>>> -------------
>>> Best regards,
>>> Alexey Bataev
>>> On 05.11.2019 3:16 AM, Itaru Kitayama via Openmp-dev wrote:
>>> Hi,
>>> Using a pragma like below:
>>>
>>> $ cat tmp.cpp
>>> #pragma omp requires unified_shared_memory
>>>
>>> int main() {
>>> }
>>>
>>> produces an error on a POWER8-based system with P100 devices (which
>>> support unified memory).
>>>
>>> $ clang++ -fopenmp -fopenmp-targets=nvptx64 tmp.cpp
>>> tmp.cpp:1:22: error: Target architecture does not support unified
>>> addressing
>>> #pragma omp requires unified_shared_memory
>>>                      ^
>>> 1 error generated.
>>>
>>> Clang was built locally and natively with the appropriate capability,
>>> so what does this error mean?
>>>
>>>
>>> _______________________________________________
>>> Openmp-dev mailing list
>>> Openmp-dev at lists.llvm.org
>>> https://lists.llvm.org/cgi-bin/mailman/listinfo/openmp-dev

