[Openmp-commits] [PATCH] D148178: [OpenMP][libomptarget] Improve plugin device info printing

Kevin Sala via Phabricator via Openmp-commits openmp-commits at lists.llvm.org
Thu Apr 20 01:41:24 PDT 2023

kevinsala added a comment.

Example `llvm-omp-device-info` output on NVIDIA and AMDGPU devices:

  Device (7):
      CUDA Driver Version:              10020
      CUDA OpenMP Device Number:        3
      Device Name:                      Tesla V100-SXM2-16GB
      Global Memory Size:               16911433728 bytes
      Number of Multiprocessors:        80
      Concurrent Copy and Execution:    Yes
      Total Constant Memory:            65536 bytes
      Max Shared Memory per Block:      49152 bytes
      Registers per Block:              65536
      Warp Size:                        32
      Maximum Threads per Block:        1024
      Maximum Block Dimensions:         
          x:                            1024
          y:                            1024
          z:                            64
      Maximum Grid Dimensions:          
          x:                            2147483647
          y:                            65535
          z:                            65535
      Maximum Memory Pitch:             2147483647 bytes
      Texture Alignment:                512 bytes
      Clock Rate:                       1530000 kHz
      Execution Timeout:                No
      Integrated Device:                No
      Can Map Host Memory:              Yes
      Compute Mode:                     Default
      Concurrent Kernels:               Yes
      ECC Enabled:                      Yes
      Memory Clock Rate:                877000 kHz
      Memory Bus Width:                 4096 bits
      L2 Cache Size:                    6291456 bytes
      Max Threads Per SMP:              2048
      Async Engines:                    4
      Unified Addressing:               Yes
      Managed Memory:                   Yes
      Concurrent Managed Memory:        Yes
      Preemption Supported:             Yes
      Cooperative Launch:               Yes
      Multi-Device Boars:               No
      Compute Capabilities:             sm_70

  Device (5):
      HSA Runtime Version:                 1.1
      HSA OpenMP Device Number:            1
      Product Name:                        
      Device Name:                         gfx906
      Vendor Name:                         AMD
      Device Type:                         GPU
      Max Queues:                          128
      Queue Min Size:                      64
      Queue Max Size:                      131072
          L0:                              16384
          L1:                              8388608
      Cacheline Size:                      64
      Max Clock Freq:                      1725 MHz
      Compute Units:                       60
      SIMD per CU:                         4
      Fast F16 Operation:                  Yes
      Wavefront Size:                      64
      Workgroup Max Size:                  1024
      Workgroup Max Size per Dimension:    
          x:                               1024
          y:                               1024
          z:                               1024
      Max Waves Per CU:                    40
      Max Work-item Per CU:                2560
      Grid Max Size:                       4294967295
      Grid Max Size per Dimension:         
          x:                               4294967295
          y:                               4294967295
          z:                               4294967295
      Max fbarriers/Workgrp:               32
      Memory Pools:                        
          Pool Global:                     
              Flags:                       Coarse Grained 
              Size:                        34342961152 bytes
              Allocatable:                 Yes
              Runtime Alloc Granule:       4096 bytes
              Runtime Alloc Alignment:     4096 bytes
              Accessable by all:           No
          Pool Group:                      
              Size:                        65536 bytes
              Allocatable:                 No
              Runtime Alloc Granule:       0 bytes
              Runtime Alloc Alignment:     0 bytes
              Accessable by all:           No
          Name:                            amdgcn-amd-amdhsa--gfx906:sramecc+:xnack-

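For anyone who wants to consume this output from a script, here is a minimal sketch of turning the indented listing into a nested dictionary. It assumes every line has the form `Label: value`, that nesting is expressed only through leading spaces, and that labels are unique within one level. The function name `parse_device_info` is hypothetical and is not part of the patch or of `llvm-omp-device-info`.

```python
def parse_device_info(text):
    """Parse indented 'Label: value' lines into nested dictionaries.

    Each label maps to {"value": str, "children": dict}. This is an
    illustrative helper, not part of libomptarget.
    """
    root = {}
    # Stack of (indent, children-dict); a line nests under the closest
    # preceding line that has a strictly smaller indent.
    stack = [(-1, root)]
    for raw in text.splitlines():
        if not raw.strip():
            continue  # skip blank separator lines
        indent = len(raw) - len(raw.lstrip())
        # Split on the first colon only, so values such as
        # 'amdgcn-amd-amdhsa--gfx906:sramecc+:xnack-' stay intact.
        label, _, value = raw.strip().partition(":")
        while stack[-1][0] >= indent:
            stack.pop()
        node = {"value": value.strip(), "children": {}}
        stack[-1][1][label.strip()] = node
        stack.append((indent, node["children"]))
    return root
```

With the NVIDIA listing above passed in as `text`, the warp size would be read as `parse_device_info(text)["Device (7)"]["children"]["Warp Size"]["value"]`. Values are kept as strings, units included, since the listing mixes plain numbers, `bytes`, `kHz`, and `Yes`/`No`.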


