[all-commits] [llvm/llvm-project] 313c52: [OpenMP][Tool] Introducing the `llvm-omp-device-in...

Shilei Tian via All-commits all-commits at lists.llvm.org
Tue Jul 27 19:38:51 PDT 2021


  Branch: refs/heads/main
  Home:   https://github.com/llvm/llvm-project
  Commit: 313c5239959b8f9e5cc182b982c914978f437ae1
      https://github.com/llvm/llvm-project/commit/313c5239959b8f9e5cc182b982c914978f437ae1
  Author: Jose M Monsalve Diaz <jmonsalvediaz at anl.gov>
  Date:   2021-07-27 (Tue, 27 Jul 2021)

  Changed paths:
    M openmp/libomptarget/CMakeLists.txt
    M openmp/libomptarget/include/omptarget.h
    M openmp/libomptarget/plugins/cuda/src/rtl.cpp
    M openmp/libomptarget/src/exports
    M openmp/libomptarget/src/interface.cpp
    M openmp/libomptarget/src/rtl.cpp
    M openmp/libomptarget/src/rtl.h
    A openmp/libomptarget/tools/CMakeLists.txt
    A openmp/libomptarget/tools/deviceinfo/CMakeLists.txt
    A openmp/libomptarget/tools/deviceinfo/llvm-omp-device-info.cpp

  Log Message:
  -----------
  [OpenMP][Tool] Introducing the `llvm-omp-device-info` tool

This patch introduces the `llvm-omp-device-info` tool, which uses the
omptarget library and its interface to query device information from all
the devices available to OpenMP. It is inspired by PGI's `pgaccelinfo`.

Since omptarget normally initializes the RTLs only when a description
structure containing executable kernels is registered, I split the
initialization of the RTLs and devices out of that path. This lets the tool
initialize every available device and query each one.
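A minimal sketch of the split, under a simplified model: `MockRTL` and `registerImageFor` are hypothetical stand-ins rather than libomptarget types, the signatures given to `initRTLonce` and `initAllRTLs` are assumptions, and the locking a real runtime would need is omitted. The idea illustrated is that both the image-registration path and the new all-devices path go through one idempotent `initRTLonce`, so no plugin is initialized twice.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for one plugin ("RTL"); not libomptarget's real type.
struct MockRTL {
  std::string Name;
  int NumDevices;    // devices this plugin reports
  int InitCount = 0; // how many times the one-time setup actually ran
};

// One-time setup of a single RTL, no longer tied to registering an image.
// Single-threaded sketch: a real runtime would guard this with a lock.
void initRTLonce(MockRTL &R) {
  if (R.InitCount > 0)
    return; // already initialized, nothing to do
  assert(R.NumDevices >= 0 && "plugin reported a negative device count");
  // ...plugin and per-device initialization would happen here...
  ++R.InitCount;
}

// Hypothetical registration path: it now reuses initRTLonce.
void registerImageFor(MockRTL &R) {
  initRTLonce(R);
  // ...image and kernel registration would follow here...
}

// Initialize every available RTL so all devices can be queried without any
// executable kernels; returns the total number of devices found.
int initAllRTLs(std::vector<MockRTL> &RTLs) {
  int Total = 0;
  for (MockRTL &R : RTLs) {
    initRTLonce(R);
    Total += R.NumDevices;
  }
  return Total;
}
```

Calling `registerImageFor` on a plugin that `initAllRTLs` already set up leaves its `InitCount` at 1, which is the property the split is meant to preserve.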

This revision depends on the earlier patch that introduces the
`print_device_info` functionality.

A limitation is that the order in which the devices are initialized, and
therefore the corresponding device IDs, are not necessarily the same as the
ones seen by an OpenMP program.
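To illustrate the limitation, here is a sketch that assumes global IDs are handed out consecutively, plugin by plugin, in initialization order. This numbering scheme is an assumption for illustration, not a description of libomptarget internals, and `PluginDevices` and `globalDeviceId` are hypothetical names.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical: (plugin name, number of devices it exposes), listed in the
// order the plugins were initialized.
using PluginDevices = std::pair<std::string, int>;

// Assumed numbering scheme: walk plugins in initialization order and give
// each plugin's devices consecutive global IDs. Returns the global ID of
// local device `Local` of plugin `Name`, or -1 if it is not present.
int globalDeviceId(const std::vector<PluginDevices> &InitOrder,
                   const std::string &Name, int Local) {
  int Offset = 0;
  for (const PluginDevices &P : InitOrder) {
    assert(P.second >= 0 && "device counts must be non-negative");
    if (P.first == Name)
      return (Local >= 0 && Local < P.second) ? Offset + Local : -1;
    Offset += P.second;
  }
  return -1;
}
```

Under this model, if the tool initializes a four-device plugin before the CUDA plugin, the GPU becomes device 4 (as the Quadro P1000 does in the example output), while a program that only registers a CUDA image would see the same GPU as device 0.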

The changes are as follows:
1. Moved the RTL initialization previously performed in `RegisterLib` into its own `initRTLonce` function.
2. Created an `initAllRTLs` method that initializes all RTLs available at runtime.
3. Created the `llvm-omp-device-info.cpp` tool, which uses `omptarget` to query each device and print its information.
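The query loop of the tool can be sketched as follows. `MockDevice` and `printAllDevices` are hypothetical names, not the tool's actual code; an empty `std::function` stands in for a plugin that does not provide `print_device_info`, which yields the fallback line seen in the example output.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical view of a device: the plugin may or may not provide a
// print_device_info-style hook (an empty std::function means "not provided").
struct MockDevice {
  std::function<void(std::ostream &)> PrintInfo;
};

// One "Device (N):" header per device, followed by either the plugin's
// report or a fallback message, then a blank separator line.
void printAllDevices(const std::vector<MockDevice> &Devices, std::ostream &OS) {
  for (std::size_t Id = 0; Id < Devices.size(); ++Id) {
    OS << "Device (" << Id << "):\n";
    if (Devices[Id].PrintInfo)
      Devices[Id].PrintInfo(OS);
    else
      OS << "    print_device_info not implemented\n";
    OS << "\n";
  }
}
```

Writing to a `std::ostream &` rather than directly to `stdout` is only a convenience of the sketch, so the output can be captured and checked.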

Example Output:
```
Device (0):
    print_device_info not implemented

Device (1):
    print_device_info not implemented

Device (2):
    print_device_info not implemented

Device (3):
    print_device_info not implemented

Device (4):
    CUDA Driver Version:                11000
    CUDA Device Number:                 0
    Device Name:                        Quadro P1000
    Global Memory Size:                 4236312576 bytes
    Number of Multiprocessors:          5
    Concurrent Copy and Execution:      Yes
    Total Constant Memory:              65536 bytes
    Max Shared Memory per Block:        49152 bytes
    Registers per Block:                65536
    Warp Size:                          32 Threads
    Maximum Threads per Block:          1024
    Maximum Block Dimensions:           1024, 1024, 64
    Maximum Grid Dimensions:            2147483647 x 65535 x 65535
    Maximum Memory Pitch:               2147483647 bytes
    Texture Alignment:                  512 bytes
    Clock Rate:                         1480500 kHz
    Execution Timeout:                  Yes
    Integrated Device:                  No
    Can Map Host Memory:                Yes
    Compute Mode:                       DEFAULT
    Concurrent Kernels:                 Yes
    ECC Enabled:                        No
    Memory Clock Rate:                  2505000 kHz
    Memory Bus Width:                   128 bits
    L2 Cache Size:                      1048576 bytes
    Max Threads Per SMP:                2048
    Async Engines:                      Yes (2)
    Unified Addressing:                 Yes
    Managed Memory:                     Yes
    Concurrent Managed Memory:          Yes
    Preemption Supported:               Yes
    Cooperative Launch:                 Yes
    Multi-Device Boars:                 No
    Compute Capabilities:               61
```
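A few CUDA fields above are encoded integers. The following sketch decodes them, assuming the driver version follows the CUDA driver API convention `1000 * major + 10 * minor` (so `11000` is 11.0) and that the "Compute Capabilities" line packs `major * 10 + minor` (so `61` reads as 6.1). The helper names are hypothetical.

```cpp
#include <cassert>
#include <string>

struct Version {
  int Major;
  int Minor;
};

// CUDA driver API convention: version = 1000 * major + 10 * minor.
Version decodeCudaDriverVersion(int Encoded) {
  assert(Encoded >= 0 && "driver version must be non-negative");
  return {Encoded / 1000, (Encoded % 1000) / 10};
}

// Assumption: the compute-capability field packs major * 10 + minor.
Version decodeComputeCapability(int Packed) {
  return {Packed / 10, Packed % 10};
}

std::string toString(Version V) {
  return std::to_string(V.Major) + "." + std::to_string(V.Minor);
}

// Clock fields are reported in kHz; convert to MHz for readability.
double khzToMhz(long long Khz) { return Khz / 1000.0; }
```

Applied to the output above, this gives driver 11.0, compute capability 6.1, a 1480.5 MHz core clock and a 2505 MHz memory clock.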

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D106752
