[Openmp-commits] [openmp] abb174b - [OpenMP] Add example in Libomptarget Information docs

Joseph Huber via Openmp-commits openmp-commits at lists.llvm.org
Thu Jan 7 12:01:04 PST 2021


Author: Joseph Huber
Date: 2021-01-07T15:00:51-05:00
New Revision: abb174bbc100437556fd386d920a9939723e0647

URL: https://github.com/llvm/llvm-project/commit/abb174bbc100437556fd386d920a9939723e0647
DIFF: https://github.com/llvm/llvm-project/commit/abb174bbc100437556fd386d920a9939723e0647.diff

LOG: [OpenMP] Add example in Libomptarget Information docs

Add an example to the OpenMP Documentation on the LIBOMPTARGET_INFO environment variable

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D94246

Added: 
    

Modified: 
    openmp/docs/design/Runtimes.rst

Removed: 
    


################################################################################
diff  --git a/openmp/docs/design/Runtimes.rst b/openmp/docs/design/Runtimes.rst
index c9f3a55c0067..1d52b6b8378c 100644
--- a/openmp/docs/design/Runtimes.rst
+++ b/openmp/docs/design/Runtimes.rst
@@ -98,6 +98,85 @@ Or, to enable every flag run with every bit set.
 
    $ env LIBOMPTARGET_INFO=-1 ./your-application
 
+For example, given a small application implementing the ``ZAXPY`` BLAS routine,
+``Libomptarget`` can provide useful information about data mappings and thread
+usage.
+
+.. code-block:: c++
+
+    #include <complex>
+
+    using complex = std::complex<double>;
+
+    void zaxpy(complex *X, complex *Y, complex D, std::size_t N) {
+    #pragma omp target teams distribute parallel for
+      for (std::size_t i = 0; i < N; ++i)
+        Y[i] = D * X[i] + Y[i];
+    }
+
+    int main() {
+      const std::size_t N = 1024;
+      complex X[N], Y[N], D;
+    #pragma omp target data map(to:X[0 : N]) map(tofrom:Y[0 : N])
+      zaxpy(X, Y, D, N);
+    }
+
+Compiling this code for the ``nvptx64`` target and running it with all
+information enabled produces the following output from the runtime library.
+
+.. code-block:: console
+
+    $ clang++ -fopenmp -fopenmp-targets=nvptx64 -O3 -gline-tables-only zaxpy.cpp -o zaxpy
+    $ env LIBOMPTARGET_INFO=-1 ./zaxpy
+
+.. code-block:: text
+
+    Info: Device supports up to 65536 CUDA blocks and 1024 threads with a warp size of 32
+    Info: Entering OpenMP data region at zaxpy.cpp:14:1 with 2 arguments:
+    Info: to(X[0:N])[16384] 
+    Info: tofrom(Y[0:N])[16384] 
+    Info: OpenMP Host-Device pointer mappings after block at zaxpy.cpp:14:1:
+    Info: Host Ptr           Target Ptr         Size (B) RefCount Declaration
+    Info: 0x00007fff963f4000 0x00007fd225004000 16384    1        Y[0:N] at zaxpy.cpp:13:17
+    Info: 0x00007fff963f8000 0x00007fd225000000 16384    1        X[0:N] at zaxpy.cpp:13:11
+    Info: Entering OpenMP kernel at zaxpy.cpp:6:1 with 4 arguments:
+    Info: firstprivate(N)[8] (implicit)
+    Info: use_address(Y)[0] (implicit)
+    Info: tofrom(D)[16] (implicit)
+    Info: use_address(X)[0] (implicit)
+    Info: Mapping exists (implicit) with HstPtrBegin=0x00007ffe37d8be80, 
+          TgtPtrBegin=0x00007f90ff004000, Size=0, updated RefCount=2, Name=Y
+    Info: Mapping exists (implicit) with HstPtrBegin=0x00007ffe37d8fe80, 
+          TgtPtrBegin=0x00007f90ff000000, Size=0, updated RefCount=2, Name=X
+    Info: Launching kernel __omp_offloading_fd02_c2c4ac1a__Z5daxpyPNSt3__17complexIdEES2_S1_m_l6
+          with 8 blocks and 128 threads in SPMD mode
+    Info: OpenMP Host-Device pointer mappings after block at zaxpy.cpp:6:1:
+    Info: Host Ptr           Target Ptr         Size (B) RefCount Declaration
+    Info: 0x00007fff963f4000 0x00007fd225004000 16384    1        Y[0:N] at zaxpy.cpp:13:17
+    Info: 0x00007fff963f8000 0x00007fd225000000 16384    1        X[0:N] at zaxpy.cpp:13:11
+    Info: Exiting OpenMP data region at zaxpy.cpp:14:1 with 2 arguments:
+    Info: to(X[0:N])[16384] 
+    Info: tofrom(Y[0:N])[16384] 
+
+From this information, we can see the OpenMP kernel being launched on the CUDA
+device with 8 blocks of 128 threads, enough for all ``1024`` iterations of the
+loop, in simplified :doc:`SPMD Mode <Offloading>`. The information from the
+OpenMP data region shows the two arrays ``X`` and ``Y`` being copied from the
+host to the device. This creates an entry in the host-device mapping table
+associating each host pointer with the newly created device data. The data
+mappings for the OpenMP kernel show the default mappings used for all variables
+referenced implicitly on the device. Because ``X`` and ``Y`` are already mapped,
+no new entries are created; only their reference counts are incremented. The
+default mapping also causes ``D`` to be copied back from the device when the
+kernel ends, even though it is never written. Finally, at the end of the data
+region, ``Y`` is copied back and the ``X`` and ``Y`` entries are removed.
+
+.. toctree::
+   :hidden:
+   :maxdepth: 1
+
+   Offloading
+
 LLVM/OpenMP Target Host Runtime Plugins (``libomptarget.rtl.XXXX``)
 -------------------------------------------------------------------
 
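The added documentation notes that the implicit ``tofrom`` mapping copies ``D`` back from the device even though the kernel only reads it. A minimal sketch of one way to avoid that, assuming a compiler with OpenMP 4.5 or later support, is to map ``D`` explicitly with ``map(to : D)`` on the kernel (``firstprivate(D)`` would be another option). The ``zaxpy_example_ok`` driver and its input values are hypothetical and exist only so the result can be checked on the host. Compiled without ``-fopenmp``, the pragmas are ignored and the code simply runs on the host.

```cpp
#include <cassert>  // for the assert() in the usage example
#include <complex>
#include <cstddef>

using complex = std::complex<double>;

// ZAXPY kernel from the documentation example, with an explicit map(to : D)
// clause added (an assumption, not part of the original example). D is only
// read inside the kernel, so it does not need the implicit tofrom mapping.
// X and Y are expected to be mapped by an enclosing target data region.
void zaxpy(complex *X, complex *Y, complex D, std::size_t N) {
#pragma omp target teams distribute parallel for map(to : D)
  for (std::size_t i = 0; i < N; ++i)
    Y[i] = D * X[i] + Y[i];
}

// Hypothetical driver: unlike the documentation example, it initializes X, Y,
// and D so the result of the data region can be checked on the host.
bool zaxpy_example_ok() {
  const std::size_t N = 1024;
  complex X[N], Y[N];
  const complex D(2.0, 1.0);
  for (std::size_t i = 0; i < N; ++i) {
    X[i] = complex(1.0, 0.0);
    Y[i] = complex(0.0, 1.0);
  }

  // Same data region as the documentation: X goes to the device only,
  // Y is copied to the device and back to the host when the region ends.
#pragma omp target data map(to : X[0 : N]) map(tofrom : Y[0 : N])
  zaxpy(X, Y, D, N);

  // Each element should be (2 + 1i) * (1 + 0i) + (0 + 1i) = 2 + 2i, which is
  // exactly representable, so an exact comparison is safe here.
  for (std::size_t i = 0; i < N; ++i)
    if (Y[i] != complex(2.0, 2.0))
      return false;
  return true;
}
```

For example, ``assert(zaxpy_example_ok());`` in a host ``main`` should hold whether the kernel is offloaded or falls back to the host.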