[clang] 15e6206 - [Clang][Docs] Update information on the new driver now that it's default

Joseph Huber via cfe-commits cfe-commits at lists.llvm.org
Mon Apr 18 12:05:29 PDT 2022

Author: Joseph Huber
Date: 2022-04-18T15:05:09-04:00
New Revision: 15e62062c0c919ac1fa28d6f0c9f438063da2286

URL: https://github.com/llvm/llvm-project/commit/15e62062c0c919ac1fa28d6f0c9f438063da2286
DIFF: https://github.com/llvm/llvm-project/commit/15e62062c0c919ac1fa28d6f0c9f438063da2286.diff

LOG: [Clang][Docs] Update information on the new driver now that it's default

This patch updates some of the documentation on the new driver now that
it's the default. Also the ABI for embedding these images changed.




diff --git a/clang/docs/ClangCommandLineReference.rst b/clang/docs/ClangCommandLineReference.rst
index b94b0473f7c2f..1a0e4805bc263 100644
--- a/clang/docs/ClangCommandLineReference.rst
+++ b/clang/docs/ClangCommandLineReference.rst
@@ -801,7 +801,7 @@ Generate Interface Stub Files, emit merged text not binary.
 Extract API information
-.. option:: -fopenmp-new-driver
+.. option:: -fopenmp-new-driver, -fno-openmp-new-driver
 Use the new driver for OpenMP offloading.

diff --git a/clang/docs/OffloadingDesign.rst b/clang/docs/OffloadingDesign.rst
index 30018098f5378..43a0c0d2c29f7 100644
--- a/clang/docs/OffloadingDesign.rst
+++ b/clang/docs/OffloadingDesign.rst
@@ -17,11 +17,6 @@ application using Clang.
 OpenMP Offloading
-.. note::
-   This documentation describes Clang's behavior using the new offloading
-   driver. This currently must be enabled manually using
-   ``-fopenmp-new-driver``.
 Clang supports OpenMP target offloading to several different architectures such
 as NVPTX, AMDGPU, X86_64, Arm, and PowerPC. Offloading code is generated by
 Clang and then executed using the ``libomptarget`` runtime and the associated
@@ -226,15 +221,15 @@ A fat binary is a binary file that contains information intended for another
 device. We create a fat object by embedding the output of the device compilation
 stage into the host as a named section. The output from the device compilation
 is passed to the host backend using the ``-fembed-offload-object`` flag. This
-inserts the object as a global in the host's IR. The section name contains the
-target triple and architecture that the data corresponds to for later use.
-Typically we will also add an extra string to the section name to prevent it
-from being merged with other sections if the user performs relocatable linking
-on the object.
+embeds the device image into the ``.llvm.offloading`` section using a special
+binary format that behaves like a string map. This binary format is used to
+bundle metadata about the image so the linker can associate the proper device
+linking action with the image. Each device image will start with the magic bytes
 .. code-block:: llvm
-  @llvm.embedded.object = private constant [1 x i8] c"\00", section ".llvm.offloading.nvptx64.sm_70."
+  @llvm.embedded.object = private constant [1 x i8] c"\00", section ".llvm.offloading"
 The device code will then be placed in the corresponding section once the backend
 is run on the host, creating a fat object. Using fat objects allows us to treat
@@ -250,7 +245,7 @@ will use this information when :ref:`Device Linking`.
     | omp_offloading_entries           | Offloading entry information (see :ref:`table-tgt_offload_entry`)  |
-    | .llvm.offloading.<triple>.<arch> | Embedded device object file for the target device and architecture |
+    | .llvm.offloading                 | Embedded device object file for the target device and architecture |
 .. _Device Linking:
@@ -262,9 +257,10 @@ Objects containing :ref:`table-offloading_sections` require special handling to
 create an executable device image. This is done using a Clang tool, see
 :doc:`ClangLinkerWrapper` for more information. This tool works as a wrapper
 over the host linking job. It scans the input object files for the offloading
-sections and runs the appropriate device linking action. The linked device image
-is then :ref:`wrapped <Device Binary Wrapping>` to create the symbols used to load the
-device image and link it with the host.
+section ``.llvm.offloading``. The device files stored in this section are then
+extracted and passed to the appropriate linking job. The linked device image is
+then :ref:`wrapped <Device Binary Wrapping>` to create the symbols used to load
+the device image and link it with the host.
 The linker wrapper tool supports linking bitcode files through link time
 optimization (LTO). This is used whenever the object files embedded in the host
@@ -438,19 +434,22 @@ This code is compiled using the following Clang flags.
     $ clang++ -fopenmp -fopenmp-targets=nvptx64 -O3 zaxpy.cpp -c
-The output section in the object file can be seen using the ``readelf`` utility
+The output section in the object file can be seen using the ``readelf`` utility.
+The ``.llvm.offloading`` section has the ``SHF_EXCLUDE`` flag so it will be
+removed from the final executable or shared library by the linker.
 .. code-block:: text
   $ llvm-readelf -WS zaxpy.o
-  [Nr] Name                                       Type
-  ...
-  [34] omp_offloading_entries                     PROGBITS
-  [35] .llvm.offloading.nvptx64-nvidia-cuda.sm_70 PROGBITS
+  Section Headers:
+  [Nr] Name                   Type     Address          Off    Size   ES Flg Lk Inf Al
+  [11] omp_offloading_entries PROGBITS 0000000000000000 0001f0 000040 00   A  0   0  1
+  [12] .llvm.offloading       PROGBITS 0000000000000000 000260 030950 00   E  0   0  8
 Compiling this file again will invoke the ``clang-linker-wrapper`` utility to
 extract and link the device code stored at the section named
-``.llvm.offloading.nvptx64-nvidia-cuda.sm_70`` and then use entries stored in
+``.llvm.offloading`` and then use entries stored in
 the section named ``omp_offloading_entries`` to create the symbols necessary for
 ``libomptarget`` to register the device image and call the entry function.

diff --git a/clang/docs/OpenMPSupport.rst b/clang/docs/OpenMPSupport.rst
index 54b31f5371334..1292af07a8e41 100644
--- a/clang/docs/OpenMPSupport.rst
+++ b/clang/docs/OpenMPSupport.rst
@@ -95,9 +95,6 @@ Features not supported or with limited support for Cuda devices
 - Nested parallelism: inner parallel regions are executed sequentially.
-- Static linking of libraries containing device code is not supported without
-  explicitly using ``-fopenmp-new-driver``.
 - Automatic translation of math functions in target regions to device-specific
   math functions is not implemented yet.
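
Note on the ``llvm-readelf -WS`` listing in the OffloadingDesign.rst hunk: the updated text says the ``.llvm.offloading`` section carries ``SHF_EXCLUDE``, which shows up as ``E`` in the ``Flg`` column. A minimal sketch of checking that from captured output is given below. It is not part of the patch. The helper name ``excluded_sections`` and the assumption that each row has the eleven columns shown in the listing are illustrative only.

```python
import re

# One section row: "[12] .llvm.offloading PROGBITS <addr> <off> <size> <es> <flg> <lk> <inf> <al>".
# The header row "[Nr] ..." does not match because its index is not numeric.
ROW = re.compile(r"^\s*\[\s*\d+\]\s+(?P<name>\S+)\s+(?P<type>\S+)\s+(?P<rest>.*)$")


def excluded_sections(readelf_output):
    """Return names of sections whose Flg column contains 'E' (SHF_EXCLUDE).

    Hypothetical helper: assumes the column layout of the listing above.
    Rows with an empty Flg column have one field fewer and are skipped.
    """
    names = []
    for line in readelf_output.splitlines():
        match = ROW.match(line)
        if not match:
            continue
        # Remaining columns: Address Off Size ES Flg Lk Inf Al
        tail = match.group("rest").split()
        if len(tail) == 8 and "E" in tail[4]:
            names.append(match.group("name"))
    return names


# Rows copied from the listing in the patch.
SAMPLE = """\
Section Headers:
[Nr] Name                   Type     Address          Off    Size   ES Flg Lk Inf Al
[11] omp_offloading_entries PROGBITS 0000000000000000 0001f0 000040 00   A  0   0  1
[12] .llvm.offloading       PROGBITS 0000000000000000 000260 030950 00   E  0   0  8
"""

print(excluded_sections(SAMPLE))  # -> ['.llvm.offloading']
```

Only ``.llvm.offloading`` is reported, since ``omp_offloading_entries`` is flagged ``A`` (allocated) and must survive into the linked output.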

