[PATCH] D47394: [OpenMP][Clang][NVPTX] Replace bundling with partial linking for the OpenMP NVPTX device offloading toolchain

Fri May 25 14:33:52 PDT 2018

gtbercea created this revision.
gtbercea added reviewers: Hahnfeld, hfinkel, caomhin, carlo.bertolli, tra.
Herald added subscribers: cfe-commits, guansong.

So far, the clang-offload-bundler has been the default tool for bundling together the various file types produced by the different OpenMP offloading toolchains supported by Clang. It does a great job for file types such as .bc, .ll, .ii, .ast. It is also used for bundling object files. Object files are a special case, in particular object files which contain sections meant to be executed on devices other than the host (as is the case for the OpenMP NVPTX toolchain). The bundling of object files prevents:

- STATIC LINKING: These bundled object files can be part of static libraries, which means that each such object file requires an unbundling step. If an object file in a static library requires "unbundling", then we need to know the whereabouts of that library and of its files before the actual link step, which makes it impossible to do static linking using the "-L/path/to/lib/folder -labc" flags.
- INTEROPERABILITY WITH OTHER COMPILERS: These bundled object files can end up being passed between Clang and other compilers, which may lead to incompatibilities: passing a bundled file from Clang to another compiler would lead to that compiler not being able to unbundle it. Conversely, when an unbundled object file is passed to Clang, Clang has no way of knowing that it doesn't need to unbundle it.

**Goal:**
Disable the use of the clang-offload-bundler for bundling/unbundling object files which contain OpenMP NVPTX device offloaded code. This applies to the case where the following set of flags is passed to Clang:
-fopenmp -fopenmp-targets=nvptx64-nvidia-cuda
When the above condition is not met the compiler works as it does today by invoking the clang-offload-bundler for bundling/unbundling object files (at the cost of static linking and interoperability).
The clang-offload-bundler usage on files other than object files is not affected by this patch.
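
The rule above can be sketched as a small predicate. This is only an illustration of the stated condition, not the code in the patch: `FileKind` and `skipsBundlerForFile` are hypothetical names, and the sketch assumes that only the exact single-target case from the description disables bundling (the description does not say what happens with mixed target lists, so the sketch conservatively keeps bundling there).

```cpp
#include <string>
#include <vector>

// Hypothetical file-kind tag for illustration; the real driver has its own type IDs.
enum class FileKind { Object, Bitcode, LLVMIR, Preprocessed, AST };

// Returns true when an object file would bypass the clang-offload-bundler
// under the rule in the description: -fopenmp together with
// -fopenmp-targets=nvptx64-nvidia-cuda, and only for object files.
// Assumption: any other target list keeps today's bundling behaviour.
bool skipsBundlerForFile(bool fopenmp,
                         const std::vector<std::string> &openmpTargets,
                         FileKind kind) {
  const bool nvptxOnly = fopenmp && openmpTargets.size() == 1 &&
                         openmpTargets[0] == "nvptx64-nvidia-cuda";
  // Files other than object files are not affected.
  return nvptxOnly && kind == FileKind::Object;
}
```

For example, `skipsBundlerForFile(true, {"nvptx64-nvidia-cuda"}, FileKind::Bitcode)` is false, since .bc files still go through the bundler.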

**Extensibility:**
Although this patch disables bundling/unbundling of object files via the clang-offload-bundler for the OpenMP NVPTX device offloading toolchain ONLY, this functionality can be extended to other platforms/systems where:

- the device toolchain can produce a host-compatible object AND
- partial linking of host objects is supported.

**The solution:**
The solution enables the OpenMP NVPTX toolchain to produce an object file which is host-compatible (when compiling with -c). The host-compatible file is produced using several steps:
Step 1 (already exists): invoke PTXAS on the .s file to obtain a .cubin.
Step 2 (new step): invoke the FATBIN tool (this tool comes with every standard CUDA installation) which creates a CUDA fatbinary that contains both the PTX code (the .s file) and the .cubin file. This same tool can wrap the resulting .fatbin file in a C/C++ wrapper thus creating a .fatbin.c file.
Step 3 (new step): call clang++ on the .fatbin.c file to create a .o file which is host-compatible.

Once this device-side host-compatible file has been produced by the NVPTX toolchain, one further step is needed:
Step 4 (new step): invoke a linker supporting partial linking (currently using "ld -r") to link the host-compatible device object file against the original host object file. The result is one single object file which can now safely be passed to another compiler or included in a static library.
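
To make the data flow between the four steps easier to follow, here is a minimal sketch that lists each step with the file it consumes and the file it produces. It is not taken from the patch: the `Step` struct, the `hostCompatiblePipeline` helper and all file names (such as `-host.o` and `-device.o`) are hypothetical, tool names are copied from the description, and no command-line flags are modelled apart from the "ld -r" mentioned above.

```cpp
#include <string>
#include <vector>

// One stage of the pipeline described above (illustrative only).
struct Step {
  std::string tool;   // tool name as used in the description
  std::string input;  // file(s) consumed, space separated
  std::string output; // file produced
  bool isNew;         // true if the description marks the step as new
};

// Builds the four steps for a translation unit with the given base name.
// Assumption: file names are derived from the base name purely for illustration.
std::vector<Step> hostCompatiblePipeline(const std::string &base) {
  const std::string ptx = base + ".s";
  const std::string cubin = base + ".cubin";
  const std::string wrapper = base + ".fatbin.c";
  const std::string deviceObj = base + "-device.o";
  const std::string hostObj = base + "-host.o";
  return {
      {"PTXAS", ptx, cubin, false},                        // Step 1
      {"FATBIN", ptx + " " + cubin, wrapper, true},        // Step 2
      {"clang++", wrapper, deviceObj, true},               // Step 3
      {"ld -r", hostObj + " " + deviceObj, base + ".o", true}, // Step 4
  };
}
```

Calling `hostCompatiblePipeline("foo")` yields four entries ending in a single `foo.o`, which is the file that can be placed in a static library or handed to another compiler.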

**Passing final object file to clang:**
This file doesn't require unbundling, so a call to "clang-offload-bundler --unbundle" is NOT required.
The compiler needs to be notified that the object file contains an "offloaded device part" by using: "-fopenmp -fopenmp-targets=nvptx64-nvidia-cuda". This will invoke the OpenMP NVPTX toolchain and it will call only NVLINK on this file.

**Passing final object file to clang inside a static lib "libabc.a" passed to clang via: "-L/path/to/lib/folder -labc":**
Call clang with "-fopenmp -fopenmp-targets=nvptx64-nvidia-cuda" to trigger the NVPTX toolchain.
The -L path along with the -labc will be passed to NVLINK which will perform the "static linking".
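
As a rough illustration of the forwarding described here, the sketch below picks the library search paths and library names out of a driver argument list. It is not the driver's actual logic: `libraryArgsForDeviceLink` is a hypothetical helper, and it assumes only the joined spellings ("-L<dir>" and "-l<name>") are used.

```cpp
#include <string>
#include <vector>

// Returns the arguments that would be forwarded to the device link step:
// everything starting with "-L" or "-l", in the original order.
// Assumption: separated forms such as "-L <dir>" are not handled.
std::vector<std::string>
libraryArgsForDeviceLink(const std::vector<std::string> &driverArgs) {
  std::vector<std::string> forwarded;
  for (const std::string &arg : driverArgs) {
    const bool isSearchPath = arg.rfind("-L", 0) == 0;
    const bool isLibrary = arg.rfind("-l", 0) == 0;
    if (isSearchPath || isLibrary)
      forwarded.push_back(arg);
  }
  return forwarded;
}
```

Given the arguments "-fopenmp -fopenmp-targets=nvptx64-nvidia-cuda main.o -L/path/to/lib/folder -labc", the helper keeps only "-L/path/to/lib/folder" and "-labc".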

Repository:
  rC Clang

https://reviews.llvm.org/D47394

Files:
  include/clang/Driver/Action.h
  include/clang/Driver/Compilation.h
  include/clang/Driver/Driver.h
  include/clang/Driver/ToolChain.h
  lib/Driver/Action.cpp
  lib/Driver/Compilation.cpp
  lib/Driver/Driver.cpp
  lib/Driver/ToolChain.cpp
  lib/Driver/ToolChains/Clang.cpp
  lib/Driver/ToolChains/Clang.h
  lib/Driver/ToolChains/Cuda.cpp
  test/Driver/openmp-offload-gpu.c
  test/Driver/openmp-offload.c

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D47394.148665.patch
Type: text/x-patch
Size: 30589 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/cfe-commits/attachments/20180525/7bb87bde/attachment-0001.bin>