[PATCH] D47394: [OpenMP][Clang][NVPTX] Replace bundling with partial linking for the OpenMP NVPTX device offloading toolchain

Samuel Antao via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Tue May 29 08:36:54 PDT 2018


sfantao added a comment.

@gtbercea, thanks for the patch.

> INTEROPERABILITY WITH OTHER COMPILERS: These bundled object files can end up being passed between Clang and other compilers which may lead to incompatibilities: passing a bundled file from Clang to another compiler would lead to that compiler not being able to unbundle it. Passing an unbundled object file to Clang and therefore Clang not knowing that it doesn't need to unbundle it.

I don't understand the interoperability issue here. Clang convention to create the offloading records in the host binary is specific to clang and was discussed, defined and documented. We do not expect other compilers to understand it as we don't expect clang to understand that from other compilers. In terms of the binary itself, it is an ELF file that can be understood by other compilers/tools - albeit the linking would probably fail if the device section addresses are not defined.

I think that what you mean here is that by forcing nvcc as the host linker when one of the targets is a NVPTX target, you shift the (un)bundling part to nvlink and you will be compatible with host compilers supported by nvcc.

I understand that at the moment there is no immediate need for combining multiple targets, and there is an immediate need to support archives with offloading code. Therefore, changing the host linker in function of the offload target seems reasonable as that is what would help most users.

I just want to note that the separation of the toolchains in the driver and the support for multiple toolchains in the same binary were part of the reason we converged to the current design. The idea was to have a generic driver with all the details relative to bundling formats handled by a separate tool, the bundler. Of course the requirements can be reviewed at any time and priorities can change. However, I think it would be cleaner to have the nvlink compatible binary generated by the bundler in the current design. Just my two cents.



================
Comment at: include/clang/Driver/Compilation.h:126
+  /// Whether the clang-offload-bundler can be skipped.
+  bool skipOffloadBundler = false;
+
----------------
Use CamelCase for class local variables.


================
Comment at: include/clang/Driver/Compilation.h:312
+  /// \param skipBundler - bool value set once by the driver.
+  void setSkipOffloadBundler(bool skipBundler);
+
----------------
Why is this a property of the compilation and not of a set of actions referring to a given target? That would allow one to combine in the same compilation targets requiring the bundler and targets that wouldn't. 


================
Comment at: lib/Driver/Compilation.cpp:276
+void Compilation::setSkipOffloadBundler(bool skipBundler) {
+  skipOffloadBundler = skipBundler;
+}
----------------
Given the logic you have below, you are assuming this is not set to false ever. It would be wise to get an assertion here in case you end up having toolchains skipping and others don't. If that is just not supported a diagnostic should be added instead.

The convention is that local variables use CamelCase.


================
Comment at: lib/Driver/Driver.cpp:2927
+    if (OpenMPTargets->getValues().size() > 0) {
+      unsigned triplesRequiringBundler = 0;
+      for (const char *Val : OpenMPTargets->getValues()) {
----------------
CamelCase


================
Comment at: lib/Driver/Driver.cpp:2943
+    }
+  }
+
----------------
Can you just implement this check in the definition of `Compilation: canSkipClangOffloadBundler` and get rid of `setSkipOffloadBundler`? All the requirted information is already in `Compilation` under `C.getInputArgs()`.


================
Comment at: lib/Driver/Driver.cpp:2962
+    // is ld instead but this needs to be replaced.
+    canDoPartialLinking = LinkerName.endswith("/ld");
+  }
----------------
In the current implementation there is no distinction between what is meant for Windows/Linux. This check would only work on Linux and the test below would fail for bots running windows.

Also, I think it makes more sense to have this check part of the `Toolchain` instead of doing it in the driver. The `Toolchain` definition knows the names of the third-party executables, the driver doesn't. 


================
Comment at: lib/Driver/ToolChains/Clang.cpp:5504
+  StringRef LinkerName = getToolChain().GetLinkerPath();
+  assert(LinkerName.endswith("/ld") && "Partial linking not supported.");
+
----------------
I believe this check should be done when the toolchain is created with all the required diagnostics. What happens if the linker does not support partial linking?


================
Comment at: lib/Driver/ToolChains/Cuda.cpp:536
+  }
 }
 
----------------
What prevents all this from being done in the bundler? If I understand it correctly, if the bundler implements this wrapping all the checks for librariers wouldn't be required and, only two changes would be required in the driver:

- generate fatbin instead of cubin. This is straightforward to do by changing the device assembling job. In terms of the loading of the kernels by the device API, doing it through fatbin or cubin should be equivalent except that fatbin enables storing the PTX format and JIT for newer GPUs.
- Use NVIDIA linker as host linker.

This last requirement could be problematic if we get two targets attempting  to use different (incompatible linkers). If we get this kind of incompatibility we should get the appropriate diagnostic.


================
Comment at: test/Driver/openmp-offload.c:497
 // RUN:   %clang -###  -fopenmp=libomp -o %t.out -lsomelib -target powerpc64le-linux -fopenmp-targets=powerpc64le-ibm-linux-gnu,x86_64-pc-linux-gnu %t.i -no-canonical-prefixes 2>&1 \
 // RUN:   | FileCheck -check-prefix=CHK-UBJOBS %s
 // RUN:   %clang -### -fopenmp=libomp -o %t.out -lsomelib -target powerpc64le-linux -fopenmp-targets=powerpc64le-ibm-linux-gnu,x86_64-pc-linux-gnu %t.i -save-temps -no-canonical-prefixes 2>&1 \
----------------
We need a test for the static linking. The host linker has to be nvcc in that case, right?


Repository:
  rC Clang

https://reviews.llvm.org/D47394





More information about the cfe-commits mailing list