[PATCH] D47394: [OpenMP][Clang][NVPTX] Replace bundling with partial linking for the OpenMP NVPTX device offloading toolchain

Mon Jun 11 12:52:21 PDT 2018

gtbercea marked 3 inline comments as done.
gtbercea added inline comments.

================
Comment at: include/clang/Driver/Compilation.h:312
+  /// \param skipBundler - bool value set once by the driver.
+  void setSkipOffloadBundler(bool skipBundler);
+
----------------
sfantao wrote:
> Why is this a property of the compilation and not of a set of actions referring to a given target? That would allow one to combine in the same compilation targets requiring the bundler and targets that wouldn't. 
This was a way to pass this information to the OpenMP NVPTX device toolchain.

Both the Driver OpenMP NVPTX toolchain need to agree on the usage of the new scheme (proposed in this patch) or the old scheme (the one that is in the compiler today).

================
Comment at: lib/Driver/Compilation.cpp:276
+void Compilation::setSkipOffloadBundler(bool skipBundler) {
+  skipOffloadBundler = skipBundler;
+}
----------------
sfantao wrote:
> Given the logic you have below, you are assuming this is not set to false ever. It would be wise to get an assertion here in case you end up having toolchains skipping and others don't. If that is just not supported a diagnostic should be added instead.
> 
> The convention is that local variables use CamelCase.
The checks I added in the Driver will set this flag to true if all toolchains Clang offloads to support the skipping of the bundler/unbundler for object files. Currently only NVPTX toolchain can skip the bundler/unbundler for object files so the code path in this patch will be taken only for:

-fopenmp -fopenmp-targets=nvptx64-nvidia-cuda

================
Comment at: lib/Driver/Driver.cpp:2943
+    }
+  }
+
----------------
sfantao wrote:
> Can you just implement this check in the definition of `Compilation: canSkipClangOffloadBundler` and get rid of `setSkipOffloadBundler`? All the requirted information is already in `Compilation` under `C.getInputArgs()`.
The driver needs to have the result of this check available. The flag is passed to the step which adds host-device dependencies. If the bundler can be skipped then the unbundling action is not required.

I guess this could be implemented in Compilation. Even so I would like it to happen only once like it does here and not every time someone queries the "can I skip the bundler" flag.

I wanted this check to happen only once hence why I put in on the driver side. The result of this check needs to be available in Driver.cpp and in Cuda.cpp files (see usage in this patch). Compilation keeps track of the flag because skipping the bundler is an all or nothing flag: you can skip the bundler/unbundler for object files if and only if all toolchains you are offloading to can skip it.

================
Comment at: lib/Driver/Driver.cpp:2962
+    // is ld instead but this needs to be replaced.
+    canDoPartialLinking = LinkerName.endswith("/ld");
+  }
----------------
sfantao wrote:
> In the current implementation there is no distinction between what is meant for Windows/Linux. This check would only work on Linux and the test below would fail for bots running windows.
> 
> Also, I think it makes more sense to have this check part of the `Toolchain` instead of doing it in the driver. The `Toolchain` definition knows the names of the third-party executables, the driver doesn't. 
Currently this is only meant to work with ld because that we know for sure is a linker which supports partial linking. If other linkers also support it then they can be added here. Not finding ld will lead to the old scheme being used.

You are correct the test needs to be fixed.

================
Comment at: lib/Driver/ToolChains/Clang.cpp:5504
+  StringRef LinkerName = getToolChain().GetLinkerPath();
+  assert(LinkerName.endswith("/ld") && "Partial linking not supported.");
+
----------------
sfantao wrote:
> I believe this check should be done when the toolchain is created with all the required diagnostics. What happens if the linker does not support partial linking?
The linker should always support partial linking at this point, hence the assert. The linker not supporting partial linking is determined at a much earlier step so by this point we should already have that information. In the eventuality that some corner case arises that I may have missed then this assert is triggered. But it would be really unexpected for this assert to be triggered since choosing this action is based on the linker supporting partial linking.

================
Comment at: lib/Driver/ToolChains/Cuda.cpp:536
+  }
 }

----------------
sfantao wrote:
> gtbercea wrote:
> > sfantao wrote:
> > > What prevents all this from being done in the bundler? If I understand it correctly, if the bundler implements this wrapping all the checks for librariers wouldn't be required and, only two changes would be required in the driver:
> > > 
> > > - generate fatbin instead of cubin. This is straightforward to do by changing the device assembling job. In terms of the loading of the kernels by the device API, doing it through fatbin or cubin should be equivalent except that fatbin enables storing the PTX format and JIT for newer GPUs.
> > > - Use NVIDIA linker as host linker.
> > > 
> > > This last requirement could be problematic if we get two targets attempting  to use different (incompatible linkers). If we get this kind of incompatibility we should get the appropriate diagnostic.
> > What prevents it is the fact that the bundler is called AFTER the HOST and DEVICE object files have been produced. The creation of the fatbin (FATBINARY + CALNG++) needs to happen within the NVPTX toolchain.
> > 
> Why does it have to happen in NVPTX toolchain, you are making the NVPTX toolchain generate an ELF object from another toolchain, right? What I'm suggesting is to do the stuff that mixes two (or more) toolchains in the bundler. Your inputs are still a fatbin and a host file.   
I think my latest update to the patch description should clarify this part here.

Repository:
  rC Clang

https://reviews.llvm.org/D47394