[PATCH] D47394: [OpenMP][Clang][NVPTX] Replace bundling with partial linking for the OpenMP NVPTX device offloading toolchain

Tue Jul 31 06:19:17 PDT 2018

gtbercea marked 3 inline comments as done.
gtbercea added a comment.

Answers to comments.

================
Comment at: include/clang/Driver/Compilation.h:312
+  /// \param skipBundler - bool value set once by the driver.
+  void setSkipOffloadBundler(bool skipBundler);
+
----------------
sfantao wrote:
> gtbercea wrote:
> > sfantao wrote:
> > > Why is this a property of the compilation and not of a set of actions referring to a given target? That would allow one to combine in the same compilation targets requiring the bundler and targets that wouldn't. 
> > This was a way to pass this information to the OpenMP NVPTX device toolchain.
> > 
> > Both the Driver OpenMP NVPTX toolchain need to agree on the usage of the new scheme (proposed in this patch) or the old scheme (the one that is in the compiler today).
> > 
> > 
> I understand, but the way I see it is that it is the toolchain that skips the bundler not the compilation. I understand that as of this patch, you skip only if there is a single nvptx target. If you have more than one target, as some tests do, some toolchains will still need the bundler. So, we are making what happens with the nvptx target dependent of other toolchains. Is this an intended effect of this patch?
Bundler is skipped only for the OpenMP NVPTX toolchain. I'm not sure what you mean by "other toolchain".

================
Comment at: lib/Driver/Compilation.cpp:276
+void Compilation::setSkipOffloadBundler(bool skipBundler) {
+  skipOffloadBundler = skipBundler;
+}
----------------
sfantao wrote:
> gtbercea wrote:
> > sfantao wrote:
> > > Given the logic you have below, you are assuming this is not set to false ever. It would be wise to get an assertion here in case you end up having toolchains skipping and others don't. If that is just not supported a diagnostic should be added instead.
> > > 
> > > The convention is that local variables use CamelCase.
> > The checks I added in the Driver will set this flag to true if all toolchains Clang offloads to support the skipping of the bundler/unbundler for object files. Currently only NVPTX toolchain can skip the bundler/unbundler for object files so the code path in this patch will be taken only for:
> > 
> > -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda
> Ok, if that is the case, just add an assertion here.
If one of the toolchains in the list of toolchains can't skip then none of them skip. If all can skip then they all skip. What assertion would you like me to add?

================
Comment at: lib/Driver/Driver.cpp:2943
+    }
+  }
+
----------------
sfantao wrote:
> gtbercea wrote:
> > sfantao wrote:
> > > Can you just implement this check in the definition of `Compilation: canSkipClangOffloadBundler` and get rid of `setSkipOffloadBundler`? All the requirted information is already in `Compilation` under `C.getInputArgs()`.
> > The driver needs to have the result of this check available. The flag is passed to the step which adds host-device dependencies. If the bundler can be skipped then the unbundling action is not required.
> > 
> > I guess this could be implemented in Compilation. Even so I would like it to happen only once like it does here and not every time someone queries the "can I skip the bundler" flag.
> > 
> > I wanted this check to happen only once hence why I put in on the driver side. The result of this check needs to be available in Driver.cpp and in Cuda.cpp files (see usage in this patch). Compilation keeps track of the flag because skipping the bundler is an all or nothing flag: you can skip the bundler/unbundler for object files if and only if all toolchains you are offloading to can skip it.
> > 
> Right, in these circumstances "can skip bundler" is the same as "do I have a single toolchain" and "is that toolchain nvptx". This is fairly inexpensive to do, so I don't really see the need to record this state in the driver. It will also be clearer what are the conditions for which you skip the bundler.
That is true for now but if more toolchains get added to the list of toolchains that can skip the bundler then you want to factor it out and make it happen only once in a toolchain-independent point in the code. Otherwise you will carry that list of toolchains everywhere in the code where you need to do the check.

Also if you are to do this at toolchain level you will not be able to check if the other toolchains were able to skip or not. For now ALL toolchains must skip or ALL toolchains don't skip the bundler.

================
Comment at: lib/Driver/ToolChains/Cuda.cpp:496
+              ? CudaVirtualArchToString(VirtualArchForCudaArch(gpu_arch))
+              : GPUArch.str().c_str();
+      const char *PtxF =
----------------
sfantao wrote:
> Why don't create fatbins instead of cubins in all cases. For the purposes of OpenMP they are equivalent, i.e. nvlink can interpret them in the same way, no?
I'm not sure why the comment is attached to this particular line in the code.

But the reason why I don't use fatbins everywhere is because I want to leave the previous toolchain intact. So when the bundler is not skipped we do precisely what we did before.

Repository:
  rC Clang

https://reviews.llvm.org/D47394