[PATCH] D47394: [OpenMP][Clang][NVPTX] Replace bundling with partial linking for the OpenMP NVPTX device offloading toolchain

Tue Jul 31 04:19:42 PDT 2018

sfantao added a comment.

Hi Doru,

Thanks for updating the patch. I've a few comments below.

================
Comment at: include/clang/Driver/Compilation.h:312
+  /// \param skipBundler - bool value set once by the driver.
+  void setSkipOffloadBundler(bool skipBundler);
+
----------------
gtbercea wrote:
> sfantao wrote:
> > Why is this a property of the compilation and not of a set of actions referring to a given target? That would allow one to combine in the same compilation targets requiring the bundler and targets that wouldn't. 
> This was a way to pass this information to the OpenMP NVPTX device toolchain.
> 
> Both the Driver OpenMP NVPTX toolchain need to agree on the usage of the new scheme (proposed in this patch) or the old scheme (the one that is in the compiler today).
> 
> 
I understand, but the way I see it is that it is the toolchain that skips the bundler not the compilation. I understand that as of this patch, you skip only if there is a single nvptx target. If you have more than one target, as some tests do, some toolchains will still need the bundler. So, we are making what happens with the nvptx target dependent of other toolchains. Is this an intended effect of this patch?

================
Comment at: lib/Driver/Compilation.cpp:276
+void Compilation::setSkipOffloadBundler(bool skipBundler) {
+  skipOffloadBundler = skipBundler;
+}
----------------
gtbercea wrote:
> sfantao wrote:
> > Given the logic you have below, you are assuming this is not set to false ever. It would be wise to get an assertion here in case you end up having toolchains skipping and others don't. If that is just not supported a diagnostic should be added instead.
> > 
> > The convention is that local variables use CamelCase.
> The checks I added in the Driver will set this flag to true if all toolchains Clang offloads to support the skipping of the bundler/unbundler for object files. Currently only NVPTX toolchain can skip the bundler/unbundler for object files so the code path in this patch will be taken only for:
> 
> -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda
Ok, if that is the case, just add an assertion here.

================
Comment at: lib/Driver/Driver.cpp:2943
+    }
+  }
+
----------------
gtbercea wrote:
> sfantao wrote:
> > Can you just implement this check in the definition of `Compilation: canSkipClangOffloadBundler` and get rid of `setSkipOffloadBundler`? All the requirted information is already in `Compilation` under `C.getInputArgs()`.
> The driver needs to have the result of this check available. The flag is passed to the step which adds host-device dependencies. If the bundler can be skipped then the unbundling action is not required.
> 
> I guess this could be implemented in Compilation. Even so I would like it to happen only once like it does here and not every time someone queries the "can I skip the bundler" flag.
> 
> I wanted this check to happen only once hence why I put in on the driver side. The result of this check needs to be available in Driver.cpp and in Cuda.cpp files (see usage in this patch). Compilation keeps track of the flag because skipping the bundler is an all or nothing flag: you can skip the bundler/unbundler for object files if and only if all toolchains you are offloading to can skip it.
> 
Right, in these circumstances "can skip bundler" is the same as "do I have a single toolchain" and "is that toolchain nvptx". This is fairly inexpensive to do, so I don't really see the need to record this state in the driver. It will also be clearer what are the conditions for which you skip the bundler.

================
Comment at: lib/Driver/ToolChains/Cuda.cpp:496
+              ? CudaVirtualArchToString(VirtualArchForCudaArch(gpu_arch))
+              : GPUArch.str().c_str();
+      const char *PtxF =
----------------
Why don't create fatbins instead of cubins in all cases. For the purposes of OpenMP they are equivalent, i.e. nvlink can interpret them in the same way, no?

================
Comment at: lib/Driver/ToolChains/Cuda.cpp:517
+    //   --image=profile=sm_35 at compute_35,file=ompprint.compute_35.sm_35.cubin
+    //   --embedded-fatbin=ompprint.fatbin.c --cuda --device-c
+    const char *Exec = Args.MakeArgString(TC.GetProgramPath("fatbinary"));
----------------
I'd move this comment to the top of this session so that we know what is going on in the code above.

================
Comment at: lib/Driver/ToolChains/Cuda.cpp:524
+    // the hash of the full path of the file.
+    std::hash<std::string> hash_fn;
+    size_t hash = hash_fn(llvm::sys::path::filename(Output.getFilename()));
----------------
CamelCase

================
Comment at: lib/Driver/ToolChains/Cuda.cpp:664
+      // Anything that's not a file name is potentially a static library
+      // so treat it as such.
+      if (C.canSkipOffloadBundler())
----------------
So, what if it is not a static library?

================
Comment at: test/Driver/openmp-offload-gpu-linux.c:24
+// CHK-PTXAS-CUBIN-BUNDLING: clang++{{.*}}" "-c" "-o" "[[HOSTDEV:.*\.o]]"{{.*}}" "[[FATBINC]]" "-D__NV_MODULE_ID=
+// CHK-PTXAS-CUBIN-BUNDLING-NOT: clang-offload-bundler{{.*}}" "-type=o" {{.*}}"-inputs={{.*}}[[CUBIN]]
+// CHK-PTXAS-CUBIN-BUNDLING: ld" "-r" "[[HOSTDEV]]" "{{.*}}.o" "-o" "{{.*}}.o"
----------------
`clang-offload-bundler` should be sufficient here.

Repository:
  rC Clang

https://reviews.llvm.org/D47394