[PATCH] D95313: [WIP] Move part of nvptx devicertl under clang

Sun Jan 24 13:51:38 PST 2021

tianshilei1992 added a comment.

In general we're moving in the direction where the target-specific implementation is compiled along with user code, which is fantastic. This way, we only need to provide one bitcode library per target. However, the FE change is somewhat inefficient: if the user code spans multiple files, the target-specific header is included multiple times and therefore compiled multiple times. A more efficient approach is to change the driver workflow, probably as follows:

1. Compile the target implementation to `t.bc`.
2. Link `t.bc` and `libomptarget-[arch].bc` into `libomptarget.bc`.
3. Compile user code, which itself takes multiple steps. `libomptarget.bc` is fed into the FE in this step.
4. Remaining steps...

================
Comment at: clang/lib/Driver/ToolChains/Clang.cpp:1204
+    {
+      auto *CTC = static_cast<const toolchains::CudaToolChain *>(
+          C.getSingleOffloadToolChain<Action::OFK_Cuda>());
----------------
JonChesterfield wrote:
> Logic very like this could pick out a second, small devicertl bitcode library
Can we just use one header with different macros, like we do now?

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D95313/new/

https://reviews.llvm.org/D95313