[all-commits] [llvm/llvm-project] 0035f7: [CUDA] Create offloading entries when using the ne...

Wed May 11 04:30:45 PDT 2022

  Branch: refs/heads/main
  Home:   https://github.com/llvm/llvm-project
  Commit: 0035f7154c2a80c58aea6c6dfcac548050e4c5e0
      https://github.com/llvm/llvm-project/commit/0035f7154c2a80c58aea6c6dfcac548050e4c5e0
  Author: Joseph Huber <jhuber6 at vols.utk.edu>
  Date:   2022-05-11 (Wed, 11 May 2022)

  Changed paths:
    M clang/include/clang/Basic/LangOptions.def
    M clang/include/clang/Driver/Options.td
    M clang/lib/CodeGen/CGCUDANV.cpp
    M clang/lib/CodeGen/CGCUDARuntime.h
    M clang/lib/Driver/ToolChains/Clang.cpp
    A clang/test/CodeGenCUDA/offloading-entries.cu

  Log Message:
  -----------
  [CUDA] Create offloading entries when using the new driver

The changes made in D123460 generalized the code generation for OpenMP's
offloading entries. We can use the same scheme to register globals for
CUDA code. This patch adds the code generation to create these
offloading entries when compiling using the new offloading driver mode.
The offloading entries are simple structs that contain the information
necessary to register the global. The struct used is as follows:

```
struct __tgt_offload_entry {
  void    *addr;      // Pointer to the offload entry info.
                      // (function or global)
  char    *name;      // Name of the function or global.
  size_t  size;       // Size of the entry info (0 if it is a function).
  int32_t flags;
  int32_t reserved;
};
```

Currently CUDA handles RDC code generation by deferring the registration
of globals in the current TU to a callback function containing the
module's ID. Later all the module IDs will be used to register all of
the globals at once. Rather than copy this scheme, offloading entries
let us follow the way OpenMP registers globals. That is, we create a
simple global struct for each device global to be registered. These are
placed in a special section `cuda_offloading_entries`. Because this
section name is a valid C identifier, the linker will provide `__start`
and `__stop` pointers that we can use to iterate over and register all
globals at runtime.
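
The section mechanism can be sketched on the host alone. The following is
a minimal illustration, not the code Clang emits: it assumes an ELF target
with a GNU-compatible linker, and `offload_entry`, `device_counter`,
`kernel_stub`, and `count_offload_entries` are hypothetical names chosen
for the example. Only the section name and the struct layout come from the
description above.

```c
#include <stddef.h>
#include <stdint.h>

/* Mirrors the entry layout shown earlier in this message. */
struct offload_entry {
  void *addr;
  char *name;
  size_t size;
  int32_t flags;
  int32_t reserved;
};

/* Hypothetical stand-ins for a device global and a kernel stub. */
static int device_counter;
static void kernel_stub(void) {}

/* One entry per global, all placed in the same named section. */
__attribute__((section("cuda_offloading_entries"), used))
static struct offload_entry counter_entry = {
    &device_counter, "device_counter", sizeof(device_counter), 0, 0};

__attribute__((section("cuda_offloading_entries"), used))
static struct offload_entry kernel_entry = {
    (void *)kernel_stub, "kernel_stub", 0, 0, 0};

/* Defined by the linker because the section name is a valid C identifier. */
extern struct offload_entry __start_cuda_offloading_entries[];
extern struct offload_entry __stop_cuda_offloading_entries[];

/* Counts the entries the linker collected into the section. */
size_t count_offload_entries(void) {
  size_t n = 0;
  for (struct offload_entry *e = __start_cuda_offloading_entries;
       e < __stop_cuda_offloading_entries; ++e)
    ++n;
  return n;
}
```

A real registration routine would call into the CUDA runtime inside the
loop instead of counting. The order of entries within the section is not
something this sketch relies on.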

The registration requires a flag variable to indicate which registration
function to use. I have assigned the flags somewhat arbitrarily, but
they use the following values:

Kernel: 0
Variable: 0
Managed: 1
Surface: 2
Texture: 3
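
Since kernels and variables share the flag value 0, something else has to
tell them apart. The sketch below assumes the `size` field does that (the
struct comment says it is 0 for a function). The enumerator names and
`entry_kind` are hypothetical and only illustrate the dispatch; they are
not taken from the patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Mirrors the entry layout shown earlier in this message. */
struct offload_entry {
  void *addr;
  char *name;
  size_t size;
  int32_t flags;
  int32_t reserved;
};

/* Flag values as listed above; the enumerator names are illustrative. */
enum offload_flag {
  FLAG_KERNEL_OR_VARIABLE = 0,
  FLAG_MANAGED = 1,
  FLAG_SURFACE = 2,
  FLAG_TEXTURE = 3
};

/* Returns a label for the kind of registration an entry would need. */
const char *entry_kind(const struct offload_entry *e) {
  switch (e->flags) {
  case FLAG_KERNEL_OR_VARIABLE:
    /* Assumption: a size of 0 marks a function, anything else a variable. */
    return e->size == 0 ? "kernel" : "variable";
  case FLAG_MANAGED:
    return "managed";
  case FLAG_SURFACE:
    return "surface";
  case FLAG_TEXTURE:
    return "texture";
  default:
    return "unknown";
  }
}
```

A runtime loop over the entries could switch on the same values to pick
the matching registration call for each global.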

Depends on D120272

Reviewed By: tra

Differential Revision: https://reviews.llvm.org/D123471

  Commit: e7858a9fab8c11a44868ad4e0572c6c7618b219a
      https://github.com/llvm/llvm-project/commit/e7858a9fab8c11a44868ad4e0572c6c7618b219a
  Author: Joseph Huber <jhuber6 at vols.utk.edu>
  Date:   2022-05-11 (Wed, 11 May 2022)

  Changed paths:
    M clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp
    M clang/tools/clang-linker-wrapper/OffloadWrapper.cpp
    M clang/tools/clang-linker-wrapper/OffloadWrapper.h
    M llvm/include/llvm/Object/OffloadBinary.h

  Log Message:
  -----------
  [Cuda] Add initial support for wrapping CUDA images in the new driver.

This patch adds the initial support for wrapping CUDA images. This
requires changing some of the logic for how we bundle images. We now
need to copy the image for all kinds that are active for the
architecture. Then we need to run a separate wrapping job if the Kind is
Cuda. For CUDA wrapping we need to use the `fatbinary` program from the
CUDA SDK to bundle all the binaries together. This is then passed to a
new function to perform the actual module code generation that will be
implemented in a later patch.
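
As a rough illustration of the bundling step, each device image is handed
to `fatbinary` as one `--image=` argument. The sketch below assumes the
`--image=profile=<arch>,file=<path>` form that Clang's CUDA toolchain uses
when it invokes the tool; `format_image_arg` is a hypothetical helper and
is not part of the patch.

```c
#include <stdio.h>

/* Formats one fatbinary image argument for the given architecture and file.
 * Returns 0 on success, -1 if the buffer is too small or formatting fails. */
int format_image_arg(char *buf, size_t len, const char *arch,
                     const char *file) {
  int n = snprintf(buf, len, "--image=profile=%s,file=%s", arch, file);
  return (n >= 0 && (size_t)n < len) ? 0 : -1;
}
```

A full command line would add the output file and one such argument per
linked device image.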

Depends on D120273 D123471

Reviewed By: tra

Differential Revision: https://reviews.llvm.org/D123810

  Commit: f49d576a882da81292b5730af442fa38899af312
      https://github.com/llvm/llvm-project/commit/f49d576a882da81292b5730af442fa38899af312
  Author: Joseph Huber <jhuber6 at vols.utk.edu>
  Date:   2022-05-11 (Wed, 11 May 2022)

  Changed paths:
    M clang/test/Driver/linker-wrapper-image.c
    M clang/test/Driver/linker-wrapper.c
    M clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp
    M clang/tools/clang-linker-wrapper/OffloadWrapper.cpp

  Log Message:
  -----------
  [CUDA] Add wrapper code generation for registering CUDA images

This patch adds the necessary code generation to create the wrapper code
that registers all the globals in CUDA. We create the necessary
functions and iterate through the entries starting at
`__start_cuda_offloading_entries` to find which globals must be
registered. This is very similar to the code generation done currently
in Clang for non-RDC builds, but here we are registering a fully linked
fatbinary and finding the globals via the above sections.

With this we should be able to fully support basic RDC / LTO building of CUDA
code.

It's also worth noting that this does not include the necessary PTX to JIT the
image, so to use this support the offloading architecture must match the
system's architecture.

Depends on D123810

Reviewed By: tra

Differential Revision: https://reviews.llvm.org/D123812

Compare: https://github.com/llvm/llvm-project/compare/c7ba568f40b2...f49d576a882d