[PATCH] D123471: [CUDA] Create offloading entries when using the new driver

Tue Apr 12 14:36:03 PDT 2022

yaxunl added a comment.

HIP is considering a unified device binary embedding scheme with OpenMP. However, some large MI frameworks are compiled with -fno-gpu-rdc. If they were compiled with -fgpu-rdc, link time would increase significantly, since the post-link optimizations take much longer on the large linked IR. Therefore, it would be desirable for the new OpenMP device binary embedding scheme to support -fno-gpu-rdc mode.

That said, I think this new scheme may work for -fno-gpu-rdc, probably with some minor changes.

For -fno-gpu-rdc, each TU has its own device binary, so the device binaries in the final image would be per GPU and per TU. That does not seem to be a big problem, since each one can be suffixed with a unique ID for its TU.
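
As a rough illustration of the per-GPU, per-TU naming, here is a minimal sketch in C. The `make_image_name` helper, the `<arch>.<tu_id>` format, and the example arch and ID strings are assumptions made up for this sketch; the patch does not define any of them.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper (not part of the patch): builds a device binary name
 * of the assumed form "<arch>.<tu_id>", so that binaries for the same GPU
 * coming from different TUs do not collide.
 * Returns 0 on success, -1 if the name does not fit in `buf`. */
static int make_image_name(char *buf, size_t len,
                           const char *arch, const char *tu_id) {
  assert(buf != NULL && arch != NULL && tu_id != NULL);
  int n = snprintf(buf, len, "%s.%s", arch, tu_id);
  return (n >= 0 && (size_t)n < len) ? 0 : -1;
}
```

How the unique ID is generated is left open here; any value that is stable and distinct per TU would do for this purpose.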

Different offload entries may have the same name in different TUs, so an offload entry may not be uniquely identified by its name alone. To identify an offload entry uniquely, we need both its name and a pointer to the device binary it belongs to. Therefore, it would be desirable to have one extra field, 'owner':

  struct __tgt_offload_entry {
    void    *addr;      // Pointer to the offload entry info.
                        // (function or global)
    char    *name;      // Name of the function or global.
    size_t  size;       // Size of the entry info (0 if it is a function).
    int32_t flags;
    void    *owner;     // Pointer to the device binary containing this
                        // offload entry.
    int32_t reserved;
  };

It may be possible to use the `reserved` field for that purpose. However, it is unclear whether `reserved` will be needed for some other purpose later.
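
To make the (name, owner) identification concrete, here is a minimal sketch in C of how a lookup might use the extra field. `struct offload_entry` is a stand-in that mirrors the proposed layout, and `find_entry` is a hypothetical helper; neither exists in the runtime.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for the proposed entry layout, including the `owner` field. */
struct offload_entry {
  void    *addr;
  char    *name;
  size_t  size;
  int32_t flags;
  void    *owner;     /* Device binary that contains this entry. */
  int32_t reserved;
};

/* Hypothetical helper: returns the entry named `name` that belongs to the
 * device binary `owner`, or NULL if there is no such entry. Matching on the
 * name alone would be ambiguous when two TUs define the same name. */
static const struct offload_entry *
find_entry(const struct offload_entry *entries, size_t count,
           const char *name, const void *owner) {
  assert(name != NULL);
  for (size_t i = 0; i < count; ++i) {
    if (entries[i].owner == owner && strcmp(entries[i].name, name) == 0)
      return &entries[i];
  }
  return NULL;
}
```

With two entries both named `kernel` but owned by different device binaries, the lookup resolves each one by passing the corresponding binary pointer.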

Another choice is to let addr point to a struct which contains owner info. However, that would introduce another level of indirection.
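
A minimal sketch of that alternative, in C, might look as follows. `struct entry_info` and the two accessors are hypothetical names invented for illustration; the entry struct keeps its existing five fields and `addr` points at the wrapper instead of at the function or global itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical wrapper that `addr` would point to in this variant. */
struct entry_info {
  void *target;   /* The function or global itself. */
  void *owner;    /* Device binary that contains the entry. */
};

/* Stand-in for the existing entry layout, left unchanged. */
struct offload_entry {
  void    *addr;      /* Points to a struct entry_info in this variant. */
  char    *name;
  size_t  size;
  int32_t flags;
  int32_t reserved;
};

/* Every access now pays one extra pointer dereference. */
static void *entry_target(const struct offload_entry *e) {
  assert(e != NULL && e->addr != NULL);
  return ((const struct entry_info *)e->addr)->target;
}

static void *entry_owner(const struct offload_entry *e) {
  assert(e != NULL && e->addr != NULL);
  return ((const struct entry_info *)e->addr)->owner;
}
```

This keeps the entry layout as it is, at the cost of the extra load in both accessors.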

Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D123471