[PATCH] D123471: [CUDA] Create offloading entries when using the new driver

Tue Apr 12 14:57:20 PDT 2022

jhuber6 added a comment.

In D123471#3446751 <https://reviews.llvm.org/D123471#3446751>, @yaxunl wrote:

> HIP is considering a unified device binary embedding scheme with OpenMP. However, some large MI frameworks are compiled with -fno-gpu-rdc. If they were compiled with -fgpu-rdc, link time would increase significantly, since the post-link optimizations take much longer on the large linked IR. Therefore, it would be desirable for the new OpenMP device binary embedding scheme to support -fno-gpu-rdc mode.

This work should be very close to that. The new driver allows us to link everything together, so OpenMP can call HIP / CUDA functions and vice versa. I have done some preliminary tests with registering CUDA device variables with OpenMP; the only change required is to store these offloading entries in the `omp_offloading_entries` section, and the OpenMP runtime will pick them up and try to register them. This method allows us to compile HIP / CUDA with OpenMP, but since we would be registering two different images, each would have its own state. For full interoperability we'd need some way to make either HIP / CUDA or OpenMP "borrow" the other's registered image so they can share state.

> That said, I think this new scheme may work for -fno-gpu-rdc, probably with some minor changes.

My understanding is that non-RDC builds do all the registration per-TU. If so, we should be able to link them as we do now, since they won't emit any device code that needs to be linked. Individual files could then specify no-RDC, and the later device linker run would not touch them.

> For -fno-gpu-rdc, each TU has its own device binary, so the device binaries in the final image would be per GPU and per TU. That does not seem to be a big problem, since they can be suffixed with a unique ID for each TU.
>
> Different offload entries may have the same name in different TUs, so an offload entry may not be uniquely identified by its name. To uniquely identify an offload entry, we need both its name and a pointer to the device binary it belongs to. Therefore, it would be desirable to have one extra field, 'owner':
>
>   struct __tgt_offload_entry {
>     void    *addr;      // Pointer to the offload entry info
>                         // (function or global).
>     char    *name;      // Name of the function or global.
>     size_t  size;       // Size of the entry info (0 if it is a function).
>     int32_t flags;
>     void    *owner;     // Pointer to the device binary containing this offload entry.
>     int32_t reserved;
>   };
>
> It may be possible to use the `reserved` field for that purpose. However, it is unclear whether `reserved` will be needed for some other purpose later.

For OpenMP we use an `exec_mode` global to control some aspects of kernel execution, and there's a possibility we'd want to put it in the reserved field instead. We could add more fields to this struct, but that would break the ABI. We could work around that, but it would add some complexity.
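
To illustrate the ABI concern, here is a minimal sketch comparing the existing field layout with the proposed one. The struct names `tgt_offload_entry_v1` / `tgt_offload_entry_v2` and the helper `layouts_compatible` are hypothetical, made up for this example only. The runtime walks the entries as an array, so any change in size or field offsets means objects built against the old layout can't be mixed with a runtime expecting the new one.

```c
#include <stddef.h>
#include <stdint.h>

/* Existing fields only (hypothetical name, for illustration). */
struct tgt_offload_entry_v1 {
  void *addr;
  char *name;
  size_t size;
  int32_t flags;
  int32_t reserved;
};

/* Proposed layout with `owner` placed before `reserved`
 * (hypothetical name, for illustration). */
struct tgt_offload_entry_v2 {
  void *addr;
  char *name;
  size_t size;
  int32_t flags;
  void *owner;
  int32_t reserved;
};

/* Returns nonzero only if an array of v2 entries could be walked with the
 * v1 stride and `reserved` is still found at the same offset. */
int layouts_compatible(void) {
  return sizeof(struct tgt_offload_entry_v1) ==
             sizeof(struct tgt_offload_entry_v2) &&
         offsetof(struct tgt_offload_entry_v1, reserved) ==
             offsetof(struct tgt_offload_entry_v2, reserved);
}
```

On a typical LP64 target I'd expect the first layout to be 32 bytes and the second 48 because of the padding around `owner`, so the check above fails.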

> Another choice is to let addr point to a struct which contains owner info. However, that would introduce another level of indirection.

Yeah, I think for arbitrary extensions that would be the easiest way to avoid breaking the ABI. We could use the reserved field to indicate whether an "extension" is present.
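
A rough sketch of what that could look like, keeping the existing entry layout untouched. Everything here is an assumption for discussion rather than existing runtime code: the `ENTRY_HAS_EXTENSION` flag value, the `entry_extension` struct, and the helper names are all hypothetical. When the flag is set in `reserved`, `addr` points at an extension record that carries the real address plus the owner, which costs the extra indirection mentioned above.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical bit in `reserved`; not an existing LLVM constant. */
#define ENTRY_HAS_EXTENSION 1

/* Same fields and order as the existing entry (hypothetical name). */
struct offload_entry {
  void *addr;
  char *name;
  size_t size;
  int32_t flags;
  int32_t reserved;
};

/* Hypothetical record that `addr` points to when the flag is set. */
struct entry_extension {
  void *addr;  /* The original function or global. */
  void *owner; /* Device binary containing this entry. */
};

/* Resolve the real address, following the extension if present. */
void *entry_addr(const struct offload_entry *e) {
  if (e->reserved & ENTRY_HAS_EXTENSION)
    return ((const struct entry_extension *)e->addr)->addr;
  return e->addr;
}

/* Owner of the entry, or NULL for entries without an extension. */
void *entry_owner(const struct offload_entry *e) {
  if (e->reserved & ENTRY_HAS_EXTENSION)
    return ((const struct entry_extension *)e->addr)->owner;
  return NULL;
}

/* Look an entry up by (name, owner), so that entries with the same name
 * coming from different TUs can be told apart. */
const struct offload_entry *find_entry(const struct offload_entry *entries,
                                       size_t count, const char *name,
                                       const void *owner) {
  for (size_t i = 0; i < count; ++i)
    if (strcmp(entries[i].name, name) == 0 &&
        entry_owner(&entries[i]) == owner)
      return &entries[i];
  return NULL;
}
```

Entries emitted without the flag keep their current meaning, so objects produced by an older compiler would still register as before.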

I think we're working through some similar stuff. I haven't worked much with HIP but I think there would be some benefit to bringing this all under the new driver I've been working on for OpenMP. Let me know if you want to collaborate on something for getting this to work with HIP.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D123471/new/

https://reviews.llvm.org/D123471