[PATCH] D123471: [CUDA] Create offloading entries when using the new driver
Yaxun Liu via Phabricator via cfe-commits
cfe-commits at lists.llvm.org
Tue Apr 12 14:36:03 PDT 2022
yaxunl added a comment.
HIP is considering a unified device binary embedding scheme with OpenMP. However, some large ML frameworks are compiled with -fno-gpu-rdc. Compiling them with -fgpu-rdc would significantly increase link time, because post-link optimizations take much longer on the large linked IR. Therefore, it would be desirable for the new OpenMP device binary embedding scheme to support -fno-gpu-rdc mode.
That said, I think this new scheme may work for -fno-gpu-rdc, probably with some minor changes.
For -fno-gpu-rdc, each TU has its own device binary, so the final image would contain one device binary per GPU and per TU. That does not seem to be a big problem, since each binary can be suffixed with a unique ID for its TU.
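As a rough illustration of the per-TU suffixing idea, the helper below appends a TU ID to a base image name so that two TUs never produce identically named device binaries. The function name `make_tu_image_name`, the `.tu<N>` suffix format, and the base name are hypothetical; the comment does not specify a naming scheme.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: build a per-TU device image name by appending a
 * unique TU ID to a base name, e.g. "device_image" + 3 -> "device_image.tu3".
 * Returns the number of characters written (excluding the NUL), or -1
 * if the buffer is too small or snprintf fails. */
int make_tu_image_name(char *buf, size_t buf_size, const char *base,
                       unsigned tu_id) {
  assert(buf != NULL && base != NULL);
  int n = snprintf(buf, buf_size, "%s.tu%u", base, tu_id);
  if (n < 0 || (size_t)n >= buf_size)
    return -1; /* encoding error or truncated output */
  return n;
}
```

How the unique ID is chosen (a counter, a hash of the TU path, etc.) is left open here; the only requirement is that it differs between TUs linked into the same program.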
Different offload entries may have the same name in different TUs, so an offload entry cannot be uniquely identified by its name alone. To identify an offload entry uniquely, both its name and a pointer to the device binary it belongs to are needed. Therefore, it would be desirable to add one extra field, 'owner':
struct __tgt_offload_entry {
  void *addr;       // Pointer to the offload entry info
                    // (function or global).
  char *name;       // Name of the function or global.
  size_t size;      // Size of the entry info (0 if it is a function).
  int32_t flags;
  void *owner;      // Pointer to the device binary containing this offload entry.
  int32_t reserved;
};
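A minimal sketch of why the extra field helps: with `owner` present, the runtime can look an entry up by the pair (name, owner) instead of by name alone. The struct below copies the proposed layout but is renamed `proposed_offload_entry` (a hypothetical name) so it does not clash with the runtime's own type, and `find_entry` is an illustrative helper, not an existing libomptarget function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Same fields as the proposed __tgt_offload_entry layout, including
 * the extra 'owner' field. The type name is hypothetical. */
struct proposed_offload_entry {
  void *addr;        /* function or global */
  char *name;        /* name of the function or global */
  size_t size;       /* 0 for functions */
  int32_t flags;
  void *owner;       /* device binary that contains this entry */
  int32_t reserved;
};

/* Return the entry with the given name that belongs to the given device
 * binary, or NULL if none matches. Under -fno-gpu-rdc two TUs may each
 * define an entry with the same name, so matching on name alone would
 * be ambiguous; the owner pointer disambiguates them. */
const struct proposed_offload_entry *
find_entry(const struct proposed_offload_entry *entries, size_t n,
           const char *name, const void *owner) {
  assert(entries != NULL || n == 0);
  for (size_t i = 0; i < n; ++i)
    if (entries[i].owner == owner && strcmp(entries[i].name, name) == 0)
      return &entries[i];
  return NULL;
}
```

For example, two tables entries both named "kernel" but owned by different TU images resolve to different entries depending on which image pointer is passed in.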
It may be possible to use the `reserved` field for that purpose. However, it is not clear whether `reserved` will be used for some other purpose later.
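One detail worth noting if `reserved` were reused: it is declared `int32_t`, so on a 64-bit host it cannot hold the owner pointer directly. A possible workaround, sketched below under that assumption, is to store a 1-based index into a per-program table of device images, keeping 0 to mean "no owner". The image table and both helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical encoding: store the owning device image as a 1-based
 * index into a per-program image table, so 0 still means "unset". */
int32_t encode_owner_index(size_t image_index) {
  /* index + 1 must still fit in int32_t */
  assert(image_index < (size_t)INT32_MAX);
  return (int32_t)(image_index + 1);
}

/* Map a 'reserved' value back to its device image pointer.
 * Returns NULL for 0 (no owner) or an out-of-range index. */
const void *decode_owner(int32_t reserved, const void *const *images,
                         size_t num_images) {
  if (reserved <= 0 || (size_t)reserved > num_images)
    return NULL;
  return images[reserved - 1];
}
```

The trade-off is that the runtime must keep the image table and the entries consistent, whereas a dedicated pointer field needs no side table.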
Another option is to let `addr` point to a struct that contains the owner info. However, that would introduce another level of indirection.
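The indirection alternative can be sketched as follows: `addr` points at a small wrapper record holding both the real symbol and its owning device binary. The types `owned_symbol` and `entry_via_indirection` and the accessor functions are hypothetical and only model the relevant fields.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical wrapper record that 'addr' would point to. */
struct owned_symbol {
  void *symbol; /* the function or global itself */
  void *owner;  /* device binary containing it */
};

/* Reduced entry with only the fields this sketch needs. */
struct entry_via_indirection {
  void *addr;       /* points to a struct owned_symbol */
  const char *name;
};

/* Reaching the symbol now takes one extra load through the wrapper. */
void *entry_symbol(const struct entry_via_indirection *e) {
  assert(e != NULL && e->addr != NULL);
  const struct owned_symbol *rec = e->addr;
  return rec->symbol;
}

void *entry_owner(const struct entry_via_indirection *e) {
  assert(e != NULL && e->addr != NULL);
  const struct owned_symbol *rec = e->addr;
  return rec->owner;
}
```

The extra dereference in `entry_symbol` is the cost the comment refers to; it also changes the meaning of `addr` for every consumer of the entry table, not only for HIP.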
Repository:
rG LLVM Github Monorepo
https://reviews.llvm.org/D123471