[PATCH] D128914: [HIP] Add support for handling HIP in the linker wrapper
Yaxun Liu via Phabricator via cfe-commits
cfe-commits at lists.llvm.org
Mon Jul 11 13:38:29 PDT 2022
yaxunl added a comment.
In D128914#3643270 <https://reviews.llvm.org/D128914#3643270>, @jhuber6 wrote:
>> There is only one fatbin in -fgpu-rdc mode, but the fatbin unregister function is called multiple times, once in each TU. The HIP runtime expects each fatbin to be unregistered only once. The old embedding scheme introduced a weak symbol to track whether the fatbin has been unregistered and to make sure it is unregistered only once.
>
> I see, this wrapping will only happen in RDC-mode so it's probably safe to ignore here? When I support non-RDC mode in the new driver it will most likely rely on the old code generation. Although it's entirely feasible to make RDC-mode the default. There's no runtime overhead when using LTO.
If you unregister the fatbin only once for the whole program, then it should be safe for -fgpu-rdc. I am not sure whether that is the case.
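To make the "unregister only once" requirement concrete, here is a minimal sketch of the guard idea, not the code Clang actually emits. All names are hypothetical: `fake_unregister_fatbin` stands in for the runtime's unregister entry point and only counts calls, and a C++17 inline variable plays the role of the weak symbol that every TU shares.

```cpp
// Hypothetical stand-in for the runtime's unregister call; it only counts
// how many times it is reached.
inline int unregister_calls = 0;
inline void fake_unregister_fatbin(void *handle) {
  (void)handle;
  ++unregister_calls;
}

// Assumption: one handle shared by the whole program. A C++17 inline variable
// is used here so that every TU including this code sees the same storage,
// which mimics a weak symbol.
inline int dummy_fatbin = 0;
inline void *shared_fatbin_handle = &dummy_fatbin;

// Each TU's teardown would call this. Only the first call reaches the
// runtime; later calls see the cleared handle and do nothing.
inline void unregister_fatbin_once() {
  if (shared_fatbin_handle != nullptr) {
    fake_unregister_fatbin(shared_fatbin_handle);
    shared_fatbin_handle = nullptr;
  }
}
```

With this shape, calling `unregister_fatbin_once()` from any number of TUs results in a single call into the runtime. The sketch ignores thread safety and assumes teardown runs on one thread.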
My experience with -fgpu-rdc is that it causes much longer link times for large applications like PyTorch or TensorFlow, and LTO does not help. This is because the compiler has many inter-procedural optimization passes that take more than linear time. Because of that, those apps need to be compiled with -fno-gpu-rdc. In fact, most CUDA/HIP applications use -fno-gpu-rdc.
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D128914/new/
https://reviews.llvm.org/D128914