[PATCH] D128914: [HIP] Add support for handling HIP in the linker wrapper

Joseph Huber via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Mon Jul 11 13:48:02 PDT 2022


jhuber6 added a comment.

In D128914#3643451 <https://reviews.llvm.org/D128914#3643451>, @yaxunl wrote:

> If you only unregister the fatbin once for the whole program, then it should be safe with -fgpu-rdc. I am not sure if that is the case.

It should be here: the generated handle is private to the registration module we create. We only make one, and it's impossible for anyone else to touch it, even when mixing RDC with non-RDC code.
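To make the "private handle, unregistered once" argument concrete, here is a minimal sketch of the pattern. It is not the code the linker wrapper emits: `mock_runtime`, `register_fatbin`, `unregister_fatbin`, `registration_ctor` and `registration_dtor` are hypothetical names, the runtime is mocked so the sketch stands alone, and the early-return guards are an assumption added for illustration.

```cpp
// Hypothetical stand-in for the GPU runtime; it only counts calls so the
// behaviour can be observed. It does not model the real HIP entry points.
namespace mock_runtime {
int registered = 0;
int unregistered = 0;
static void *slot = nullptr;

// Pretend to register a device image and hand back an opaque handle.
void **register_fatbin(const void *image) {
  slot = const_cast<void *>(image);
  ++registered;
  return &slot;
}

// Pretend to release the image behind a handle.
void unregister_fatbin(void **handle) {
  if (handle && *handle) {
    *handle = nullptr;
    ++unregistered;
  }
}
} // namespace mock_runtime

namespace {
// Internal linkage: no other translation unit can name this handle, which
// mirrors the "private to the registration module" point above.
const char kImage[] = "fatbin";
void **Handle = nullptr;
} // namespace

// Stand-in for the module constructor: registers the image at most once.
void registration_ctor() {
  if (Handle)
    return; // assumed guard, for illustration only
  Handle = mock_runtime::register_fatbin(kImage);
}

// Stand-in for the module destructor: unregisters the image at most once.
void registration_dtor() {
  if (!Handle)
    return; // nothing registered, or already torn down
  mock_runtime::unregister_fatbin(Handle);
  Handle = nullptr;
}
```

Because the handle has internal linkage and there is only one registration module, a second unregistration cannot come from anywhere else in the program.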

> My experience with -fgpu-rdc is that it causes much longer linking time for large applications like PyTorch or TensorFlow, and LTO does not help. This is because the compiler has lots of inter-procedural optimization passes which take more than linear time. Because of that, those apps need to be compiled with -fno-gpu-rdc. Actually, most CUDA/HIP applications use -fno-gpu-rdc.

Yes, it's actually pretty difficult to find a CUDA application using `-fgpu-rdc`. It seems much more common to just put everything that's needed in the same file. I've considered finding a CUDA / HIP benchmark suite and comparing compile times using the new driver. The benefit of making `-fgpu-rdc` the default is that device code behaves essentially like host code, and LTO makes `-fgpu-rdc` perform like `-fno-gpu-rdc`. The downside, as you mentioned, is compile time.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D128914/new/

https://reviews.llvm.org/D128914


