[clang] Correctly link and optimize device libraries with -mlink-builtin-bitcode (PR #69371)

Joseph Huber via cfe-commits cfe-commits at lists.llvm.org
Tue Oct 17 14:46:26 PDT 2023


jhuber6 wrote:

> > This approach assumes that whatever the function call was transformed into also exists in the same library, which isn't necessarily true.
> 
> True, good point. But I don't think it's necessarily due to this approach, but more of how AMDGPULibCalls is implemented. It seems like the instruction folding implementations are assuming the definitions of the new functions they're inserting will be linked in at a later point?
>
> Previously we were doing it via a separate _llvm-link_ call to re-link all the device libraries. With this approach, we're doing it as part of the clang optimization pipeline, and would no longer need that extra link step.
> 
> Maybe this approach could be a solution for the time being? With a longer term solution being to think more carefully about the legality of AMDGPULibCalls inserting function calls that may or may not exist? Maybe device library calls are the exception?

This is part of why I'm not very happy with the current approach to just `-mlink-builtin-bitcode` everything and hope it works. We actually had similar issues with the `OpenMPOpt`  pass as it would emit OpenMP runtime calls after the device library was already linked. The solution there was to simply guard emission of function calls to only be possible if we are either executing the optimizations pre-linking, or if the function call already exists in the module. I think this is a much easier fix than making this already quite complicated solution even more complicated. However, this would prevent these optimizations from firing without fixing it in other ways.

Generally this occurs because `-mlink-builtin-bitcode` only links *needed* symbols, so other ones are ignored. I think there's a few hacks in the ROCm device libs to get around this, such as emitting a `printf` symbol that only calls `__printf_alloc` so that it can be called. You could potentially link in all the symbols and internalize them, assuming that a future DCE or global opt pass will trim it down, but that would require some phase ordering to make sure this AMDGPULibCalls pass is run first.

https://github.com/llvm/llvm-project/pull/69371


More information about the cfe-commits mailing list