[clang] Correctly link and optimize device libraries with -mlink-builtin-bitcode (PR #69371)

Joseph Huber via cfe-commits cfe-commits at lists.llvm.org
Tue Oct 17 13:41:07 PDT 2023


jhuber6 wrote:

> sincos() is just one example. There are several other cases that can trigger this issue. fold_rootn() generates new function calls for square and cubic roots, fold_pow() does a similar thing for specific powers (ex 2), etc.
> 
> We did try disabling -amdgpu-prelink, and it did lead to a significant performance difference for a couple of key applications
> 
> With this approach we're also hoping to cover future optimizations added that may fall under this category

This approach assumes that whatever the function call is transformed into also exists in the same library, which isn't necessarily true. It is one instance of a broader set of problems with handling known runtime library calls inside bitcode. We could possibly use module metadata to indicate when these sorts of transformations are legal in LLVM IR. I've hit similar issues with LTO, where a `sin` call gets turned into `llvm.sin.f64`, which then no longer links against the `sin` function implemented in an LTO library. When the backend runs, the intrinsic is turned back into a call to `sin`, but that call is never resolved.
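To make the ordering problem concrete, here is a toy model of it. This is only a sketch: `linkAgainst`, `rewriteCalls`, and `unresolved` are hypothetical helpers written for illustration, not LLVM or clang APIs, and the `rootn` to `cbrt` rewrite just stands in for the kind of fold mentioned above. It assumes the link step only pulls in definitions that are referenced at link time.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Toy model only: none of these names are LLVM APIs.
using SymbolSet = std::set<std::string>;

// Link step: only symbols that are called *now* and defined in the
// library get resolved (a stand-in for "only needed" linking).
SymbolSet linkAgainst(const std::vector<std::string> &calls,
                      const SymbolSet &library) {
  SymbolSet resolved;
  for (const auto &callee : calls)
    if (library.count(callee))
      resolved.insert(callee);
  return resolved;
}

// Post-link optimization: rewrite each callee according to `rewrites`,
// leaving callees without an entry untouched.
std::vector<std::string>
rewriteCalls(const std::vector<std::string> &calls,
             const std::map<std::string, std::string> &rewrites) {
  std::vector<std::string> out;
  for (const auto &callee : calls) {
    auto it = rewrites.find(callee);
    out.push_back(it == rewrites.end() ? callee : it->second);
  }
  return out;
}

// Callees that appear in the module but were never resolved by the link.
SymbolSet unresolved(const std::vector<std::string> &calls,
                     const SymbolSet &resolved) {
  SymbolSet missing;
  for (const auto &callee : calls)
    if (!resolved.count(callee))
      missing.insert(callee);
  return missing;
}
```

In this model, linking a module that calls `rootn` against a library defining `rootn` and `cbrt` resolves only `rootn`. If a later pass rewrites that call to `cbrt`, `unresolved` reports `cbrt`, even though the library had it. If the library never defined `cbrt` at all, re-linking after the rewrite would not help either, which is the case I'm worried about.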

https://github.com/llvm/llvm-project/pull/69371

