[PATCH] D47070: [CUDA] Upgrade linked bitcode to enable inlining
Jonas Hahnfeld via Phabricator via cfe-commits
cfe-commits at lists.llvm.org
Thu May 31 23:28:08 PDT 2018
Hahnfeld added a comment.
In https://reviews.llvm.org/D47070#1108803, @tra wrote:
> Here's my understanding of what happens:
> We've started adding target-features and target-cpu to everything clang generates.
> We also need to link with libdevice (or IR generated by clang), which has functions w/o those attributes. Or we need to link with IR produced by clang which used a different CUDA SDK and thus a different PTX version in target-feature.
> Due to attribute mismatch we are failing to inline some of the functions and that hurts performance.
In the case of OpenMP, we link runtime functions from a bitcode library so that Clang can inline them. This dramatically improves performance, so I'm really interested in making this work again with libraries compiled by older versions of Clang.
Is there a viable path forward? Should I put up a patch that just ignores all `target-features` in LLVM?
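To make the mismatch concrete, here is a minimal standalone sketch of the two policies under discussion. It is not LLVM code: `FnAttrs`, `inlineCompatibleStrict` and `inlineCompatibleIgnoringFeatures` are hypothetical names, and the sketch assumes the inliner's compatibility check boils down to an exact comparison of the `target-cpu` and `target-features` strings. The attribute values are only examples.

```cpp
#include <string>

// Hypothetical stand-in for the two function attributes in question.
// This is a plain struct for illustration, not an LLVM type.
struct FnAttrs {
  std::string target_cpu;       // example value: "sm_60"
  std::string target_features;  // example value: "+ptx61"
};

// Assumed current behaviour: inlining is allowed only if both attribute
// strings match exactly between caller and callee.
bool inlineCompatibleStrict(const FnAttrs &caller, const FnAttrs &callee) {
  return caller.target_cpu == callee.target_cpu &&
         caller.target_features == callee.target_features;
}

// The policy asked about above: ignore target-features entirely and
// compare only target-cpu.
bool inlineCompatibleIgnoringFeatures(const FnAttrs &caller,
                                      const FnAttrs &callee) {
  return caller.target_cpu == callee.target_cpu;
}
```

With a caller built as `{"sm_60", "+ptx61"}` and a runtime function from an older library built as `{"sm_60", "+ptx60"}`, the strict check rejects inlining while the relaxed one allows it. Note that the relaxed check would still reject a callee that carries no `target-cpu` at all, which is the libdevice case.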
Repository:
rC Clang
https://reviews.llvm.org/D47070