[PATCH] D47070: [CUDA] Upgrade linked bitcode to enable inlining

Thu May 31 23:28:08 PDT 2018

Hahnfeld added a comment.

In https://reviews.llvm.org/D47070#1108803, @tra wrote:

> Here's my understanding of what happens: 
>  We've started adding target-features and target-cpu to everything clang generates. 
>  We also need to link with libdevice (or IR generated by clang) which has functions w/o those attributes. Or we need to link with IR produced by clang which used a different CUDA SDK and thus a different PTX version in target-features.
>  Due to attribute mismatch we are failing to inline some of the functions and that hurts performance.

In the case of OpenMP, we link the runtime functions from a bitcode library so that Clang can inline them. This dramatically improves performance, so I'm really interested in making this work again with libraries compiled by older versions of Clang.

Is there a viable path forward? Should I put up a patch that just ignores all `target-features` in LLVM?
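
To make the question concrete, here is a minimal sketch of the two policies being contrasted. It is a simplified model, not LLVM's actual implementation: `FnAttrs`, `getAttr`, `inlineCompatibleStrict` and `inlineCompatibleIgnoringFeatures` are hypothetical names, and functions are reduced to a map of string attributes.

```cpp
#include <map>
#include <string>

// Hypothetical stand-in for a function's string attributes; not an LLVM type.
using FnAttrs = std::map<std::string, std::string>;

// Returns the attribute value, or an empty string if the attribute is absent.
static std::string getAttr(const FnAttrs &Attrs, const std::string &Key) {
  auto It = Attrs.find(Key);
  return It == Attrs.end() ? std::string() : It->second;
}

// Strict policy: caller and callee must agree on both attributes, so any
// difference in `target-features` blocks inlining.
bool inlineCompatibleStrict(const FnAttrs &Caller, const FnAttrs &Callee) {
  return getAttr(Caller, "target-cpu") == getAttr(Callee, "target-cpu") &&
         getAttr(Caller, "target-features") ==
             getAttr(Callee, "target-features");
}

// Relaxed policy (the idea floated above): compare `target-cpu` only and
// ignore `target-features` entirely.
bool inlineCompatibleIgnoringFeatures(const FnAttrs &Caller,
                                      const FnAttrs &Callee) {
  return getAttr(Caller, "target-cpu") == getAttr(Callee, "target-cpu");
}
```

Under this model, a caller with `target-features` of `+ptx60` and a callee from an older library with `+ptx50` (illustrative values, same `target-cpu`) are rejected by the strict check and accepted by the relaxed one. A callee with no `target-cpu` at all would still be rejected by both, so ignoring `target-features` alone would not cover that case.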

Repository:
  rC Clang

https://reviews.llvm.org/D47070