[PATCH] D47070: [CUDA] Upgrade linked bitcode to enable inlining

Tue May 22 16:09:08 PDT 2018

tra added a comment.

In https://reviews.llvm.org/D47070#1106018, @echristo wrote:

> > As a short-term fix we can disable feature-to-function attribute propagation for NVPTX until we fix it.
> > 
> > @echristo -- any other suggestions?
>
> This is some of what I was talking about when I was mentioning how function attributes and the targets work. Ideally you'll have a compatible set of features and it won't really cause an issue. The idea is that if you're compiling for a minimum ptx feature of X, then any "compatible" set of ptx should be able to inline into your code. I think you do want the features to propagate in general, just specific use cases may not care one way or another - that said, for those use cases you're probably just compiling everything with the same feature anyhow.

The thing is that with NVPTX you cannot have incompatible functions in the PTX, period. PTXAS will just throw syntax errors at you. In that regard PTX is very different from Intel, where the same binary can contain different functions with code for different x86 variants. For PTX, sm_50 and sm_60 mean entirely different GPUs with entirely different instruction sets/encodings. The PTX version is closer to a language dialect: you cannot use anything from PTX 4.0 if your file says it's PTX 3.0. It's sort of like how you can't use C++17 features when you're compiling in C++98 mode. Bottom line is that features and target-cpu do not make much sense for NVPTX. Everything we generate in a TU must satisfy the minimum PTX version and minimum GPU variant, and it will all be compiled for and run on only one specific GPU. There's no mixing and matching.
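To make the "one PTX version and one GPU variant per TU" constraint concrete, here is a toy model in C++. It is only a sketch: `FnRequirement`, `TuTarget` and `fitsTu` are hypothetical names invented for illustration, not anything in clang or LLVM, and encoding versions as plain integers is an assumption.

```cpp
#include <cassert>
#include <vector>

// Hypothetical record: the lowest PTX version and GPU variant a function needs.
struct FnRequirement {
  int minPtxVersion;  // assumed encoding: 60 means PTX 6.0
  int minSmArch;      // assumed encoding: 60 means sm_60
};

// Hypothetical per-TU settings: a single PTX version and a single GPU variant.
struct TuTarget {
  int ptxVersion;
  int smArch;
};

// Returns true only if every function fits within the one TU-wide target.
// There is no per-function override, which is the point being modeled.
bool fitsTu(const TuTarget &tu, const std::vector<FnRequirement> &fns) {
  for (const FnRequirement &fn : fns) {
    if (fn.minPtxVersion > tu.ptxVersion || fn.minSmArch > tu.smArch)
      return false;
  }
  return true;
}

// Usage example: a PTX 3.0 file cannot hold a function that needs PTX 4.0.
void fitsTuExample() {
  TuTarget ptx30{30, 50};
  std::vector<FnRequirement> fns = {FnRequirement{40, 50}};
  assert(!fitsTu(ptx30, fns));
}
```

On x86 the equivalent check would be per function; in this model it is necessarily per file.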

The question is -- what's the best way to make things work as they were before I broke them?
@Hahnfeld's idea of ignoring features and target-cpu would get us there, but that may be a never-ending source of surprises if/when something else decides to pay attention to those attributes.
I think the best way to tackle that would be to
a) figure out how to make builtins available (or not) on the clang side, and
b) make target-cpu and target-features attributes explicitly unsupported on NVPTX, as we cannot provide the functionality those attributes imply.

> I guess, ultimately, I'm not seeing what the concern here is for how features are working or not working for the target so it's harder to help. What is the problem you're running into, or can you try a different way of explaining it to me? :)

Here's my understanding of what happens:
We've started adding target-features and target-cpu to everything clang generates.
We also need to link with libdevice (or IR generated by clang), which has functions w/o those attributes. Or we need to link with IR produced by clang that used a different CUDA SDK and thus a different PTX version in target-features.
Due to the attribute mismatch we fail to inline some of the functions, and that hurts performance.
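A simplified model of that failure, in C++. This is a sketch under the assumption that the inliner only considers a call when the caller's and callee's target-cpu and target-features strings match exactly; `FnAttrs` and `inlineCompatible` are hypothetical names for illustration, not LLVM API, and the attribute values are made-up examples.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the two function attributes discussed above.
// An empty string models a function that carries no such attribute.
struct FnAttrs {
  std::string targetCpu;       // e.g. "sm_60"
  std::string targetFeatures;  // e.g. "+ptx60"
};

// Assumed rule: inlining is allowed only when both strings are identical.
bool inlineCompatible(const FnAttrs &caller, const FnAttrs &callee) {
  return caller.targetCpu == callee.targetCpu &&
         caller.targetFeatures == callee.targetFeatures;
}

// Usage example covering the two mismatches described above.
void inlineCompatibleExample() {
  FnAttrs kernel{"sm_60", "+ptx60"};   // what clang now emits
  FnAttrs libdeviceFn{"", ""};         // linked bitcode without attributes
  FnAttrs otherSdkFn{"sm_60", "+ptx50"};  // built against a different CUDA SDK
  assert(!inlineCompatible(kernel, libdeviceFn));
  assert(!inlineCompatible(kernel, otherSdkFn));
  assert(inlineCompatible(kernel, kernel));
}
```

Under this model, both the attribute-free libdevice function and the function built with a different PTX version are rejected as inline candidates, even though all of the code ends up in one PTX file for one GPU.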

Repository:
  rC Clang

https://reviews.llvm.org/D47070