[llvm] [NVPTX] Convert calls to indirect when call signature mismatches function signature (PR #107644)

Mon Sep 9 14:48:02 PDT 2024

kalxr wrote:

> what's supposed to happen if I decide to do `%call2 = call i128 @callee()` or have `i8 callee()`?

I should have included this very relevant part of the [PTX programming guide](https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#control-flow-instructions-call):
> The call operands are type-checked against the prototype, and code generation will follow the ABI calling convention. If a function that doesn’t match the prototype is called, the behavior is undefined.

We would generate PTX with undefined behavior. The same is true in some cases even when the type sizes match, because an `i64` and a `type <{i64}>` can be lowered to `.b64` and `.b8 _[8]` respectively. I think the only type mismatches that won't be undefined behavior are between two structs of the same size. In practice, when the sizes match it seems to work as expected. There are all sorts of combinations of return types, parameter types, and relative sizes that may or may not work.
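To make the two cases concrete, here is a minimal IR sketch of the kind of mismatch being discussed. The names `@callee`, `@caller_wider`, `@caller_same_size`, and `%wrapped` are made up for illustration and are not taken from the patch or its tests; it assumes opaque pointers, so the call-site type is not checked against the callee's declared type.

```llvm
; Illustrative only: hypothetical names, not from the patch or its tests.
%wrapped = type <{ i64 }>

define i64 @callee() {
  ret i64 42
}

; Size mismatch: the call site expects i128, the callee returns i64.
define i128 @caller_wider() {
  %call2 = call i128 @callee()
  ret i128 %call2
}

; Same size, different type: the call site expects <{ i64 }>, the callee returns i64.
define i64 @caller_same_size() {
  %r = call %wrapped @callee()
  %v = extractvalue %wrapped %r, 0
  ret i64 %v
}
```

The second caller is the "sizes match" case: the call-site prototype and the callee's definition can still disagree in PTX (`.b8 _[8]` versus `.b64`), even though both are 8 bytes.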

I was also unable to find much documentation beyond what you mentioned. I think an argument could be made that since the behavior of LLVM IR with these mismatches doesn't seem particularly well defined, the issue is with the input and it is okay to generate code whose behavior is undefined as well. I tested a bunch of these cases with other backends (x86, aarch64, riscv64) and they don't seem to do anything special; they just follow the calling convention. This patch brings NVPTX closer to that behavior by sort of falling back on a particular calling convention when we know that the more abstract direct call will fail. I don't know that this behavior is ideal, but that's part of the motivation behind the change.

https://github.com/llvm/llvm-project/pull/107644