[llvm] [NVPTX] Improve device function byval parameter lowering (PR #129188)

Fri Feb 28 12:50:28 PST 2025

AlexMaclean wrote:

> Counter-example, where PTX docs allow mov + ld.param (though only for kernel arguments): https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#kernel-function-parameters
> 
> > The address of a kernel parameter may be moved into a register using the mov instruction. The resulting address is in the .param state space and is accessed using ld.param instructions.
> 
> Example:
> 
> ```
> .entry bar ( .param .b32 len )
> {
>     .reg .u32 %ptr, %n;
> 
>     mov.u32      %ptr, len;
>     ld.param.u32 %n, [%ptr];
>     ...
> ```
> 
> And indeed, mov+ld.param for kernels does work w/o generating a local copy: https://godbolt.org/z/7E4Tf5zrE

While they can look similar in simple cases, kernel and device parameters have very different behavior and semantics in PTX. For a kernel parameter, a `mov` is just a `mov`; for a device parameter it carries further implications: per the PTX docs, the parameter may be copied to the stack, so the resulting address is in the `.local` state space rather than `.param`. This change is intended to address problems with device parameters only and leaves kernel parameter handling unchanged.
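
To make the contrast concrete, here is a rough device-function counterpart of the kernel example quoted above. It is an untested sketch based on the "Device Function Parameters" section of the PTX docs, not output from this patch; `baz` is a hypothetical name, and the body is elided the same way the docs example is.

```ptx
// Hypothetical device function mirroring the quoted `.entry bar` example.
.func baz ( .param .b32 len )
{
    .reg .u32 %ptr, %n;

    // Direct read by name: stays in the .param state space.
    ld.param.u32 %n, [len];

    // Taking the address: the parameter may be copied to the stack,
    // so the resulting address is in .local and is read with ld.local.
    mov.u32      %ptr, len;
    ld.local.u32 %n, [%ptr];
    ...
}
```

The same `mov` that is harmless in `.entry bar` is what can introduce a local copy here, which is why the two cases are handled separately.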

https://github.com/llvm/llvm-project/pull/129188