[llvm] [NVPTX] Improve device function byval parameter lowering (PR #129188)
Alex MacLean via llvm-commits
llvm-commits at lists.llvm.org
Fri Feb 28 12:50:28 PST 2025
AlexMaclean wrote:
> Counter-example, where PTX docs allow mov + ld.param (though only for kernel arguments): https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#kernel-function-parameters
>
> > The address of a kernel parameter may be moved into a register using the mov instruction. The resulting address is in the .param state space and is accessed using ld.param instructions.
>
> ```
> Example
>
> .entry bar ( .param .b32 len )
> {
>     .reg .u32 %ptr, %n;
>
>     mov.u32      %ptr, len;
>     ld.param.u32 %n, [%ptr];
>     ...
> ```
>
> And indeed, mov+ld.param for kernels does work w/o generating a local copy: https://godbolt.org/z/7E4Tf5zrE
While they can look similar in simple cases, kernel and device parameters have very different behavior and semantics in PTX. For a kernel parameter, a `mov` is just a `mov`; for a device parameter, it carries many implications. This change is only intended to address problems with device parameters and leaves kernel parameter handling unchanged.
https://github.com/llvm/llvm-project/pull/129188