[llvm] [NVPTX] Improve device function byval parameter lowering (PR #129188)

Artem Belevich via llvm-commits llvm-commits at lists.llvm.org
Fri Feb 28 12:18:12 PST 2025


Artem-B wrote:

Counter-example: the PTX docs do allow `mov` + `ld.param`, though only for kernel parameters: https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#kernel-function-parameters

> The address of a kernel parameter may be moved into a register using the mov instruction. The resulting address is in the .param state space and is accessed using ld.param instructions.

Example:

```
.entry bar ( .param .b32 len )
{
    .reg .u32 %ptr, %n;

    mov.u32      %ptr, len;
    ld.param.u32 %n, [%ptr];
    ...
```
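
For context, here is a minimal CUDA sketch of source code where a compiler has to form the address of a by-value kernel parameter, because the array inside it is indexed with a runtime value. The names `Params` and `pick` are made up for illustration and do not come from the PR. The sketch makes no claim about which PTX a particular compiler emits for it (`mov` + `ld.param` as in the docs excerpt, or a copy to local memory first).

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical aggregate passed to the kernel by value.
struct Params {
    int values[4];
};

// idx is only known at run time, so the element cannot be selected with a
// fixed offset from the parameter symbol; an address of the parameter
// (or of a copy of it) has to be computed.
__global__ void pick(Params p, int idx, int *out) {
    *out = p.values[idx];
}

int main() {
    Params p = {{10, 20, 30, 40}};
    int *d_out = nullptr;
    int h_out = 0;

    if (cudaMalloc(&d_out, sizeof(int)) != cudaSuccess) return 1;

    pick<<<1, 1>>>(p, 2, d_out);

    // cudaMemcpy synchronizes with the kernel before copying the result back.
    if (cudaMemcpy(&h_out, d_out, sizeof(int), cudaMemcpyDeviceToHost) != cudaSuccess) return 1;
    cudaFree(d_out);

    printf("%d\n", h_out);  // prints 30
    return 0;
}
```

Inspecting the generated PTX (for example with `nvcc -ptx` or `clang --cuda-device-only -S`) is the way to see which lowering is actually chosen.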

https://github.com/llvm/llvm-project/pull/129188

