[llvm] [NVPTX] Improve device function byval parameter lowering (PR #129188)

Fri Feb 28 12:09:29 PST 2025

akshayrdeodhar wrote:

> > Unfortunately the criteria for determining which case is possible are not correct, leading to miscompilations (https://godbolt.org/z/Gq1fP7a3G).
> 
> Can you elaborate on what exactly is incorrect about the example? AFAICT, the code is still valid, even if ptxas itself has to make a copy: https://godbolt.org/z/ffd3d3G6z
> 
> My understanding is that ld.param from the parameter address is still legal and that we assumed that ptxas is smart enough to avoid local copies. Even if it does not, the code is still valid, even if it may be suboptimal.

From the PTX ISA documentation on device function parameters:

> Aside from passing structures by value, .param space is also required whenever a formal parameter has its address taken within the called function. In PTX, the address of a function input parameter may be moved into a register using the mov instruction. Note that the parameter will be copied to the stack if necessary, and so the address will be in the .local state space and is accessed via ld.local and st.local instructions. It is not possible to use mov to get the address of or a locally-scoped .param space variable. Starting PTX ISA version 6.0, it is possible to use mov instruction to get address of return parameter of device function.

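The [example](https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#device-function-parameters) in the PTX docs takes the parameter's address with `mov` and then accesses it with `ld.local`, not `ld.param`. So, as I read it, once a callee takes the address of a byval parameter, the documented way to access it through that address is `ld.local`/`st.local`, not `ld.param`.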
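For anyone not following the link, here is a hand-written sketch of the two access patterns, modeled on the PTX ISA example (not compiler output, and not run through ptxas). The names `read_direct`, `read_via_addr`, `retval` and the `struct { double d; int y; }` layout are made up for illustration:

```ptx
// Hypothetical sketch: a 12-byte struct { double d; int y; } passed by value
// in .param space; both functions return field y (offset 8).
.version 7.0
.target sm_52
.address_size 64

// Address never taken: read the input parameter directly with ld.param.
.func (.param .b32 retval) read_direct(.param .align 8 .b8 s[12])
{
    .reg .b32 %r<1>;
    ld.param.b32 %r0, [s+8];      // symbolic access to the .param variable
    st.param.b32 [retval], %r0;
    ret;
}

// Address taken: per the docs, mov puts the parameter's address in a register,
// the parameter is copied to the stack if necessary, and the address is .local.
.func (.param .b32 retval) read_via_addr(.param .align 8 .b8 s[12])
{
    .reg .b32 %r<1>;
    .reg .b64 %rd<1>;
    mov.u64      %rd0, s;         // address of the parameter (.local state space)
    ld.local.b32 %r0, [%rd0+8];   // accessed through that address with ld.local
    st.param.b32 [retval], %r0;
    ret;
}
```

The only difference between the two functions is whether the parameter's address is taken, which is exactly the case where the docs say the parameter moves to `.local`.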

https://github.com/llvm/llvm-project/pull/129188