[PATCH] D91590: [NVPTX] Efficently support dynamic index on CUDA kernel aggregate parameters.

Tue Nov 17 10:27:38 PST 2020

tra added a reviewer: jlebar.
tra added a comment.

In D91590#2398842 <https://reviews.llvm.org/D91590#2398842>, @hliao wrote:

> This's an experimental or demo-only patch in my spare time on eliminating private memory usage in https://godbolt.org/z/EPPn6h. The attachment F14026286: sample.tar.xz <https://reviews.llvm.org/F14026286> includes both the reference and new IR, PTX, and SASS (sm_60) output. For the new code, that aggregate argument is loaded through `LDC` instruction in SASS instead of `MOV` due to the non-static address. I don't have sm_60 to verify that. Could you try that on the real hardware?

I'll give it a try.

> BTW, from PTX ISA document, parameter space is read-only for input parameters and write-only for output parameters. If that's right, even non-kernel function may also require a similar change as the semantic is different from the language model, where the argument variable could be modified in the function body.

Regular functions currently handle parameters exactly the same way as kernels - via a copy to a local buffer, which can then be modified.  https://godbolt.org/z/W9PY17
So, if we need to change a parameter, it would have to be done on a local copy.

================
Comment at: clang/test/CodeGenCUDA/kernel-args.cu:13-14
 // AMDGCN: define amdgpu_kernel void @_Z6kernel1A(%struct.A addrspace(4)* byref(%struct.A) align 8 %{{.+}})
-// NVPTX: define void @_Z6kernel1A(%struct.A* byval(%struct.A) align 8 %x)
+// NVPTX: define void @_Z6kernel1A(%struct.A addrspace(101)* byref(%struct.A) align 8 %0)
 __global__ void kernel(A x) {
 }
----------------
Is the idea here to rely on PTX to store the value in param space (so we do actually pass the parameter by value)  and represent it on IR level as a reference to an an externally-provided storage with the value.
So:
- C++ passes argument by value
- IR knows that PTX will store it somewhere in param space and uses `byref`
- we still generate PTX which has parameter passed by value, but now we can access it directly via a reference to param-space value.

Presumably for parameters we do want to modify, we'll need to fall back to having a local copy.

So far so good. However, now we may have a problem distinguishing between C++-level arguments passed by value vs by reference -- they all will look like `byref` on IR level. That is, unless you rely on `addrspace(101)` to indicate that it's actually a `byval` in disguise. 

It looks plausible as long as we can guarantee that we never modify it. Neither in the current function nor in any of the callees, if we pass it by reference. 

I'm not particularly familiar with AA machinery. I'd appreciate if you could elaborate on how you see it all work end-to-end.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D91590/new/

https://reviews.llvm.org/D91590