[PATCH] D91590: [NVPTX] Efficiently support dynamic index on CUDA kernel aggregate parameters.

Michael Liao via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Mon Nov 16 23:24:04 PST 2020


hliao added a comment.

This is an experimental, demo-only patch, written in my spare time, that eliminates the private memory usage in https://godbolt.org/z/EPPn6h. The attachment F14026286: sample.tar.xz <https://reviews.llvm.org/F14026286> includes both the reference and the new IR, PTX, and SASS (sm_60) output. In the new code, the aggregate argument is loaded in SASS through an `LDC` instruction instead of a `MOV`, because its address is no longer static. I don't have sm_60 hardware to verify this. Could you try it on real hardware?
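For readers without access to the godbolt link, here is a minimal, hypothetical CUDA kernel of the kind this patch targets. It is not taken from the patch or the godbolt example, and the names `Table`, `lookup`, and `run_lookup` are illustrative only. The kernel receives an aggregate by value and indexes it with a run-time value. That is the pattern in which the compiler may otherwise copy the whole argument into local (private) memory before indexing it. Compiling it for sm_60 with and without the patch (for example with `clang++ --cuda-gpu-arch=sm_60 --cuda-device-only -S`) and comparing the PTX would show whether that copy goes away.

```cuda
#include <cassert>
#include <cuda_runtime.h>

// Hypothetical aggregate passed by value to a kernel. Kernel arguments live
// in the PTX .param state space, which SASS typically reads from a constant bank.
struct Table {
  int v[8];
};

// `i` is only known at run time, so t.v[i] cannot be resolved to a fixed
// parameter offset. Without special handling the compiler may copy `t` into
// local (private) memory and index that copy instead.
__global__ void lookup(Table t, int i, int *out) {
  *out = t.v[i];
}

// Hypothetical host helper: fills a Table with v[k] = k * k, launches a
// single thread, and returns the value t.v[i] read on the device.
// `i` must be in [0, 8).
int run_lookup(int i) {
  Table t;
  for (int k = 0; k < 8; ++k) t.v[k] = k * k;

  int *d_out = nullptr;
  cudaError_t err = cudaMalloc(&d_out, sizeof(int));
  assert(err == cudaSuccess);

  lookup<<<1, 1>>>(t, i, d_out);

  int h_out = -1;
  // cudaMemcpy on the default stream waits for the kernel to finish.
  err = cudaMemcpy(&h_out, d_out, sizeof(int), cudaMemcpyDeviceToHost);
  assert(err == cudaSuccess);

  cudaFree(d_out);
  return h_out;
}
```

The result is the same with or without the patch. Only the generated code differs, in how `t.v[i]` is loaded.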

BTW, according to the PTX ISA document, the parameter space is read-only for input parameters and write-only for output parameters. If that's right, non-kernel functions may also need a similar change, because those semantics differ from the language model, in which the argument variable can be modified in the function body.
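To illustrate the non-kernel case, here is a sketch built around a hypothetical device function `scale_and_sum` that writes to its by-value aggregate argument. All names are illustrative. The function is marked `__noinline__` so that it stays a real call and its argument is actually passed through parameter space instead of being inlined away. If input parameters are read-only in PTX, as the comment above suggests, the callee cannot update `v` in place and needs a writable copy, typically in local memory. The caller's copy must stay unchanged, as C++ by-value semantics require.

```cuda
#include <cassert>
#include <cuda_runtime.h>

// Hypothetical aggregate passed by value to a device (non-kernel) function.
struct Vec4 {
  float x[4];
};

// C++ lets the callee modify its own copy of `v`. Because PTX input
// parameters are read-only, the compiler cannot write into the .param
// space and has to materialize a writable copy for this store.
__noinline__ __device__ float scale_and_sum(Vec4 v, int i, float s) {
  v.x[i] *= s;  // write to the by-value argument with a dynamic index
  return v.x[0] + v.x[1] + v.x[2] + v.x[3];
}

__global__ void call_scale_and_sum(Vec4 v, int i, float s, float *out) {
  out[0] = scale_and_sum(v, i, s);
  out[1] = v.x[i];  // the kernel's own copy of `v` is not affected by the callee
}

// Hypothetical host helper: runs the kernel on {1, 2, 3, 4} and writes
// {sum after scaling element i, caller's unchanged element i} into result.
void run_scale_and_sum(int i, float s, float result[2]) {
  Vec4 v = {{1.0f, 2.0f, 3.0f, 4.0f}};

  float *d_out = nullptr;
  cudaError_t err = cudaMalloc(&d_out, 2 * sizeof(float));
  assert(err == cudaSuccess);

  call_scale_and_sum<<<1, 1>>>(v, i, s, d_out);

  // cudaMemcpy on the default stream waits for the kernel to finish.
  err = cudaMemcpy(result, d_out, 2 * sizeof(float), cudaMemcpyDeviceToHost);
  assert(err == cudaSuccess);

  cudaFree(d_out);
}
```

For example, with `i = 2` and `s = 10.0f`, the callee sees {1, 2, 30, 4} and returns 37, while the caller still reads 3 from its own `v.x[2]`.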


Repository:
  rG LLVM Github Monorepo


https://reviews.llvm.org/D91590


