[PATCH] D91590: [NVPTX] Efficiently support dynamic index on CUDA kernel aggregate parameters.
Michael Liao via Phabricator via cfe-commits
cfe-commits at lists.llvm.org
Mon Nov 16 23:24:04 PST 2020
hliao added a comment.
This is an experimental, demo-only patch, written in my spare time, that eliminates the private memory usage seen in https://godbolt.org/z/EPPn6h. The attachment F14026286: sample.tar.xz <https://reviews.llvm.org/F14026286> includes the reference and new IR, PTX, and SASS (sm_60) output. In the new code, the aggregate argument is loaded with the `LDC` instruction in SASS instead of `MOV`, because the address is not static. I don't have sm_60 hardware to verify that. Could you try it on real hardware?
BTW, according to the PTX ISA document, the parameter space is read-only for input parameters and write-only for output parameters. If that's right, non-kernel functions may also need a similar change, since those semantics differ from the language model, where an argument variable can be modified in the function body.
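For readers without the Godbolt link at hand, here is a minimal sketch of the kind of kernel this is about. It is not the code behind the link: the names `Params`, `pick`, `idx`, and `out` are made up for illustration. The `__global__` fallback macro is only there so the snippet also builds as ordinary host C++.

```cpp
#include <cassert>

// Outside a CUDA compilation, make the kernel qualifier a no-op so this
// sketch can also be compiled and checked as plain host C++.
#ifndef __CUDACC__
#define __global__
#endif

// Hypothetical aggregate passed by value as a kernel parameter.
struct Params {
  float v[4];
};

// Hypothetical kernel: `idx` is only known at run time, so `p.v[idx]` is a
// dynamic index into the by-value aggregate parameter. This is the access
// pattern that ends up using private memory in the reference output.
__global__ void pick(Params p, int idx, float *out) {
  *out = p.v[idx];
}
```

In a CUDA build this would be launched as something like `pick<<<1, 1>>>(p, i, d_out)` with `d_out` pointing to device memory. The point is only that the index is not a compile-time constant.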
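To make the language-model side of this concrete, here is a small host-only C++ sketch with made-up names (`Args`, `bump_and_read`). In C++ and CUDA, a by-value parameter is an ordinary mutable local object, so a write to it is legal and must not be visible to the caller. A read-only parameter space cannot back such a write directly.

```cpp
#include <cassert>

// Hypothetical aggregate, used only for this illustration.
struct Args {
  int v[4];
};

// `a` is passed by value, so the function owns a private, writable copy.
// Modifying it is well-defined and leaves the caller's object untouched.
int bump_and_read(Args a, int idx) {
  a.v[idx] += 1;  // write to the argument variable itself
  return a.v[idx];
}
```

If the argument lives in storage that is read-only for inputs, as the PTX ISA wording suggests, a function like this presumably needs a writable copy somewhere before the store.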
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D91590/new/
https://reviews.llvm.org/D91590