[PATCH] D100394: [Clang][NVPTX] Add NVPTX intrinsics and builtins for CUDA PTX cp.async instructions
Steffen Larsen via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Fri Apr 22 00:36:54 PDT 2022
steffenlarsen added a comment.
In D100394#3466316 <https://reviews.llvm.org/D100394#3466316>, @nirvedhmeshram wrote:
> Hello, I was interested in using `llvm.nvvm.cp.async.cg.shared.global.8` and `llvm.nvvm.cp.async.cg.shared.global.4` and was wondering if there is some fundamental reason they were not added here. I only see the ca variants for these.
Hi @nirvedhmeshram! According to the PTX ISA <https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#data-movement-and-conversion-instructions-cp-async>, the Syntax subsection lists only a 16-byte variant of `cp.async.cg.shared.global`. That said, an example further down uses `cg` with a size of 8, so there seems to be an inconsistency between the Syntax subsection and the examples. Either way, that is why the 4- and 8-byte `cg` variants were not added with this patch.
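For illustration, here is a minimal CUDA sketch of the size/cache-operator combinations discussed above, written with inline PTX rather than the builtins from this patch. It assumes the Syntax subsection is the authoritative one (`ca` with 4, 8, or 16 bytes; `cg` with 16 bytes only) and a target of sm_80 or newer. The helper and kernel names are hypothetical and not part of D100394.

```cuda
#include <cuda_runtime.h>

// Hypothetical helpers for illustration only; build with e.g. nvcc -arch=sm_80.

// Convert a generic pointer into a 32-bit shared-state-space address.
__device__ inline unsigned to_shared_addr(const void* p) {
  return static_cast<unsigned>(__cvta_generic_to_shared(p));
}

// .ca: the Syntax subsection allows a copy size of 4, 8, or 16 bytes.
// Both addresses must be aligned to the copy size (8 bytes here).
__device__ inline void cp_async_ca_8(void* dst_shared, const void* src_global) {
  asm volatile("cp.async.ca.shared.global [%0], [%1], 8;"
               :
               : "r"(to_shared_addr(dst_shared)),
                 "l"(__cvta_generic_to_global(src_global))
               : "memory");
}

// .cg: the Syntax subsection lists a copy size of 16 bytes only.
__device__ inline void cp_async_cg_16(void* dst_shared, const void* src_global) {
  asm volatile("cp.async.cg.shared.global [%0], [%1], 16;"
               :
               : "r"(to_shared_addr(dst_shared)),
                 "l"(__cvta_generic_to_global(src_global))
               : "memory");
}

// Wait for all cp.async operations issued so far by the calling thread.
__device__ inline void cp_async_wait_all() {
  asm volatile("cp.async.wait_all;" ::: "memory");
}

// Each thread stages one double (8 bytes) through shared memory.
// Assumes blockDim.x <= 128 and that `in`/`out` hold at least
// gridDim.x * blockDim.x elements.
__global__ void stage_through_shared(const double* in, double* out) {
  __shared__ double tile[128];
  const unsigned i = blockIdx.x * blockDim.x + threadIdx.x;

  cp_async_ca_8(&tile[threadIdx.x], &in[i]);
  cp_async_wait_all();
  __syncthreads();

  out[i] = tile[threadIdx.x];
}
```

Under this assumption, an 8-byte copy has to go through `ca`; switching the kernel to `cg` would mean widening each copy to 16 bytes, for example by having each thread move two doubles from a 16-byte-aligned address with `cp_async_cg_16`.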
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D100394/new/
https://reviews.llvm.org/D100394