[PATCH] D100394: [Clang][NVPTX] Add NVPTX intrinsics and builtins for CUDA PTX cp.async instructions

Fri Apr 22 00:36:54 PDT 2022

steffenlarsen added a comment.

In D100394#3466316 <https://reviews.llvm.org/D100394#3466316>, @nirvedhmeshram wrote:

> Hello, I was interested in using `llvm.nvvm.cp.async.cg.shared.global.8` and `llvm.nvvm.cp.async.cg.shared.global.4` and was wondering if there is some fundamental reason they were not added here. I only see the ca variants for these.

Hi @nirvedhmeshram! According to the PTX ISA <https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#data-movement-and-conversion-instructions-cp-async>, `cp.async.cg.shared.global` has only a 16-byte variant. That said, an example further down uses it with a size of 8, so there seems to be an error in either the Syntax subsection or the examples. Either way, that is why the 4- and 8-byte `cg` variants were not added in this patch.
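
To make the size rules concrete, here is a minimal sketch of them as I read the Syntax subsection: `ca` accepts 4, 8 or 16 bytes, while `cg` accepts only 16. The names `CacheOp` and `isValidCpAsyncSize` are hypothetical and only for illustration; they are not part of this patch or of LLVM.

```cpp
// Hypothetical helper, not part of D100394: models the copy sizes that the
// PTX ISA Syntax subsection lists for cp.async (shared <- global).
enum class CacheOp { CA, CG };

constexpr bool isValidCpAsyncSize(CacheOp Op, unsigned Bytes) {
  switch (Op) {
  case CacheOp::CA:
    // cp.async.ca.shared.global: 4, 8 or 16 bytes.
    return Bytes == 4 || Bytes == 8 || Bytes == 16;
  case CacheOp::CG:
    // cp.async.cg.shared.global: 16 bytes only, per the Syntax subsection.
    return Bytes == 16;
  }
  return false;
}

// The two sizes asked about are rejected for cg under this reading.
static_assert(!isValidCpAsyncSize(CacheOp::CG, 4), "cg.4 is not listed");
static_assert(!isValidCpAsyncSize(CacheOp::CG, 8), "cg.8 is not listed");
static_assert(isValidCpAsyncSize(CacheOp::CG, 16), "cg.16 is listed");
```

If the example in the ISA document turns out to be the correct part, the `CG` case would simply need to accept the same sizes as `CA`.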

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D100394/new/

https://reviews.llvm.org/D100394