[llvm-dev] NVPTX i8 surface intrinsics/instructions are actually i16?

Sun Oct 10 17:31:18 PDT 2021

Hi all

I’ve been looking into adding support for NVPTX’s texture and surface intrinsics for our frontend. Running our builtins generator revealed that the intrinsics corresponding to 8-bit integer surface instructions, e.g. “llvm.nvvm.suld.3d.i8.zero”, return a 16-bit integer, whereas the rest of the intrinsics in the overload set, i.e. “llvm.nvvm.suld.3d.{i16,i32,i64}.zero”, all return the type named in the intrinsic. Looking at llvm/include/llvm/IR/IntrinsicsNVVM.td confirms this, as does llvm/lib/Target/NVPTX/NVPTXIntrinsics.td.
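To make the mismatch concrete, here is a minimal sketch of the kind of consistency check that flags it. This is not our actual generator: the table is written by hand from the return types described above rather than parsed from the .td files, and the names RETURN_TYPES, element_type and mismatches are hypothetical.

```python
import re

# Return types as described in this post. Hand-written for illustration,
# NOT parsed from IntrinsicsNVVM.td (assumption: only these four overloads).
RETURN_TYPES = {
    "llvm.nvvm.suld.3d.i8.zero": "i16",
    "llvm.nvvm.suld.3d.i16.zero": "i16",
    "llvm.nvvm.suld.3d.i32.zero": "i32",
    "llvm.nvvm.suld.3d.i64.zero": "i64",
}


def element_type(name):
    """Extract the iN token from an intrinsic name, e.g. 'i8'."""
    match = re.search(r"\.(i\d+)\.", name)
    return match.group(1) if match else None


def mismatches(table):
    """Return (name, type_in_name, return_type) for entries where the two differ."""
    return [
        (name, element_type(name), ret)
        for name, ret in sorted(table.items())
        if element_type(name) != ret
    ]


if __name__ == "__main__":
    for name, expected, actual in mismatches(RETURN_TYPES):
        print(f"{name}: name says {expected}, returns {actual}")
```

With the table above, only the i8 entry is reported; the i16, i32 and i64 entries agree with their names.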

My question: is this intentional? https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#surface-instructions-suld seems to suggest that the corresponding assembly does support 8-bit operations, and it states that the “size of the data transfer matches the size of destination operand d”, which seems to me like the result should be i8 for an i8 instruction.