[PATCH] D104847: [Clang][NVPTX] Add NVPTX intrinsics and builtins for CUDA PTX 6.5 and 7.0 WMMA and MMA instructions
Artem Belevich via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Fri Jul 2 11:15:33 PDT 2021
tra added inline comments.
================
Comment at: clang/include/clang/Basic/BuiltinsNVPTX.def:727
TARGET_BUILTIN(__bmma_m8n8k128_ld_c, "vi*iC*UiIi", "", AND(SM_75,PTX63))
TARGET_BUILTIN(__bmma_m8n8k128_mma_xor_popc_b1, "vi*iC*iC*iC*Ii", "", AND(SM_75,PTX63))
TARGET_BUILTIN(__bmma_m8n8k128_st_c_i32, "vi*iC*UiIi", "", AND(SM_75,PTX63))
----------------
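For readers unfamiliar with the `.def` encoding: the second argument of each `TARGET_BUILTIN` above is Clang's builtin type string. The first type is the return type and the rest are parameters. Prefix modifiers such as `U` (unsigned) and `I` (must be an integer constant expression) precede the base letter, and suffixes such as `C` (const) and `*` (pointer) follow it. The sketch below is a hypothetical decoder (`decode_type` and `render_signature` are illustrative names, not part of Clang) that handles only the letters used in the three lines above. It ignores address-space numbers and every other modifier, and it prints const on the east side, so `int const*` means the same as `const int*`.

```c
#include <stdio.h>
#include <string.h>

/* Decode one type starting at s into out; return a pointer just past it.
 * Supports only: prefixes U, I; base types v, i; suffixes C, *. */
static const char *decode_type(const char *s, char *out, size_t cap) {
    int is_unsigned = 0, is_ice = 0;
    /* Prefix modifiers come before the base type letter. */
    while (*s == 'U' || *s == 'I') {
        if (*s == 'U') is_unsigned = 1; else is_ice = 1;
        ++s;
    }
    const char *base = (*s == 'v') ? "void" : (*s == 'i') ? "int" : "?";
    if (*s != '\0') ++s;
    snprintf(out, cap, "%s%s", is_unsigned ? "unsigned " : "", base);
    /* Suffix modifiers apply left to right: C = const, * = pointer. */
    while (*s == 'C' || *s == '*') {
        strncat(out, *s == 'C' ? " const" : "*", cap - strlen(out) - 1);
        ++s;
    }
    if (is_ice) strncat(out, " [ICE]", cap - strlen(out) - 1);
    return s;
}

/* Render a whole signature: first type is the return type, the rest are args. */
void render_signature(const char *sig, char *out, size_t cap) {
    char ty[64];
    const char *s = decode_type(sig, ty, sizeof ty);
    snprintf(out, cap, "%s(", ty);
    for (int first = 1; *s != '\0'; first = 0) {
        s = decode_type(s, ty, sizeof ty);
        if (!first) strncat(out, ", ", cap - strlen(out) - 1);
        strncat(out, ty, cap - strlen(out) - 1);
    }
    strncat(out, ")", cap - strlen(out) - 1);
}
```

Under these assumptions, `"vi*iC*UiIi"` renders as `void(int*, int const*, unsigned int, int [ICE])`. In other words, the load and store builtins take a destination pointer, a source pointer, an unsigned leading dimension, and a trailing argument that must be a compile-time constant.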
Bummer. `mma.h` in CUDA 11.3 still does not compile for Ampere.
We appear to be missing the new `__bmma_m8n8k128_mma_and_popc_b1` builtin for the `.and` variant of 1-bit `mma`, which was introduced in PTX 7.1 and is not covered by this patch.
https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#warp-level-matrix-instructions-wmma-mma
Do you, by any chance, have an upcoming patch for PTX 7.1, too? :-)
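To show what the missing builtin would compute: per the PTX documentation linked above, 1-bit `mma` produces D = popc(A op B) + C. Here op is XOR for the existing `__bmma_m8n8k128_mma_xor_popc_b1` and AND for the PTX 7.1 `.and` variant. The sketch below is a hypothetical CPU reference model, not a CUDA or Clang API (`bmma_m8n8k128_ref` and `bmma_op` are illustrative names). It assumes A is row-major and B is column-major, which is the only layout combination I am aware of for `.b1`. It also assumes 128-bit rows and columns packed into four 32-bit words, and it stores the 8x8 int32 accumulators row-major in flat arrays.

```c
#include <stdint.h>

#define M 8
#define N 8
#define K_BITS 128
#define K_WORDS (K_BITS / 32) /* 4 x 32-bit words per 128-bit row/column */

typedef enum { BMMA_XOR, BMMA_AND } bmma_op; /* hypothetical names */

/* Portable popcount: clears the lowest set bit on each iteration. */
static int popcount32(uint32_t x) {
    int n = 0;
    while (x) { x &= x - 1; ++n; }
    return n;
}

/* Reference model of D = popc(A op B) + C for one 8x8x128 1-bit tile.
 * a: M rows of A, K_WORDS packed words each (row-major A).
 * b: N columns of B, K_WORDS packed words each (column-major B).
 * c, d: M x N int32 accumulators, row-major. */
void bmma_m8n8k128_ref(int32_t *d, const uint32_t *a, const uint32_t *b,
                       const int32_t *c, bmma_op op) {
    for (int i = 0; i < M; ++i) {
        for (int j = 0; j < N; ++j) {
            int32_t acc = c[i * N + j];
            for (int w = 0; w < K_WORDS; ++w) {
                uint32_t x = a[i * K_WORDS + w], y = b[j * K_WORDS + w];
                acc += popcount32(op == BMMA_XOR ? (x ^ y) : (x & y));
            }
            d[i * N + j] = acc;
        }
    }
}
```

With all-ones A and B and a zero accumulator, the XOR variant yields 0 for every element and the AND variant yields 128. The two operations give different results on the same inputs, so `mma.h` needs a dedicated `.and` builtin rather than reusing the XOR one as is.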
Repository:
rG LLVM Github Monorepo
https://reviews.llvm.org/D104847