[PATCH] D104847: [Clang][NVPTX] Add NVPTX intrinsics and builtins for CUDA PTX 6.5 and 7.0 WMMA and MMA instructions
Artem Belevich via Phabricator via cfe-commits
cfe-commits at lists.llvm.org
Fri Jul 2 15:25:48 PDT 2021
tra added inline comments.
================
Comment at: clang/test/CodeGen/builtins-nvptx-mma.cu:781-786
+ // CHECK_PTX70_SM80: call {{.*}} @llvm.nvvm.wmma.m16n16k8.load.c.col.stride.f32
+ // expected-error-re@+1 {{'__mma_tf32_m16n16k8_ld_c' needs target feature (sm_80{{.*}},(ptx70{{.*}}}}
+ __mma_tf32_m16n16k8_ld_c(fdst, fsrc, ldm, 1);
+ // CHECK_PTX70_SM80: call {{.*}} @llvm.nvvm.wmma.m16n16k8.load.c.row.stride.f32
+ // expected-error-re@+1 {{'__mma_tf32_m16n16k8_ld_c' needs target feature (sm_80{{.*}},(ptx70{{.*}}}}
+ __mma_tf32_m16n16k8_ld_c(fdst, fsrc, ldm, 0);
----------------
This looks rather odd. We're calling a `tf32` builtin, but expect to see an `f32` load intrinsic. Is that expected?
================
Comment at: clang/test/CodeGen/builtins-nvptx-mma.py:74
+ make_ldst_ops(["m8n8k4"], ["a", "b", "c", "d"], ["f64"]) +
+ make_ldst_ops(["m16n16k8"], ["a", "b"], ["tf32"]) +
+ make_ldst_ops(["m16n16k8"], ["c", "d"], ["f32"]))
----------------
This does not seem to match the generated `builtins-nvptx-mma.cu`, which does have `__mma_tf32_m16n16k8_ld_c`.
If I regenerate the test, I see a somewhat different set of tests, possibly related to the oddity I pointed out in the generated test changes in this patch.
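The mismatch can be illustrated with a minimal sketch. It assumes `make_ldst_ops` simply returns the cross product of its geometry, fragment, and element-type lists; that body, the tuple representation, and the `has_op` helper are hypothetical stand-ins, not the script's actual code. Under that assumption, the op list quoted above contains an `f32` entry for the `c` fragment of `m16n16k8` but no `tf32` one, so a regenerated test would not be expected to contain a `tf32` `ld_c` builtin.

```python
from itertools import product


def make_ldst_ops(geoms, frags, types):
    # Hypothetical stand-in for the generator's helper: one
    # (geometry, fragment, element type) tuple per combination.
    return list(product(geoms, frags, types))


# The three calls quoted from builtins-nvptx-mma.py in this comment.
ops = (make_ldst_ops(["m8n8k4"], ["a", "b", "c", "d"], ["f64"]) +
       make_ldst_ops(["m16n16k8"], ["a", "b"], ["tf32"]) +
       make_ldst_ops(["m16n16k8"], ["c", "d"], ["f32"]))


def has_op(geom, frag, ptx_type):
    # Hypothetical helper: is this load/store variant in the list?
    return (geom, frag, ptx_type) in ops


tf32_c_listed = has_op("m16n16k8", "c", "tf32")
f32_c_listed = has_op("m16n16k8", "c", "f32")
print("tf32 c fragment listed:", tf32_c_listed)
print("f32 c fragment listed:", f32_c_listed)
```

Whether the builtin name or the generator's type list is the one that needs to change is the open question for the patch author.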
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D104847/new/
https://reviews.llvm.org/D104847