[PATCH] D104847: [Clang][NVPTX] Add NVPTX intrinsics and builtins for CUDA PTX 6.5 and 7.0 WMMA and MMA instructions
Artem Belevich via Phabricator via cfe-commits
cfe-commits at lists.llvm.org
Thu Jun 24 11:46:08 PDT 2021
tra accepted this revision.
tra added a comment.
This revision is now accepted and ready to land.
Nice. Thank you for adding support for these missing instructions!
LGTM, modulo a few of cosmetic nits.
================
Comment at: clang/include/clang/Basic/BuiltinsNVPTX.def:762
+// Builtins to support double and alternate float WMMA instructions on sm_80
+TARGET_BUILTIN(__dmma_m8n8k4_ld_a, "vd*dC*UiIi", "", AND(SM_80,PTX70))
----------------
Is this a sufficient set of builtins to compile mma.h in CUDA-11.x?
================
Comment at: clang/lib/CodeGen/CGBuiltin.cpp:16411-16430
#define MMA_VARIANTS(geom, type) {{ \
+ Intrinsic::nvvm_wmma_##geom##_mma_row_row_##type, \
+ 0, \
+ Intrinsic::nvvm_wmma_##geom##_mma_row_col_##type, \
+ 0, \
+ Intrinsic::nvvm_wmma_##geom##_mma_col_row_##type, \
+ 0, \
----------------
Nit: satf variants are in the minority. We could move them to the end of the variant list and rely on default initialization to 0. E.g something like this:
```
unsigned getMMAIntrinsic(int Layout, bool Satf) {
unsigned Index = Layout + 4*Satf;
if (Index >= Variants.size())
return 0;
return Variants[Index];
}
#define MMA_VARIANTS(geom, type)
Intrinsic::nvvm_wmma_##geom##_mma_row_row_##type, \
Intrinsic::nvvm_wmma_##geom##_mma_row_col_##type, \
Intrinsic::nvvm_wmma_##geom##_mma_col_row_##type, \
Intrinsic::nvvm_wmma_##geom##_mma_col_col_##type
#define MMA_SATF_VARIANTS(geom, type)
MMA_VARIANTS(geom, type), \
Intrinsic::nvvm_wmma_##geom##_mma_row_row_##type##_satfinite, \
Intrinsic::nvvm_wmma_##geom##_mma_row_col_##type##_satfinite, \
Intrinsic::nvvm_wmma_##geom##_mma_col_row_##type##_satfinite, \
Intrinsic::nvvm_wmma_##geom##_mma_col_col_##type##_satfinite
...
case NVPTX::BI__hmma_m16n16k16_mma_f16f16:
return {8, 8, 4, 4, {{ MMA_SATF_VARIANTS(m16n16k16, f16_f16) }}};
...
case NVPTX::BI__dmma_m8n8k4_mma_f64:
return {1, 1, 2, 2, {{MMA_VARIANTS(m8n8k4, f64)}}};
```
Up to you.
================
Comment at: clang/test/CodeGen/builtins-nvptx-mma.py:111
+
+ # sub-integer require row/col layout.
if op.a.ptx_type in ["s4", "u4", "b1"]:
----------------
typo in the original code: `sub-integers` or `sub-integer types`
================
Comment at: clang/test/CodeGen/builtins-nvptx-mma.py:142-146
+ elif frag.geom == "m16n16k8":
+ if frag.frag == "d":
+ prefix = "__mma"
+ else:
+ prefix = "__mma_tf32"
----------------
It's not obvious why frag `d` is `__mma` and not `__mma_tf32`
Can we use frag.ptx_type to make that decision?
================
Comment at: llvm/include/llvm/IR/IntrinsicsNVVM.td:55
list<LLVMType> regs = !cond(
- // mma.sync.m8n8k4 uses smaller a/b fragments than wmma fp ops
+ // mma uses some smaller fragments than wmma fp ops
!eq(gft,"m8n8k4:a:f16") : !listsplat(llvm_v2f16_ty, 2),
----------------
Nit: I'd drop `some`.
================
Comment at: llvm/include/llvm/IR/IntrinsicsNVVM.td:219
list<WMMA_REGS> id_frags = !cond(
- // int and sub-int ops are identified by input type.
- !eq(A.ptx_elt_type, "s8") : [A],
- !eq(A.ptx_elt_type, "u8") : [A],
- !eq(A.ptx_elt_type, "s4") : [A],
- !eq(A.ptx_elt_type, "u4") : [A],
- !eq(A.ptx_elt_type, "b1") : [A],
- // the rest are FP ops identified by accumulator & result type.
- true: [D, C]
+ // FP16 ops identified by accumulator & result type.
+ !eq(A.ptx_elt_type, "f16") : [D, C],
----------------
Nit: `are identified`
================
Comment at: llvm/include/llvm/IR/IntrinsicsNVVM.td:221
+ !eq(A.ptx_elt_type, "f16") : [D, C],
+ // other ops are identified by input type.
+ !ne(A.ptx_elt_type, B.ptx_elt_type): [A, B],
----------------
Nit: `types` as both A and B are considered.
================
Comment at: llvm/include/llvm/IR/IntrinsicsNVVM.td:4474
+ foreach satf = [0, 1] in {
+ foreach rnd = ["-", "rn", "rz", "rm", "rp"] in {
+ foreach op = NVVM_MMA_OPS.all_wmma_ops in {
----------------
We're often using an empty string to represent a `none`. Comparisons with `-` where we check `rnd` look like we're doing something special there.
I'd use an empty string for `rnd`, too.
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D104847/new/
https://reviews.llvm.org/D104847
More information about the cfe-commits
mailing list