[PATCH] D104847: [Clang][NVPTX] Add NVPTX intrinsics and builtins for CUDA PTX 6.5 and 7.0 WMMA and MMA instructions

Artem Belevich via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Thu Jun 24 11:46:08 PDT 2021


tra accepted this revision.
tra added a comment.
This revision is now accepted and ready to land.

Nice. Thank you for adding support for these missing instructions!
LGTM, modulo a few of cosmetic nits.



================
Comment at: clang/include/clang/Basic/BuiltinsNVPTX.def:762
 
+// Builtins to support double and alternate float WMMA instructions on sm_80
+TARGET_BUILTIN(__dmma_m8n8k4_ld_a, "vd*dC*UiIi", "", AND(SM_80,PTX70))
----------------
Is this a sufficient set of builtins to compile mma.h in CUDA-11.x?


================
Comment at: clang/lib/CodeGen/CGBuiltin.cpp:16411-16430
 #define MMA_VARIANTS(geom, type) {{                                 \
+      Intrinsic::nvvm_wmma_##geom##_mma_row_row_##type,             \
+      0,                                                            \
+      Intrinsic::nvvm_wmma_##geom##_mma_row_col_##type,             \
+      0,                                                            \
+      Intrinsic::nvvm_wmma_##geom##_mma_col_row_##type,             \
+      0,                                                            \
----------------
Nit: satf variants are in the minority. We could  move them to the end of the variant list and rely on default initialization to 0. E.g something like this:

```
unsigned getMMAIntrinsic(int Layout, bool Satf) {
    unsigned Index = Layout + 4*Satf;
    if (Index >= Variants.size())
      return 0;
    return Variants[Index];
  }
#define MMA_VARIANTS(geom, type) 
      Intrinsic::nvvm_wmma_##geom##_mma_row_row_##type, \
      Intrinsic::nvvm_wmma_##geom##_mma_row_col_##type, \
      Intrinsic::nvvm_wmma_##geom##_mma_col_row_##type, \
      Intrinsic::nvvm_wmma_##geom##_mma_col_col_##type 

#define MMA_SATF_VARIANTS(geom, type)
      MMA_VARIANTS(geom, type), \
      Intrinsic::nvvm_wmma_##geom##_mma_row_row_##type##_satfinite, \
      Intrinsic::nvvm_wmma_##geom##_mma_row_col_##type##_satfinite, \
      Intrinsic::nvvm_wmma_##geom##_mma_col_row_##type##_satfinite, \
      Intrinsic::nvvm_wmma_##geom##_mma_col_col_##type##_satfinite 
...
case NVPTX::BI__hmma_m16n16k16_mma_f16f16:
  return {8, 8, 4, 4, {{ MMA_SATF_VARIANTS(m16n16k16, f16_f16) }}};
...
    case NVPTX::BI__dmma_m8n8k4_mma_f64:
      return {1, 1, 2, 2, {{MMA_VARIANTS(m8n8k4, f64)}}};

```

Up to you.


================
Comment at: clang/test/CodeGen/builtins-nvptx-mma.py:111
+
+  # sub-integer require row/col layout.
   if op.a.ptx_type in ["s4", "u4", "b1"]:
----------------
typo in the original code: `sub-integers` or `sub-integer types`


================
Comment at: clang/test/CodeGen/builtins-nvptx-mma.py:142-146
+  elif frag.geom == "m16n16k8":
+    if frag.frag == "d":
+      prefix = "__mma"
+    else:
+      prefix = "__mma_tf32"
----------------
It's not obvious why  frag `d` is `__mma` and not `__mma_tf32` 
Can we use frag.ptx_type to make that decision?


================
Comment at: llvm/include/llvm/IR/IntrinsicsNVVM.td:55
   list<LLVMType> regs = !cond(
-    // mma.sync.m8n8k4 uses smaller a/b fragments than wmma fp ops
+    // mma uses some smaller fragments than wmma fp ops
     !eq(gft,"m8n8k4:a:f16") : !listsplat(llvm_v2f16_ty, 2),
----------------
Nit: I'd drop `some`.


================
Comment at: llvm/include/llvm/IR/IntrinsicsNVVM.td:219
   list<WMMA_REGS> id_frags = !cond(
-     // int and sub-int ops are identified by input type.
-     !eq(A.ptx_elt_type, "s8") : [A],
-     !eq(A.ptx_elt_type, "u8") : [A],
-     !eq(A.ptx_elt_type, "s4") : [A],
-     !eq(A.ptx_elt_type, "u4") : [A],
-     !eq(A.ptx_elt_type, "b1") : [A],
-     // the rest are FP ops identified by accumulator & result type.
-     true: [D, C]
+     // FP16 ops identified by accumulator & result type.
+     !eq(A.ptx_elt_type, "f16") : [D, C],
----------------
Nit: `are identified`


================
Comment at: llvm/include/llvm/IR/IntrinsicsNVVM.td:221
+     !eq(A.ptx_elt_type, "f16") : [D, C],
+     // other ops are identified by input type.
+     !ne(A.ptx_elt_type, B.ptx_elt_type): [A, B],
----------------
Nit: `types` as both A and B are considered.


================
Comment at: llvm/include/llvm/IR/IntrinsicsNVVM.td:4474
+    foreach satf = [0, 1] in {
+      foreach rnd = ["-", "rn", "rz", "rm", "rp"] in {
+        foreach op = NVVM_MMA_OPS.all_wmma_ops in {
----------------
We're often using an empty string to represent a `none`. Comparisons with `-` where we check `rnd` look like we're doing something special there.
I'd use an empty string for `rnd`, too. 



Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D104847/new/

https://reviews.llvm.org/D104847



More information about the cfe-commits mailing list