[PATCH] D105384: [NVPTX, CUDA] Add .and.popc variant of the b1 MMA instruction.

Steffen Larsen via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Mon Jul 12 05:25:36 PDT 2021


steffenlarsen requested changes to this revision.
steffenlarsen added a comment.
This revision now requires changes to proceed.

Good stuff! Thanks for adding this and adjusting the test generator. I have requested some minor changes, though nothing critical. Are the test failures related to these changes?



================
Comment at: clang/test/CodeGen/builtins-nvptx-mma.py:35-38
+def make_mma_ops(geoms, types_a, types_b, types_c, types_d, b1ops=None):
   ops = []
+  if b1ops is None:
+    b1ops = [""]
----------------
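The `b1ops=None` default quoted above is the standard Python idiom for avoiding a mutable default argument. A minimal self-contained sketch of the idiom (simplified; not the generator's real implementation, and the tuple shape here is hypothetical):

```python
def make_mma_ops(geoms, types_a, types_b, types_c, types_d, b1ops=None):
    # Writing b1ops=[""] directly in the signature would share one list
    # object across all calls, so the code defaults to None and
    # substitutes the real default inside the body.
    if b1ops is None:
        b1ops = [""]
    ops = []
    for geom in geoms:
        for op in b1ops:
            ops.append((geom, op))  # simplified: real code builds MMA ops
    return ops
```

For the b1 geometries the caller would pass e.g. `b1ops=[".xor.popc", ".and.popc"]`, while every other caller is unaffected by the new parameter.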



================
Comment at: clang/test/CodeGen/builtins-nvptx-mma.py:84
+          # It uses __mma_tf32_m16n16k8_ld_c but __mma_m16n16k8_st_c_f32.
+          make_ldst_ops(["m16n16k8"], ["a", "b", "c", "d"], ["tf32", "f32"]))
 
----------------
The following changes would remove the need for the `m16n16k8` cases in `is_ldst_variant_supported`.


================
Comment at: clang/test/CodeGen/builtins-nvptx-mma.py:265
     min_ptx = get_required_ptx(frag)
+    # TF32 uses t32 for loads.
+    if frag.geom == "m16n16k8" and frag.frag =="c":
----------------
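The asymmetry being special-cased here follows from the earlier comment: the m16n16k8 C fragment is loaded via `__mma_tf32_m16n16k8_ld_c` but stored via `__mma_m16n16k8_st_c_f32`. A hedged illustration of that remapping (the `Frag` record and the helper name are hypothetical, for exposition only):

```python
from collections import namedtuple

# Hypothetical minimal fragment record, not the generator's actual type.
Frag = namedtuple("Frag", ["geom", "frag", "ptx_type"])

def ld_builtin_type(frag):
    # TF32 is asymmetric: only the load of the m16n16k8 C fragment is
    # named with tf32; the corresponding store keeps the f32 name.
    if frag.geom == "m16n16k8" and frag.frag == "c":
        return "tf32"
    return frag.ptx_type
```

With a mapping like this, the load path can derive the tf32 builtin name while the store path continues to use the fragment's plain f32 type.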



Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D105384/new/

https://reviews.llvm.org/D105384


