[PATCH] D105384: [NVPTX, CUDA] Add .and.popc variant of the b1 MMA instruction.

Artem Belevich via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Mon Jul 12 12:56:22 PDT 2021


tra added inline comments.


================
Comment at: clang/test/CodeGen/builtins-nvptx-mma.py:35-38
+def make_mma_ops(geoms, types_a, types_b, types_c, types_d, b1ops=None):
   ops = []
+  if b1ops is None:
+    b1ops = [""]
----------------
steffenlarsen wrote:
> 
Default initializers that use mutable objects are one of the common Python gotchas.
https://docs.python-guide.org/writing/gotchas/#mutable-default-arguments

It probably does not make much of a difference in this case, since we never modify the default list, but I'd prefer to avoid the pattern nevertheless.
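
For illustration, here is a minimal standalone sketch of the gotcha and of the None-sentinel pattern the patch uses (the function names below are hypothetical, not from the patch):

  # The default list is created once, when the function is defined,
  # so every call that omits the argument shares the same list.
  def buggy_append(x, acc=[]):
      acc.append(x)
      return acc

  buggy_append(1)  # [1]
  buggy_append(2)  # [1, 2] -- the default list persisted across calls

  # The None-sentinel pattern, as in make_mma_ops above, builds a
  # fresh list on every call instead.
  def safe_append(x, acc=None):
      if acc is None:
          acc = []
      acc.append(x)
      return acc

  safe_append(1)  # [1]
  safe_append(2)  # [2]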


================
Comment at: clang/test/CodeGen/builtins-nvptx-mma.py:84
+          # It uses __mma_tf32_m16n16k8_ld_c but __mma_m16n16k8_st_c_f32.
+          make_ldst_ops(["m16n16k8"], ["a", "b", "c", "d"], ["tf32", "f32"]))
 
----------------
steffenlarsen wrote:
> The following changes would remove the need for the `m16n16k8` cases in `is_ldst_variant_supported`.
This was done deliberately. I did have that in an earlier variant of the patch, but eventually settled on the current version.

My thinking is that `make_ldst_ops(m16n16k8)` should create everything that could be needed for `m16n16k8`, and we keep the decision of what to actually produce in one place, in `is_ldst_variant_supported`. Splitting one make_ldst_ops call into two here would make that decision implicit and would require additional explanation. Code that needs fewer explanations wins. :-)
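
To make the structure concrete, here is a minimal, hypothetical sketch of that generate-then-filter pattern (the predicate below is simplified and does not reproduce the script's exact per-fragment rules):

  def make_ldst_ops(geoms, frags, types):
      # Generate the full cross product; deciding what is actually
      # supported is deliberately left to is_ldst_variant_supported.
      return [(geom, frag, ptx_type)
              for geom in geoms
              for frag in frags
              for ptx_type in types]

  def is_ldst_variant_supported(geom, frag, ptx_type):
      # The single place that knows which combinations are real. For
      # m16n16k8, the a/b fragments are tf32 and the c/d accumulators
      # are f32 (simplified; the builtin-naming quirk noted above means
      # the real script is a bit more involved).
      if geom == "m16n16k8":
          if frag in ("a", "b"):
              return ptx_type == "tf32"
          return ptx_type == "f32"
      return True  # other geometries omitted in this sketch

  ops = make_ldst_ops(["m16n16k8"], ["a", "b", "c", "d"],
                      ["tf32", "f32"])
  supported = [op for op in ops if is_ldst_variant_supported(*op)]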



Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D105384/new/

https://reviews.llvm.org/D105384


