[PATCH] D105384: [NVPTX, CUDA] Add .and.popc variant of the b1 MMA instruction.

Tue Jul 13 02:22:13 PDT 2021

steffenlarsen accepted this revision.
steffenlarsen added a comment.
This revision is now accepted and ready to land.

Thank you for addressing my concerns. I am happy with the changes. Great work!

================
Comment at: clang/test/CodeGen/builtins-nvptx-mma.py:35-38
+def make_mma_ops(geoms, types_a, types_b, types_c, types_d, b1ops=None):
   ops = []
+  if b1ops is None:
+    b1ops = [""]
----------------
tra wrote:
> steffenlarsen wrote:
> > 
> Default initializers that use objects in python are one of the common gotchas.
> https://docs.python-guide.org/writing/gotchas/#mutable-default-arguments
> 
> It probably does not make much of a difference in this case as we do not modify it, but I'd prefer to avoid it nevertheless.
Interesting and a little horrifying. I agree it wouldn't have been a problem in this particular case, but I understand your concern and I am fine with keeping it as-is. 👍 

================
Comment at: clang/test/CodeGen/builtins-nvptx-mma.py:84
+          # It uses __mma_tf32_m16n16k8_ld_c but __mma_m16n16k8_st_c_f32.
+          make_ldst_ops(["m16n16k8"], ["a", "b", "c", "d"], ["tf32", "f32"]))

----------------
tra wrote:
> steffenlarsen wrote:
> > The following changes would remove the need for the `m16n16k8` cases in `is_ldst_variant_supported`.
> This was done deliberately. I did have that in an earlier variant of the patch, but eventually settled on the current version.
> 
> My thinking is that `make_ldst_ops(m16n16k8)` would create whatever is necessary for `m16n16k8`, and we'd keep the decision of what to produce in one place in `is_ldst_variant_supported`. Splitting one make_ldst_ops into two here makes this decision implicit which would need additional explanations. Code that needs less explanations  wins. :-)
> 
My concern is that it is more confusing this way because they are the only load/store ops generated where the type is then later filtered. It makes sense for MMA because it needs to filter out select combinations, but here the comment above says one thing and `make_ldst_ops` does something else.

I don't think it makes a huge difference either way, so if you prefer the current version I am okay with keeping it. That said, it might be good to have a comment in the `m16n16k8` case of `is_ldst_variant_supported` making the connection to this.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D105384/new/

https://reviews.llvm.org/D105384