[PATCH] D105384: [NVPTX, CUDA] Add .and.popc variant of the b1 MMA instruction.
Artem Belevich via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Mon Jul 12 12:56:22 PDT 2021
tra added inline comments.
================
Comment at: clang/test/CodeGen/builtins-nvptx-mma.py:35-38
+def make_mma_ops(geoms, types_a, types_b, types_c, types_d, b1ops=None):
ops = []
+ if b1ops is None:
+ b1ops = [""]
----------------
steffenlarsen wrote:
>
Default initializers that use objects in python are one of the common gotchas.
https://docs.python-guide.org/writing/gotchas/#mutable-default-arguments
It probably does not make much of a difference in this case as we do not modify it, but I'd prefer to avoid it nevertheless.
================
Comment at: clang/test/CodeGen/builtins-nvptx-mma.py:84
+ # It uses __mma_tf32_m16n16k8_ld_c but __mma_m16n16k8_st_c_f32.
+ make_ldst_ops(["m16n16k8"], ["a", "b", "c", "d"], ["tf32", "f32"]))
----------------
steffenlarsen wrote:
> The following changes would remove the need for the `m16n16k8` cases in `is_ldst_variant_supported`.
This was done deliberately. I did have that in an earlier variant of the patch, but eventually settled on the current version.
My thinking is that `make_ldst_ops(m16n16k8)` would create whatever is necessary for `m16n16k8`, and we'd keep the decision of what to produce in one place in `is_ldst_variant_supported`. Splitting one make_ldst_ops into two here makes this decision implicit which would need additional explanations. Code that needs less explanations wins. :-)
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D105384/new/
https://reviews.llvm.org/D105384
More information about the llvm-commits
mailing list