[PATCH] D105384: [NVPTX, CUDA] Add .and.popc variant of the b1 MMA instruction.
Steffen Larsen via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Mon Jul 12 05:25:36 PDT 2021
steffenlarsen requested changes to this revision.
steffenlarsen added a comment.
This revision now requires changes to proceed.
Good stuff! Thanks for adding this and adjusting the test generator. I have requested some minor changes, though nothing critical. Are the test failures related to these changes?
================
Comment at: clang/test/CodeGen/builtins-nvptx-mma.py:35-38
+def make_mma_ops(geoms, types_a, types_b, types_c, types_d, b1ops=None):
ops = []
+ if b1ops is None:
+ b1ops = [""]
----------------
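For context, the `b1ops=None` default in the snippet above follows the standard Python idiom for avoiding a shared mutable default argument: default to `None`, then substitute the list inside the body. A minimal sketch (the loop body here is hypothetical and only illustrates the idiom, not the generator's real logic):

```python
def make_mma_ops(geoms, types_a, types_b, types_c, types_d, b1ops=None):
    # A mutable default like b1ops=[""] would be shared across calls,
    # so we default to None and replace it on entry.
    if b1ops is None:
        b1ops = [""]  # a single empty suffix -> plain (non-b1) variants
    ops = []
    for geom in geoms:
        for op in b1ops:
            # Hypothetical: pair each geometry with each b1 op suffix.
            ops.append((geom, op))
    return ops
```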
================
Comment at: clang/test/CodeGen/builtins-nvptx-mma.py:84
+ # It uses __mma_tf32_m16n16k8_ld_c but __mma_m16n16k8_st_c_f32.
+ make_ldst_ops(["m16n16k8"], ["a", "b", "c", "d"], ["tf32", "f32"]))
----------------
The following changes would remove the need for the `m16n16k8` cases in `is_ldst_variant_supported`.
================
Comment at: clang/test/CodeGen/builtins-nvptx-mma.py:265
min_ptx = get_required_ptx(frag)
+	  # TF32 uses tf32 for loads.
+	  if frag.geom == "m16n16k8" and frag.frag == "c":
----------------
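To illustrate the asymmetry called out in the comment above (the `c` fragment of `m16n16k8` loads via the tf32 builtin but stores via the f32 one), here is a hypothetical helper; the name `ld_builtin_name` and the generic naming pattern are assumptions for illustration, only the `__mma_tf32_m16n16k8_ld_c` spelling comes from the review comment:

```python
def ld_builtin_name(geom, frag, ptx_type):
    # Hypothetical helper: the m16n16k8 'c' fragment loads via the tf32
    # builtin even though it stores as f32 (per the review comment).
    if geom == "m16n16k8" and frag == "c":
        ptx_type = "tf32"
    return "__mma_%s_%s_ld_%s" % (ptx_type, geom, frag)
```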
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D105384/new/
https://reviews.llvm.org/D105384