[llvm] [NVPTX] Lower stmatrix intrinsics to PTX (PR #148561)

Thu Jul 17 11:59:00 PDT 2025

================
@@ -4597,7 +4597,14 @@ class WMMA_REGINFO<WMMA_REGS r, string op>
 
     !and(!eq(op, "ldmatrix"),
          !eq(ptx_elt_type, "b8x16.b4x16_p64"),
-         !eq(geom, "m8n16")) : [hasSM<100>, hasArchAccelFeatures, hasPTX<86>]);
+         !eq(geom, "m8n16")) : [hasSM<100>, hasArchAccelFeatures, hasPTX<86>],
+
+    !and(!eq(op, "stmatrix"),!eq(ptx_elt_type, "b16"),
+         !eq(geom, "m8n8")) : [hasSM<90>, hasPTX<78>],
+
+    !and(!eq(op, "stmatrix"),
+         !eq(ptx_elt_type, "b8"),
+         !eq(geom, "m16n8")) : [hasSM<100>, hasArchAccelFeatures, hasPTX<86>]);
----------------
Artem-B wrote:

Some of these instructions are also supported on `sm_{100,101,120}f` in PTX 8.8.

We will eventually need to figure out a convenient way to express that, but enabling them for the `a` variants only is fine for now.

https://github.com/llvm/llvm-project/pull/148561