[llvm] [AMDGPU] select v_sat_pk from two i16 or v2i16 (PR #121124)

Wed Jan 15 01:09:13 PST 2025

================
@@ -315,6 +315,55 @@ def srl_16 : PatFrag<
   (ops node:$src0), (srl_oneuse node:$src0, (i32 16))
 >;
 
+def clamp_s16_u8 : PatFrag<
+  (ops node:$src),
+  (i16 (AMDGPUsmed3 $src, (i16 0), (i16 255)))
+>;
+
+def conc_lo_u8_i16 : PatFrags<
+  (ops node:$src0, node:$src1),
+  [
+    (or
+      (i16 $src0),
+      (shl (i16 $src1), (i16 8))
+    ),
+    (or
+      (and (i16 $src0), (i16 255)),
+      (shl (i16 $src1), (i16 8))
+    )
+  ]
+>;
+
+def clamp_v2i16_u8 : PatFrags<
+  (ops node:$src),
+  [
+    (v2i16 (smax (smin $src, (build_vector (i16 255), (i16 255))), (build_vector (i16 0), (i16 0)))),
+    (v2i16 (smin (smax $src, (build_vector (i16 0), (i16 0))), (build_vector (i16 255), (i16 255))))
+  ]
+>;
+
+def conc_lo_v2i16_i16 : PatFrags<
+  (ops node:$src),
----------------
arsenm wrote:

These cases are stretching what should be done in patterns, and there are too many of them in one patch. Can you keep this to one pattern per patch? With this many together, it's much harder to review the test coverage.

These all implement the same thing, so we should canonicalize to this form so that you don't have as many variants to deal with. This also implements the same patterns that are matched for the truncating stores, which we should try to reuse.
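
To illustrate the "same thing" point, here is a minimal Python sketch (not LLVM code) that models i16 values as plain integers and checks that the clamp variants in the hunk compute the same result. It also checks that the masked and unmasked `or`/`shl` forms agree, assuming the low operand is already in [0, 255]. All function names are hypothetical and exist only for this illustration.

```python
# Hypothetical model of the pattern variants in the hunk above.
# i16 values are modelled as Python ints in [-32768, 32767].

def clamp_med3(x):
    # Models AMDGPUsmed3(x, 0, 255): the median of the three values.
    return sorted([x, 0, 255])[1]

def clamp_max_of_min(x):
    # Models smax(smin(x, 255), 0).
    return max(min(x, 255), 0)

def clamp_min_of_max(x):
    # Models smin(smax(x, 0), 255).
    return min(max(x, 0), 255)

def pack_plain(lo, hi):
    # Models or(lo, shl(hi, 8)), truncated to 16 bits.
    return (lo | (hi << 8)) & 0xFFFF

def pack_masked(lo, hi):
    # Models or(and(lo, 255), shl(hi, 8)), truncated to 16 bits.
    return ((lo & 255) | (hi << 8)) & 0xFFFF

def clamp_variants_agree():
    # Exhaustive check over every signed 16-bit input.
    return all(
        clamp_med3(x) == clamp_max_of_min(x) == clamp_min_of_max(x)
        for x in range(-32768, 32768)
    )

def pack_variants_agree():
    # Assumption: both operands were already clamped to [0, 255].
    return all(
        pack_plain(lo, hi) == pack_masked(lo, hi)
        for lo in range(256)
        for hi in range(256)
    )
```

Both checks are exhaustive over their stated input ranges. The sketch only shows that the variants are interchangeable under those assumptions; it says nothing about where or how the canonicalization should be done.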

https://github.com/llvm/llvm-project/pull/121124