[llvm] [AArch64][SVE] Instcombine uzp1/reinterpret svbool to use vector.insert (PR #81069)

Wed Feb 14 05:36:28 PST 2024

================
@@ -0,0 +1,177 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 4
+; RUN: opt -S -passes=instcombine -mtriple=aarch64 < %s | FileCheck %s
+
+; Code that concatenates two predicates using uzp1 after converting to
+; double length using sve.convert.to/from.svbool is optimized poorly
+; in the backend, resulting in additional `and` instructions to zero
+; the lanes. Test that we get rid of convert to/from and generate a
+; concatenate using vector insert instead.
+
+
+define <vscale x 8 x i1> @reinterpt_uzp1_1(<vscale x 4 x i32> %v0, <vscale x 4 x i32> %v1, <vscale x 4 x i32> %x) {
+; CHECK-LABEL: define <vscale x 8 x i1> @reinterpt_uzp1_1(
+; CHECK-SAME: <vscale x 4 x i32> [[V0:%.*]], <vscale x 4 x i32> [[V1:%.*]], <vscale x 4 x i32> [[X:%.*]]) {
+; CHECK-NEXT:    [[CMP0:%.*]] = icmp ult <vscale x 4 x i32> [[V0]], [[X]]
+; CHECK-NEXT:    [[CMP1:%.*]] = icmp ult <vscale x 4 x i32> [[V1]], [[X]]
+; CHECK-NEXT:    [[TMP1:%.*]] = call <vscale x 8 x i1> @llvm.vector.insert.nxv8i1.nxv4i1(<vscale x 8 x i1> poison, <vscale x 4 x i1> [[CMP0]], i64 0)
+; CHECK-NEXT:    [[UZ1:%.*]] = call <vscale x 8 x i1> @llvm.vector.insert.nxv8i1.nxv4i1(<vscale x 8 x i1> [[TMP1]], <vscale x 4 x i1> [[CMP1]], i64 4)
+; CHECK-NEXT:    ret <vscale x 8 x i1> [[UZ1]]
+;
+  %cmp0 = icmp ult <vscale x 4 x i32> %v0, %x
+  %cmp1 = icmp ult <vscale x 4 x i32> %v1, %x
----------------
paulwalker-arm wrote:

The compares don't look necessary for what is being tested. Can their results be passed in as function parameters instead?
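
As a rough sketch of what that could look like: the quoted hunk is cut off after the two `icmp` lines, so the rest of the body below is an assumption inferred from the test's header comment (convert to svbool, convert back at double length, then `uzp1`). The value names `%b0`, `%b1`, `%p0`, and `%p1` are placeholders, and the CHECK lines would need regenerating with utils/update_test_checks.py.

```llvm
; Sketch only: the predicates arrive as parameters, so no icmp is needed.
define <vscale x 8 x i1> @reinterpt_uzp1_1(<vscale x 4 x i1> %cmp0, <vscale x 4 x i1> %cmp1) {
  ; Widen each nxv4i1 predicate to svbool, then reinterpret it as nxv8i1.
  %b0 = call <vscale x 16 x i1> @llvm.aarch64.sve.convert.to.svbool.nxv4i1(<vscale x 4 x i1> %cmp0)
  %p0 = call <vscale x 8 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv8i1(<vscale x 16 x i1> %b0)
  %b1 = call <vscale x 16 x i1> @llvm.aarch64.sve.convert.to.svbool.nxv4i1(<vscale x 4 x i1> %cmp1)
  %p1 = call <vscale x 8 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv8i1(<vscale x 16 x i1> %b1)
  ; Concatenate the two reinterpreted predicates with uzp1.
  %uz1 = call <vscale x 8 x i1> @llvm.aarch64.sve.uzp1.nxv8i1(<vscale x 8 x i1> %p0, <vscale x 8 x i1> %p1)
  ret <vscale x 8 x i1> %uz1
}

declare <vscale x 16 x i1> @llvm.aarch64.sve.convert.to.svbool.nxv4i1(<vscale x 4 x i1>)
declare <vscale x 8 x i1> @llvm.aarch64.sve.convert.from.svbool.nxv8i1(<vscale x 16 x i1>)
declare <vscale x 8 x i1> @llvm.aarch64.sve.uzp1.nxv8i1(<vscale x 8 x i1>, <vscale x 8 x i1>)
```

That would keep the test focused on the convert/uzp1 pattern itself, with the `<vscale x 4 x i32>` arguments and `%x` dropped from the signature.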

https://github.com/llvm/llvm-project/pull/81069