[llvm] [AArch64][NFC] Add test as a representative of scalarizing a vector i… (PR #114107)
Sushant Gokhale via llvm-commits
llvm-commits at lists.llvm.org
Tue Oct 29 10:50:26 PDT 2024
https://github.com/sushgokh created https://github.com/llvm/llvm-project/pull/114107
…nteger division
Scalarization is the last resort when vectorizing a bundle of integer divisions. Currently, the cost estimate for scalarizing a vector division can be considerably overestimated, as in this motivating test case, where the vector cost should not deviate much from the scalar cost.
A future patch will try to improve the scalarization cost.
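To illustrate why a scalarized vector division can look much more expensive than the scalar code it replaces, here is a minimal Python sketch. The unit costs and both helper functions are hypothetical; they are not the values or formulas used by LLVM's AArch64 cost model. The sketch only shows how charging per-lane extract and insert overhead on top of the divisions inflates the estimate.

```python
# Hypothetical unit costs for illustration only; these are NOT the
# numbers used by LLVM's AArch64 cost model.

def scalar_cost(num_lanes, div_cost=1):
    """Cost of keeping the divisions as num_lanes separate scalar sdivs."""
    return num_lanes * div_cost


def scalarized_vector_cost(num_lanes, div_cost=1, extract_cost=1,
                           insert_cost=1, num_operands=2):
    """Cost of a vector division that has to be scalarized:
    extract every lane of each operand, divide lane by lane,
    then insert every result back into a vector."""
    extracts = num_operands * num_lanes * extract_cost
    divides = num_lanes * div_cost
    inserts = num_lanes * insert_cost
    return extracts + divides + inserts


lanes = 4
vector = scalarized_vector_cost(lanes)
scalar = scalar_cost(lanes)
# A positive difference means the vector form is estimated as more expensive.
print(f"scalar={scalar} vector={vector} difference={vector - scalar}")
```

The divisions themselves cost the same in both forms in this sketch; the whole difference comes from the extract and insert overhead. In the test case in this patch the scalar code already contains the `extractelement` and `insertelement` instructions, which is why the vector estimate would ideally stay close to the scalar one.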
From b1297dd6a92dc1f22d7dd014fe1e8f28ff973e61 Mon Sep 17 00:00:00 2001
From: sgokhale <sgokhale at nvidia.com>
Date: Tue, 29 Oct 2024 23:16:13 +0530
Subject: [PATCH] [AArch64][NFC] Add test as a representative of scalarizing a
vector integer division
Scalarization is the last resort when vectorizing a bundle of integer divisions.
Currently, the cost estimate for scalarizing a vector division can be considerably overestimated,
as in this motivating test case, where the vector cost should not deviate much from
the scalar cost.
A future patch will try to improve the scalarization cost.
---
.../AArch64/scalarizing-vector-cost.ll | 23 +++++++++++++++++++
1 file changed, 23 insertions(+)
create mode 100644 llvm/test/Transforms/SLPVectorizer/AArch64/scalarizing-vector-cost.ll
diff --git a/llvm/test/Transforms/SLPVectorizer/AArch64/scalarizing-vector-cost.ll b/llvm/test/Transforms/SLPVectorizer/AArch64/scalarizing-vector-cost.ll
new file mode 100644
index 00000000000000..12bbf6e043ca91
--- /dev/null
+++ b/llvm/test/Transforms/SLPVectorizer/AArch64/scalarizing-vector-cost.ll
@@ -0,0 +1,23 @@
+; RUN: opt -mtriple=aarch64 -passes=slp-vectorizer -debug-only=SLP -S -disable-output < %s 2>&1 | FileCheck %s
+
+define <4 x i8> @v4i8(<4 x i8> %a, <4 x i8> %b)
+{
+; CHECK: SLP: Found cost = 18 for VF=4
+ %a0 = extractelement <4 x i8> %a, i64 0
+ %a1 = extractelement <4 x i8> %a, i64 1
+ %a2 = extractelement <4 x i8> %a, i64 2
+ %a3 = extractelement <4 x i8> %a, i64 3
+ %b0 = extractelement <4 x i8> %b, i64 0
+ %b1 = extractelement <4 x i8> %b, i64 1
+ %b2 = extractelement <4 x i8> %b, i64 2
+ %b3 = extractelement <4 x i8> %b, i64 3
+ %1 = sdiv i8 %a0, undef
+ %2 = sdiv i8 %a1, 1
+ %3 = sdiv i8 %a2, 2
+ %4 = sdiv i8 %a3, 4
+ %r0 = insertelement <4 x i8> poison, i8 %1, i32 0
+ %r1 = insertelement <4 x i8> %r0, i8 %2, i32 1
+ %r2 = insertelement <4 x i8> %r1, i8 %3, i32 2
+ %r3 = insertelement <4 x i8> %r2, i8 %4, i32 3
+ ret <4 x i8> %r3
+}