[PATCH] D101460: [SLP]Try to vectorize tiny trees with shuffled gathers of extractelements.
Alexey Bataev via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Wed Apr 28 10:05:18 PDT 2021
ABataev added inline comments.
================
Comment at: llvm/test/Transforms/SLPVectorizer/AArch64/accelerate-vector-functions-inseltpoison.ll:43
+; NOACCELERATE-NEXT: [[TMP7:%.*]] = tail call fast float @llvm.sin.f32(float [[VECEXT_3]])
+; NOACCELERATE-NEXT: [[VECINS_3:%.*]] = insertelement <4 x float> [[VECINS_2]], float [[TMP7]], i32 3
; NOACCELERATE-NEXT: ret <4 x float> [[VECINS_3]]
----------------
RKSimon wrote:
> why do many of these libm vectorizations result in a v2f32 and 2 * f32 scalar calls? I'd expect either 2 x v2f32 or a v4f32.
This comes from the cost model. The cost of vectorizing all 4 calls is too high (`Call cost 18 (58-40) for %1 = tail call fast float @llvm.sin.f32(float %vecext`), and the cost of vectorizing 2 calls is also high (`Call cost 6 (26-20) for %1 = tail call fast float @llvm.sin.f32(float %vecext)`). However, the extractelements with indices 1-2 have a cost of 5 and are removed by the vectorizer, which, together with the compensation for the cost of the inserts, offsets the cost of the 2x calls.
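As a rough illustration of that arithmetic (a sketch only, not the actual SLPVectorizer code): the call deltas come from the debug output quoted above, read as vector cost minus scalar cost, and the extractelement saving of 5 is the figure quoted above. `insert_credit` is a hypothetical placeholder, since the exact compensation for the inserts is not given here. The profitability rule `cost < -threshold` with a threshold of 0 is also an assumption of this sketch.

```python
def call_delta(vector_cost: int, scalar_cost: int) -> int:
    """Extra cost of one vector call over the scalar calls it replaces."""
    return vector_cost - scalar_cost


def tree_cost(call_cost: int, extract_saving: int, insert_credit: int) -> int:
    """Net cost of a tiny tree: the call delta minus the savings from
    removed extractelements and from the (hypothetical) insert credit."""
    return call_cost - extract_saving - insert_credit


def is_profitable(cost: int, threshold: int = 0) -> bool:
    """Assumed rule: vectorize only when the net cost is below -threshold."""
    return cost < -threshold


# Deltas taken from the quoted debug output.
four_wide_calls = call_delta(58, 40)  # "Call cost 18 (58-40)"
two_wide_calls = call_delta(26, 20)   # "Call cost 6 (26-20)"

# The insert credit of 2 is an illustrative value, not from the log.
two_wide_tree = tree_cost(two_wide_calls, extract_saving=5, insert_credit=2)
```

With these illustrative numbers, the 2-wide tree only becomes profitable once the insert credit is counted in. The real values depend on the target cost model.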
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D101460/new/
https://reviews.llvm.org/D101460