[PATCH] D101460: [SLP]Try to vectorize tiny trees with shuffled gathers of extractelements.

Wed Apr 28 10:05:18 PDT 2021

ABataev added inline comments.

================
Comment at: llvm/test/Transforms/SLPVectorizer/AArch64/accelerate-vector-functions-inseltpoison.ll:43
+; NOACCELERATE-NEXT:    [[TMP7:%.*]] = tail call fast float @llvm.sin.f32(float [[VECEXT_3]])
+; NOACCELERATE-NEXT:    [[VECINS_3:%.*]] = insertelement <4 x float> [[VECINS_2]], float [[TMP7]], i32 3
 ; NOACCELERATE-NEXT:    ret <4 x float> [[VECINS_3]]
----------------
RKSimon wrote:
> why do many of these libm vectorizations result in a v2f32 and 2 * f32 scalar calls? I'd expect either 2 x v2f32 or a v4f32.
This comes from the cost model. Vectorizing all 4 calls is too expensive (`Call cost 18 (58-40) for   %1 = tail call fast float @llvm.sin.f32(float %vecext`), and vectorizing 2 calls is also expensive on its own (`Call cost 6 (26-20) for   %1 = tail call fast float @llvm.sin.f32(float %vecext)`). However, the extractelements with indices 1-2 cost 5 and are removed by the vectorizer, and that saving, together with the compensation from the insert costs, outweighs the call cost in the 2-element case.
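
To make the arithmetic in the debug lines easier to follow, here is a minimal sketch. It assumes the two numbers in parentheses are the vector call cost and the summed scalar call cost, and that the leading number is their difference. `costDelta` is a hypothetical helper for illustration, not an SLPVectorizer API, and the cost of the inserts is not modeled because the log lines above do not give it.

```cpp
// Hypothetical helper: difference between the vector call cost and the
// summed scalar call cost, matching the "Call cost D (V-S)" debug lines.
// A positive result means the vector call alone is more expensive.
constexpr int costDelta(int VecCost, int ScalarCost) {
  return VecCost - ScalarCost;
}

// 4 x float case: "Call cost 18 (58-40)".
static_assert(costDelta(58, 40) == 18, "4-wide call delta");

// 2 x float case: "Call cost 6 (26-20)".
static_assert(costDelta(26, 20) == 6, "2-wide call delta");

// Cost of the extractelements with indices 1-2 that the vectorizer removes.
constexpr int RemovedExtractCost = 5;

// Call delta left after subtracting the removed extracts (inserts excluded).
constexpr int remainingAfterExtracts(int CallDelta) {
  return CallDelta - RemovedExtractCost;
}
```

Under these assumptions the 2-wide call leaves a remainder of 1 and the 4-wide call leaves 13, so only the 2-wide case is close enough for the insert compensation to plausibly tip it toward vectorization.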

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D101460/new/

https://reviews.llvm.org/D101460