[PATCH] D98714: [SLP] Add insertelement instructions to vectorizable tree

Fri Apr 16 03:11:52 PDT 2021

RKSimon added inline comments.

================
Comment at: llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:3648
+        MinIndex = std::min(MinIndex, Index);
+      }
+
----------------
Use getScalarizationOverhead instead of accumulating insertelement costs? That should avoid the need for the subvector costs below as well.
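
To illustrate the suggestion above, here is a minimal sketch of the two costing styles: summing a cost per insertelement versus recording the written lanes in a mask and asking the cost model once. It is a toy, not the LLVM API. `ToyCostModel`, `insertCost`, `scalarizationOverhead`, `costByAccumulation`, `costByDemandedMask` and the per-lane costs are all hypothetical names and numbers chosen for the example. The real TargetTransformInfo hook takes a vector type and a demanded-elements mask, as far as I know, and a target may price lanes quite differently.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <vector>

// Toy stand-in for a target cost model. Hypothetical, NOT the LLVM TTI API.
constexpr std::size_t kLanes = 4;
using LaneMask = std::bitset<kLanes>;

struct ToyCostModel {
  // Assumed per-lane insert cost: lane 0 is free, every other lane costs 1.
  int insertCost(std::size_t Lane) const {
    assert(Lane < kLanes && "lane out of range");
    return Lane == 0 ? 0 : 1;
  }

  // One query covering the whole set of demanded lanes.
  int scalarizationOverhead(const LaneMask &Demanded) const {
    int Cost = 0;
    for (std::size_t Lane = 0; Lane < kLanes; ++Lane)
      if (Demanded.test(Lane))
        Cost += insertCost(Lane);
    return Cost;
  }
};

// Style 1: the caller walks the inserts and sums the costs itself.
int costByAccumulation(const ToyCostModel &Model,
                       const std::vector<std::size_t> &InsertLanes) {
  int Cost = 0;
  for (std::size_t Lane : InsertLanes)
    Cost += Model.insertCost(Lane);
  return Cost;
}

// Style 2: the caller only records which lanes are written, then asks once.
int costByDemandedMask(const ToyCostModel &Model,
                       const std::vector<std::size_t> &InsertLanes) {
  LaneMask Demanded;
  for (std::size_t Lane : InsertLanes) {
    assert(Lane < kLanes && "lane out of range");
    Demanded.set(Lane);
  }
  return Model.scalarizationOverhead(Demanded);
}
```

In this toy model both styles agree for any set of distinct lanes. The mask form counts a lane once even if it is written twice, and it leaves partially filled vectors to the cost model rather than to index bookkeeping in the caller.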

================
Comment at: llvm/test/Transforms/SLPVectorizer/AArch64/accelerate-vector-functions.ll:229
+; NOACCELERATE-NEXT:    [[TMP6:%.*]] = tail call fast float @expf(float [[VECEXT_3]])
+; NOACCELERATE-NEXT:    [[VECINS_3:%.*]] = insertelement <4 x float> [[VECINS_21]], float [[TMP6]], i32 3
 ; NOACCELERATE-NEXT:    ret <4 x float> [[VECINS_3]]
----------------
I'm curious why we ended up with a partial vectorization here (2 x expf + llvm.exp.v2f32) instead of a single llvm.exp.v4f32.

================
Comment at: llvm/test/Transforms/SLPVectorizer/AArch64/accelerate-vector-functions.ll:312
+; NOACCELERATE-NEXT:    [[TMP6:%.*]] = tail call fast float @logf(float [[VECEXT_3]])
+; NOACCELERATE-NEXT:    [[VECINS_3:%.*]] = insertelement <4 x float> [[VECINS_21]], float [[TMP6]], i32 3
 ; NOACCELERATE-NEXT:    ret <4 x float> [[VECINS_3]]
----------------
Why not llvm.log.v4f32?

================
Comment at: llvm/test/Transforms/SLPVectorizer/X86/alternate-int-inseltpoison.ll:146
 define <8 x i32> @ashr_shl_v8i32_const(<8 x i32> %a) {
-; SSE-LABEL: @ashr_shl_v8i32_const(
-; SSE-NEXT:    [[TMP1:%.*]] = shufflevector <8 x i32> [[A:%.*]], <8 x i32> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
-; SSE-NEXT:    [[TMP2:%.*]] = ashr <4 x i32> [[TMP1]], <i32 2, i32 2, i32 2, i32 2>
-; SSE-NEXT:    [[TMP3:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
-; SSE-NEXT:    [[TMP4:%.*]] = shl <4 x i32> [[TMP3]], <i32 3, i32 3, i32 3, i32 3>
-; SSE-NEXT:    [[R7:%.*]] = shufflevector <4 x i32> [[TMP2]], <4 x i32> [[TMP4]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
-; SSE-NEXT:    ret <8 x i32> [[R7]]
-;
 ; AVX1-LABEL: @ashr_shl_v8i32_const(
 ; AVX1-NEXT:    [[TMP1:%.*]] = shufflevector <8 x i32> [[A:%.*]], <8 x i32> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
----------------
Where did the SSE checks go? Add back the SSE check prefixes?

================
Comment at: llvm/test/Transforms/SLPVectorizer/X86/sext-inseltpoison.ll:3
+; RUN: opt < %s -mtriple=x86_64-unknown -basic-aa -slp-vectorizer -S | FileCheck %s --check-prefixes=SSE
+; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=slm -basic-aa -slp-vectorizer -S | FileCheck %s --check-prefixes=SSE
 ; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=corei7-avx -basic-aa -slp-vectorizer -S | FileCheck %s --check-prefixes=AVX
----------------
Add the SSE2 and SLM check prefixes back - with both RUN lines reduced to a bare SSE prefix, you've lost the SSE2/SLM test coverage.

================
Comment at: llvm/test/Transforms/SLPVectorizer/X86/sext.ll:3
+; RUN: opt < %s -mtriple=x86_64-unknown -basic-aa -slp-vectorizer -S | FileCheck %s --check-prefixes=SSE
+; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=slm -basic-aa -slp-vectorizer -S | FileCheck %s --check-prefixes=SSE
 ; RUN: opt < %s -mtriple=x86_64-unknown -mcpu=corei7-avx -basic-aa -slp-vectorizer -S | FileCheck %s --check-prefixes=AVX
----------------
Add the SSE2 and SLM check prefixes back here as well - with both RUN lines reduced to a bare SSE prefix, you've lost the SSE2/SLM test coverage.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D98714/new/

https://reviews.llvm.org/D98714