[PATCH] D98714: [SLP] Add insertelement instructions to vectorizable tree
Alexey Bataev via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Fri Apr 9 09:35:13 PDT 2021
ABataev added inline comments.
================
Comment at: llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:2841-2844
+ if (!llvm::any_of(VL, [&](Value *S) { return Op == S; })) {
+ assert(!FirstInsert && "FirstInsert already found?");
+ FirstInsert = IE;
+ }
----------------
anton-afanasyev wrote:
> ABataev wrote:
> > The first insert has undef vector operand, no?
> > Plus, this has n^2 complexity
> Not necessary. There could be the case we deal with splitted array of inserts:
> ```
> %rv0 = insertelement <8 x i32> undef, i16 %r0 , i32 0
> %rv1 = insertelement <8 x i32> %rv0 , i16 %r1 , i32 1
> %rv2 = insertelement <8 x i32> %rv1 , i16 %r2 , i32 2
> %rv3 = insertelement <8 x i32> %rv2 , i16 %r3 , i32 3
> %rv4 = insertelement <8 x i32> %rv3 , i16 %r4 , i32 4
> %rv5 = insertelement <8 x i32> %rv4 , i16 %r5 , i32 5
> %rv6 = insertelement <8 x i32> %rv5 , i16 %r6 , i32 6
> %rv7 = insertelement <8 x i32> %rv6 , i16 %r7 , i32 7
> ```
> Since `<8 x i32>` doesn't fit SSE register, it is splitted to two `<4 x i32>`. The first time we process `%rv0-%rv3`, the next time -- `%rv4-%rv7`.
Hmm, looks like currently it can be only the first element.
================
Comment at: llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:2858-2869
+ // Reschedule bundle instructions, sorting them by def-to-use order
+ ScheduleData *FirstInBundle = BS.getScheduleData(FirstInsert);
+ ScheduleData *Current = BS.getScheduleData(LastInsert);
+ ScheduleData *NextInBundle = nullptr;
+ do {
+ Current->FirstInBundle = FirstInBundle;
+ Current->NextInBundle = NextInBundle;
----------------
anton-afanasyev wrote:
> ABataev wrote:
> > This really looks ugly. Why you can't use the existing functionality?
> Hmm, I'm agree that doesn't look elegant. Let me describe the task I'm solving here.
> For the inserts tree node we have two orders, which could be actually different: index (lane) order and def-use order. Both of these orders are used: index order is for correct lane matching, def-use order is for correct vector replacing and extracting.
> There could be the (theoretical, seldom in practice) case when intermediate insert instruction is used out-of-tree:
> ```
> %rv0 = insertelement <4 x i32> undef, i16 %r0 , i32 3
> %rv1 = insertelement <4 x i32> %rv0 , i16 %r1 , i32 1
> %rv2 = insertelement <4 x i32> %rv1 , i16 %r2 , i32 0
> %rv3 = insertelement <4 x i32> %rv2 , i16 %r3 , i32 2
> ...
> %a = shufflevector <4 x i32> %rv2, <4 x i32> undef, ...
> ```
> So we have to use def-use order to extract only 3, 1 and 0 indices here after vectorization.
>
> But your comment made me think to pass already def-use sorted inserts to `tryToVectorizeList()` and to use `TreeEntry::ReorderIndices` to store index order. I'm to try this approach.
I think in this case you need to vectorize only `%rv0-%rv2` sequence, `%rv3` should remain as an insert
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D98714/new/
https://reviews.llvm.org/D98714
More information about the llvm-commits
mailing list