[PATCH] D98714: [SLP] Add insertelement instructions to vectorizable tree

Anton Afanasyev via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Fri Apr 9 09:49:01 PDT 2021


anton-afanasyev marked 3 inline comments as done.
anton-afanasyev added inline comments.


================
Comment at: llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:2841-2844
+        if (!llvm::any_of(VL, [&](Value *S) { return Op == S; })) {
+          assert(!FirstInsert && "FirstInsert already found?");
+          FirstInsert = IE;
+        }
----------------
ABataev wrote:
> anton-afanasyev wrote:
> > ABataev wrote:
> > > The first insert has undef vector operand, no?
> > > Plus, this has n^2 complexity
> > Not necessary. There could be the case we deal with splitted array of inserts:
> > ```
> >   %rv0  = insertelement <8 x i32> undef, i16 %r0 , i32 0
> >   %rv1  = insertelement <8 x i32> %rv0 , i16 %r1 , i32 1
> >   %rv2  = insertelement <8 x i32> %rv1 , i16 %r2 , i32 2
> >   %rv3  = insertelement <8 x i32> %rv2 , i16 %r3 , i32 3
> >   %rv4  = insertelement <8 x i32> %rv3 , i16 %r4 , i32 4
> >   %rv5  = insertelement <8 x i32> %rv4 , i16 %r5 , i32 5
> >   %rv6  = insertelement <8 x i32> %rv5 , i16 %r6 , i32 6
> >   %rv7  = insertelement <8 x i32> %rv6 , i16 %r7 , i32 7
> > ```
> > Since `<8 x i32>` doesn't fit SSE register, it is splitted to two `<4 x i32>`. The first time we process `%rv0-%rv3`, the next time -- `%rv4-%rv7`.
> Hmm, looks like currently it can be only the first element.
Why? We process long insert chains slice by slice:
https://github.com/llvm/llvm-project/blob/main/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp#L6408


================
Comment at: llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:2858-2869
+        // Reschedule bundle instructions, sorting them by def-to-use order
+        ScheduleData *FirstInBundle = BS.getScheduleData(FirstInsert);
+        ScheduleData *Current = BS.getScheduleData(LastInsert);
+        ScheduleData *NextInBundle = nullptr;
+        do {
+          Current->FirstInBundle = FirstInBundle;
+          Current->NextInBundle = NextInBundle;
----------------
ABataev wrote:
> anton-afanasyev wrote:
> > ABataev wrote:
> > > This really looks ugly. Why you can't use the existing functionality?
> > Hmm, I'm agree that doesn't look elegant. Let me describe the task I'm solving here.
> > For the inserts tree node we have two orders, which could be actually different: index (lane) order and def-use order. Both of these orders are used: index order is for correct lane matching, def-use order is for correct vector replacing and extracting.
> > There could be the (theoretical, seldom in practice) case when intermediate insert instruction is used out-of-tree:
> > ```
> > %rv0  = insertelement <4 x i32> undef, i16 %r0 , i32 3
> > %rv1  = insertelement <4 x i32> %rv0 , i16 %r1 , i32 1
> > %rv2  = insertelement <4 x i32> %rv1 , i16 %r2 , i32 0
> > %rv3  = insertelement <4 x i32> %rv2 , i16 %r3 , i32 2
> > ...
> > %a = shufflevector <4 x i32> %rv2, <4 x i32> undef, ...
> > ```
> > So we have to use def-use order to extract only 3, 1 and 0 indices here after vectorization.
> > 
> > But your comment made me think to pass already def-use sorted inserts to `tryToVectorizeList()` and to use `TreeEntry::ReorderIndices` to store index order. I'm to try this approach.
> I think in this case you need to vectorize only `%rv0-%rv2` sequence, `%rv3` should remain as an insert
No, full-inserted vector could be used as well:
```
...
%a = shufflevector <4 x i32> %rv2, <4 x i32> undef, ...
%b = shufflevector <4 x i32> %rv3, <4 x i32> undef, ...
```
So we vectorize all lanes, and then replace `%rv3` by `VectorizedValue` and replace `%rv2` by `VectorizedValue`, preceded by shuffle of `SK_ExtractVector` kind.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D98714/new/

https://reviews.llvm.org/D98714



More information about the llvm-commits mailing list