[PATCH] D98714: [SLP] Add insertelement instructions to vectorizable tree

Fri Apr 9 09:09:29 PDT 2021

anton-afanasyev marked an inline comment as done.
anton-afanasyev added inline comments.

================
Comment at: llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:2841-2844
+        if (!llvm::any_of(VL, [&](Value *S) { return Op == S; })) {
+          assert(!FirstInsert && "FirstInsert already found?");
+          FirstInsert = IE;
+        }
----------------
ABataev wrote:
> The first insert has undef vector operand, no?
> Plus, this has n^2 complexity
Not necessarily. We may be dealing with a split chain of inserts:
```
  %rv0  = insertelement <8 x i32> undef, i32 %r0 , i32 0
  %rv1  = insertelement <8 x i32> %rv0 , i32 %r1 , i32 1
  %rv2  = insertelement <8 x i32> %rv1 , i32 %r2 , i32 2
  %rv3  = insertelement <8 x i32> %rv2 , i32 %r3 , i32 3
  %rv4  = insertelement <8 x i32> %rv3 , i32 %r4 , i32 4
  %rv5  = insertelement <8 x i32> %rv4 , i32 %r5 , i32 5
  %rv6  = insertelement <8 x i32> %rv5 , i32 %r6 , i32 6
  %rv7  = insertelement <8 x i32> %rv6 , i32 %r7 , i32 7
```
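Since `<8 x i32>` doesn't fit in an SSE register, it is split into two `<4 x i32>` halves. The first time we process `%rv0`-`%rv3`, the next time `%rv4`-`%rv7`, and in that second group the first insert (`%rv4`) has `%rv3` rather than undef as its vector operand.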
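
To make the split case concrete, here is a minimal standalone sketch. It is not SLPVectorizer code: `Insert`, `buildChain` and `splitIntoChunks` are hypothetical names introduced only for illustration, and the chain is modeled with plain strings rather than `llvm::Value`.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical model of one insertelement instruction (not an LLVM type).
struct Insert {
  std::string Name;       // result name, e.g. "%rv4"
  std::string VecOperand; // vector being inserted into: "undef" or a previous result
  unsigned Lane;          // constant insertion index
};

// Build a chain %rv0..%rv(N-1) where each insert extends the previous one.
std::vector<Insert> buildChain(unsigned N) {
  std::vector<Insert> Chain;
  for (unsigned I = 0; I < N; ++I) {
    std::string Prev =
        I == 0 ? std::string("undef") : "%rv" + std::to_string(I - 1);
    Chain.push_back({"%rv" + std::to_string(I), Prev, I});
  }
  return Chain;
}

// Split the def-use ordered chain into consecutive groups of at most VF
// inserts. VF stands in for the number of elements that fit one register.
std::vector<std::vector<Insert>>
splitIntoChunks(const std::vector<Insert> &Chain, std::size_t VF) {
  assert(VF > 0 && "vector factor must be positive");
  std::vector<std::vector<Insert>> Chunks;
  for (std::size_t I = 0; I < Chain.size(); ++I) {
    if (I % VF == 0)
      Chunks.emplace_back(); // start a new group every VF inserts
    Chunks.back().push_back(Chain[I]);
  }
  return Chunks;
}
```

With `splitIntoChunks(buildChain(8), 4)`, the first insert of the second group is `%rv4`, whose vector operand is `%rv3`, so "first insert in the group" and "insert into undef" are not the same condition.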

================
Comment at: llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:2858-2869
+        // Reschedule bundle instructions, sorting them by def-to-use order
+        ScheduleData *FirstInBundle = BS.getScheduleData(FirstInsert);
+        ScheduleData *Current = BS.getScheduleData(LastInsert);
+        ScheduleData *NextInBundle = nullptr;
+        do {
+          Current->FirstInBundle = FirstInBundle;
+          Current->NextInBundle = NextInBundle;
----------------
ABataev wrote:
> This really looks ugly. Why you can't use the existing functionality?
Hmm, I agree that it doesn't look elegant. Let me describe the problem I'm solving here.
For the inserts tree node we have two orders, which can actually differ: index (lane) order and def-use order. Both are used: index order for correct lane matching, def-use order for correct vector replacement and extraction.
There can be a (theoretical, rare in practice) case where an intermediate insert instruction is used out of tree:
```
%rv0  = insertelement <4 x i32> undef, i32 %r0 , i32 3
%rv1  = insertelement <4 x i32> %rv0 , i32 %r1 , i32 1
%rv2  = insertelement <4 x i32> %rv1 , i32 %r2 , i32 0
%rv3  = insertelement <4 x i32> %rv2 , i32 %r3 , i32 2
...
%a = shufflevector <4 x i32> %rv2, <4 x i32> undef, ...
```
So after vectorization we have to use def-use order to extract only indices 3, 1 and 0 here, i.e. the lanes already defined at the point where `%rv2` is used.
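
The lane bookkeeping for that example can be sketched as follows. This is only an illustration under simplifying assumptions: `definedLanesAt` is a hypothetical helper, and the chain is reduced to a list of lane indices in def-use order, with every index smaller than the chain length.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// LaneOfInsert[I] is the lane written by the I-th insert in def-use order.
// Returns a mask of the lanes that are defined once the insert at position
// Pos has executed; all other lanes are still undef at that point.
std::vector<bool> definedLanesAt(const std::vector<unsigned> &LaneOfInsert,
                                 std::size_t Pos) {
  assert(Pos < LaneOfInsert.size() && "position outside the chain");
  std::vector<bool> Defined(LaneOfInsert.size(), false);
  for (std::size_t I = 0; I <= Pos; ++I) {
    assert(LaneOfInsert[I] < Defined.size() && "lane index out of range");
    Defined[LaneOfInsert[I]] = true;
  }
  return Defined;
}
```

For the chain above the lanes in def-use order are `{3, 1, 0, 2}`, and `%rv2` sits at position 2, so `definedLanesAt({3, 1, 0, 2}, 2)` marks lanes 0, 1 and 3 as defined and leaves lane 2 unset.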

But your comment made me think of passing already def-use-sorted inserts to `tryToVectorizeList()` and using `TreeEntry::ReorderIndices` to store the index order. I'll try this approach.
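
As a rough sketch of the idea (keep the inserts in def-use order and record the lane order separately as a permutation), assuming each lane is written exactly once. `laneOrderPermutation` is a hypothetical helper; whether `TreeEntry::ReorderIndices` uses this direction of the mapping is an assumption here, not something taken from the patch.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <vector>

// LaneOfInsert[I] is the lane written by the I-th insert in def-use order.
// Returns Order such that Order[K] is the def-use position of the insert
// that writes lane K.
std::vector<unsigned>
laneOrderPermutation(const std::vector<unsigned> &LaneOfInsert) {
  for (unsigned Lane : LaneOfInsert)
    assert(Lane < LaneOfInsert.size() && "lane index out of range");
  std::vector<unsigned> Order(LaneOfInsert.size());
  std::iota(Order.begin(), Order.end(), 0u); // 0, 1, 2, ... def-use positions
  std::sort(Order.begin(), Order.end(), [&](unsigned A, unsigned B) {
    return LaneOfInsert[A] < LaneOfInsert[B]; // sort positions by lane
  });
  return Order;
}
```

For the lanes `{3, 1, 0, 2}` from the example this gives `{2, 1, 3, 0}`: lane 0 is written by the insert at def-use position 2 (`%rv2`), lane 3 by the one at position 0 (`%rv0`). The def-use sequence itself stays untouched, so it can still be used to decide what an out-of-tree user sees.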

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D98714/new/

https://reviews.llvm.org/D98714