[PATCH] D98714: [SLP] Add insertelement instructions to vectorizable tree

Alexey Bataev via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Fri Apr 9 11:01:20 PDT 2021


ABataev added inline comments.


================
Comment at: llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:2858-2869
+        // Reschedule bundle instructions, sorting them by def-to-use order
+        ScheduleData *FirstInBundle = BS.getScheduleData(FirstInsert);
+        ScheduleData *Current = BS.getScheduleData(LastInsert);
+        ScheduleData *NextInBundle = nullptr;
+        do {
+          Current->FirstInBundle = FirstInBundle;
+          Current->NextInBundle = NextInBundle;
----------------
anton-afanasyev wrote:
> ABataev wrote:
> > anton-afanasyev wrote:
> > > ABataev wrote:
> > > > anton-afanasyev wrote:
> > > > > ABataev wrote:
> > > > > > anton-afanasyev wrote:
> > > > > > > ABataev wrote:
> > > > > > > > This really looks ugly. Why you can't use the existing functionality?
> > > > > > > Hmm, I'm agree that doesn't look elegant. Let me describe the task I'm solving here.
> > > > > > > For the inserts tree node we have two orders, which could be actually different: index (lane) order and def-use order. Both of these orders are used: index order is for correct lane matching, def-use order is for correct vector replacing and extracting.
> > > > > > > There could be the (theoretical, seldom in practice) case when intermediate insert instruction is used out-of-tree:
> > > > > > > ```
> > > > > > > %rv0  = insertelement <4 x i32> undef, i16 %r0 , i32 3
> > > > > > > %rv1  = insertelement <4 x i32> %rv0 , i16 %r1 , i32 1
> > > > > > > %rv2  = insertelement <4 x i32> %rv1 , i16 %r2 , i32 0
> > > > > > > %rv3  = insertelement <4 x i32> %rv2 , i16 %r3 , i32 2
> > > > > > > ...
> > > > > > > %a = shufflevector <4 x i32> %rv2, <4 x i32> undef, ...
> > > > > > > ```
> > > > > > > So we have to use def-use order to extract only 3, 1 and 0 indices here after vectorization.
> > > > > > > 
> > > > > > > But your comment made me think to pass already def-use sorted inserts to `tryToVectorizeList()` and to use `TreeEntry::ReorderIndices` to store index order. I'm to try this approach.
> > > > > > I think in this case you need to vectorize only `%rv0-%rv2` sequence, `%rv3` should remain as an insert
> > > > > No, full-inserted vector could be used as well:
> > > > > ```
> > > > > ...
> > > > > %a = shufflevector <4 x i32> %rv2, <4 x i32> undef, ...
> > > > > %b = shufflevector <4 x i32> %rv3, <4 x i32> undef, ...
> > > > > ```
> > > > > So we vectorize all lanes, and then replace `%rv3` by `VectorizedValue` and replace `%rv2` by `VectorizedValue`, preceded by shuffle of `SK_ExtractVector` kind.
> > > > I would vectorize `%rv0-%rv2` and kept `%rv3` as is, i.e. something like this:
> > > > ```
> > > > %vec = vector<%rv0, %rv1, %rv2>
> > > > %rv3 = insertelement <4 x i32> %vec , i16 %r3 , i32 2
> > > > %a = shufflevector <4 x i32> %vec, <4 x i32> undef, ...
> > > > ```
> > > This was just an simple example, one can imagine more complex case where whole vectorization and further extracting is rather cheap to do, so the decision is eventually made by cost. We use `getShuffleCost()` for extract cost at line 4225.
> > I would start with the simple solution and later investigated possible improvements.
> I had several tests broken without this solution, so had to implement this.
> (for instance, https://github.com/llvm/llvm-project/blob/main/llvm/test/Transforms/SLPVectorizer/X86/insert-element-build-vector.ll#L127).
Could you provide diffs with the broken test cases?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D98714/new/

https://reviews.llvm.org/D98714



More information about the llvm-commits mailing list