[PATCH] D98714: [SLP] Add insertelement instructions to vectorizable tree
Anton Afanasyev via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Mon Apr 12 09:43:45 PDT 2021
anton-afanasyev added inline comments.
================
Comment at: llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:2858-2869
+ // Reschedule bundle instructions, sorting them by def-to-use order
+ ScheduleData *FirstInBundle = BS.getScheduleData(FirstInsert);
+ ScheduleData *Current = BS.getScheduleData(LastInsert);
+ ScheduleData *NextInBundle = nullptr;
+ do {
+ Current->FirstInBundle = FirstInBundle;
+ Current->NextInBundle = NextInBundle;
----------------
ABataev wrote:
> anton-afanasyev wrote:
> > ABataev wrote:
> > > anton-afanasyev wrote:
> > > > ABataev wrote:
> > > > > anton-afanasyev wrote:
> > > > > > ABataev wrote:
> > > > > > > anton-afanasyev wrote:
> > > > > > > > ABataev wrote:
> > > > > > > > > This really looks ugly. Why you can't use the existing functionality?
> > > > > > > > Hmm, I'm agree that doesn't look elegant. Let me describe the task I'm solving here.
> > > > > > > > For the inserts tree node we have two orders, which could be actually different: index (lane) order and def-use order. Both of these orders are used: index order is for correct lane matching, def-use order is for correct vector replacing and extracting.
> > > > > > > > There could be the (theoretical, seldom in practice) case when intermediate insert instruction is used out-of-tree:
> > > > > > > > ```
> > > > > > > > %rv0 = insertelement <4 x i32> undef, i16 %r0 , i32 3
> > > > > > > > %rv1 = insertelement <4 x i32> %rv0 , i16 %r1 , i32 1
> > > > > > > > %rv2 = insertelement <4 x i32> %rv1 , i16 %r2 , i32 0
> > > > > > > > %rv3 = insertelement <4 x i32> %rv2 , i16 %r3 , i32 2
> > > > > > > > ...
> > > > > > > > %a = shufflevector <4 x i32> %rv2, <4 x i32> undef, ...
> > > > > > > > ```
> > > > > > > > So we have to use def-use order to extract only 3, 1 and 0 indices here after vectorization.
> > > > > > > >
> > > > > > > > But your comment made me think to pass already def-use sorted inserts to `tryToVectorizeList()` and to use `TreeEntry::ReorderIndices` to store index order. I'm to try this approach.
> > > > > > > I think in this case you need to vectorize only `%rv0-%rv2` sequence, `%rv3` should remain as an insert
> > > > > > No, full-inserted vector could be used as well:
> > > > > > ```
> > > > > > ...
> > > > > > %a = shufflevector <4 x i32> %rv2, <4 x i32> undef, ...
> > > > > > %b = shufflevector <4 x i32> %rv3, <4 x i32> undef, ...
> > > > > > ```
> > > > > > So we vectorize all lanes, and then replace `%rv3` by `VectorizedValue` and replace `%rv2` by `VectorizedValue`, preceded by shuffle of `SK_ExtractVector` kind.
> > > > > I would vectorize `%rv0-%rv2` and kept `%rv3` as is, i.e. something like this:
> > > > > ```
> > > > > %vec = vector<%rv0, %rv1, %rv2>
> > > > > %rv3 = insertelement <4 x i32> %vec , i16 %r3 , i32 2
> > > > > %a = shufflevector <4 x i32> %vec, <4 x i32> undef, ...
> > > > > ```
> > > > This was just an simple example, one can imagine more complex case where whole vectorization and further extracting is rather cheap to do, so the decision is eventually made by cost. We use `getShuffleCost()` for extract cost at line 4225.
> > > I would start with the simple solution and later investigated possible improvements.
> > I had several tests broken without this solution, so had to implement this.
> > (for instance, https://github.com/llvm/llvm-project/blob/main/llvm/test/Transforms/SLPVectorizer/X86/insert-element-build-vector.ll#L127).
> Could you provide diffs with the broken test cases?
I've refactored this patch, so now looks better.
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D98714/new/
https://reviews.llvm.org/D98714
More information about the llvm-commits
mailing list