[PATCH] D98714: [SLP] Add insertelement instructions to vectorizable tree

Tue May 11 11:25:40 PDT 2021

ABataev added inline comments.

================
Comment at: llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:2883
+
+      if (IsConsecutive) {
+        TreeEntry *TE = newTreeEntry(VL, Bundle /*vectorized*/, S, UserTreeIdx);
----------------
anton-afanasyev wrote:
> ABataev wrote:
> > anton-afanasyev wrote:
> > > ABataev wrote:
> > > > anton-afanasyev wrote:
> > > > > ABataev wrote:
> > > > > > ABataev wrote:
> > > > > > > anton-afanasyev wrote:
> > > > > > > > ABataev wrote:
> > > > > > > > > anton-afanasyev wrote:
> > > > > > > > > > ABataev wrote:
> > > > > > > > > > > If not consecutive, we can include the cost of single permute and still vectorize it, no?
> > > > > > > > > > Inserts here are coming from `findBuildAggregate()`, being already sorted by index and therefore consecutive, the only exclusion is that they can have gaps, so we check it here. I don't think we should process this rare case within this patch.
> > > > > > > > > Can we check for the gaps during the analysis of the build vector sequence instead?
> > > > > > > > I've decided to check it here since we can get other sources of incoming insert bundle in future.
> > > > > > > The no need to iterate through the whole list, use early exit out of loop with cancelled scheduling and gather nodes.
> > > > > > Not done, you can early exit out of the loop if the non consecutive insert is found:
> > > > > > ```
> > > > > >       int Offset = *getInsertIndex(VL[0], 0);
> > > > > >       ValueList Operands(VL.size());
> > > > > >       Operands[0] = cast<Instruction>(VL[0])->getOperand(1);
> > > > > >       for (unsigned I = 1, E = VL.size(); I < E; ++I) {
> > > > > >         if (I != *getInsertIndex(VL[I], 0) - Offset) {
> > > > > >           LLVM_DEBUG(dbgs() << "SLP: skipping non-consecutive inserts.\n");
> > > > > >           BS.cancelScheduling(VL, VL0);
> > > > > >           buildTree_rec(Operands, Depth, UserTreeIdx);
> > > > > >           return;
> > > > > >         }
> > > > > >         Operands[I] = cast<Instruction>(VL[I])->getOperand(1);
> > > > > >       }
> > > > > >       TreeEntry *TE = newTreeEntry(VL, Bundle /*vectorized*/, S, UserTreeIdx);
> > > > > >       LLVM_DEBUG(dbgs() << "SLP: added inserts bundle.\n");
> > > > > >       TE->setOperand(0, Operands);
> > > > > > 
> > > > > >       ValueList VectorOperands;
> > > > > >       for (Value *V : VL)
> > > > > >         VectorOperands.push_back(cast<Instruction>(V)->getOperand(0));
> > > > > > 
> > > > > >       TE->setOperand(1, VectorOperands);
> > > > > > 
> > > > > >       buildTree_rec(Operands, Depth + 1, {TE, 0});
> > > > > >       return;
> > > > > > ```
> > > > > This code is not what intended, since we early exit building tree with uncompletely filled `Operands`.
> > > > I meant `newTreeEntry(VL, None /*not vectorized*/, S, UserTreeIdx);`, just like we're doing for other nodes.
> > > Hmm, `buildTree_rec()` here (with completely filled Operands) is intended: if we skip vectorization of non-consecutive inserts, we still try to vectorize starting from `Operands` (as it were before this patch). I think it's rare case when such operands could be successful seed for vectorizable tree, but why not to try?
> > Hmm, does it mean you're going to support something like this:
> > ```
> > i1 = insertelement undef, v0, 0
> > i2 = insertelement i1, v1, 2
> > ```
> > `|`
> > `V`
> > ```
> > v = <v0, v1>
> > i2 = shuffle v, undef, <0, undef, 1, undef>
> > ```
> > ? How do we handle shuffle in this case? As external uses and inserts of extractelements? And we do not subtract the costs of insertelements in this case?
> Yes, for that case we end up with previous combination of inserts/extracts and we don't subtract the cost of inserts/extracts to be eliminated by `instcombine` later, but that's better than nothing, if total cost is still good.
> I've checked that neither my version nor your one doesn't affect any test case and it all looks like rather speculative and rare case though.
At least add a FIXME to properly support this kind of vectorization in the future

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D98714/new/

https://reviews.llvm.org/D98714