[PATCH] D98714: [SLP] Add insertelement instructions to vectorizable tree

Tue May 11 07:41:35 PDT 2021

ABataev added inline comments.

================
Comment at: llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:2883
+
+      if (IsConsecutive) {
+        TreeEntry *TE = newTreeEntry(VL, Bundle /*vectorized*/, S, UserTreeIdx);
----------------
ABataev wrote:
> anton-afanasyev wrote:
> > ABataev wrote:
> > > anton-afanasyev wrote:
> > > > ABataev wrote:
> > > > > If not consecutive, we can include the cost of single permute and still vectorize it, no?
> > > > Inserts here are coming from `findBuildAggregate()`, being already sorted by index and therefore consecutive, the only exclusion is that they can have gaps, so we check it here. I don't think we should process this rare case within this patch.
> > > Can we check for the gaps during the analysis of the build vector sequence instead?
> > I've decided to check it here since we can get other sources of incoming insert bundle in future.
> The no need to iterate through the whole list, use early exit out of loop with cancelled scheduling and gather nodes.
Not done, you can early exit out of the loop if the non consecutive insert is found:
```
      int Offset = *getInsertIndex(VL[0], 0);
      ValueList Operands(VL.size());
      Operands[0] = cast<Instruction>(VL[0])->getOperand(1);
      for (unsigned I = 1, E = VL.size(); I < E; ++I) {
        if (I != *getInsertIndex(VL[I], 0) - Offset) {
          LLVM_DEBUG(dbgs() << "SLP: skipping non-consecutive inserts.\n");
          BS.cancelScheduling(VL, VL0);
          buildTree_rec(Operands, Depth, UserTreeIdx);
          return;
        }
        Operands[I] = cast<Instruction>(VL[I])->getOperand(1);
      }
      TreeEntry *TE = newTreeEntry(VL, Bundle /*vectorized*/, S, UserTreeIdx);
      LLVM_DEBUG(dbgs() << "SLP: added inserts bundle.\n");
      TE->setOperand(0, Operands);

      ValueList VectorOperands;
      for (Value *V : VL)
        VectorOperands.push_back(cast<Instruction>(V)->getOperand(0));

      TE->setOperand(1, VectorOperands);

      buildTree_rec(Operands, Depth + 1, {TE, 0});
      return;
```

================
Comment at: llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:4672-4680
+        V = Builder.CreateShuffleVector(V, UndefValue::get(V->getType()), Mask);
+
+        for (unsigned I = 0; I < NumElts; I++)
+          Mask[I] =
+              (I < MinIndex || I >= MaxIndex) ? I : NumElts - MinIndex + I;
+
+        V = Builder.CreateShuffleVector(
----------------
anton-afanasyev wrote:
> ABataev wrote:
> > ABataev wrote:
> > > anton-afanasyev wrote:
> > > > ABataev wrote:
> > > > > anton-afanasyev wrote:
> > > > > > ABataev wrote:
> > > > > > > anton-afanasyev wrote:
> > > > > > > > anton-afanasyev wrote:
> > > > > > > > > ABataev wrote:
> > > > > > > > > > ABataev wrote:
> > > > > > > > > > > Can we use `ShuffleBuilder` here?
> > > > > > > > > > Did you include these costs in the cost model?
> > > > > > > > > They are included in `getScalarizationOverhead()`.
> > > > > > > > Yes, I thought about `ShuffleBuilder`, but here we just need to create two shuffles of special kind for vector resizing. It requires `ShuffleInstructionBuilder` expanding, don't think it's worth it.
> > > > > > > I rather doubt in it. First, you subtract the scalarization overhead cost from the vector cost, but here you need to add the costs of subvector insert and permutation of 2 vectors
> > > > > > >>> Did you include these costs in the cost model?
> > > > > > >> They are included in getScalarizationOverhead().
> > > > > > > I rather doubt in it. First, you subtract the scalarization overhead cost from the vector cost, but here you need to add the costs of subvector insert and permutation of 2 vectors
> > > > > > 
> > > > > > These subvector inserts are lowered to nop actually, so they cost nothing. We need this code when processing big vector of scalars part-by-part, but every chunk fits the whole vector register (condition `bits >= MinVecRegSize`), so actually there is no inserting. The result consisting of several vector registers is returned then.
> > > > > This might be not true for some targets/code patterns. Plus, the second pattern is a permutation/combining of 2 vectors, its cost is at least 1
> > > > > This might be not true for some targets/code patterns. Plus, the second pattern is a permutation/combining of 2 vectors, its cost is at least 1
> > > > 
> > > > Both of these shuffles are just the one operation of "inserting", we need the first one to expand source vector since shufflevector needs the same size of operands.
> > > > 
> > > > We can add cost of this shuffles, but this prevents several vectorization being performed before, since this cost is redundant (may be `TTI->getShuffleCost()` should be tuned for `TargetTransformInfo::SK_InsertVector`?)
> > > Need to check that the vectorized code is really profitable and if so, need to tune the cost model, yes.
> > You can do this check by yourself. We have something similar for extracts, where we check if need to perform shuffle/copying of subvector. But we definitely need it.
> > You can do this check by yourself. We have something similar for extracts, where we check if need to perform shuffle/copying of subvector. But we definitely need it.
> 
> Ok, I've used something similar for inserts as for extracts. Do you think it's better to make this check within `getShuffleCost()` in `X86TargetTransformInfo.cpp` and `AArch64TargetTransformInfo.cpp`?
It would be good. Probably in a separate patch after this one

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D98714/new/

https://reviews.llvm.org/D98714