[PATCH] D76956: [TTI][SLP] Add TTI interface to estimate cost of chain of vector inserts/extracts.

Alexey Bataev via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Thu Apr 9 10:58:30 PDT 2020


ABataev added inline comments.


================
Comment at: llvm/test/Transforms/SLPVectorizer/X86/resched.ll:22-29
+; CHECK-NEXT:    [[TMP1:%.*]] = insertelement <8 x i32> undef, i32 [[CONV31_I]], i32 0
+; CHECK-NEXT:    [[TMP2:%.*]] = insertelement <8 x i32> [[TMP1]], i32 [[CONV31_I]], i32 1
+; CHECK-NEXT:    [[TMP3:%.*]] = insertelement <8 x i32> [[TMP2]], i32 [[CONV31_I]], i32 2
+; CHECK-NEXT:    [[TMP4:%.*]] = insertelement <8 x i32> [[TMP3]], i32 [[CONV31_I]], i32 3
+; CHECK-NEXT:    [[TMP5:%.*]] = insertelement <8 x i32> [[TMP4]], i32 [[CONV31_I]], i32 4
+; CHECK-NEXT:    [[TMP6:%.*]] = insertelement <8 x i32> [[TMP5]], i32 [[CONV31_I]], i32 5
+; CHECK-NEXT:    [[TMP7:%.*]] = insertelement <8 x i32> [[TMP6]], i32 [[CONV31_I]], i32 6
----------------
vdmitrie wrote:
> ABataev wrote:
> > vdmitrie wrote:
> > > ABataev wrote:
> > > > spatel wrote:
> > > > > vdmitrie wrote:
> > > > > > spatel wrote:
> > > > > > > Did SLP fail to recognize that this is a splat shuffle? I would have expected it to produce splat IR:
> > > > > > > (shuffle (insert X, 0), zeroinitializer)
> > > > > > > ...instead of a chain of inserts.
> > > > > > No. It did not fail to recognize a splat. As I see from code single element is not shuffled deliberately:
> > > > > > 
> > > > > > Value *BoUpSLP::vectorizeTree(ArrayRef<Value *> VL) {
> > > > > > ...
> > > > > >     // Do not shuffle single element or if number of unique values is not power
> > > > > >     // of 2.
> > > > > >     if (UniqueValues.size() == VL.size() || UniqueValues.size() <= 1 ||
> > > > > >         !llvm::isPowerOf2_32(UniqueValues.size()))
> > > > > >       ReuseShuffleIndicies.clear();
> > > > > > ...
> > > > > > 
> > > > > I tried to step through the debug spew from SLP, but I can't tell what is happening on this example. I only see a call to getGatherCost() at current line 3318 of SLPVectorizer.cpp, so I thought that is the point where we check for a splat. 
> > > > > 
> > > > > I don't understand this part of the model/usage very well so others should review this patch, but there's still a concern here: ideally, SLP should not be producing this chain of inserts if it's a splat op. I don't know if that changes how we view the diff for the cost model.
> > > > I think, SLP recognizes splat here, but models it through gather, and relies on the InstCombiner to transform it into a real shuffle.
> > > Nope. That is not the point where splat is detected.
> > > First SLP checks for splat when it builds a tree. Such tree entries are  created with NeedToGather state. You can locate the place with "SLP: Gathering due to C,S,B,O." message.
> > > Then cost is calculated in int BoUpSLP::getEntryCost(TreeEntry *E). 
> > > Ideally we want cost calculation routine to calculate cost for such entry taking into account how VectorizeTree then would handle it.
> > > And here is what I see there:
> > > 
> > >   if (E->State == TreeEntry::NeedToGather) {
> > >     if (allConstant(VL))
> > >       return 0;
> > >     if (isSplat(VL)) {
> > >       return ReuseShuffleCost +
> > >              TTI->getShuffleCost(TargetTransformInfo::SK_Broadcast, VecTy, 0);
> > >     }
> > > 
> > > And that is obviously different from what vectorizeTree does(see my previous post). This is a problem. And it explains your debugging experience.
> > > It definitely has to be fixed but I believe not in this patch. This patch specifically targets calculation of gathering cost for multiple indexes for vectors with size greater than 128bits. Specifically it fixes regression in this area caused by D74976
> > > and wrt to this test case (and also one other) it is a revert of D74976 changes.
> > > 
> > And this what I meant. SLP vectorizer does not emit shuffle directly here, instead it relies on InstCombiner. Sure, it can be improved.
> I see. This at least deserves a comment in the code.
Agree. Actually, this is very old code. Originally, SLP vectorizer did not emit shuffles at all, it just emitted sets of ExtractElement/InsertElement instructions completely relying on the InstCombiner. We just had no time to improve/fix everything here.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D76956/new/

https://reviews.llvm.org/D76956





More information about the llvm-commits mailing list