[PATCH] D28907: [SLP] Fix for PR30787: Failure to beneficially vectorize 'copyable' elements in integer binary ops.

Tue May 22 18:29:16 PDT 2018

dtemirbulatov added inline comments.

================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:3112
+      int MaxDist = 0;
+      while (Bundle) {
+        int Dist = std::distance(BB->begin(), Bundle->Inst->getIterator());
----------------
dtemirbulatov wrote:
> ABataev wrote:
> > dtemirbulatov wrote:
> > > dtemirbulatov wrote:
> > > > ABataev wrote:
> > > > > dtemirbulatov wrote:
> > > > > > ABataev wrote:
> > > > > > > Why can't you put the bundles in the list in the right order, from the very first instruction to the very last?
> > > > > > I could do this in the scheduleBlock() function with a queue, but that could add complexity.
> > > > > Could you explain why?
> > > > One instruction could belong to one or more separate bundles... and if we try to change the order in the bundles in scheduleBlock(), we also have to update ScheduleDataMap and ExtraScheduleDataMap.
> > > I mean a pseudo operation could occur in more than one bundle.
> > But these schedule bundles must have different scheduling region ids, and they must be in different bundles, so why does their order change?
> The bundles are different, but the scheduling region id is the same.
I mean, for example, for this function:
define void @add0(i32* noalias %dst, i32* noalias %src) {
entry:
  %incdec.ptr = getelementptr inbounds i32, i32* %src, i64 1
  %0 = load i32, i32* %src, align 4
  %add = add nsw i32 %0, 1
  %incdec.ptr1 = getelementptr inbounds i32, i32* %dst, i64 1
  store i32 %add, i32* %dst, align 4
  %incdec.ptr2 = getelementptr inbounds i32, i32* %src, i64 2
  %1 = load i32, i32* %incdec.ptr, align 4
  %add3 = add nsw i32 %1, 1
  %incdec.ptr4 = getelementptr inbounds i32, i32* %dst, i64 2
  store i32 %add3, i32* %incdec.ptr1, align 4
  %incdec.ptr5 = getelementptr inbounds i32, i32* %src, i64 3
  %2 = load i32, i32* %incdec.ptr2, align 4
  %add6 = add nsw i32 %2, 2
  %incdec.ptr7 = getelementptr inbounds i32, i32* %dst, i64 3
  store i32 %add6, i32* %incdec.ptr4, align 4
  %3 = load i32, i32* %incdec.ptr5, align 4
  %add9 = add nsw i32 %3, 3
  store i32 %add9, i32* %incdec.ptr7, align 4
  ret void
}

We have two bundles:
[  %3 = load i32, i32* %src, align 4;  %add3 = add nsw i32 %2, 1;  %add6 = add nsw i32 %1, 2;  %add9 = add nsw i32 %0, 3]
and
[  %3 = load i32, i32* %src, align 4;  %2 = load i32, i32* %incdec.ptr, align 4;  %1 = load i32, i32* %incdec.ptr2, align 4;  %0 = load i32, i32* %incdec.ptr5, align 4]
Both contain the same instruction, %3 = load i32, i32* %src, align 4, and it is a pseudo instruction in this bundle: [  %3 = load i32, i32* %src, align 4;  %add3;  %add6;  %add9]
Both bundles are in the same scheduling region, whose id is 1.
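
To make the situation concrete, here is a minimal standalone sketch of the two bundles above. It is not the SLPVectorizer code: Member, Bundle, maxDist, bundlesContaining, sortByLastMember and add0Bundles are hypothetical names used only for illustration, and the positions are simply the instruction indices in the entry block of @add0, counted from 0. It shows that the load from %src is a member of both bundles under the same region id, and what ordering bundles by their farthest member (in the spirit of the MaxDist loop in the patch) would look like.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-ins for illustration; these are not the SLPVectorizer types.
struct Member {
  int Pos;       // index of the instruction from the start of the basic block
  bool IsPseudo; // true when the slot only stands in as a 'copyable' element
};

struct Bundle {
  std::string Name;
  int RegionID;
  std::vector<Member> Members;
};

// Largest distance from the block start over all members of the bundle.
int maxDist(const Bundle &B) {
  int MaxDist = 0;
  for (const Member &M : B.Members)
    MaxDist = std::max(MaxDist, M.Pos);
  return MaxDist;
}

// Number of bundles that contain the instruction at position Pos.
int bundlesContaining(const std::vector<Bundle> &Bundles, int Pos) {
  int Count = 0;
  for (const Bundle &B : Bundles)
    for (const Member &M : B.Members)
      if (M.Pos == Pos) {
        ++Count;
        break; // count each bundle at most once
      }
  return Count;
}

// Order bundles by the position of their last member in the block.
void sortByLastMember(std::vector<Bundle> &Bundles) {
  std::stable_sort(Bundles.begin(), Bundles.end(),
                   [](const Bundle &A, const Bundle &B) {
                     return maxDist(A) < maxDist(B);
                   });
}

// The two bundles from @add0; the load from %src sits at position 1.
std::vector<Bundle> add0Bundles() {
  return {
      {"adds", 1, {{1, true}, {7, false}, {12, false}, {16, false}}},
      {"loads", 1, {{1, false}, {6, false}, {11, false}, {15, false}}},
  };
}
```

In this model bundlesContaining(add0Bundles(), 1) counts both bundles, and both carry RegionID 1, so the region id alone cannot tell the two entries for that load apart.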

https://reviews.llvm.org/D28907