[PATCH] D28907: [SLP] Fix for PR30787: Failure to beneficially vectorize 'copyable' elements in integer binary ops.

Tue May 22 18:34:31 PDT 2018

dtemirbulatov added inline comments.

================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:3112
+      int MaxDist = 0;
+      while (Bundle) {
+        int Dist = std::distance(BB->begin(), Bundle->Inst->getIterator());
----------------
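The hunk walks a bundle, which is a linked list of schedule entries. For each member it measures how far that member's instruction sits from the start of the basic block and keeps the largest value. Below is a minimal stand-alone sketch of that idea.

It rests on a few assumptions:
- `Instr`, `BundleMember` and `maxDistanceInBlock` are hypothetical stand-ins, not the SLPVectorizer types.
- A `std::list` plays the role of the basic block.

```cpp
#include <algorithm>
#include <iterator>
#include <list>
#include <string>

// Hypothetical stand-in for an IR instruction; only its identity matters here.
struct Instr {
  std::string Name;
};

// Hypothetical stand-in for a schedule-data entry: each member points to an
// instruction in the block and to the next member of the same bundle.
struct BundleMember {
  std::list<Instr>::iterator Inst;
  BundleMember *NextInBundle = nullptr;
};

// Returns the largest position (distance from the block's first instruction)
// of any instruction in the bundle, i.e. the position of its last member.
// An empty bundle (nullptr) yields 0, matching the initial MaxDist.
int maxDistanceInBlock(std::list<Instr> &BB, BundleMember *Bundle) {
  int MaxDist = 0;
  while (Bundle) {
    int Dist = static_cast<int>(std::distance(BB.begin(), Bundle->Inst));
    MaxDist = std::max(MaxDist, Dist);
    Bundle = Bundle->NextInBundle;
  }
  return MaxDist;
}
```

Block iterators are bidirectional rather than random-access, so each `std::distance` call is linear in the member's position. This holds for a `std::list` here and for basic-block iterators in general. A bundle walk therefore costs roughly O(bundle size × block size) unless positions are cached.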
dtemirbulatov wrote:
> dtemirbulatov wrote:
> > ABataev wrote:
> > > dtemirbulatov wrote:
> > > > dtemirbulatov wrote:
> > > > > ABataev wrote:
> > > > > > dtemirbulatov wrote:
> > > > > > > ABataev wrote:
> > > > > > > > Why can't you put the bundles in the list in the right order, from the very first instruction to the very last?
> > > > > > > I could do this in the scheduleBlock() function with a queue, but that would add complexity.
> > > > > > Could you explain why?
> > > > > One instruction can belong to more than one bundle, so if we change the order of bundles in scheduleBlock(), we also have to update ScheduleDataMap and ExtraScheduleDataMap.
> > > > I mean that a pseudo operation can occur in more than one bundle.
> > > But these schedule bundles must have different scheduling region ids and must be separate bundles, so why does their order change?
> > The bundles are different, but the scheduling region id is the same.
> I mean, for example, for this function:
> define void @add0(i32* noalias %dst, i32* noalias %src) {
> entry:
>   %incdec.ptr = getelementptr inbounds i32, i32* %src, i64 1
>   %0 = load i32, i32* %src, align 4
>   %add = add nsw i32 %0, 1
>   %incdec.ptr1 = getelementptr inbounds i32, i32* %dst, i64 1
>   store i32 %add, i32* %dst, align 4
>   %incdec.ptr2 = getelementptr inbounds i32, i32* %src, i64 2
>   %1 = load i32, i32* %incdec.ptr, align 4
>   %add3 = add nsw i32 %1, 1
>   %incdec.ptr4 = getelementptr inbounds i32, i32* %dst, i64 2
>   store i32 %add3, i32* %incdec.ptr1, align 4
>   %incdec.ptr5 = getelementptr inbounds i32, i32* %src, i64 3
>   %2 = load i32, i32* %incdec.ptr2, align 4
>   %add6 = add nsw i32 %2, 2
>   %incdec.ptr7 = getelementptr inbounds i32, i32* %dst, i64 3
>   store i32 %add6, i32* %incdec.ptr4, align 4
>   %3 = load i32, i32* %incdec.ptr5, align 4
>   %add9 = add nsw i32 %3, 3
>   store i32 %add9, i32* %incdec.ptr7, align 4
>   ret void
> }
> 
> We have two bundles:
> [  %3 = load i32, i32* %src, align 4;  %add3 = add nsw i32 %2, 1;  %add6 = add nsw i32 %1, 2;  %add9 = add nsw i32 %0, 3]
> and
> [  %3 = load i32, i32* %src, align 4;  %2 = load i32, i32* %incdec.ptr, align 4;  %1 = load i32, i32* %incdec.ptr2, align 4;  %0 = load i32, i32* %incdec.ptr5, align 4]
> Both contain the same instruction, %3 = load i32, i32* %src, align 4. In the first bundle [  %3 = load i32, i32* %src, align 4;  %add3;  %add6;  %add9] it acts as a pseudo instruction.
> All of them are in the same scheduling region, whose id is 1.
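In the first bundle, the load occupies the lane of an `add`; this is the "copyable" element from the patch title. A common way to picture such a lane treats a lane that only forwards a value `x` as the identity operation `x + 0`, so every lane carries the same opcode. This is an assumption for illustration, not necessarily the exact modeling used in D28907.

The sketch below shows that identity-operand idea on plain integers. `Lane` and `evaluateAddBundle` are hypothetical names.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical lane description: either a real "add Value, Imm" or a
// copyable element that only forwards Value.
struct Lane {
  int32_t Value;
  int32_t Imm;
  bool IsCopyable;  // true for a pseudo instruction standing in for an add
};

// Evaluates a 4-wide add bundle lane by lane. A copyable lane is modeled as
// "Value + 0", using 0 as the identity element of integer addition, so the
// whole bundle could be issued as one vector add with a constant operand vector.
std::array<int32_t, 4> evaluateAddBundle(const std::array<Lane, 4> &Lanes) {
  std::array<int32_t, 4> Result{};
  for (std::size_t I = 0; I < Lanes.size(); ++I) {
    int32_t Rhs = Lanes[I].IsCopyable ? 0 : Lanes[I].Imm;
    Result[I] = Lanes[I].Value + Rhs;
  }
  return Result;
}
```

In this picture, the copyable lane's own `Imm` is ignored and replaced with the identity constant. The resulting constant operand vector would be `<0, 1, 2, 3>` for lanes with immediates 1, 2 and 3.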
> why does their order change?
Sometimes we have to reschedule a pseudo instruction first in both bundles in order to form correct dependencies.
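A toy model can make the ordering question above more concrete. Suppose bundles are ordered purely by the block position of their first member. Two bundles that share their first instruction, as the load does in the example, then compare equal. A per-instruction map also has to record both bundles for that instruction.

`Bundle`, `buildMembership` and `sortByFirstMember` are hypothetical. They only loosely mirror the roles of ScheduleDataMap and ExtraScheduleDataMap, and the positions used are made up.

```cpp
#include <algorithm>
#include <map>
#include <vector>

// Hypothetical model: a bundle is an id plus the block positions of its
// members (assumed non-empty). All bundles live in one scheduling region.
struct Bundle {
  int Id;
  std::vector<int> Positions;
};

// Hypothetical per-instruction map: block position -> ids of every bundle
// that uses the instruction at that position. A shared instruction gets
// more than one entry, so moving a bundle means touching these lists too.
std::map<int, std::vector<int>> buildMembership(const std::vector<Bundle> &Bs) {
  std::map<int, std::vector<int>> M;
  for (const Bundle &B : Bs)
    for (int P : B.Positions)
      M[P].push_back(B.Id);
  return M;
}

// Sorts bundles by the position of their first member. Bundles that share
// their first instruction compare equal, so stable_sort keeps their input
// order: position alone cannot decide which one must be scheduled first.
void sortByFirstMember(std::vector<Bundle> &Bs) {
  std::stable_sort(Bs.begin(), Bs.end(), [](const Bundle &A, const Bundle &B) {
    return *std::min_element(A.Positions.begin(), A.Positions.end()) <
           *std::min_element(B.Positions.begin(), B.Positions.end());
  });
}
```

With two bundles that both start at the shared load, the sort leaves them in whatever order they arrived. This is consistent with the reply above: the pseudo instruction sometimes has to be scheduled first in both bundles to get correct dependencies, which block position alone does not express.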

https://reviews.llvm.org/D28907