[PATCH] D28907: [SLP] Fix for PR30787: Failure to beneficially vectorize 'copyable' elements in integer binary ops.
Dinar Temirbulatov via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Tue May 22 18:34:31 PDT 2018
dtemirbulatov added inline comments.
================
Comment at: lib/Transforms/Vectorize/SLPVectorizer.cpp:3112
+ int MaxDist = 0;
+ while (Bundle) {
+ int Dist = std::distance(BB->begin(), Bundle->Inst->getIterator());
----------------
dtemirbulatov wrote:
> dtemirbulatov wrote:
> > ABataev wrote:
> > > dtemirbulatov wrote:
> > > > dtemirbulatov wrote:
> > > > > ABataev wrote:
> > > > > > dtemirbulatov wrote:
> > > > > > > ABataev wrote:
> > > > > > > > Why can't you put the bundles in the list in the right order, from the very first instruction to the very last?
> > > > > > > I could do this in the scheduleBlock() function with a queue, but that would add complexity.
> > > > > > Could you explain why?
> > > > > One instruction can belong to more than one bundle, so if we change the order of bundles in scheduleBlock() we also have to update ScheduleDataMap and ExtraScheduleDataMap.
> > > > I mean a pseudo operation could occur in more than one bundle.
> > > But these schedule bundles must have different scheduling region IDs and must be in different bundles, so why does their order change?
> > The bundles are different, but the scheduling region ID is the same.
> I mean, for example, for this function:
> define void @add0(i32* noalias %dst, i32* noalias %src) {
> entry:
> %incdec.ptr = getelementptr inbounds i32, i32* %src, i64 1
> %0 = load i32, i32* %src, align 4
> %add = add nsw i32 %0, 1
> %incdec.ptr1 = getelementptr inbounds i32, i32* %dst, i64 1
> store i32 %add, i32* %dst, align 4
> %incdec.ptr2 = getelementptr inbounds i32, i32* %src, i64 2
> %1 = load i32, i32* %incdec.ptr, align 4
> %add3 = add nsw i32 %1, 1
> %incdec.ptr4 = getelementptr inbounds i32, i32* %dst, i64 2
> store i32 %add3, i32* %incdec.ptr1, align 4
> %incdec.ptr5 = getelementptr inbounds i32, i32* %src, i64 3
> %2 = load i32, i32* %incdec.ptr2, align 4
> %add6 = add nsw i32 %2, 2
> %incdec.ptr7 = getelementptr inbounds i32, i32* %dst, i64 3
> store i32 %add6, i32* %incdec.ptr4, align 4
> %3 = load i32, i32* %incdec.ptr5, align 4
> %add9 = add nsw i32 %3, 3
> store i32 %add9, i32* %incdec.ptr7, align 4
> ret void
> }
>
> We have two bundles:
> [ %3 = load i32, i32* %src, align 4; %add3 = add nsw i32 %2, 1; %add6 = add nsw i32 %1, 2; %add9 = add nsw i32 %0, 3]
> and
> [ %3 = load i32, i32* %src, align 4; %2 = load i32, i32* %incdec.ptr, align 4; %1 = load i32, i32* %incdec.ptr2, align 4; %0 = load i32, i32* %incdec.ptr5, align 4]
> Both contain the same instruction, %3 = load i32, i32* %src, align 4, which is a pseudo instruction in the bundle [ %3 = load i32, i32* %src, align 4; %add3; %add6; %add9],
> and both bundles are in the same scheduling region, whose ID is 1.
> why does their order change?
Sometimes we have to reschedule a pseudo instruction first in both bundles in order to form correct dependencies.
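To make the two points above concrete, here is a minimal standalone sketch. It is not code from the patch and uses no LLVM types: Block, Bundle, distanceFromStart, maxDistance and bundleCount are hypothetical names, and instructions are modeled as plain strings. It shows the kind of "largest distance from the start of the block" computation that the commented MaxDist loop appears to perform, and how a single instruction can be a member of two bundles at once.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical stand-ins for a basic block and a schedule bundle.
// Each instruction is identified only by its result name.
using Block = std::vector<std::string>;
using Bundle = std::vector<std::string>;

// Distance of an instruction from the first instruction of the block.
int distanceFromStart(const Block &BB, const std::string &Inst) {
  auto It = std::find(BB.begin(), BB.end(), Inst);
  assert(It != BB.end() && "instruction must be in the block");
  return static_cast<int>(std::distance(BB.begin(), It));
}

// Largest distance from the block start over all members of a bundle,
// i.e. the position of the bundle member that comes last in the block.
int maxDistance(const Block &BB, const Bundle &B) {
  int MaxDist = 0;
  for (const std::string &Inst : B)
    MaxDist = std::max(MaxDist, distanceFromStart(BB, Inst));
  return MaxDist;
}

// Number of bundles that contain the given instruction. A result above 1
// means any per-instruction bookkeeping has to account for every bundle
// the instruction appears in.
int bundleCount(const std::vector<Bundle> &Bundles, const std::string &Inst) {
  int Count = 0;
  for (const Bundle &B : Bundles)
    if (std::find(B.begin(), B.end(), Inst) != B.end())
      ++Count;
  return Count;
}
```

With a block that holds only the loads and adds of @add0 in program order, one bundle made of the load from %src plus the three later adds, and a second bundle made of the four loads, bundleCount reports that the load from %src is in both bundles, while maxDistance gives a different last position for each bundle. The value names in such a test follow the IR listing, which is an assumption about how the bundle members map onto it.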
https://reviews.llvm.org/D28907