[PATCH] D34150: [LV] Test once if vector trip count is zero, instead of twice

Tue Jul 11 16:20:32 PDT 2017

Ayal added a subscriber: anemet.
Ayal added a comment.

In https://reviews.llvm.org/D34150#802750, @mkuper wrote:

> Re overflow - the point is that  getOrCreateTripCount() returns, basically, PSE.getBackedgeTakenCount() + 1, and that may overflow, so the "trip count" may end up being 0 if the backedge taken count is 0. I don't think this is outdated, and this is behavior we want to preserve. But this patch should preserve this behavior IIUC.

Ahh, right. This patch is indeed intended to preserve the existing behavior. In https://reviews.llvm.org/D12107 @anemet specifically asked not to remove the explanation for the need of checking overflow. Will make sure this explanation remains in place.

In https://reviews.llvm.org/D34150#805377, @wmi wrote:

> In https://reviews.llvm.org/D34150#802750, @mkuper wrote:
>
> > Re overflow - the point is that  getOrCreateTripCount() returns, basically, PSE.getBackedgeTakenCount() + 1, and that may overflow, so the "trip count" may end up being 0 if the backedge taken count is 0. I don't think this is outdated, and this is behavior we want to preserve. But this patch should preserve this behavior IIUC. Can you make sure there's a test for this?
>
>
> max_i32_backedgetaken in test/Transforms/LoopVectorize/induction.ll is the orginal test for overflow case. The patch generates correct code for it.

Also miniters.ll test added back in r245952 checks for the right min.iters.check, so that seems to be covered. However, a devised test shows that vector.scevcheck may generate its own checks to guard against potential overflow of the trip count, presumably unaware that min.iters.check took care of it(?); so there may be a third test involved in checking if the vector trip count is zero... Will open a PR, as this deserves a separate fix.

BTW, a scalar loop whose trip count is uintxx_max+1 may actually (theoretically?) be a good candidate to vectorize; provided we compute its vector trip count correctly. Such loops however are expected to appear seldom in practice, if at all(?)

In https://reviews.llvm.org/D34150#805377, @wmi wrote:

> In https://reviews.llvm.org/D34150#802750, @mkuper wrote:
>
> > Anyway, the logic seems sound, but I'd like wmi to look at it - he added the minimum iterations check in r245952, and combined it with the overflow check, instead of the VTC==0 check.
> >  Wei, is there some case we're missing here?
>
>
> The logic looks good to me. It is a nice improvement.

Thanks :-). Perhaps this is what @jmolloy referred to originally back in https://reviews.llvm.org/D12107:

In https://reviews.llvm.org/D12107#235903, @jmolloy wrote:

> We do actually already emit such a check - we just emit it late. This new check now makes the later check redundant, so we should remove it.

In https://reviews.llvm.org/D34150#802750, @mkuper wrote:

> One more thing about the description: "In this case a single (the last) vector iteration is peeled and replaced with VF*UF scalar iterations (instead of none), reducing VTC by 1." <-- this is confusing. That only happens if STC == 0 (mod VF * UF). So the logic is correct, since this is indeed what happens when STC == VF * UF, but I would phrase it differently.

Right, sorry for the confusion.
Another, hopefully less confusing way to phrase this: when requiresScalarEpilogue() holds, at-least one iteration must remain scalar; the rest can be used to form vector iterations. So VTC =  (STC-1) mod VF*UF,  and so VTC is zero iff STC-1 < VF*UF, which we convert to STC <= VF*UF.

https://reviews.llvm.org/D34150