[PATCH] D38785: [LV/LAA] Avoid secializing a loop for stride=1 when this predicate implies a single-iteration loop

Wed Oct 11 05:10:16 PDT 2017

dorit added a comment.

Hi Silviu,

> It sounds like this is a profitability issue and therefore the LAA users should handle this? Maybe the users would also want to be more general (I think a loop with an iteration count of 2 is probably not worth vectorizing).

First of all, yes, that's true; This is why I went only for the obvious no-brainer case: a loop that iterates only a single iteration is not a loop and shouldn't go through any loop level analysis and transformation. Anything beyond that is indeed a profitability decision that each user needs to make.

(In fact, regardless of this corner case, the user should be given a chance to consider the specialization to stride=1 as just one alternative. At least the vectorizer is making its first steps towards a model (VPlan) that could allow it to consider different vectorization alternatives. )

> Another issue is that if LV would somehow add a stride == 1 predicate, LAA wouldn't see it so the problem wouldn't go away.

Yes, of course, the vectorizer has a much bigger problem of not being aware of the predicates it operates under. There's the memcheck predicates, scev predicates, LAA Stride==1 predicate, and then the loop-iteration-count predicates that the vectorizer itself adds (I think it's the only ones that no other analysis pass is adding under the covers?). These predicates are never reasoned about together, no attempt to check if they contradict one another or can be optimized. Indeed this patch is not attempting to address this larger problem…

> Do you happen to know what transformation we're blocking here?

The main transformation we are blocking is vectorization with unknown stride (using gathers/scalarized loads).  We specialize for an impossible case, and lose the opportunity to consider an actual viable alternative.

Beyond that the side effect that I've seen, is that the removal of the dead vectorized loop body created a situation where some of the runtime guards that the vectorizer inserted before the vectorized loop got to be in the pre-header of the scalar loop that followed the (now removed) vectorized loop, thereby affecting the decisions of the loop-unroller, which in turn affected the behavior of the SLP vectorizer… (all because of IR we inserted that shouldn't have been added in the first place…).

But the stronger motivation to me is to avoid inserting an impossible predicate, waste time on analysis and transformation that this predicate enables (while also leaving behind a bunch of garbage).  (And all this where we could have done something better…).

In short, Silviu, I absolutely agree with your observations, but I think that this patch has (some) value nonetheless…

Thanks,
dorit

https://reviews.llvm.org/D38785