[LLVMdev] [LoopVectorizer] Missed vectorization opportunities caused by sext/zext operations

Wed May 6 04:03:06 PDT 2015

Hi Renato, Sanjoy,

> void test0(unsigned short a, unsigned short *  in, unsigned short * out) {
>   for (unsigned short w = 1; w < a - 1; w++) //this will never overflow
>       out[w] = in[w+7] * 2;
> }

Turns out this can actually overflow for a = 0, and in this case would
be an infinite loop (end condition would be w < MAX_USHORT). Run-time
checks could still be added here:

if (a > 0)
  do vectorized version
else
  default to scalar version

On 06 May 2015 at 03:00, Sanjoy Das <sanjoy at playingwithpointers.com> wrote:
> I think it will be sufficient to add a couple of new cases to
> ScalarEvolution::HowManyLessThans --
> 
>   zext(A) ult zext(B) == A ult B
>   sext(A) slt sext(B) == A slt B

I had a go at this but realized that the problem above would affect
at least the ult case (we need to have B < max(type(B)) for it to hold),
and so I gave up (at least temporarily). I think this should still be
possible, but some range checking might be required.

On 05 May 2015 at 18:54, Renato Golin < renato.golin at linaro.org > wrote:
> > -          Promote values that go into the trip count calculation and memory
> > access indices to the smallest type
> > which would remove sext/zext/trunc operations from the loop body. This
> > should remove the sext/zext issue, as SCEV wouldn’t have to deal with
> > these operations.
> 
> You mean demote the range specifier (a in your example above), to a more
> constrained type than the induction variable, right?
> 
> If possible, this would help static analysis of loop bounds and not require run
> time checks.

I was thinking about doing something like promoting the
induction variable (in my case w) to a wider type (int), but there might be
some problems with that. I think we need SCEV working to be able to reason
about when that would be safe, and even if we could do that, there is no
guarantee that this would give us any benefit - we might end up
increasing the code size if vectorization wouldn't be possible
even after that transformation.

On 05 May 2015 at 18:54, Renato Golin < renato.golin at linaro.org > wrote:
> 
> > -          Add runtime checks (outside the loop) to detect overflows in the
> > original loop
> 
> I think we do some of that. Maybe the two steps above will make the little
> we have today work on most cases already.

I think we're only checking for overflows in the trip count variable that's
being generated by the loop vectorizer. So any potential overflows in the
actual loop would make us bail out in the legality check part.

I'm currently trying to work around this issue by teaching SCEV
to be able to optimistically fold sext/zext expressions while
accumulating the assumptions that it has made doing so (which
need to be checked at run-time). This allows us to gather all the
assumptions in the vectorization legality check of the loop vectorizer
and even allows us to go through the cost model without modifying
the IR.

This would of course only happen if we would otherwise abort
because we didn't get a SCEV expression that we want to be
an AddRecExpr.

This seems to work at least on a toy example (it can work around
issues with the number of back-edges taken and memory checks),
so I hope to have a patch sent out for RFC soon.

Thanks,
Silviu