[LLVMdev] Loop vectorizer preamble

Sat Aug 30 11:44:32 PDT 2014

Hi Arnold,

Thanks for the reply. Taking your example:

This would overflow for say n = 255, VF = 2, char ?
> start = 0
> loop:
>   i = phi (start, next)
>   next = + (i, VF)
>   stop = >= (n, next)
>   br stop, exit, loop
> exit

Why do we have to use i8 for the induction variable type? If the original
add is no-unsigned-wrap, then we could safely extend the induction variable
to i32/i64 and have an accurate >= comparison.

We couldn't do this with an i64 type because there'd be nothing to extend
it to, but in practice people still write loops with 32-bit variables a lot
of the time.

Cheers,

James

On 30 August 2014 18:08, Arnold Schwaighofer <aschwaighofer at apple.com>
wrote:

> Hi James,
>
> thank you for looking into this!
>
> > On Aug 30, 2014, at 4:47 AM, James Molloy <james at jamesmolloy.co.uk>
> wrote:
> >
> > Hi Arnold, Nadav,
> >
> > I've been taking a look at the preamble and bailout tests created by the
> loop vectorizer, and I can't help but feel it could be rather conservative.
> I'm not a vectorization expert, so I apologise in advance if say something
> obviously wrong...
> >
> > I'm looking in particular at the overflow check and the trip count
> computation. From my reading, it goes something like:
> >
> >   take the backedge taken count and add one -> Count
> >   emit code to check Count didn't overflow
> >   // pointer aliasing checks, if any
> >   calculate vector trip count = Count - (Count % Step)
> >
> > It seems to me that there should be cases when we don't need to check
> for overflow. In a well-formed loop, which this should be at this point,
> there is an increment of the indvar before the backedge. If this increment
> is marked 'nuw', we should be guaranteed that we don't get an overflow when
> calculating numBackedges + 1.
> >
>
>
> Lets say we have a loop that iterates 256 times (if n == 0):
>
> unsigned char i = n;
> do {
>
>   —i;
> } while (i);
>
>
> We need to guard against such cases (see also PR17288).
>
> However, I agree that by looking at the IR (in addition to the SCEV
> expression for the back-edge)  we should be able to do better in some cases.
>
>
> > Also, many many loops don't have a single point-test for exit (x != 0).
> Instead, they have a greater-than or less-than condition. If this is the
> case, we should be able to elide all of our logic with Count and just count
> down until the test is broken. For example:
> >
> > for (i = 0; i < n; ++i)
> >   ...
> >
> > ->
> >
> > count = 0
> > loop:
> >   ...
> >   count += VF * UF
> >   if count >= n goto scalar_loop else goto loop
> >
> > This could remove a lot of overflow checks and "urem"s from real code.
>
> This would overflow for say n = 255, VF = 2, char ?
>
> start = 0
>
> loop:
>   i = phi (start, next)
>   next = + (i, VF)
>   stop = >= (n, next)
>   br stop, exit, loop
>
> exit
>
>
> > Also, we don't currently coalesce overflow checks, vector memchecks and
> trip count computation for adjacent and similar loops. For example:
> >
> >   for (i = 0; i < n/2; ++i)
> >     p[i] = q[i+1];
> >   for (i = n/2; i < n; ++i)
> >     p[i] = q[i-1];
> >
> > Really, these two loops should share a common preamble which checks
> between the range [0, n). Now, they have two preambles, one checking [0,
> n/2) and the other [n/2, n).
>
>
> You are right currently we only look at one loop at a time.
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20140830/1681cdc9/attachment.html>