[LLVMdev] Loop vectorizer preamble

Sat Aug 30 10:08:48 PDT 2014

Hi James,

thank you for looking into this!

> On Aug 30, 2014, at 4:47 AM, James Molloy <james at jamesmolloy.co.uk> wrote:
> 
> Hi Arnold, Nadav,
> 
> I've been taking a look at the preamble and bailout tests created by the loop vectorizer, and I can't help but feel it could be rather conservative. I'm not a vectorization expert, so I apologise in advance if say something obviously wrong...
> 
> I'm looking in particular at the overflow check and the trip count computation. From my reading, it goes something like:
> 
>   take the backedge taken count and add one -> Count
>   emit code to check Count didn't overflow
>   // pointer aliasing checks, if any
>   calculate vector trip count = Count - (Count % Step)
> 
> It seems to me that there should be cases when we don't need to check for overflow. In a well-formed loop, which this should be at this point, there is an increment of the indvar before the backedge. If this increment is marked 'nuw', we should be guaranteed that we don't get an overflow when calculating numBackedges + 1.
> 

Lets say we have a loop that iterates 256 times (if n == 0):

unsigned char i = n;
do {

  —i;
} while (i);

We need to guard against such cases (see also PR17288).

However, I agree that by looking at the IR (in addition to the SCEV expression for the back-edge)  we should be able to do better in some cases.

> Also, many many loops don't have a single point-test for exit (x != 0). Instead, they have a greater-than or less-than condition. If this is the case, we should be able to elide all of our logic with Count and just count down until the test is broken. For example:
> 
> for (i = 0; i < n; ++i)
>   ...
> 
> ->
> 
> count = 0
> loop:
>   ...
>   count += VF * UF
>   if count >= n goto scalar_loop else goto loop
> 
> This could remove a lot of overflow checks and "urem"s from real code.

This would overflow for say n = 255, VF = 2, char ?

start = 0

loop:
  i = phi (start, next)
  next = + (i, VF)
  stop = >= (n, next)
  br stop, exit, loop

exit

> Also, we don't currently coalesce overflow checks, vector memchecks and trip count computation for adjacent and similar loops. For example:
> 
>   for (i = 0; i < n/2; ++i)
>     p[i] = q[i+1];
>   for (i = n/2; i < n; ++i)
>     p[i] = q[i-1];
> 
> Really, these two loops should share a common preamble which checks between the range [0, n). Now, they have two preambles, one checking [0, n/2) and the other [n/2, n).

You are right currently we only look at one loop at a time.