<div dir="ltr">Hi Arnold,<div><br></div><div>Thanks for the reply. Taking your example:</div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">

<span style="font-family:arial,sans-serif;font-size:13px">This would overflow for say n = 255, VF = 2, char ?</span><br style="font-family:arial,sans-serif;font-size:13px"><span style="font-family:arial,sans-serif;font-size:13px">start = 0</span><br style="font-family:arial,sans-serif;font-size:13px">

<span style="font-family:arial,sans-serif;font-size:13px">loop:<br></span><span style="font-family:arial,sans-serif;font-size:13px">  i = phi (start, next)<br></span><span style="font-family:arial,sans-serif;font-size:13px">  next = + (i, VF)<br>

</span><span style="font-family:arial,sans-serif;font-size:13px">  stop = >= (n, next)<br></span><span style="font-family:arial,sans-serif;font-size:13px">  br stop, exit, loop</span><br style="font-family:arial,sans-serif;font-size:13px">

<span style="font-family:arial,sans-serif;font-size:13px">exit</span></blockquote><div><span style="font-family:arial,sans-serif;font-size:13px"><br></span></div><div><span style="font-family:arial,sans-serif;font-size:13px">Why do we have to use i8 for the induction variable type? If the original add is no-unsigned-wrap, then we could safely extend the induction variable to i32/i64 and have an accurate >= comparison.</span></div>

<div><span style="font-family:arial,sans-serif;font-size:13px"><br></span></div><div><span style="font-family:arial,sans-serif;font-size:13px">We couldn't do this with an i64 type because there'd be nothing to extend it to, but in practice people still write loops with 32-bit variables a lot of the time.</span></div>

<div><span style="font-family:arial,sans-serif;font-size:13px"><br></span></div><div><font face="arial, sans-serif">Cheers,</font></div><div><font face="arial, sans-serif"><br></font></div><div><font face="arial, sans-serif">James</font></div>

</div><div class="gmail_extra"><br><br><div class="gmail_quote">On 30 August 2014 18:08, Arnold Schwaighofer <span dir="ltr"><<a href="mailto:aschwaighofer@apple.com" target="_blank">aschwaighofer@apple.com</a>></span> wrote:<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Hi James,<br>

<br>

thank you for looking into this!<br>

<div class=""><br>

> On Aug 30, 2014, at 4:47 AM, James Molloy <<a href="mailto:james@jamesmolloy.co.uk">james@jamesmolloy.co.uk</a>> wrote:<br>

><br>

> Hi Arnold, Nadav,<br>

><br>

> I've been taking a look at the preamble and bailout tests created by the loop vectorizer, and I can't help but feel it could be rather conservative. I'm not a vectorization expert, so I apologise in advance if say something obviously wrong...<br>


><br>

> I'm looking in particular at the overflow check and the trip count computation. From my reading, it goes something like:<br>

><br>

>   take the backedge taken count and add one -> Count<br>

>   emit code to check Count didn't overflow<br>

>   // pointer aliasing checks, if any<br>

>   calculate vector trip count = Count - (Count % Step)<br>

><br>

> It seems to me that there should be cases when we don't need to check for overflow. In a well-formed loop, which this should be at this point, there is an increment of the indvar before the backedge. If this increment is marked 'nuw', we should be guaranteed that we don't get an overflow when calculating numBackedges + 1.<br>


><br>

<br>

<br>

</div>Lets say we have a loop that iterates 256 times (if n == 0):<br>

<br>

unsigned char i = n;<br>

do {<br>

<br>

  —i;<br>

} while (i);<br>

<br>

<br>

We need to guard against such cases (see also PR17288).<br>

<br>

However, I agree that by looking at the IR (in addition to the SCEV expression for the back-edge)  we should be able to do better in some cases.<br>

<div class=""><br>

<br>

> Also, many many loops don't have a single point-test for exit (x != 0). Instead, they have a greater-than or less-than condition. If this is the case, we should be able to elide all of our logic with Count and just count down until the test is broken. For example:<br>


><br>

> for (i = 0; i < n; ++i)<br>

>   ...<br>

><br>

> -><br>

><br>

> count = 0<br>

> loop:<br>

>   ...<br>

>   count += VF * UF<br>

>   if count >= n goto scalar_loop else goto loop<br>

><br>

> This could remove a lot of overflow checks and "urem"s from real code.<br>

<br>

</div>This would overflow for say n = 255, VF = 2, char ?<br>

<br>

start = 0<br>

<br>

loop:<br>

  i = phi (start, next)<br>

  next = + (i, VF)<br>

  stop = >= (n, next)<br>

  br stop, exit, loop<br>

<br>

exit<br>

<div class=""><br>

<br>

> Also, we don't currently coalesce overflow checks, vector memchecks and trip count computation for adjacent and similar loops. For example:<br>

><br>

>   for (i = 0; i < n/2; ++i)<br>

>     p[i] = q[i+1];<br>

>   for (i = n/2; i < n; ++i)<br>

>     p[i] = q[i-1];<br>

><br>

> Really, these two loops should share a common preamble which checks between the range [0, n). Now, they have two preambles, one checking [0, n/2) and the other [n/2, n).<br>

<br>

<br>

</div>You are right currently we only look at one loop at a time.<br>

<br>

</blockquote></div><br></div>