<div dir="ltr">Hi Arnold, Nadav,<div><br></div><div>I've been looking at the preamble and bailout tests created by the loop vectorizer, and I can't help but feel they are rather conservative. I'm not a vectorization expert, so I apologise in advance if I say something obviously wrong...</div>
<div><br></div><div>I'm looking in particular at the overflow check and the trip count computation. From my reading, it goes something like:</div><div><br></div><div> take the backedge taken count and add one -> Count</div>
<div> emit code to check Count didn't overflow</div><div> // pointer aliasing checks, if any</div><div> calculate vector trip count = Count - (Count % Step)</div><div><br></div><div>It seems to me that there should be cases where we don't need to check for overflow at all. In a well-formed loop, which this should be at this point, the indvar is incremented before the backedge. If that increment is marked 'nuw', we should be guaranteed that computing numBackedges + 1 cannot overflow.</div>
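<div><br></div><div>To make the 'nuw' argument concrete, here is a rough C model of the preamble steps above. It is a sketch under my own assumptions, not the vectorizer's actual code: the indvar is a 32-bit unsigned value that starts at 0 and steps by 1, btc stands for the backedge-taken count, step stands for VF * UF, and preamble_t / preamble are made-up names. Under those assumptions a 'nuw' increment means btc + 1 cannot wrap to 0, so the overflow check is dead and can be dropped; the urem is still there.</div><div><br></div>
```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the vectorizer preamble. Assumes a 32-bit
   unsigned indvar starting at 0 and stepping by 1; step is VF * UF. */
typedef struct {
    bool use_vector;     /* false: bail out to the scalar loop */
    uint32_t vector_tc;  /* iterations handled by the vector loop */
} preamble_t;

static preamble_t preamble(uint32_t btc, uint32_t step, bool inc_is_nuw) {
    preamble_t r = { false, 0 };
    uint32_t count = btc + 1;  /* wraps to 0 only if btc == UINT32_MAX */

    /* The overflow check. If the indvar increment is nuw, the last
       increment produced btc + 1 without wrapping, so count != 0 and
       the check can be skipped entirely. */
    if (!inc_is_nuw && count == 0)
        return r;

    /* Vector trip count: round count down to a multiple of step
       (this is the urem we would like to avoid). */
    r.vector_tc = count - count % step;
    r.use_vector = r.vector_tc != 0;
    return r;
}
```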
<div><br></div><div>Also, many, many loops don't exit on a single equality test (x != 0). Instead, they use a less-than or greater-than condition. In that case, we should be able to drop all of the Count logic and simply count up until the test fails. For example:</div>
<div><br></div><div>for (i = 0; i < n; ++i)</div><div> ...</div><div><br></div><div>-></div><div><br></div><div>count = 0</div><div>loop:</div><div> ...</div><div> count += VF * UF</div><div> if count >= n goto scalar_loop else goto loop</div>
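<div><br></div><div>A C sketch of the relational-exit idea, for a simple sum over a[0, n). STEP is a hypothetical stand-in for VF * UF and the inner lane loop stands in for one wide operation; it illustrates the idea rather than what the vectorizer would emit. One detail such a scheme needs is an entry test that a full STEP of iterations still remains. Written as n - i >= STEP while keeping i <= n, that test can never wrap, so there is no Count, no urem and no separate overflow check.</div><div><br></div>
```c
#include <stddef.h>

#define STEP 8  /* hypothetical VF * UF */

/* Sum a[0, n) using the loop's own i < n style condition instead of a
   precomputed vector trip count. Invariant: i <= n, so n - i never wraps. */
static long sum(const int *a, size_t n) {
    long acc = 0;
    size_t i = 0;

    /* Vector loop: enter only while a full STEP of iterations remains. */
    while (n - i >= STEP) {
        for (size_t lane = 0; lane < STEP; ++lane)  /* one wide add */
            acc += a[i + lane];
        i += STEP;
    }

    /* Scalar loop: the original i < n loop finishes the remainder. */
    for (; i < n; ++i)
        acc += a[i];
    return acc;
}
```

<div>The scalar loop is just the original loop resuming from wherever the vector loop stopped, so no remainder has to be computed up front.</div>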
<div><br></div><div>This could remove a lot of overflow checks and "urem"s from real code.</div><div><br></div><div>Also, we don't currently coalesce the overflow checks, vector memchecks and trip-count computations of adjacent, similar loops. For example:</div>
<div><br></div><div> for (i = 0; i < n/2; ++i)</div><div> p[i] = q[i+1];</div><div> for (i = n/2; i < n; ++i)</div><div> p[i] = q[i-1];</div><div><br></div><div>Really, these two loops should share a common preamble that checks the whole range [0, n). Currently they get two preambles, one checking [0, n/2) and the other [n/2, n).</div>
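<div><br></div><div>A sketch, in C, of what a single shared memcheck for the two loops above might look like. shared_memcheck_ok and overlaps are hypothetical names, and the check is deliberately conservative: it tests p[0, n) against q[0, n), which covers the accesses of both loops once n >= 2 (for n == 1 the second loop reads q[-1], so the sketch simply bails out to the scalar loops for n < 2). Comparing addresses as uintptr_t mirrors what a generated runtime check does; in ISO C those integer values are implementation-defined, though flat-address targets behave as expected.</div><div><br></div>
```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Half-open ranges [lo1, hi1) and [lo2, hi2) overlap iff each one
   starts before the other ends. */
static bool overlaps(uintptr_t lo1, uintptr_t hi1, uintptr_t lo2, uintptr_t hi2) {
    return lo1 < hi2 && lo2 < hi1;
}

/* One preamble for both loops. Loop 1 writes p[0, n/2) and reads
   q[1, n/2 + 1); loop 2 writes p[n/2, n) and reads q[n/2 - 1, n - 1).
   For n >= 2 all of that lies inside p[0, n) and q[0, n). */
static bool shared_memcheck_ok(const int *p, const int *q, size_t n) {
    if (n < 2)
        return false;  /* tiny trip counts: just run the scalar loops */
    uintptr_t p0 = (uintptr_t)p, p1 = (uintptr_t)(p + n);
    uintptr_t q0 = (uintptr_t)q, q1 = (uintptr_t)(q + n);
    return !overlaps(p0, p1, q0, q1);
}
```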
<div><br></div><div>Sorry for the braindump, and for probably missing many issues!</div><div><br></div><div>Cheers,</div><div><br></div><div>James</div></div>