<div dir="ltr">On 23 October 2013 23:05, Arnold Schwaighofer <span dir="ltr"><<a href="mailto:aschwaighofer@apple.com" target="_blank">aschwaighofer@apple.com</a>></span> wrote:<br><div class="gmail_extra"><div class="gmail_quote">

<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div class="im"><span style="color:rgb(34,34,34)">A reduction is something like:</span><br>

</div>

<br>

for (i= …) {<br>

 r+= a[i];<br>

}<br>

return r;<br></blockquote><div><br></div><div>Ok, so "reduction" is just a reduction in the map-reduce sense, and nothing else.</div><div> </div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">

<div class="im"><span style="color:rgb(34,34,34)">You don’t need to transform them in the legality phase. Believe me ;). Look at how we handle stride one pointer inductions at the moment (they are also memory phis) - they are based off a canonical induction variable that we create during the actual vectorization. Everything before that is done virtually without having to transform code until we actually know we want to vectorize it.</span></div>

</blockquote><div><br></div><div>Oh, so that's what I was missing. When Nadav said about pointer reduction, I thought that was how we were should be dealing with memory PHIs in the end. I'll see how stride 1 pointer induction traverses the code and where to add stride N (but not sooner than I try to teach strides to non-pointer cases).</div>

<div><br></div><div>I'll also add the reduction case, like:</div><div><br></div><div>for (i .. N/3) {</div><div>  r += a[3*i] ..;</div><div>  r += a[3*i+1] ..;<br></div><div>  r += a[3*i+2] ..;<br></div><div>}</div><div>

return r;</div><div><br></div><div>And see how it should work. (later, too).</div><div><br></div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">

<div class="im"><span style="color:rgb(34,34,34)">Basically for pointer inductions we store the start value. When we come to actually vectorize the loop we create a canonical induction variable that starts at zero and strides by the vector factor (VF). Any use of the original pointer induction variable in the vectorized loop is simply:</span></div>

</blockquote><div><br></div><div>Thanks, I'll have a look (again). ;)</div><div><br></div><div>I'll open some Bugzilla bugs and assign to me, one for each type of loop to be vectorized, in sequence (depends on) and using some of the test we discussed in the mailing list. That way, we'll be able to agree on the order and implementation beforehand. Feel free to share them with me if you have time in the interim, I'll mainly be two weeks away from now (Connect, holidays, LLVM meeting).</div>

<div><br></div><div>So far, we have covered:</div><div><br></div><div>1. simple stride: for (N/3) { a[3*i] = b[3*i]; a[3*i+1] = b[3*i+1]; a[3*i+2] = b[3*i+2]; }</div><div> * The simplest form, needs only basic checks & vectorization logic</div>

<div><br></div><div>2. re-rolling possible: for (N/3) { a[3*i] = b[3*i] + K; a[3*i+1] = b[3*i+1] + K; a[3*i+2] = b[3*i+2] + K; }<br></div><div> * can be re-rolled for simple optimization (already implemented)</div><div> * doesn't need interleaved data, even if vectorized directly via stride access</div>

<div><br></div><div><div>3. re-rolling impossible: for (N/3) { a[3*i] = b[3*i] + I; a[3*i+1] = b[3*i+1] + J; a[3*i+2] = b[3*i+2] + K; }</div><div></div><div> * can't be re-rolled and does need interleaved data access</div>

<div><br></div><div></div><div>4. reduction stride: for (N/3) { a += b[3*i]; a += b[3*i+1]; a += b[3*i+2]; } return a;<br></div><div> * May work with case above implemented, may need additional reduction logic</div><div> * can be any of the three types above</div>

<div><br></div><div>5. memory stride: for (N/3) { a++ = b++; a++ = b++; a++ = b++; }</div></div><div> * should be transformed into one of the cases above first, than vectorized as them</div><div><div> * can be any of the three types above (1, 2, 3)</div>

</div><div><br></div><div>With the dependency being:</div><div><br></div><div>1 -> { 2, 3} -> {4, 5}</div><div><br></div><div>cheers,</div><div>--renato </div></div></div></div>