<div dir="ltr">On 14 October 2013 19:31, Arnold Schwaighofer <span dir="ltr">&lt;<a href="mailto:aschwaighofer@apple.com" target="_blank">aschwaighofer@apple.com</a>&gt;</span> wrote:<br><div class="gmail_extra"><div class="gmail_quote">
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">Renato, can you post the C code for the function and the assembly that GCC produces?<br>
</blockquote><div><br></div><div>Attached.</div><div><br></div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
Your initial example could be handled well by vectorization of strided loops (and the mention of VLD3(.8?)/VST3(.8?) led me to assume that this is what happened). But the LLVM-IR you sent has a store of 0 in there ;) and strides by 4.<br>
</blockquote><div><br></div><div>I think so. Ignore the last write; it was bogus (but don't ignore the fact that GCC vectorized it anyway, with vst4!).</div><div><br></div><div>Running GCC with -ftree-vectorizer-verbose=1, I got:</div>
<div><br></div><div><div>test.c:11: note: create runtime check for data references DELTA and *WRITE_30</div><div>test.c:11: note: create runtime check for data references *READ_29 and *WRITE_30</div><div>test.c:11: note: created 2 versioning for alias checks.</div>
<div>test.c:11: note: === vect_do_peeling_for_loop_bound ===Setting upper bound of nb iterations for epilogue loop to 14<br></div><div>test.c:11: note: LOOP VECTORIZED.<br></div></div><div><br></div><div>The result is very concise, dense code:</div>
<div><br></div><div><span style="white-space:pre"> </span>vld1.8<span class="" style="white-space:pre"> </span>{d28[], d29[]}, [r5]<br></div><div><div><span class="" style="white-space:pre"> </span>vld3.8<span class="" style="white-space:pre"> </span>{d16, d18, d20}, [r9]!</div>
<div><span class="" style="white-space:pre"> </span>vld3.8<span class="" style="white-space:pre"> </span>{d17, d19, d21}, [r9]<br></div><div><span class="" style="white-space:pre"> </span>vmvn <span class="" style="white-space:pre"> </span>q3, q8</div>
<div><span class="" style="white-space:pre"> </span>vmvn <span class="" style="white-space:pre"> </span>q15, q9</div><div><span class="" style="white-space:pre"> </span>vmvn <span class="" style="white-space:pre"> </span>q8, q10</div>
<div><span class="" style="white-space:pre"> </span>vsub.i8<span class="" style="white-space:pre"> </span>q11, q3, q14</div><div><span class="" style="white-space:pre"> </span>vsub.i8<span class="" style="white-space:pre"> </span>q12, q15, q14</div>
<div><span class="" style="white-space:pre"> </span>vsub.i8<span class="" style="white-space:pre"> </span>q13, q8, q14</div><div><span class="" style="white-space:pre"> </span>vst3.8<span class="" style="white-space:pre"> </span>{d22, d24, d26}, [r8]!</div>
<div><span class="" style="white-space:pre"> </span>vst3.8<span class="" style="white-space:pre"> </span>{d23, d25, d27}, [r8]</div></div><div><br></div><div>cheers,</div><div>--renato</div></div></div></div>
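<div><br></div><div>For readers without the attachment, here is a minimal scalar sketch of what the NEON sequence above appears to compute: three interleaved byte channels are loaded (vld3.8), each byte is bitwise-inverted (vmvn), a broadcast byte is subtracted with wrap-around (vsub.i8), and the result is stored interleaved again (vst3.8). The function name, signature and parameter names are hypothetical; the attached test.c may well differ (for instance, GCC's alias notes suggest DELTA lives in memory rather than being passed by value).</div>

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model of the vectorized loop; not the original test.c.
 * 'groups' is the number of 3-byte groups to process. */
void invert_and_offset(uint8_t *write, const uint8_t *read,
                       uint8_t delta, size_t groups)
{
    for (size_t i = 0; i < groups; i++) {
        /* vmvn: bitwise NOT, which equals 255 - x for a byte.
         * vsub.i8: byte subtraction that wraps modulo 256. */
        write[3 * i + 0] = (uint8_t)(~read[3 * i + 0] - delta);
        write[3 * i + 1] = (uint8_t)(~read[3 * i + 1] - delta);
        write[3 * i + 2] = (uint8_t)(~read[3 * i + 2] - delta);
    }
}
```

<div>With a stride of 3 and all three bytes of each group written, a vectorizer can map the loads and stores onto vld3/vst3 and process 16 groups (48 bytes) per iteration, as in the listing above.</div>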