<html><head><meta http-equiv="content-type" content="text/html; charset=utf-8"></head><body dir="auto"><div><span></span></div><div><meta http-equiv="content-type" content="text/html; charset=utf-8"><div><br></div><div>On Jul 21, 2013, at 4:03 PM, Renato Golin <<a href="mailto:renato.golin@linaro.org">renato.golin@linaro.org</a>> wrote:<br><br></div><blockquote type="cite"><div><div dir="ltr"><div>If I got you right, this is the classic case for loop peeling. I thought LLVM's vectorizer had something like that already in.</div></div></div></blockquote><div><br></div>No, we don't have loop peeling.<div><br><div>The problem is even more fundamental than this. In the vectorizer we pass the alignment of the scalar loop access, which is of course lower than what the vector access requires. We need to compute the alignment based only on the first access and the vector access size, but we don't do this at the moment.</div><br><blockquote type="cite"><div><div dir="ltr"><div><br></div>On 21 July 2013 18:16, Arnold Schwaighofer <span dir="ltr"><<a href="mailto:aschwaighofer@apple.com" target="_blank">aschwaighofer@apple.com</a>></span> wrote:<br>

<div class="gmail_extra"><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">I will have to work on this soon as ARM also has pretty inefficient unaligned vector loads.<br>

</blockquote><div><br></div></div></div><div class="gmail_extra">NEON does support unaligned access via VLD*/VST*, what loads are you referring to?</div></div></div></blockquote><div><br></div>Yes, but they can be very slow depending on the alignment (more micro-ops).<br><blockquote type="cite"><div><div dir="ltr"><div class="gmail_extra"><br></div><div class="gmail_extra">cheers,</div>

<div class="gmail_extra">--renato</div></div>

</div></blockquote></div></div></body></html>