<div dir="ltr"><div>If I got you right, this is the classic case for loop peeling. I thought LLVM's vectorizer already had something like that.<br></div><div><br></div><div><br></div>On 21 July 2013 18:16, Arnold Schwaighofer <span dir="ltr">&lt;<a href="mailto:aschwaighofer@apple.com" target="_blank">aschwaighofer@apple.com</a>&gt;</span> wrote:<br>

<div class="gmail_extra"><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">I will have to work on this soon as ARM also has pretty inefficient unaligned vector loads.<br>

</blockquote><div><br></div></div></div><div class="gmail_extra">NEON does support unaligned access via VLD*/VST*; which loads are you referring to?</div><div class="gmail_extra"><br></div><div class="gmail_extra">cheers,</div>

<div class="gmail_extra">--renato</div></div>