<div dir="ltr"><div class="gmail_extra"><br><div class="gmail_quote">On Fri, Jun 27, 2014 at 6:54 PM, Nadav Rotem <span dir="ltr">&lt;<a href="mailto:nrotem@apple.com" target="_blank">nrotem@apple.com</a>&gt;</span> wrote:<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div id=":ab3" class="a3s" style="overflow:hidden">Hi Chandler,<br>

<br>

Thank you for working on this. Lowering shuffles on X86 is challenging and I am glad that you are rewriting and improving this code. Everything looks great.<br>

<div class=""><br>

><br>

> Once SSE2 is polished a bit I should be able to get interesting numbers<br>

> on performance improvements on benchmarks conducive to vectorization.<br>

> All of this will be off by default until it is functionally equivalent<br>

> of course.<br>

<br>

</div>I was wondering how you plan to benchmark this code. The vectorizers don’t generate interesting shuffle patterns (mainly reverse and broadcast)</div></blockquote><div><br></div><div>My work here is motivated specifically by shuffles generated by the vectorizer, so I'm not sure why you think they don't generate interesting shuffle patterns.</div>

<div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div id=":ab3" class="a3s" style="overflow:hidden"> and traditionally interesting shuffle patterns came from hand-written code and from code in the domain of graphics, like OpenCL, OpenGL and ISPC. Is there a specific benchmark that you think could be useful?</div>

</blockquote></div><br>I have a decent amount of both hand-vectorized code and code that vectorizes well; I expect to be able to get reasonable baseline benchmarks from it.</div><div class="gmail_extra"><br></div><div class="gmail_extra">

Also, it is pretty easy to look at the output before and after and get a clear sense of the likely performance characteristics.</div></div>