<div dir="ltr">Cool, this looks good to me, provided your data indicates this pattern works well on other targets too. =]<div><br></div><div>Thanks for working on it! Any chance you can also look at the fact that we use vmovq here rather than vmovlpd?</div><div><br></div><div>At some point we also need to post-process the shuffles and replace ones that use a packed double type when there is an equivalent for the packed single type and using it removes a bitcast. It would be really awesome to get the "obvious" code of vmovlps + vmovhps here (or some variant of vmovlps that still targets the floating point vector unit and doesn't have an input dependency... maybe vxorps + vmovlps + vmovhps would be best).</div></div><div class="gmail_extra"><br><div class="gmail_quote">On Fri, Jan 16, 2015 at 3:06 PM, Fiona Glaser <span dir="ltr">&lt;<a href="mailto:fglaser@apple.com" target="_blank">fglaser@apple.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class=""><br>

> You probably want vector-shuffle-256-v8.ll, as your tests are producing v8f32 vectors.<br>

<br>

</span>Oops, my mistake, I was thinking 32 was the data type size.<br>

<br>

<br><br>

<br>

Fiona<br></blockquote></div><br></div>