<div dir="ltr"><div class="gmail_extra"><br><div class="gmail_quote">On Sun, Sep 21, 2014 at 1:15 PM, Simon Pilgrim <span dir="ltr">&lt;<a href="mailto:llvm-dev@redking.me.uk" target="_blank">llvm-dev@redking.me.uk</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class="">On 20 Sep 2014, at 19:44, Chandler Carruth &lt;<a href="mailto:chandlerc@google.com">chandlerc@google.com</a>&gt; wrote:<br>

<br>

> If AVX is available I would expect the vpermilps/vpermilpd instruction to be used for all float/double single vector shuffles, especially as it can deal with the folded load case as well - this would avoid the integer/float execution domain transfer issue with using vpshufd.<br>

><br>

> Yes, this is the obvious solution to folding memory loads. It just isn't implemented yet.<br>

><br>

> Well, actually it is, but I haven't finished writing tests for it. =]<br>

<br>

</span>Thanks Chandler - vpermilps/vpermilpd generation looks great now.<br>

<br>

I've found another regression - byte shifts on pre-ssse3 targets are failing to make use of the vpslldq/vpsrldq instructions - I've attached some basic test cases.<br>

<br>

Could vpslldq/vpsrldq be used on ssse3+ targets for the cases where zeros are being shifted in? It avoids the need for a zero register (although they aren't as good for memory folding).</blockquote></div><br>I'm curious, how important is this? This lowering has always seemed deeply magical and unlikely to be necessary in practice. palignr at least allows blending.</div></div>
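For anyone following the thread, the equivalence under discussion can be sketched at the byte level. This is only an illustrative model, not LLVM's lowering code: `pslldq`, `psrldq`, and `palignr` here are hypothetical Python helpers that mimic the 128-bit instructions on a 16-byte value, with index 0 as the lowest byte. The sketch suggests why a byte shift that brings in zeros can stand in for palignr with a zeroed operand, and so does not need a zero register.

```python
# Byte-level model of the 128-bit byte shifts and palignr.
# All helper names are hypothetical; index 0 is the lowest byte.

ZERO = bytes(16)  # stands in for an all-zero register


def pslldq(x: bytes, n: int) -> bytes:
    """Byte shift left: data moves to higher indices, zeros fill the low end."""
    n = min(n, 16)
    return bytes(n) + x[:16 - n]


def psrldq(x: bytes, n: int) -> bytes:
    """Byte shift right: data moves to lower indices, zeros fill the high end."""
    n = min(n, 16)
    return x[n:] + bytes(n)


def palignr(hi: bytes, lo: bytes, n: int) -> bytes:
    """Concatenate hi:lo (lo in the low bytes), shift right by n bytes,
    and keep the low 16 bytes of the result."""
    wide = lo + hi
    return wide[n:n + 16].ljust(16, b"\0")


if __name__ == "__main__":
    x = bytes(range(1, 17))
    for n in range(17):
        # Shifting zeros in from the top matches palignr with a zero high half.
        assert psrldq(x, n) == palignr(ZERO, x, n)
        # Shifting zeros in from the bottom matches palignr with a zero low half.
        assert pslldq(x, n) == palignr(x, ZERO, 16 - n)
    print("byte shifts match palignr with a zero operand")
```

When both palignr operands are non-zero, the result merges bytes from two sources, which the plain byte shifts cannot express.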