[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!

Tue Sep 23 04:28:53 PDT 2014

On Sun, Sep 21, 2014 at 1:15 PM, Simon Pilgrim <llvm-dev at redking.me.uk>
wrote:

> On 20 Sep 2014, at 19:44, Chandler Carruth <chandlerc at google.com> wrote:
>
> > If AVX is available I would expect the vpermilps/vpermilpd
> > instruction to be used for all float/double single vector shuffles,
> > especially as it can deal with the folded load case as well - this
> > would avoid the integer/float execution domain transfer issue with
> > using vpshufd.
> >
> > Yes, this is the obvious solution to folding memory loads. It just
> > isn't implemented yet.
> >
> > Well, actually it is, but I haven't finished writing tests for it. =]
>
> Thanks Chandler - vpermilps/vpermilpd generation looks great now.
>
> I've found another regression - byte shifts on pre-ssse3 targets are
> failing to make use of the vpslldq/vpsrldq instructions - I've attached
> some basic test cases.
>
> Could vpslldq/vpsrldq be used on ssse3+ targets for the cases where zeros
> are being shifted in? It avoids the need for a zero register (although they
> aren't as good for memory folding).

I'm curious, how important is this? This lowering has always seemed deeply
magical and unlikely to be necessary in practice. palignr at least allows
blending.
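
For readers following along, here is a rough scalar model of the
instructions being compared. It is only a sketch of the byte-level
semantics as I understand them, not the lowering code itself, and the
function names (model_pslldq, model_psrldq, model_palignr,
shifts_agree) are made up for illustration. It shows that a byte shift
with zeros shifted in produces the same bytes as a palignr where one
input is an all-zero vector, which is why the byte shift can save the
zero register.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar model of pslldq: shift 16 bytes left by n (0..16),
   filling the low bytes with zero. Byte 0 is the least significant. */
static void model_pslldq(uint8_t out[16], const uint8_t x[16], int n) {
    for (int i = 0; i < 16; ++i)
        out[i] = (i >= n) ? x[i - n] : 0;
}

/* Hypothetical scalar model of psrldq: shift 16 bytes right by n (0..16),
   filling the high bytes with zero. */
static void model_psrldq(uint8_t out[16], const uint8_t x[16], int n) {
    for (int i = 0; i < 16; ++i)
        out[i] = (i + n < 16) ? x[i + n] : 0;
}

/* Hypothetical scalar model of palignr: concatenate hi:lo into 32 bytes
   (lo occupies bytes 0..15), shift right by imm bytes (0..32), and keep
   the low 16 bytes. Bytes shifted in from beyond the pair are zero. */
static void model_palignr(uint8_t out[16], const uint8_t hi[16],
                          const uint8_t lo[16], int imm) {
    uint8_t cat[48] = {0};          /* 32 data bytes plus 16 bytes of zero padding */
    memcpy(cat, lo, 16);
    memcpy(cat + 16, hi, 16);
    for (int i = 0; i < 16; ++i)
        out[i] = cat[i + imm];
}

/* Returns 1 if, for every shift amount, the byte shifts match a palignr
   that uses an all-zero vector as its other input; returns 0 otherwise. */
static int shifts_agree(void) {
    uint8_t x[16], zero[16] = {0}, a[16], b[16];
    for (int i = 0; i < 16; ++i)
        x[i] = (uint8_t)(i + 1);
    for (int n = 0; n <= 16; ++n) {
        model_pslldq(a, x, n);
        model_palignr(b, x, zero, 16 - n);   /* zeros enter from the low side */
        if (memcmp(a, b, 16) != 0) return 0;
        model_psrldq(a, x, n);
        model_palignr(b, zero, x, n);        /* zeros enter from the high side */
        if (memcmp(a, b, 16) != 0) return 0;
    }
    return 1;
}
```

The palignr form needs the zero vector materialized in a register,
while the byte shift takes only the one input and an immediate.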