[PATCH] [X86] transform insertps to blendps when possible for better performance

Tue Mar 3 16:42:24 PST 2015

Sorry for the delays...

On Sun, Mar 1, 2015 at 9:24 AM, Sanjay Patel <spatel at rotateright.com> wrote:

> In http://reviews.llvm.org/D7866#131618, @chandlerc wrote:
>
> > I'm not really sold on doing this as a target combine. Is there some
> > reason we can't just produce the desired insertps or blendps when lowering?
> > This doesn't seem likely to come up only after doing some other shuffle
> > lowering, but maybe I'm not seeing why.
>
>
> Let me answer this question first before going to wrangle up some data on
> the other question:
> I made this a target combine because I don't know how else to handle this
> case given our current intrinsic lowering:
>
>   define <4 x float> @blendps(<4 x float> %x, <4 x float> %y) {
>     %0 = tail call <4 x float> @llvm.x86.sse41.insertps(<4 x float> %x, <4 x float> %y, i8 0)
>     ret <4 x float> %0
>   }
>
> This is the IR produced when a programmer uses SSE intrinsics in C source.
> It directly becomes an INSERTPS node via:
>
>   X86_INTRINSIC_DATA(sse41_insertps,    INTR_TYPE_3OP, X86ISD::INSERTPS, 0)

I think we just need to change the C-level SSE intrinsics to emit generic
shuffle IR rather than LLVM intrinsic calls. We shouldn't be worrying about
re-combining the LLVM instruction intrinsics in the backend to speed things
up. We should insist that code use generic IR as input if it wants this
kind of combining.
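
To make the suggestion above concrete, here is a minimal scalar sketch in C
of why the `i8 0` case is really just a blend. It models the register form
of INSERTPS (imm8 bits 7:6 pick the source element, bits 5:4 the destination
element, bits 3:0 zero out result elements), BLENDPS, and a
shufflevector-style two-input shuffle. The `vec4` type and the `model_*`
helpers are hypothetical names made up for illustration; they are not LLVM
or intrinsic-header APIs.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 4 x float vector used only by this sketch. */
typedef struct { float v[4]; } vec4;

/* Scalar model of INSERTPS, register form. */
static vec4 model_insertps(vec4 x, vec4 y, uint8_t imm) {
    unsigned src = (imm >> 6) & 3u;   /* element of y to read  */
    unsigned dst = (imm >> 4) & 3u;   /* element of x to write */
    vec4 r = x;
    r.v[dst] = y.v[src];
    for (unsigned i = 0; i < 4; ++i)  /* apply the zero mask last */
        if (imm & (1u << i))
            r.v[i] = 0.0f;
    return r;
}

/* Scalar model of BLENDPS: bit i of imm selects y[i], otherwise x[i]. */
static vec4 model_blendps(vec4 x, vec4 y, uint8_t imm) {
    vec4 r;
    for (unsigned i = 0; i < 4; ++i)
        r.v[i] = ((imm >> i) & 1u) ? y.v[i] : x.v[i];
    return r;
}

/* Generic two-input shuffle: indices 0..3 read x, 4..7 read y. */
static vec4 model_shuffle(vec4 x, vec4 y, const int idx[4]) {
    vec4 r;
    for (unsigned i = 0; i < 4; ++i) {
        assert(idx[i] >= 0 && idx[i] < 8);
        r.v[i] = (idx[i] < 4) ? x.v[idx[i]] : y.v[idx[i] - 4];
    }
    return r;
}

/* Bitwise comparison of two vectors. */
static int vec4_equal(vec4 a, vec4 b) {
    return memcmp(a.v, b.v, sizeof a.v) == 0;
}
```

Under this model, `model_insertps(x, y, 0)`, `model_blendps(x, y, 1)`, and
`model_shuffle` with indices `{4, 1, 2, 3}` all produce the same vector,
which in IR terms corresponds to a shufflevector mask of
<i32 4, i32 1, i32 2, i32 3>. More generally, INSERTPS behaves as a blend
whenever the source and destination element indices match and the zero mask
is empty.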