[PATCH] [X86, AVX] instcombine common cases of vperm2* intrinsics into shuffles
spatel at rotateright.com
Fri Mar 20 10:23:12 PDT 2015
Hi andreadb, RKSimon, craig.topper, chandlerc,
vperm2* intrinsics are just shuffles unless a zero mask bit is set. In a few special cases, they're not even shuffles.
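As a rough illustration of that first sentence (this is not the code from the patch), the immediate-to-mask conversion can be sketched as below. The helper name vperm2ToShuffleMask is hypothetical, and the sketch assumes C++17 and the usual two-operand mask convention in which indices 0..N-1 name elements of the first source and N..2N-1 name elements of the second. Bits [1:0] of the immediate select the 128-bit half that lands in the low half of the result, bits [5:4] select the high half, and bits 3 and 7 are the zero bits.

```cpp
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical helper for illustration only; not taken from the patch.
// Converts a vperm2* immediate into a two-operand shuffle mask.
// Returns std::nullopt when a zero mask bit (bit 3 or bit 7) is set,
// because that case cannot be written as a plain two-source shuffle.
std::optional<std::vector<int>> vperm2ToShuffleMask(uint8_t imm,
                                                    unsigned numElts) {
  if (imm & 0x88)
    return std::nullopt;

  const unsigned half = numElts / 2;         // elements per 128-bit half
  const unsigned lowSel = imm & 0x3;         // source half for result low half
  const unsigned highSel = (imm >> 4) & 0x3; // source half for result high half

  std::vector<int> mask(numElts);
  for (unsigned i = 0; i < half; ++i) {
    // Selector s picks the half that starts at index s * half in the
    // concatenation of the two sources.
    mask[i] = static_cast<int>(lowSel * half + i);
    mask[half + i] = static_cast<int>(highSel * half + i);
  }
  return mask;
}
```

For a 4-element vector, an immediate of 0x20 gives the mask <0, 1, 4, 5>: the low half of the first source followed by the low half of the second.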
Optimizing intrinsics in InstCombine is better than handling this in the front-end for at least two reasons:
1. Optimizing custom-written SSE intrinsic code at -O0 makes vector coders really angry (and so I have some regrets about some patches from last week).
2. Mask conversion logic in header files is hard to write and subsequently hard to read.
Unfortunately, shufflevector masks in IR use a magic number (generally assumed to be -1) to specify undef values. Apparently, that magic has led to lax coding where we just check <0 for undef. If we had a proper enum for shufflevector mask special values, we could do what the x86 backend does and easily transform the zero mask bit cases here too. Fixing that could be a follow-on patch. Otherwise, we'll try to deal with matching a 2-shuffle sequence in the x86 backend. But again, that's a separate patch (see the TODO comment in this one).
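To make the enum idea concrete, here is a minimal sketch of what distinct sentinel values might look like. Everything in it is hypothetical: the names MaskUndef, MaskZero, isUndefLax, isUndefStrict and vperm2ToSentinelMask are invented for illustration, and IR shufflevector masks do not have a zero sentinel today. It only shows why a bare <0 check cannot coexist with a second special value.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical sentinel values; invented for this sketch.
enum ShuffleSentinel : int { MaskUndef = -1, MaskZero = -2 };

// The lax check: any negative index is treated as undef, so a zero
// sentinel would be silently misread as undef.
inline bool isUndefLax(int m) { return m < 0; }

// The strict check that a proper enum would allow.
inline bool isUndefStrict(int m) { return m == MaskUndef; }

// Like a plain immediate-to-mask conversion, but the zero mask bits
// (bit 3 for the low half, bit 7 for the high half) become MaskZero
// entries instead of making the conversion fail.
std::vector<int> vperm2ToSentinelMask(uint8_t imm, unsigned numElts) {
  const unsigned half = numElts / 2;
  const unsigned lowSel = imm & 0x3;
  const unsigned highSel = (imm >> 4) & 0x3;
  const bool zeroLow = (imm & 0x08) != 0;
  const bool zeroHigh = (imm & 0x80) != 0;

  std::vector<int> mask(numElts);
  for (unsigned i = 0; i < half; ++i) {
    mask[i] = zeroLow ? MaskZero : static_cast<int>(lowSel * half + i);
    mask[half + i] =
        zeroHigh ? MaskZero : static_cast<int>(highSel * half + i);
  }
  return mask;
}
```

With an immediate of 0x08 on a 4-element vector this yields <-2, -2, 0, 1>. The lax check reports the first two entries as undef, while the strict check does not, which is the distinction the paragraph above is asking for.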