[PATCH] [X86, AVX] instcombine common cases of vperm2* intrinsics into shuffles

Sanjay Patel spatel at rotateright.com
Fri Mar 20 12:45:35 PDT 2015

Comment at: lib/Transforms/InstCombine/InstCombineCalls.cpp:203
@@ +202,3 @@
+/// then ignore that half of the mask and clear that half of the vector.
+Instruction *InstCombiner::SimplifyX86vperm2(IntrinsicInst &II) {
+  VectorType *VecTy = cast<VectorType>(II.getType());
andreadb wrote:
> Why not have a static function instead? If you use a function, you can avoid to to change 'InstCombineInternal.h'.
No reason other than cut and paste from the code around this. If this is static, we'd have to pass in a reference to the InstCombiner in order to use ReplaceInstUsesWith(), right? Do you think that's cleaner?

Comment at: lib/Transforms/InstCombine/InstCombineCalls.cpp:245-248
@@ +244,6 @@
+    // TODO: If a single zero bit is set, we can model this vperm2 as a shuffle
+    // of the source vectors followed by a shuffle with a zero vector, but the
+    // x86 backend needs to be able to recognize that pattern and reconstitute
+    // a vperm2 instruction if necessary.
andreadb wrote:
> I don't think you need two shuffles for this case.
> If either b[3] or b[7] is set, you can always convert the 'vperm2f128' into a single shuffle instruction where one of the operands is a zero vector, and the other operand is either Op0 or Op2.
> Example:
> With mask 'Imm' equal to '0x08', the low 128-bit lane of the result is always going to be zero.
> The upper 128-bit lane is goint to be Op0[127:0]. So, you can expand the input 'x86_avx_vperm2f128_ps_256' into:
>   shufflevector <8 x float> zeroinitializer, <8 x float> Op0, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 8, i32 9, i32 10, i32 11>
> Basically, what I am trying to say is that if a 'zero bit' is set, then one of the input vectors becomes redundant.
> You can reuse the same logic you added for the case where 'zero mask bits are not set' to handle _all_ the possible cases. You would only need to check in advance if zero bits are set, and replace either Op0 or Op1 (or both) with a zero vector if needed.
Ah, nice catch. I'll adjust the comment and make that a small follow-on patch if it's ok with you.



More information about the llvm-commits mailing list