[PATCH] Added more insertps optimizations

Andrea Di Biagio Andrea_DiBiagio at sn.scee.net
Sat May 17 05:35:34 PDT 2014


Hi Filipe,

I tested your patch and it works for me.
If you address the (minor) comments below, then the patch looks good to me!

================
Comment at: lib/Target/X86/X86ISelLowering.cpp:20287
@@ +20286,3 @@
+  if (MayFoldLoad(Ld)) {
+    unsigned DestIndex =
+        cast<ConstantSDNode>(N->getOperand(2))->getZExtValue() >> 6;
----------------
It might be useful to have a comment here explaining why you need a shift.
When the source is a memory operand, the Count_S bits (bits [7:6]) of the immediate operand are ignored: they are not used to select the floating point element from the source memory location.
That's why we have to extract the 'Count_S' bits from the immediate operand (hence the shift by 6) and use them as the 'index' for a new load instruction.
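
To make the reason for the shift concrete, here is a minimal standalone sketch of how the INSERTPS immediate splits into its three fields. The names InsertPSImm, decodeInsertPSImm and foldedLoadByteOffset are hypothetical helpers made up for illustration; they are not part of the patch or of LLVM.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers for illustration only; not part of the patch.

// Fields of the 8-bit INSERTPS immediate.
struct InsertPSImm {
  unsigned CountS; // bits [7:6]: source element (ignored for a memory source)
  unsigned CountD; // bits [5:4]: destination element
  unsigned ZMask;  // bits [3:0]: destination elements to zero
};

inline InsertPSImm decodeInsertPSImm(uint8_t Imm) {
  InsertPSImm F;
  F.CountS = Imm >> 6;         // the same shift as in the hunk above
  F.CountD = (Imm >> 4) & 0x3;
  F.ZMask = Imm & 0xF;
  return F;
}

// Byte offset of the float that a narrowed 32-bit load would have to read
// so that the memory form inserts the element the register form selected.
inline unsigned foldedLoadByteOffset(uint8_t Imm) {
  unsigned CountS = decodeInsertPSImm(Imm).CountS;
  assert(CountS < 4 && "Count_S is a 2-bit field");
  return CountS * 4; // 4 bytes per single-precision float
}
```

For example, with an immediate of 0x9A the Count_S field is 2, so the folded load would read the float at byte offset 8 from the original address.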

================
Comment at: lib/Target/X86/X86InstrSSE.td:6553
@@ -6552,1 +6552,3 @@
 
+let Predicates = [UseSSE41] in
+  // If we're inserting an element from a load or a null pshuf of a load,
----------------
You forgot to enclose both patterns in curly braces; without them, the 'let' only applies to the first pattern.
It still works fine because we never produce an X86insertps dag node if we don't have SSE4.1 :-)

================
Comment at: lib/Target/X86/X86InstrSSE.td:6564
@@ +6563,3 @@
+
+let Predicates = [UseAVX] in
+  // If we're inserting an element from a vbroadcast of a load, fold the
----------------
Same here: you should enclose the following two patterns in curly braces.

================
Comment at: test/CodeGen/X86/avx.ll:6-28
@@ +5,25 @@
+
+define <4 x i32> @blendvb_fallback_v4i32(<4 x i1> %mask, <4 x i32> %x, <4 x i32> %y) {
+; CHECK-LABEL: @blendvb_fallback_v4i32
+; CHECK: vblendvps
+; CHECK: ret
+  %ret = select <4 x i1> %mask, <4 x i32> %x, <4 x i32> %y
+  ret <4 x i32> %ret
+}
+
+define <8 x i32> @blendvb_fallback_v8i32(<8 x i1> %mask, <8 x i32> %x, <8 x i32> %y) {
+; CHECK-LABEL: @blendvb_fallback_v8i32
+; CHECK: vblendvps
+; CHECK: ret
+  %ret = select <8 x i1> %mask, <8 x i32> %x, <8 x i32> %y
+  ret <8 x i32> %ret
+}
+
+define <8 x float> @blendvb_fallback_v8f32(<8 x i1> %mask, <8 x float> %x, <8 x float> %y) {
+; CHECK-LABEL: @blendvb_fallback_v8f32
+; CHECK: vblendvps
+; CHECK: ret
+  %ret = select <8 x i1> %mask, <8 x float> %x, <8 x float> %y
+  ret <8 x float> %ret
+}
+
----------------
These three tests are not part of this patch.
I think you should add those in a separate commit.

http://reviews.llvm.org/D3581
