[PATCH] Added more insertps optimizations
Andrea Di Biagio
Andrea_DiBiagio at sn.scee.net
Sat May 17 05:35:34 PDT 2014
Hi Filipe,
I tested your patch and it works for me.
If you address the (minor) comments below, then the patch looks good to me!
================
Comment at: lib/Target/X86/X86ISelLowering.cpp:20287
@@ +20286,3 @@
+ if (MayFoldLoad(Ld)) {
+ unsigned DestIndex =
+ cast<ConstantSDNode>(N->getOperand(2))->getZExtValue() >> 6;
----------------
It might be useful to have a comment here explaining why you need a shift.
When the source is a memory operand, the Count_S bits of the immediate operand (bits [7:6]) are not used to select the floating-point element from the source memory location.
That's why we have to extract the 'Count_S' bits from the immediate operand (hence the shift by 6) and use them as the 'index' for a new load instruction.
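
To make the shift concrete, here is a rough sketch of how the insertps immediate breaks down. The helper names are made up for illustration and are not part of the patch, and the byte-offset helper assumes 4-byte float elements; treat it as one possible way to word the comment rather than what the code has to do.

```cpp
#include <cstdint>

// Illustrative decode of the INSERTPS imm8 fields. All helper names here are
// hypothetical; they do not exist in X86ISelLowering.cpp.
//   bits [7:6]  Count_S : source element select (register source only)
//   bits [5:4]  Count_D : destination element select
//   bits [3:0]  ZMask   : destination elements to zero
constexpr unsigned insertpsCountS(std::uint8_t Imm) { return Imm >> 6; }
constexpr unsigned insertpsCountD(std::uint8_t Imm) { return (Imm >> 4) & 0x3; }
constexpr unsigned insertpsZMask(std::uint8_t Imm) { return Imm & 0xF; }

// Assumption: with a memory source the instruction reads a single 32-bit
// value and ignores Count_S, so the element has to be chosen by adjusting
// the load address instead. Each float element is 4 bytes wide.
constexpr unsigned foldedLoadByteOffset(std::uint8_t Imm) {
  return insertpsCountS(Imm) * 4;
}
```

For example, an immediate of 0x9C (binary 10 01 1100) selects source element 2 and destination element 1 with a zero mask of 0xC, so under the assumption above a folded load would read from byte offset 8.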
================
Comment at: lib/Target/X86/X86InstrSSE.td:6553
@@ -6552,1 +6552,3 @@
+let Predicates = [UseSSE41] in
+ // If we're inserting an element from a load or a null pshuf of a load,
----------------
You forgot to enclose both patterns in curly braces. Without them, the 'let' applies only to the first pattern.
It still works fine because we never produce an X86insertps dag node if we don't have SSE4.1 :-)
================
Comment at: lib/Target/X86/X86InstrSSE.td:6564
@@ +6563,3 @@
+
+let Predicates = [UseAVX] in
+ // If we're inserting an element from a vbroadcast of a load, fold the
----------------
Same here: you should enclose the following two patterns in curly braces.
================
Comment at: test/CodeGen/X86/avx.ll:6-28
@@ +5,25 @@
+
+define <4 x i32> @blendvb_fallback_v4i32(<4 x i1> %mask, <4 x i32> %x, <4 x i32> %y) {
+; CHECK-LABEL: @blendvb_fallback_v4i32
+; CHECK: vblendvps
+; CHECK: ret
+ %ret = select <4 x i1> %mask, <4 x i32> %x, <4 x i32> %y
+ ret <4 x i32> %ret
+}
+
+define <8 x i32> @blendvb_fallback_v8i32(<8 x i1> %mask, <8 x i32> %x, <8 x i32> %y) {
+; CHECK-LABEL: @blendvb_fallback_v8i32
+; CHECK: vblendvps
+; CHECK: ret
+ %ret = select <8 x i1> %mask, <8 x i32> %x, <8 x i32> %y
+ ret <8 x i32> %ret
+}
+
+define <8 x float> @blendvb_fallback_v8f32(<8 x i1> %mask, <8 x float> %x, <8 x float> %y) {
+; CHECK-LABEL: @blendvb_fallback_v8f32
+; CHECK: vblendvps
+; CHECK: ret
+ %ret = select <8 x i1> %mask, <8 x float> %x, <8 x float> %y
+ ret <8 x float> %ret
+}
+
----------------
These three tests are not part of this patch.
I think you should add those in a separate commit.
http://reviews.llvm.org/D3581