[PATCH][X86] Add a check for 'isMOVHLPSMask' within method 'isShuffleMaskLegal'.
Andrea Di Biagio
andrea.dibiagio at gmail.com
Tue Jul 15 11:21:14 PDT 2014
Hi,
This patch adds a check for 'isMOVHLPSMask' inside method
'X86TargetLowering::isShuffleMaskLegal'.
The x86 backend already knows how to lower vector shuffles into
X86ISD::MOVHLPS dag nodes. Method 'LowerVECTOR_SHUFFLE' explicitly
calls function 'isMOVHLPSMask' to check if a shuffle dag node in input
can be safely expanded into a single 'movhlps' instruction.
I noticed however that method 'isShuffleMaskLegal' doesn't know that
shuffles implementing a 'movhlps' operation are perfectly legal for
SSE targets.
The reason why we should update that method is because the DAGCombiner
conservatively avoids combining two shuffles if the resulting shuffle
mask would be illegal for the target. At the moment, shuffles with a
MOVHLPS mask are wrongly considered not to be legal by
'isShuffleMaskLegal'.This is the root cause of some poor-code
generation bugs.
This patch fixes method 'isShuffleMaskLegal' adding a check for 'isMOVHLPSMask'.
Given the following IR:
define <4 x i32> @test_movhl_1(<4 x i32> %a, <4 x i32> %b) {
%1 = shufflevector <4 x i32> %a, <4 x i32> %b, <4 x i32> <i32 2, i32
7, i32 5, i32 3>
%2 = shufflevector <4 x i32> %1, <4 x i32> %b, <4 x i32> <i32 6, i32
1, i32 0, i32 3>
ret <4 x i32> %2
}
Before this patch, llc (-mcpu=corei7 -march=x86-64) generated the
following sub-optimal assembly sequence:
shufps $126, %xmm1, %xmm0
pshufd $120, %xmm0, %xmm0
shufps $18, %xmm0, %xmm1
shufps $-56, %xmm0, %xmm1
movapd %xmm1, %xmm0
With this patch, the backend recognizes that the two shuffles could be
combined into a single legal shuffle which is then lowered as a single
movhlps instruction:
movhlps %xmm1, %xmm0
I added some other examples in test 'combine-vec-shuffle-3.ll'
Please let me know if ok to submit.
Thanks,
Andrea
-------------- next part --------------
Index: lib/Target/X86/X86ISelLowering.cpp
===================================================================
--- lib/Target/X86/X86ISelLowering.cpp (revision 213081)
+++ lib/Target/X86/X86ISelLowering.cpp (working copy)
@@ -16880,6 +16880,7 @@
return (SVT.getVectorNumElements() == 2 ||
ShuffleVectorSDNode::isSplatMask(&M[0], VT) ||
isMOVLMask(M, SVT) ||
+ isMOVHLPSMask(M, SVT) ||
isSHUFPMask(M, SVT) ||
isPSHUFDMask(M, SVT) ||
isPSHUFHWMask(M, SVT, Subtarget->hasInt256()) ||
Index: test/CodeGen/X86/combine-vec-shuffle-3.ll
===================================================================
--- test/CodeGen/X86/combine-vec-shuffle-3.ll (revision 213081)
+++ test/CodeGen/X86/combine-vec-shuffle-3.ll (working copy)
@@ -35,14 +35,10 @@
%2 = shufflevector <4 x float> %1, <4 x float> %b, <4 x i32> <i32 6, i32 7, i32 0, i32 1>
ret <4 x float> %2
}
-; FIXME: this should be lowered as a single movhlps. However, the backend
-; wrongly thinks that shuffle mask [6,7,2,3] is not legal. Therefore, we
-; end up with the sub-optimal sequence 'shufps, palignr'.
; CHECK-LABEL: test4
; Mask: [6,7,2,3]
-; CHECK: shufps $94
-; CHECK: palignr $8
-; CHECK: ret
+; CHECK: movhlps
+; CHECK-NEXT: ret
define <4 x float> @test5(<4 x float> %a, <4 x float> %b) {
%1 = shufflevector <4 x float> %a, <4 x float> %b, <4 x i32> <i32 4, i32 1, i32 6, i32 3>
@@ -90,13 +86,10 @@
%2 = shufflevector <4 x i32> %1, <4 x i32> %b, <4 x i32> <i32 6, i32 7, i32 0, i32 1>
ret <4 x i32> %2
}
-; FIXME: this should be lowered as a single movhlps. However, the backend thinks that
-; shuffle mask [6,7,2,3] is not legal.
; CHECK-LABEL: test9
; Mask: [6,7,2,3]
-; CHECK: shufps $94
-; CHECK: palignr $8
-; CHECK: ret
+; CHECK: movhlps
+; CHECK-NEXT: ret
define <4 x i32> @test10(<4 x i32> %a, <4 x i32> %b) {
%1 = shufflevector <4 x i32> %a, <4 x i32> %b, <4 x i32> <i32 4, i32 1, i32 6, i32 3>
@@ -144,13 +137,9 @@
%2 = shufflevector <4 x float> %1, <4 x float> %a, <4 x i32> <i32 6, i32 7, i32 0, i32 1>
ret <4 x float> %2
}
-; FIXME: this should be lowered as a single movhlps. However, the backend
-; wrongly thinks that shuffle mask [6,7,2,3] is not legal. Therefore, we
-; end up with the sub-optimal sequence 'pshufd, palignr'.
; CHECK-LABEL: test14
; Mask: [6,7,2,3]
-; CHECK: pshufd $94
-; CHECK: palignr $8
+; CHECK: movhlps
; CHECK: ret
define <4 x float> @test15(<4 x float> %a, <4 x float> %b) {
@@ -199,13 +188,9 @@
%2 = shufflevector <4 x i32> %1, <4 x i32> %a, <4 x i32> <i32 6, i32 7, i32 0, i32 1>
ret <4 x i32> %2
}
-; FIXME: this should be lowered as a single movhlps. However, the backend
-; wrongly thinks that shuffle mask [6,7,2,3] is not legal. Therefore, we
-; end up with the sub-optimal sequence 'shufps, palignr'.
; CHECK-LABEL: test19
; Mask: [6,7,2,3]
-; CHECK: pshufd $94
-; CHECK: palignr $8
+; CHECK: movhlps
; CHECK: ret
define <4 x i32> @test20(<4 x i32> %a, <4 x i32> %b) {
@@ -371,3 +356,30 @@
; CHECK: movss
; CHECK: ret
+define <4 x i32> @test_movhl_1(<4 x i32> %a, <4 x i32> %b) {
+ %1 = shufflevector <4 x i32> %a, <4 x i32> %b, <4 x i32> <i32 2, i32 7, i32 5, i32 3>
+ %2 = shufflevector <4 x i32> %1, <4 x i32> %b, <4 x i32> <i32 6, i32 1, i32 0, i32 3>
+ ret <4 x i32> %2
+}
+; CHECK-LABEL: test_movlh_1
+; CHECK: movhlps
+; CHECK-NEXT: ret
+
+define <4 x i32> @test_movhl_2(<4 x i32> %a, <4 x i32> %b) {
+ %1 = shufflevector <4 x i32> %a, <4 x i32> %b, <4 x i32> <i32 2, i32 0, i32 3, i32 6>
+ %2 = shufflevector <4 x i32> %1, <4 x i32> %b, <4 x i32> <i32 3, i32 7, i32 0, i32 2>
+ ret <4 x i32> %2
+}
+; CHECK-LABEL: test_movlh_2
+; CHECK: movhlps
+; CHECK-NEXT: ret
+
+define <4 x i32> @test_movhl_3(<4 x i32> %a, <4 x i32> %b) {
+ %1 = shufflevector <4 x i32> %a, <4 x i32> %b, <4 x i32> <i32 7, i32 6, i32 3, i32 2>
+ %2 = shufflevector <4 x i32> %1, <4 x i32> %b, <4 x i32> <i32 6, i32 0, i32 3, i32 2>
+ ret <4 x i32> %2
+}
+; CHECK-LABEL: test_movlh_3
+; CHECK: movhlps
+; CHECK-NEXT: ret
+
Index: test/CodeGen/X86/combine-vec-shuffle-4.ll
===================================================================
--- test/CodeGen/X86/combine-vec-shuffle-4.ll (revision 213081)
+++ test/CodeGen/X86/combine-vec-shuffle-4.ll (working copy)
@@ -38,14 +38,10 @@
%2 = shufflevector <4 x float> %1, <4 x float> %b, <4 x i32> <i32 6, i32 7, i32 0, i32 1>
ret <4 x float> %2
}
-; FIXME: this should be lowered as a single movhlps. However, the backend
-; wrongly thinks that shuffle mask [6,7,2,3] is not legal. Therefore, we
-; end up with the sub-optimal sequence 'movhlps, palignr'.
; CHECK-LABEL: test4
; Mask: [6,7,2,3]
; CHECK: movhlps
-; CHECK: palignr $8
-; CHECK: ret
+; CHECK-NEXT: ret
define <4 x float> @test5(<4 x float> %a, <4 x float> %b) {
%1 = shufflevector <4 x float> %a, <4 x float> undef, <4 x i32> <i32 0, i32 4, i32 1, i32 3>
More information about the llvm-commits
mailing list