[llvm] r344487 - [X86] Move promotion of vector and/or/xor from legalization to DAG combine
Craig Topper via llvm-commits
llvm-commits at lists.llvm.org
Sun Oct 14 18:51:58 PDT 2018
Author: ctopper
Date: Sun Oct 14 18:51:58 2018
New Revision: 344487
URL: http://llvm.org/viewvc/llvm-project?rev=344487&view=rev
Log:
[X86] Move promotion of vector and/or/xor from legalization to DAG combine
Summary:
I've noticed that the bitcasts we introduce for these make computeKnownBits and computeNumSignBits not work well in LegalizeVectorOps. LegalizeVectorOps legalizes bottom up while LegalizeDAG legalizes top down. The bottom-up strategy of LegalizeVectorOps means operands are legalized before their uses, so we promote and/or/xor before we legalize the nodes that use them, which makes computeKnownBits/computeNumSignBits suboptimal in places like LowerTruncate. I looked at changing LegalizeVectorOps to be top down as well, but that was more disruptive and caused some regressions. I also looked at just moving promotion of binops to LegalizeDAG, but that had a few issues, one of them around matching AND/ANDN/OR into VSELECT: I had to create the ANDN as vXi64 while the other nodes hadn't been legalized yet. I didn't look too hard at fixing that.
This patch seems to produce better results overall than my other attempts. We now form broadcasts of constants better in some cases. For at least some of them, the AND was being introduced in LegalizeDAG, promoted to vXi64, and the BUILD_VECTOR was also legalized there; I think we ended up with a bad ordering of those steps. Now that the promotion is out of the legalizer, we handle this better.
In the longer term I think we should really evaluate whether we should be doing this promotion at all. It's there to reduce the isel pattern count, but I'm wondering if we'd be better served just eating the pattern cost or doing C++-based isel for vector and/or/xor in X86ISelDAGToDAG. The masked and/or/xor will definitely be difficult to cover in patterns if a bitcast gets between the vselect and the and/or/xor node; that becomes a lot of permutations to cover.
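As a rough illustration of the ordering problem, consider a hypothetical case like the following (this exact function and its name are made up for illustration; it is not one of the tests touched by this patch):

  define <8 x i16> @and_then_trunc(<8 x i32> %x, <8 x i32> %y) {
    %a = and <8 x i32> %x, %y
    %t = trunc <8 x i32> %a to <8 x i16>
    ret <8 x i16> %t
  }

Previously, LegalizeVectorOps would promote the and to v4i64, wrapping it in bitcasts, before it visited the truncate, so the computeKnownBits/computeNumSignBits queries in LowerTruncate had to look through those bitcasts. With this patch the and keeps its v8i32 type through operation legalization and is only rewritten to v4i64 by the new DAG combine afterwards.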
Reviewers: RKSimon, spatel
Reviewed By: RKSimon
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D53107
Modified:
llvm/trunk/lib/Target/X86/X86ISelLowering.cpp
llvm/trunk/test/CodeGen/X86/avx-logic.ll
llvm/trunk/test/CodeGen/X86/avx512-ext.ll
llvm/trunk/test/CodeGen/X86/avx512-insert-extract.ll
llvm/trunk/test/CodeGen/X86/avx512-schedule.ll
llvm/trunk/test/CodeGen/X86/avx512vl-vec-masked-cmp.ll
llvm/trunk/test/CodeGen/X86/cast-vsel.ll
llvm/trunk/test/CodeGen/X86/combine-sdiv.ll
llvm/trunk/test/CodeGen/X86/combine-srl.ll
llvm/trunk/test/CodeGen/X86/gather-addresses.ll
llvm/trunk/test/CodeGen/X86/horizontal-reduce-umax.ll
llvm/trunk/test/CodeGen/X86/horizontal-reduce-umin.ll
llvm/trunk/test/CodeGen/X86/known-bits.ll
llvm/trunk/test/CodeGen/X86/nontemporal-loads.ll
llvm/trunk/test/CodeGen/X86/paddus.ll
llvm/trunk/test/CodeGen/X86/psubus.ll
llvm/trunk/test/CodeGen/X86/sat-add.ll
llvm/trunk/test/CodeGen/X86/setcc-lowering.ll
llvm/trunk/test/CodeGen/X86/sse2-intrinsics-canonical.ll
llvm/trunk/test/CodeGen/X86/unfold-masked-merge-vector-variablemask-const.ll
llvm/trunk/test/CodeGen/X86/v8i1-masks.ll
llvm/trunk/test/CodeGen/X86/vector-blend.ll
llvm/trunk/test/CodeGen/X86/vector-reduce-umax.ll
llvm/trunk/test/CodeGen/X86/vector-reduce-umin.ll
llvm/trunk/test/CodeGen/X86/vector-shift-lshr-128.ll
llvm/trunk/test/CodeGen/X86/vector-shift-shl-128.ll
llvm/trunk/test/CodeGen/X86/vector-shuffle-256-v16.ll
llvm/trunk/test/CodeGen/X86/vector-trunc-math.ll
llvm/trunk/test/CodeGen/X86/vector-trunc-packus.ll
llvm/trunk/test/CodeGen/X86/vector-trunc-ssat.ll
llvm/trunk/test/CodeGen/X86/vector-trunc-usat.ll
llvm/trunk/test/CodeGen/X86/vector-trunc.ll
llvm/trunk/test/CodeGen/X86/vshift-6.ll
llvm/trunk/test/CodeGen/X86/x86-interleaved-access.ll
Modified: llvm/trunk/lib/Target/X86/X86ISelLowering.cpp
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/X86/X86ISelLowering.cpp?rev=344487&r1=344486&r2=344487&view=diff
==============================================================================
--- llvm/trunk/lib/Target/X86/X86ISelLowering.cpp (original)
+++ llvm/trunk/lib/Target/X86/X86ISelLowering.cpp Sun Oct 14 18:51:58 2018
@@ -871,9 +871,6 @@ X86TargetLowering::X86TargetLowering(con
// Promote v16i8, v8i16, v4i32 load, select, and, or, xor to v2i64.
for (auto VT : { MVT::v16i8, MVT::v8i16, MVT::v4i32 }) {
- setOperationPromotedToType(ISD::AND, VT, MVT::v2i64);
- setOperationPromotedToType(ISD::OR, VT, MVT::v2i64);
- setOperationPromotedToType(ISD::XOR, VT, MVT::v2i64);
setOperationPromotedToType(ISD::LOAD, VT, MVT::v2i64);
}
@@ -1183,9 +1180,6 @@ X86TargetLowering::X86TargetLowering(con
// Promote v32i8, v16i16, v8i32 select, and, or, xor to v4i64.
for (auto VT : { MVT::v32i8, MVT::v16i16, MVT::v8i32 }) {
- setOperationPromotedToType(ISD::AND, VT, MVT::v4i64);
- setOperationPromotedToType(ISD::OR, VT, MVT::v4i64);
- setOperationPromotedToType(ISD::XOR, VT, MVT::v4i64);
setOperationPromotedToType(ISD::LOAD, VT, MVT::v4i64);
}
@@ -1384,13 +1378,6 @@ X86TargetLowering::X86TargetLowering(con
setCondCodeAction(ISD::SETLE, VT, Custom);
}
- // Need to promote to 64-bit even though we have 32-bit masked instructions
- // because the IR optimizers rearrange bitcasts around logic ops leaving
- // too many variations to handle if we don't promote them.
- setOperationPromotedToType(ISD::AND, MVT::v16i32, MVT::v8i64);
- setOperationPromotedToType(ISD::OR, MVT::v16i32, MVT::v8i64);
- setOperationPromotedToType(ISD::XOR, MVT::v16i32, MVT::v8i64);
-
if (Subtarget.hasDQI()) {
setOperationAction(ISD::SINT_TO_FP, MVT::v8i64, Legal);
setOperationAction(ISD::UINT_TO_FP, MVT::v8i64, Legal);
@@ -1593,10 +1580,6 @@ X86TargetLowering::X86TargetLowering(con
setOperationAction(ISD::UMIN, VT, Legal);
setOperationAction(ISD::SETCC, VT, Custom);
- setOperationPromotedToType(ISD::AND, VT, MVT::v8i64);
- setOperationPromotedToType(ISD::OR, VT, MVT::v8i64);
- setOperationPromotedToType(ISD::XOR, VT, MVT::v8i64);
-
// The condition codes aren't legal in SSE/AVX and under AVX512 we use
// setcc all the way to isel and prefer SETGT in some isel patterns.
setCondCodeAction(ISD::SETLT, VT, Custom);
@@ -35226,6 +35209,10 @@ static SDValue combineAndMaskToShift(SDN
!SplatVal.isMask())
return SDValue();
+ // Don't prevent creation of ANDN.
+ if (isBitwiseNot(Op0))
+ return SDValue();
+
if (!SupportedVectorShiftWithImm(VT0.getSimpleVT(), Subtarget, ISD::SRL))
return SDValue();
@@ -35426,6 +35413,27 @@ static SDValue combineParity(SDNode *N,
return DAG.getNode(ISD::ZERO_EXTEND, DL, N->getValueType(0), Setnp);
}
+// This promotes vectors and/or/xor to a vXi64 type. We used to do this during
+// op legalization, but DAG combine yields better results.
+// TODO: This is largely just to reduce the number of isel patterns. Maybe we
+// can just add all the patterns or do C++ based selection in X86ISelDAGToDAG?
+static SDValue promoteVecLogicOp(SDNode *N, SelectionDAG &DAG) {
+ MVT VT = N->getSimpleValueType(0);
+
+ if (!VT.is128BitVector() && !VT.is256BitVector() && !VT.is512BitVector())
+ return SDValue();
+
+ // Already correct type.
+ if (VT.getVectorElementType() == MVT::i64)
+ return SDValue();
+
+ MVT NewVT = MVT::getVectorVT(MVT::i64, VT.getSizeInBits() / 64);
+ SDValue Op0 = DAG.getBitcast(NewVT, N->getOperand(0));
+ SDValue Op1 = DAG.getBitcast(NewVT, N->getOperand(1));
+ return DAG.getBitcast(VT, DAG.getNode(N->getOpcode(), SDLoc(N), NewVT,
+ Op0, Op1));
+}
+
static SDValue combineAnd(SDNode *N, SelectionDAG &DAG,
TargetLowering::DAGCombinerInfo &DCI,
const X86Subtarget &Subtarget) {
@@ -35460,6 +35468,9 @@ static SDValue combineAnd(SDNode *N, Sel
if (DCI.isBeforeLegalizeOps())
return SDValue();
+ if (SDValue V = promoteVecLogicOp(N, DAG))
+ return V;
+
if (SDValue R = combineCompareEqual(N, DAG, DCI, Subtarget))
return R;
@@ -35782,6 +35793,9 @@ static SDValue combineOr(SDNode *N, Sele
if (DCI.isBeforeLegalizeOps())
return SDValue();
+ if (SDValue V = promoteVecLogicOp(N, DAG))
+ return V;
+
if (SDValue R = combineCompareEqual(N, DAG, DCI, Subtarget))
return R;
@@ -37810,6 +37824,9 @@ static SDValue combineXor(SDNode *N, Sel
if (DCI.isBeforeLegalizeOps())
return SDValue();
+ if (SDValue V = promoteVecLogicOp(N, DAG))
+ return V;
+
if (SDValue SetCC = foldXor1SetCC(N, DAG))
return SetCC;
Modified: llvm/trunk/test/CodeGen/X86/avx-logic.ll
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/avx-logic.ll?rev=344487&r1=344486&r2=344487&view=diff
==============================================================================
--- llvm/trunk/test/CodeGen/X86/avx-logic.ll (original)
+++ llvm/trunk/test/CodeGen/X86/avx-logic.ll Sun Oct 14 18:51:58 2018
@@ -314,7 +314,7 @@ define <8 x i32> @and_disguised_i8_elts(
; AVX1-NEXT: vextractf128 $1, %ymm1, %xmm1
; AVX1-NEXT: vextractf128 $1, %ymm0, %xmm0
; AVX1-NEXT: vpaddd %xmm1, %xmm0, %xmm0
-; AVX1-NEXT: vmovdqa {{.*#+}} xmm1 = [1095216660735,1095216660735]
+; AVX1-NEXT: vmovdqa {{.*#+}} xmm1 = [255,255,255,255]
; AVX1-NEXT: vpand %xmm1, %xmm0, %xmm0
; AVX1-NEXT: vextractf128 $1, %ymm2, %xmm4
; AVX1-NEXT: vpaddd %xmm4, %xmm0, %xmm0
@@ -342,7 +342,7 @@ define <8 x i32> @andn_disguised_i8_elts
; AVX1-NEXT: vextractf128 $1, %ymm1, %xmm4
; AVX1-NEXT: vpaddd %xmm3, %xmm4, %xmm3
; AVX1-NEXT: vpaddd %xmm0, %xmm1, %xmm0
-; AVX1-NEXT: vmovdqa {{.*#+}} xmm1 = [1095216660735,1095216660735]
+; AVX1-NEXT: vmovdqa {{.*#+}} xmm1 = [255,255,255,255]
; AVX1-NEXT: vpandn %xmm1, %xmm0, %xmm0
; AVX1-NEXT: vpandn %xmm1, %xmm3, %xmm1
; AVX1-NEXT: vextractf128 $1, %ymm2, %xmm3
@@ -450,7 +450,7 @@ define <8 x i32> @or_disguised_i8_elts(<
; AVX1-NEXT: vextractf128 $1, %ymm1, %xmm1
; AVX1-NEXT: vextractf128 $1, %ymm0, %xmm0
; AVX1-NEXT: vpaddd %xmm1, %xmm0, %xmm0
-; AVX1-NEXT: vmovdqa {{.*#+}} xmm1 = [1095216660735,1095216660735]
+; AVX1-NEXT: vmovdqa {{.*#+}} xmm1 = [255,255,255,255]
; AVX1-NEXT: vpor %xmm1, %xmm0, %xmm0
; AVX1-NEXT: vextractf128 $1, %ymm2, %xmm4
; AVX1-NEXT: vpaddd %xmm4, %xmm0, %xmm0
@@ -479,7 +479,7 @@ define <8 x i32> @xor_disguised_i8_elts(
; AVX1-NEXT: vextractf128 $1, %ymm1, %xmm1
; AVX1-NEXT: vextractf128 $1, %ymm0, %xmm0
; AVX1-NEXT: vpaddd %xmm1, %xmm0, %xmm0
-; AVX1-NEXT: vmovdqa {{.*#+}} xmm1 = [1095216660735,1095216660735]
+; AVX1-NEXT: vmovdqa {{.*#+}} xmm1 = [255,255,255,255]
; AVX1-NEXT: vpxor %xmm1, %xmm0, %xmm0
; AVX1-NEXT: vextractf128 $1, %ymm2, %xmm4
; AVX1-NEXT: vpaddd %xmm4, %xmm0, %xmm0
@@ -537,7 +537,7 @@ define <8 x i32> @or_disguised_i16_elts(
; AVX1-NEXT: vextractf128 $1, %ymm1, %xmm1
; AVX1-NEXT: vextractf128 $1, %ymm0, %xmm0
; AVX1-NEXT: vpaddd %xmm1, %xmm0, %xmm0
-; AVX1-NEXT: vmovdqa {{.*#+}} xmm1 = [281470681808895,281470681808895]
+; AVX1-NEXT: vmovdqa {{.*#+}} xmm1 = [65535,65535,65535,65535]
; AVX1-NEXT: vpor %xmm1, %xmm0, %xmm0
; AVX1-NEXT: vextractf128 $1, %ymm2, %xmm4
; AVX1-NEXT: vpaddd %xmm4, %xmm0, %xmm0
@@ -566,7 +566,7 @@ define <8 x i32> @xor_disguised_i16_elts
; AVX1-NEXT: vextractf128 $1, %ymm1, %xmm1
; AVX1-NEXT: vextractf128 $1, %ymm0, %xmm0
; AVX1-NEXT: vpaddd %xmm1, %xmm0, %xmm0
-; AVX1-NEXT: vmovdqa {{.*#+}} xmm1 = [281470681808895,281470681808895]
+; AVX1-NEXT: vmovdqa {{.*#+}} xmm1 = [65535,65535,65535,65535]
; AVX1-NEXT: vpxor %xmm1, %xmm0, %xmm0
; AVX1-NEXT: vextractf128 $1, %ymm2, %xmm4
; AVX1-NEXT: vpaddd %xmm4, %xmm0, %xmm0
Modified: llvm/trunk/test/CodeGen/X86/avx512-ext.ll
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/avx512-ext.ll?rev=344487&r1=344486&r2=344487&view=diff
==============================================================================
--- llvm/trunk/test/CodeGen/X86/avx512-ext.ll (original)
+++ llvm/trunk/test/CodeGen/X86/avx512-ext.ll Sun Oct 14 18:51:58 2018
@@ -2157,7 +2157,7 @@ define <32 x i8> @zext_32xi1_to_32xi8(<3
define <4 x i32> @zext_4xi1_to_4x32(<4 x i8> %x, <4 x i8> %y) #0 {
; ALL-LABEL: zext_4xi1_to_4x32:
; ALL: # %bb.0:
-; ALL-NEXT: vmovdqa {{.*#+}} xmm2 = [255,0,0,0,255,0,0,0,255,0,0,0,255,0,0,0]
+; ALL-NEXT: vpbroadcastd {{.*#+}} xmm2 = [255,255,255,255]
; ALL-NEXT: vpand %xmm2, %xmm1, %xmm1
; ALL-NEXT: vpand %xmm2, %xmm0, %xmm0
; ALL-NEXT: vpcmpeqd %xmm1, %xmm0, %xmm0
@@ -2171,7 +2171,7 @@ define <4 x i32> @zext_4xi1_to_4x32(<4 x
define <2 x i64> @zext_2xi1_to_2xi64(<2 x i8> %x, <2 x i8> %y) #0 {
; ALL-LABEL: zext_2xi1_to_2xi64:
; ALL: # %bb.0:
-; ALL-NEXT: vmovdqa {{.*#+}} xmm2 = [255,0,0,0,0,0,0,0,255,0,0,0,0,0,0,0]
+; ALL-NEXT: vpbroadcastq {{.*#+}} xmm2 = [255,255]
; ALL-NEXT: vpand %xmm2, %xmm1, %xmm1
; ALL-NEXT: vpand %xmm2, %xmm0, %xmm0
; ALL-NEXT: vpcmpeqq %xmm1, %xmm0, %xmm0
Modified: llvm/trunk/test/CodeGen/X86/avx512-insert-extract.ll
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/avx512-insert-extract.ll?rev=344487&r1=344486&r2=344487&view=diff
==============================================================================
--- llvm/trunk/test/CodeGen/X86/avx512-insert-extract.ll (original)
+++ llvm/trunk/test/CodeGen/X86/avx512-insert-extract.ll Sun Oct 14 18:51:58 2018
@@ -993,7 +993,6 @@ define zeroext i8 @test_extractelement_v
; KNL-NEXT: vpcmpeqb %ymm1, %ymm0, %ymm0
; KNL-NEXT: vpternlogq $15, %zmm0, %zmm0, %zmm0
; KNL-NEXT: vpmovsxbd %xmm0, %zmm0
-; KNL-NEXT: vpslld $31, %zmm0, %zmm0
; KNL-NEXT: vptestmd %zmm0, %zmm0, %k0
; KNL-NEXT: kshiftrw $2, %k0, %k0
; KNL-NEXT: kmovw %k0, %eax
@@ -1023,7 +1022,6 @@ define zeroext i8 @test_extractelement_v
; KNL-NEXT: vpternlogq $15, %zmm0, %zmm0, %zmm0
; KNL-NEXT: vextracti128 $1, %ymm0, %xmm0
; KNL-NEXT: vpmovsxbd %xmm0, %zmm0
-; KNL-NEXT: vpslld $31, %zmm0, %zmm0
; KNL-NEXT: vptestmd %zmm0, %zmm0, %k0
; KNL-NEXT: kshiftrw $15, %k0, %k0
; KNL-NEXT: kmovw %k0, %eax
@@ -1059,7 +1057,6 @@ define zeroext i8 @extractelement_v64i1_
; KNL-NEXT: vpternlogq $15, %zmm0, %zmm0, %zmm0
; KNL-NEXT: vextracti128 $1, %ymm0, %xmm0
; KNL-NEXT: vpmovsxbd %xmm0, %zmm0
-; KNL-NEXT: vpslld $31, %zmm0, %zmm0
; KNL-NEXT: vptestmd %zmm0, %zmm0, %k0
; KNL-NEXT: kshiftrw $15, %k0, %k0
; KNL-NEXT: kmovw %k0, %eax
Modified: llvm/trunk/test/CodeGen/X86/avx512-schedule.ll
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/avx512-schedule.ll?rev=344487&r1=344486&r2=344487&view=diff
==============================================================================
--- llvm/trunk/test/CodeGen/X86/avx512-schedule.ll (original)
+++ llvm/trunk/test/CodeGen/X86/avx512-schedule.ll Sun Oct 14 18:51:58 2018
@@ -4711,7 +4711,7 @@ define <32 x i8> @zext_32xi1_to_32xi8(<3
define <4 x i32> @zext_4xi1_to_4x32(<4 x i8> %x, <4 x i8> %y) #0 {
; GENERIC-LABEL: zext_4xi1_to_4x32:
; GENERIC: # %bb.0:
-; GENERIC-NEXT: vmovdqa {{.*#+}} xmm2 = [255,0,0,0,255,0,0,0,255,0,0,0,255,0,0,0] sched: [6:0.50]
+; GENERIC-NEXT: vpbroadcastd {{.*#+}} xmm2 = [255,255,255,255] sched: [7:0.50]
; GENERIC-NEXT: vpand %xmm2, %xmm1, %xmm1 # sched: [1:0.33]
; GENERIC-NEXT: vpand %xmm2, %xmm0, %xmm0 # sched: [1:0.33]
; GENERIC-NEXT: vpcmpeqd %xmm1, %xmm0, %xmm0 # sched: [1:0.50]
@@ -4720,7 +4720,7 @@ define <4 x i32> @zext_4xi1_to_4x32(<4 x
;
; SKX-LABEL: zext_4xi1_to_4x32:
; SKX: # %bb.0:
-; SKX-NEXT: vmovdqa {{.*#+}} xmm2 = [255,0,0,0,255,0,0,0,255,0,0,0,255,0,0,0] sched: [6:0.50]
+; SKX-NEXT: vpbroadcastd {{.*#+}} xmm2 = [255,255,255,255] sched: [6:0.50]
; SKX-NEXT: vpand %xmm2, %xmm1, %xmm1 # sched: [1:0.33]
; SKX-NEXT: vpand %xmm2, %xmm0, %xmm0 # sched: [1:0.33]
; SKX-NEXT: vpcmpeqd %xmm1, %xmm0, %xmm0 # sched: [1:0.50]
@@ -4734,7 +4734,7 @@ define <4 x i32> @zext_4xi1_to_4x32(<4 x
define <2 x i64> @zext_2xi1_to_2xi64(<2 x i8> %x, <2 x i8> %y) #0 {
; GENERIC-LABEL: zext_2xi1_to_2xi64:
; GENERIC: # %bb.0:
-; GENERIC-NEXT: vmovdqa {{.*#+}} xmm2 = [255,0,0,0,0,0,0,0,255,0,0,0,0,0,0,0] sched: [6:0.50]
+; GENERIC-NEXT: vpbroadcastq {{.*#+}} xmm2 = [255,255] sched: [7:0.50]
; GENERIC-NEXT: vpand %xmm2, %xmm1, %xmm1 # sched: [1:0.33]
; GENERIC-NEXT: vpand %xmm2, %xmm0, %xmm0 # sched: [1:0.33]
; GENERIC-NEXT: vpcmpeqq %xmm1, %xmm0, %xmm0 # sched: [1:0.50]
@@ -4743,7 +4743,7 @@ define <2 x i64> @zext_2xi1_to_2xi64(<2
;
; SKX-LABEL: zext_2xi1_to_2xi64:
; SKX: # %bb.0:
-; SKX-NEXT: vmovdqa {{.*#+}} xmm2 = [255,0,0,0,0,0,0,0,255,0,0,0,0,0,0,0] sched: [6:0.50]
+; SKX-NEXT: vpbroadcastq {{.*#+}} xmm2 = [255,255] sched: [6:0.50]
; SKX-NEXT: vpand %xmm2, %xmm1, %xmm1 # sched: [1:0.33]
; SKX-NEXT: vpand %xmm2, %xmm0, %xmm0 # sched: [1:0.33]
; SKX-NEXT: vpcmpeqq %xmm1, %xmm0, %xmm0 # sched: [1:0.50]
Modified: llvm/trunk/test/CodeGen/X86/avx512vl-vec-masked-cmp.ll
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/avx512vl-vec-masked-cmp.ll?rev=344487&r1=344486&r2=344487&view=diff
==============================================================================
--- llvm/trunk/test/CodeGen/X86/avx512vl-vec-masked-cmp.ll (original)
+++ llvm/trunk/test/CodeGen/X86/avx512vl-vec-masked-cmp.ll Sun Oct 14 18:51:58 2018
@@ -9780,7 +9780,6 @@ define zeroext i32 @test_vpcmpsgeb_v16i1
; NoVLX-NEXT: vpcmpgtb %xmm0, %xmm1, %xmm0
; NoVLX-NEXT: vpternlogq $15, %zmm0, %zmm0, %zmm0
; NoVLX-NEXT: vpmovsxbd %xmm0, %zmm0
-; NoVLX-NEXT: vpslld $31, %zmm0, %zmm0
; NoVLX-NEXT: vptestmd %zmm0, %zmm0, %k0
; NoVLX-NEXT: kmovw %k0, %eax
; NoVLX-NEXT: vzeroupper
@@ -9807,7 +9806,6 @@ define zeroext i32 @test_vpcmpsgeb_v16i1
; NoVLX-NEXT: vpcmpgtb %xmm0, %xmm1, %xmm0
; NoVLX-NEXT: vpternlogq $15, %zmm0, %zmm0, %zmm0
; NoVLX-NEXT: vpmovsxbd %xmm0, %zmm0
-; NoVLX-NEXT: vpslld $31, %zmm0, %zmm0
; NoVLX-NEXT: vptestmd %zmm0, %zmm0, %k0
; NoVLX-NEXT: kmovw %k0, %eax
; NoVLX-NEXT: vzeroupper
@@ -9835,7 +9833,6 @@ define zeroext i32 @test_masked_vpcmpsge
; NoVLX-NEXT: vpcmpgtb %xmm0, %xmm1, %xmm0
; NoVLX-NEXT: vpternlogq $15, %zmm0, %zmm0, %zmm0
; NoVLX-NEXT: vpmovsxbd %xmm0, %zmm0
-; NoVLX-NEXT: vpslld $31, %zmm0, %zmm0
; NoVLX-NEXT: vptestmd %zmm0, %zmm0, %k0
; NoVLX-NEXT: kmovw %k0, %eax
; NoVLX-NEXT: andl %edi, %eax
@@ -9866,7 +9863,6 @@ define zeroext i32 @test_masked_vpcmpsge
; NoVLX-NEXT: vpcmpgtb %xmm0, %xmm1, %xmm0
; NoVLX-NEXT: vpternlogq $15, %zmm0, %zmm0, %zmm0
; NoVLX-NEXT: vpmovsxbd %xmm0, %zmm0
-; NoVLX-NEXT: vpslld $31, %zmm0, %zmm0
; NoVLX-NEXT: vptestmd %zmm0, %zmm0, %k0
; NoVLX-NEXT: kmovw %k0, %eax
; NoVLX-NEXT: andl %edi, %eax
@@ -9897,7 +9893,6 @@ define zeroext i64 @test_vpcmpsgeb_v16i1
; NoVLX-NEXT: vpcmpgtb %xmm0, %xmm1, %xmm0
; NoVLX-NEXT: vpternlogq $15, %zmm0, %zmm0, %zmm0
; NoVLX-NEXT: vpmovsxbd %xmm0, %zmm0
-; NoVLX-NEXT: vpslld $31, %zmm0, %zmm0
; NoVLX-NEXT: vptestmd %zmm0, %zmm0, %k0
; NoVLX-NEXT: kmovw %k0, %eax
; NoVLX-NEXT: movzwl %ax, %eax
@@ -9925,7 +9920,6 @@ define zeroext i64 @test_vpcmpsgeb_v16i1
; NoVLX-NEXT: vpcmpgtb %xmm0, %xmm1, %xmm0
; NoVLX-NEXT: vpternlogq $15, %zmm0, %zmm0, %zmm0
; NoVLX-NEXT: vpmovsxbd %xmm0, %zmm0
-; NoVLX-NEXT: vpslld $31, %zmm0, %zmm0
; NoVLX-NEXT: vptestmd %zmm0, %zmm0, %k0
; NoVLX-NEXT: kmovw %k0, %eax
; NoVLX-NEXT: movzwl %ax, %eax
@@ -9954,7 +9948,6 @@ define zeroext i64 @test_masked_vpcmpsge
; NoVLX-NEXT: vpcmpgtb %xmm0, %xmm1, %xmm0
; NoVLX-NEXT: vpternlogq $15, %zmm0, %zmm0, %zmm0
; NoVLX-NEXT: vpmovsxbd %xmm0, %zmm0
-; NoVLX-NEXT: vpslld $31, %zmm0, %zmm0
; NoVLX-NEXT: vptestmd %zmm0, %zmm0, %k0
; NoVLX-NEXT: kmovw %k0, %eax
; NoVLX-NEXT: andl %edi, %eax
@@ -9985,7 +9978,6 @@ define zeroext i64 @test_masked_vpcmpsge
; NoVLX-NEXT: vpcmpgtb %xmm0, %xmm1, %xmm0
; NoVLX-NEXT: vpternlogq $15, %zmm0, %zmm0, %zmm0
; NoVLX-NEXT: vpmovsxbd %xmm0, %zmm0
-; NoVLX-NEXT: vpslld $31, %zmm0, %zmm0
; NoVLX-NEXT: vptestmd %zmm0, %zmm0, %k0
; NoVLX-NEXT: kmovw %k0, %eax
; NoVLX-NEXT: andl %edi, %eax
@@ -10017,12 +10009,10 @@ define zeroext i64 @test_vpcmpsgeb_v32i1
; NoVLX-NEXT: vpcmpgtb %ymm0, %ymm1, %ymm0
; NoVLX-NEXT: vpternlogq $15, %zmm0, %zmm0, %zmm0
; NoVLX-NEXT: vpmovsxbd %xmm0, %zmm1
-; NoVLX-NEXT: vpslld $31, %zmm1, %zmm1
; NoVLX-NEXT: vptestmd %zmm1, %zmm1, %k0
; NoVLX-NEXT: kmovw %k0, %ecx
; NoVLX-NEXT: vextracti128 $1, %ymm0, %xmm0
; NoVLX-NEXT: vpmovsxbd %xmm0, %zmm0
-; NoVLX-NEXT: vpslld $31, %zmm0, %zmm0
; NoVLX-NEXT: vptestmd %zmm0, %zmm0, %k0
; NoVLX-NEXT: kmovw %k0, %eax
; NoVLX-NEXT: shll $16, %eax
@@ -10052,12 +10042,10 @@ define zeroext i64 @test_vpcmpsgeb_v32i1
; NoVLX-NEXT: vpcmpgtb %ymm0, %ymm1, %ymm0
; NoVLX-NEXT: vpternlogq $15, %zmm0, %zmm0, %zmm0
; NoVLX-NEXT: vpmovsxbd %xmm0, %zmm1
-; NoVLX-NEXT: vpslld $31, %zmm1, %zmm1
; NoVLX-NEXT: vptestmd %zmm1, %zmm1, %k0
; NoVLX-NEXT: kmovw %k0, %ecx
; NoVLX-NEXT: vextracti128 $1, %ymm0, %xmm0
; NoVLX-NEXT: vpmovsxbd %xmm0, %zmm0
-; NoVLX-NEXT: vpslld $31, %zmm0, %zmm0
; NoVLX-NEXT: vptestmd %zmm0, %zmm0, %k0
; NoVLX-NEXT: kmovw %k0, %eax
; NoVLX-NEXT: shll $16, %eax
@@ -10088,14 +10076,12 @@ define zeroext i64 @test_masked_vpcmpsge
; NoVLX-NEXT: vpcmpgtb %ymm0, %ymm1, %ymm0
; NoVLX-NEXT: vpternlogq $15, %zmm0, %zmm0, %zmm0
; NoVLX-NEXT: vpmovsxbd %xmm0, %zmm1
-; NoVLX-NEXT: vpslld $31, %zmm1, %zmm1
; NoVLX-NEXT: vptestmd %zmm1, %zmm1, %k0
; NoVLX-NEXT: kmovw %k0, %eax
; NoVLX-NEXT: andl %edi, %eax
; NoVLX-NEXT: shrl $16, %edi
; NoVLX-NEXT: vextracti128 $1, %ymm0, %xmm0
; NoVLX-NEXT: vpmovsxbd %xmm0, %zmm0
-; NoVLX-NEXT: vpslld $31, %zmm0, %zmm0
; NoVLX-NEXT: vptestmd %zmm0, %zmm0, %k0
; NoVLX-NEXT: kmovw %k0, %ecx
; NoVLX-NEXT: andl %edi, %ecx
@@ -10130,14 +10116,12 @@ define zeroext i64 @test_masked_vpcmpsge
; NoVLX-NEXT: vpcmpgtb %ymm0, %ymm1, %ymm0
; NoVLX-NEXT: vpternlogq $15, %zmm0, %zmm0, %zmm0
; NoVLX-NEXT: vpmovsxbd %xmm0, %zmm1
-; NoVLX-NEXT: vpslld $31, %zmm1, %zmm1
; NoVLX-NEXT: vptestmd %zmm1, %zmm1, %k0
; NoVLX-NEXT: kmovw %k0, %eax
; NoVLX-NEXT: andl %edi, %eax
; NoVLX-NEXT: shrl $16, %edi
; NoVLX-NEXT: vextracti128 $1, %ymm0, %xmm0
; NoVLX-NEXT: vpmovsxbd %xmm0, %zmm0
-; NoVLX-NEXT: vpslld $31, %zmm0, %zmm0
; NoVLX-NEXT: vptestmd %zmm0, %zmm0, %k0
; NoVLX-NEXT: kmovw %k0, %ecx
; NoVLX-NEXT: andl %edi, %ecx
@@ -10172,7 +10156,6 @@ define zeroext i16 @test_vpcmpsgew_v8i1_
; NoVLX-NEXT: vpcmpgtw %xmm0, %xmm1, %xmm0
; NoVLX-NEXT: vpternlogq $15, %zmm0, %zmm0, %zmm0
; NoVLX-NEXT: vpmovsxwq %xmm0, %zmm0
-; NoVLX-NEXT: vpsllq $63, %zmm0, %zmm0
; NoVLX-NEXT: vptestmq %zmm0, %zmm0, %k0
; NoVLX-NEXT: kmovw %k0, %eax
; NoVLX-NEXT: # kill: def $ax killed $ax killed $eax
@@ -10201,7 +10184,6 @@ define zeroext i16 @test_vpcmpsgew_v8i1_
; NoVLX-NEXT: vpcmpgtw %xmm0, %xmm1, %xmm0
; NoVLX-NEXT: vpternlogq $15, %zmm0, %zmm0, %zmm0
; NoVLX-NEXT: vpmovsxwq %xmm0, %zmm0
-; NoVLX-NEXT: vpsllq $63, %zmm0, %zmm0
; NoVLX-NEXT: vptestmq %zmm0, %zmm0, %k0
; NoVLX-NEXT: kmovw %k0, %eax
; NoVLX-NEXT: # kill: def $ax killed $ax killed $eax
@@ -10231,7 +10213,6 @@ define zeroext i16 @test_masked_vpcmpsge
; NoVLX-NEXT: vpcmpgtw %xmm0, %xmm1, %xmm0
; NoVLX-NEXT: vpternlogq $15, %zmm0, %zmm0, %zmm0
; NoVLX-NEXT: vpmovsxwq %xmm0, %zmm0
-; NoVLX-NEXT: vpsllq $63, %zmm0, %zmm0
; NoVLX-NEXT: kmovw %edi, %k1
; NoVLX-NEXT: vptestmq %zmm0, %zmm0, %k0 {%k1}
; NoVLX-NEXT: kmovw %k0, %eax
@@ -10264,7 +10245,6 @@ define zeroext i16 @test_masked_vpcmpsge
; NoVLX-NEXT: vpcmpgtw %xmm0, %xmm1, %xmm0
; NoVLX-NEXT: vpternlogq $15, %zmm0, %zmm0, %zmm0
; NoVLX-NEXT: vpmovsxwq %xmm0, %zmm0
-; NoVLX-NEXT: vpsllq $63, %zmm0, %zmm0
; NoVLX-NEXT: kmovw %edi, %k1
; NoVLX-NEXT: vptestmq %zmm0, %zmm0, %k0 {%k1}
; NoVLX-NEXT: kmovw %k0, %eax
@@ -10296,7 +10276,6 @@ define zeroext i32 @test_vpcmpsgew_v8i1_
; NoVLX-NEXT: vpcmpgtw %xmm0, %xmm1, %xmm0
; NoVLX-NEXT: vpternlogq $15, %zmm0, %zmm0, %zmm0
; NoVLX-NEXT: vpmovsxwq %xmm0, %zmm0
-; NoVLX-NEXT: vpsllq $63, %zmm0, %zmm0
; NoVLX-NEXT: vptestmq %zmm0, %zmm0, %k0
; NoVLX-NEXT: kmovw %k0, %eax
; NoVLX-NEXT: vzeroupper
@@ -10323,7 +10302,6 @@ define zeroext i32 @test_vpcmpsgew_v8i1_
; NoVLX-NEXT: vpcmpgtw %xmm0, %xmm1, %xmm0
; NoVLX-NEXT: vpternlogq $15, %zmm0, %zmm0, %zmm0
; NoVLX-NEXT: vpmovsxwq %xmm0, %zmm0
-; NoVLX-NEXT: vpsllq $63, %zmm0, %zmm0
; NoVLX-NEXT: vptestmq %zmm0, %zmm0, %k0
; NoVLX-NEXT: kmovw %k0, %eax
; NoVLX-NEXT: vzeroupper
@@ -10351,7 +10329,6 @@ define zeroext i32 @test_masked_vpcmpsge
; NoVLX-NEXT: vpcmpgtw %xmm0, %xmm1, %xmm0
; NoVLX-NEXT: vpternlogq $15, %zmm0, %zmm0, %zmm0
; NoVLX-NEXT: vpmovsxwq %xmm0, %zmm0
-; NoVLX-NEXT: vpsllq $63, %zmm0, %zmm0
; NoVLX-NEXT: kmovw %edi, %k1
; NoVLX-NEXT: vptestmq %zmm0, %zmm0, %k0 {%k1}
; NoVLX-NEXT: kmovw %k0, %eax
@@ -10382,7 +10359,6 @@ define zeroext i32 @test_masked_vpcmpsge
; NoVLX-NEXT: vpcmpgtw %xmm0, %xmm1, %xmm0
; NoVLX-NEXT: vpternlogq $15, %zmm0, %zmm0, %zmm0
; NoVLX-NEXT: vpmovsxwq %xmm0, %zmm0
-; NoVLX-NEXT: vpsllq $63, %zmm0, %zmm0
; NoVLX-NEXT: kmovw %edi, %k1
; NoVLX-NEXT: vptestmq %zmm0, %zmm0, %k0 {%k1}
; NoVLX-NEXT: kmovw %k0, %eax
@@ -10413,7 +10389,6 @@ define zeroext i64 @test_vpcmpsgew_v8i1_
; NoVLX-NEXT: vpcmpgtw %xmm0, %xmm1, %xmm0
; NoVLX-NEXT: vpternlogq $15, %zmm0, %zmm0, %zmm0
; NoVLX-NEXT: vpmovsxwq %xmm0, %zmm0
-; NoVLX-NEXT: vpsllq $63, %zmm0, %zmm0
; NoVLX-NEXT: vptestmq %zmm0, %zmm0, %k0
; NoVLX-NEXT: kmovw %k0, %eax
; NoVLX-NEXT: movzwl %ax, %eax
@@ -10441,7 +10416,6 @@ define zeroext i64 @test_vpcmpsgew_v8i1_
; NoVLX-NEXT: vpcmpgtw %xmm0, %xmm1, %xmm0
; NoVLX-NEXT: vpternlogq $15, %zmm0, %zmm0, %zmm0
; NoVLX-NEXT: vpmovsxwq %xmm0, %zmm0
-; NoVLX-NEXT: vpsllq $63, %zmm0, %zmm0
; NoVLX-NEXT: vptestmq %zmm0, %zmm0, %k0
; NoVLX-NEXT: kmovw %k0, %eax
; NoVLX-NEXT: movzwl %ax, %eax
@@ -10470,7 +10444,6 @@ define zeroext i64 @test_masked_vpcmpsge
; NoVLX-NEXT: vpcmpgtw %xmm0, %xmm1, %xmm0
; NoVLX-NEXT: vpternlogq $15, %zmm0, %zmm0, %zmm0
; NoVLX-NEXT: vpmovsxwq %xmm0, %zmm0
-; NoVLX-NEXT: vpsllq $63, %zmm0, %zmm0
; NoVLX-NEXT: kmovw %edi, %k1
; NoVLX-NEXT: vptestmq %zmm0, %zmm0, %k0 {%k1}
; NoVLX-NEXT: kmovw %k0, %eax
@@ -10502,7 +10475,6 @@ define zeroext i64 @test_masked_vpcmpsge
; NoVLX-NEXT: vpcmpgtw %xmm0, %xmm1, %xmm0
; NoVLX-NEXT: vpternlogq $15, %zmm0, %zmm0, %zmm0
; NoVLX-NEXT: vpmovsxwq %xmm0, %zmm0
-; NoVLX-NEXT: vpsllq $63, %zmm0, %zmm0
; NoVLX-NEXT: kmovw %edi, %k1
; NoVLX-NEXT: vptestmq %zmm0, %zmm0, %k0 {%k1}
; NoVLX-NEXT: kmovw %k0, %eax
@@ -10535,7 +10507,6 @@ define zeroext i32 @test_vpcmpsgew_v16i1
; NoVLX-NEXT: vpcmpgtw %ymm0, %ymm1, %ymm0
; NoVLX-NEXT: vpternlogq $15, %zmm0, %zmm0, %zmm0
; NoVLX-NEXT: vpmovsxwd %ymm0, %zmm0
-; NoVLX-NEXT: vpslld $31, %zmm0, %zmm0
; NoVLX-NEXT: vptestmd %zmm0, %zmm0, %k0
; NoVLX-NEXT: kmovw %k0, %eax
; NoVLX-NEXT: vzeroupper
@@ -10563,7 +10534,6 @@ define zeroext i32 @test_vpcmpsgew_v16i1
; NoVLX-NEXT: vpcmpgtw %ymm0, %ymm1, %ymm0
; NoVLX-NEXT: vpternlogq $15, %zmm0, %zmm0, %zmm0
; NoVLX-NEXT: vpmovsxwd %ymm0, %zmm0
-; NoVLX-NEXT: vpslld $31, %zmm0, %zmm0
; NoVLX-NEXT: vptestmd %zmm0, %zmm0, %k0
; NoVLX-NEXT: kmovw %k0, %eax
; NoVLX-NEXT: vzeroupper
@@ -10592,7 +10562,6 @@ define zeroext i32 @test_masked_vpcmpsge
; NoVLX-NEXT: vpcmpgtw %ymm0, %ymm1, %ymm0
; NoVLX-NEXT: vpternlogq $15, %zmm0, %zmm0, %zmm0
; NoVLX-NEXT: vpmovsxwd %ymm0, %zmm0
-; NoVLX-NEXT: vpslld $31, %zmm0, %zmm0
; NoVLX-NEXT: vptestmd %zmm0, %zmm0, %k0
; NoVLX-NEXT: kmovw %k0, %eax
; NoVLX-NEXT: andl %edi, %eax
@@ -10624,7 +10593,6 @@ define zeroext i32 @test_masked_vpcmpsge
; NoVLX-NEXT: vpcmpgtw %ymm0, %ymm1, %ymm0
; NoVLX-NEXT: vpternlogq $15, %zmm0, %zmm0, %zmm0
; NoVLX-NEXT: vpmovsxwd %ymm0, %zmm0
-; NoVLX-NEXT: vpslld $31, %zmm0, %zmm0
; NoVLX-NEXT: vptestmd %zmm0, %zmm0, %k0
; NoVLX-NEXT: kmovw %k0, %eax
; NoVLX-NEXT: andl %edi, %eax
@@ -10656,7 +10624,6 @@ define zeroext i64 @test_vpcmpsgew_v16i1
; NoVLX-NEXT: vpcmpgtw %ymm0, %ymm1, %ymm0
; NoVLX-NEXT: vpternlogq $15, %zmm0, %zmm0, %zmm0
; NoVLX-NEXT: vpmovsxwd %ymm0, %zmm0
-; NoVLX-NEXT: vpslld $31, %zmm0, %zmm0
; NoVLX-NEXT: vptestmd %zmm0, %zmm0, %k0
; NoVLX-NEXT: kmovw %k0, %eax
; NoVLX-NEXT: movzwl %ax, %eax
@@ -10685,7 +10652,6 @@ define zeroext i64 @test_vpcmpsgew_v16i1
; NoVLX-NEXT: vpcmpgtw %ymm0, %ymm1, %ymm0
; NoVLX-NEXT: vpternlogq $15, %zmm0, %zmm0, %zmm0
; NoVLX-NEXT: vpmovsxwd %ymm0, %zmm0
-; NoVLX-NEXT: vpslld $31, %zmm0, %zmm0
; NoVLX-NEXT: vptestmd %zmm0, %zmm0, %k0
; NoVLX-NEXT: kmovw %k0, %eax
; NoVLX-NEXT: movzwl %ax, %eax
@@ -10715,7 +10681,6 @@ define zeroext i64 @test_masked_vpcmpsge
; NoVLX-NEXT: vpcmpgtw %ymm0, %ymm1, %ymm0
; NoVLX-NEXT: vpternlogq $15, %zmm0, %zmm0, %zmm0
; NoVLX-NEXT: vpmovsxwd %ymm0, %zmm0
-; NoVLX-NEXT: vpslld $31, %zmm0, %zmm0
; NoVLX-NEXT: vptestmd %zmm0, %zmm0, %k0
; NoVLX-NEXT: kmovw %k0, %eax
; NoVLX-NEXT: andl %edi, %eax
@@ -10747,7 +10712,6 @@ define zeroext i64 @test_masked_vpcmpsge
; NoVLX-NEXT: vpcmpgtw %ymm0, %ymm1, %ymm0
; NoVLX-NEXT: vpternlogq $15, %zmm0, %zmm0, %zmm0
; NoVLX-NEXT: vpmovsxwd %ymm0, %zmm0
-; NoVLX-NEXT: vpslld $31, %zmm0, %zmm0
; NoVLX-NEXT: vptestmd %zmm0, %zmm0, %k0
; NoVLX-NEXT: kmovw %k0, %eax
; NoVLX-NEXT: andl %edi, %eax
@@ -10782,12 +10746,10 @@ define zeroext i64 @test_vpcmpsgew_v32i1
; NoVLX-NEXT: vpcmpgtw %ymm0, %ymm1, %ymm0
; NoVLX-NEXT: vpternlogq $15, %zmm0, %zmm0, %zmm0
; NoVLX-NEXT: vpmovsxwd %ymm0, %zmm0
-; NoVLX-NEXT: vpslld $31, %zmm0, %zmm0
; NoVLX-NEXT: vptestmd %zmm0, %zmm0, %k0
; NoVLX-NEXT: kmovw %k0, %ecx
; NoVLX-NEXT: vpternlogq $15, %zmm2, %zmm2, %zmm2
; NoVLX-NEXT: vpmovsxwd %ymm2, %zmm0
-; NoVLX-NEXT: vpslld $31, %zmm0, %zmm0
; NoVLX-NEXT: vptestmd %zmm0, %zmm0, %k0
; NoVLX-NEXT: kmovw %k0, %eax
; NoVLX-NEXT: shll $16, %eax
@@ -10820,12 +10782,10 @@ define zeroext i64 @test_vpcmpsgew_v32i1
; NoVLX-NEXT: vpcmpgtw %ymm0, %ymm2, %ymm0
; NoVLX-NEXT: vpternlogq $15, %zmm0, %zmm0, %zmm0
; NoVLX-NEXT: vpmovsxwd %ymm0, %zmm0
-; NoVLX-NEXT: vpslld $31, %zmm0, %zmm0
; NoVLX-NEXT: vptestmd %zmm0, %zmm0, %k0
; NoVLX-NEXT: kmovw %k0, %ecx
; NoVLX-NEXT: vpternlogq $15, %zmm1, %zmm1, %zmm1
; NoVLX-NEXT: vpmovsxwd %ymm1, %zmm0
-; NoVLX-NEXT: vpslld $31, %zmm0, %zmm0
; NoVLX-NEXT: vptestmd %zmm0, %zmm0, %k0
; NoVLX-NEXT: kmovw %k0, %eax
; NoVLX-NEXT: shll $16, %eax
@@ -10856,7 +10816,6 @@ define zeroext i64 @test_masked_vpcmpsge
; NoVLX-NEXT: vpcmpgtw %ymm0, %ymm1, %ymm2
; NoVLX-NEXT: vpternlogq $15, %zmm2, %zmm2, %zmm2
; NoVLX-NEXT: vpmovsxwd %ymm2, %zmm2
-; NoVLX-NEXT: vpslld $31, %zmm2, %zmm2
; NoVLX-NEXT: vptestmd %zmm2, %zmm2, %k0
; NoVLX-NEXT: kmovw %k0, %eax
; NoVLX-NEXT: andl %edi, %eax
@@ -10866,7 +10825,6 @@ define zeroext i64 @test_masked_vpcmpsge
; NoVLX-NEXT: vpcmpgtw %ymm0, %ymm1, %ymm0
; NoVLX-NEXT: vpternlogq $15, %zmm0, %zmm0, %zmm0
; NoVLX-NEXT: vpmovsxwd %ymm0, %zmm0
-; NoVLX-NEXT: vpslld $31, %zmm0, %zmm0
; NoVLX-NEXT: vptestmd %zmm0, %zmm0, %k0
; NoVLX-NEXT: kmovw %k0, %ecx
; NoVLX-NEXT: andl %edi, %ecx
@@ -10901,7 +10859,6 @@ define zeroext i64 @test_masked_vpcmpsge
; NoVLX-NEXT: vpcmpgtw %ymm0, %ymm1, %ymm1
; NoVLX-NEXT: vpternlogq $15, %zmm1, %zmm1, %zmm1
; NoVLX-NEXT: vpmovsxwd %ymm1, %zmm1
-; NoVLX-NEXT: vpslld $31, %zmm1, %zmm1
; NoVLX-NEXT: vptestmd %zmm1, %zmm1, %k0
; NoVLX-NEXT: kmovw %k0, %eax
; NoVLX-NEXT: andl %edi, %eax
@@ -10911,7 +10868,6 @@ define zeroext i64 @test_masked_vpcmpsge
; NoVLX-NEXT: vpcmpgtw %ymm0, %ymm1, %ymm0
; NoVLX-NEXT: vpternlogq $15, %zmm0, %zmm0, %zmm0
; NoVLX-NEXT: vpmovsxwd %ymm0, %zmm0
-; NoVLX-NEXT: vpslld $31, %zmm0, %zmm0
; NoVLX-NEXT: vptestmd %zmm0, %zmm0, %k0
; NoVLX-NEXT: kmovw %k0, %ecx
; NoVLX-NEXT: andl %edi, %ecx
@@ -14768,7 +14724,6 @@ define zeroext i32 @test_vpcmpultb_v16i1
; NoVLX-NEXT: vpcmpeqb %xmm1, %xmm0, %xmm0
; NoVLX-NEXT: vpternlogq $15, %zmm0, %zmm0, %zmm0
; NoVLX-NEXT: vpmovsxbd %xmm0, %zmm0
-; NoVLX-NEXT: vpslld $31, %zmm0, %zmm0
; NoVLX-NEXT: vptestmd %zmm0, %zmm0, %k0
; NoVLX-NEXT: kmovw %k0, %eax
; NoVLX-NEXT: vzeroupper
@@ -14795,7 +14750,6 @@ define zeroext i32 @test_vpcmpultb_v16i1
; NoVLX-NEXT: vpcmpeqb %xmm1, %xmm0, %xmm0
; NoVLX-NEXT: vpternlogq $15, %zmm0, %zmm0, %zmm0
; NoVLX-NEXT: vpmovsxbd %xmm0, %zmm0
-; NoVLX-NEXT: vpslld $31, %zmm0, %zmm0
; NoVLX-NEXT: vptestmd %zmm0, %zmm0, %k0
; NoVLX-NEXT: kmovw %k0, %eax
; NoVLX-NEXT: vzeroupper
@@ -14824,7 +14778,6 @@ define zeroext i32 @test_masked_vpcmpult
; NoVLX-NEXT: vpcmpeqb %xmm1, %xmm0, %xmm0
; NoVLX-NEXT: vpternlogq $15, %zmm0, %zmm0, %zmm0
; NoVLX-NEXT: vpmovsxbd %xmm0, %zmm0
-; NoVLX-NEXT: vpslld $31, %zmm0, %zmm0
; NoVLX-NEXT: vptestmd %zmm0, %zmm0, %k0
; NoVLX-NEXT: kmovw %k0, %eax
; NoVLX-NEXT: andl %edi, %eax
@@ -14855,7 +14808,6 @@ define zeroext i32 @test_masked_vpcmpult
; NoVLX-NEXT: vpcmpeqb %xmm1, %xmm0, %xmm0
; NoVLX-NEXT: vpternlogq $15, %zmm0, %zmm0, %zmm0
; NoVLX-NEXT: vpmovsxbd %xmm0, %zmm0
-; NoVLX-NEXT: vpslld $31, %zmm0, %zmm0
; NoVLX-NEXT: vptestmd %zmm0, %zmm0, %k0
; NoVLX-NEXT: kmovw %k0, %eax
; NoVLX-NEXT: andl %edi, %eax
@@ -14887,7 +14839,6 @@ define zeroext i64 @test_vpcmpultb_v16i1
; NoVLX-NEXT: vpcmpeqb %xmm1, %xmm0, %xmm0
; NoVLX-NEXT: vpternlogq $15, %zmm0, %zmm0, %zmm0
; NoVLX-NEXT: vpmovsxbd %xmm0, %zmm0
-; NoVLX-NEXT: vpslld $31, %zmm0, %zmm0
; NoVLX-NEXT: vptestmd %zmm0, %zmm0, %k0
; NoVLX-NEXT: kmovw %k0, %eax
; NoVLX-NEXT: movzwl %ax, %eax
@@ -14915,7 +14866,6 @@ define zeroext i64 @test_vpcmpultb_v16i1
; NoVLX-NEXT: vpcmpeqb %xmm1, %xmm0, %xmm0
; NoVLX-NEXT: vpternlogq $15, %zmm0, %zmm0, %zmm0
; NoVLX-NEXT: vpmovsxbd %xmm0, %zmm0
-; NoVLX-NEXT: vpslld $31, %zmm0, %zmm0
; NoVLX-NEXT: vptestmd %zmm0, %zmm0, %k0
; NoVLX-NEXT: kmovw %k0, %eax
; NoVLX-NEXT: movzwl %ax, %eax
@@ -14945,7 +14895,6 @@ define zeroext i64 @test_masked_vpcmpult
; NoVLX-NEXT: vpcmpeqb %xmm1, %xmm0, %xmm0
; NoVLX-NEXT: vpternlogq $15, %zmm0, %zmm0, %zmm0
; NoVLX-NEXT: vpmovsxbd %xmm0, %zmm0
-; NoVLX-NEXT: vpslld $31, %zmm0, %zmm0
; NoVLX-NEXT: vptestmd %zmm0, %zmm0, %k0
; NoVLX-NEXT: kmovw %k0, %eax
; NoVLX-NEXT: andl %edi, %eax
@@ -14976,7 +14925,6 @@ define zeroext i64 @test_masked_vpcmpult
; NoVLX-NEXT: vpcmpeqb %xmm1, %xmm0, %xmm0
; NoVLX-NEXT: vpternlogq $15, %zmm0, %zmm0, %zmm0
; NoVLX-NEXT: vpmovsxbd %xmm0, %zmm0
-; NoVLX-NEXT: vpslld $31, %zmm0, %zmm0
; NoVLX-NEXT: vptestmd %zmm0, %zmm0, %k0
; NoVLX-NEXT: kmovw %k0, %eax
; NoVLX-NEXT: andl %edi, %eax
@@ -15009,12 +14957,10 @@ define zeroext i64 @test_vpcmpultb_v32i1
; NoVLX-NEXT: vpcmpeqb %ymm1, %ymm0, %ymm0
; NoVLX-NEXT: vpternlogq $15, %zmm0, %zmm0, %zmm0
; NoVLX-NEXT: vpmovsxbd %xmm0, %zmm1
-; NoVLX-NEXT: vpslld $31, %zmm1, %zmm1
; NoVLX-NEXT: vptestmd %zmm1, %zmm1, %k0
; NoVLX-NEXT: kmovw %k0, %ecx
; NoVLX-NEXT: vextracti128 $1, %ymm0, %xmm0
; NoVLX-NEXT: vpmovsxbd %xmm0, %zmm0
-; NoVLX-NEXT: vpslld $31, %zmm0, %zmm0
; NoVLX-NEXT: vptestmd %zmm0, %zmm0, %k0
; NoVLX-NEXT: kmovw %k0, %eax
; NoVLX-NEXT: shll $16, %eax
@@ -15044,12 +14990,10 @@ define zeroext i64 @test_vpcmpultb_v32i1
; NoVLX-NEXT: vpcmpeqb %ymm1, %ymm0, %ymm0
; NoVLX-NEXT: vpternlogq $15, %zmm0, %zmm0, %zmm0
; NoVLX-NEXT: vpmovsxbd %xmm0, %zmm1
-; NoVLX-NEXT: vpslld $31, %zmm1, %zmm1
; NoVLX-NEXT: vptestmd %zmm1, %zmm1, %k0
; NoVLX-NEXT: kmovw %k0, %ecx
; NoVLX-NEXT: vextracti128 $1, %ymm0, %xmm0
; NoVLX-NEXT: vpmovsxbd %xmm0, %zmm0
-; NoVLX-NEXT: vpslld $31, %zmm0, %zmm0
; NoVLX-NEXT: vptestmd %zmm0, %zmm0, %k0
; NoVLX-NEXT: kmovw %k0, %eax
; NoVLX-NEXT: shll $16, %eax
@@ -15081,14 +15025,12 @@ define zeroext i64 @test_masked_vpcmpult
; NoVLX-NEXT: vpcmpeqb %ymm1, %ymm0, %ymm0
; NoVLX-NEXT: vpternlogq $15, %zmm0, %zmm0, %zmm0
; NoVLX-NEXT: vpmovsxbd %xmm0, %zmm1
-; NoVLX-NEXT: vpslld $31, %zmm1, %zmm1
; NoVLX-NEXT: vptestmd %zmm1, %zmm1, %k0
; NoVLX-NEXT: kmovw %k0, %eax
; NoVLX-NEXT: andl %edi, %eax
; NoVLX-NEXT: shrl $16, %edi
; NoVLX-NEXT: vextracti128 $1, %ymm0, %xmm0
; NoVLX-NEXT: vpmovsxbd %xmm0, %zmm0
-; NoVLX-NEXT: vpslld $31, %zmm0, %zmm0
; NoVLX-NEXT: vptestmd %zmm0, %zmm0, %k0
; NoVLX-NEXT: kmovw %k0, %ecx
; NoVLX-NEXT: andl %edi, %ecx
@@ -15123,14 +15065,12 @@ define zeroext i64 @test_masked_vpcmpult
; NoVLX-NEXT: vpcmpeqb %ymm1, %ymm0, %ymm0
; NoVLX-NEXT: vpternlogq $15, %zmm0, %zmm0, %zmm0
; NoVLX-NEXT: vpmovsxbd %xmm0, %zmm1
-; NoVLX-NEXT: vpslld $31, %zmm1, %zmm1
; NoVLX-NEXT: vptestmd %zmm1, %zmm1, %k0
; NoVLX-NEXT: kmovw %k0, %eax
; NoVLX-NEXT: andl %edi, %eax
; NoVLX-NEXT: shrl $16, %edi
; NoVLX-NEXT: vextracti128 $1, %ymm0, %xmm0
; NoVLX-NEXT: vpmovsxbd %xmm0, %zmm0
-; NoVLX-NEXT: vpslld $31, %zmm0, %zmm0
; NoVLX-NEXT: vptestmd %zmm0, %zmm0, %k0
; NoVLX-NEXT: kmovw %k0, %ecx
; NoVLX-NEXT: andl %edi, %ecx
@@ -15166,7 +15106,6 @@ define zeroext i16 @test_vpcmpultw_v8i1_
; NoVLX-NEXT: vpcmpeqw %xmm1, %xmm0, %xmm0
; NoVLX-NEXT: vpternlogq $15, %zmm0, %zmm0, %zmm0
; NoVLX-NEXT: vpmovsxwq %xmm0, %zmm0
-; NoVLX-NEXT: vpsllq $63, %zmm0, %zmm0
; NoVLX-NEXT: vptestmq %zmm0, %zmm0, %k0
; NoVLX-NEXT: kmovw %k0, %eax
; NoVLX-NEXT: # kill: def $ax killed $ax killed $eax
@@ -15195,7 +15134,6 @@ define zeroext i16 @test_vpcmpultw_v8i1_
; NoVLX-NEXT: vpcmpeqw %xmm1, %xmm0, %xmm0
; NoVLX-NEXT: vpternlogq $15, %zmm0, %zmm0, %zmm0
; NoVLX-NEXT: vpmovsxwq %xmm0, %zmm0
-; NoVLX-NEXT: vpsllq $63, %zmm0, %zmm0
; NoVLX-NEXT: vptestmq %zmm0, %zmm0, %k0
; NoVLX-NEXT: kmovw %k0, %eax
; NoVLX-NEXT: # kill: def $ax killed $ax killed $eax
@@ -15226,7 +15164,6 @@ define zeroext i16 @test_masked_vpcmpult
; NoVLX-NEXT: vpcmpeqw %xmm1, %xmm0, %xmm0
; NoVLX-NEXT: vpternlogq $15, %zmm0, %zmm0, %zmm0
; NoVLX-NEXT: vpmovsxwq %xmm0, %zmm0
-; NoVLX-NEXT: vpsllq $63, %zmm0, %zmm0
; NoVLX-NEXT: kmovw %edi, %k1
; NoVLX-NEXT: vptestmq %zmm0, %zmm0, %k0 {%k1}
; NoVLX-NEXT: kmovw %k0, %eax
@@ -15259,7 +15196,6 @@ define zeroext i16 @test_masked_vpcmpult
; NoVLX-NEXT: vpcmpeqw %xmm1, %xmm0, %xmm0
; NoVLX-NEXT: vpternlogq $15, %zmm0, %zmm0, %zmm0
; NoVLX-NEXT: vpmovsxwq %xmm0, %zmm0
-; NoVLX-NEXT: vpsllq $63, %zmm0, %zmm0
; NoVLX-NEXT: kmovw %edi, %k1
; NoVLX-NEXT: vptestmq %zmm0, %zmm0, %k0 {%k1}
; NoVLX-NEXT: kmovw %k0, %eax
@@ -15292,7 +15228,6 @@ define zeroext i32 @test_vpcmpultw_v8i1_
; NoVLX-NEXT: vpcmpeqw %xmm1, %xmm0, %xmm0
; NoVLX-NEXT: vpternlogq $15, %zmm0, %zmm0, %zmm0
; NoVLX-NEXT: vpmovsxwq %xmm0, %zmm0
-; NoVLX-NEXT: vpsllq $63, %zmm0, %zmm0
; NoVLX-NEXT: vptestmq %zmm0, %zmm0, %k0
; NoVLX-NEXT: kmovw %k0, %eax
; NoVLX-NEXT: vzeroupper
@@ -15319,7 +15254,6 @@ define zeroext i32 @test_vpcmpultw_v8i1_
; NoVLX-NEXT: vpcmpeqw %xmm1, %xmm0, %xmm0
; NoVLX-NEXT: vpternlogq $15, %zmm0, %zmm0, %zmm0
; NoVLX-NEXT: vpmovsxwq %xmm0, %zmm0
-; NoVLX-NEXT: vpsllq $63, %zmm0, %zmm0
; NoVLX-NEXT: vptestmq %zmm0, %zmm0, %k0
; NoVLX-NEXT: kmovw %k0, %eax
; NoVLX-NEXT: vzeroupper
@@ -15348,7 +15282,6 @@ define zeroext i32 @test_masked_vpcmpult
; NoVLX-NEXT: vpcmpeqw %xmm1, %xmm0, %xmm0
; NoVLX-NEXT: vpternlogq $15, %zmm0, %zmm0, %zmm0
; NoVLX-NEXT: vpmovsxwq %xmm0, %zmm0
-; NoVLX-NEXT: vpsllq $63, %zmm0, %zmm0
; NoVLX-NEXT: kmovw %edi, %k1
; NoVLX-NEXT: vptestmq %zmm0, %zmm0, %k0 {%k1}
; NoVLX-NEXT: kmovw %k0, %eax
@@ -15379,7 +15312,6 @@ define zeroext i32 @test_masked_vpcmpult
; NoVLX-NEXT: vpcmpeqw %xmm1, %xmm0, %xmm0
; NoVLX-NEXT: vpternlogq $15, %zmm0, %zmm0, %zmm0
; NoVLX-NEXT: vpmovsxwq %xmm0, %zmm0
-; NoVLX-NEXT: vpsllq $63, %zmm0, %zmm0
; NoVLX-NEXT: kmovw %edi, %k1
; NoVLX-NEXT: vptestmq %zmm0, %zmm0, %k0 {%k1}
; NoVLX-NEXT: kmovw %k0, %eax
@@ -15411,7 +15343,6 @@ define zeroext i64 @test_vpcmpultw_v8i1_
; NoVLX-NEXT: vpcmpeqw %xmm1, %xmm0, %xmm0
; NoVLX-NEXT: vpternlogq $15, %zmm0, %zmm0, %zmm0
; NoVLX-NEXT: vpmovsxwq %xmm0, %zmm0
-; NoVLX-NEXT: vpsllq $63, %zmm0, %zmm0
; NoVLX-NEXT: vptestmq %zmm0, %zmm0, %k0
; NoVLX-NEXT: kmovw %k0, %eax
; NoVLX-NEXT: movzwl %ax, %eax
@@ -15439,7 +15370,6 @@ define zeroext i64 @test_vpcmpultw_v8i1_
; NoVLX-NEXT: vpcmpeqw %xmm1, %xmm0, %xmm0
; NoVLX-NEXT: vpternlogq $15, %zmm0, %zmm0, %zmm0
; NoVLX-NEXT: vpmovsxwq %xmm0, %zmm0
-; NoVLX-NEXT: vpsllq $63, %zmm0, %zmm0
; NoVLX-NEXT: vptestmq %zmm0, %zmm0, %k0
; NoVLX-NEXT: kmovw %k0, %eax
; NoVLX-NEXT: movzwl %ax, %eax
@@ -15469,7 +15399,6 @@ define zeroext i64 @test_masked_vpcmpult
; NoVLX-NEXT: vpcmpeqw %xmm1, %xmm0, %xmm0
; NoVLX-NEXT: vpternlogq $15, %zmm0, %zmm0, %zmm0
; NoVLX-NEXT: vpmovsxwq %xmm0, %zmm0
-; NoVLX-NEXT: vpsllq $63, %zmm0, %zmm0
; NoVLX-NEXT: kmovw %edi, %k1
; NoVLX-NEXT: vptestmq %zmm0, %zmm0, %k0 {%k1}
; NoVLX-NEXT: kmovw %k0, %eax
@@ -15501,7 +15430,6 @@ define zeroext i64 @test_masked_vpcmpult
; NoVLX-NEXT: vpcmpeqw %xmm1, %xmm0, %xmm0
; NoVLX-NEXT: vpternlogq $15, %zmm0, %zmm0, %zmm0
; NoVLX-NEXT: vpmovsxwq %xmm0, %zmm0
-; NoVLX-NEXT: vpsllq $63, %zmm0, %zmm0
; NoVLX-NEXT: kmovw %edi, %k1
; NoVLX-NEXT: vptestmq %zmm0, %zmm0, %k0 {%k1}
; NoVLX-NEXT: kmovw %k0, %eax
@@ -15535,7 +15463,6 @@ define zeroext i32 @test_vpcmpultw_v16i1
; NoVLX-NEXT: vpcmpeqw %ymm1, %ymm0, %ymm0
; NoVLX-NEXT: vpternlogq $15, %zmm0, %zmm0, %zmm0
; NoVLX-NEXT: vpmovsxwd %ymm0, %zmm0
-; NoVLX-NEXT: vpslld $31, %zmm0, %zmm0
; NoVLX-NEXT: vptestmd %zmm0, %zmm0, %k0
; NoVLX-NEXT: kmovw %k0, %eax
; NoVLX-NEXT: vzeroupper
@@ -15563,7 +15490,6 @@ define zeroext i32 @test_vpcmpultw_v16i1
; NoVLX-NEXT: vpcmpeqw %ymm1, %ymm0, %ymm0
; NoVLX-NEXT: vpternlogq $15, %zmm0, %zmm0, %zmm0
; NoVLX-NEXT: vpmovsxwd %ymm0, %zmm0
-; NoVLX-NEXT: vpslld $31, %zmm0, %zmm0
; NoVLX-NEXT: vptestmd %zmm0, %zmm0, %k0
; NoVLX-NEXT: kmovw %k0, %eax
; NoVLX-NEXT: vzeroupper
@@ -15593,7 +15519,6 @@ define zeroext i32 @test_masked_vpcmpult
; NoVLX-NEXT: vpcmpeqw %ymm1, %ymm0, %ymm0
; NoVLX-NEXT: vpternlogq $15, %zmm0, %zmm0, %zmm0
; NoVLX-NEXT: vpmovsxwd %ymm0, %zmm0
-; NoVLX-NEXT: vpslld $31, %zmm0, %zmm0
; NoVLX-NEXT: vptestmd %zmm0, %zmm0, %k0
; NoVLX-NEXT: kmovw %k0, %eax
; NoVLX-NEXT: andl %edi, %eax
@@ -15625,7 +15550,6 @@ define zeroext i32 @test_masked_vpcmpult
; NoVLX-NEXT: vpcmpeqw %ymm1, %ymm0, %ymm0
; NoVLX-NEXT: vpternlogq $15, %zmm0, %zmm0, %zmm0
; NoVLX-NEXT: vpmovsxwd %ymm0, %zmm0
-; NoVLX-NEXT: vpslld $31, %zmm0, %zmm0
; NoVLX-NEXT: vptestmd %zmm0, %zmm0, %k0
; NoVLX-NEXT: kmovw %k0, %eax
; NoVLX-NEXT: andl %edi, %eax
@@ -15658,7 +15582,6 @@ define zeroext i64 @test_vpcmpultw_v16i1
; NoVLX-NEXT: vpcmpeqw %ymm1, %ymm0, %ymm0
; NoVLX-NEXT: vpternlogq $15, %zmm0, %zmm0, %zmm0
; NoVLX-NEXT: vpmovsxwd %ymm0, %zmm0
-; NoVLX-NEXT: vpslld $31, %zmm0, %zmm0
; NoVLX-NEXT: vptestmd %zmm0, %zmm0, %k0
; NoVLX-NEXT: kmovw %k0, %eax
; NoVLX-NEXT: movzwl %ax, %eax
@@ -15687,7 +15610,6 @@ define zeroext i64 @test_vpcmpultw_v16i1
; NoVLX-NEXT: vpcmpeqw %ymm1, %ymm0, %ymm0
; NoVLX-NEXT: vpternlogq $15, %zmm0, %zmm0, %zmm0
; NoVLX-NEXT: vpmovsxwd %ymm0, %zmm0
-; NoVLX-NEXT: vpslld $31, %zmm0, %zmm0
; NoVLX-NEXT: vptestmd %zmm0, %zmm0, %k0
; NoVLX-NEXT: kmovw %k0, %eax
; NoVLX-NEXT: movzwl %ax, %eax
@@ -15718,7 +15640,6 @@ define zeroext i64 @test_masked_vpcmpult
; NoVLX-NEXT: vpcmpeqw %ymm1, %ymm0, %ymm0
; NoVLX-NEXT: vpternlogq $15, %zmm0, %zmm0, %zmm0
; NoVLX-NEXT: vpmovsxwd %ymm0, %zmm0
-; NoVLX-NEXT: vpslld $31, %zmm0, %zmm0
; NoVLX-NEXT: vptestmd %zmm0, %zmm0, %k0
; NoVLX-NEXT: kmovw %k0, %eax
; NoVLX-NEXT: andl %edi, %eax
@@ -15750,7 +15671,6 @@ define zeroext i64 @test_masked_vpcmpult
; NoVLX-NEXT: vpcmpeqw %ymm1, %ymm0, %ymm0
; NoVLX-NEXT: vpternlogq $15, %zmm0, %zmm0, %zmm0
; NoVLX-NEXT: vpmovsxwd %ymm0, %zmm0
-; NoVLX-NEXT: vpslld $31, %zmm0, %zmm0
; NoVLX-NEXT: vptestmd %zmm0, %zmm0, %k0
; NoVLX-NEXT: kmovw %k0, %eax
; NoVLX-NEXT: andl %edi, %eax
@@ -15785,14 +15705,12 @@ define zeroext i64 @test_vpcmpultw_v32i1
; NoVLX-NEXT: vpcmpeqw %ymm1, %ymm0, %ymm0
; NoVLX-NEXT: vpternlogq $15, %zmm0, %zmm0, %zmm0
; NoVLX-NEXT: vpmovsxwd %ymm0, %zmm0
-; NoVLX-NEXT: vpslld $31, %zmm0, %zmm0
; NoVLX-NEXT: vptestmd %zmm0, %zmm0, %k0
; NoVLX-NEXT: kmovw %k0, %ecx
; NoVLX-NEXT: vpmaxuw %ymm3, %ymm2, %ymm0
; NoVLX-NEXT: vpcmpeqw %ymm0, %ymm2, %ymm0
; NoVLX-NEXT: vpternlogq $15, %zmm0, %zmm0, %zmm0
; NoVLX-NEXT: vpmovsxwd %ymm0, %zmm0
-; NoVLX-NEXT: vpslld $31, %zmm0, %zmm0
; NoVLX-NEXT: vptestmd %zmm0, %zmm0, %k0
; NoVLX-NEXT: kmovw %k0, %eax
; NoVLX-NEXT: shll $16, %eax
@@ -15823,14 +15741,12 @@ define zeroext i64 @test_vpcmpultw_v32i1
; NoVLX-NEXT: vpcmpeqw %ymm2, %ymm0, %ymm0
; NoVLX-NEXT: vpternlogq $15, %zmm0, %zmm0, %zmm0
; NoVLX-NEXT: vpmovsxwd %ymm0, %zmm0
-; NoVLX-NEXT: vpslld $31, %zmm0, %zmm0
; NoVLX-NEXT: vptestmd %zmm0, %zmm0, %k0
; NoVLX-NEXT: kmovw %k0, %ecx
; NoVLX-NEXT: vpmaxuw 32(%rdi), %ymm1, %ymm0
; NoVLX-NEXT: vpcmpeqw %ymm0, %ymm1, %ymm0
; NoVLX-NEXT: vpternlogq $15, %zmm0, %zmm0, %zmm0
; NoVLX-NEXT: vpmovsxwd %ymm0, %zmm0
-; NoVLX-NEXT: vpslld $31, %zmm0, %zmm0
; NoVLX-NEXT: vptestmd %zmm0, %zmm0, %k0
; NoVLX-NEXT: kmovw %k0, %eax
; NoVLX-NEXT: shll $16, %eax
@@ -15862,7 +15778,6 @@ define zeroext i64 @test_masked_vpcmpult
; NoVLX-NEXT: vpcmpeqw %ymm2, %ymm0, %ymm2
; NoVLX-NEXT: vpternlogq $15, %zmm2, %zmm2, %zmm2
; NoVLX-NEXT: vpmovsxwd %ymm2, %zmm2
-; NoVLX-NEXT: vpslld $31, %zmm2, %zmm2
; NoVLX-NEXT: vptestmd %zmm2, %zmm2, %k0
; NoVLX-NEXT: kmovw %k0, %eax
; NoVLX-NEXT: andl %edi, %eax
@@ -15873,7 +15788,6 @@ define zeroext i64 @test_masked_vpcmpult
; NoVLX-NEXT: vpcmpeqw %ymm1, %ymm0, %ymm0
; NoVLX-NEXT: vpternlogq $15, %zmm0, %zmm0, %zmm0
; NoVLX-NEXT: vpmovsxwd %ymm0, %zmm0
-; NoVLX-NEXT: vpslld $31, %zmm0, %zmm0
; NoVLX-NEXT: vptestmd %zmm0, %zmm0, %k0
; NoVLX-NEXT: kmovw %k0, %ecx
; NoVLX-NEXT: andl %edi, %ecx
@@ -15908,7 +15822,6 @@ define zeroext i64 @test_masked_vpcmpult
; NoVLX-NEXT: vpcmpeqw %ymm1, %ymm0, %ymm1
; NoVLX-NEXT: vpternlogq $15, %zmm1, %zmm1, %zmm1
; NoVLX-NEXT: vpmovsxwd %ymm1, %zmm1
-; NoVLX-NEXT: vpslld $31, %zmm1, %zmm1
; NoVLX-NEXT: vptestmd %zmm1, %zmm1, %k0
; NoVLX-NEXT: kmovw %k0, %eax
; NoVLX-NEXT: andl %edi, %eax
@@ -15918,7 +15831,6 @@ define zeroext i64 @test_masked_vpcmpult
; NoVLX-NEXT: vpcmpeqw %ymm1, %ymm0, %ymm0
; NoVLX-NEXT: vpternlogq $15, %zmm0, %zmm0, %zmm0
; NoVLX-NEXT: vpmovsxwd %ymm0, %zmm0
-; NoVLX-NEXT: vpslld $31, %zmm0, %zmm0
; NoVLX-NEXT: vptestmd %zmm0, %zmm0, %k0
; NoVLX-NEXT: kmovw %k0, %ecx
; NoVLX-NEXT: andl %edi, %ecx
Modified: llvm/trunk/test/CodeGen/X86/cast-vsel.ll
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/cast-vsel.ll?rev=344487&r1=344486&r2=344487&view=diff
==============================================================================
--- llvm/trunk/test/CodeGen/X86/cast-vsel.ll (original)
+++ llvm/trunk/test/CodeGen/X86/cast-vsel.ll Sun Oct 14 18:51:58 2018
@@ -357,17 +357,16 @@ define void @example25() nounwind {
; AVX2-LABEL: example25:
; AVX2: # %bb.0: # %vector.ph
; AVX2-NEXT: movq $-4096, %rax # imm = 0xF000
-; AVX2-NEXT: vbroadcastss {{.*#+}} ymm0 = [1,1,1,1,1,1,1,1]
; AVX2-NEXT: .p2align 4, 0x90
; AVX2-NEXT: .LBB5_1: # %vector.body
; AVX2-NEXT: # =>This Inner Loop Header: Depth=1
-; AVX2-NEXT: vmovups da+4096(%rax), %ymm1
-; AVX2-NEXT: vcmpltps db+4096(%rax), %ymm1, %ymm1
-; AVX2-NEXT: vmovups dc+4096(%rax), %ymm2
-; AVX2-NEXT: vcmpltps dd+4096(%rax), %ymm2, %ymm2
-; AVX2-NEXT: vandps %ymm0, %ymm2, %ymm2
-; AVX2-NEXT: vandps %ymm2, %ymm1, %ymm1
-; AVX2-NEXT: vmovups %ymm1, dj+4096(%rax)
+; AVX2-NEXT: vmovups da+4096(%rax), %ymm0
+; AVX2-NEXT: vcmpltps db+4096(%rax), %ymm0, %ymm0
+; AVX2-NEXT: vmovups dc+4096(%rax), %ymm1
+; AVX2-NEXT: vcmpltps dd+4096(%rax), %ymm1, %ymm1
+; AVX2-NEXT: vandps %ymm1, %ymm0, %ymm0
+; AVX2-NEXT: vpsrld $31, %ymm0, %ymm0
+; AVX2-NEXT: vmovdqu %ymm0, dj+4096(%rax)
; AVX2-NEXT: addq $32, %rax
; AVX2-NEXT: jne .LBB5_1
; AVX2-NEXT: # %bb.2: # %for.end
Modified: llvm/trunk/test/CodeGen/X86/combine-sdiv.ll
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/combine-sdiv.ll?rev=344487&r1=344486&r2=344487&view=diff
==============================================================================
--- llvm/trunk/test/CodeGen/X86/combine-sdiv.ll (original)
+++ llvm/trunk/test/CodeGen/X86/combine-sdiv.ll Sun Oct 14 18:51:58 2018
@@ -726,7 +726,8 @@ define <16 x i16> @combine_vec_sdiv_by_p
; AVX1-NEXT: vpsraw $1, %xmm3, %xmm3
; AVX1-NEXT: vpblendw {{.*#+}} xmm2 = xmm2[0,1],xmm3[2],xmm2[3,4,5,6],xmm3[7]
; AVX1-NEXT: vinsertf128 $1, %xmm1, %ymm2, %ymm1
-; AVX1-NEXT: vmovaps {{.*#+}} ymm2 = [0,65535,65535,65535,65535,65535,65535,65535,0,65535,65535,65535,65535,65535,65535,65535]
+; AVX1-NEXT: vbroadcastf128 {{.*#+}} ymm2 = [0,65535,65535,65535,65535,65535,65535,65535,0,65535,65535,65535,65535,65535,65535,65535]
+; AVX1-NEXT: # ymm2 = mem[0,1,0,1]
; AVX1-NEXT: vandps %ymm2, %ymm1, %ymm1
; AVX1-NEXT: vandnps %ymm0, %ymm2, %ymm0
; AVX1-NEXT: vorps %ymm0, %ymm1, %ymm0
@@ -777,7 +778,9 @@ define <16 x i16> @combine_vec_sdiv_by_p
; XOP-NEXT: vpaddw %xmm3, %xmm0, %xmm3
; XOP-NEXT: vpshaw %xmm2, %xmm3, %xmm2
; XOP-NEXT: vinsertf128 $1, %xmm1, %ymm2, %ymm1
-; XOP-NEXT: vpcmov {{.*}}(%rip), %ymm0, %ymm1, %ymm0
+; XOP-NEXT: vbroadcastf128 {{.*#+}} ymm2 = [0,65535,65535,65535,65535,65535,65535,65535,0,65535,65535,65535,65535,65535,65535,65535]
+; XOP-NEXT: # ymm2 = mem[0,1,0,1]
+; XOP-NEXT: vpcmov %ymm2, %ymm0, %ymm1, %ymm0
; XOP-NEXT: retq
%1 = sdiv <16 x i16> %x, <i16 1, i16 4, i16 2, i16 16, i16 8, i16 32, i16 64, i16 2, i16 1, i16 4, i16 2, i16 16, i16 8, i16 32, i16 64, i16 2>
ret <16 x i16> %1
@@ -960,7 +963,8 @@ define <32 x i16> @combine_vec_sdiv_by_p
; AVX1-NEXT: vpsraw $1, %xmm5, %xmm5
; AVX1-NEXT: vpblendw {{.*#+}} xmm5 = xmm6[0,1],xmm5[2],xmm6[3,4,5,6],xmm5[7]
; AVX1-NEXT: vinsertf128 $1, %xmm2, %ymm5, %ymm2
-; AVX1-NEXT: vmovaps {{.*#+}} ymm5 = [0,65535,65535,65535,65535,65535,65535,65535,0,65535,65535,65535,65535,65535,65535,65535]
+; AVX1-NEXT: vbroadcastf128 {{.*#+}} ymm5 = [0,65535,65535,65535,65535,65535,65535,65535,0,65535,65535,65535,65535,65535,65535,65535]
+; AVX1-NEXT: # ymm5 = mem[0,1,0,1]
; AVX1-NEXT: vandps %ymm5, %ymm2, %ymm2
; AVX1-NEXT: vandnps %ymm0, %ymm5, %ymm0
; AVX1-NEXT: vorps %ymm0, %ymm2, %ymm0
@@ -1055,7 +1059,8 @@ define <32 x i16> @combine_vec_sdiv_by_p
; XOP-NEXT: vpaddw %xmm5, %xmm0, %xmm5
; XOP-NEXT: vpshaw %xmm3, %xmm5, %xmm5
; XOP-NEXT: vinsertf128 $1, %xmm2, %ymm5, %ymm2
-; XOP-NEXT: vmovdqa {{.*#+}} ymm5 = [0,65535,65535,65535,65535,65535,65535,65535,0,65535,65535,65535,65535,65535,65535,65535]
+; XOP-NEXT: vbroadcastf128 {{.*#+}} ymm5 = [0,65535,65535,65535,65535,65535,65535,65535,0,65535,65535,65535,65535,65535,65535,65535]
+; XOP-NEXT: # ymm5 = mem[0,1,0,1]
; XOP-NEXT: vpcmov %ymm5, %ymm0, %ymm2, %ymm0
; XOP-NEXT: vextractf128 $1, %ymm1, %xmm2
; XOP-NEXT: vpsraw $15, %xmm2, %xmm6
Modified: llvm/trunk/test/CodeGen/X86/combine-srl.ll
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/combine-srl.ll?rev=344487&r1=344486&r2=344487&view=diff
==============================================================================
--- llvm/trunk/test/CodeGen/X86/combine-srl.ll (original)
+++ llvm/trunk/test/CodeGen/X86/combine-srl.ll Sun Oct 14 18:51:58 2018
@@ -357,55 +357,50 @@ define <4 x i32> @combine_vec_lshr_lzcnt
; SSE-LABEL: combine_vec_lshr_lzcnt_bit1:
; SSE: # %bb.0:
; SSE-NEXT: pand {{.*}}(%rip), %xmm0
-; SSE-NEXT: movdqa {{.*#+}} xmm2 = [15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15]
-; SSE-NEXT: movdqa %xmm0, %xmm1
-; SSE-NEXT: pand %xmm2, %xmm1
-; SSE-NEXT: movdqa {{.*#+}} xmm3 = [4,3,2,2,1,1,1,1,0,0,0,0,0,0,0,0]
-; SSE-NEXT: movdqa %xmm3, %xmm4
-; SSE-NEXT: pshufb %xmm1, %xmm4
; SSE-NEXT: movdqa %xmm0, %xmm1
; SSE-NEXT: psrlw $4, %xmm1
-; SSE-NEXT: pand %xmm2, %xmm1
; SSE-NEXT: pxor %xmm2, %xmm2
+; SSE-NEXT: movdqa {{.*#+}} xmm3 = [4,3,2,2,1,1,1,1,0,0,0,0,0,0,0,0]
+; SSE-NEXT: movdqa %xmm3, %xmm4
; SSE-NEXT: pshufb %xmm1, %xmm3
; SSE-NEXT: pcmpeqb %xmm2, %xmm1
-; SSE-NEXT: pand %xmm4, %xmm1
-; SSE-NEXT: paddb %xmm3, %xmm1
-; SSE-NEXT: movdqa %xmm0, %xmm3
-; SSE-NEXT: pcmpeqb %xmm2, %xmm3
-; SSE-NEXT: psrlw $8, %xmm3
-; SSE-NEXT: pand %xmm1, %xmm3
+; SSE-NEXT: movdqa {{.*#+}} xmm5 = [15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15]
+; SSE-NEXT: pand %xmm0, %xmm5
+; SSE-NEXT: pshufb %xmm5, %xmm4
+; SSE-NEXT: pand %xmm1, %xmm4
+; SSE-NEXT: paddb %xmm4, %xmm3
+; SSE-NEXT: movdqa %xmm0, %xmm1
+; SSE-NEXT: pcmpeqb %xmm2, %xmm1
; SSE-NEXT: psrlw $8, %xmm1
-; SSE-NEXT: paddw %xmm3, %xmm1
+; SSE-NEXT: pand %xmm3, %xmm1
+; SSE-NEXT: psrlw $8, %xmm3
+; SSE-NEXT: paddw %xmm1, %xmm3
; SSE-NEXT: pcmpeqw %xmm2, %xmm0
; SSE-NEXT: psrld $16, %xmm0
-; SSE-NEXT: pand %xmm1, %xmm0
-; SSE-NEXT: psrld $16, %xmm1
-; SSE-NEXT: paddd %xmm0, %xmm1
-; SSE-NEXT: psrld $5, %xmm1
-; SSE-NEXT: movdqa %xmm1, %xmm0
+; SSE-NEXT: pand %xmm3, %xmm0
+; SSE-NEXT: psrld $16, %xmm3
+; SSE-NEXT: paddd %xmm3, %xmm0
+; SSE-NEXT: psrld $5, %xmm0
; SSE-NEXT: retq
;
; AVX-LABEL: combine_vec_lshr_lzcnt_bit1:
; AVX: # %bb.0:
; AVX-NEXT: vpand {{.*}}(%rip), %xmm0, %xmm0
-; AVX-NEXT: vmovdqa {{.*#+}} xmm1 = [15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15]
-; AVX-NEXT: vpand %xmm1, %xmm0, %xmm2
-; AVX-NEXT: vmovdqa {{.*#+}} xmm3 = [4,3,2,2,1,1,1,1,0,0,0,0,0,0,0,0]
-; AVX-NEXT: vpshufb %xmm2, %xmm3, %xmm2
-; AVX-NEXT: vpsrlw $4, %xmm0, %xmm4
-; AVX-NEXT: vpand %xmm1, %xmm4, %xmm1
-; AVX-NEXT: vpxor %xmm4, %xmm4, %xmm4
-; AVX-NEXT: vpcmpeqb %xmm4, %xmm1, %xmm5
-; AVX-NEXT: vpand %xmm5, %xmm2, %xmm2
-; AVX-NEXT: vpshufb %xmm1, %xmm3, %xmm1
-; AVX-NEXT: vpaddb %xmm1, %xmm2, %xmm1
-; AVX-NEXT: vpcmpeqb %xmm4, %xmm0, %xmm2
-; AVX-NEXT: vpsrlw $8, %xmm2, %xmm2
-; AVX-NEXT: vpand %xmm2, %xmm1, %xmm2
+; AVX-NEXT: vpsrlw $4, %xmm0, %xmm1
+; AVX-NEXT: vpxor %xmm2, %xmm2, %xmm2
+; AVX-NEXT: vpcmpeqb %xmm2, %xmm1, %xmm3
+; AVX-NEXT: vpand {{.*}}(%rip), %xmm0, %xmm4
+; AVX-NEXT: vmovdqa {{.*#+}} xmm5 = [4,3,2,2,1,1,1,1,0,0,0,0,0,0,0,0]
+; AVX-NEXT: vpshufb %xmm4, %xmm5, %xmm4
+; AVX-NEXT: vpand %xmm3, %xmm4, %xmm3
+; AVX-NEXT: vpshufb %xmm1, %xmm5, %xmm1
+; AVX-NEXT: vpaddb %xmm1, %xmm3, %xmm1
+; AVX-NEXT: vpcmpeqb %xmm2, %xmm0, %xmm3
+; AVX-NEXT: vpsrlw $8, %xmm3, %xmm3
+; AVX-NEXT: vpand %xmm3, %xmm1, %xmm3
; AVX-NEXT: vpsrlw $8, %xmm1, %xmm1
-; AVX-NEXT: vpaddw %xmm2, %xmm1, %xmm1
-; AVX-NEXT: vpcmpeqw %xmm4, %xmm0, %xmm0
+; AVX-NEXT: vpaddw %xmm3, %xmm1, %xmm1
+; AVX-NEXT: vpcmpeqw %xmm2, %xmm0, %xmm0
; AVX-NEXT: vpsrld $16, %xmm0, %xmm0
; AVX-NEXT: vpand %xmm0, %xmm1, %xmm0
; AVX-NEXT: vpsrld $16, %xmm1, %xmm1
Modified: llvm/trunk/test/CodeGen/X86/gather-addresses.ll
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/gather-addresses.ll?rev=344487&r1=344486&r2=344487&view=diff
==============================================================================
--- llvm/trunk/test/CodeGen/X86/gather-addresses.ll (original)
+++ llvm/trunk/test/CodeGen/X86/gather-addresses.ll Sun Oct 14 18:51:58 2018
@@ -149,11 +149,11 @@ define <4 x i64> @old(double* %p, <4 x i
; LIN-SSE2-NEXT: andl %ecx, %edx
; LIN-SSE2-NEXT: andl %ecx, %esi
; LIN-SSE2-NEXT: andl %ecx, %edi
-; LIN-SSE2-NEXT: movd %eax, %xmm0
-; LIN-SSE2-NEXT: movd %edx, %xmm1
+; LIN-SSE2-NEXT: movq %rax, %xmm0
+; LIN-SSE2-NEXT: movq %rdx, %xmm1
; LIN-SSE2-NEXT: punpcklqdq {{.*#+}} xmm0 = xmm0[0],xmm1[0]
-; LIN-SSE2-NEXT: movd %edi, %xmm2
-; LIN-SSE2-NEXT: movd %esi, %xmm1
+; LIN-SSE2-NEXT: movq %rdi, %xmm2
+; LIN-SSE2-NEXT: movq %rsi, %xmm1
; LIN-SSE2-NEXT: punpcklqdq {{.*#+}} xmm1 = xmm1[0],xmm2[0]
; LIN-SSE2-NEXT: retq
;
@@ -169,11 +169,11 @@ define <4 x i64> @old(double* %p, <4 x i
; LIN-SSE4-NEXT: andl %ecx, %edx
; LIN-SSE4-NEXT: andl %ecx, %esi
; LIN-SSE4-NEXT: andl %ecx, %edi
-; LIN-SSE4-NEXT: movd %edx, %xmm1
-; LIN-SSE4-NEXT: movd %eax, %xmm0
+; LIN-SSE4-NEXT: movq %rdx, %xmm1
+; LIN-SSE4-NEXT: movq %rax, %xmm0
; LIN-SSE4-NEXT: punpcklqdq {{.*#+}} xmm0 = xmm0[0],xmm1[0]
-; LIN-SSE4-NEXT: movd %edi, %xmm2
-; LIN-SSE4-NEXT: movd %esi, %xmm1
+; LIN-SSE4-NEXT: movq %rdi, %xmm2
+; LIN-SSE4-NEXT: movq %rsi, %xmm1
; LIN-SSE4-NEXT: punpcklqdq {{.*#+}} xmm1 = xmm1[0],xmm2[0]
; LIN-SSE4-NEXT: retq
;
@@ -192,11 +192,11 @@ define <4 x i64> @old(double* %p, <4 x i
; WIN-SSE2-NEXT: andl %r9d, %ecx
; WIN-SSE2-NEXT: andl %r9d, %r8d
; WIN-SSE2-NEXT: andl %r9d, %edx
-; WIN-SSE2-NEXT: movd %eax, %xmm0
-; WIN-SSE2-NEXT: movd %ecx, %xmm1
+; WIN-SSE2-NEXT: movq %rax, %xmm0
+; WIN-SSE2-NEXT: movq %rcx, %xmm1
; WIN-SSE2-NEXT: punpcklqdq {{.*#+}} xmm0 = xmm0[0],xmm1[0]
-; WIN-SSE2-NEXT: movd %edx, %xmm2
-; WIN-SSE2-NEXT: movd %r8d, %xmm1
+; WIN-SSE2-NEXT: movq %rdx, %xmm2
+; WIN-SSE2-NEXT: movq %r8, %xmm1
; WIN-SSE2-NEXT: punpcklqdq {{.*#+}} xmm1 = xmm1[0],xmm2[0]
; WIN-SSE2-NEXT: retq
;
@@ -212,11 +212,11 @@ define <4 x i64> @old(double* %p, <4 x i
; WIN-SSE4-NEXT: andl %r9d, %ecx
; WIN-SSE4-NEXT: andl %r9d, %r8d
; WIN-SSE4-NEXT: andl %r9d, %edx
-; WIN-SSE4-NEXT: movd %ecx, %xmm1
-; WIN-SSE4-NEXT: movd %eax, %xmm0
+; WIN-SSE4-NEXT: movq %rcx, %xmm1
+; WIN-SSE4-NEXT: movq %rax, %xmm0
; WIN-SSE4-NEXT: punpcklqdq {{.*#+}} xmm0 = xmm0[0],xmm1[0]
-; WIN-SSE4-NEXT: movd %edx, %xmm2
-; WIN-SSE4-NEXT: movd %r8d, %xmm1
+; WIN-SSE4-NEXT: movq %rdx, %xmm2
+; WIN-SSE4-NEXT: movq %r8, %xmm1
; WIN-SSE4-NEXT: punpcklqdq {{.*#+}} xmm1 = xmm1[0],xmm2[0]
; WIN-SSE4-NEXT: retq
;
Modified: llvm/trunk/test/CodeGen/X86/horizontal-reduce-umax.ll
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/horizontal-reduce-umax.ll?rev=344487&r1=344486&r2=344487&view=diff
==============================================================================
--- llvm/trunk/test/CodeGen/X86/horizontal-reduce-umax.ll (original)
+++ llvm/trunk/test/CodeGen/X86/horizontal-reduce-umax.ll Sun Oct 14 18:51:58 2018
@@ -230,15 +230,14 @@ define i16 @test_reduce_v8i16(<8 x i16>
; X86-SSE2-NEXT: pxor %xmm2, %xmm0
; X86-SSE2-NEXT: pxor %xmm2, %xmm1
; X86-SSE2-NEXT: pmaxsw %xmm0, %xmm1
-; X86-SSE2-NEXT: pxor %xmm2, %xmm1
-; X86-SSE2-NEXT: pshufd {{.*#+}} xmm0 = xmm1[1,1,2,3]
-; X86-SSE2-NEXT: pxor %xmm2, %xmm1
+; X86-SSE2-NEXT: movdqa %xmm1, %xmm0
; X86-SSE2-NEXT: pxor %xmm2, %xmm0
-; X86-SSE2-NEXT: pmaxsw %xmm1, %xmm0
+; X86-SSE2-NEXT: pshufd {{.*#+}} xmm0 = xmm0[1,1,2,3]
; X86-SSE2-NEXT: pxor %xmm2, %xmm0
+; X86-SSE2-NEXT: pmaxsw %xmm1, %xmm0
; X86-SSE2-NEXT: movdqa %xmm0, %xmm1
+; X86-SSE2-NEXT: pxor %xmm2, %xmm1
; X86-SSE2-NEXT: psrld $16, %xmm1
-; X86-SSE2-NEXT: pxor %xmm2, %xmm0
; X86-SSE2-NEXT: pxor %xmm2, %xmm1
; X86-SSE2-NEXT: pmaxsw %xmm0, %xmm1
; X86-SSE2-NEXT: pxor %xmm2, %xmm1
@@ -273,15 +272,14 @@ define i16 @test_reduce_v8i16(<8 x i16>
; X64-SSE2-NEXT: pxor %xmm2, %xmm0
; X64-SSE2-NEXT: pxor %xmm2, %xmm1
; X64-SSE2-NEXT: pmaxsw %xmm0, %xmm1
-; X64-SSE2-NEXT: pxor %xmm2, %xmm1
-; X64-SSE2-NEXT: pshufd {{.*#+}} xmm0 = xmm1[1,1,2,3]
-; X64-SSE2-NEXT: pxor %xmm2, %xmm1
+; X64-SSE2-NEXT: movdqa %xmm1, %xmm0
; X64-SSE2-NEXT: pxor %xmm2, %xmm0
-; X64-SSE2-NEXT: pmaxsw %xmm1, %xmm0
+; X64-SSE2-NEXT: pshufd {{.*#+}} xmm0 = xmm0[1,1,2,3]
; X64-SSE2-NEXT: pxor %xmm2, %xmm0
+; X64-SSE2-NEXT: pmaxsw %xmm1, %xmm0
; X64-SSE2-NEXT: movdqa %xmm0, %xmm1
+; X64-SSE2-NEXT: pxor %xmm2, %xmm1
; X64-SSE2-NEXT: psrld $16, %xmm1
-; X64-SSE2-NEXT: pxor %xmm2, %xmm0
; X64-SSE2-NEXT: pxor %xmm2, %xmm1
; X64-SSE2-NEXT: pmaxsw %xmm0, %xmm1
; X64-SSE2-NEXT: pxor %xmm2, %xmm1
@@ -832,20 +830,19 @@ define i16 @test_reduce_v16i16(<16 x i16
; X86-SSE2-NEXT: pxor %xmm2, %xmm1
; X86-SSE2-NEXT: pxor %xmm2, %xmm0
; X86-SSE2-NEXT: pmaxsw %xmm1, %xmm0
-; X86-SSE2-NEXT: pxor %xmm2, %xmm0
-; X86-SSE2-NEXT: pshufd {{.*#+}} xmm1 = xmm0[2,3,0,1]
-; X86-SSE2-NEXT: pxor %xmm2, %xmm0
-; X86-SSE2-NEXT: pxor %xmm2, %xmm1
-; X86-SSE2-NEXT: pmaxsw %xmm0, %xmm1
+; X86-SSE2-NEXT: movdqa %xmm0, %xmm1
; X86-SSE2-NEXT: pxor %xmm2, %xmm1
-; X86-SSE2-NEXT: pshufd {{.*#+}} xmm0 = xmm1[1,1,2,3]
+; X86-SSE2-NEXT: pshufd {{.*#+}} xmm1 = xmm1[2,3,0,1]
; X86-SSE2-NEXT: pxor %xmm2, %xmm1
+; X86-SSE2-NEXT: pmaxsw %xmm0, %xmm1
+; X86-SSE2-NEXT: movdqa %xmm1, %xmm0
; X86-SSE2-NEXT: pxor %xmm2, %xmm0
-; X86-SSE2-NEXT: pmaxsw %xmm1, %xmm0
+; X86-SSE2-NEXT: pshufd {{.*#+}} xmm0 = xmm0[1,1,2,3]
; X86-SSE2-NEXT: pxor %xmm2, %xmm0
+; X86-SSE2-NEXT: pmaxsw %xmm1, %xmm0
; X86-SSE2-NEXT: movdqa %xmm0, %xmm1
+; X86-SSE2-NEXT: pxor %xmm2, %xmm1
; X86-SSE2-NEXT: psrld $16, %xmm1
-; X86-SSE2-NEXT: pxor %xmm2, %xmm0
; X86-SSE2-NEXT: pxor %xmm2, %xmm1
; X86-SSE2-NEXT: pmaxsw %xmm0, %xmm1
; X86-SSE2-NEXT: pxor %xmm2, %xmm1
@@ -896,20 +893,19 @@ define i16 @test_reduce_v16i16(<16 x i16
; X64-SSE2-NEXT: pxor %xmm2, %xmm1
; X64-SSE2-NEXT: pxor %xmm2, %xmm0
; X64-SSE2-NEXT: pmaxsw %xmm1, %xmm0
-; X64-SSE2-NEXT: pxor %xmm2, %xmm0
-; X64-SSE2-NEXT: pshufd {{.*#+}} xmm1 = xmm0[2,3,0,1]
-; X64-SSE2-NEXT: pxor %xmm2, %xmm0
-; X64-SSE2-NEXT: pxor %xmm2, %xmm1
-; X64-SSE2-NEXT: pmaxsw %xmm0, %xmm1
+; X64-SSE2-NEXT: movdqa %xmm0, %xmm1
; X64-SSE2-NEXT: pxor %xmm2, %xmm1
-; X64-SSE2-NEXT: pshufd {{.*#+}} xmm0 = xmm1[1,1,2,3]
+; X64-SSE2-NEXT: pshufd {{.*#+}} xmm1 = xmm1[2,3,0,1]
; X64-SSE2-NEXT: pxor %xmm2, %xmm1
+; X64-SSE2-NEXT: pmaxsw %xmm0, %xmm1
+; X64-SSE2-NEXT: movdqa %xmm1, %xmm0
; X64-SSE2-NEXT: pxor %xmm2, %xmm0
-; X64-SSE2-NEXT: pmaxsw %xmm1, %xmm0
+; X64-SSE2-NEXT: pshufd {{.*#+}} xmm0 = xmm0[1,1,2,3]
; X64-SSE2-NEXT: pxor %xmm2, %xmm0
+; X64-SSE2-NEXT: pmaxsw %xmm1, %xmm0
; X64-SSE2-NEXT: movdqa %xmm0, %xmm1
+; X64-SSE2-NEXT: pxor %xmm2, %xmm1
; X64-SSE2-NEXT: psrld $16, %xmm1
-; X64-SSE2-NEXT: pxor %xmm2, %xmm0
; X64-SSE2-NEXT: pxor %xmm2, %xmm1
; X64-SSE2-NEXT: pmaxsw %xmm0, %xmm1
; X64-SSE2-NEXT: pxor %xmm2, %xmm1
@@ -1670,35 +1666,30 @@ define i16 @test_reduce_v32i16(<32 x i16
; X86-SSE2-LABEL: test_reduce_v32i16:
; X86-SSE2: ## %bb.0:
; X86-SSE2-NEXT: movdqa {{.*#+}} xmm4 = [32768,32768,32768,32768,32768,32768,32768,32768]
-; X86-SSE2-NEXT: pxor %xmm4, %xmm2
-; X86-SSE2-NEXT: pxor %xmm4, %xmm0
-; X86-SSE2-NEXT: pmaxsw %xmm2, %xmm0
; X86-SSE2-NEXT: pxor %xmm4, %xmm3
; X86-SSE2-NEXT: pxor %xmm4, %xmm1
; X86-SSE2-NEXT: pmaxsw %xmm3, %xmm1
-; X86-SSE2-NEXT: movdqa %xmm4, %xmm2
-; X86-SSE2-NEXT: pxor %xmm4, %xmm2
-; X86-SSE2-NEXT: pxor %xmm2, %xmm1
-; X86-SSE2-NEXT: pxor %xmm0, %xmm2
-; X86-SSE2-NEXT: pmaxsw %xmm1, %xmm2
-; X86-SSE2-NEXT: pxor %xmm4, %xmm2
-; X86-SSE2-NEXT: pshufd {{.*#+}} xmm0 = xmm2[2,3,0,1]
; X86-SSE2-NEXT: pxor %xmm4, %xmm2
; X86-SSE2-NEXT: pxor %xmm4, %xmm0
; X86-SSE2-NEXT: pmaxsw %xmm2, %xmm0
-; X86-SSE2-NEXT: pxor %xmm4, %xmm0
-; X86-SSE2-NEXT: pshufd {{.*#+}} xmm1 = xmm0[1,1,2,3]
-; X86-SSE2-NEXT: pxor %xmm4, %xmm0
+; X86-SSE2-NEXT: pmaxsw %xmm1, %xmm0
+; X86-SSE2-NEXT: movdqa %xmm0, %xmm1
; X86-SSE2-NEXT: pxor %xmm4, %xmm1
-; X86-SSE2-NEXT: pmaxsw %xmm0, %xmm1
+; X86-SSE2-NEXT: pshufd {{.*#+}} xmm1 = xmm1[2,3,0,1]
; X86-SSE2-NEXT: pxor %xmm4, %xmm1
+; X86-SSE2-NEXT: pmaxsw %xmm0, %xmm1
; X86-SSE2-NEXT: movdqa %xmm1, %xmm0
-; X86-SSE2-NEXT: psrld $16, %xmm0
-; X86-SSE2-NEXT: pxor %xmm4, %xmm1
; X86-SSE2-NEXT: pxor %xmm4, %xmm0
-; X86-SSE2-NEXT: pmaxsw %xmm1, %xmm0
+; X86-SSE2-NEXT: pshufd {{.*#+}} xmm0 = xmm0[1,1,2,3]
; X86-SSE2-NEXT: pxor %xmm4, %xmm0
-; X86-SSE2-NEXT: movd %xmm0, %eax
+; X86-SSE2-NEXT: pmaxsw %xmm1, %xmm0
+; X86-SSE2-NEXT: movdqa %xmm0, %xmm1
+; X86-SSE2-NEXT: pxor %xmm4, %xmm1
+; X86-SSE2-NEXT: psrld $16, %xmm1
+; X86-SSE2-NEXT: pxor %xmm4, %xmm1
+; X86-SSE2-NEXT: pmaxsw %xmm0, %xmm1
+; X86-SSE2-NEXT: pxor %xmm4, %xmm1
+; X86-SSE2-NEXT: movd %xmm1, %eax
; X86-SSE2-NEXT: ## kill: def $ax killed $ax killed $eax
; X86-SSE2-NEXT: retl
;
@@ -1748,35 +1739,30 @@ define i16 @test_reduce_v32i16(<32 x i16
; X64-SSE2-LABEL: test_reduce_v32i16:
; X64-SSE2: ## %bb.0:
; X64-SSE2-NEXT: movdqa {{.*#+}} xmm4 = [32768,32768,32768,32768,32768,32768,32768,32768]
-; X64-SSE2-NEXT: pxor %xmm4, %xmm2
-; X64-SSE2-NEXT: pxor %xmm4, %xmm0
-; X64-SSE2-NEXT: pmaxsw %xmm2, %xmm0
; X64-SSE2-NEXT: pxor %xmm4, %xmm3
; X64-SSE2-NEXT: pxor %xmm4, %xmm1
; X64-SSE2-NEXT: pmaxsw %xmm3, %xmm1
-; X64-SSE2-NEXT: movdqa %xmm4, %xmm2
-; X64-SSE2-NEXT: pxor %xmm4, %xmm2
-; X64-SSE2-NEXT: pxor %xmm2, %xmm1
-; X64-SSE2-NEXT: pxor %xmm0, %xmm2
-; X64-SSE2-NEXT: pmaxsw %xmm1, %xmm2
-; X64-SSE2-NEXT: pxor %xmm4, %xmm2
-; X64-SSE2-NEXT: pshufd {{.*#+}} xmm0 = xmm2[2,3,0,1]
; X64-SSE2-NEXT: pxor %xmm4, %xmm2
; X64-SSE2-NEXT: pxor %xmm4, %xmm0
; X64-SSE2-NEXT: pmaxsw %xmm2, %xmm0
-; X64-SSE2-NEXT: pxor %xmm4, %xmm0
-; X64-SSE2-NEXT: pshufd {{.*#+}} xmm1 = xmm0[1,1,2,3]
-; X64-SSE2-NEXT: pxor %xmm4, %xmm0
+; X64-SSE2-NEXT: pmaxsw %xmm1, %xmm0
+; X64-SSE2-NEXT: movdqa %xmm0, %xmm1
; X64-SSE2-NEXT: pxor %xmm4, %xmm1
-; X64-SSE2-NEXT: pmaxsw %xmm0, %xmm1
+; X64-SSE2-NEXT: pshufd {{.*#+}} xmm1 = xmm1[2,3,0,1]
; X64-SSE2-NEXT: pxor %xmm4, %xmm1
+; X64-SSE2-NEXT: pmaxsw %xmm0, %xmm1
; X64-SSE2-NEXT: movdqa %xmm1, %xmm0
-; X64-SSE2-NEXT: psrld $16, %xmm0
-; X64-SSE2-NEXT: pxor %xmm4, %xmm1
; X64-SSE2-NEXT: pxor %xmm4, %xmm0
-; X64-SSE2-NEXT: pmaxsw %xmm1, %xmm0
+; X64-SSE2-NEXT: pshufd {{.*#+}} xmm0 = xmm0[1,1,2,3]
; X64-SSE2-NEXT: pxor %xmm4, %xmm0
-; X64-SSE2-NEXT: movd %xmm0, %eax
+; X64-SSE2-NEXT: pmaxsw %xmm1, %xmm0
+; X64-SSE2-NEXT: movdqa %xmm0, %xmm1
+; X64-SSE2-NEXT: pxor %xmm4, %xmm1
+; X64-SSE2-NEXT: psrld $16, %xmm1
+; X64-SSE2-NEXT: pxor %xmm4, %xmm1
+; X64-SSE2-NEXT: pmaxsw %xmm0, %xmm1
+; X64-SSE2-NEXT: pxor %xmm4, %xmm1
+; X64-SSE2-NEXT: movd %xmm1, %eax
; X64-SSE2-NEXT: ## kill: def $ax killed $ax killed $eax
; X64-SSE2-NEXT: retq
;
Modified: llvm/trunk/test/CodeGen/X86/horizontal-reduce-umin.ll
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/horizontal-reduce-umin.ll?rev=344487&r1=344486&r2=344487&view=diff
==============================================================================
--- llvm/trunk/test/CodeGen/X86/horizontal-reduce-umin.ll (original)
+++ llvm/trunk/test/CodeGen/X86/horizontal-reduce-umin.ll Sun Oct 14 18:51:58 2018
@@ -232,15 +232,14 @@ define i16 @test_reduce_v8i16(<8 x i16>
; X86-SSE2-NEXT: pxor %xmm2, %xmm0
; X86-SSE2-NEXT: pxor %xmm2, %xmm1
; X86-SSE2-NEXT: pminsw %xmm0, %xmm1
-; X86-SSE2-NEXT: pxor %xmm2, %xmm1
-; X86-SSE2-NEXT: pshufd {{.*#+}} xmm0 = xmm1[1,1,2,3]
-; X86-SSE2-NEXT: pxor %xmm2, %xmm1
+; X86-SSE2-NEXT: movdqa %xmm1, %xmm0
; X86-SSE2-NEXT: pxor %xmm2, %xmm0
-; X86-SSE2-NEXT: pminsw %xmm1, %xmm0
+; X86-SSE2-NEXT: pshufd {{.*#+}} xmm0 = xmm0[1,1,2,3]
; X86-SSE2-NEXT: pxor %xmm2, %xmm0
+; X86-SSE2-NEXT: pminsw %xmm1, %xmm0
; X86-SSE2-NEXT: movdqa %xmm0, %xmm1
+; X86-SSE2-NEXT: pxor %xmm2, %xmm1
; X86-SSE2-NEXT: psrld $16, %xmm1
-; X86-SSE2-NEXT: pxor %xmm2, %xmm0
; X86-SSE2-NEXT: pxor %xmm2, %xmm1
; X86-SSE2-NEXT: pminsw %xmm0, %xmm1
; X86-SSE2-NEXT: pxor %xmm2, %xmm1
@@ -269,15 +268,14 @@ define i16 @test_reduce_v8i16(<8 x i16>
; X64-SSE2-NEXT: pxor %xmm2, %xmm0
; X64-SSE2-NEXT: pxor %xmm2, %xmm1
; X64-SSE2-NEXT: pminsw %xmm0, %xmm1
-; X64-SSE2-NEXT: pxor %xmm2, %xmm1
-; X64-SSE2-NEXT: pshufd {{.*#+}} xmm0 = xmm1[1,1,2,3]
-; X64-SSE2-NEXT: pxor %xmm2, %xmm1
+; X64-SSE2-NEXT: movdqa %xmm1, %xmm0
; X64-SSE2-NEXT: pxor %xmm2, %xmm0
-; X64-SSE2-NEXT: pminsw %xmm1, %xmm0
+; X64-SSE2-NEXT: pshufd {{.*#+}} xmm0 = xmm0[1,1,2,3]
; X64-SSE2-NEXT: pxor %xmm2, %xmm0
+; X64-SSE2-NEXT: pminsw %xmm1, %xmm0
; X64-SSE2-NEXT: movdqa %xmm0, %xmm1
+; X64-SSE2-NEXT: pxor %xmm2, %xmm1
; X64-SSE2-NEXT: psrld $16, %xmm1
-; X64-SSE2-NEXT: pxor %xmm2, %xmm0
; X64-SSE2-NEXT: pxor %xmm2, %xmm1
; X64-SSE2-NEXT: pminsw %xmm0, %xmm1
; X64-SSE2-NEXT: pxor %xmm2, %xmm1
@@ -772,20 +770,19 @@ define i16 @test_reduce_v16i16(<16 x i16
; X86-SSE2-NEXT: pxor %xmm2, %xmm1
; X86-SSE2-NEXT: pxor %xmm2, %xmm0
; X86-SSE2-NEXT: pminsw %xmm1, %xmm0
-; X86-SSE2-NEXT: pxor %xmm2, %xmm0
-; X86-SSE2-NEXT: pshufd {{.*#+}} xmm1 = xmm0[2,3,0,1]
-; X86-SSE2-NEXT: pxor %xmm2, %xmm0
-; X86-SSE2-NEXT: pxor %xmm2, %xmm1
-; X86-SSE2-NEXT: pminsw %xmm0, %xmm1
+; X86-SSE2-NEXT: movdqa %xmm0, %xmm1
; X86-SSE2-NEXT: pxor %xmm2, %xmm1
-; X86-SSE2-NEXT: pshufd {{.*#+}} xmm0 = xmm1[1,1,2,3]
+; X86-SSE2-NEXT: pshufd {{.*#+}} xmm1 = xmm1[2,3,0,1]
; X86-SSE2-NEXT: pxor %xmm2, %xmm1
+; X86-SSE2-NEXT: pminsw %xmm0, %xmm1
+; X86-SSE2-NEXT: movdqa %xmm1, %xmm0
; X86-SSE2-NEXT: pxor %xmm2, %xmm0
-; X86-SSE2-NEXT: pminsw %xmm1, %xmm0
+; X86-SSE2-NEXT: pshufd {{.*#+}} xmm0 = xmm0[1,1,2,3]
; X86-SSE2-NEXT: pxor %xmm2, %xmm0
+; X86-SSE2-NEXT: pminsw %xmm1, %xmm0
; X86-SSE2-NEXT: movdqa %xmm0, %xmm1
+; X86-SSE2-NEXT: pxor %xmm2, %xmm1
; X86-SSE2-NEXT: psrld $16, %xmm1
-; X86-SSE2-NEXT: pxor %xmm2, %xmm0
; X86-SSE2-NEXT: pxor %xmm2, %xmm1
; X86-SSE2-NEXT: pminsw %xmm0, %xmm1
; X86-SSE2-NEXT: pxor %xmm2, %xmm1
@@ -827,20 +824,19 @@ define i16 @test_reduce_v16i16(<16 x i16
; X64-SSE2-NEXT: pxor %xmm2, %xmm1
; X64-SSE2-NEXT: pxor %xmm2, %xmm0
; X64-SSE2-NEXT: pminsw %xmm1, %xmm0
-; X64-SSE2-NEXT: pxor %xmm2, %xmm0
-; X64-SSE2-NEXT: pshufd {{.*#+}} xmm1 = xmm0[2,3,0,1]
-; X64-SSE2-NEXT: pxor %xmm2, %xmm0
-; X64-SSE2-NEXT: pxor %xmm2, %xmm1
-; X64-SSE2-NEXT: pminsw %xmm0, %xmm1
+; X64-SSE2-NEXT: movdqa %xmm0, %xmm1
; X64-SSE2-NEXT: pxor %xmm2, %xmm1
-; X64-SSE2-NEXT: pshufd {{.*#+}} xmm0 = xmm1[1,1,2,3]
+; X64-SSE2-NEXT: pshufd {{.*#+}} xmm1 = xmm1[2,3,0,1]
; X64-SSE2-NEXT: pxor %xmm2, %xmm1
+; X64-SSE2-NEXT: pminsw %xmm0, %xmm1
+; X64-SSE2-NEXT: movdqa %xmm1, %xmm0
; X64-SSE2-NEXT: pxor %xmm2, %xmm0
-; X64-SSE2-NEXT: pminsw %xmm1, %xmm0
+; X64-SSE2-NEXT: pshufd {{.*#+}} xmm0 = xmm0[1,1,2,3]
; X64-SSE2-NEXT: pxor %xmm2, %xmm0
+; X64-SSE2-NEXT: pminsw %xmm1, %xmm0
; X64-SSE2-NEXT: movdqa %xmm0, %xmm1
+; X64-SSE2-NEXT: pxor %xmm2, %xmm1
; X64-SSE2-NEXT: psrld $16, %xmm1
-; X64-SSE2-NEXT: pxor %xmm2, %xmm0
; X64-SSE2-NEXT: pxor %xmm2, %xmm1
; X64-SSE2-NEXT: pminsw %xmm0, %xmm1
; X64-SSE2-NEXT: pxor %xmm2, %xmm1
@@ -1574,35 +1570,30 @@ define i16 @test_reduce_v32i16(<32 x i16
; X86-SSE2-LABEL: test_reduce_v32i16:
; X86-SSE2: ## %bb.0:
; X86-SSE2-NEXT: movdqa {{.*#+}} xmm4 = [32768,32768,32768,32768,32768,32768,32768,32768]
-; X86-SSE2-NEXT: pxor %xmm4, %xmm2
-; X86-SSE2-NEXT: pxor %xmm4, %xmm0
-; X86-SSE2-NEXT: pminsw %xmm2, %xmm0
; X86-SSE2-NEXT: pxor %xmm4, %xmm3
; X86-SSE2-NEXT: pxor %xmm4, %xmm1
; X86-SSE2-NEXT: pminsw %xmm3, %xmm1
-; X86-SSE2-NEXT: movdqa %xmm4, %xmm2
-; X86-SSE2-NEXT: pxor %xmm4, %xmm2
-; X86-SSE2-NEXT: pxor %xmm2, %xmm1
-; X86-SSE2-NEXT: pxor %xmm0, %xmm2
-; X86-SSE2-NEXT: pminsw %xmm1, %xmm2
-; X86-SSE2-NEXT: pxor %xmm4, %xmm2
-; X86-SSE2-NEXT: pshufd {{.*#+}} xmm0 = xmm2[2,3,0,1]
; X86-SSE2-NEXT: pxor %xmm4, %xmm2
; X86-SSE2-NEXT: pxor %xmm4, %xmm0
; X86-SSE2-NEXT: pminsw %xmm2, %xmm0
-; X86-SSE2-NEXT: pxor %xmm4, %xmm0
-; X86-SSE2-NEXT: pshufd {{.*#+}} xmm1 = xmm0[1,1,2,3]
-; X86-SSE2-NEXT: pxor %xmm4, %xmm0
+; X86-SSE2-NEXT: pminsw %xmm1, %xmm0
+; X86-SSE2-NEXT: movdqa %xmm0, %xmm1
; X86-SSE2-NEXT: pxor %xmm4, %xmm1
-; X86-SSE2-NEXT: pminsw %xmm0, %xmm1
+; X86-SSE2-NEXT: pshufd {{.*#+}} xmm1 = xmm1[2,3,0,1]
; X86-SSE2-NEXT: pxor %xmm4, %xmm1
+; X86-SSE2-NEXT: pminsw %xmm0, %xmm1
; X86-SSE2-NEXT: movdqa %xmm1, %xmm0
-; X86-SSE2-NEXT: psrld $16, %xmm0
-; X86-SSE2-NEXT: pxor %xmm4, %xmm1
; X86-SSE2-NEXT: pxor %xmm4, %xmm0
-; X86-SSE2-NEXT: pminsw %xmm1, %xmm0
+; X86-SSE2-NEXT: pshufd {{.*#+}} xmm0 = xmm0[1,1,2,3]
; X86-SSE2-NEXT: pxor %xmm4, %xmm0
-; X86-SSE2-NEXT: movd %xmm0, %eax
+; X86-SSE2-NEXT: pminsw %xmm1, %xmm0
+; X86-SSE2-NEXT: movdqa %xmm0, %xmm1
+; X86-SSE2-NEXT: pxor %xmm4, %xmm1
+; X86-SSE2-NEXT: psrld $16, %xmm1
+; X86-SSE2-NEXT: pxor %xmm4, %xmm1
+; X86-SSE2-NEXT: pminsw %xmm0, %xmm1
+; X86-SSE2-NEXT: pxor %xmm4, %xmm1
+; X86-SSE2-NEXT: movd %xmm1, %eax
; X86-SSE2-NEXT: ## kill: def $ax killed $ax killed $eax
; X86-SSE2-NEXT: retl
;
@@ -1643,35 +1634,30 @@ define i16 @test_reduce_v32i16(<32 x i16
; X64-SSE2-LABEL: test_reduce_v32i16:
; X64-SSE2: ## %bb.0:
; X64-SSE2-NEXT: movdqa {{.*#+}} xmm4 = [32768,32768,32768,32768,32768,32768,32768,32768]
-; X64-SSE2-NEXT: pxor %xmm4, %xmm2
-; X64-SSE2-NEXT: pxor %xmm4, %xmm0
-; X64-SSE2-NEXT: pminsw %xmm2, %xmm0
; X64-SSE2-NEXT: pxor %xmm4, %xmm3
; X64-SSE2-NEXT: pxor %xmm4, %xmm1
; X64-SSE2-NEXT: pminsw %xmm3, %xmm1
-; X64-SSE2-NEXT: movdqa %xmm4, %xmm2
-; X64-SSE2-NEXT: pxor %xmm4, %xmm2
-; X64-SSE2-NEXT: pxor %xmm2, %xmm1
-; X64-SSE2-NEXT: pxor %xmm0, %xmm2
-; X64-SSE2-NEXT: pminsw %xmm1, %xmm2
-; X64-SSE2-NEXT: pxor %xmm4, %xmm2
-; X64-SSE2-NEXT: pshufd {{.*#+}} xmm0 = xmm2[2,3,0,1]
; X64-SSE2-NEXT: pxor %xmm4, %xmm2
; X64-SSE2-NEXT: pxor %xmm4, %xmm0
; X64-SSE2-NEXT: pminsw %xmm2, %xmm0
-; X64-SSE2-NEXT: pxor %xmm4, %xmm0
-; X64-SSE2-NEXT: pshufd {{.*#+}} xmm1 = xmm0[1,1,2,3]
-; X64-SSE2-NEXT: pxor %xmm4, %xmm0
+; X64-SSE2-NEXT: pminsw %xmm1, %xmm0
+; X64-SSE2-NEXT: movdqa %xmm0, %xmm1
; X64-SSE2-NEXT: pxor %xmm4, %xmm1
-; X64-SSE2-NEXT: pminsw %xmm0, %xmm1
+; X64-SSE2-NEXT: pshufd {{.*#+}} xmm1 = xmm1[2,3,0,1]
; X64-SSE2-NEXT: pxor %xmm4, %xmm1
+; X64-SSE2-NEXT: pminsw %xmm0, %xmm1
; X64-SSE2-NEXT: movdqa %xmm1, %xmm0
-; X64-SSE2-NEXT: psrld $16, %xmm0
-; X64-SSE2-NEXT: pxor %xmm4, %xmm1
; X64-SSE2-NEXT: pxor %xmm4, %xmm0
-; X64-SSE2-NEXT: pminsw %xmm1, %xmm0
+; X64-SSE2-NEXT: pshufd {{.*#+}} xmm0 = xmm0[1,1,2,3]
; X64-SSE2-NEXT: pxor %xmm4, %xmm0
-; X64-SSE2-NEXT: movd %xmm0, %eax
+; X64-SSE2-NEXT: pminsw %xmm1, %xmm0
+; X64-SSE2-NEXT: movdqa %xmm0, %xmm1
+; X64-SSE2-NEXT: pxor %xmm4, %xmm1
+; X64-SSE2-NEXT: psrld $16, %xmm1
+; X64-SSE2-NEXT: pxor %xmm4, %xmm1
+; X64-SSE2-NEXT: pminsw %xmm0, %xmm1
+; X64-SSE2-NEXT: pxor %xmm4, %xmm1
+; X64-SSE2-NEXT: movd %xmm1, %eax
; X64-SSE2-NEXT: ## kill: def $ax killed $ax killed $eax
; X64-SSE2-NEXT: retq
;
Modified: llvm/trunk/test/CodeGen/X86/known-bits.ll
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/known-bits.ll?rev=344487&r1=344486&r2=344487&view=diff
==============================================================================
--- llvm/trunk/test/CodeGen/X86/known-bits.ll (original)
+++ llvm/trunk/test/CodeGen/X86/known-bits.ll Sun Oct 14 18:51:58 2018
@@ -19,7 +19,7 @@ define void @knownbits_zext_in_reg(i8*)
; X32-NEXT: movzbl %al, %eax
; X32-NEXT: vpxor %xmm0, %xmm0, %xmm0
; X32-NEXT: vpinsrd $1, %eax, %xmm0, %xmm1
-; X32-NEXT: vmovdqa {{.*#+}} xmm2 = [255,0,0,0,255,0,0,0,255,0,0,0,255,0,0,0]
+; X32-NEXT: vbroadcastss {{.*#+}} xmm2 = [3.57331108E-43,3.57331108E-43,3.57331108E-43,3.57331108E-43]
; X32-NEXT: vpand %xmm2, %xmm1, %xmm1
; X32-NEXT: movzbl %cl, %eax
; X32-NEXT: vpinsrd $1, %eax, %xmm0, %xmm0
@@ -69,7 +69,7 @@ define void @knownbits_zext_in_reg(i8*)
; X64-NEXT: movzbl %cl, %ecx
; X64-NEXT: vpxor %xmm0, %xmm0, %xmm0
; X64-NEXT: vpinsrd $1, %ecx, %xmm0, %xmm1
-; X64-NEXT: vmovdqa {{.*#+}} xmm2 = [255,0,0,0,255,0,0,0,255,0,0,0,255,0,0,0]
+; X64-NEXT: vbroadcastss {{.*#+}} xmm2 = [3.57331108E-43,3.57331108E-43,3.57331108E-43,3.57331108E-43]
; X64-NEXT: vpand %xmm2, %xmm1, %xmm1
; X64-NEXT: movzbl %al, %eax
; X64-NEXT: vpinsrd $1, %eax, %xmm0, %xmm0
Modified: llvm/trunk/test/CodeGen/X86/nontemporal-loads.ll
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/nontemporal-loads.ll?rev=344487&r1=344486&r2=344487&view=diff
==============================================================================
--- llvm/trunk/test/CodeGen/X86/nontemporal-loads.ll (original)
+++ llvm/trunk/test/CodeGen/X86/nontemporal-loads.ll Sun Oct 14 18:51:58 2018
@@ -1800,35 +1800,23 @@ define <64 x i8> @test_unaligned_v64i8(<
define <16 x i32> @test_masked_v16i32(i8 * %addr, <16 x i32> %old, <16 x i32> %mask1) {
; SSE2-LABEL: test_masked_v16i32:
; SSE2: # %bb.0:
-; SSE2-NEXT: movdqa %xmm0, %xmm10
-; SSE2-NEXT: pxor %xmm12, %xmm12
-; SSE2-NEXT: pcmpeqd %xmm12, %xmm7
-; SSE2-NEXT: pcmpeqd %xmm0, %xmm0
-; SSE2-NEXT: movdqa %xmm7, %xmm8
-; SSE2-NEXT: pxor %xmm0, %xmm8
-; SSE2-NEXT: pcmpeqd %xmm12, %xmm6
-; SSE2-NEXT: movdqa %xmm6, %xmm9
-; SSE2-NEXT: pxor %xmm0, %xmm9
-; SSE2-NEXT: pcmpeqd %xmm12, %xmm5
-; SSE2-NEXT: movdqa %xmm5, %xmm11
-; SSE2-NEXT: pxor %xmm0, %xmm11
-; SSE2-NEXT: pcmpeqd %xmm12, %xmm4
-; SSE2-NEXT: pxor %xmm4, %xmm0
+; SSE2-NEXT: pxor %xmm8, %xmm8
+; SSE2-NEXT: pcmpeqd %xmm8, %xmm7
+; SSE2-NEXT: pcmpeqd %xmm8, %xmm6
+; SSE2-NEXT: pcmpeqd %xmm8, %xmm5
+; SSE2-NEXT: pcmpeqd %xmm8, %xmm4
+; SSE2-NEXT: pand %xmm4, %xmm0
; SSE2-NEXT: pandn (%rdi), %xmm4
-; SSE2-NEXT: pandn %xmm10, %xmm0
; SSE2-NEXT: por %xmm4, %xmm0
+; SSE2-NEXT: pand %xmm5, %xmm1
; SSE2-NEXT: pandn 16(%rdi), %xmm5
-; SSE2-NEXT: pandn %xmm1, %xmm11
-; SSE2-NEXT: por %xmm5, %xmm11
+; SSE2-NEXT: por %xmm5, %xmm1
+; SSE2-NEXT: pand %xmm6, %xmm2
; SSE2-NEXT: pandn 32(%rdi), %xmm6
-; SSE2-NEXT: pandn %xmm2, %xmm9
-; SSE2-NEXT: por %xmm6, %xmm9
+; SSE2-NEXT: por %xmm6, %xmm2
+; SSE2-NEXT: pand %xmm7, %xmm3
; SSE2-NEXT: pandn 48(%rdi), %xmm7
-; SSE2-NEXT: pandn %xmm3, %xmm8
-; SSE2-NEXT: por %xmm7, %xmm8
-; SSE2-NEXT: movdqa %xmm11, %xmm1
-; SSE2-NEXT: movdqa %xmm9, %xmm2
-; SSE2-NEXT: movdqa %xmm8, %xmm3
+; SSE2-NEXT: por %xmm7, %xmm3
; SSE2-NEXT: retq
;
; SSE41-LABEL: test_masked_v16i32:
Modified: llvm/trunk/test/CodeGen/X86/paddus.ll
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/paddus.ll?rev=344487&r1=344486&r2=344487&view=diff
==============================================================================
--- llvm/trunk/test/CodeGen/X86/paddus.ll (original)
+++ llvm/trunk/test/CodeGen/X86/paddus.ll Sun Oct 14 18:51:58 2018
@@ -801,22 +801,20 @@ define <8 x i16> @test23(<8 x i16> %x) {
; SSE2-LABEL: test23:
; SSE2: # %bb.0:
; SSE2-NEXT: movdqa {{.*#+}} xmm2 = [32768,32768,32768,32768,32768,32768,32768,32768]
-; SSE2-NEXT: pxor %xmm2, %xmm0
; SSE2-NEXT: pxor %xmm0, %xmm2
-; SSE2-NEXT: movdqa %xmm0, %xmm1
-; SSE2-NEXT: pcmpgtw %xmm2, %xmm1
-; SSE2-NEXT: por %xmm0, %xmm1
+; SSE2-NEXT: movdqa %xmm2, %xmm1
+; SSE2-NEXT: pcmpgtw %xmm0, %xmm1
+; SSE2-NEXT: por %xmm2, %xmm1
; SSE2-NEXT: movdqa %xmm1, %xmm0
; SSE2-NEXT: retq
;
; SSSE3-LABEL: test23:
; SSSE3: # %bb.0:
; SSSE3-NEXT: movdqa {{.*#+}} xmm2 = [32768,32768,32768,32768,32768,32768,32768,32768]
-; SSSE3-NEXT: pxor %xmm2, %xmm0
; SSSE3-NEXT: pxor %xmm0, %xmm2
-; SSSE3-NEXT: movdqa %xmm0, %xmm1
-; SSSE3-NEXT: pcmpgtw %xmm2, %xmm1
-; SSSE3-NEXT: por %xmm0, %xmm1
+; SSSE3-NEXT: movdqa %xmm2, %xmm1
+; SSSE3-NEXT: pcmpgtw %xmm0, %xmm1
+; SSSE3-NEXT: por %xmm2, %xmm1
; SSSE3-NEXT: movdqa %xmm1, %xmm0
; SSSE3-NEXT: retq
;
@@ -1029,37 +1027,33 @@ define <16 x i16> @test28(<16 x i16> %x)
define <16 x i16> @test29(<16 x i16> %x) {
; SSE2-LABEL: test29:
; SSE2: # %bb.0:
-; SSE2-NEXT: movdqa {{.*#+}} xmm4 = [32768,32768,32768,32768,32768,32768,32768,32768]
-; SSE2-NEXT: pxor %xmm4, %xmm0
-; SSE2-NEXT: pxor %xmm4, %xmm1
-; SSE2-NEXT: movdqa %xmm1, %xmm3
-; SSE2-NEXT: pxor %xmm4, %xmm3
-; SSE2-NEXT: movdqa %xmm1, %xmm2
-; SSE2-NEXT: pcmpgtw %xmm3, %xmm2
-; SSE2-NEXT: pxor %xmm0, %xmm4
-; SSE2-NEXT: movdqa %xmm0, %xmm3
-; SSE2-NEXT: pcmpgtw %xmm4, %xmm3
-; SSE2-NEXT: por %xmm0, %xmm3
-; SSE2-NEXT: por %xmm1, %xmm2
-; SSE2-NEXT: movdqa %xmm3, %xmm0
+; SSE2-NEXT: movdqa {{.*#+}} xmm3 = [32768,32768,32768,32768,32768,32768,32768,32768]
+; SSE2-NEXT: movdqa %xmm0, %xmm4
+; SSE2-NEXT: pxor %xmm3, %xmm4
+; SSE2-NEXT: pxor %xmm1, %xmm3
+; SSE2-NEXT: movdqa %xmm3, %xmm2
+; SSE2-NEXT: pcmpgtw %xmm1, %xmm2
+; SSE2-NEXT: movdqa %xmm4, %xmm1
+; SSE2-NEXT: pcmpgtw %xmm0, %xmm1
+; SSE2-NEXT: por %xmm4, %xmm1
+; SSE2-NEXT: por %xmm3, %xmm2
+; SSE2-NEXT: movdqa %xmm1, %xmm0
; SSE2-NEXT: movdqa %xmm2, %xmm1
; SSE2-NEXT: retq
;
; SSSE3-LABEL: test29:
; SSSE3: # %bb.0:
-; SSSE3-NEXT: movdqa {{.*#+}} xmm4 = [32768,32768,32768,32768,32768,32768,32768,32768]
-; SSSE3-NEXT: pxor %xmm4, %xmm0
-; SSSE3-NEXT: pxor %xmm4, %xmm1
-; SSSE3-NEXT: movdqa %xmm1, %xmm3
-; SSSE3-NEXT: pxor %xmm4, %xmm3
-; SSSE3-NEXT: movdqa %xmm1, %xmm2
-; SSSE3-NEXT: pcmpgtw %xmm3, %xmm2
-; SSSE3-NEXT: pxor %xmm0, %xmm4
-; SSSE3-NEXT: movdqa %xmm0, %xmm3
-; SSSE3-NEXT: pcmpgtw %xmm4, %xmm3
-; SSSE3-NEXT: por %xmm0, %xmm3
-; SSSE3-NEXT: por %xmm1, %xmm2
-; SSSE3-NEXT: movdqa %xmm3, %xmm0
+; SSSE3-NEXT: movdqa {{.*#+}} xmm3 = [32768,32768,32768,32768,32768,32768,32768,32768]
+; SSSE3-NEXT: movdqa %xmm0, %xmm4
+; SSSE3-NEXT: pxor %xmm3, %xmm4
+; SSSE3-NEXT: pxor %xmm1, %xmm3
+; SSSE3-NEXT: movdqa %xmm3, %xmm2
+; SSSE3-NEXT: pcmpgtw %xmm1, %xmm2
+; SSSE3-NEXT: movdqa %xmm4, %xmm1
+; SSSE3-NEXT: pcmpgtw %xmm0, %xmm1
+; SSSE3-NEXT: por %xmm4, %xmm1
+; SSSE3-NEXT: por %xmm3, %xmm2
+; SSSE3-NEXT: movdqa %xmm1, %xmm0
; SSSE3-NEXT: movdqa %xmm2, %xmm1
; SSSE3-NEXT: retq
;
@@ -1343,66 +1337,58 @@ define <32 x i16> @test34(<32 x i16> %x)
define <32 x i16> @test35(<32 x i16> %x) {
; SSE2-LABEL: test35:
; SSE2: # %bb.0:
-; SSE2-NEXT: movdqa {{.*#+}} xmm4 = [32768,32768,32768,32768,32768,32768,32768,32768]
-; SSE2-NEXT: pxor %xmm4, %xmm0
-; SSE2-NEXT: pxor %xmm4, %xmm1
-; SSE2-NEXT: pxor %xmm4, %xmm2
-; SSE2-NEXT: pxor %xmm4, %xmm3
-; SSE2-NEXT: movdqa %xmm3, %xmm5
-; SSE2-NEXT: pxor %xmm4, %xmm5
-; SSE2-NEXT: movdqa %xmm3, %xmm8
-; SSE2-NEXT: pcmpgtw %xmm5, %xmm8
-; SSE2-NEXT: movdqa %xmm2, %xmm6
-; SSE2-NEXT: pxor %xmm4, %xmm6
-; SSE2-NEXT: movdqa %xmm2, %xmm5
-; SSE2-NEXT: pcmpgtw %xmm6, %xmm5
+; SSE2-NEXT: movdqa {{.*#+}} xmm5 = [32768,32768,32768,32768,32768,32768,32768,32768]
+; SSE2-NEXT: movdqa %xmm0, %xmm6
+; SSE2-NEXT: pxor %xmm5, %xmm6
; SSE2-NEXT: movdqa %xmm1, %xmm7
-; SSE2-NEXT: pxor %xmm4, %xmm7
-; SSE2-NEXT: movdqa %xmm1, %xmm6
-; SSE2-NEXT: pcmpgtw %xmm7, %xmm6
-; SSE2-NEXT: pxor %xmm0, %xmm4
-; SSE2-NEXT: movdqa %xmm0, %xmm7
-; SSE2-NEXT: pcmpgtw %xmm4, %xmm7
-; SSE2-NEXT: por %xmm0, %xmm7
-; SSE2-NEXT: por %xmm1, %xmm6
-; SSE2-NEXT: por %xmm2, %xmm5
-; SSE2-NEXT: por %xmm3, %xmm8
-; SSE2-NEXT: movdqa %xmm7, %xmm0
-; SSE2-NEXT: movdqa %xmm6, %xmm1
-; SSE2-NEXT: movdqa %xmm5, %xmm2
+; SSE2-NEXT: pxor %xmm5, %xmm7
+; SSE2-NEXT: movdqa %xmm2, %xmm8
+; SSE2-NEXT: pxor %xmm5, %xmm8
+; SSE2-NEXT: pxor %xmm3, %xmm5
+; SSE2-NEXT: movdqa %xmm5, %xmm4
+; SSE2-NEXT: pcmpgtw %xmm3, %xmm4
; SSE2-NEXT: movdqa %xmm8, %xmm3
+; SSE2-NEXT: pcmpgtw %xmm2, %xmm3
+; SSE2-NEXT: movdqa %xmm7, %xmm2
+; SSE2-NEXT: pcmpgtw %xmm1, %xmm2
+; SSE2-NEXT: movdqa %xmm6, %xmm1
+; SSE2-NEXT: pcmpgtw %xmm0, %xmm1
+; SSE2-NEXT: por %xmm6, %xmm1
+; SSE2-NEXT: por %xmm7, %xmm2
+; SSE2-NEXT: por %xmm8, %xmm3
+; SSE2-NEXT: por %xmm5, %xmm4
+; SSE2-NEXT: movdqa %xmm1, %xmm0
+; SSE2-NEXT: movdqa %xmm2, %xmm1
+; SSE2-NEXT: movdqa %xmm3, %xmm2
+; SSE2-NEXT: movdqa %xmm4, %xmm3
; SSE2-NEXT: retq
;
; SSSE3-LABEL: test35:
; SSSE3: # %bb.0:
-; SSSE3-NEXT: movdqa {{.*#+}} xmm4 = [32768,32768,32768,32768,32768,32768,32768,32768]
-; SSSE3-NEXT: pxor %xmm4, %xmm0
-; SSSE3-NEXT: pxor %xmm4, %xmm1
-; SSSE3-NEXT: pxor %xmm4, %xmm2
-; SSSE3-NEXT: pxor %xmm4, %xmm3
-; SSSE3-NEXT: movdqa %xmm3, %xmm5
-; SSSE3-NEXT: pxor %xmm4, %xmm5
-; SSSE3-NEXT: movdqa %xmm3, %xmm8
-; SSSE3-NEXT: pcmpgtw %xmm5, %xmm8
-; SSSE3-NEXT: movdqa %xmm2, %xmm6
-; SSSE3-NEXT: pxor %xmm4, %xmm6
-; SSSE3-NEXT: movdqa %xmm2, %xmm5
-; SSSE3-NEXT: pcmpgtw %xmm6, %xmm5
+; SSSE3-NEXT: movdqa {{.*#+}} xmm5 = [32768,32768,32768,32768,32768,32768,32768,32768]
+; SSSE3-NEXT: movdqa %xmm0, %xmm6
+; SSSE3-NEXT: pxor %xmm5, %xmm6
; SSSE3-NEXT: movdqa %xmm1, %xmm7
-; SSSE3-NEXT: pxor %xmm4, %xmm7
-; SSSE3-NEXT: movdqa %xmm1, %xmm6
-; SSSE3-NEXT: pcmpgtw %xmm7, %xmm6
-; SSSE3-NEXT: pxor %xmm0, %xmm4
-; SSSE3-NEXT: movdqa %xmm0, %xmm7
-; SSSE3-NEXT: pcmpgtw %xmm4, %xmm7
-; SSSE3-NEXT: por %xmm0, %xmm7
-; SSSE3-NEXT: por %xmm1, %xmm6
-; SSSE3-NEXT: por %xmm2, %xmm5
-; SSSE3-NEXT: por %xmm3, %xmm8
-; SSSE3-NEXT: movdqa %xmm7, %xmm0
-; SSSE3-NEXT: movdqa %xmm6, %xmm1
-; SSSE3-NEXT: movdqa %xmm5, %xmm2
+; SSSE3-NEXT: pxor %xmm5, %xmm7
+; SSSE3-NEXT: movdqa %xmm2, %xmm8
+; SSSE3-NEXT: pxor %xmm5, %xmm8
+; SSSE3-NEXT: pxor %xmm3, %xmm5
+; SSSE3-NEXT: movdqa %xmm5, %xmm4
+; SSSE3-NEXT: pcmpgtw %xmm3, %xmm4
; SSSE3-NEXT: movdqa %xmm8, %xmm3
+; SSSE3-NEXT: pcmpgtw %xmm2, %xmm3
+; SSSE3-NEXT: movdqa %xmm7, %xmm2
+; SSSE3-NEXT: pcmpgtw %xmm1, %xmm2
+; SSSE3-NEXT: movdqa %xmm6, %xmm1
+; SSSE3-NEXT: pcmpgtw %xmm0, %xmm1
+; SSSE3-NEXT: por %xmm6, %xmm1
+; SSSE3-NEXT: por %xmm7, %xmm2
+; SSSE3-NEXT: por %xmm8, %xmm3
+; SSSE3-NEXT: por %xmm5, %xmm4
+; SSSE3-NEXT: movdqa %xmm1, %xmm0
+; SSSE3-NEXT: movdqa %xmm2, %xmm1
+; SSSE3-NEXT: movdqa %xmm3, %xmm2
+; SSSE3-NEXT: movdqa %xmm4, %xmm3
; SSSE3-NEXT: retq
;
; SSE41-LABEL: test35:
Modified: llvm/trunk/test/CodeGen/X86/psubus.ll
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/psubus.ll?rev=344487&r1=344486&r2=344487&view=diff
==============================================================================
--- llvm/trunk/test/CodeGen/X86/psubus.ll (original)
+++ llvm/trunk/test/CodeGen/X86/psubus.ll Sun Oct 14 18:51:58 2018
@@ -792,7 +792,7 @@ define <16 x i8> @test14(<16 x i8> %x, <
; AVX1-NEXT: vpsubd %xmm9, %xmm1, %xmm1
; AVX1-NEXT: vpsubd %xmm11, %xmm2, %xmm2
; AVX1-NEXT: vpsubd %xmm0, %xmm6, %xmm0
-; AVX1-NEXT: vmovdqa {{.*#+}} xmm5 = [255,0,0,0,255,0,0,0,255,0,0,0,255,0,0,0]
+; AVX1-NEXT: vbroadcastss {{.*#+}} xmm5 = [3.57331108E-43,3.57331108E-43,3.57331108E-43,3.57331108E-43]
; AVX1-NEXT: vpand %xmm5, %xmm0, %xmm0
; AVX1-NEXT: vpand %xmm5, %xmm2, %xmm2
; AVX1-NEXT: vpackusdw %xmm0, %xmm2, %xmm0
Modified: llvm/trunk/test/CodeGen/X86/sat-add.ll
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/sat-add.ll?rev=344487&r1=344486&r2=344487&view=diff
==============================================================================
--- llvm/trunk/test/CodeGen/X86/sat-add.ll (original)
+++ llvm/trunk/test/CodeGen/X86/sat-add.ll Sun Oct 14 18:51:58 2018
@@ -679,13 +679,12 @@ define <16 x i8> @unsigned_sat_variable_
define <8 x i16> @unsigned_sat_variable_v8i16_using_min(<8 x i16> %x, <8 x i16> %y) {
; SSE2-LABEL: unsigned_sat_variable_v8i16_using_min:
; SSE2: # %bb.0:
-; SSE2-NEXT: pcmpeqd %xmm2, %xmm2
-; SSE2-NEXT: movdqa {{.*#+}} xmm3 = [32768,32768,32768,32768,32768,32768,32768,32768]
-; SSE2-NEXT: pxor %xmm3, %xmm0
-; SSE2-NEXT: pxor %xmm3, %xmm2
-; SSE2-NEXT: pxor %xmm1, %xmm2
-; SSE2-NEXT: pminsw %xmm2, %xmm0
-; SSE2-NEXT: pxor %xmm3, %xmm0
+; SSE2-NEXT: movdqa {{.*#+}} xmm2 = [32768,32768,32768,32768,32768,32768,32768,32768]
+; SSE2-NEXT: pxor %xmm2, %xmm0
+; SSE2-NEXT: movdqa {{.*#+}} xmm3 = [32767,32767,32767,32767,32767,32767,32767,32767]
+; SSE2-NEXT: pxor %xmm1, %xmm3
+; SSE2-NEXT: pminsw %xmm3, %xmm0
+; SSE2-NEXT: pxor %xmm2, %xmm0
; SSE2-NEXT: paddw %xmm1, %xmm0
; SSE2-NEXT: retq
;
@@ -717,15 +716,12 @@ define <8 x i16> @unsigned_sat_variable_
define <8 x i16> @unsigned_sat_variable_v8i16_using_cmp_notval(<8 x i16> %x, <8 x i16> %y) {
; SSE2-LABEL: unsigned_sat_variable_v8i16_using_cmp_notval:
; SSE2: # %bb.0:
-; SSE2-NEXT: pcmpeqd %xmm2, %xmm2
-; SSE2-NEXT: movdqa %xmm0, %xmm3
-; SSE2-NEXT: paddw %xmm1, %xmm3
-; SSE2-NEXT: movdqa {{.*#+}} xmm4 = [32768,32768,32768,32768,32768,32768,32768,32768]
-; SSE2-NEXT: pxor %xmm4, %xmm0
-; SSE2-NEXT: pxor %xmm4, %xmm2
-; SSE2-NEXT: pxor %xmm1, %xmm2
-; SSE2-NEXT: pcmpgtw %xmm2, %xmm0
-; SSE2-NEXT: por %xmm3, %xmm0
+; SSE2-NEXT: movdqa %xmm0, %xmm2
+; SSE2-NEXT: paddw %xmm1, %xmm2
+; SSE2-NEXT: pxor {{.*}}(%rip), %xmm1
+; SSE2-NEXT: pxor {{.*}}(%rip), %xmm0
+; SSE2-NEXT: pcmpgtw %xmm1, %xmm0
+; SSE2-NEXT: por %xmm2, %xmm0
; SSE2-NEXT: retq
;
; SSE41-LABEL: unsigned_sat_variable_v8i16_using_cmp_notval:
@@ -750,17 +746,15 @@ define <4 x i32> @unsigned_sat_variable_
; SSE2-LABEL: unsigned_sat_variable_v4i32_using_min:
; SSE2: # %bb.0:
; SSE2-NEXT: pcmpeqd %xmm2, %xmm2
+; SSE2-NEXT: pxor %xmm1, %xmm2
; SSE2-NEXT: movdqa {{.*#+}} xmm3 = [2147483648,2147483648,2147483648,2147483648]
-; SSE2-NEXT: movdqa %xmm0, %xmm4
-; SSE2-NEXT: pxor %xmm3, %xmm4
-; SSE2-NEXT: pxor %xmm2, %xmm3
-; SSE2-NEXT: pxor %xmm1, %xmm3
-; SSE2-NEXT: pcmpgtd %xmm4, %xmm3
-; SSE2-NEXT: pand %xmm3, %xmm0
-; SSE2-NEXT: pxor %xmm2, %xmm3
-; SSE2-NEXT: movdqa %xmm1, %xmm2
-; SSE2-NEXT: pandn %xmm3, %xmm2
-; SSE2-NEXT: por %xmm2, %xmm0
+; SSE2-NEXT: pxor %xmm0, %xmm3
+; SSE2-NEXT: movdqa {{.*#+}} xmm4 = [2147483647,2147483647,2147483647,2147483647]
+; SSE2-NEXT: pxor %xmm1, %xmm4
+; SSE2-NEXT: pcmpgtd %xmm3, %xmm4
+; SSE2-NEXT: pand %xmm4, %xmm0
+; SSE2-NEXT: pandn %xmm2, %xmm4
+; SSE2-NEXT: por %xmm4, %xmm0
; SSE2-NEXT: paddd %xmm1, %xmm0
; SSE2-NEXT: retq
;
@@ -809,15 +803,12 @@ define <4 x i32> @unsigned_sat_variable_
define <4 x i32> @unsigned_sat_variable_v4i32_using_cmp_notval(<4 x i32> %x, <4 x i32> %y) {
; SSE2-LABEL: unsigned_sat_variable_v4i32_using_cmp_notval:
; SSE2: # %bb.0:
-; SSE2-NEXT: pcmpeqd %xmm2, %xmm2
-; SSE2-NEXT: movdqa %xmm0, %xmm3
-; SSE2-NEXT: paddd %xmm1, %xmm3
-; SSE2-NEXT: movdqa {{.*#+}} xmm4 = [2147483648,2147483648,2147483648,2147483648]
-; SSE2-NEXT: pxor %xmm4, %xmm0
-; SSE2-NEXT: pxor %xmm4, %xmm2
-; SSE2-NEXT: pxor %xmm1, %xmm2
-; SSE2-NEXT: pcmpgtd %xmm2, %xmm0
-; SSE2-NEXT: por %xmm3, %xmm0
+; SSE2-NEXT: movdqa %xmm0, %xmm2
+; SSE2-NEXT: paddd %xmm1, %xmm2
+; SSE2-NEXT: pxor {{.*}}(%rip), %xmm1
+; SSE2-NEXT: pxor {{.*}}(%rip), %xmm0
+; SSE2-NEXT: pcmpgtd %xmm1, %xmm0
+; SSE2-NEXT: por %xmm2, %xmm0
; SSE2-NEXT: retq
;
; SSE41-LABEL: unsigned_sat_variable_v4i32_using_cmp_notval:
Modified: llvm/trunk/test/CodeGen/X86/setcc-lowering.ll
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/setcc-lowering.ll?rev=344487&r1=344486&r2=344487&view=diff
==============================================================================
--- llvm/trunk/test/CodeGen/X86/setcc-lowering.ll (original)
+++ llvm/trunk/test/CodeGen/X86/setcc-lowering.ll Sun Oct 14 18:51:58 2018
@@ -45,19 +45,17 @@ define void @pr26232(i64 %a, <16 x i1> %
; AVX-LABEL: pr26232:
; AVX: # %bb.0: # %allocas
; AVX-NEXT: vpxor %xmm1, %xmm1, %xmm1
-; AVX-NEXT: vmovdqa {{.*#+}} xmm2 = [128,128,128,128,128,128,128,128,128,128,128,128,128,128,128,128]
; AVX-NEXT: .p2align 4, 0x90
; AVX-NEXT: .LBB1_1: # %for_loop599
; AVX-NEXT: # =>This Inner Loop Header: Depth=1
; AVX-NEXT: xorl %eax, %eax
; AVX-NEXT: cmpq $65536, %rdi # imm = 0x10000
; AVX-NEXT: setl %al
-; AVX-NEXT: vmovd %eax, %xmm3
-; AVX-NEXT: vpshufb %xmm1, %xmm3, %xmm3
-; AVX-NEXT: vpand %xmm0, %xmm3, %xmm3
-; AVX-NEXT: vpsllw $7, %xmm3, %xmm3
-; AVX-NEXT: vpand %xmm2, %xmm3, %xmm3
-; AVX-NEXT: vpmovmskb %xmm3, %eax
+; AVX-NEXT: vmovd %eax, %xmm2
+; AVX-NEXT: vpshufb %xmm1, %xmm2, %xmm2
+; AVX-NEXT: vpand %xmm0, %xmm2, %xmm2
+; AVX-NEXT: vpsllw $7, %xmm2, %xmm2
+; AVX-NEXT: vpmovmskb %xmm2, %eax
; AVX-NEXT: testw %ax, %ax
; AVX-NEXT: jne .LBB1_1
; AVX-NEXT: # %bb.2: # %for_exit600
Modified: llvm/trunk/test/CodeGen/X86/sse2-intrinsics-canonical.ll
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/sse2-intrinsics-canonical.ll?rev=344487&r1=344486&r2=344487&view=diff
==============================================================================
--- llvm/trunk/test/CodeGen/X86/sse2-intrinsics-canonical.ll (original)
+++ llvm/trunk/test/CodeGen/X86/sse2-intrinsics-canonical.ll Sun Oct 14 18:51:58 2018
@@ -198,9 +198,9 @@ define <8 x i8> @test_x86_sse2_psubus_b_
;
; AVX2-LABEL: test_x86_sse2_psubus_b_64:
; AVX2: ## %bb.0:
-; AVX2-NEXT: vmovdqa {{.*#+}} xmm2 = [255,0,255,0,255,0,255,0,255,0,255,0,255,0,255,0]
-; AVX2-NEXT: ## encoding: [0xc5,0xf9,0x6f,0x15,A,A,A,A]
-; AVX2-NEXT: ## fixup A - offset: 4, value: LCPI6_0, kind: FK_Data_4
+; AVX2-NEXT: vpbroadcastw {{.*#+}} xmm2 = [255,255,255,255,255,255,255,255]
+; AVX2-NEXT: ## encoding: [0xc4,0xe2,0x79,0x79,0x15,A,A,A,A]
+; AVX2-NEXT: ## fixup A - offset: 5, value: LCPI6_0, kind: FK_Data_4
; AVX2-NEXT: vpand %xmm2, %xmm1, %xmm3 ## encoding: [0xc5,0xf1,0xdb,0xda]
; AVX2-NEXT: vpand %xmm2, %xmm0, %xmm0 ## encoding: [0xc5,0xf9,0xdb,0xc2]
; AVX2-NEXT: vpmaxuw %xmm3, %xmm0, %xmm0 ## encoding: [0xc4,0xe2,0x79,0x3e,0xc3]
@@ -209,9 +209,9 @@ define <8 x i8> @test_x86_sse2_psubus_b_
;
; SKX-LABEL: test_x86_sse2_psubus_b_64:
; SKX: ## %bb.0:
-; SKX-NEXT: vmovdqa LCPI6_0, %xmm2 ## EVEX TO VEX Compression xmm2 = [255,0,255,0,255,0,255,0,255,0,255,0,255,0,255,0]
-; SKX-NEXT: ## encoding: [0xc5,0xf9,0x6f,0x15,A,A,A,A]
-; SKX-NEXT: ## fixup A - offset: 4, value: LCPI6_0, kind: FK_Data_4
+; SKX-NEXT: vpbroadcastw LCPI6_0, %xmm2 ## EVEX TO VEX Compression xmm2 = [255,255,255,255,255,255,255,255]
+; SKX-NEXT: ## encoding: [0xc4,0xe2,0x79,0x79,0x15,A,A,A,A]
+; SKX-NEXT: ## fixup A - offset: 5, value: LCPI6_0, kind: FK_Data_4
; SKX-NEXT: vpand %xmm2, %xmm1, %xmm3 ## EVEX TO VEX Compression encoding: [0xc5,0xf1,0xdb,0xda]
; SKX-NEXT: vpand %xmm2, %xmm0, %xmm0 ## EVEX TO VEX Compression encoding: [0xc5,0xf9,0xdb,0xc2]
; SKX-NEXT: vpmaxuw %xmm3, %xmm0, %xmm0 ## EVEX TO VEX Compression encoding: [0xc4,0xe2,0x79,0x3e,0xc3]
Modified: llvm/trunk/test/CodeGen/X86/unfold-masked-merge-vector-variablemask-const.ll
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/unfold-masked-merge-vector-variablemask-const.ll?rev=344487&r1=344486&r2=344487&view=diff
==============================================================================
--- llvm/trunk/test/CodeGen/X86/unfold-masked-merge-vector-variablemask-const.ll (original)
+++ llvm/trunk/test/CodeGen/X86/unfold-masked-merge-vector-variablemask-const.ll Sun Oct 14 18:51:58 2018
@@ -132,9 +132,9 @@ define <4 x i32> @in_constant_varx_mone_
;
; CHECK-SSE2-LABEL: in_constant_varx_mone_invmask:
; CHECK-SSE2: # %bb.0:
-; CHECK-SSE2-NEXT: movdqa (%rdi), %xmm0
+; CHECK-SSE2-NEXT: movdqa (%rdx), %xmm0
; CHECK-SSE2-NEXT: pcmpeqd %xmm1, %xmm1
-; CHECK-SSE2-NEXT: movdqa (%rdx), %xmm2
+; CHECK-SSE2-NEXT: movdqa (%rdi), %xmm2
; CHECK-SSE2-NEXT: pxor %xmm1, %xmm2
; CHECK-SSE2-NEXT: pandn %xmm2, %xmm0
; CHECK-SSE2-NEXT: pxor %xmm1, %xmm0
@@ -142,9 +142,9 @@ define <4 x i32> @in_constant_varx_mone_
;
; CHECK-XOP-LABEL: in_constant_varx_mone_invmask:
; CHECK-XOP: # %bb.0:
-; CHECK-XOP-NEXT: vmovdqa (%rdi), %xmm0
+; CHECK-XOP-NEXT: vmovdqa (%rdx), %xmm0
; CHECK-XOP-NEXT: vpcmpeqd %xmm1, %xmm1, %xmm1
-; CHECK-XOP-NEXT: vpxor (%rdx), %xmm1, %xmm2
+; CHECK-XOP-NEXT: vpxor (%rdi), %xmm1, %xmm2
; CHECK-XOP-NEXT: vpandn %xmm2, %xmm0, %xmm0
; CHECK-XOP-NEXT: vpxor %xmm1, %xmm0, %xmm0
; CHECK-XOP-NEXT: retq
Modified: llvm/trunk/test/CodeGen/X86/v8i1-masks.ll
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/v8i1-masks.ll?rev=344487&r1=344486&r2=344487&view=diff
==============================================================================
--- llvm/trunk/test/CodeGen/X86/v8i1-masks.ll (original)
+++ llvm/trunk/test/CodeGen/X86/v8i1-masks.ll Sun Oct 14 18:51:58 2018
@@ -44,10 +44,9 @@ define void @and_masks(<8 x float>* %a,
; X32-AVX2-NEXT: vcmpltps %ymm0, %ymm1, %ymm1
; X32-AVX2-NEXT: vmovups (%eax), %ymm2
; X32-AVX2-NEXT: vcmpltps %ymm0, %ymm2, %ymm0
-; X32-AVX2-NEXT: vbroadcastss {{.*#+}} ymm2 = [1,1,1,1,1,1,1,1]
-; X32-AVX2-NEXT: vandps %ymm2, %ymm1, %ymm1
; X32-AVX2-NEXT: vandps %ymm1, %ymm0, %ymm0
-; X32-AVX2-NEXT: vmovaps %ymm0, (%eax)
+; X32-AVX2-NEXT: vpsrld $31, %ymm0, %ymm0
+; X32-AVX2-NEXT: vmovdqa %ymm0, (%eax)
; X32-AVX2-NEXT: vzeroupper
; X32-AVX2-NEXT: retl
;
@@ -58,10 +57,9 @@ define void @and_masks(<8 x float>* %a,
; X64-AVX2-NEXT: vcmpltps %ymm0, %ymm1, %ymm1
; X64-AVX2-NEXT: vmovups (%rdx), %ymm2
; X64-AVX2-NEXT: vcmpltps %ymm0, %ymm2, %ymm0
-; X64-AVX2-NEXT: vbroadcastss {{.*#+}} ymm2 = [1,1,1,1,1,1,1,1]
-; X64-AVX2-NEXT: vandps %ymm2, %ymm1, %ymm1
; X64-AVX2-NEXT: vandps %ymm1, %ymm0, %ymm0
-; X64-AVX2-NEXT: vmovaps %ymm0, (%rax)
+; X64-AVX2-NEXT: vpsrld $31, %ymm0, %ymm0
+; X64-AVX2-NEXT: vmovdqa %ymm0, (%rax)
; X64-AVX2-NEXT: vzeroupper
; X64-AVX2-NEXT: retq
%v0 = load <8 x float>, <8 x float>* %a, align 16
Modified: llvm/trunk/test/CodeGen/X86/vector-blend.ll
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/vector-blend.ll?rev=344487&r1=344486&r2=344487&view=diff
==============================================================================
--- llvm/trunk/test/CodeGen/X86/vector-blend.ll (original)
+++ llvm/trunk/test/CodeGen/X86/vector-blend.ll Sun Oct 14 18:51:58 2018
@@ -629,7 +629,7 @@ define <32 x i8> @constant_pblendvb_avx2
;
; AVX1-LABEL: constant_pblendvb_avx2:
; AVX1: # %bb.0: # %entry
-; AVX1-NEXT: vmovaps {{.*#+}} ymm2 = [255,255,0,255,0,0,0,255,255,255,0,255,0,0,0,255,255,255,0,255,0,0,0,255,255,255,0,255,0,0,0,255]
+; AVX1-NEXT: vbroadcastsd {{.*#+}} ymm2 = [-5.4861292804117373E+303,-5.4861292804117373E+303,-5.4861292804117373E+303,-5.4861292804117373E+303]
; AVX1-NEXT: vandnps %ymm0, %ymm2, %ymm0
; AVX1-NEXT: vandps %ymm2, %ymm1, %ymm1
; AVX1-NEXT: vorps %ymm0, %ymm1, %ymm0
Modified: llvm/trunk/test/CodeGen/X86/vector-reduce-umax.ll
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/vector-reduce-umax.ll?rev=344487&r1=344486&r2=344487&view=diff
==============================================================================
--- llvm/trunk/test/CodeGen/X86/vector-reduce-umax.ll (original)
+++ llvm/trunk/test/CodeGen/X86/vector-reduce-umax.ll Sun Oct 14 18:51:58 2018
@@ -1141,15 +1141,14 @@ define i16 @test_v8i16(<8 x i16> %a0) {
; SSE2-NEXT: pxor %xmm2, %xmm0
; SSE2-NEXT: pxor %xmm2, %xmm1
; SSE2-NEXT: pmaxsw %xmm0, %xmm1
-; SSE2-NEXT: pxor %xmm2, %xmm1
-; SSE2-NEXT: pshufd {{.*#+}} xmm0 = xmm1[1,1,2,3]
-; SSE2-NEXT: pxor %xmm2, %xmm1
+; SSE2-NEXT: movdqa %xmm1, %xmm0
; SSE2-NEXT: pxor %xmm2, %xmm0
-; SSE2-NEXT: pmaxsw %xmm1, %xmm0
+; SSE2-NEXT: pshufd {{.*#+}} xmm0 = xmm0[1,1,2,3]
; SSE2-NEXT: pxor %xmm2, %xmm0
+; SSE2-NEXT: pmaxsw %xmm1, %xmm0
; SSE2-NEXT: movdqa %xmm0, %xmm1
+; SSE2-NEXT: pxor %xmm2, %xmm1
; SSE2-NEXT: psrld $16, %xmm1
-; SSE2-NEXT: pxor %xmm2, %xmm0
; SSE2-NEXT: pxor %xmm2, %xmm1
; SSE2-NEXT: pmaxsw %xmm0, %xmm1
; SSE2-NEXT: pxor %xmm2, %xmm1
@@ -1207,20 +1206,19 @@ define i16 @test_v16i16(<16 x i16> %a0)
; SSE2-NEXT: pxor %xmm2, %xmm1
; SSE2-NEXT: pxor %xmm2, %xmm0
; SSE2-NEXT: pmaxsw %xmm1, %xmm0
-; SSE2-NEXT: pxor %xmm2, %xmm0
-; SSE2-NEXT: pshufd {{.*#+}} xmm1 = xmm0[2,3,0,1]
-; SSE2-NEXT: pxor %xmm2, %xmm0
-; SSE2-NEXT: pxor %xmm2, %xmm1
-; SSE2-NEXT: pmaxsw %xmm0, %xmm1
+; SSE2-NEXT: movdqa %xmm0, %xmm1
; SSE2-NEXT: pxor %xmm2, %xmm1
-; SSE2-NEXT: pshufd {{.*#+}} xmm0 = xmm1[1,1,2,3]
+; SSE2-NEXT: pshufd {{.*#+}} xmm1 = xmm1[2,3,0,1]
; SSE2-NEXT: pxor %xmm2, %xmm1
+; SSE2-NEXT: pmaxsw %xmm0, %xmm1
+; SSE2-NEXT: movdqa %xmm1, %xmm0
; SSE2-NEXT: pxor %xmm2, %xmm0
-; SSE2-NEXT: pmaxsw %xmm1, %xmm0
+; SSE2-NEXT: pshufd {{.*#+}} xmm0 = xmm0[1,1,2,3]
; SSE2-NEXT: pxor %xmm2, %xmm0
+; SSE2-NEXT: pmaxsw %xmm1, %xmm0
; SSE2-NEXT: movdqa %xmm0, %xmm1
+; SSE2-NEXT: pxor %xmm2, %xmm1
; SSE2-NEXT: psrld $16, %xmm1
-; SSE2-NEXT: pxor %xmm2, %xmm0
; SSE2-NEXT: pxor %xmm2, %xmm1
; SSE2-NEXT: pmaxsw %xmm0, %xmm1
; SSE2-NEXT: pxor %xmm2, %xmm1
@@ -1296,35 +1294,30 @@ define i16 @test_v32i16(<32 x i16> %a0)
; SSE2-LABEL: test_v32i16:
; SSE2: # %bb.0:
; SSE2-NEXT: movdqa {{.*#+}} xmm4 = [32768,32768,32768,32768,32768,32768,32768,32768]
-; SSE2-NEXT: pxor %xmm4, %xmm2
-; SSE2-NEXT: pxor %xmm4, %xmm0
-; SSE2-NEXT: pmaxsw %xmm2, %xmm0
; SSE2-NEXT: pxor %xmm4, %xmm3
; SSE2-NEXT: pxor %xmm4, %xmm1
; SSE2-NEXT: pmaxsw %xmm3, %xmm1
-; SSE2-NEXT: movdqa %xmm4, %xmm2
-; SSE2-NEXT: pxor %xmm4, %xmm2
-; SSE2-NEXT: pxor %xmm2, %xmm1
-; SSE2-NEXT: pxor %xmm0, %xmm2
-; SSE2-NEXT: pmaxsw %xmm1, %xmm2
-; SSE2-NEXT: pxor %xmm4, %xmm2
-; SSE2-NEXT: pshufd {{.*#+}} xmm0 = xmm2[2,3,0,1]
; SSE2-NEXT: pxor %xmm4, %xmm2
; SSE2-NEXT: pxor %xmm4, %xmm0
; SSE2-NEXT: pmaxsw %xmm2, %xmm0
-; SSE2-NEXT: pxor %xmm4, %xmm0
-; SSE2-NEXT: pshufd {{.*#+}} xmm1 = xmm0[1,1,2,3]
-; SSE2-NEXT: pxor %xmm4, %xmm0
+; SSE2-NEXT: pmaxsw %xmm1, %xmm0
+; SSE2-NEXT: movdqa %xmm0, %xmm1
; SSE2-NEXT: pxor %xmm4, %xmm1
-; SSE2-NEXT: pmaxsw %xmm0, %xmm1
+; SSE2-NEXT: pshufd {{.*#+}} xmm1 = xmm1[2,3,0,1]
; SSE2-NEXT: pxor %xmm4, %xmm1
+; SSE2-NEXT: pmaxsw %xmm0, %xmm1
; SSE2-NEXT: movdqa %xmm1, %xmm0
-; SSE2-NEXT: psrld $16, %xmm0
-; SSE2-NEXT: pxor %xmm4, %xmm1
; SSE2-NEXT: pxor %xmm4, %xmm0
-; SSE2-NEXT: pmaxsw %xmm1, %xmm0
+; SSE2-NEXT: pshufd {{.*#+}} xmm0 = xmm0[1,1,2,3]
; SSE2-NEXT: pxor %xmm4, %xmm0
-; SSE2-NEXT: movd %xmm0, %eax
+; SSE2-NEXT: pmaxsw %xmm1, %xmm0
+; SSE2-NEXT: movdqa %xmm0, %xmm1
+; SSE2-NEXT: pxor %xmm4, %xmm1
+; SSE2-NEXT: psrld $16, %xmm1
+; SSE2-NEXT: pxor %xmm4, %xmm1
+; SSE2-NEXT: pmaxsw %xmm0, %xmm1
+; SSE2-NEXT: pxor %xmm4, %xmm1
+; SSE2-NEXT: movd %xmm1, %eax
; SSE2-NEXT: # kill: def $ax killed $ax killed $eax
; SSE2-NEXT: retq
;
@@ -1406,47 +1399,38 @@ define i16 @test_v64i16(<64 x i16> %a0)
; SSE2-LABEL: test_v64i16:
; SSE2: # %bb.0:
; SSE2-NEXT: movdqa {{.*#+}} xmm8 = [32768,32768,32768,32768,32768,32768,32768,32768]
-; SSE2-NEXT: pxor %xmm8, %xmm5
-; SSE2-NEXT: pxor %xmm8, %xmm1
-; SSE2-NEXT: pmaxsw %xmm5, %xmm1
-; SSE2-NEXT: pxor %xmm8, %xmm7
-; SSE2-NEXT: pxor %xmm8, %xmm3
-; SSE2-NEXT: pmaxsw %xmm7, %xmm3
-; SSE2-NEXT: pxor %xmm8, %xmm4
-; SSE2-NEXT: pxor %xmm8, %xmm0
-; SSE2-NEXT: pmaxsw %xmm4, %xmm0
; SSE2-NEXT: pxor %xmm8, %xmm6
; SSE2-NEXT: pxor %xmm8, %xmm2
; SSE2-NEXT: pmaxsw %xmm6, %xmm2
-; SSE2-NEXT: movdqa %xmm8, %xmm4
; SSE2-NEXT: pxor %xmm8, %xmm4
-; SSE2-NEXT: pxor %xmm4, %xmm2
-; SSE2-NEXT: pxor %xmm4, %xmm0
+; SSE2-NEXT: pxor %xmm8, %xmm0
+; SSE2-NEXT: pmaxsw %xmm4, %xmm0
; SSE2-NEXT: pmaxsw %xmm2, %xmm0
-; SSE2-NEXT: pxor %xmm4, %xmm3
-; SSE2-NEXT: pxor %xmm4, %xmm1
+; SSE2-NEXT: pxor %xmm8, %xmm7
+; SSE2-NEXT: pxor %xmm8, %xmm3
+; SSE2-NEXT: pmaxsw %xmm7, %xmm3
+; SSE2-NEXT: pxor %xmm8, %xmm5
+; SSE2-NEXT: pxor %xmm8, %xmm1
+; SSE2-NEXT: pmaxsw %xmm5, %xmm1
; SSE2-NEXT: pmaxsw %xmm3, %xmm1
-; SSE2-NEXT: pxor %xmm4, %xmm1
-; SSE2-NEXT: pxor %xmm4, %xmm0
-; SSE2-NEXT: pmaxsw %xmm1, %xmm0
+; SSE2-NEXT: pmaxsw %xmm0, %xmm1
+; SSE2-NEXT: movdqa %xmm1, %xmm0
; SSE2-NEXT: pxor %xmm8, %xmm0
-; SSE2-NEXT: pshufd {{.*#+}} xmm1 = xmm0[2,3,0,1]
+; SSE2-NEXT: pshufd {{.*#+}} xmm0 = xmm0[2,3,0,1]
; SSE2-NEXT: pxor %xmm8, %xmm0
+; SSE2-NEXT: pmaxsw %xmm1, %xmm0
+; SSE2-NEXT: movdqa %xmm0, %xmm1
; SSE2-NEXT: pxor %xmm8, %xmm1
-; SSE2-NEXT: pmaxsw %xmm0, %xmm1
-; SSE2-NEXT: pxor %xmm8, %xmm1
-; SSE2-NEXT: pshufd {{.*#+}} xmm0 = xmm1[1,1,2,3]
+; SSE2-NEXT: pshufd {{.*#+}} xmm1 = xmm1[1,1,2,3]
; SSE2-NEXT: pxor %xmm8, %xmm1
+; SSE2-NEXT: pmaxsw %xmm0, %xmm1
+; SSE2-NEXT: movdqa %xmm1, %xmm0
; SSE2-NEXT: pxor %xmm8, %xmm0
-; SSE2-NEXT: pmaxsw %xmm1, %xmm0
+; SSE2-NEXT: psrld $16, %xmm0
; SSE2-NEXT: pxor %xmm8, %xmm0
-; SSE2-NEXT: movdqa %xmm0, %xmm1
-; SSE2-NEXT: psrld $16, %xmm1
+; SSE2-NEXT: pmaxsw %xmm1, %xmm0
; SSE2-NEXT: pxor %xmm8, %xmm0
-; SSE2-NEXT: pxor %xmm8, %xmm1
-; SSE2-NEXT: pmaxsw %xmm0, %xmm1
-; SSE2-NEXT: pxor %xmm8, %xmm1
-; SSE2-NEXT: movd %xmm1, %eax
+; SSE2-NEXT: movd %xmm0, %eax
; SSE2-NEXT: # kill: def $ax killed $ax killed $eax
; SSE2-NEXT: retq
;
Modified: llvm/trunk/test/CodeGen/X86/vector-reduce-umin.ll
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/vector-reduce-umin.ll?rev=344487&r1=344486&r2=344487&view=diff
==============================================================================
--- llvm/trunk/test/CodeGen/X86/vector-reduce-umin.ll (original)
+++ llvm/trunk/test/CodeGen/X86/vector-reduce-umin.ll Sun Oct 14 18:51:58 2018
@@ -1140,15 +1140,14 @@ define i16 @test_v8i16(<8 x i16> %a0) {
; SSE2-NEXT: pxor %xmm2, %xmm0
; SSE2-NEXT: pxor %xmm2, %xmm1
; SSE2-NEXT: pminsw %xmm0, %xmm1
-; SSE2-NEXT: pxor %xmm2, %xmm1
-; SSE2-NEXT: pshufd {{.*#+}} xmm0 = xmm1[1,1,2,3]
-; SSE2-NEXT: pxor %xmm2, %xmm1
+; SSE2-NEXT: movdqa %xmm1, %xmm0
; SSE2-NEXT: pxor %xmm2, %xmm0
-; SSE2-NEXT: pminsw %xmm1, %xmm0
+; SSE2-NEXT: pshufd {{.*#+}} xmm0 = xmm0[1,1,2,3]
; SSE2-NEXT: pxor %xmm2, %xmm0
+; SSE2-NEXT: pminsw %xmm1, %xmm0
; SSE2-NEXT: movdqa %xmm0, %xmm1
+; SSE2-NEXT: pxor %xmm2, %xmm1
; SSE2-NEXT: psrld $16, %xmm1
-; SSE2-NEXT: pxor %xmm2, %xmm0
; SSE2-NEXT: pxor %xmm2, %xmm1
; SSE2-NEXT: pminsw %xmm0, %xmm1
; SSE2-NEXT: pxor %xmm2, %xmm1
@@ -1187,20 +1186,19 @@ define i16 @test_v16i16(<16 x i16> %a0)
; SSE2-NEXT: pxor %xmm2, %xmm1
; SSE2-NEXT: pxor %xmm2, %xmm0
; SSE2-NEXT: pminsw %xmm1, %xmm0
-; SSE2-NEXT: pxor %xmm2, %xmm0
-; SSE2-NEXT: pshufd {{.*#+}} xmm1 = xmm0[2,3,0,1]
-; SSE2-NEXT: pxor %xmm2, %xmm0
-; SSE2-NEXT: pxor %xmm2, %xmm1
-; SSE2-NEXT: pminsw %xmm0, %xmm1
+; SSE2-NEXT: movdqa %xmm0, %xmm1
; SSE2-NEXT: pxor %xmm2, %xmm1
-; SSE2-NEXT: pshufd {{.*#+}} xmm0 = xmm1[1,1,2,3]
+; SSE2-NEXT: pshufd {{.*#+}} xmm1 = xmm1[2,3,0,1]
; SSE2-NEXT: pxor %xmm2, %xmm1
+; SSE2-NEXT: pminsw %xmm0, %xmm1
+; SSE2-NEXT: movdqa %xmm1, %xmm0
; SSE2-NEXT: pxor %xmm2, %xmm0
-; SSE2-NEXT: pminsw %xmm1, %xmm0
+; SSE2-NEXT: pshufd {{.*#+}} xmm0 = xmm0[1,1,2,3]
; SSE2-NEXT: pxor %xmm2, %xmm0
+; SSE2-NEXT: pminsw %xmm1, %xmm0
; SSE2-NEXT: movdqa %xmm0, %xmm1
+; SSE2-NEXT: pxor %xmm2, %xmm1
; SSE2-NEXT: psrld $16, %xmm1
-; SSE2-NEXT: pxor %xmm2, %xmm0
; SSE2-NEXT: pxor %xmm2, %xmm1
; SSE2-NEXT: pminsw %xmm0, %xmm1
; SSE2-NEXT: pxor %xmm2, %xmm1
@@ -1253,35 +1251,30 @@ define i16 @test_v32i16(<32 x i16> %a0)
; SSE2-LABEL: test_v32i16:
; SSE2: # %bb.0:
; SSE2-NEXT: movdqa {{.*#+}} xmm4 = [32768,32768,32768,32768,32768,32768,32768,32768]
-; SSE2-NEXT: pxor %xmm4, %xmm2
-; SSE2-NEXT: pxor %xmm4, %xmm0
-; SSE2-NEXT: pminsw %xmm2, %xmm0
; SSE2-NEXT: pxor %xmm4, %xmm3
; SSE2-NEXT: pxor %xmm4, %xmm1
; SSE2-NEXT: pminsw %xmm3, %xmm1
-; SSE2-NEXT: movdqa %xmm4, %xmm2
-; SSE2-NEXT: pxor %xmm4, %xmm2
-; SSE2-NEXT: pxor %xmm2, %xmm1
-; SSE2-NEXT: pxor %xmm0, %xmm2
-; SSE2-NEXT: pminsw %xmm1, %xmm2
-; SSE2-NEXT: pxor %xmm4, %xmm2
-; SSE2-NEXT: pshufd {{.*#+}} xmm0 = xmm2[2,3,0,1]
; SSE2-NEXT: pxor %xmm4, %xmm2
; SSE2-NEXT: pxor %xmm4, %xmm0
; SSE2-NEXT: pminsw %xmm2, %xmm0
-; SSE2-NEXT: pxor %xmm4, %xmm0
-; SSE2-NEXT: pshufd {{.*#+}} xmm1 = xmm0[1,1,2,3]
-; SSE2-NEXT: pxor %xmm4, %xmm0
+; SSE2-NEXT: pminsw %xmm1, %xmm0
+; SSE2-NEXT: movdqa %xmm0, %xmm1
; SSE2-NEXT: pxor %xmm4, %xmm1
-; SSE2-NEXT: pminsw %xmm0, %xmm1
+; SSE2-NEXT: pshufd {{.*#+}} xmm1 = xmm1[2,3,0,1]
; SSE2-NEXT: pxor %xmm4, %xmm1
+; SSE2-NEXT: pminsw %xmm0, %xmm1
; SSE2-NEXT: movdqa %xmm1, %xmm0
-; SSE2-NEXT: psrld $16, %xmm0
-; SSE2-NEXT: pxor %xmm4, %xmm1
; SSE2-NEXT: pxor %xmm4, %xmm0
-; SSE2-NEXT: pminsw %xmm1, %xmm0
+; SSE2-NEXT: pshufd {{.*#+}} xmm0 = xmm0[1,1,2,3]
; SSE2-NEXT: pxor %xmm4, %xmm0
-; SSE2-NEXT: movd %xmm0, %eax
+; SSE2-NEXT: pminsw %xmm1, %xmm0
+; SSE2-NEXT: movdqa %xmm0, %xmm1
+; SSE2-NEXT: pxor %xmm4, %xmm1
+; SSE2-NEXT: psrld $16, %xmm1
+; SSE2-NEXT: pxor %xmm4, %xmm1
+; SSE2-NEXT: pminsw %xmm0, %xmm1
+; SSE2-NEXT: pxor %xmm4, %xmm1
+; SSE2-NEXT: movd %xmm1, %eax
; SSE2-NEXT: # kill: def $ax killed $ax killed $eax
; SSE2-NEXT: retq
;
@@ -1338,47 +1331,38 @@ define i16 @test_v64i16(<64 x i16> %a0)
; SSE2-LABEL: test_v64i16:
; SSE2: # %bb.0:
; SSE2-NEXT: movdqa {{.*#+}} xmm8 = [32768,32768,32768,32768,32768,32768,32768,32768]
-; SSE2-NEXT: pxor %xmm8, %xmm5
-; SSE2-NEXT: pxor %xmm8, %xmm1
-; SSE2-NEXT: pminsw %xmm5, %xmm1
-; SSE2-NEXT: pxor %xmm8, %xmm7
-; SSE2-NEXT: pxor %xmm8, %xmm3
-; SSE2-NEXT: pminsw %xmm7, %xmm3
-; SSE2-NEXT: pxor %xmm8, %xmm4
-; SSE2-NEXT: pxor %xmm8, %xmm0
-; SSE2-NEXT: pminsw %xmm4, %xmm0
; SSE2-NEXT: pxor %xmm8, %xmm6
; SSE2-NEXT: pxor %xmm8, %xmm2
; SSE2-NEXT: pminsw %xmm6, %xmm2
-; SSE2-NEXT: movdqa %xmm8, %xmm4
; SSE2-NEXT: pxor %xmm8, %xmm4
-; SSE2-NEXT: pxor %xmm4, %xmm2
-; SSE2-NEXT: pxor %xmm4, %xmm0
+; SSE2-NEXT: pxor %xmm8, %xmm0
+; SSE2-NEXT: pminsw %xmm4, %xmm0
; SSE2-NEXT: pminsw %xmm2, %xmm0
-; SSE2-NEXT: pxor %xmm4, %xmm3
-; SSE2-NEXT: pxor %xmm4, %xmm1
+; SSE2-NEXT: pxor %xmm8, %xmm7
+; SSE2-NEXT: pxor %xmm8, %xmm3
+; SSE2-NEXT: pminsw %xmm7, %xmm3
+; SSE2-NEXT: pxor %xmm8, %xmm5
+; SSE2-NEXT: pxor %xmm8, %xmm1
+; SSE2-NEXT: pminsw %xmm5, %xmm1
; SSE2-NEXT: pminsw %xmm3, %xmm1
-; SSE2-NEXT: pxor %xmm4, %xmm1
-; SSE2-NEXT: pxor %xmm4, %xmm0
-; SSE2-NEXT: pminsw %xmm1, %xmm0
+; SSE2-NEXT: pminsw %xmm0, %xmm1
+; SSE2-NEXT: movdqa %xmm1, %xmm0
; SSE2-NEXT: pxor %xmm8, %xmm0
-; SSE2-NEXT: pshufd {{.*#+}} xmm1 = xmm0[2,3,0,1]
+; SSE2-NEXT: pshufd {{.*#+}} xmm0 = xmm0[2,3,0,1]
; SSE2-NEXT: pxor %xmm8, %xmm0
+; SSE2-NEXT: pminsw %xmm1, %xmm0
+; SSE2-NEXT: movdqa %xmm0, %xmm1
; SSE2-NEXT: pxor %xmm8, %xmm1
-; SSE2-NEXT: pminsw %xmm0, %xmm1
-; SSE2-NEXT: pxor %xmm8, %xmm1
-; SSE2-NEXT: pshufd {{.*#+}} xmm0 = xmm1[1,1,2,3]
+; SSE2-NEXT: pshufd {{.*#+}} xmm1 = xmm1[1,1,2,3]
; SSE2-NEXT: pxor %xmm8, %xmm1
+; SSE2-NEXT: pminsw %xmm0, %xmm1
+; SSE2-NEXT: movdqa %xmm1, %xmm0
; SSE2-NEXT: pxor %xmm8, %xmm0
-; SSE2-NEXT: pminsw %xmm1, %xmm0
+; SSE2-NEXT: psrld $16, %xmm0
; SSE2-NEXT: pxor %xmm8, %xmm0
-; SSE2-NEXT: movdqa %xmm0, %xmm1
-; SSE2-NEXT: psrld $16, %xmm1
+; SSE2-NEXT: pminsw %xmm1, %xmm0
; SSE2-NEXT: pxor %xmm8, %xmm0
-; SSE2-NEXT: pxor %xmm8, %xmm1
-; SSE2-NEXT: pminsw %xmm0, %xmm1
-; SSE2-NEXT: pxor %xmm8, %xmm1
-; SSE2-NEXT: movd %xmm1, %eax
+; SSE2-NEXT: movd %xmm0, %eax
; SSE2-NEXT: # kill: def $ax killed $ax killed $eax
; SSE2-NEXT: retq
;
Modified: llvm/trunk/test/CodeGen/X86/vector-shift-lshr-128.ll
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/vector-shift-lshr-128.ll?rev=344487&r1=344486&r2=344487&view=diff
==============================================================================
--- llvm/trunk/test/CodeGen/X86/vector-shift-lshr-128.ll (original)
+++ llvm/trunk/test/CodeGen/X86/vector-shift-lshr-128.ll Sun Oct 14 18:51:58 2018
@@ -373,8 +373,8 @@ define <16 x i8> @var_shift_v16i8(<16 x
; SSE2-NEXT: movdqa %xmm3, %xmm4
; SSE2-NEXT: pandn %xmm0, %xmm4
; SSE2-NEXT: psrlw $4, %xmm0
-; SSE2-NEXT: pand {{.*}}(%rip), %xmm0
; SSE2-NEXT: pand %xmm3, %xmm0
+; SSE2-NEXT: pand {{.*}}(%rip), %xmm0
; SSE2-NEXT: por %xmm4, %xmm0
; SSE2-NEXT: paddb %xmm1, %xmm1
; SSE2-NEXT: pxor %xmm3, %xmm3
@@ -382,16 +382,16 @@ define <16 x i8> @var_shift_v16i8(<16 x
; SSE2-NEXT: movdqa %xmm3, %xmm4
; SSE2-NEXT: pandn %xmm0, %xmm4
; SSE2-NEXT: psrlw $2, %xmm0
-; SSE2-NEXT: pand {{.*}}(%rip), %xmm0
; SSE2-NEXT: pand %xmm3, %xmm0
+; SSE2-NEXT: pand {{.*}}(%rip), %xmm0
; SSE2-NEXT: por %xmm4, %xmm0
; SSE2-NEXT: paddb %xmm1, %xmm1
; SSE2-NEXT: pcmpgtb %xmm1, %xmm2
; SSE2-NEXT: movdqa %xmm2, %xmm1
; SSE2-NEXT: pandn %xmm0, %xmm1
; SSE2-NEXT: psrlw $1, %xmm0
-; SSE2-NEXT: pand {{.*}}(%rip), %xmm0
; SSE2-NEXT: pand %xmm2, %xmm0
+; SSE2-NEXT: pand {{.*}}(%rip), %xmm0
; SSE2-NEXT: por %xmm1, %xmm0
; SSE2-NEXT: retq
;
@@ -488,8 +488,8 @@ define <16 x i8> @var_shift_v16i8(<16 x
; X32-SSE-NEXT: movdqa %xmm3, %xmm4
; X32-SSE-NEXT: pandn %xmm0, %xmm4
; X32-SSE-NEXT: psrlw $4, %xmm0
-; X32-SSE-NEXT: pand {{\.LCPI.*}}, %xmm0
; X32-SSE-NEXT: pand %xmm3, %xmm0
+; X32-SSE-NEXT: pand {{\.LCPI.*}}, %xmm0
; X32-SSE-NEXT: por %xmm4, %xmm0
; X32-SSE-NEXT: paddb %xmm1, %xmm1
; X32-SSE-NEXT: pxor %xmm3, %xmm3
@@ -497,16 +497,16 @@ define <16 x i8> @var_shift_v16i8(<16 x
; X32-SSE-NEXT: movdqa %xmm3, %xmm4
; X32-SSE-NEXT: pandn %xmm0, %xmm4
; X32-SSE-NEXT: psrlw $2, %xmm0
-; X32-SSE-NEXT: pand {{\.LCPI.*}}, %xmm0
; X32-SSE-NEXT: pand %xmm3, %xmm0
+; X32-SSE-NEXT: pand {{\.LCPI.*}}, %xmm0
; X32-SSE-NEXT: por %xmm4, %xmm0
; X32-SSE-NEXT: paddb %xmm1, %xmm1
; X32-SSE-NEXT: pcmpgtb %xmm1, %xmm2
; X32-SSE-NEXT: movdqa %xmm2, %xmm1
; X32-SSE-NEXT: pandn %xmm0, %xmm1
; X32-SSE-NEXT: psrlw $1, %xmm0
-; X32-SSE-NEXT: pand {{\.LCPI.*}}, %xmm0
; X32-SSE-NEXT: pand %xmm2, %xmm0
+; X32-SSE-NEXT: pand {{\.LCPI.*}}, %xmm0
; X32-SSE-NEXT: por %xmm1, %xmm0
; X32-SSE-NEXT: retl
%shift = lshr <16 x i8> %a, %b
Modified: llvm/trunk/test/CodeGen/X86/vector-shift-shl-128.ll
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/vector-shift-shl-128.ll?rev=344487&r1=344486&r2=344487&view=diff
==============================================================================
--- llvm/trunk/test/CodeGen/X86/vector-shift-shl-128.ll (original)
+++ llvm/trunk/test/CodeGen/X86/vector-shift-shl-128.ll Sun Oct 14 18:51:58 2018
@@ -295,8 +295,8 @@ define <16 x i8> @var_shift_v16i8(<16 x
; SSE2-NEXT: movdqa %xmm3, %xmm4
; SSE2-NEXT: pandn %xmm0, %xmm4
; SSE2-NEXT: psllw $4, %xmm0
-; SSE2-NEXT: pand {{.*}}(%rip), %xmm0
; SSE2-NEXT: pand %xmm3, %xmm0
+; SSE2-NEXT: pand {{.*}}(%rip), %xmm0
; SSE2-NEXT: por %xmm4, %xmm0
; SSE2-NEXT: paddb %xmm1, %xmm1
; SSE2-NEXT: pxor %xmm3, %xmm3
@@ -304,8 +304,8 @@ define <16 x i8> @var_shift_v16i8(<16 x
; SSE2-NEXT: movdqa %xmm3, %xmm4
; SSE2-NEXT: pandn %xmm0, %xmm4
; SSE2-NEXT: psllw $2, %xmm0
-; SSE2-NEXT: pand {{.*}}(%rip), %xmm0
; SSE2-NEXT: pand %xmm3, %xmm0
+; SSE2-NEXT: pand {{.*}}(%rip), %xmm0
; SSE2-NEXT: por %xmm4, %xmm0
; SSE2-NEXT: paddb %xmm1, %xmm1
; SSE2-NEXT: pcmpgtb %xmm1, %xmm2
@@ -405,8 +405,8 @@ define <16 x i8> @var_shift_v16i8(<16 x
; X32-SSE-NEXT: movdqa %xmm3, %xmm4
; X32-SSE-NEXT: pandn %xmm0, %xmm4
; X32-SSE-NEXT: psllw $4, %xmm0
-; X32-SSE-NEXT: pand {{\.LCPI.*}}, %xmm0
; X32-SSE-NEXT: pand %xmm3, %xmm0
+; X32-SSE-NEXT: pand {{\.LCPI.*}}, %xmm0
; X32-SSE-NEXT: por %xmm4, %xmm0
; X32-SSE-NEXT: paddb %xmm1, %xmm1
; X32-SSE-NEXT: pxor %xmm3, %xmm3
@@ -414,8 +414,8 @@ define <16 x i8> @var_shift_v16i8(<16 x
; X32-SSE-NEXT: movdqa %xmm3, %xmm4
; X32-SSE-NEXT: pandn %xmm0, %xmm4
; X32-SSE-NEXT: psllw $2, %xmm0
-; X32-SSE-NEXT: pand {{\.LCPI.*}}, %xmm0
; X32-SSE-NEXT: pand %xmm3, %xmm0
+; X32-SSE-NEXT: pand {{\.LCPI.*}}, %xmm0
; X32-SSE-NEXT: por %xmm4, %xmm0
; X32-SSE-NEXT: paddb %xmm1, %xmm1
; X32-SSE-NEXT: pcmpgtb %xmm1, %xmm2
Modified: llvm/trunk/test/CodeGen/X86/vector-shuffle-256-v16.ll
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/vector-shuffle-256-v16.ll?rev=344487&r1=344486&r2=344487&view=diff
==============================================================================
--- llvm/trunk/test/CodeGen/X86/vector-shuffle-256-v16.ll (original)
+++ llvm/trunk/test/CodeGen/X86/vector-shuffle-256-v16.ll Sun Oct 14 18:51:58 2018
@@ -846,7 +846,7 @@ define <16 x i16> @shuffle_v16i16_07_00_
define <16 x i16> @shuffle_v16i16_00_17_02_19_04_21_06_23_08_25_10_27_12_29_14_31(<16 x i16> %a, <16 x i16> %b) {
; AVX1-LABEL: shuffle_v16i16_00_17_02_19_04_21_06_23_08_25_10_27_12_29_14_31:
; AVX1: # %bb.0:
-; AVX1-NEXT: vmovaps {{.*#+}} ymm2 = [65535,0,65535,0,65535,0,65535,0,65535,0,65535,0,65535,0,65535,0]
+; AVX1-NEXT: vbroadcastss {{.*#+}} ymm2 = [9.18340949E-41,9.18340949E-41,9.18340949E-41,9.18340949E-41,9.18340949E-41,9.18340949E-41,9.18340949E-41,9.18340949E-41]
; AVX1-NEXT: vandnps %ymm1, %ymm2, %ymm1
; AVX1-NEXT: vandps %ymm2, %ymm0, %ymm0
; AVX1-NEXT: vorps %ymm1, %ymm0, %ymm0
@@ -863,7 +863,7 @@ define <16 x i16> @shuffle_v16i16_00_17_
define <16 x i16> @shuffle_v16i16_16_01_18_03_20_05_22_07_24_09_26_11_28_13_30_15(<16 x i16> %a, <16 x i16> %b) {
; AVX1-LABEL: shuffle_v16i16_16_01_18_03_20_05_22_07_24_09_26_11_28_13_30_15:
; AVX1: # %bb.0:
-; AVX1-NEXT: vmovaps {{.*#+}} ymm2 = [65535,0,65535,0,65535,0,65535,0,65535,0,65535,0,65535,0,65535,0]
+; AVX1-NEXT: vbroadcastss {{.*#+}} ymm2 = [9.18340949E-41,9.18340949E-41,9.18340949E-41,9.18340949E-41,9.18340949E-41,9.18340949E-41,9.18340949E-41,9.18340949E-41]
; AVX1-NEXT: vandnps %ymm0, %ymm2, %ymm0
; AVX1-NEXT: vandps %ymm2, %ymm1, %ymm1
; AVX1-NEXT: vorps %ymm0, %ymm1, %ymm0
Modified: llvm/trunk/test/CodeGen/X86/vector-trunc-math.ll
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/vector-trunc-math.ll?rev=344487&r1=344486&r2=344487&view=diff
==============================================================================
--- llvm/trunk/test/CodeGen/X86/vector-trunc-math.ll (original)
+++ llvm/trunk/test/CodeGen/X86/vector-trunc-math.ll Sun Oct 14 18:51:58 2018
@@ -233,7 +233,8 @@ define <16 x i8> @trunc_add_v16i64_v16i8
; AVX1-NEXT: vextractf128 $1, %ymm7, %xmm7
; AVX1-NEXT: vextractf128 $1, %ymm3, %xmm3
; AVX1-NEXT: vpaddq %xmm7, %xmm3, %xmm3
-; AVX1-NEXT: vmovdqa {{.*#+}} xmm7 = [255,0,0,0,0,0,0,0,255,0,0,0,0,0,0,0]
+; AVX1-NEXT: vmovddup {{.*#+}} xmm7 = [1.2598673968951787E-321,1.2598673968951787E-321]
+; AVX1-NEXT: # xmm7 = mem[0,0]
; AVX1-NEXT: vpand %xmm7, %xmm3, %xmm3
; AVX1-NEXT: vpand %xmm7, %xmm6, %xmm6
; AVX1-NEXT: vpackusdw %xmm3, %xmm6, %xmm3
@@ -347,7 +348,7 @@ define <16 x i8> @trunc_add_v16i32_v16i8
; AVX1-NEXT: vextractf128 $1, %ymm3, %xmm3
; AVX1-NEXT: vextractf128 $1, %ymm1, %xmm1
; AVX1-NEXT: vpaddd %xmm3, %xmm1, %xmm1
-; AVX1-NEXT: vmovdqa {{.*#+}} xmm3 = [255,0,0,0,255,0,0,0,255,0,0,0,255,0,0,0]
+; AVX1-NEXT: vbroadcastss {{.*#+}} xmm3 = [3.57331108E-43,3.57331108E-43,3.57331108E-43,3.57331108E-43]
; AVX1-NEXT: vpand %xmm3, %xmm1, %xmm1
; AVX1-NEXT: vpand %xmm3, %xmm2, %xmm2
; AVX1-NEXT: vpackusdw %xmm1, %xmm2, %xmm1
@@ -680,22 +681,23 @@ define <16 x i8> @trunc_add_const_v16i64
; AVX1-LABEL: trunc_add_const_v16i64_v16i8:
; AVX1: # %bb.0:
; AVX1-NEXT: vextractf128 $1, %ymm3, %xmm4
-; AVX1-NEXT: vmovdqa {{.*#+}} xmm5 = [255,0,0,0,0,0,0,0,255,0,0,0,0,0,0,0]
-; AVX1-NEXT: vpand %xmm5, %xmm4, %xmm4
-; AVX1-NEXT: vpand %xmm5, %xmm3, %xmm3
+; AVX1-NEXT: vmovddup {{.*#+}} xmm5 = [1.2598673968951787E-321,1.2598673968951787E-321]
+; AVX1-NEXT: # xmm5 = mem[0,0]
+; AVX1-NEXT: vandpd %xmm5, %xmm4, %xmm4
+; AVX1-NEXT: vandpd %xmm5, %xmm3, %xmm3
; AVX1-NEXT: vpackusdw %xmm4, %xmm3, %xmm3
; AVX1-NEXT: vextractf128 $1, %ymm2, %xmm4
-; AVX1-NEXT: vpand %xmm5, %xmm4, %xmm4
-; AVX1-NEXT: vpand %xmm5, %xmm2, %xmm2
+; AVX1-NEXT: vandpd %xmm5, %xmm4, %xmm4
+; AVX1-NEXT: vandpd %xmm5, %xmm2, %xmm2
; AVX1-NEXT: vpackusdw %xmm4, %xmm2, %xmm2
; AVX1-NEXT: vpackusdw %xmm3, %xmm2, %xmm2
; AVX1-NEXT: vextractf128 $1, %ymm1, %xmm3
-; AVX1-NEXT: vpand %xmm5, %xmm3, %xmm3
-; AVX1-NEXT: vpand %xmm5, %xmm1, %xmm1
+; AVX1-NEXT: vandpd %xmm5, %xmm3, %xmm3
+; AVX1-NEXT: vandpd %xmm5, %xmm1, %xmm1
; AVX1-NEXT: vpackusdw %xmm3, %xmm1, %xmm1
; AVX1-NEXT: vextractf128 $1, %ymm0, %xmm3
-; AVX1-NEXT: vpand %xmm5, %xmm3, %xmm3
-; AVX1-NEXT: vpand %xmm5, %xmm0, %xmm0
+; AVX1-NEXT: vandpd %xmm5, %xmm3, %xmm3
+; AVX1-NEXT: vandpd %xmm5, %xmm0, %xmm0
; AVX1-NEXT: vpackusdw %xmm3, %xmm0, %xmm0
; AVX1-NEXT: vpackusdw %xmm1, %xmm0, %xmm0
; AVX1-NEXT: vpackuswb %xmm2, %xmm0, %xmm0
@@ -781,13 +783,13 @@ define <16 x i8> @trunc_add_const_v16i32
; AVX1-LABEL: trunc_add_const_v16i32_v16i8:
; AVX1: # %bb.0:
; AVX1-NEXT: vextractf128 $1, %ymm1, %xmm2
-; AVX1-NEXT: vmovdqa {{.*#+}} xmm3 = [255,0,0,0,255,0,0,0,255,0,0,0,255,0,0,0]
-; AVX1-NEXT: vpand %xmm3, %xmm2, %xmm2
-; AVX1-NEXT: vpand %xmm3, %xmm1, %xmm1
+; AVX1-NEXT: vbroadcastss {{.*#+}} xmm3 = [3.57331108E-43,3.57331108E-43,3.57331108E-43,3.57331108E-43]
+; AVX1-NEXT: vandps %xmm3, %xmm2, %xmm2
+; AVX1-NEXT: vandps %xmm3, %xmm1, %xmm1
; AVX1-NEXT: vpackusdw %xmm2, %xmm1, %xmm1
; AVX1-NEXT: vextractf128 $1, %ymm0, %xmm2
-; AVX1-NEXT: vpand %xmm3, %xmm2, %xmm2
-; AVX1-NEXT: vpand %xmm3, %xmm0, %xmm0
+; AVX1-NEXT: vandps %xmm3, %xmm2, %xmm2
+; AVX1-NEXT: vandps %xmm3, %xmm0, %xmm0
; AVX1-NEXT: vpackusdw %xmm2, %xmm0, %xmm0
; AVX1-NEXT: vpackuswb %xmm1, %xmm0, %xmm0
; AVX1-NEXT: vpaddb {{.*}}(%rip), %xmm0, %xmm0
@@ -1106,7 +1108,8 @@ define <16 x i8> @trunc_sub_v16i64_v16i8
; AVX1-NEXT: vextractf128 $1, %ymm7, %xmm7
; AVX1-NEXT: vextractf128 $1, %ymm3, %xmm3
; AVX1-NEXT: vpsubq %xmm7, %xmm3, %xmm3
-; AVX1-NEXT: vmovdqa {{.*#+}} xmm7 = [255,0,0,0,0,0,0,0,255,0,0,0,0,0,0,0]
+; AVX1-NEXT: vmovddup {{.*#+}} xmm7 = [1.2598673968951787E-321,1.2598673968951787E-321]
+; AVX1-NEXT: # xmm7 = mem[0,0]
; AVX1-NEXT: vpand %xmm7, %xmm3, %xmm3
; AVX1-NEXT: vpand %xmm7, %xmm6, %xmm6
; AVX1-NEXT: vpackusdw %xmm3, %xmm6, %xmm3
@@ -1220,7 +1223,7 @@ define <16 x i8> @trunc_sub_v16i32_v16i8
; AVX1-NEXT: vextractf128 $1, %ymm3, %xmm3
; AVX1-NEXT: vextractf128 $1, %ymm1, %xmm1
; AVX1-NEXT: vpsubd %xmm3, %xmm1, %xmm1
-; AVX1-NEXT: vmovdqa {{.*#+}} xmm3 = [255,0,0,0,255,0,0,0,255,0,0,0,255,0,0,0]
+; AVX1-NEXT: vbroadcastss {{.*#+}} xmm3 = [3.57331108E-43,3.57331108E-43,3.57331108E-43,3.57331108E-43]
; AVX1-NEXT: vpand %xmm3, %xmm1, %xmm1
; AVX1-NEXT: vpand %xmm3, %xmm2, %xmm2
; AVX1-NEXT: vpackusdw %xmm1, %xmm2, %xmm1
@@ -1575,7 +1578,8 @@ define <16 x i8> @trunc_sub_const_v16i64
; AVX1-NEXT: vpsubq {{.*}}(%rip), %xmm3, %xmm7
; AVX1-NEXT: vextractf128 $1, %ymm3, %xmm3
; AVX1-NEXT: vpsubq {{.*}}(%rip), %xmm3, %xmm3
-; AVX1-NEXT: vmovdqa {{.*#+}} xmm4 = [255,0,0,0,0,0,0,0,255,0,0,0,0,0,0,0]
+; AVX1-NEXT: vmovddup {{.*#+}} xmm4 = [1.2598673968951787E-321,1.2598673968951787E-321]
+; AVX1-NEXT: # xmm4 = mem[0,0]
; AVX1-NEXT: vpand %xmm4, %xmm3, %xmm3
; AVX1-NEXT: vpand %xmm4, %xmm7, %xmm7
; AVX1-NEXT: vpackusdw %xmm3, %xmm7, %xmm3
@@ -1687,7 +1691,7 @@ define <16 x i8> @trunc_sub_const_v16i32
; AVX1-NEXT: vpsubd {{.*}}(%rip), %xmm1, %xmm3
; AVX1-NEXT: vextractf128 $1, %ymm1, %xmm1
; AVX1-NEXT: vpsubd {{.*}}(%rip), %xmm1, %xmm1
-; AVX1-NEXT: vmovdqa {{.*#+}} xmm4 = [255,0,0,0,255,0,0,0,255,0,0,0,255,0,0,0]
+; AVX1-NEXT: vbroadcastss {{.*#+}} xmm4 = [3.57331108E-43,3.57331108E-43,3.57331108E-43,3.57331108E-43]
; AVX1-NEXT: vpand %xmm4, %xmm1, %xmm1
; AVX1-NEXT: vpand %xmm4, %xmm3, %xmm3
; AVX1-NEXT: vpackusdw %xmm1, %xmm3, %xmm1
@@ -2275,7 +2279,8 @@ define <16 x i8> @trunc_mul_v16i64_v16i8
; AVX1-NEXT: vpsllq $32, %xmm6, %xmm6
; AVX1-NEXT: vpmuludq %xmm4, %xmm3, %xmm3
; AVX1-NEXT: vpaddq %xmm6, %xmm3, %xmm3
-; AVX1-NEXT: vmovdqa {{.*#+}} xmm4 = [255,0,0,0,0,0,0,0,255,0,0,0,0,0,0,0]
+; AVX1-NEXT: vmovddup {{.*#+}} xmm4 = [1.2598673968951787E-321,1.2598673968951787E-321]
+; AVX1-NEXT: # xmm4 = mem[0,0]
; AVX1-NEXT: vpand %xmm4, %xmm3, %xmm3
; AVX1-NEXT: vpand %xmm4, %xmm2, %xmm2
; AVX1-NEXT: vpackusdw %xmm3, %xmm2, %xmm2
@@ -2451,7 +2456,7 @@ define <16 x i8> @trunc_mul_v16i32_v16i8
; AVX1-NEXT: vextractf128 $1, %ymm3, %xmm3
; AVX1-NEXT: vextractf128 $1, %ymm1, %xmm1
; AVX1-NEXT: vpmulld %xmm3, %xmm1, %xmm1
-; AVX1-NEXT: vmovdqa {{.*#+}} xmm3 = [255,0,0,0,255,0,0,0,255,0,0,0,255,0,0,0]
+; AVX1-NEXT: vbroadcastss {{.*#+}} xmm3 = [3.57331108E-43,3.57331108E-43,3.57331108E-43,3.57331108E-43]
; AVX1-NEXT: vpand %xmm3, %xmm1, %xmm1
; AVX1-NEXT: vpand %xmm3, %xmm2, %xmm2
; AVX1-NEXT: vpackusdw %xmm1, %xmm2, %xmm1
@@ -2909,7 +2914,8 @@ define <16 x i8> @trunc_mul_const_v16i64
; AVX1-NEXT: vpmuludq %xmm6, %xmm3, %xmm3
; AVX1-NEXT: vpsllq $32, %xmm3, %xmm3
; AVX1-NEXT: vpaddq %xmm3, %xmm7, %xmm3
-; AVX1-NEXT: vmovdqa {{.*#+}} xmm6 = [255,0,0,0,0,0,0,0,255,0,0,0,0,0,0,0]
+; AVX1-NEXT: vmovddup {{.*#+}} xmm6 = [1.2598673968951787E-321,1.2598673968951787E-321]
+; AVX1-NEXT: # xmm6 = mem[0,0]
; AVX1-NEXT: vpand %xmm6, %xmm3, %xmm3
; AVX1-NEXT: vpand %xmm6, %xmm0, %xmm0
; AVX1-NEXT: vpackusdw %xmm3, %xmm0, %xmm0
@@ -3049,7 +3055,7 @@ define <16 x i8> @trunc_mul_const_v16i32
; AVX1-NEXT: vpmulld {{.*}}(%rip), %xmm1, %xmm3
; AVX1-NEXT: vextractf128 $1, %ymm1, %xmm1
; AVX1-NEXT: vpmulld {{.*}}(%rip), %xmm1, %xmm1
-; AVX1-NEXT: vmovdqa {{.*#+}} xmm4 = [255,0,0,0,255,0,0,0,255,0,0,0,255,0,0,0]
+; AVX1-NEXT: vbroadcastss {{.*#+}} xmm4 = [3.57331108E-43,3.57331108E-43,3.57331108E-43,3.57331108E-43]
; AVX1-NEXT: vpand %xmm4, %xmm1, %xmm1
; AVX1-NEXT: vpand %xmm4, %xmm3, %xmm3
; AVX1-NEXT: vpackusdw %xmm1, %xmm3, %xmm1
@@ -3351,27 +3357,28 @@ define <16 x i8> @trunc_and_v16i64_v16i8
;
; AVX1-LABEL: trunc_and_v16i64_v16i8:
; AVX1: # %bb.0:
-; AVX1-NEXT: vandps %ymm4, %ymm0, %ymm0
-; AVX1-NEXT: vandps %ymm5, %ymm1, %ymm1
-; AVX1-NEXT: vandps %ymm6, %ymm2, %ymm2
-; AVX1-NEXT: vandps %ymm7, %ymm3, %ymm3
+; AVX1-NEXT: vandpd %ymm4, %ymm0, %ymm0
+; AVX1-NEXT: vandpd %ymm5, %ymm1, %ymm1
+; AVX1-NEXT: vandpd %ymm6, %ymm2, %ymm2
+; AVX1-NEXT: vandpd %ymm7, %ymm3, %ymm3
; AVX1-NEXT: vextractf128 $1, %ymm3, %xmm4
-; AVX1-NEXT: vmovaps {{.*#+}} xmm5 = [255,0,0,0,0,0,0,0,255,0,0,0,0,0,0,0]
-; AVX1-NEXT: vandps %xmm5, %xmm4, %xmm4
-; AVX1-NEXT: vandps %xmm5, %xmm3, %xmm3
+; AVX1-NEXT: vmovddup {{.*#+}} xmm5 = [1.2598673968951787E-321,1.2598673968951787E-321]
+; AVX1-NEXT: # xmm5 = mem[0,0]
+; AVX1-NEXT: vandpd %xmm5, %xmm4, %xmm4
+; AVX1-NEXT: vandpd %xmm5, %xmm3, %xmm3
; AVX1-NEXT: vpackusdw %xmm4, %xmm3, %xmm3
; AVX1-NEXT: vextractf128 $1, %ymm2, %xmm4
-; AVX1-NEXT: vandps %xmm5, %xmm4, %xmm4
-; AVX1-NEXT: vandps %xmm5, %xmm2, %xmm2
+; AVX1-NEXT: vandpd %xmm5, %xmm4, %xmm4
+; AVX1-NEXT: vandpd %xmm5, %xmm2, %xmm2
; AVX1-NEXT: vpackusdw %xmm4, %xmm2, %xmm2
; AVX1-NEXT: vpackusdw %xmm3, %xmm2, %xmm2
; AVX1-NEXT: vextractf128 $1, %ymm1, %xmm3
-; AVX1-NEXT: vandps %xmm5, %xmm3, %xmm3
-; AVX1-NEXT: vandps %xmm5, %xmm1, %xmm1
+; AVX1-NEXT: vandpd %xmm5, %xmm3, %xmm3
+; AVX1-NEXT: vandpd %xmm5, %xmm1, %xmm1
; AVX1-NEXT: vpackusdw %xmm3, %xmm1, %xmm1
; AVX1-NEXT: vextractf128 $1, %ymm0, %xmm3
-; AVX1-NEXT: vandps %xmm5, %xmm3, %xmm3
-; AVX1-NEXT: vandps %xmm5, %xmm0, %xmm0
+; AVX1-NEXT: vandpd %xmm5, %xmm3, %xmm3
+; AVX1-NEXT: vandpd %xmm5, %xmm0, %xmm0
; AVX1-NEXT: vpackusdw %xmm3, %xmm0, %xmm0
; AVX1-NEXT: vpackusdw %xmm1, %xmm0, %xmm0
; AVX1-NEXT: vpackuswb %xmm2, %xmm0, %xmm0
@@ -3468,7 +3475,7 @@ define <16 x i8> @trunc_and_v16i32_v16i8
; AVX1-NEXT: vandps %ymm2, %ymm0, %ymm0
; AVX1-NEXT: vandps %ymm3, %ymm1, %ymm1
; AVX1-NEXT: vextractf128 $1, %ymm1, %xmm2
-; AVX1-NEXT: vmovaps {{.*#+}} xmm3 = [255,0,0,0,255,0,0,0,255,0,0,0,255,0,0,0]
+; AVX1-NEXT: vbroadcastss {{.*#+}} xmm3 = [3.57331108E-43,3.57331108E-43,3.57331108E-43,3.57331108E-43]
; AVX1-NEXT: vandps %xmm3, %xmm2, %xmm2
; AVX1-NEXT: vandps %xmm3, %xmm1, %xmm1
; AVX1-NEXT: vpackusdw %xmm2, %xmm1, %xmm1
@@ -3751,22 +3758,23 @@ define <16 x i8> @trunc_and_const_v16i64
; AVX1-LABEL: trunc_and_const_v16i64_v16i8:
; AVX1: # %bb.0:
; AVX1-NEXT: vextractf128 $1, %ymm3, %xmm4
-; AVX1-NEXT: vmovdqa {{.*#+}} xmm5 = [255,0,0,0,0,0,0,0,255,0,0,0,0,0,0,0]
-; AVX1-NEXT: vpand %xmm5, %xmm4, %xmm4
-; AVX1-NEXT: vpand %xmm5, %xmm3, %xmm3
+; AVX1-NEXT: vmovddup {{.*#+}} xmm5 = [1.2598673968951787E-321,1.2598673968951787E-321]
+; AVX1-NEXT: # xmm5 = mem[0,0]
+; AVX1-NEXT: vandpd %xmm5, %xmm4, %xmm4
+; AVX1-NEXT: vandpd %xmm5, %xmm3, %xmm3
; AVX1-NEXT: vpackusdw %xmm4, %xmm3, %xmm3
; AVX1-NEXT: vextractf128 $1, %ymm2, %xmm4
-; AVX1-NEXT: vpand %xmm5, %xmm4, %xmm4
-; AVX1-NEXT: vpand %xmm5, %xmm2, %xmm2
+; AVX1-NEXT: vandpd %xmm5, %xmm4, %xmm4
+; AVX1-NEXT: vandpd %xmm5, %xmm2, %xmm2
; AVX1-NEXT: vpackusdw %xmm4, %xmm2, %xmm2
; AVX1-NEXT: vpackusdw %xmm3, %xmm2, %xmm2
; AVX1-NEXT: vextractf128 $1, %ymm1, %xmm3
-; AVX1-NEXT: vpand %xmm5, %xmm3, %xmm3
-; AVX1-NEXT: vpand %xmm5, %xmm1, %xmm1
+; AVX1-NEXT: vandpd %xmm5, %xmm3, %xmm3
+; AVX1-NEXT: vandpd %xmm5, %xmm1, %xmm1
; AVX1-NEXT: vpackusdw %xmm3, %xmm1, %xmm1
; AVX1-NEXT: vextractf128 $1, %ymm0, %xmm3
-; AVX1-NEXT: vpand %xmm5, %xmm3, %xmm3
-; AVX1-NEXT: vpand %xmm5, %xmm0, %xmm0
+; AVX1-NEXT: vandpd %xmm5, %xmm3, %xmm3
+; AVX1-NEXT: vandpd %xmm5, %xmm0, %xmm0
; AVX1-NEXT: vpackusdw %xmm3, %xmm0, %xmm0
; AVX1-NEXT: vpackusdw %xmm1, %xmm0, %xmm0
; AVX1-NEXT: vpackuswb %xmm2, %xmm0, %xmm0
@@ -3852,13 +3860,13 @@ define <16 x i8> @trunc_and_const_v16i32
; AVX1-LABEL: trunc_and_const_v16i32_v16i8:
; AVX1: # %bb.0:
; AVX1-NEXT: vextractf128 $1, %ymm1, %xmm2
-; AVX1-NEXT: vmovdqa {{.*#+}} xmm3 = [255,0,0,0,255,0,0,0,255,0,0,0,255,0,0,0]
-; AVX1-NEXT: vpand %xmm3, %xmm2, %xmm2
-; AVX1-NEXT: vpand %xmm3, %xmm1, %xmm1
+; AVX1-NEXT: vbroadcastss {{.*#+}} xmm3 = [3.57331108E-43,3.57331108E-43,3.57331108E-43,3.57331108E-43]
+; AVX1-NEXT: vandps %xmm3, %xmm2, %xmm2
+; AVX1-NEXT: vandps %xmm3, %xmm1, %xmm1
; AVX1-NEXT: vpackusdw %xmm2, %xmm1, %xmm1
; AVX1-NEXT: vextractf128 $1, %ymm0, %xmm2
-; AVX1-NEXT: vpand %xmm3, %xmm2, %xmm2
-; AVX1-NEXT: vpand %xmm3, %xmm0, %xmm0
+; AVX1-NEXT: vandps %xmm3, %xmm2, %xmm2
+; AVX1-NEXT: vandps %xmm3, %xmm0, %xmm0
; AVX1-NEXT: vpackusdw %xmm2, %xmm0, %xmm0
; AVX1-NEXT: vpackuswb %xmm1, %xmm0, %xmm0
; AVX1-NEXT: vpand {{.*}}(%rip), %xmm0, %xmm0
@@ -4153,27 +4161,28 @@ define <16 x i8> @trunc_xor_v16i64_v16i8
;
; AVX1-LABEL: trunc_xor_v16i64_v16i8:
; AVX1: # %bb.0:
-; AVX1-NEXT: vxorps %ymm4, %ymm0, %ymm0
-; AVX1-NEXT: vxorps %ymm5, %ymm1, %ymm1
-; AVX1-NEXT: vxorps %ymm6, %ymm2, %ymm2
-; AVX1-NEXT: vxorps %ymm7, %ymm3, %ymm3
+; AVX1-NEXT: vxorpd %ymm4, %ymm0, %ymm0
+; AVX1-NEXT: vxorpd %ymm5, %ymm1, %ymm1
+; AVX1-NEXT: vxorpd %ymm6, %ymm2, %ymm2
+; AVX1-NEXT: vxorpd %ymm7, %ymm3, %ymm3
; AVX1-NEXT: vextractf128 $1, %ymm3, %xmm4
-; AVX1-NEXT: vmovaps {{.*#+}} xmm5 = [255,0,0,0,0,0,0,0,255,0,0,0,0,0,0,0]
-; AVX1-NEXT: vandps %xmm5, %xmm4, %xmm4
-; AVX1-NEXT: vandps %xmm5, %xmm3, %xmm3
+; AVX1-NEXT: vmovddup {{.*#+}} xmm5 = [1.2598673968951787E-321,1.2598673968951787E-321]
+; AVX1-NEXT: # xmm5 = mem[0,0]
+; AVX1-NEXT: vandpd %xmm5, %xmm4, %xmm4
+; AVX1-NEXT: vandpd %xmm5, %xmm3, %xmm3
; AVX1-NEXT: vpackusdw %xmm4, %xmm3, %xmm3
; AVX1-NEXT: vextractf128 $1, %ymm2, %xmm4
-; AVX1-NEXT: vandps %xmm5, %xmm4, %xmm4
-; AVX1-NEXT: vandps %xmm5, %xmm2, %xmm2
+; AVX1-NEXT: vandpd %xmm5, %xmm4, %xmm4
+; AVX1-NEXT: vandpd %xmm5, %xmm2, %xmm2
; AVX1-NEXT: vpackusdw %xmm4, %xmm2, %xmm2
; AVX1-NEXT: vpackusdw %xmm3, %xmm2, %xmm2
; AVX1-NEXT: vextractf128 $1, %ymm1, %xmm3
-; AVX1-NEXT: vandps %xmm5, %xmm3, %xmm3
-; AVX1-NEXT: vandps %xmm5, %xmm1, %xmm1
+; AVX1-NEXT: vandpd %xmm5, %xmm3, %xmm3
+; AVX1-NEXT: vandpd %xmm5, %xmm1, %xmm1
; AVX1-NEXT: vpackusdw %xmm3, %xmm1, %xmm1
; AVX1-NEXT: vextractf128 $1, %ymm0, %xmm3
-; AVX1-NEXT: vandps %xmm5, %xmm3, %xmm3
-; AVX1-NEXT: vandps %xmm5, %xmm0, %xmm0
+; AVX1-NEXT: vandpd %xmm5, %xmm3, %xmm3
+; AVX1-NEXT: vandpd %xmm5, %xmm0, %xmm0
; AVX1-NEXT: vpackusdw %xmm3, %xmm0, %xmm0
; AVX1-NEXT: vpackusdw %xmm1, %xmm0, %xmm0
; AVX1-NEXT: vpackuswb %xmm2, %xmm0, %xmm0
@@ -4270,7 +4279,7 @@ define <16 x i8> @trunc_xor_v16i32_v16i8
; AVX1-NEXT: vxorps %ymm2, %ymm0, %ymm0
; AVX1-NEXT: vxorps %ymm3, %ymm1, %ymm1
; AVX1-NEXT: vextractf128 $1, %ymm1, %xmm2
-; AVX1-NEXT: vmovaps {{.*#+}} xmm3 = [255,0,0,0,255,0,0,0,255,0,0,0,255,0,0,0]
+; AVX1-NEXT: vbroadcastss {{.*#+}} xmm3 = [3.57331108E-43,3.57331108E-43,3.57331108E-43,3.57331108E-43]
; AVX1-NEXT: vandps %xmm3, %xmm2, %xmm2
; AVX1-NEXT: vandps %xmm3, %xmm1, %xmm1
; AVX1-NEXT: vpackusdw %xmm2, %xmm1, %xmm1
@@ -4553,22 +4562,23 @@ define <16 x i8> @trunc_xor_const_v16i64
; AVX1-LABEL: trunc_xor_const_v16i64_v16i8:
; AVX1: # %bb.0:
; AVX1-NEXT: vextractf128 $1, %ymm3, %xmm4
-; AVX1-NEXT: vmovdqa {{.*#+}} xmm5 = [255,0,0,0,0,0,0,0,255,0,0,0,0,0,0,0]
-; AVX1-NEXT: vpand %xmm5, %xmm4, %xmm4
-; AVX1-NEXT: vpand %xmm5, %xmm3, %xmm3
+; AVX1-NEXT: vmovddup {{.*#+}} xmm5 = [1.2598673968951787E-321,1.2598673968951787E-321]
+; AVX1-NEXT: # xmm5 = mem[0,0]
+; AVX1-NEXT: vandpd %xmm5, %xmm4, %xmm4
+; AVX1-NEXT: vandpd %xmm5, %xmm3, %xmm3
; AVX1-NEXT: vpackusdw %xmm4, %xmm3, %xmm3
; AVX1-NEXT: vextractf128 $1, %ymm2, %xmm4
-; AVX1-NEXT: vpand %xmm5, %xmm4, %xmm4
-; AVX1-NEXT: vpand %xmm5, %xmm2, %xmm2
+; AVX1-NEXT: vandpd %xmm5, %xmm4, %xmm4
+; AVX1-NEXT: vandpd %xmm5, %xmm2, %xmm2
; AVX1-NEXT: vpackusdw %xmm4, %xmm2, %xmm2
; AVX1-NEXT: vpackusdw %xmm3, %xmm2, %xmm2
; AVX1-NEXT: vextractf128 $1, %ymm1, %xmm3
-; AVX1-NEXT: vpand %xmm5, %xmm3, %xmm3
-; AVX1-NEXT: vpand %xmm5, %xmm1, %xmm1
+; AVX1-NEXT: vandpd %xmm5, %xmm3, %xmm3
+; AVX1-NEXT: vandpd %xmm5, %xmm1, %xmm1
; AVX1-NEXT: vpackusdw %xmm3, %xmm1, %xmm1
; AVX1-NEXT: vextractf128 $1, %ymm0, %xmm3
-; AVX1-NEXT: vpand %xmm5, %xmm3, %xmm3
-; AVX1-NEXT: vpand %xmm5, %xmm0, %xmm0
+; AVX1-NEXT: vandpd %xmm5, %xmm3, %xmm3
+; AVX1-NEXT: vandpd %xmm5, %xmm0, %xmm0
; AVX1-NEXT: vpackusdw %xmm3, %xmm0, %xmm0
; AVX1-NEXT: vpackusdw %xmm1, %xmm0, %xmm0
; AVX1-NEXT: vpackuswb %xmm2, %xmm0, %xmm0
@@ -4654,13 +4664,13 @@ define <16 x i8> @trunc_xor_const_v16i32
; AVX1-LABEL: trunc_xor_const_v16i32_v16i8:
; AVX1: # %bb.0:
; AVX1-NEXT: vextractf128 $1, %ymm1, %xmm2
-; AVX1-NEXT: vmovdqa {{.*#+}} xmm3 = [255,0,0,0,255,0,0,0,255,0,0,0,255,0,0,0]
-; AVX1-NEXT: vpand %xmm3, %xmm2, %xmm2
-; AVX1-NEXT: vpand %xmm3, %xmm1, %xmm1
+; AVX1-NEXT: vbroadcastss {{.*#+}} xmm3 = [3.57331108E-43,3.57331108E-43,3.57331108E-43,3.57331108E-43]
+; AVX1-NEXT: vandps %xmm3, %xmm2, %xmm2
+; AVX1-NEXT: vandps %xmm3, %xmm1, %xmm1
; AVX1-NEXT: vpackusdw %xmm2, %xmm1, %xmm1
; AVX1-NEXT: vextractf128 $1, %ymm0, %xmm2
-; AVX1-NEXT: vpand %xmm3, %xmm2, %xmm2
-; AVX1-NEXT: vpand %xmm3, %xmm0, %xmm0
+; AVX1-NEXT: vandps %xmm3, %xmm2, %xmm2
+; AVX1-NEXT: vandps %xmm3, %xmm0, %xmm0
; AVX1-NEXT: vpackusdw %xmm2, %xmm0, %xmm0
; AVX1-NEXT: vpackuswb %xmm1, %xmm0, %xmm0
; AVX1-NEXT: vpxor {{.*}}(%rip), %xmm0, %xmm0
@@ -4955,27 +4965,28 @@ define <16 x i8> @trunc_or_v16i64_v16i8(
;
; AVX1-LABEL: trunc_or_v16i64_v16i8:
; AVX1: # %bb.0:
-; AVX1-NEXT: vorps %ymm4, %ymm0, %ymm0
-; AVX1-NEXT: vorps %ymm5, %ymm1, %ymm1
-; AVX1-NEXT: vorps %ymm6, %ymm2, %ymm2
-; AVX1-NEXT: vorps %ymm7, %ymm3, %ymm3
+; AVX1-NEXT: vorpd %ymm4, %ymm0, %ymm0
+; AVX1-NEXT: vorpd %ymm5, %ymm1, %ymm1
+; AVX1-NEXT: vorpd %ymm6, %ymm2, %ymm2
+; AVX1-NEXT: vorpd %ymm7, %ymm3, %ymm3
; AVX1-NEXT: vextractf128 $1, %ymm3, %xmm4
-; AVX1-NEXT: vmovaps {{.*#+}} xmm5 = [255,0,0,0,0,0,0,0,255,0,0,0,0,0,0,0]
-; AVX1-NEXT: vandps %xmm5, %xmm4, %xmm4
-; AVX1-NEXT: vandps %xmm5, %xmm3, %xmm3
+; AVX1-NEXT: vmovddup {{.*#+}} xmm5 = [1.2598673968951787E-321,1.2598673968951787E-321]
+; AVX1-NEXT: # xmm5 = mem[0,0]
+; AVX1-NEXT: vandpd %xmm5, %xmm4, %xmm4
+; AVX1-NEXT: vandpd %xmm5, %xmm3, %xmm3
; AVX1-NEXT: vpackusdw %xmm4, %xmm3, %xmm3
; AVX1-NEXT: vextractf128 $1, %ymm2, %xmm4
-; AVX1-NEXT: vandps %xmm5, %xmm4, %xmm4
-; AVX1-NEXT: vandps %xmm5, %xmm2, %xmm2
+; AVX1-NEXT: vandpd %xmm5, %xmm4, %xmm4
+; AVX1-NEXT: vandpd %xmm5, %xmm2, %xmm2
; AVX1-NEXT: vpackusdw %xmm4, %xmm2, %xmm2
; AVX1-NEXT: vpackusdw %xmm3, %xmm2, %xmm2
; AVX1-NEXT: vextractf128 $1, %ymm1, %xmm3
-; AVX1-NEXT: vandps %xmm5, %xmm3, %xmm3
-; AVX1-NEXT: vandps %xmm5, %xmm1, %xmm1
+; AVX1-NEXT: vandpd %xmm5, %xmm3, %xmm3
+; AVX1-NEXT: vandpd %xmm5, %xmm1, %xmm1
; AVX1-NEXT: vpackusdw %xmm3, %xmm1, %xmm1
; AVX1-NEXT: vextractf128 $1, %ymm0, %xmm3
-; AVX1-NEXT: vandps %xmm5, %xmm3, %xmm3
-; AVX1-NEXT: vandps %xmm5, %xmm0, %xmm0
+; AVX1-NEXT: vandpd %xmm5, %xmm3, %xmm3
+; AVX1-NEXT: vandpd %xmm5, %xmm0, %xmm0
; AVX1-NEXT: vpackusdw %xmm3, %xmm0, %xmm0
; AVX1-NEXT: vpackusdw %xmm1, %xmm0, %xmm0
; AVX1-NEXT: vpackuswb %xmm2, %xmm0, %xmm0
@@ -5072,7 +5083,7 @@ define <16 x i8> @trunc_or_v16i32_v16i8(
; AVX1-NEXT: vorps %ymm2, %ymm0, %ymm0
; AVX1-NEXT: vorps %ymm3, %ymm1, %ymm1
; AVX1-NEXT: vextractf128 $1, %ymm1, %xmm2
-; AVX1-NEXT: vmovaps {{.*#+}} xmm3 = [255,0,0,0,255,0,0,0,255,0,0,0,255,0,0,0]
+; AVX1-NEXT: vbroadcastss {{.*#+}} xmm3 = [3.57331108E-43,3.57331108E-43,3.57331108E-43,3.57331108E-43]
; AVX1-NEXT: vandps %xmm3, %xmm2, %xmm2
; AVX1-NEXT: vandps %xmm3, %xmm1, %xmm1
; AVX1-NEXT: vpackusdw %xmm2, %xmm1, %xmm1
@@ -5355,22 +5366,23 @@ define <16 x i8> @trunc_or_const_v16i64_
; AVX1-LABEL: trunc_or_const_v16i64_v16i8:
; AVX1: # %bb.0:
; AVX1-NEXT: vextractf128 $1, %ymm3, %xmm4
-; AVX1-NEXT: vmovdqa {{.*#+}} xmm5 = [255,0,0,0,0,0,0,0,255,0,0,0,0,0,0,0]
-; AVX1-NEXT: vpand %xmm5, %xmm4, %xmm4
-; AVX1-NEXT: vpand %xmm5, %xmm3, %xmm3
+; AVX1-NEXT: vmovddup {{.*#+}} xmm5 = [1.2598673968951787E-321,1.2598673968951787E-321]
+; AVX1-NEXT: # xmm5 = mem[0,0]
+; AVX1-NEXT: vandpd %xmm5, %xmm4, %xmm4
+; AVX1-NEXT: vandpd %xmm5, %xmm3, %xmm3
; AVX1-NEXT: vpackusdw %xmm4, %xmm3, %xmm3
; AVX1-NEXT: vextractf128 $1, %ymm2, %xmm4
-; AVX1-NEXT: vpand %xmm5, %xmm4, %xmm4
-; AVX1-NEXT: vpand %xmm5, %xmm2, %xmm2
+; AVX1-NEXT: vandpd %xmm5, %xmm4, %xmm4
+; AVX1-NEXT: vandpd %xmm5, %xmm2, %xmm2
; AVX1-NEXT: vpackusdw %xmm4, %xmm2, %xmm2
; AVX1-NEXT: vpackusdw %xmm3, %xmm2, %xmm2
; AVX1-NEXT: vextractf128 $1, %ymm1, %xmm3
-; AVX1-NEXT: vpand %xmm5, %xmm3, %xmm3
-; AVX1-NEXT: vpand %xmm5, %xmm1, %xmm1
+; AVX1-NEXT: vandpd %xmm5, %xmm3, %xmm3
+; AVX1-NEXT: vandpd %xmm5, %xmm1, %xmm1
; AVX1-NEXT: vpackusdw %xmm3, %xmm1, %xmm1
; AVX1-NEXT: vextractf128 $1, %ymm0, %xmm3
-; AVX1-NEXT: vpand %xmm5, %xmm3, %xmm3
-; AVX1-NEXT: vpand %xmm5, %xmm0, %xmm0
+; AVX1-NEXT: vandpd %xmm5, %xmm3, %xmm3
+; AVX1-NEXT: vandpd %xmm5, %xmm0, %xmm0
; AVX1-NEXT: vpackusdw %xmm3, %xmm0, %xmm0
; AVX1-NEXT: vpackusdw %xmm1, %xmm0, %xmm0
; AVX1-NEXT: vpackuswb %xmm2, %xmm0, %xmm0
@@ -5456,13 +5468,13 @@ define <16 x i8> @trunc_or_const_v16i32_
; AVX1-LABEL: trunc_or_const_v16i32_v16i8:
; AVX1: # %bb.0:
; AVX1-NEXT: vextractf128 $1, %ymm1, %xmm2
-; AVX1-NEXT: vmovdqa {{.*#+}} xmm3 = [255,0,0,0,255,0,0,0,255,0,0,0,255,0,0,0]
-; AVX1-NEXT: vpand %xmm3, %xmm2, %xmm2
-; AVX1-NEXT: vpand %xmm3, %xmm1, %xmm1
+; AVX1-NEXT: vbroadcastss {{.*#+}} xmm3 = [3.57331108E-43,3.57331108E-43,3.57331108E-43,3.57331108E-43]
+; AVX1-NEXT: vandps %xmm3, %xmm2, %xmm2
+; AVX1-NEXT: vandps %xmm3, %xmm1, %xmm1
; AVX1-NEXT: vpackusdw %xmm2, %xmm1, %xmm1
; AVX1-NEXT: vextractf128 $1, %ymm0, %xmm2
-; AVX1-NEXT: vpand %xmm3, %xmm2, %xmm2
-; AVX1-NEXT: vpand %xmm3, %xmm0, %xmm0
+; AVX1-NEXT: vandps %xmm3, %xmm2, %xmm2
+; AVX1-NEXT: vandps %xmm3, %xmm0, %xmm0
; AVX1-NEXT: vpackusdw %xmm2, %xmm0, %xmm0
; AVX1-NEXT: vpackuswb %xmm1, %xmm0, %xmm0
; AVX1-NEXT: vpor {{.*}}(%rip), %xmm0, %xmm0
Modified: llvm/trunk/test/CodeGen/X86/vector-trunc-packus.ll
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/vector-trunc-packus.ll?rev=344487&r1=344486&r2=344487&view=diff
==============================================================================
--- llvm/trunk/test/CodeGen/X86/vector-trunc-packus.ll (original)
+++ llvm/trunk/test/CodeGen/X86/vector-trunc-packus.ll Sun Oct 14 18:51:58 2018
@@ -2070,24 +2070,26 @@ define void @trunc_packus_v8i64_v8i8_sto
; AVX1-NEXT: vblendvpd %ymm3, %ymm1, %ymm2, %ymm1
; AVX1-NEXT: vextractf128 $1, %ymm0, %xmm3
; AVX1-NEXT: vpcmpgtq %xmm3, %xmm4, %xmm3
-; AVX1-NEXT: vpcmpgtq %xmm0, %xmm4, %xmm5
-; AVX1-NEXT: vinsertf128 $1, %xmm3, %ymm5, %ymm3
+; AVX1-NEXT: vpcmpgtq %xmm0, %xmm4, %xmm4
+; AVX1-NEXT: vinsertf128 $1, %xmm3, %ymm4, %ymm3
; AVX1-NEXT: vblendvpd %ymm3, %ymm0, %ymm2, %ymm0
; AVX1-NEXT: vxorpd %xmm2, %xmm2, %xmm2
; AVX1-NEXT: vpcmpgtq %xmm2, %xmm0, %xmm8
-; AVX1-NEXT: vextractf128 $1, %ymm0, %xmm5
-; AVX1-NEXT: vpcmpgtq %xmm2, %xmm5, %xmm6
-; AVX1-NEXT: vpcmpgtq %xmm2, %xmm1, %xmm7
-; AVX1-NEXT: vextractf128 $1, %ymm1, %xmm3
-; AVX1-NEXT: vpcmpgtq %xmm2, %xmm3, %xmm2
-; AVX1-NEXT: vpand %xmm4, %xmm3, %xmm3
-; AVX1-NEXT: vpand %xmm3, %xmm2, %xmm2
-; AVX1-NEXT: vpand %xmm4, %xmm1, %xmm1
-; AVX1-NEXT: vpand %xmm1, %xmm7, %xmm1
+; AVX1-NEXT: vextractf128 $1, %ymm0, %xmm4
+; AVX1-NEXT: vpcmpgtq %xmm2, %xmm4, %xmm5
+; AVX1-NEXT: vpcmpgtq %xmm2, %xmm1, %xmm6
+; AVX1-NEXT: vextractf128 $1, %ymm1, %xmm7
+; AVX1-NEXT: vpcmpgtq %xmm2, %xmm7, %xmm2
+; AVX1-NEXT: vmovddup {{.*#+}} xmm3 = [1.2598673968951787E-321,1.2598673968951787E-321]
+; AVX1-NEXT: # xmm3 = mem[0,0]
+; AVX1-NEXT: vpand %xmm3, %xmm7, %xmm7
+; AVX1-NEXT: vpand %xmm7, %xmm2, %xmm2
+; AVX1-NEXT: vpand %xmm3, %xmm1, %xmm1
+; AVX1-NEXT: vpand %xmm1, %xmm6, %xmm1
; AVX1-NEXT: vpackusdw %xmm2, %xmm1, %xmm1
-; AVX1-NEXT: vpand %xmm4, %xmm5, %xmm2
-; AVX1-NEXT: vpand %xmm2, %xmm6, %xmm2
-; AVX1-NEXT: vpand %xmm4, %xmm0, %xmm0
+; AVX1-NEXT: vpand %xmm3, %xmm4, %xmm2
+; AVX1-NEXT: vpand %xmm2, %xmm5, %xmm2
+; AVX1-NEXT: vpand %xmm3, %xmm0, %xmm0
; AVX1-NEXT: vpand %xmm0, %xmm8, %xmm0
; AVX1-NEXT: vpackusdw %xmm2, %xmm0, %xmm0
; AVX1-NEXT: vpackusdw %xmm1, %xmm0, %xmm0
Modified: llvm/trunk/test/CodeGen/X86/vector-trunc-ssat.ll
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/vector-trunc-ssat.ll?rev=344487&r1=344486&r2=344487&view=diff
==============================================================================
--- llvm/trunk/test/CodeGen/X86/vector-trunc-ssat.ll (original)
+++ llvm/trunk/test/CodeGen/X86/vector-trunc-ssat.ll Sun Oct 14 18:51:58 2018
@@ -2001,7 +2001,8 @@ define void @trunc_ssat_v8i64_v8i8_store
; AVX1-NEXT: vinsertf128 $1, %xmm3, %ymm4, %ymm3
; AVX1-NEXT: vblendvpd %ymm3, %ymm1, %ymm2, %ymm1
; AVX1-NEXT: vextractf128 $1, %ymm1, %xmm2
-; AVX1-NEXT: vmovapd {{.*#+}} xmm3 = [255,0,0,0,0,0,0,0,255,0,0,0,0,0,0,0]
+; AVX1-NEXT: vmovddup {{.*#+}} xmm3 = [1.2598673968951787E-321,1.2598673968951787E-321]
+; AVX1-NEXT: # xmm3 = mem[0,0]
; AVX1-NEXT: vandpd %xmm3, %xmm2, %xmm2
; AVX1-NEXT: vandpd %xmm3, %xmm1, %xmm1
; AVX1-NEXT: vpackusdw %xmm2, %xmm1, %xmm1
Modified: llvm/trunk/test/CodeGen/X86/vector-trunc-usat.ll
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/vector-trunc-usat.ll?rev=344487&r1=344486&r2=344487&view=diff
==============================================================================
--- llvm/trunk/test/CodeGen/X86/vector-trunc-usat.ll (original)
+++ llvm/trunk/test/CodeGen/X86/vector-trunc-usat.ll Sun Oct 14 18:51:58 2018
@@ -1417,7 +1417,8 @@ define void @trunc_usat_v8i64_v8i8_store
; AVX1-NEXT: vinsertf128 $1, %xmm3, %ymm4, %ymm3
; AVX1-NEXT: vblendvpd %ymm3, %ymm1, %ymm2, %ymm1
; AVX1-NEXT: vextractf128 $1, %ymm1, %xmm2
-; AVX1-NEXT: vmovapd {{.*#+}} xmm3 = [255,0,0,0,0,0,0,0,255,0,0,0,0,0,0,0]
+; AVX1-NEXT: vmovddup {{.*#+}} xmm3 = [1.2598673968951787E-321,1.2598673968951787E-321]
+; AVX1-NEXT: # xmm3 = mem[0,0]
; AVX1-NEXT: vandpd %xmm3, %xmm2, %xmm2
; AVX1-NEXT: vandpd %xmm3, %xmm1, %xmm1
; AVX1-NEXT: vpackusdw %xmm2, %xmm1, %xmm1
Modified: llvm/trunk/test/CodeGen/X86/vector-trunc.ll
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/vector-trunc.ll?rev=344487&r1=344486&r2=344487&view=diff
==============================================================================
--- llvm/trunk/test/CodeGen/X86/vector-trunc.ll (original)
+++ llvm/trunk/test/CodeGen/X86/vector-trunc.ll Sun Oct 14 18:51:58 2018
@@ -286,13 +286,14 @@ define void @trunc8i64_8i8(<8 x i64> %a)
; AVX1-LABEL: trunc8i64_8i8:
; AVX1: # %bb.0: # %entry
; AVX1-NEXT: vextractf128 $1, %ymm1, %xmm2
-; AVX1-NEXT: vmovdqa {{.*#+}} xmm3 = [255,0,0,0,0,0,0,0,255,0,0,0,0,0,0,0]
-; AVX1-NEXT: vpand %xmm3, %xmm2, %xmm2
-; AVX1-NEXT: vpand %xmm3, %xmm1, %xmm1
+; AVX1-NEXT: vmovddup {{.*#+}} xmm3 = [1.2598673968951787E-321,1.2598673968951787E-321]
+; AVX1-NEXT: # xmm3 = mem[0,0]
+; AVX1-NEXT: vandpd %xmm3, %xmm2, %xmm2
+; AVX1-NEXT: vandpd %xmm3, %xmm1, %xmm1
; AVX1-NEXT: vpackusdw %xmm2, %xmm1, %xmm1
; AVX1-NEXT: vextractf128 $1, %ymm0, %xmm2
-; AVX1-NEXT: vpand %xmm3, %xmm2, %xmm2
-; AVX1-NEXT: vpand %xmm3, %xmm0, %xmm0
+; AVX1-NEXT: vandpd %xmm3, %xmm2, %xmm2
+; AVX1-NEXT: vandpd %xmm3, %xmm0, %xmm0
; AVX1-NEXT: vpackusdw %xmm2, %xmm0, %xmm0
; AVX1-NEXT: vpackusdw %xmm1, %xmm0, %xmm0
; AVX1-NEXT: vpackuswb %xmm0, %xmm0, %xmm0
@@ -907,13 +908,13 @@ define void @trunc16i32_16i8(<16 x i32>
; AVX1-LABEL: trunc16i32_16i8:
; AVX1: # %bb.0: # %entry
; AVX1-NEXT: vextractf128 $1, %ymm1, %xmm2
-; AVX1-NEXT: vmovdqa {{.*#+}} xmm3 = [255,0,0,0,255,0,0,0,255,0,0,0,255,0,0,0]
-; AVX1-NEXT: vpand %xmm3, %xmm2, %xmm2
-; AVX1-NEXT: vpand %xmm3, %xmm1, %xmm1
+; AVX1-NEXT: vbroadcastss {{.*#+}} xmm3 = [3.57331108E-43,3.57331108E-43,3.57331108E-43,3.57331108E-43]
+; AVX1-NEXT: vandps %xmm3, %xmm2, %xmm2
+; AVX1-NEXT: vandps %xmm3, %xmm1, %xmm1
; AVX1-NEXT: vpackusdw %xmm2, %xmm1, %xmm1
; AVX1-NEXT: vextractf128 $1, %ymm0, %xmm2
-; AVX1-NEXT: vpand %xmm3, %xmm2, %xmm2
-; AVX1-NEXT: vpand %xmm3, %xmm0, %xmm0
+; AVX1-NEXT: vandps %xmm3, %xmm2, %xmm2
+; AVX1-NEXT: vandps %xmm3, %xmm0, %xmm0
; AVX1-NEXT: vpackusdw %xmm2, %xmm0, %xmm0
; AVX1-NEXT: vpackuswb %xmm1, %xmm0, %xmm0
; AVX1-NEXT: vmovdqu %xmm0, (%rax)
Modified: llvm/trunk/test/CodeGen/X86/vshift-6.ll
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/vshift-6.ll?rev=344487&r1=344486&r2=344487&view=diff
==============================================================================
--- llvm/trunk/test/CodeGen/X86/vshift-6.ll (original)
+++ llvm/trunk/test/CodeGen/X86/vshift-6.ll Sun Oct 14 18:51:58 2018
@@ -50,8 +50,8 @@ define <16 x i8> @do_not_crash(i8*, i32*
; X32-NEXT: movdqa %xmm2, %xmm4
; X32-NEXT: pandn %xmm0, %xmm4
; X32-NEXT: psllw $2, %xmm0
-; X32-NEXT: pand {{\.LCPI.*}}, %xmm0
; X32-NEXT: pand %xmm2, %xmm0
+; X32-NEXT: pand {{\.LCPI.*}}, %xmm0
; X32-NEXT: por %xmm4, %xmm0
; X32-NEXT: paddb %xmm1, %xmm1
; X32-NEXT: pcmpgtb %xmm1, %xmm3
@@ -85,8 +85,8 @@ define <16 x i8> @do_not_crash(i8*, i32*
; X64-NEXT: movdqa %xmm2, %xmm4
; X64-NEXT: pandn %xmm0, %xmm4
; X64-NEXT: psllw $2, %xmm0
-; X64-NEXT: pand {{.*}}(%rip), %xmm0
; X64-NEXT: pand %xmm2, %xmm0
+; X64-NEXT: pand {{.*}}(%rip), %xmm0
; X64-NEXT: por %xmm4, %xmm0
; X64-NEXT: paddb %xmm1, %xmm1
; X64-NEXT: pcmpgtb %xmm1, %xmm3
Modified: llvm/trunk/test/CodeGen/X86/x86-interleaved-access.ll
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/x86-interleaved-access.ll?rev=344487&r1=344486&r2=344487&view=diff
==============================================================================
--- llvm/trunk/test/CodeGen/X86/x86-interleaved-access.ll (original)
+++ llvm/trunk/test/CodeGen/X86/x86-interleaved-access.ll Sun Oct 14 18:51:58 2018
@@ -1029,7 +1029,8 @@ define <32 x i8> @interleaved_load_vf32_
; AVX1-NEXT: vinsertf128 $1, %xmm4, %ymm1, %ymm2
; AVX1-NEXT: vpalignr {{.*#+}} xmm9 = xmm7[11,12,13,14,15],xmm1[0,1,2,3,4,5,6,7,8,9,10]
; AVX1-NEXT: vpalignr {{.*#+}} xmm4 = xmm6[11,12,13,14,15],xmm4[0,1,2,3,4,5,6,7,8,9,10]
-; AVX1-NEXT: vmovaps {{.*#+}} ymm5 = [255,255,255,255,255,255,255,255,255,255,255,0,0,0,0,0,255,255,255,255,255,255,255,255,255,255,255,0,0,0,0,0]
+; AVX1-NEXT: vbroadcastf128 {{.*#+}} ymm5 = [255,255,255,255,255,255,255,255,255,255,255,0,0,0,0,0,255,255,255,255,255,255,255,255,255,255,255,0,0,0,0,0]
+; AVX1-NEXT: # ymm5 = mem[0,1,0,1]
; AVX1-NEXT: vandnps %ymm2, %ymm5, %ymm2
; AVX1-NEXT: vandps %ymm5, %ymm8, %ymm5
; AVX1-NEXT: vorps %ymm2, %ymm5, %ymm2
@@ -1585,7 +1586,8 @@ define <64 x i8> @interleaved_load_vf64_
; AVX1-NEXT: vpalignr {{.*#+}} xmm4 = xmm12[11,12,13,14,15],xmm4[0,1,2,3,4,5,6,7,8,9,10]
; AVX1-NEXT: vpalignr {{.*#+}} xmm11 = xmm11[11,12,13,14,15],xmm13[0,1,2,3,4,5,6,7,8,9,10]
; AVX1-NEXT: vinsertf128 $1, %xmm4, %ymm11, %ymm12
-; AVX1-NEXT: vmovaps {{.*#+}} ymm13 = [255,255,255,255,255,255,255,255,255,255,255,0,0,0,0,0,255,255,255,255,255,255,255,255,255,255,255,0,0,0,0,0]
+; AVX1-NEXT: vbroadcastf128 {{.*#+}} ymm13 = [255,255,255,255,255,255,255,255,255,255,255,0,0,0,0,0,255,255,255,255,255,255,255,255,255,255,255,0,0,0,0,0]
+; AVX1-NEXT: # ymm13 = mem[0,1,0,1]
; AVX1-NEXT: vandnps %ymm12, %ymm13, %ymm12
; AVX1-NEXT: vandps %ymm13, %ymm14, %ymm14
; AVX1-NEXT: vorps %ymm12, %ymm14, %ymm12