[llvm] b5d7bee - [X86] combineConcatVectorOps - add support for concatenation of VSELECT/BLENDV nodes (REAPPLIED)

Jorge Gorbe Moya via llvm-commits llvm-commits at lists.llvm.org
Fri Jun 24 18:51:30 PDT 2022


I have a reduced test case (attached as reduced2.ll). I'm not sure it's
completely legal IR, because I used some automatic reduction tools to get to
this point, but clang accepts it and it reproduces the issue.

In order to reproduce the problem, run:

    clang -O2 -c reduced2.ll

In our builds the compile finishes instantly before this patch, but it goes
on forever after this patch.

Hope this helps; please let me know if there's anything else I can do to
help debug this.

On Fri, Jun 24, 2022 at 12:20 PM Jorge Gorbe Moya <jgorbe at google.com> wrote:

> Hi!
>
> We're experiencing build timeouts in one of our files, and we've bisected
> the problem to this patch. The build time goes from around 10 seconds to
> timing out after multiple minutes, so I suspect this change is introducing
> an infinite loop somewhere. We're working on a reproducer and will follow
> up with more details ASAP.
>
> Best,
> Jorge
>
>
> On Sun, Jun 12, 2022 at 7:41 AM Simon Pilgrim via llvm-commits <llvm-commits at lists.llvm.org> wrote:
>
>>
>> Author: Simon Pilgrim
>> Date: 2022-06-12T15:40:36+01:00
>> New Revision: b5d7beeb9792f626814b1a521872b611fbbaedd6
>>
>> URL: https://github.com/llvm/llvm-project/commit/b5d7beeb9792f626814b1a521872b611fbbaedd6
>> DIFF: https://github.com/llvm/llvm-project/commit/b5d7beeb9792f626814b1a521872b611fbbaedd6.diff
>>
>> LOG: [X86] combineConcatVectorOps - add support for concatenation of
>> VSELECT/BLENDV nodes (REAPPLIED)
>>
>> If the LHS/RHS selection operands can be cheaply concatenated back
>> together, then replace the 2 x 128-bit selection nodes with 1 x 256-bit node.
>>
>> Addresses the regression introduced in the bug fix from
>> rGd5af6a38082b39ae520a328e44dc29ebcb036bb2
>>
>> REAPPLIED with a fix for the bug identified in rGea8fb3b60196
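
The identity behind this fold can be sanity-checked with a small standalone
model (a sketch only: std::array stands in for the 128-/256-bit vectors, an
int per lane stands in for the blend mask, and none of this is LLVM API):

    // Sketch: a scalar model of the identity behind the fold. std::array
    // stands in for the 128-/256-bit vectors and an int per lane stands in
    // for the blend mask; none of this is LLVM API.
    #include <array>
    #include <cassert>
    #include <cstddef>

    template <std::size_t N>
    std::array<int, 2 * N> concat(std::array<int, N> Lo, std::array<int, N> Hi) {
      std::array<int, 2 * N> R{};
      for (std::size_t I = 0; I < N; ++I) {
        R[I] = Lo[I];
        R[I + N] = Hi[I];
      }
      return R;
    }

    // Per-lane select, i.e. what VSELECT/BLENDV compute.
    template <std::size_t N>
    std::array<int, N> blend(std::array<int, N> C, std::array<int, N> A,
                             std::array<int, N> B) {
      std::array<int, N> R{};
      for (std::size_t I = 0; I < N; ++I)
        R[I] = C[I] ? A[I] : B[I];
      return R;
    }

    int main() {
      std::array<int, 4> C0{1, 0, 1, 0}, C1{0, 0, 1, 1};
      std::array<int, 4> A0{1, 2, 3, 4}, A1{5, 6, 7, 8};
      std::array<int, 4> B0{9, 10, 11, 12}, B1{13, 14, 15, 16};
      // Before: blend each 128-bit half, then concatenate the results.
      auto Narrow = concat(blend(C0, A0, B0), blend(C1, A1, B1));
      // After: concatenate the operands, then do one 256-bit blend.
      auto Wide = blend(concat(C0, C1), concat(A0, A1), concat(B0, B1));
      assert(Narrow == Wide);
    }

This is the trade visible in the AVX1 test diffs below: two vblendvpd xmm
blends become a single vblendvpd ymm once the two compare masks have been
concatenated with vinsertf128.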
>>
>> Added:
>>
>>
>> Modified:
>>     llvm/lib/Target/X86/X86ISelLowering.cpp
>>     llvm/test/CodeGen/X86/vec_minmax_sint.ll
>>     llvm/test/CodeGen/X86/vec_minmax_uint.ll
>>     llvm/test/CodeGen/X86/vselect-avx.ll
>>     llvm/test/CodeGen/X86/vselect-minmax.ll
>>
>> Removed:
>>
>>
>>
>>
>> ################################################################################
>> diff  --git a/llvm/lib/Target/X86/X86ISelLowering.cpp b/llvm/lib/Target/X86/X86ISelLowering.cpp
>> index f7850bd14dcd..d18db3b99234 100644
>> --- a/llvm/lib/Target/X86/X86ISelLowering.cpp
>> +++ b/llvm/lib/Target/X86/X86ISelLowering.cpp
>> @@ -53518,6 +53518,12 @@ static SDValue combineConcatVectorOps(const SDLoc &DL, MVT VT,
>>          Subs.push_back(SubOp.getOperand(I));
>>        return DAG.getNode(ISD::CONCAT_VECTORS, DL, VT, Subs);
>>      };
>> +    auto IsConcatFree = [](MVT VT, ArrayRef<SDValue> SubOps, unsigned I) {
>> +      return all_of(SubOps, [VT, I](SDValue Sub) {
>> +        return Sub.getOperand(I).getOpcode() == ISD::EXTRACT_SUBVECTOR &&
>> +               Sub.getOperand(I).getOperand(0).getValueType() == VT;
>> +      });
>> +    };
>>
>>      unsigned NumOps = Ops.size();
>>      switch (Op0.getOpcode()) {
>> @@ -53701,8 +53707,19 @@ static SDValue combineConcatVectorOps(const SDLoc &DL, MVT VT,
>>                             ConcatSubOperand(VT, Ops, 1), Op0.getOperand(2));
>>        }
>>        break;
>> -      // TODO: ISD::VSELECT and X86ISD::BLENDV handling if some of the args can
>> -      // be concatenated for free.
>> +    case ISD::VSELECT:
>> +    case X86ISD::BLENDV:
>> +      if (!IsSplat && VT.is256BitVector() && Ops.size() == 2 &&
>> +          (VT.getScalarSizeInBits() >= 32 || Subtarget.hasInt256()) &&
>> +          IsConcatFree(VT, Ops, 1) && IsConcatFree(VT, Ops, 2)) {
>> +        EVT SelVT = Ops[0].getOperand(0).getValueType();
>> +        SelVT = SelVT.getDoubleNumVectorElementsVT(*DAG.getContext());
>> +        return DAG.getNode(Op0.getOpcode(), DL, VT,
>> +                           ConcatSubOperand(SelVT.getSimpleVT(), Ops, 0),
>> +                           ConcatSubOperand(VT, Ops, 1),
>> +                           ConcatSubOperand(VT, Ops, 2));
>> +      }
>> +      break;
>>      }
>>    }
>>      }
>>    }
>>
>>
>> diff  --git a/llvm/test/CodeGen/X86/vec_minmax_sint.ll b/llvm/test/CodeGen/X86/vec_minmax_sint.ll
>> index 155e22ecc21f..a20e6b4c83de 100644
>> --- a/llvm/test/CodeGen/X86/vec_minmax_sint.ll
>> +++ b/llvm/test/CodeGen/X86/vec_minmax_sint.ll
>> @@ -161,11 +161,10 @@ define <4 x i64> @max_gt_v4i64(<4 x i64> %a, <4 x
>> i64> %b) {
>>  ; AVX1:       # %bb.0:
>>  ; AVX1-NEXT:    vextractf128 $1, %ymm1, %xmm2
>>  ; AVX1-NEXT:    vextractf128 $1, %ymm0, %xmm3
>> -; AVX1-NEXT:    vpcmpgtq %xmm2, %xmm3, %xmm4
>> -; AVX1-NEXT:    vblendvpd %xmm4, %xmm3, %xmm2, %xmm2
>> +; AVX1-NEXT:    vpcmpgtq %xmm2, %xmm3, %xmm2
>>  ; AVX1-NEXT:    vpcmpgtq %xmm1, %xmm0, %xmm3
>> -; AVX1-NEXT:    vblendvpd %xmm3, %xmm0, %xmm1, %xmm0
>> -; AVX1-NEXT:    vinsertf128 $1, %xmm2, %ymm0, %ymm0
>> +; AVX1-NEXT:    vinsertf128 $1, %xmm2, %ymm3, %ymm2
>> +; AVX1-NEXT:    vblendvpd %ymm2, %ymm0, %ymm1, %ymm0
>>  ; AVX1-NEXT:    retq
>>  ;
>>  ; AVX2-LABEL: max_gt_v4i64:
>> @@ -543,11 +542,10 @@ define <4 x i64> @max_ge_v4i64(<4 x i64> %a, <4 x
>> i64> %b) {
>>  ; AVX1:       # %bb.0:
>>  ; AVX1-NEXT:    vextractf128 $1, %ymm1, %xmm2
>>  ; AVX1-NEXT:    vextractf128 $1, %ymm0, %xmm3
>> -; AVX1-NEXT:    vpcmpgtq %xmm2, %xmm3, %xmm4
>> -; AVX1-NEXT:    vblendvpd %xmm4, %xmm3, %xmm2, %xmm2
>> +; AVX1-NEXT:    vpcmpgtq %xmm2, %xmm3, %xmm2
>>  ; AVX1-NEXT:    vpcmpgtq %xmm1, %xmm0, %xmm3
>> -; AVX1-NEXT:    vblendvpd %xmm3, %xmm0, %xmm1, %xmm0
>> -; AVX1-NEXT:    vinsertf128 $1, %xmm2, %ymm0, %ymm0
>> +; AVX1-NEXT:    vinsertf128 $1, %xmm2, %ymm3, %ymm2
>> +; AVX1-NEXT:    vblendvpd %ymm2, %ymm0, %ymm1, %ymm0
>>  ; AVX1-NEXT:    retq
>>  ;
>>  ; AVX2-LABEL: max_ge_v4i64:
>> @@ -925,11 +923,10 @@ define <4 x i64> @min_lt_v4i64(<4 x i64> %a, <4 x
>> i64> %b) {
>>  ; AVX1:       # %bb.0:
>>  ; AVX1-NEXT:    vextractf128 $1, %ymm0, %xmm2
>>  ; AVX1-NEXT:    vextractf128 $1, %ymm1, %xmm3
>> -; AVX1-NEXT:    vpcmpgtq %xmm2, %xmm3, %xmm4
>> -; AVX1-NEXT:    vblendvpd %xmm4, %xmm2, %xmm3, %xmm2
>> +; AVX1-NEXT:    vpcmpgtq %xmm2, %xmm3, %xmm2
>>  ; AVX1-NEXT:    vpcmpgtq %xmm0, %xmm1, %xmm3
>> -; AVX1-NEXT:    vblendvpd %xmm3, %xmm0, %xmm1, %xmm0
>> -; AVX1-NEXT:    vinsertf128 $1, %xmm2, %ymm0, %ymm0
>> +; AVX1-NEXT:    vinsertf128 $1, %xmm2, %ymm3, %ymm2
>> +; AVX1-NEXT:    vblendvpd %ymm2, %ymm0, %ymm1, %ymm0
>>  ; AVX1-NEXT:    retq
>>  ;
>>  ; AVX2-LABEL: min_lt_v4i64:
>> @@ -1307,11 +1304,10 @@ define <4 x i64> @min_le_v4i64(<4 x i64> %a, <4 x
>> i64> %b) {
>>  ; AVX1:       # %bb.0:
>>  ; AVX1-NEXT:    vextractf128 $1, %ymm0, %xmm2
>>  ; AVX1-NEXT:    vextractf128 $1, %ymm1, %xmm3
>> -; AVX1-NEXT:    vpcmpgtq %xmm2, %xmm3, %xmm4
>> -; AVX1-NEXT:    vblendvpd %xmm4, %xmm2, %xmm3, %xmm2
>> +; AVX1-NEXT:    vpcmpgtq %xmm2, %xmm3, %xmm2
>>  ; AVX1-NEXT:    vpcmpgtq %xmm0, %xmm1, %xmm3
>> -; AVX1-NEXT:    vblendvpd %xmm3, %xmm0, %xmm1, %xmm0
>> -; AVX1-NEXT:    vinsertf128 $1, %xmm2, %ymm0, %ymm0
>> +; AVX1-NEXT:    vinsertf128 $1, %xmm2, %ymm3, %ymm2
>> +; AVX1-NEXT:    vblendvpd %ymm2, %ymm0, %ymm1, %ymm0
>>  ; AVX1-NEXT:    retq
>>  ;
>>  ; AVX2-LABEL: min_le_v4i64:
>>
>> diff  --git a/llvm/test/CodeGen/X86/vec_minmax_uint.ll b/llvm/test/CodeGen/X86/vec_minmax_uint.ll
>> index 7ce8c3a6d4fc..49adfbf5acfd 100644
>> --- a/llvm/test/CodeGen/X86/vec_minmax_uint.ll
>> +++ b/llvm/test/CodeGen/X86/vec_minmax_uint.ll
>> @@ -178,16 +178,15 @@ define <4 x i64> @max_gt_v4i64(<4 x i64> %a, <4 x
>> i64> %b) {
>>  ; AVX1:       # %bb.0:
>>  ; AVX1-NEXT:    vextractf128 $1, %ymm1, %xmm2
>>  ; AVX1-NEXT:    vmovdqa {{.*#+}} xmm3 =
>> [9223372036854775808,9223372036854775808]
>> -; AVX1-NEXT:    vpxor %xmm3, %xmm2, %xmm4
>> -; AVX1-NEXT:    vextractf128 $1, %ymm0, %xmm5
>> -; AVX1-NEXT:    vpxor %xmm3, %xmm5, %xmm6
>> -; AVX1-NEXT:    vpcmpgtq %xmm4, %xmm6, %xmm4
>> -; AVX1-NEXT:    vblendvpd %xmm4, %xmm5, %xmm2, %xmm2
>> +; AVX1-NEXT:    vpxor %xmm3, %xmm2, %xmm2
>> +; AVX1-NEXT:    vextractf128 $1, %ymm0, %xmm4
>> +; AVX1-NEXT:    vpxor %xmm3, %xmm4, %xmm4
>> +; AVX1-NEXT:    vpcmpgtq %xmm2, %xmm4, %xmm2
>>  ; AVX1-NEXT:    vpxor %xmm3, %xmm1, %xmm4
>>  ; AVX1-NEXT:    vpxor %xmm3, %xmm0, %xmm3
>>  ; AVX1-NEXT:    vpcmpgtq %xmm4, %xmm3, %xmm3
>> -; AVX1-NEXT:    vblendvpd %xmm3, %xmm0, %xmm1, %xmm0
>> -; AVX1-NEXT:    vinsertf128 $1, %xmm2, %ymm0, %ymm0
>> +; AVX1-NEXT:    vinsertf128 $1, %xmm2, %ymm3, %ymm2
>> +; AVX1-NEXT:    vblendvpd %ymm2, %ymm0, %ymm1, %ymm0
>>  ; AVX1-NEXT:    retq
>>  ;
>>  ; AVX2-LABEL: max_gt_v4i64:
>> @@ -585,16 +584,15 @@ define <4 x i64> @max_ge_v4i64(<4 x i64> %a, <4 x
>> i64> %b) {
>>  ; AVX1:       # %bb.0:
>>  ; AVX1-NEXT:    vextractf128 $1, %ymm1, %xmm2
>>  ; AVX1-NEXT:    vmovdqa {{.*#+}} xmm3 =
>> [9223372036854775808,9223372036854775808]
>> -; AVX1-NEXT:    vpxor %xmm3, %xmm2, %xmm4
>> -; AVX1-NEXT:    vextractf128 $1, %ymm0, %xmm5
>> -; AVX1-NEXT:    vpxor %xmm3, %xmm5, %xmm6
>> -; AVX1-NEXT:    vpcmpgtq %xmm4, %xmm6, %xmm4
>> -; AVX1-NEXT:    vblendvpd %xmm4, %xmm5, %xmm2, %xmm2
>> +; AVX1-NEXT:    vpxor %xmm3, %xmm2, %xmm2
>> +; AVX1-NEXT:    vextractf128 $1, %ymm0, %xmm4
>> +; AVX1-NEXT:    vpxor %xmm3, %xmm4, %xmm4
>> +; AVX1-NEXT:    vpcmpgtq %xmm2, %xmm4, %xmm2
>>  ; AVX1-NEXT:    vpxor %xmm3, %xmm1, %xmm4
>>  ; AVX1-NEXT:    vpxor %xmm3, %xmm0, %xmm3
>>  ; AVX1-NEXT:    vpcmpgtq %xmm4, %xmm3, %xmm3
>> -; AVX1-NEXT:    vblendvpd %xmm3, %xmm0, %xmm1, %xmm0
>> -; AVX1-NEXT:    vinsertf128 $1, %xmm2, %ymm0, %ymm0
>> +; AVX1-NEXT:    vinsertf128 $1, %xmm2, %ymm3, %ymm2
>> +; AVX1-NEXT:    vblendvpd %ymm2, %ymm0, %ymm1, %ymm0
>>  ; AVX1-NEXT:    retq
>>  ;
>>  ; AVX2-LABEL: max_ge_v4i64:
>> @@ -991,16 +989,15 @@ define <4 x i64> @min_lt_v4i64(<4 x i64> %a, <4 x
>> i64> %b) {
>>  ; AVX1:       # %bb.0:
>>  ; AVX1-NEXT:    vextractf128 $1, %ymm0, %xmm2
>>  ; AVX1-NEXT:    vmovdqa {{.*#+}} xmm3 =
>> [9223372036854775808,9223372036854775808]
>> -; AVX1-NEXT:    vpxor %xmm3, %xmm2, %xmm4
>> -; AVX1-NEXT:    vextractf128 $1, %ymm1, %xmm5
>> -; AVX1-NEXT:    vpxor %xmm3, %xmm5, %xmm6
>> -; AVX1-NEXT:    vpcmpgtq %xmm4, %xmm6, %xmm4
>> -; AVX1-NEXT:    vblendvpd %xmm4, %xmm2, %xmm5, %xmm2
>> +; AVX1-NEXT:    vpxor %xmm3, %xmm2, %xmm2
>> +; AVX1-NEXT:    vextractf128 $1, %ymm1, %xmm4
>> +; AVX1-NEXT:    vpxor %xmm3, %xmm4, %xmm4
>> +; AVX1-NEXT:    vpcmpgtq %xmm2, %xmm4, %xmm2
>>  ; AVX1-NEXT:    vpxor %xmm3, %xmm0, %xmm4
>>  ; AVX1-NEXT:    vpxor %xmm3, %xmm1, %xmm3
>>  ; AVX1-NEXT:    vpcmpgtq %xmm4, %xmm3, %xmm3
>> -; AVX1-NEXT:    vblendvpd %xmm3, %xmm0, %xmm1, %xmm0
>> -; AVX1-NEXT:    vinsertf128 $1, %xmm2, %ymm0, %ymm0
>> +; AVX1-NEXT:    vinsertf128 $1, %xmm2, %ymm3, %ymm2
>> +; AVX1-NEXT:    vblendvpd %ymm2, %ymm0, %ymm1, %ymm0
>>  ; AVX1-NEXT:    retq
>>  ;
>>  ; AVX2-LABEL: min_lt_v4i64:
>> @@ -1400,16 +1397,15 @@ define <4 x i64> @min_le_v4i64(<4 x i64> %a, <4 x
>> i64> %b) {
>>  ; AVX1:       # %bb.0:
>>  ; AVX1-NEXT:    vextractf128 $1, %ymm0, %xmm2
>>  ; AVX1-NEXT:    vmovdqa {{.*#+}} xmm3 =
>> [9223372036854775808,9223372036854775808]
>> -; AVX1-NEXT:    vpxor %xmm3, %xmm2, %xmm4
>> -; AVX1-NEXT:    vextractf128 $1, %ymm1, %xmm5
>> -; AVX1-NEXT:    vpxor %xmm3, %xmm5, %xmm6
>> -; AVX1-NEXT:    vpcmpgtq %xmm4, %xmm6, %xmm4
>> -; AVX1-NEXT:    vblendvpd %xmm4, %xmm2, %xmm5, %xmm2
>> +; AVX1-NEXT:    vpxor %xmm3, %xmm2, %xmm2
>> +; AVX1-NEXT:    vextractf128 $1, %ymm1, %xmm4
>> +; AVX1-NEXT:    vpxor %xmm3, %xmm4, %xmm4
>> +; AVX1-NEXT:    vpcmpgtq %xmm2, %xmm4, %xmm2
>>  ; AVX1-NEXT:    vpxor %xmm3, %xmm0, %xmm4
>>  ; AVX1-NEXT:    vpxor %xmm3, %xmm1, %xmm3
>>  ; AVX1-NEXT:    vpcmpgtq %xmm4, %xmm3, %xmm3
>> -; AVX1-NEXT:    vblendvpd %xmm3, %xmm0, %xmm1, %xmm0
>> -; AVX1-NEXT:    vinsertf128 $1, %xmm2, %ymm0, %ymm0
>> +; AVX1-NEXT:    vinsertf128 $1, %xmm2, %ymm3, %ymm2
>> +; AVX1-NEXT:    vblendvpd %ymm2, %ymm0, %ymm1, %ymm0
>>  ; AVX1-NEXT:    retq
>>  ;
>>  ; AVX2-LABEL: min_le_v4i64:
>>
>> diff  --git a/llvm/test/CodeGen/X86/vselect-avx.ll b/llvm/test/CodeGen/X86/vselect-avx.ll
>> index 9a58d7b58744..90b67d93e80a 100644
>> --- a/llvm/test/CodeGen/X86/vselect-avx.ll
>> +++ b/llvm/test/CodeGen/X86/vselect-avx.ll
>> @@ -198,12 +198,10 @@ define void @blendv_split(<8 x i32>* %p, <8 x i32>
>> %cond, <8 x i32> %a, <8 x i32
>>    ret void
>>  }
>>
>> +; Regression test for rGea8fb3b60196
>>  define void @vselect_concat() {
>>  ; AVX-LABEL: vselect_concat:
>>  ; AVX:       ## %bb.0: ## %entry
>> -; AVX-NEXT:    vbroadcastf128 {{.*#+}} ymm0 = mem[0,1,0,1]
>> -; AVX-NEXT:    vmovaps %ymm0, (%rax)
>> -; AVX-NEXT:    vzeroupper
>>  ; AVX-NEXT:    retq
>>  entry:
>>    %0 = load <8 x i32>, <8 x i32>* undef
>>
>> diff  --git a/llvm/test/CodeGen/X86/vselect-minmax.ll b/llvm/test/CodeGen/X86/vselect-minmax.ll
>> index f33b66d95cc1..e8485ef3d636 100644
>> --- a/llvm/test/CodeGen/X86/vselect-minmax.ll
>> +++ b/llvm/test/CodeGen/X86/vselect-minmax.ll
>> @@ -4575,18 +4575,16 @@ define <8 x i64> @test121(<8 x i64> %a, <8 x i64>
>> %b) {
>>  ; AVX1:       # %bb.0: # %entry
>>  ; AVX1-NEXT:    vextractf128 $1, %ymm0, %xmm4
>>  ; AVX1-NEXT:    vextractf128 $1, %ymm2, %xmm5
>> -; AVX1-NEXT:    vpcmpgtq %xmm4, %xmm5, %xmm6
>> -; AVX1-NEXT:    vblendvpd %xmm6, %xmm4, %xmm5, %xmm4
>> +; AVX1-NEXT:    vpcmpgtq %xmm4, %xmm5, %xmm4
>>  ; AVX1-NEXT:    vpcmpgtq %xmm0, %xmm2, %xmm5
>> -; AVX1-NEXT:    vblendvpd %xmm5, %xmm0, %xmm2, %xmm0
>> -; AVX1-NEXT:    vinsertf128 $1, %xmm4, %ymm0, %ymm0
>> +; AVX1-NEXT:    vinsertf128 $1, %xmm4, %ymm5, %ymm4
>> +; AVX1-NEXT:    vblendvpd %ymm4, %ymm0, %ymm2, %ymm0
>>  ; AVX1-NEXT:    vextractf128 $1, %ymm1, %xmm2
>>  ; AVX1-NEXT:    vextractf128 $1, %ymm3, %xmm4
>> -; AVX1-NEXT:    vpcmpgtq %xmm2, %xmm4, %xmm5
>> -; AVX1-NEXT:    vblendvpd %xmm5, %xmm2, %xmm4, %xmm2
>> +; AVX1-NEXT:    vpcmpgtq %xmm2, %xmm4, %xmm2
>>  ; AVX1-NEXT:    vpcmpgtq %xmm1, %xmm3, %xmm4
>> -; AVX1-NEXT:    vblendvpd %xmm4, %xmm1, %xmm3, %xmm1
>> -; AVX1-NEXT:    vinsertf128 $1, %xmm2, %ymm1, %ymm1
>> +; AVX1-NEXT:    vinsertf128 $1, %xmm2, %ymm4, %ymm2
>> +; AVX1-NEXT:    vblendvpd %ymm2, %ymm1, %ymm3, %ymm1
>>  ; AVX1-NEXT:    retq
>>  ;
>>  ; AVX2-LABEL: test121:
>> @@ -4698,18 +4696,16 @@ define <8 x i64> @test122(<8 x i64> %a, <8 x i64>
>> %b) {
>>  ; AVX1:       # %bb.0: # %entry
>>  ; AVX1-NEXT:    vextractf128 $1, %ymm0, %xmm4
>>  ; AVX1-NEXT:    vextractf128 $1, %ymm2, %xmm5
>> -; AVX1-NEXT:    vpcmpgtq %xmm4, %xmm5, %xmm6
>> -; AVX1-NEXT:    vblendvpd %xmm6, %xmm4, %xmm5, %xmm4
>> +; AVX1-NEXT:    vpcmpgtq %xmm4, %xmm5, %xmm4
>>  ; AVX1-NEXT:    vpcmpgtq %xmm0, %xmm2, %xmm5
>> -; AVX1-NEXT:    vblendvpd %xmm5, %xmm0, %xmm2, %xmm0
>> -; AVX1-NEXT:    vinsertf128 $1, %xmm4, %ymm0, %ymm0
>> +; AVX1-NEXT:    vinsertf128 $1, %xmm4, %ymm5, %ymm4
>> +; AVX1-NEXT:    vblendvpd %ymm4, %ymm0, %ymm2, %ymm0
>>  ; AVX1-NEXT:    vextractf128 $1, %ymm1, %xmm2
>>  ; AVX1-NEXT:    vextractf128 $1, %ymm3, %xmm4
>> -; AVX1-NEXT:    vpcmpgtq %xmm2, %xmm4, %xmm5
>> -; AVX1-NEXT:    vblendvpd %xmm5, %xmm2, %xmm4, %xmm2
>> +; AVX1-NEXT:    vpcmpgtq %xmm2, %xmm4, %xmm2
>>  ; AVX1-NEXT:    vpcmpgtq %xmm1, %xmm3, %xmm4
>> -; AVX1-NEXT:    vblendvpd %xmm4, %xmm1, %xmm3, %xmm1
>> -; AVX1-NEXT:    vinsertf128 $1, %xmm2, %ymm1, %ymm1
>> +; AVX1-NEXT:    vinsertf128 $1, %xmm2, %ymm4, %ymm2
>> +; AVX1-NEXT:    vblendvpd %ymm2, %ymm1, %ymm3, %ymm1
>>  ; AVX1-NEXT:    retq
>>  ;
>>  ; AVX2-LABEL: test122:
>> @@ -4820,18 +4816,16 @@ define <8 x i64> @test123(<8 x i64> %a, <8 x i64>
>> %b) {
>>  ; AVX1:       # %bb.0: # %entry
>>  ; AVX1-NEXT:    vextractf128 $1, %ymm2, %xmm4
>>  ; AVX1-NEXT:    vextractf128 $1, %ymm0, %xmm5
>> -; AVX1-NEXT:    vpcmpgtq %xmm4, %xmm5, %xmm6
>> -; AVX1-NEXT:    vblendvpd %xmm6, %xmm5, %xmm4, %xmm4
>> +; AVX1-NEXT:    vpcmpgtq %xmm4, %xmm5, %xmm4
>>  ; AVX1-NEXT:    vpcmpgtq %xmm2, %xmm0, %xmm5
>> -; AVX1-NEXT:    vblendvpd %xmm5, %xmm0, %xmm2, %xmm0
>> -; AVX1-NEXT:    vinsertf128 $1, %xmm4, %ymm0, %ymm0
>> +; AVX1-NEXT:    vinsertf128 $1, %xmm4, %ymm5, %ymm4
>> +; AVX1-NEXT:    vblendvpd %ymm4, %ymm0, %ymm2, %ymm0
>>  ; AVX1-NEXT:    vextractf128 $1, %ymm3, %xmm2
>>  ; AVX1-NEXT:    vextractf128 $1, %ymm1, %xmm4
>> -; AVX1-NEXT:    vpcmpgtq %xmm2, %xmm4, %xmm5
>> -; AVX1-NEXT:    vblendvpd %xmm5, %xmm4, %xmm2, %xmm2
>> +; AVX1-NEXT:    vpcmpgtq %xmm2, %xmm4, %xmm2
>>  ; AVX1-NEXT:    vpcmpgtq %xmm3, %xmm1, %xmm4
>> -; AVX1-NEXT:    vblendvpd %xmm4, %xmm1, %xmm3, %xmm1
>> -; AVX1-NEXT:    vinsertf128 $1, %xmm2, %ymm1, %ymm1
>> +; AVX1-NEXT:    vinsertf128 $1, %xmm2, %ymm4, %ymm2
>> +; AVX1-NEXT:    vblendvpd %ymm2, %ymm1, %ymm3, %ymm1
>>  ; AVX1-NEXT:    retq
>>  ;
>>  ; AVX2-LABEL: test123:
>> @@ -4942,18 +4936,16 @@ define <8 x i64> @test124(<8 x i64> %a, <8 x i64>
>> %b) {
>>  ; AVX1:       # %bb.0: # %entry
>>  ; AVX1-NEXT:    vextractf128 $1, %ymm2, %xmm4
>>  ; AVX1-NEXT:    vextractf128 $1, %ymm0, %xmm5
>> -; AVX1-NEXT:    vpcmpgtq %xmm4, %xmm5, %xmm6
>> -; AVX1-NEXT:    vblendvpd %xmm6, %xmm5, %xmm4, %xmm4
>> +; AVX1-NEXT:    vpcmpgtq %xmm4, %xmm5, %xmm4
>>  ; AVX1-NEXT:    vpcmpgtq %xmm2, %xmm0, %xmm5
>> -; AVX1-NEXT:    vblendvpd %xmm5, %xmm0, %xmm2, %xmm0
>> -; AVX1-NEXT:    vinsertf128 $1, %xmm4, %ymm0, %ymm0
>> +; AVX1-NEXT:    vinsertf128 $1, %xmm4, %ymm5, %ymm4
>> +; AVX1-NEXT:    vblendvpd %ymm4, %ymm0, %ymm2, %ymm0
>>  ; AVX1-NEXT:    vextractf128 $1, %ymm3, %xmm2
>>  ; AVX1-NEXT:    vextractf128 $1, %ymm1, %xmm4
>> -; AVX1-NEXT:    vpcmpgtq %xmm2, %xmm4, %xmm5
>> -; AVX1-NEXT:    vblendvpd %xmm5, %xmm4, %xmm2, %xmm2
>> +; AVX1-NEXT:    vpcmpgtq %xmm2, %xmm4, %xmm2
>>  ; AVX1-NEXT:    vpcmpgtq %xmm3, %xmm1, %xmm4
>> -; AVX1-NEXT:    vblendvpd %xmm4, %xmm1, %xmm3, %xmm1
>> -; AVX1-NEXT:    vinsertf128 $1, %xmm2, %ymm1, %ymm1
>> +; AVX1-NEXT:    vinsertf128 $1, %xmm2, %ymm4, %ymm2
>> +; AVX1-NEXT:    vblendvpd %ymm2, %ymm1, %ymm3, %ymm1
>>  ; AVX1-NEXT:    retq
>>  ;
>>  ; AVX2-LABEL: test124:
>> @@ -5080,27 +5072,25 @@ define <8 x i64> @test125(<8 x i64> %a, <8 x i64>
>> %b) {
>>  ; AVX1:       # %bb.0: # %entry
>>  ; AVX1-NEXT:    vextractf128 $1, %ymm0, %xmm4
>>  ; AVX1-NEXT:    vmovdqa {{.*#+}} xmm5 =
>> [9223372036854775808,9223372036854775808]
>> -; AVX1-NEXT:    vpxor %xmm5, %xmm4, %xmm8
>> -; AVX1-NEXT:    vextractf128 $1, %ymm2, %xmm7
>> -; AVX1-NEXT:    vpxor %xmm5, %xmm7, %xmm6
>> -; AVX1-NEXT:    vpcmpgtq %xmm8, %xmm6, %xmm6
>> -; AVX1-NEXT:    vblendvpd %xmm6, %xmm4, %xmm7, %xmm4
>> +; AVX1-NEXT:    vpxor %xmm5, %xmm4, %xmm4
>> +; AVX1-NEXT:    vextractf128 $1, %ymm2, %xmm6
>> +; AVX1-NEXT:    vpxor %xmm5, %xmm6, %xmm6
>> +; AVX1-NEXT:    vpcmpgtq %xmm4, %xmm6, %xmm4
>>  ; AVX1-NEXT:    vpxor %xmm5, %xmm0, %xmm6
>>  ; AVX1-NEXT:    vpxor %xmm5, %xmm2, %xmm7
>>  ; AVX1-NEXT:    vpcmpgtq %xmm6, %xmm7, %xmm6
>> -; AVX1-NEXT:    vblendvpd %xmm6, %xmm0, %xmm2, %xmm0
>> -; AVX1-NEXT:    vinsertf128 $1, %xmm4, %ymm0, %ymm0
>> +; AVX1-NEXT:    vinsertf128 $1, %xmm4, %ymm6, %ymm4
>> +; AVX1-NEXT:    vblendvpd %ymm4, %ymm0, %ymm2, %ymm0
>>  ; AVX1-NEXT:    vextractf128 $1, %ymm1, %xmm2
>> -; AVX1-NEXT:    vpxor %xmm5, %xmm2, %xmm4
>> -; AVX1-NEXT:    vextractf128 $1, %ymm3, %xmm6
>> -; AVX1-NEXT:    vpxor %xmm5, %xmm6, %xmm7
>> -; AVX1-NEXT:    vpcmpgtq %xmm4, %xmm7, %xmm4
>> -; AVX1-NEXT:    vblendvpd %xmm4, %xmm2, %xmm6, %xmm2
>> +; AVX1-NEXT:    vpxor %xmm5, %xmm2, %xmm2
>> +; AVX1-NEXT:    vextractf128 $1, %ymm3, %xmm4
>> +; AVX1-NEXT:    vpxor %xmm5, %xmm4, %xmm4
>> +; AVX1-NEXT:    vpcmpgtq %xmm2, %xmm4, %xmm2
>>  ; AVX1-NEXT:    vpxor %xmm5, %xmm1, %xmm4
>>  ; AVX1-NEXT:    vpxor %xmm5, %xmm3, %xmm5
>>  ; AVX1-NEXT:    vpcmpgtq %xmm4, %xmm5, %xmm4
>> -; AVX1-NEXT:    vblendvpd %xmm4, %xmm1, %xmm3, %xmm1
>> -; AVX1-NEXT:    vinsertf128 $1, %xmm2, %ymm1, %ymm1
>> +; AVX1-NEXT:    vinsertf128 $1, %xmm2, %ymm4, %ymm2
>> +; AVX1-NEXT:    vblendvpd %ymm2, %ymm1, %ymm3, %ymm1
>>  ; AVX1-NEXT:    retq
>>  ;
>>  ; AVX2-LABEL: test125:
>> @@ -5232,27 +5222,25 @@ define <8 x i64> @test126(<8 x i64> %a, <8 x i64>
>> %b) {
>>  ; AVX1:       # %bb.0: # %entry
>>  ; AVX1-NEXT:    vextractf128 $1, %ymm0, %xmm4
>>  ; AVX1-NEXT:    vmovdqa {{.*#+}} xmm5 =
>> [9223372036854775808,9223372036854775808]
>> -; AVX1-NEXT:    vpxor %xmm5, %xmm4, %xmm8
>> -; AVX1-NEXT:    vextractf128 $1, %ymm2, %xmm7
>> -; AVX1-NEXT:    vpxor %xmm5, %xmm7, %xmm6
>> -; AVX1-NEXT:    vpcmpgtq %xmm8, %xmm6, %xmm6
>> -; AVX1-NEXT:    vblendvpd %xmm6, %xmm4, %xmm7, %xmm4
>> +; AVX1-NEXT:    vpxor %xmm5, %xmm4, %xmm4
>> +; AVX1-NEXT:    vextractf128 $1, %ymm2, %xmm6
>> +; AVX1-NEXT:    vpxor %xmm5, %xmm6, %xmm6
>> +; AVX1-NEXT:    vpcmpgtq %xmm4, %xmm6, %xmm4
>>  ; AVX1-NEXT:    vpxor %xmm5, %xmm0, %xmm6
>>  ; AVX1-NEXT:    vpxor %xmm5, %xmm2, %xmm7
>>  ; AVX1-NEXT:    vpcmpgtq %xmm6, %xmm7, %xmm6
>> -; AVX1-NEXT:    vblendvpd %xmm6, %xmm0, %xmm2, %xmm0
>> -; AVX1-NEXT:    vinsertf128 $1, %xmm4, %ymm0, %ymm0
>> +; AVX1-NEXT:    vinsertf128 $1, %xmm4, %ymm6, %ymm4
>> +; AVX1-NEXT:    vblendvpd %ymm4, %ymm0, %ymm2, %ymm0
>>  ; AVX1-NEXT:    vextractf128 $1, %ymm1, %xmm2
>> -; AVX1-NEXT:    vpxor %xmm5, %xmm2, %xmm4
>> -; AVX1-NEXT:    vextractf128 $1, %ymm3, %xmm6
>> -; AVX1-NEXT:    vpxor %xmm5, %xmm6, %xmm7
>> -; AVX1-NEXT:    vpcmpgtq %xmm4, %xmm7, %xmm4
>> -; AVX1-NEXT:    vblendvpd %xmm4, %xmm2, %xmm6, %xmm2
>> +; AVX1-NEXT:    vpxor %xmm5, %xmm2, %xmm2
>> +; AVX1-NEXT:    vextractf128 $1, %ymm3, %xmm4
>> +; AVX1-NEXT:    vpxor %xmm5, %xmm4, %xmm4
>> +; AVX1-NEXT:    vpcmpgtq %xmm2, %xmm4, %xmm2
>>  ; AVX1-NEXT:    vpxor %xmm5, %xmm1, %xmm4
>>  ; AVX1-NEXT:    vpxor %xmm5, %xmm3, %xmm5
>>  ; AVX1-NEXT:    vpcmpgtq %xmm4, %xmm5, %xmm4
>> -; AVX1-NEXT:    vblendvpd %xmm4, %xmm1, %xmm3, %xmm1
>> -; AVX1-NEXT:    vinsertf128 $1, %xmm2, %ymm1, %ymm1
>> +; AVX1-NEXT:    vinsertf128 $1, %xmm2, %ymm4, %ymm2
>> +; AVX1-NEXT:    vblendvpd %ymm2, %ymm1, %ymm3, %ymm1
>>  ; AVX1-NEXT:    retq
>>  ;
>>  ; AVX2-LABEL: test126:
>> @@ -5383,27 +5371,25 @@ define <8 x i64> @test127(<8 x i64> %a, <8 x i64>
>> %b) {
>>  ; AVX1:       # %bb.0: # %entry
>>  ; AVX1-NEXT:    vextractf128 $1, %ymm2, %xmm4
>>  ; AVX1-NEXT:    vmovdqa {{.*#+}} xmm5 =
>> [9223372036854775808,9223372036854775808]
>> -; AVX1-NEXT:    vpxor %xmm5, %xmm4, %xmm8
>> -; AVX1-NEXT:    vextractf128 $1, %ymm0, %xmm7
>> -; AVX1-NEXT:    vpxor %xmm5, %xmm7, %xmm6
>> -; AVX1-NEXT:    vpcmpgtq %xmm8, %xmm6, %xmm6
>> -; AVX1-NEXT:    vblendvpd %xmm6, %xmm7, %xmm4, %xmm4
>> +; AVX1-NEXT:    vpxor %xmm5, %xmm4, %xmm4
>> +; AVX1-NEXT:    vextractf128 $1, %ymm0, %xmm6
>> +; AVX1-NEXT:    vpxor %xmm5, %xmm6, %xmm6
>> +; AVX1-NEXT:    vpcmpgtq %xmm4, %xmm6, %xmm4
>>  ; AVX1-NEXT:    vpxor %xmm5, %xmm2, %xmm6
>>  ; AVX1-NEXT:    vpxor %xmm5, %xmm0, %xmm7
>>  ; AVX1-NEXT:    vpcmpgtq %xmm6, %xmm7, %xmm6
>> -; AVX1-NEXT:    vblendvpd %xmm6, %xmm0, %xmm2, %xmm0
>> -; AVX1-NEXT:    vinsertf128 $1, %xmm4, %ymm0, %ymm0
>> +; AVX1-NEXT:    vinsertf128 $1, %xmm4, %ymm6, %ymm4
>> +; AVX1-NEXT:    vblendvpd %ymm4, %ymm0, %ymm2, %ymm0
>>  ; AVX1-NEXT:    vextractf128 $1, %ymm3, %xmm2
>> -; AVX1-NEXT:    vpxor %xmm5, %xmm2, %xmm4
>> -; AVX1-NEXT:    vextractf128 $1, %ymm1, %xmm6
>> -; AVX1-NEXT:    vpxor %xmm5, %xmm6, %xmm7
>> -; AVX1-NEXT:    vpcmpgtq %xmm4, %xmm7, %xmm4
>> -; AVX1-NEXT:    vblendvpd %xmm4, %xmm6, %xmm2, %xmm2
>> +; AVX1-NEXT:    vpxor %xmm5, %xmm2, %xmm2
>> +; AVX1-NEXT:    vextractf128 $1, %ymm1, %xmm4
>> +; AVX1-NEXT:    vpxor %xmm5, %xmm4, %xmm4
>> +; AVX1-NEXT:    vpcmpgtq %xmm2, %xmm4, %xmm2
>>  ; AVX1-NEXT:    vpxor %xmm5, %xmm3, %xmm4
>>  ; AVX1-NEXT:    vpxor %xmm5, %xmm1, %xmm5
>>  ; AVX1-NEXT:    vpcmpgtq %xmm4, %xmm5, %xmm4
>> -; AVX1-NEXT:    vblendvpd %xmm4, %xmm1, %xmm3, %xmm1
>> -; AVX1-NEXT:    vinsertf128 $1, %xmm2, %ymm1, %ymm1
>> +; AVX1-NEXT:    vinsertf128 $1, %xmm2, %ymm4, %ymm2
>> +; AVX1-NEXT:    vblendvpd %ymm2, %ymm1, %ymm3, %ymm1
>>  ; AVX1-NEXT:    retq
>>  ;
>>  ; AVX2-LABEL: test127:
>> @@ -5534,27 +5520,25 @@ define <8 x i64> @test128(<8 x i64> %a, <8 x i64>
>> %b) {
>>  ; AVX1:       # %bb.0: # %entry
>>  ; AVX1-NEXT:    vextractf128 $1, %ymm2, %xmm4
>>  ; AVX1-NEXT:    vmovdqa {{.*#+}} xmm5 =
>> [9223372036854775808,9223372036854775808]
>> -; AVX1-NEXT:    vpxor %xmm5, %xmm4, %xmm8
>> -; AVX1-NEXT:    vextractf128 $1, %ymm0, %xmm7
>> -; AVX1-NEXT:    vpxor %xmm5, %xmm7, %xmm6
>> -; AVX1-NEXT:    vpcmpgtq %xmm8, %xmm6, %xmm6
>> -; AVX1-NEXT:    vblendvpd %xmm6, %xmm7, %xmm4, %xmm4
>> +; AVX1-NEXT:    vpxor %xmm5, %xmm4, %xmm4
>> +; AVX1-NEXT:    vextractf128 $1, %ymm0, %xmm6
>> +; AVX1-NEXT:    vpxor %xmm5, %xmm6, %xmm6
>> +; AVX1-NEXT:    vpcmpgtq %xmm4, %xmm6, %xmm4
>>  ; AVX1-NEXT:    vpxor %xmm5, %xmm2, %xmm6
>>  ; AVX1-NEXT:    vpxor %xmm5, %xmm0, %xmm7
>>  ; AVX1-NEXT:    vpcmpgtq %xmm6, %xmm7, %xmm6
>> -; AVX1-NEXT:    vblendvpd %xmm6, %xmm0, %xmm2, %xmm0
>> -; AVX1-NEXT:    vinsertf128 $1, %xmm4, %ymm0, %ymm0
>> +; AVX1-NEXT:    vinsertf128 $1, %xmm4, %ymm6, %ymm4
>> +; AVX1-NEXT:    vblendvpd %ymm4, %ymm0, %ymm2, %ymm0
>>  ; AVX1-NEXT:    vextractf128 $1, %ymm3, %xmm2
>> -; AVX1-NEXT:    vpxor %xmm5, %xmm2, %xmm4
>> -; AVX1-NEXT:    vextractf128 $1, %ymm1, %xmm6
>> -; AVX1-NEXT:    vpxor %xmm5, %xmm6, %xmm7
>> -; AVX1-NEXT:    vpcmpgtq %xmm4, %xmm7, %xmm4
>> -; AVX1-NEXT:    vblendvpd %xmm4, %xmm6, %xmm2, %xmm2
>> +; AVX1-NEXT:    vpxor %xmm5, %xmm2, %xmm2
>> +; AVX1-NEXT:    vextractf128 $1, %ymm1, %xmm4
>> +; AVX1-NEXT:    vpxor %xmm5, %xmm4, %xmm4
>> +; AVX1-NEXT:    vpcmpgtq %xmm2, %xmm4, %xmm2
>>  ; AVX1-NEXT:    vpxor %xmm5, %xmm3, %xmm4
>>  ; AVX1-NEXT:    vpxor %xmm5, %xmm1, %xmm5
>>  ; AVX1-NEXT:    vpcmpgtq %xmm4, %xmm5, %xmm4
>> -; AVX1-NEXT:    vblendvpd %xmm4, %xmm1, %xmm3, %xmm1
>> -; AVX1-NEXT:    vinsertf128 $1, %xmm2, %ymm1, %ymm1
>> +; AVX1-NEXT:    vinsertf128 $1, %xmm2, %ymm4, %ymm2
>> +; AVX1-NEXT:    vblendvpd %ymm2, %ymm1, %ymm3, %ymm1
>>  ; AVX1-NEXT:    retq
>>  ;
>>  ; AVX2-LABEL: test128:
>> @@ -7144,18 +7128,16 @@ define <8 x i64> @test153(<8 x i64> %a, <8 x i64>
>> %b) {
>>  ; AVX1:       # %bb.0: # %entry
>>  ; AVX1-NEXT:    vextractf128 $1, %ymm2, %xmm4
>>  ; AVX1-NEXT:    vextractf128 $1, %ymm0, %xmm5
>> -; AVX1-NEXT:    vpcmpgtq %xmm4, %xmm5, %xmm6
>> -; AVX1-NEXT:    vblendvpd %xmm6, %xmm5, %xmm4, %xmm4
>> +; AVX1-NEXT:    vpcmpgtq %xmm4, %xmm5, %xmm4
>>  ; AVX1-NEXT:    vpcmpgtq %xmm2, %xmm0, %xmm5
>> -; AVX1-NEXT:    vblendvpd %xmm5, %xmm0, %xmm2, %xmm0
>> -; AVX1-NEXT:    vinsertf128 $1, %xmm4, %ymm0, %ymm0
>> +; AVX1-NEXT:    vinsertf128 $1, %xmm4, %ymm5, %ymm4
>> +; AVX1-NEXT:    vblendvpd %ymm4, %ymm0, %ymm2, %ymm0
>>  ; AVX1-NEXT:    vextractf128 $1, %ymm3, %xmm2
>>  ; AVX1-NEXT:    vextractf128 $1, %ymm1, %xmm4
>> -; AVX1-NEXT:    vpcmpgtq %xmm2, %xmm4, %xmm5
>> -; AVX1-NEXT:    vblendvpd %xmm5, %xmm4, %xmm2, %xmm2
>> +; AVX1-NEXT:    vpcmpgtq %xmm2, %xmm4, %xmm2
>>  ; AVX1-NEXT:    vpcmpgtq %xmm3, %xmm1, %xmm4
>> -; AVX1-NEXT:    vblendvpd %xmm4, %xmm1, %xmm3, %xmm1
>> -; AVX1-NEXT:    vinsertf128 $1, %xmm2, %ymm1, %ymm1
>> +; AVX1-NEXT:    vinsertf128 $1, %xmm2, %ymm4, %ymm2
>> +; AVX1-NEXT:    vblendvpd %ymm2, %ymm1, %ymm3, %ymm1
>>  ; AVX1-NEXT:    retq
>>  ;
>>  ; AVX2-LABEL: test153:
>> @@ -7266,18 +7248,16 @@ define <8 x i64> @test154(<8 x i64> %a, <8 x i64>
>> %b) {
>>  ; AVX1:       # %bb.0: # %entry
>>  ; AVX1-NEXT:    vextractf128 $1, %ymm2, %xmm4
>>  ; AVX1-NEXT:    vextractf128 $1, %ymm0, %xmm5
>> -; AVX1-NEXT:    vpcmpgtq %xmm4, %xmm5, %xmm6
>> -; AVX1-NEXT:    vblendvpd %xmm6, %xmm5, %xmm4, %xmm4
>> +; AVX1-NEXT:    vpcmpgtq %xmm4, %xmm5, %xmm4
>>  ; AVX1-NEXT:    vpcmpgtq %xmm2, %xmm0, %xmm5
>> -; AVX1-NEXT:    vblendvpd %xmm5, %xmm0, %xmm2, %xmm0
>> -; AVX1-NEXT:    vinsertf128 $1, %xmm4, %ymm0, %ymm0
>> +; AVX1-NEXT:    vinsertf128 $1, %xmm4, %ymm5, %ymm4
>> +; AVX1-NEXT:    vblendvpd %ymm4, %ymm0, %ymm2, %ymm0
>>  ; AVX1-NEXT:    vextractf128 $1, %ymm3, %xmm2
>>  ; AVX1-NEXT:    vextractf128 $1, %ymm1, %xmm4
>> -; AVX1-NEXT:    vpcmpgtq %xmm2, %xmm4, %xmm5
>> -; AVX1-NEXT:    vblendvpd %xmm5, %xmm4, %xmm2, %xmm2
>> +; AVX1-NEXT:    vpcmpgtq %xmm2, %xmm4, %xmm2
>>  ; AVX1-NEXT:    vpcmpgtq %xmm3, %xmm1, %xmm4
>> -; AVX1-NEXT:    vblendvpd %xmm4, %xmm1, %xmm3, %xmm1
>> -; AVX1-NEXT:    vinsertf128 $1, %xmm2, %ymm1, %ymm1
>> +; AVX1-NEXT:    vinsertf128 $1, %xmm2, %ymm4, %ymm2
>> +; AVX1-NEXT:    vblendvpd %ymm2, %ymm1, %ymm3, %ymm1
>>  ; AVX1-NEXT:    retq
>>  ;
>>  ; AVX2-LABEL: test154:
>> @@ -7389,18 +7369,16 @@ define <8 x i64> @test155(<8 x i64> %a, <8 x i64>
>> %b) {
>>  ; AVX1:       # %bb.0: # %entry
>>  ; AVX1-NEXT:    vextractf128 $1, %ymm0, %xmm4
>>  ; AVX1-NEXT:    vextractf128 $1, %ymm2, %xmm5
>> -; AVX1-NEXT:    vpcmpgtq %xmm4, %xmm5, %xmm6
>> -; AVX1-NEXT:    vblendvpd %xmm6, %xmm4, %xmm5, %xmm4
>> +; AVX1-NEXT:    vpcmpgtq %xmm4, %xmm5, %xmm4
>>  ; AVX1-NEXT:    vpcmpgtq %xmm0, %xmm2, %xmm5
>> -; AVX1-NEXT:    vblendvpd %xmm5, %xmm0, %xmm2, %xmm0
>> -; AVX1-NEXT:    vinsertf128 $1, %xmm4, %ymm0, %ymm0
>> +; AVX1-NEXT:    vinsertf128 $1, %xmm4, %ymm5, %ymm4
>> +; AVX1-NEXT:    vblendvpd %ymm4, %ymm0, %ymm2, %ymm0
>>  ; AVX1-NEXT:    vextractf128 $1, %ymm1, %xmm2
>>  ; AVX1-NEXT:    vextractf128 $1, %ymm3, %xmm4
>> -; AVX1-NEXT:    vpcmpgtq %xmm2, %xmm4, %xmm5
>> -; AVX1-NEXT:    vblendvpd %xmm5, %xmm2, %xmm4, %xmm2
>> +; AVX1-NEXT:    vpcmpgtq %xmm2, %xmm4, %xmm2
>>  ; AVX1-NEXT:    vpcmpgtq %xmm1, %xmm3, %xmm4
>> -; AVX1-NEXT:    vblendvpd %xmm4, %xmm1, %xmm3, %xmm1
>> -; AVX1-NEXT:    vinsertf128 $1, %xmm2, %ymm1, %ymm1
>> +; AVX1-NEXT:    vinsertf128 $1, %xmm2, %ymm4, %ymm2
>> +; AVX1-NEXT:    vblendvpd %ymm2, %ymm1, %ymm3, %ymm1
>>  ; AVX1-NEXT:    retq
>>  ;
>>  ; AVX2-LABEL: test155:
>> @@ -7526,27 +7504,25 @@ define <8 x i64> @test156(<8 x i64> %a, <8 x i64>
>> %b) {
>>  ; AVX1:       # %bb.0: # %entry
>>  ; AVX1-NEXT:    vextractf128 $1, %ymm2, %xmm4
>>  ; AVX1-NEXT:    vmovdqa {{.*#+}} xmm5 =
>> [9223372036854775808,9223372036854775808]
>> -; AVX1-NEXT:    vpxor %xmm5, %xmm4, %xmm8
>> -; AVX1-NEXT:    vextractf128 $1, %ymm0, %xmm7
>> -; AVX1-NEXT:    vpxor %xmm5, %xmm7, %xmm6
>> -; AVX1-NEXT:    vpcmpgtq %xmm8, %xmm6, %xmm6
>> -; AVX1-NEXT:    vblendvpd %xmm6, %xmm7, %xmm4, %xmm4
>> +; AVX1-NEXT:    vpxor %xmm5, %xmm4, %xmm4
>> +; AVX1-NEXT:    vextractf128 $1, %ymm0, %xmm6
>> +; AVX1-NEXT:    vpxor %xmm5, %xmm6, %xmm6
>> +; AVX1-NEXT:    vpcmpgtq %xmm4, %xmm6, %xmm4
>>  ; AVX1-NEXT:    vpxor %xmm5, %xmm2, %xmm6
>>  ; AVX1-NEXT:    vpxor %xmm5, %xmm0, %xmm7
>>  ; AVX1-NEXT:    vpcmpgtq %xmm6, %xmm7, %xmm6
>> -; AVX1-NEXT:    vblendvpd %xmm6, %xmm0, %xmm2, %xmm0
>> -; AVX1-NEXT:    vinsertf128 $1, %xmm4, %ymm0, %ymm0
>> +; AVX1-NEXT:    vinsertf128 $1, %xmm4, %ymm6, %ymm4
>> +; AVX1-NEXT:    vblendvpd %ymm4, %ymm0, %ymm2, %ymm0
>>  ; AVX1-NEXT:    vextractf128 $1, %ymm3, %xmm2
>> -; AVX1-NEXT:    vpxor %xmm5, %xmm2, %xmm4
>> -; AVX1-NEXT:    vextractf128 $1, %ymm1, %xmm6
>> -; AVX1-NEXT:    vpxor %xmm5, %xmm6, %xmm7
>> -; AVX1-NEXT:    vpcmpgtq %xmm4, %xmm7, %xmm4
>> -; AVX1-NEXT:    vblendvpd %xmm4, %xmm6, %xmm2, %xmm2
>> +; AVX1-NEXT:    vpxor %xmm5, %xmm2, %xmm2
>> +; AVX1-NEXT:    vextractf128 $1, %ymm1, %xmm4
>> +; AVX1-NEXT:    vpxor %xmm5, %xmm4, %xmm4
>> +; AVX1-NEXT:    vpcmpgtq %xmm2, %xmm4, %xmm2
>>  ; AVX1-NEXT:    vpxor %xmm5, %xmm3, %xmm4
>>  ; AVX1-NEXT:    vpxor %xmm5, %xmm1, %xmm5
>>  ; AVX1-NEXT:    vpcmpgtq %xmm4, %xmm5, %xmm4
>> -; AVX1-NEXT:    vblendvpd %xmm4, %xmm1, %xmm3, %xmm1
>> -; AVX1-NEXT:    vinsertf128 $1, %xmm2, %ymm1, %ymm1
>> +; AVX1-NEXT:    vinsertf128 $1, %xmm2, %ymm4, %ymm2
>> +; AVX1-NEXT:    vblendvpd %ymm2, %ymm1, %ymm3, %ymm1
>>  ; AVX1-NEXT:    retq
>>  ;
>>  ; AVX2-LABEL: test156:
>> @@ -7678,27 +7654,25 @@ define <8 x i64> @test159(<8 x i64> %a, <8 x i64>
>> %b) {
>>  ; AVX1:       # %bb.0: # %entry
>>  ; AVX1-NEXT:    vextractf128 $1, %ymm0, %xmm4
>>  ; AVX1-NEXT:    vmovdqa {{.*#+}} xmm5 =
>> [9223372036854775808,9223372036854775808]
>> -; AVX1-NEXT:    vpxor %xmm5, %xmm4, %xmm8
>> -; AVX1-NEXT:    vextractf128 $1, %ymm2, %xmm7
>> -; AVX1-NEXT:    vpxor %xmm5, %xmm7, %xmm6
>> -; AVX1-NEXT:    vpcmpgtq %xmm8, %xmm6, %xmm6
>> -; AVX1-NEXT:    vblendvpd %xmm6, %xmm4, %xmm7, %xmm4
>> +; AVX1-NEXT:    vpxor %xmm5, %xmm4, %xmm4
>> +; AVX1-NEXT:    vextractf128 $1, %ymm2, %xmm6
>> +; AVX1-NEXT:    vpxor %xmm5, %xmm6, %xmm6
>> +; AVX1-NEXT:    vpcmpgtq %xmm4, %xmm6, %xmm4
>>  ; AVX1-NEXT:    vpxor %xmm5, %xmm0, %xmm6
>>  ; AVX1-NEXT:    vpxor %xmm5, %xmm2, %xmm7
>>  ; AVX1-NEXT:    vpcmpgtq %xmm6, %xmm7, %xmm6
>> -; AVX1-NEXT:    vblendvpd %xmm6, %xmm0, %xmm2, %xmm0
>> -; AVX1-NEXT:    vinsertf128 $1, %xmm4, %ymm0, %ymm0
>> +; AVX1-NEXT:    vinsertf128 $1, %xmm4, %ymm6, %ymm4
>> +; AVX1-NEXT:    vblendvpd %ymm4, %ymm0, %ymm2, %ymm0
>>  ; AVX1-NEXT:    vextractf128 $1, %ymm1, %xmm2
>> -; AVX1-NEXT:    vpxor %xmm5, %xmm2, %xmm4
>> -; AVX1-NEXT:    vextractf128 $1, %ymm3, %xmm6
>> -; AVX1-NEXT:    vpxor %xmm5, %xmm6, %xmm7
>> -; AVX1-NEXT:    vpcmpgtq %xmm4, %xmm7, %xmm4
>> -; AVX1-NEXT:    vblendvpd %xmm4, %xmm2, %xmm6, %xmm2
>> +; AVX1-NEXT:    vpxor %xmm5, %xmm2, %xmm2
>> +; AVX1-NEXT:    vextractf128 $1, %ymm3, %xmm4
>> +; AVX1-NEXT:    vpxor %xmm5, %xmm4, %xmm4
>> +; AVX1-NEXT:    vpcmpgtq %xmm2, %xmm4, %xmm2
>>  ; AVX1-NEXT:    vpxor %xmm5, %xmm1, %xmm4
>>  ; AVX1-NEXT:    vpxor %xmm5, %xmm3, %xmm5
>>  ; AVX1-NEXT:    vpcmpgtq %xmm4, %xmm5, %xmm4
>> -; AVX1-NEXT:    vblendvpd %xmm4, %xmm1, %xmm3, %xmm1
>> -; AVX1-NEXT:    vinsertf128 $1, %xmm2, %ymm1, %ymm1
>> +; AVX1-NEXT:    vinsertf128 $1, %xmm2, %ymm4, %ymm2
>> +; AVX1-NEXT:    vblendvpd %ymm2, %ymm1, %ymm3, %ymm1
>>  ; AVX1-NEXT:    retq
>>  ;
>>  ; AVX2-LABEL: test159:
>> @@ -7830,27 +7804,25 @@ define <8 x i64> @test160(<8 x i64> %a, <8 x i64>
>> %b) {
>>  ; AVX1:       # %bb.0: # %entry
>>  ; AVX1-NEXT:    vextractf128 $1, %ymm0, %xmm4
>>  ; AVX1-NEXT:    vmovdqa {{.*#+}} xmm5 =
>> [9223372036854775808,9223372036854775808]
>> -; AVX1-NEXT:    vpxor %xmm5, %xmm4, %xmm8
>> -; AVX1-NEXT:    vextractf128 $1, %ymm2, %xmm7
>> -; AVX1-NEXT:    vpxor %xmm5, %xmm7, %xmm6
>> -; AVX1-NEXT:    vpcmpgtq %xmm8, %xmm6, %xmm6
>> -; AVX1-NEXT:    vblendvpd %xmm6, %xmm4, %xmm7, %xmm4
>> +; AVX1-NEXT:    vpxor %xmm5, %xmm4, %xmm4
>> +; AVX1-NEXT:    vextractf128 $1, %ymm2, %xmm6
>> +; AVX1-NEXT:    vpxor %xmm5, %xmm6, %xmm6
>> +; AVX1-NEXT:    vpcmpgtq %xmm4, %xmm6, %xmm4
>>  ; AVX1-NEXT:    vpxor %xmm5, %xmm0, %xmm6
>>  ; AVX1-NEXT:    vpxor %xmm5, %xmm2, %xmm7
>>  ; AVX1-NEXT:    vpcmpgtq %xmm6, %xmm7, %xmm6
>> -; AVX1-NEXT:    vblendvpd %xmm6, %xmm0, %xmm2, %xmm0
>> -; AVX1-NEXT:    vinsertf128 $1, %xmm4, %ymm0, %ymm0
>> +; AVX1-NEXT:    vinsertf128 $1, %xmm4, %ymm6, %ymm4
>> +; AVX1-NEXT:    vblendvpd %ymm4, %ymm0, %ymm2, %ymm0
>>  ; AVX1-NEXT:    vextractf128 $1, %ymm1, %xmm2
>> -; AVX1-NEXT:    vpxor %xmm5, %xmm2, %xmm4
>> -; AVX1-NEXT:    vextractf128 $1, %ymm3, %xmm6
>> -; AVX1-NEXT:    vpxor %xmm5, %xmm6, %xmm7
>> -; AVX1-NEXT:    vpcmpgtq %xmm4, %xmm7, %xmm4
>> -; AVX1-NEXT:    vblendvpd %xmm4, %xmm2, %xmm6, %xmm2
>> +; AVX1-NEXT:    vpxor %xmm5, %xmm2, %xmm2
>> +; AVX1-NEXT:    vextractf128 $1, %ymm3, %xmm4
>> +; AVX1-NEXT:    vpxor %xmm5, %xmm4, %xmm4
>> +; AVX1-NEXT:    vpcmpgtq %xmm2, %xmm4, %xmm2
>>  ; AVX1-NEXT:    vpxor %xmm5, %xmm1, %xmm4
>>  ; AVX1-NEXT:    vpxor %xmm5, %xmm3, %xmm5
>>  ; AVX1-NEXT:    vpcmpgtq %xmm4, %xmm5, %xmm4
>> -; AVX1-NEXT:    vblendvpd %xmm4, %xmm1, %xmm3, %xmm1
>> -; AVX1-NEXT:    vinsertf128 $1, %xmm2, %ymm1, %ymm1
>> +; AVX1-NEXT:    vinsertf128 $1, %xmm2, %ymm4, %ymm2
>> +; AVX1-NEXT:    vblendvpd %ymm2, %ymm1, %ymm3, %ymm1
>>  ; AVX1-NEXT:    retq
>>  ;
>>  ; AVX2-LABEL: test160:
>> @@ -7928,11 +7900,10 @@ define <4 x i64> @test161(<4 x i64> %a, <4 x i64>
>> %b) {
>>  ; AVX1:       # %bb.0: # %entry
>>  ; AVX1-NEXT:    vextractf128 $1, %ymm0, %xmm2
>>  ; AVX1-NEXT:    vextractf128 $1, %ymm1, %xmm3
>> -; AVX1-NEXT:    vpcmpgtq %xmm2, %xmm3, %xmm4
>> -; AVX1-NEXT:    vblendvpd %xmm4, %xmm2, %xmm3, %xmm2
>> +; AVX1-NEXT:    vpcmpgtq %xmm2, %xmm3, %xmm2
>>  ; AVX1-NEXT:    vpcmpgtq %xmm0, %xmm1, %xmm3
>> -; AVX1-NEXT:    vblendvpd %xmm3, %xmm0, %xmm1, %xmm0
>> -; AVX1-NEXT:    vinsertf128 $1, %xmm2, %ymm0, %ymm0
>> +; AVX1-NEXT:    vinsertf128 $1, %xmm2, %ymm3, %ymm2
>> +; AVX1-NEXT:    vblendvpd %ymm2, %ymm0, %ymm1, %ymm0
>>  ; AVX1-NEXT:    retq
>>  ;
>>  ; AVX2-LABEL: test161:
>> @@ -8011,11 +7982,10 @@ define <4 x i64> @test162(<4 x i64> %a, <4 x i64>
>> %b) {
>>  ; AVX1:       # %bb.0: # %entry
>>  ; AVX1-NEXT:    vextractf128 $1, %ymm0, %xmm2
>>  ; AVX1-NEXT:    vextractf128 $1, %ymm1, %xmm3
>> -; AVX1-NEXT:    vpcmpgtq %xmm2, %xmm3, %xmm4
>> -; AVX1-NEXT:    vblendvpd %xmm4, %xmm2, %xmm3, %xmm2
>> +; AVX1-NEXT:    vpcmpgtq %xmm2, %xmm3, %xmm2
>>  ; AVX1-NEXT:    vpcmpgtq %xmm0, %xmm1, %xmm3
>> -; AVX1-NEXT:    vblendvpd %xmm3, %xmm0, %xmm1, %xmm0
>> -; AVX1-NEXT:    vinsertf128 $1, %xmm2, %ymm0, %ymm0
>> +; AVX1-NEXT:    vinsertf128 $1, %xmm2, %ymm3, %ymm2
>> +; AVX1-NEXT:    vblendvpd %ymm2, %ymm0, %ymm1, %ymm0
>>  ; AVX1-NEXT:    retq
>>  ;
>>  ; AVX2-LABEL: test162:
>> @@ -8093,11 +8063,10 @@ define <4 x i64> @test163(<4 x i64> %a, <4 x i64>
>> %b) {
>>  ; AVX1:       # %bb.0: # %entry
>>  ; AVX1-NEXT:    vextractf128 $1, %ymm1, %xmm2
>>  ; AVX1-NEXT:    vextractf128 $1, %ymm0, %xmm3
>> -; AVX1-NEXT:    vpcmpgtq %xmm2, %xmm3, %xmm4
>> -; AVX1-NEXT:    vblendvpd %xmm4, %xmm3, %xmm2, %xmm2
>> +; AVX1-NEXT:    vpcmpgtq %xmm2, %xmm3, %xmm2
>>  ; AVX1-NEXT:    vpcmpgtq %xmm1, %xmm0, %xmm3
>> -; AVX1-NEXT:    vblendvpd %xmm3, %xmm0, %xmm1, %xmm0
>> -; AVX1-NEXT:    vinsertf128 $1, %xmm2, %ymm0, %ymm0
>> +; AVX1-NEXT:    vinsertf128 $1, %xmm2, %ymm3, %ymm2
>> +; AVX1-NEXT:    vblendvpd %ymm2, %ymm0, %ymm1, %ymm0
>>  ; AVX1-NEXT:    retq
>>  ;
>>  ; AVX2-LABEL: test163:
>> @@ -8175,11 +8144,10 @@ define <4 x i64> @test164(<4 x i64> %a, <4 x i64>
>> %b) {
>>  ; AVX1:       # %bb.0: # %entry
>>  ; AVX1-NEXT:    vextractf128 $1, %ymm1, %xmm2
>>  ; AVX1-NEXT:    vextractf128 $1, %ymm0, %xmm3
>> -; AVX1-NEXT:    vpcmpgtq %xmm2, %xmm3, %xmm4
>> -; AVX1-NEXT:    vblendvpd %xmm4, %xmm3, %xmm2, %xmm2
>> +; AVX1-NEXT:    vpcmpgtq %xmm2, %xmm3, %xmm2
>>  ; AVX1-NEXT:    vpcmpgtq %xmm1, %xmm0, %xmm3
>> -; AVX1-NEXT:    vblendvpd %xmm3, %xmm0, %xmm1, %xmm0
>> -; AVX1-NEXT:    vinsertf128 $1, %xmm2, %ymm0, %ymm0
>> +; AVX1-NEXT:    vinsertf128 $1, %xmm2, %ymm3, %ymm2
>> +; AVX1-NEXT:    vblendvpd %ymm2, %ymm0, %ymm1, %ymm0
>>  ; AVX1-NEXT:    retq
>>  ;
>>  ; AVX2-LABEL: test164:
>> @@ -8265,16 +8233,15 @@ define <4 x i64> @test165(<4 x i64> %a, <4 x i64>
>> %b) {
>>  ; AVX1:       # %bb.0: # %entry
>>  ; AVX1-NEXT:    vextractf128 $1, %ymm0, %xmm2
>>  ; AVX1-NEXT:    vmovdqa {{.*#+}} xmm3 =
>> [9223372036854775808,9223372036854775808]
>> -; AVX1-NEXT:    vpxor %xmm3, %xmm2, %xmm4
>> -; AVX1-NEXT:    vextractf128 $1, %ymm1, %xmm5
>> -; AVX1-NEXT:    vpxor %xmm3, %xmm5, %xmm6
>> -; AVX1-NEXT:    vpcmpgtq %xmm4, %xmm6, %xmm4
>> -; AVX1-NEXT:    vblendvpd %xmm4, %xmm2, %xmm5, %xmm2
>> +; AVX1-NEXT:    vpxor %xmm3, %xmm2, %xmm2
>> +; AVX1-NEXT:    vextractf128 $1, %ymm1, %xmm4
>> +; AVX1-NEXT:    vpxor %xmm3, %xmm4, %xmm4
>> +; AVX1-NEXT:    vpcmpgtq %xmm2, %xmm4, %xmm2
>>  ; AVX1-NEXT:    vpxor %xmm3, %xmm0, %xmm4
>>  ; AVX1-NEXT:    vpxor %xmm3, %xmm1, %xmm3
>>  ; AVX1-NEXT:    vpcmpgtq %xmm4, %xmm3, %xmm3
>> -; AVX1-NEXT:    vblendvpd %xmm3, %xmm0, %xmm1, %xmm0
>> -; AVX1-NEXT:    vinsertf128 $1, %xmm2, %ymm0, %ymm0
>> +; AVX1-NEXT:    vinsertf128 $1, %xmm2, %ymm3, %ymm2
>> +; AVX1-NEXT:    vblendvpd %ymm2, %ymm0, %ymm1, %ymm0
>>  ; AVX1-NEXT:    retq
>>  ;
>>  ; AVX2-LABEL: test165:
>> @@ -8363,16 +8330,15 @@ define <4 x i64> @test166(<4 x i64> %a, <4 x i64>
>> %b) {
>>  ; AVX1:       # %bb.0: # %entry
>>  ; AVX1-NEXT:    vextractf128 $1, %ymm0, %xmm2
>>  ; AVX1-NEXT:    vmovdqa {{.*#+}} xmm3 =
>> [9223372036854775808,9223372036854775808]
>> -; AVX1-NEXT:    vpxor %xmm3, %xmm2, %xmm4
>> -; AVX1-NEXT:    vextractf128 $1, %ymm1, %xmm5
>> -; AVX1-NEXT:    vpxor %xmm3, %xmm5, %xmm6
>> -; AVX1-NEXT:    vpcmpgtq %xmm4, %xmm6, %xmm4
>> -; AVX1-NEXT:    vblendvpd %xmm4, %xmm2, %xmm5, %xmm2
>> +; AVX1-NEXT:    vpxor %xmm3, %xmm2, %xmm2
>> +; AVX1-NEXT:    vextractf128 $1, %ymm1, %xmm4
>> +; AVX1-NEXT:    vpxor %xmm3, %xmm4, %xmm4
>> +; AVX1-NEXT:    vpcmpgtq %xmm2, %xmm4, %xmm2
>>  ; AVX1-NEXT:    vpxor %xmm3, %xmm0, %xmm4
>>  ; AVX1-NEXT:    vpxor %xmm3, %xmm1, %xmm3
>>  ; AVX1-NEXT:    vpcmpgtq %xmm4, %xmm3, %xmm3
>> -; AVX1-NEXT:    vblendvpd %xmm3, %xmm0, %xmm1, %xmm0
>> -; AVX1-NEXT:    vinsertf128 $1, %xmm2, %ymm0, %ymm0
>> +; AVX1-NEXT:    vinsertf128 $1, %xmm2, %ymm3, %ymm2
>> +; AVX1-NEXT:    vblendvpd %ymm2, %ymm0, %ymm1, %ymm0
>>  ; AVX1-NEXT:    retq
>>  ;
>>  ; AVX2-LABEL: test166:
>> @@ -8460,16 +8426,15 @@ define <4 x i64> @test167(<4 x i64> %a, <4 x i64>
>> %b) {
>>  ; AVX1:       # %bb.0: # %entry
>>  ; AVX1-NEXT:    vextractf128 $1, %ymm1, %xmm2
>>  ; AVX1-NEXT:    vmovdqa {{.*#+}} xmm3 =
>> [9223372036854775808,9223372036854775808]
>> -; AVX1-NEXT:    vpxor %xmm3, %xmm2, %xmm4
>> -; AVX1-NEXT:    vextractf128 $1, %ymm0, %xmm5
>> -; AVX1-NEXT:    vpxor %xmm3, %xmm5, %xmm6
>> -; AVX1-NEXT:    vpcmpgtq %xmm4, %xmm6, %xmm4
>> -; AVX1-NEXT:    vblendvpd %xmm4, %xmm5, %xmm2, %xmm2
>> +; AVX1-NEXT:    vpxor %xmm3, %xmm2, %xmm2
>> +; AVX1-NEXT:    vextractf128 $1, %ymm0, %xmm4
>> +; AVX1-NEXT:    vpxor %xmm3, %xmm4, %xmm4
>> +; AVX1-NEXT:    vpcmpgtq %xmm2, %xmm4, %xmm2
>>  ; AVX1-NEXT:    vpxor %xmm3, %xmm1, %xmm4
>>  ; AVX1-NEXT:    vpxor %xmm3, %xmm0, %xmm3
>>  ; AVX1-NEXT:    vpcmpgtq %xmm4, %xmm3, %xmm3
>> -; AVX1-NEXT:    vblendvpd %xmm3, %xmm0, %xmm1, %xmm0
>> -; AVX1-NEXT:    vinsertf128 $1, %xmm2, %ymm0, %ymm0
>> +; AVX1-NEXT:    vinsertf128 $1, %xmm2, %ymm3, %ymm2
>> +; AVX1-NEXT:    vblendvpd %ymm2, %ymm0, %ymm1, %ymm0
>>  ; AVX1-NEXT:    retq
>>  ;
>>  ; AVX2-LABEL: test167:
>> @@ -8557,16 +8522,15 @@ define <4 x i64> @test168(<4 x i64> %a, <4 x i64>
>> %b) {
>>  ; AVX1:       # %bb.0: # %entry
>>  ; AVX1-NEXT:    vextractf128 $1, %ymm1, %xmm2
>>  ; AVX1-NEXT:    vmovdqa {{.*#+}} xmm3 =
>> [9223372036854775808,9223372036854775808]
>> -; AVX1-NEXT:    vpxor %xmm3, %xmm2, %xmm4
>> -; AVX1-NEXT:    vextractf128 $1, %ymm0, %xmm5
>> -; AVX1-NEXT:    vpxor %xmm3, %xmm5, %xmm6
>> -; AVX1-NEXT:    vpcmpgtq %xmm4, %xmm6, %xmm4
>> -; AVX1-NEXT:    vblendvpd %xmm4, %xmm5, %xmm2, %xmm2
>> +; AVX1-NEXT:    vpxor %xmm3, %xmm2, %xmm2
>> +; AVX1-NEXT:    vextractf128 $1, %ymm0, %xmm4
>> +; AVX1-NEXT:    vpxor %xmm3, %xmm4, %xmm4
>> +; AVX1-NEXT:    vpcmpgtq %xmm2, %xmm4, %xmm2
>>  ; AVX1-NEXT:    vpxor %xmm3, %xmm1, %xmm4
>>  ; AVX1-NEXT:    vpxor %xmm3, %xmm0, %xmm3
>>  ; AVX1-NEXT:    vpcmpgtq %xmm4, %xmm3, %xmm3
>> -; AVX1-NEXT:    vblendvpd %xmm3, %xmm0, %xmm1, %xmm0
>> -; AVX1-NEXT:    vinsertf128 $1, %xmm2, %ymm0, %ymm0
>> +; AVX1-NEXT:    vinsertf128 $1, %xmm2, %ymm3, %ymm2
>> +; AVX1-NEXT:    vblendvpd %ymm2, %ymm0, %ymm1, %ymm0
>>  ; AVX1-NEXT:    retq
>>  ;
>>  ; AVX2-LABEL: test168:
>> @@ -8647,11 +8611,10 @@ define <4 x i64> @test169(<4 x i64> %a, <4 x i64>
>> %b) {
>>  ; AVX1:       # %bb.0: # %entry
>>  ; AVX1-NEXT:    vextractf128 $1, %ymm1, %xmm2
>>  ; AVX1-NEXT:    vextractf128 $1, %ymm0, %xmm3
>> -; AVX1-NEXT:    vpcmpgtq %xmm2, %xmm3, %xmm4
>> -; AVX1-NEXT:    vblendvpd %xmm4, %xmm3, %xmm2, %xmm2
>> +; AVX1-NEXT:    vpcmpgtq %xmm2, %xmm3, %xmm2
>>  ; AVX1-NEXT:    vpcmpgtq %xmm1, %xmm0, %xmm3
>> -; AVX1-NEXT:    vblendvpd %xmm3, %xmm0, %xmm1, %xmm0
>> -; AVX1-NEXT:    vinsertf128 $1, %xmm2, %ymm0, %ymm0
>> +; AVX1-NEXT:    vinsertf128 $1, %xmm2, %ymm3, %ymm2
>> +; AVX1-NEXT:    vblendvpd %ymm2, %ymm0, %ymm1, %ymm0
>>  ; AVX1-NEXT:    retq
>>  ;
>>  ; AVX2-LABEL: test169:
>> @@ -8729,11 +8692,10 @@ define <4 x i64> @test170(<4 x i64> %a, <4 x i64>
>> %b) {
>>  ; AVX1:       # %bb.0: # %entry
>>  ; AVX1-NEXT:    vextractf128 $1, %ymm1, %xmm2
>>  ; AVX1-NEXT:    vextractf128 $1, %ymm0, %xmm3
>> -; AVX1-NEXT:    vpcmpgtq %xmm2, %xmm3, %xmm4
>> -; AVX1-NEXT:    vblendvpd %xmm4, %xmm3, %xmm2, %xmm2
>> +; AVX1-NEXT:    vpcmpgtq %xmm2, %xmm3, %xmm2
>>  ; AVX1-NEXT:    vpcmpgtq %xmm1, %xmm0, %xmm3
>> -; AVX1-NEXT:    vblendvpd %xmm3, %xmm0, %xmm1, %xmm0
>> -; AVX1-NEXT:    vinsertf128 $1, %xmm2, %ymm0, %ymm0
>> +; AVX1-NEXT:    vinsertf128 $1, %xmm2, %ymm3, %ymm2
>> +; AVX1-NEXT:    vblendvpd %ymm2, %ymm0, %ymm1, %ymm0
>>  ; AVX1-NEXT:    retq
>>  ;
>>  ; AVX2-LABEL: test170:
>> @@ -8812,11 +8774,10 @@ define <4 x i64> @test171(<4 x i64> %a, <4 x i64>
>> %b) {
>>  ; AVX1:       # %bb.0: # %entry
>>  ; AVX1-NEXT:    vextractf128 $1, %ymm0, %xmm2
>>  ; AVX1-NEXT:    vextractf128 $1, %ymm1, %xmm3
>> -; AVX1-NEXT:    vpcmpgtq %xmm2, %xmm3, %xmm4
>> -; AVX1-NEXT:    vblendvpd %xmm4, %xmm2, %xmm3, %xmm2
>> +; AVX1-NEXT:    vpcmpgtq %xmm2, %xmm3, %xmm2
>>  ; AVX1-NEXT:    vpcmpgtq %xmm0, %xmm1, %xmm3
>> -; AVX1-NEXT:    vblendvpd %xmm3, %xmm0, %xmm1, %xmm0
>> -; AVX1-NEXT:    vinsertf128 $1, %xmm2, %ymm0, %ymm0
>> +; AVX1-NEXT:    vinsertf128 $1, %xmm2, %ymm3, %ymm2
>> +; AVX1-NEXT:    vblendvpd %ymm2, %ymm0, %ymm1, %ymm0
>>  ; AVX1-NEXT:    retq
>>  ;
>>  ; AVX2-LABEL: test171:
>> @@ -8895,11 +8856,10 @@ define <4 x i64> @test172(<4 x i64> %a, <4 x i64>
>> %b) {
>>  ; AVX1:       # %bb.0: # %entry
>>  ; AVX1-NEXT:    vextractf128 $1, %ymm0, %xmm2
>>  ; AVX1-NEXT:    vextractf128 $1, %ymm1, %xmm3
>> -; AVX1-NEXT:    vpcmpgtq %xmm2, %xmm3, %xmm4
>> -; AVX1-NEXT:    vblendvpd %xmm4, %xmm2, %xmm3, %xmm2
>> +; AVX1-NEXT:    vpcmpgtq %xmm2, %xmm3, %xmm2
>>  ; AVX1-NEXT:    vpcmpgtq %xmm0, %xmm1, %xmm3
>> -; AVX1-NEXT:    vblendvpd %xmm3, %xmm0, %xmm1, %xmm0
>> -; AVX1-NEXT:    vinsertf128 $1, %xmm2, %ymm0, %ymm0
>> +; AVX1-NEXT:    vinsertf128 $1, %xmm2, %ymm3, %ymm2
>> +; AVX1-NEXT:    vblendvpd %ymm2, %ymm0, %ymm1, %ymm0
>>  ; AVX1-NEXT:    retq
>>  ;
>>  ; AVX2-LABEL: test172:
>> @@ -8984,16 +8944,15 @@ define <4 x i64> @test173(<4 x i64> %a, <4 x i64>
>> %b) {
>>  ; AVX1:       # %bb.0: # %entry
>>  ; AVX1-NEXT:    vextractf128 $1, %ymm1, %xmm2
>>  ; AVX1-NEXT:    vmovdqa {{.*#+}} xmm3 =
>> [9223372036854775808,9223372036854775808]
>> -; AVX1-NEXT:    vpxor %xmm3, %xmm2, %xmm4
>> -; AVX1-NEXT:    vextractf128 $1, %ymm0, %xmm5
>> -; AVX1-NEXT:    vpxor %xmm3, %xmm5, %xmm6
>> -; AVX1-NEXT:    vpcmpgtq %xmm4, %xmm6, %xmm4
>> -; AVX1-NEXT:    vblendvpd %xmm4, %xmm5, %xmm2, %xmm2
>> +; AVX1-NEXT:    vpxor %xmm3, %xmm2, %xmm2
>> +; AVX1-NEXT:    vextractf128 $1, %ymm0, %xmm4
>> +; AVX1-NEXT:    vpxor %xmm3, %xmm4, %xmm4
>> +; AVX1-NEXT:    vpcmpgtq %xmm2, %xmm4, %xmm2
>>  ; AVX1-NEXT:    vpxor %xmm3, %xmm1, %xmm4
>>  ; AVX1-NEXT:    vpxor %xmm3, %xmm0, %xmm3
>>  ; AVX1-NEXT:    vpcmpgtq %xmm4, %xmm3, %xmm3
>> -; AVX1-NEXT:    vblendvpd %xmm3, %xmm0, %xmm1, %xmm0
>> -; AVX1-NEXT:    vinsertf128 $1, %xmm2, %ymm0, %ymm0
>> +; AVX1-NEXT:    vinsertf128 $1, %xmm2, %ymm3, %ymm2
>> +; AVX1-NEXT:    vblendvpd %ymm2, %ymm0, %ymm1, %ymm0
>>  ; AVX1-NEXT:    retq
>>  ;
>>  ; AVX2-LABEL: test173:
>> @@ -9081,16 +9040,15 @@ define <4 x i64> @test174(<4 x i64> %a, <4 x i64>
>> %b) {
>>  ; AVX1:       # %bb.0: # %entry
>>  ; AVX1-NEXT:    vextractf128 $1, %ymm1, %xmm2
>>  ; AVX1-NEXT:    vmovdqa {{.*#+}} xmm3 =
>> [9223372036854775808,9223372036854775808]
>> -; AVX1-NEXT:    vpxor %xmm3, %xmm2, %xmm4
>> -; AVX1-NEXT:    vextractf128 $1, %ymm0, %xmm5
>> -; AVX1-NEXT:    vpxor %xmm3, %xmm5, %xmm6
>> -; AVX1-NEXT:    vpcmpgtq %xmm4, %xmm6, %xmm4
>> -; AVX1-NEXT:    vblendvpd %xmm4, %xmm5, %xmm2, %xmm2
>> +; AVX1-NEXT:    vpxor %xmm3, %xmm2, %xmm2
>> +; AVX1-NEXT:    vextractf128 $1, %ymm0, %xmm4
>> +; AVX1-NEXT:    vpxor %xmm3, %xmm4, %xmm4
>> +; AVX1-NEXT:    vpcmpgtq %xmm2, %xmm4, %xmm2
>>  ; AVX1-NEXT:    vpxor %xmm3, %xmm1, %xmm4
>>  ; AVX1-NEXT:    vpxor %xmm3, %xmm0, %xmm3
>>  ; AVX1-NEXT:    vpcmpgtq %xmm4, %xmm3, %xmm3
>> -; AVX1-NEXT:    vblendvpd %xmm3, %xmm0, %xmm1, %xmm0
>> -; AVX1-NEXT:    vinsertf128 $1, %xmm2, %ymm0, %ymm0
>> +; AVX1-NEXT:    vinsertf128 $1, %xmm2, %ymm3, %ymm2
>> +; AVX1-NEXT:    vblendvpd %ymm2, %ymm0, %ymm1, %ymm0
>>  ; AVX1-NEXT:    retq
>>  ;
>>  ; AVX2-LABEL: test174:
>> @@ -9179,16 +9137,15 @@ define <4 x i64> @test175(<4 x i64> %a, <4 x i64>
>> %b) {
>>  ; AVX1:       # %bb.0: # %entry
>>  ; AVX1-NEXT:    vextractf128 $1, %ymm0, %xmm2
>>  ; AVX1-NEXT:    vmovdqa {{.*#+}} xmm3 =
>> [9223372036854775808,9223372036854775808]
>> -; AVX1-NEXT:    vpxor %xmm3, %xmm2, %xmm4
>> -; AVX1-NEXT:    vextractf128 $1, %ymm1, %xmm5
>> -; AVX1-NEXT:    vpxor %xmm3, %xmm5, %xmm6
>> -; AVX1-NEXT:    vpcmpgtq %xmm4, %xmm6, %xmm4
>> -; AVX1-NEXT:    vblendvpd %xmm4, %xmm2, %xmm5, %xmm2
>> +; AVX1-NEXT:    vpxor %xmm3, %xmm2, %xmm2
>> +; AVX1-NEXT:    vextractf128 $1, %ymm1, %xmm4
>> +; AVX1-NEXT:    vpxor %xmm3, %xmm4, %xmm4
>> +; AVX1-NEXT:    vpcmpgtq %xmm2, %xmm4, %xmm2
>>  ; AVX1-NEXT:    vpxor %xmm3, %xmm0, %xmm4
>>  ; AVX1-NEXT:    vpxor %xmm3, %xmm1, %xmm3
>>  ; AVX1-NEXT:    vpcmpgtq %xmm4, %xmm3, %xmm3
>> -; AVX1-NEXT:    vblendvpd %xmm3, %xmm0, %xmm1, %xmm0
>> -; AVX1-NEXT:    vinsertf128 $1, %xmm2, %ymm0, %ymm0
>> +; AVX1-NEXT:    vinsertf128 $1, %xmm2, %ymm3, %ymm2
>> +; AVX1-NEXT:    vblendvpd %ymm2, %ymm0, %ymm1, %ymm0
>>  ; AVX1-NEXT:    retq
>>  ;
>>  ; AVX2-LABEL: test175:
>> @@ -9277,16 +9234,15 @@ define <4 x i64> @test176(<4 x i64> %a, <4 x i64>
>> %b) {
>>  ; AVX1:       # %bb.0: # %entry
>>  ; AVX1-NEXT:    vextractf128 $1, %ymm0, %xmm2
>>  ; AVX1-NEXT:    vmovdqa {{.*#+}} xmm3 =
>> [9223372036854775808,9223372036854775808]
>> -; AVX1-NEXT:    vpxor %xmm3, %xmm2, %xmm4
>> -; AVX1-NEXT:    vextractf128 $1, %ymm1, %xmm5
>> -; AVX1-NEXT:    vpxor %xmm3, %xmm5, %xmm6
>> -; AVX1-NEXT:    vpcmpgtq %xmm4, %xmm6, %xmm4
>> -; AVX1-NEXT:    vblendvpd %xmm4, %xmm2, %xmm5, %xmm2
>> +; AVX1-NEXT:    vpxor %xmm3, %xmm2, %xmm2
>> +; AVX1-NEXT:    vextractf128 $1, %ymm1, %xmm4
>> +; AVX1-NEXT:    vpxor %xmm3, %xmm4, %xmm4
>> +; AVX1-NEXT:    vpcmpgtq %xmm2, %xmm4, %xmm2
>>  ; AVX1-NEXT:    vpxor %xmm3, %xmm0, %xmm4
>>  ; AVX1-NEXT:    vpxor %xmm3, %xmm1, %xmm3
>>  ; AVX1-NEXT:    vpcmpgtq %xmm4, %xmm3, %xmm3
>> -; AVX1-NEXT:    vblendvpd %xmm3, %xmm0, %xmm1, %xmm0
>> -; AVX1-NEXT:    vinsertf128 $1, %xmm2, %ymm0, %ymm0
>> +; AVX1-NEXT:    vinsertf128 $1, %xmm2, %ymm3, %ymm2
>> +; AVX1-NEXT:    vblendvpd %ymm2, %ymm0, %ymm1, %ymm0
>>  ; AVX1-NEXT:    retq
>>  ;
>>  ; AVX2-LABEL: test176:
>> @@ -10414,16 +10370,14 @@ define <8 x i64> @concat_smin_smax(<4 x i64>
>> %a0, <4 x i64> %a1) {
>>  ; AVX1-NEXT:    vextractf128 $1, %ymm0, %xmm3
>>  ; AVX1-NEXT:    vextractf128 $1, %ymm1, %xmm4
>>  ; AVX1-NEXT:    vpcmpgtq %xmm3, %xmm4, %xmm2
>> -; AVX1-NEXT:    vblendvpd %xmm2, %xmm3, %xmm4, %xmm2
>>  ; AVX1-NEXT:    vpcmpgtq %xmm0, %xmm1, %xmm5
>> -; AVX1-NEXT:    vblendvpd %xmm5, %xmm0, %xmm1, %xmm5
>>  ; AVX1-NEXT:    vinsertf128 $1, %xmm2, %ymm5, %ymm2
>> -; AVX1-NEXT:    vpcmpgtq %xmm4, %xmm3, %xmm5
>> -; AVX1-NEXT:    vblendvpd %xmm5, %xmm3, %xmm4, %xmm3
>> +; AVX1-NEXT:    vblendvpd %ymm2, %ymm0, %ymm1, %ymm2
>> +; AVX1-NEXT:    vpcmpgtq %xmm4, %xmm3, %xmm3
>>  ; AVX1-NEXT:    vpcmpgtq %xmm1, %xmm0, %xmm4
>> -; AVX1-NEXT:    vblendvpd %xmm4, %xmm0, %xmm1, %xmm0
>> -; AVX1-NEXT:    vinsertf128 $1, %xmm3, %ymm0, %ymm1
>> -; AVX1-NEXT:    vmovaps %ymm2, %ymm0
>> +; AVX1-NEXT:    vinsertf128 $1, %xmm3, %ymm4, %ymm3
>> +; AVX1-NEXT:    vblendvpd %ymm3, %ymm0, %ymm1, %ymm1
>> +; AVX1-NEXT:    vmovapd %ymm2, %ymm0
>>  ; AVX1-NEXT:    retq
>>  ;
>>  ; AVX2-LABEL: concat_smin_smax:
>>
>>
>>
>> _______________________________________________
>> llvm-commits mailing list
>> llvm-commits at lists.llvm.org
>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits
>>
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: reduced2.ll
Type: application/octet-stream
Size: 2334 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20220624/e2cef6db/attachment-0001.obj>

