[llvm] r353610 - [X86][SSE] Generalize X86ISD::BLENDI support to more value types
Sam McCall via llvm-commits
llvm-commits at lists.llvm.org
Mon Feb 11 05:38:04 PST 2019
Oops, I was building with assertions off.
Have managed to reproduce now:
1) build clang with -DCMAKE_BUILD_TYPE=release
-DCMAKE_CXX_FLAGS_RELEASE="-O3 -DNDEBUG -msse4.2"
-DCMAKE_C_FLAGS_RELEASE="-O3 -DNDEBUG -msse4.2" -DLLVM_ENABLE_ASSERTIONS=Off
2) use that to build another copy of clang with the same options
3) stage2/bin/clang -c foo.c (where foo.c is the reproducer above)
Going to revert now.
On Mon, Feb 11, 2019 at 1:07 PM Sam McCall <sammccall at google.com> wrote:
> Have also seen the problems David mentions. The nature of the patch and
> the error make me think a miscompile is likely.
>
> Unfortunately so far I have only managed to reproduce this in our internal
> CI, not with a plain release build from upstream. This is probably a "hold
> it just right to reproduce" problem. I'll try to find the right set of
> optimization flags to trigger it.
>
> My minimal reproducer (just "clang foo.c" will reproduce it; add -Werror for
> a failing exit code):
> ---
> int format(const char *, ...) __attribute__((__format__(__printf__, 1, 2)));
> #define PASTE_AND_FORMAT(a, b) format(#a #b)
> #define MACRO2(x) PASTE_AND_FORMAT(a, x)
> void run() { MACRO2(a + b + c); }
> ---
> /usr/local/google/home/sammccall/elfcore.c:4:14: warning: format string contains '\0' within the string body [-Wformat]
> void run() { MACRO2(a + b + c); }
> ^~~~~~~~~~~~~~~~~
> /usr/local/google/home/sammccall/elfcore.c:3:19: note: expanded from macro 'MACRO2'
> #define MACRO2(x) PASTE_AND_FORMAT(a, x)
> ^~~~~~~~~~~~~~~~~~~~~~
> /usr/local/google/home/sammccall/elfcore.c:2:42: note: expanded from macro 'PASTE_AND_FORMAT'
> #define PASTE_AND_FORMAT(a, b) format(#a #b)
> ~~~^~
> <scratch space>:3:8: note: expanded from here
> "a P b <U+0000> c"
> ^
> 1 warning generated.
>
>
>
>
> On Sun, Feb 10, 2019 at 7:27 AM David Jones via llvm-commits <
> llvm-commits at lists.llvm.org> wrote:
>
>> FYI, I'm seeing some stage2 miscompiles with this change (i.e., stage2
>> cannot recompile itself).
>>
>> It being the weekend, I haven't had time to synthesize a reproducer, but the
>> error I first saw looks like this:
>>
>> <redacted loc>: error: format string contains '\0' within the string body [-Werror,-Wformat]
>> CHECK_EQ(*xxx_yyyy, sizeof(AaaB(Cddd)) + eee_ffff * sizeof(gggg));
>> ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>> <redacted loc>: note: expanded from macro 'CHECK_EQ'
>> { CHECK_OP(val1, ==, val2); }
>> ^~~~~~~~~~~~~~~~~~~~~~~~
>> <redacted loc>: note: expanded from macro 'CHECK_OP'
>> DebugLog(g_debug_fd, "%s:%d Expected " #val1 " " #op " " #val2, __FILE__, \
>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~
>> <scratch space>:70:32: note: expanded from here
>> "sizeof(AaaB_Cddd) { eee_ffff <U+0000> sizeof(gggg)"
>> ^
>>
>> In case it's not clear: there are several layers of tokens being pasted here,
>> but the final pasted string (in scratch space) has flipped some bits:
>>   '+' 0b0010'1011 --> '{' 0b0111'1011
>>   '*' 0b0010'1010 --> '\0'
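
To make the flipped bits explicit, here is a minimal standalone check (assuming
ASCII; the byte values are taken from the comparison above, the surrounding
code is illustrative only):

  // '+' (0x2B) came out as '{' (0x7B): bits 4 and 6 were set (xor 0x50).
  // '*' (0x2A) came out as '\0' (0x00): every set bit was cleared.
  static_assert(('+' ^ '{') == 0x50, "bits 4 and 6 flipped");
  static_assert(('*' ^ '\0') == 0x2A, "all bits cleared");
  int main() { return 0; }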
>>
>>
>> On Sat, Feb 9, 2019 at 5:13 AM Simon Pilgrim via llvm-commits <
>> llvm-commits at lists.llvm.org> wrote:
>>
>>> Author: rksimon
>>> Date: Sat Feb 9 05:13:59 2019
>>> New Revision: 353610
>>>
>>> URL: http://llvm.org/viewvc/llvm-project?rev=353610&view=rev
>>> Log:
>>> [X86][SSE] Generalize X86ISD::BLENDI support to more value types
>>>
>>> D42042 introduced the ability for the ExecutionDomainFixPass to more
>>> easily change between BLENDPD/BLENDPS/PBLENDW as the domains required.
>>>
>>> With this ability, we can avoid most of the bitcasts/scaling in the DAG
>>> that occurred during X86ISD::BLENDI lowering/combining, blend the
>>> vXi32/vXi64 vectors directly, and use isel patterns to lower them to the
>>> equivalent float vector instructions.
>>>
>>> This helps the shuffle combining and SimplifyDemandedVectorElts be more
>>> aggressive as we lose track of fewer UNDEF elements than when we go up/down
>>> through bitcasts.
>>>
>>> I've introduced a basic blend(bitcast(x),bitcast(y)) ->
>>> bitcast(blend(x,y)) fold; there are more generalizations I can do there
>>> (e.g. widening/scaling and handling the tricky v16i16 repeated mask case).
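
As background for the scaling mentioned above: a blend mask carries one bit per
destination element, so rewriting a blend at a narrower element type multiplies
the number of mask bits. The sketch below is a standalone approximation of what
the in-tree scaleVectorShuffleBlendMask helper does (names outside the patch
are illustrative only):

  #include <cassert>
  #include <cstdint>

  // Expand each selection bit of a blend mask into Scale bits, so the same
  // lanes are chosen when the blend is expressed with Scale-times more,
  // narrower elements.
  static uint64_t scaleBlendMask(uint64_t BlendMask, int NumElts, int Scale) {
    uint64_t ScaledMask = 0;
    for (int i = 0; i != NumElts; ++i)
      if (BlendMask & (1ull << i))
        ScaledMask |= ((1ull << Scale) - 1) << (i * Scale);
    return ScaledMask;
  }

  int main() {
    // A v2i64 blend taking element 1 from the second operand (mask 0b10)
    // becomes a v4i32 blend taking elements 2 and 3 (mask 0b1100).
    assert(scaleBlendMask(0b10, 2, 2) == 0b1100);
    return 0;
  }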
>>>
>>> The vector-reduce-smin/smax regressions will be fixed in a future
>>> improvement to SimplifyDemandedBits to peek through bitcasts and support
>>> X86ISD::BLENDV.
>>>
>>> Differential Revision: https://reviews.llvm.org/D57888
>>>
>>> Modified:
>>> llvm/trunk/lib/Target/X86/X86ISelLowering.cpp
>>> llvm/trunk/lib/Target/X86/X86InstrSSE.td
>>> llvm/trunk/test/CodeGen/X86/avx512-shuffles/partial_permute.ll
>>> llvm/trunk/test/CodeGen/X86/combine-sdiv.ll
>>> llvm/trunk/test/CodeGen/X86/insertelement-ones.ll
>>> llvm/trunk/test/CodeGen/X86/known-signbits-vector.ll
>>> llvm/trunk/test/CodeGen/X86/masked_load.ll
>>> llvm/trunk/test/CodeGen/X86/masked_store.ll
>>> llvm/trunk/test/CodeGen/X86/oddshuffles.ll
>>> llvm/trunk/test/CodeGen/X86/packss.ll
>>> llvm/trunk/test/CodeGen/X86/pr34592.ll
>>> llvm/trunk/test/CodeGen/X86/prefer-avx256-mask-shuffle.ll
>>> llvm/trunk/test/CodeGen/X86/sse2.ll
>>> llvm/trunk/test/CodeGen/X86/vector-reduce-smax.ll
>>> llvm/trunk/test/CodeGen/X86/vector-reduce-smin.ll
>>> llvm/trunk/test/CodeGen/X86/vector-shift-ashr-256.ll
>>> llvm/trunk/test/CodeGen/X86/vector-shuffle-128-v8.ll
>>> llvm/trunk/test/CodeGen/X86/vector-shuffle-256-v16.ll
>>> llvm/trunk/test/CodeGen/X86/vector-shuffle-256-v32.ll
>>>
>>> Modified: llvm/trunk/lib/Target/X86/X86ISelLowering.cpp
>>> URL:
>>> http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/X86/X86ISelLowering.cpp?rev=353610&r1=353609&r2=353610&view=diff
>>>
>>> ==============================================================================
>>> --- llvm/trunk/lib/Target/X86/X86ISelLowering.cpp (original)
>>> +++ llvm/trunk/lib/Target/X86/X86ISelLowering.cpp Sat Feb 9 05:13:59
>>> 2019
>>> @@ -10408,45 +10408,24 @@ static SDValue lowerShuffleAsBlend(const
>>> V2 = getZeroVector(VT, Subtarget, DAG, DL);
>>>
>>> switch (VT.SimpleTy) {
>>> - case MVT::v2f64:
>>> - case MVT::v4f32:
>>> - case MVT::v4f64:
>>> - case MVT::v8f32:
>>> - return DAG.getNode(X86ISD::BLENDI, DL, VT, V1, V2,
>>> - DAG.getConstant(BlendMask, DL, MVT::i8));
>>> case MVT::v4i64:
>>> case MVT::v8i32:
>>> assert(Subtarget.hasAVX2() && "256-bit integer blends require AVX2!");
>>> LLVM_FALLTHROUGH;
>>> + case MVT::v4f64:
>>> + case MVT::v8f32:
>>> + assert(Subtarget.hasAVX() && "256-bit float blends require AVX!");
>>> + LLVM_FALLTHROUGH;
>>> + case MVT::v2f64:
>>> case MVT::v2i64:
>>> + case MVT::v4f32:
>>> case MVT::v4i32:
>>> - // If we have AVX2 it is faster to use VPBLENDD when the shuffle fits into
>>> - // that instruction.
>>> - if (Subtarget.hasAVX2()) {
>>> - // Scale the blend by the number of 32-bit dwords per element.
>>> - int Scale = VT.getScalarSizeInBits() / 32;
>>> - BlendMask = scaleVectorShuffleBlendMask(BlendMask, Mask.size(), Scale);
>>> - MVT BlendVT = VT.getSizeInBits() > 128 ? MVT::v8i32 : MVT::v4i32;
>>> - V1 = DAG.getBitcast(BlendVT, V1);
>>> - V2 = DAG.getBitcast(BlendVT, V2);
>>> - return DAG.getBitcast(
>>> - VT, DAG.getNode(X86ISD::BLENDI, DL, BlendVT, V1, V2,
>>> - DAG.getConstant(BlendMask, DL, MVT::i8)));
>>> - }
>>> - LLVM_FALLTHROUGH;
>>> - case MVT::v8i16: {
>>> - // For integer shuffles we need to expand the mask and cast the inputs to
>>> - // v8i16s prior to blending.
>>> - int Scale = 8 / VT.getVectorNumElements();
>>> - BlendMask = scaleVectorShuffleBlendMask(BlendMask, Mask.size(), Scale);
>>> - V1 = DAG.getBitcast(MVT::v8i16, V1);
>>> - V2 = DAG.getBitcast(MVT::v8i16, V2);
>>> - return DAG.getBitcast(VT,
>>> - DAG.getNode(X86ISD::BLENDI, DL, MVT::v8i16, V1, V2,
>>> - DAG.getConstant(BlendMask, DL, MVT::i8)));
>>> - }
>>> + case MVT::v8i16:
>>> + assert(Subtarget.hasSSE41() && "128-bit blends require SSE41!");
>>> + return DAG.getNode(X86ISD::BLENDI, DL, VT, V1, V2,
>>> + DAG.getConstant(BlendMask, DL, MVT::i8));
>>> case MVT::v16i16: {
>>> - assert(Subtarget.hasAVX2() && "256-bit integer blends require AVX2!");
>>> + assert(Subtarget.hasAVX2() && "v16i16 blends require AVX2!");
>>> SmallVector<int, 8> RepeatedMask;
>>> if (is128BitLaneRepeatedShuffleMask(MVT::v16i16, Mask, RepeatedMask)) {
>>> // We can lower these with PBLENDW which is mirrored across 128-bit lanes.
>>> @@ -10474,10 +10453,11 @@ static SDValue lowerShuffleAsBlend(const
>>> }
>>> LLVM_FALLTHROUGH;
>>> }
>>> - case MVT::v16i8:
>>> - case MVT::v32i8: {
>>> - assert((VT.is128BitVector() || Subtarget.hasAVX2()) &&
>>> - "256-bit byte-blends require AVX2 support!");
>>> + case MVT::v32i8:
>>> + assert(Subtarget.hasAVX2() && "256-bit byte-blends require AVX2!");
>>> + LLVM_FALLTHROUGH;
>>> + case MVT::v16i8: {
>>> + assert(Subtarget.hasSSE41() && "128-bit byte-blends require SSE41!");
>>>
>>> // Attempt to lower to a bitmask if we can. VPAND is faster than VPBLENDVB.
>>> if (SDValue Masked = lowerShuffleAsBitMask(DL, VT, V1, V2, Mask, Zeroable,
>>> @@ -30973,34 +30953,11 @@ static bool matchBinaryPermuteShuffle(
>>> return true;
>>> }
>>> } else {
>>> - // Determine a type compatible with X86ISD::BLENDI.
>>> - ShuffleVT = MaskVT;
>>> - if (Subtarget.hasAVX2()) {
>>> - if (ShuffleVT == MVT::v4i64)
>>> - ShuffleVT = MVT::v8i32;
>>> - else if (ShuffleVT == MVT::v2i64)
>>> - ShuffleVT = MVT::v4i32;
>>> - } else {
>>> - if (ShuffleVT == MVT::v2i64 || ShuffleVT == MVT::v4i32)
>>> - ShuffleVT = MVT::v8i16;
>>> - else if (ShuffleVT == MVT::v4i64)
>>> - ShuffleVT = MVT::v4f64;
>>> - else if (ShuffleVT == MVT::v8i32)
>>> - ShuffleVT = MVT::v8f32;
>>> - }
>>> -
>>> - if (!ShuffleVT.isFloatingPoint()) {
>>> - int Scale = EltSizeInBits / ShuffleVT.getScalarSizeInBits();
>>> - BlendMask =
>>> - scaleVectorShuffleBlendMask(BlendMask, NumMaskElts, Scale);
>>> - ShuffleVT = MVT::getIntegerVT(EltSizeInBits / Scale);
>>> - ShuffleVT = MVT::getVectorVT(ShuffleVT, NumMaskElts * Scale);
>>> - }
>>> -
>>> V1 = ForceV1Zero ? getZeroVector(MaskVT, Subtarget, DAG, DL) : V1;
>>> V2 = ForceV2Zero ? getZeroVector(MaskVT, Subtarget, DAG, DL) : V2;
>>> PermuteImm = (unsigned)BlendMask;
>>> Shuffle = X86ISD::BLENDI;
>>> + ShuffleVT = MaskVT;
>>> return true;
>>> }
>>> }
>>> @@ -32165,6 +32122,29 @@ static SDValue combineTargetShuffle(SDVa
>>>
>>> return SDValue();
>>> }
>>> + case X86ISD::BLENDI: {
>>> + SDValue N0 = N.getOperand(0);
>>> + SDValue N1 = N.getOperand(1);
>>> +
>>> + // blend(bitcast(x),bitcast(y)) -> bitcast(blend(x,y)) to narrower types.
>>> + // TODO: Handle MVT::v16i16 repeated blend mask.
>>> + if (N0.getOpcode() == ISD::BITCAST && N1.getOpcode() == ISD::BITCAST &&
>>> + N0.getOperand(0).getValueType() == N1.getOperand(0).getValueType()) {
>>> + MVT SrcVT = N0.getOperand(0).getSimpleValueType();
>>> + if ((VT.getScalarSizeInBits() % SrcVT.getScalarSizeInBits()) == 0 &&
>>> + SrcVT.getScalarSizeInBits() >= 32) {
>>> + unsigned Mask = N.getConstantOperandVal(2);
>>> + unsigned Size = VT.getVectorNumElements();
>>> + unsigned Scale = VT.getScalarSizeInBits() / SrcVT.getScalarSizeInBits();
>>> + unsigned ScaleMask = scaleVectorShuffleBlendMask(Mask, Size, Scale);
>>> + return DAG.getBitcast(
>>> + VT, DAG.getNode(X86ISD::BLENDI, DL, SrcVT, N0.getOperand(0),
>>> + N1.getOperand(0),
>>> + DAG.getConstant(ScaleMask, DL, MVT::i8)));
>>> + }
>>> + }
>>> + return SDValue();
>>> + }
>>> case X86ISD::PSHUFD:
>>> case X86ISD::PSHUFLW:
>>> case X86ISD::PSHUFHW:
>>>
>>> Modified: llvm/trunk/lib/Target/X86/X86InstrSSE.td
>>> URL:
>>> http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/X86/X86InstrSSE.td?rev=353610&r1=353609&r2=353610&view=diff
>>>
>>> ==============================================================================
>>> --- llvm/trunk/lib/Target/X86/X86InstrSSE.td (original)
>>> +++ llvm/trunk/lib/Target/X86/X86InstrSSE.td Sat Feb 9 05:13:59 2019
>>> @@ -6513,6 +6513,22 @@ let Predicates = [HasAVX2] in {
>>> VEX_4V, VEX_L, VEX_WIG;
>>> }
>>>
>>> +// Emulate vXi32/vXi64 blends with vXf32/vXf64.
>>> +// ExecutionDomainFixPass will cleanup domains later on.
>>> +let Predicates = [HasAVX] in {
>>> +def : Pat<(X86Blendi (v4i64 VR256:$src1), (v4i64 VR256:$src2), (iPTR imm:$src3)),
>>> + (VBLENDPDYrri VR256:$src1, VR256:$src2, imm:$src3)>;
>>> +def : Pat<(X86Blendi (v2i64 VR128:$src1), (v2i64 VR128:$src2), (iPTR imm:$src3)),
>>> + (VBLENDPDrri VR128:$src1, VR128:$src2, imm:$src3)>;
>>> +}
>>> +
>>> +let Predicates = [HasAVX1Only] in {
>>> +def : Pat<(X86Blendi (v8i32 VR256:$src1), (v8i32 VR256:$src2), (iPTR imm:$src3)),
>>> + (VBLENDPSYrri VR256:$src1, VR256:$src2, imm:$src3)>;
>>> +def : Pat<(X86Blendi (v4i32 VR128:$src1), (v4i32 VR128:$src2), (iPTR imm:$src3)),
>>> + (VBLENDPSrri VR128:$src1, VR128:$src2, imm:$src3)>;
>>> +}
>>> +
>>> defm BLENDPS : SS41I_blend_rmi<0x0C, "blendps", X86Blendi, v4f32,
>>> VR128, memop, f128mem, 1, SSEPackedSingle,
>>> SchedWriteFBlend.XMM, BlendCommuteImm4>;
>>> @@ -6523,6 +6539,13 @@ defm PBLENDW : SS41I_blend_rmi<0x0E, "pb
>>> VR128, memop, i128mem, 1, SSEPackedInt,
>>> SchedWriteBlend.XMM, BlendCommuteImm8>;
>>>
>>> +let Predicates = [UseSSE41] in {
>>> +def : Pat<(X86Blendi (v2i64 VR128:$src1), (v2i64 VR128:$src2), (iPTR imm:$src3)),
>>> + (BLENDPDrri VR128:$src1, VR128:$src2, imm:$src3)>;
>>> +def : Pat<(X86Blendi (v4i32 VR128:$src1), (v4i32 VR128:$src2), (iPTR imm:$src3)),
>>> + (BLENDPSrri VR128:$src1, VR128:$src2, imm:$src3)>;
>>> +}
>>> +
>>> // For insertion into the zero index (low half) of a 256-bit vector, it is
>>> // more efficient to generate a blend with immediate instead of an insert*128.
>>> let Predicates = [HasAVX] in {
>>>
>>> Modified: llvm/trunk/test/CodeGen/X86/avx512-shuffles/partial_permute.ll
>>> URL:
>>> http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/avx512-shuffles/partial_permute.ll?rev=353610&r1=353609&r2=353610&view=diff
>>>
>>> ==============================================================================
>>> --- llvm/trunk/test/CodeGen/X86/avx512-shuffles/partial_permute.ll
>>> (original)
>>> +++ llvm/trunk/test/CodeGen/X86/avx512-shuffles/partial_permute.ll Sat
>>> Feb 9 05:13:59 2019
>>> @@ -1912,8 +1912,8 @@ define <2 x i64> @test_masked_z_4xi64_to
>>> define <2 x i64> @test_masked_4xi64_to_2xi64_perm_mem_mask1(<4 x i64>*
>>> %vp, <2 x i64> %vec2, <2 x i64> %mask) {
>>> ; CHECK-LABEL: test_masked_4xi64_to_2xi64_perm_mem_mask1:
>>> ; CHECK: # %bb.0:
>>> -; CHECK-NEXT: vmovdqa 16(%rdi), %xmm2
>>> -; CHECK-NEXT: vpblendd {{.*#+}} xmm2 = xmm2[0,1],mem[2,3]
>>> +; CHECK-NEXT: vmovdqa (%rdi), %xmm2
>>> +; CHECK-NEXT: vpblendd {{.*#+}} xmm2 = mem[0,1],xmm2[2,3]
>>> ; CHECK-NEXT: vptestnmq %xmm1, %xmm1, %k1
>>> ; CHECK-NEXT: vmovdqa64 %xmm2, %xmm0 {%k1}
>>> ; CHECK-NEXT: retq
>>> @@ -1927,8 +1927,8 @@ define <2 x i64> @test_masked_4xi64_to_2
>>> define <2 x i64> @test_masked_z_4xi64_to_2xi64_perm_mem_mask1(<4 x
>>> i64>* %vp, <2 x i64> %mask) {
>>> ; CHECK-LABEL: test_masked_z_4xi64_to_2xi64_perm_mem_mask1:
>>> ; CHECK: # %bb.0:
>>> -; CHECK-NEXT: vmovdqa 16(%rdi), %xmm1
>>> -; CHECK-NEXT: vpblendd {{.*#+}} xmm1 = xmm1[0,1],mem[2,3]
>>> +; CHECK-NEXT: vmovdqa (%rdi), %xmm1
>>> +; CHECK-NEXT: vpblendd {{.*#+}} xmm1 = mem[0,1],xmm1[2,3]
>>> ; CHECK-NEXT: vptestnmq %xmm0, %xmm0, %k1
>>> ; CHECK-NEXT: vmovdqa64 %xmm1, %xmm0 {%k1} {z}
>>> ; CHECK-NEXT: retq
>>> @@ -2553,9 +2553,8 @@ define <4 x i64> @test_masked_z_8xi64_to
>>> define <2 x i64> @test_8xi64_to_2xi64_perm_mem_mask0(<8 x i64>* %vp) {
>>> ; CHECK-LABEL: test_8xi64_to_2xi64_perm_mem_mask0:
>>> ; CHECK: # %bb.0:
>>> -; CHECK-NEXT: vmovsd {{.*#+}} xmm0 = mem[0],zero
>>> -; CHECK-NEXT: vmovaps 32(%rdi), %xmm1
>>> -; CHECK-NEXT: vmovlhps {{.*#+}} xmm0 = xmm1[0],xmm0[0]
>>> +; CHECK-NEXT: vmovaps (%rdi), %xmm0
>>> +; CHECK-NEXT: vblendps {{.*#+}} xmm0 = mem[0,1],xmm0[2,3]
>>> ; CHECK-NEXT: retq
>>> %vec = load <8 x i64>, <8 x i64>* %vp
>>> %res = shufflevector <8 x i64> %vec, <8 x i64> undef, <2 x i32> <i32
>>> 4, i32 1>
>>> @@ -2564,10 +2563,10 @@ define <2 x i64> @test_8xi64_to_2xi64_pe
>>> define <2 x i64> @test_masked_8xi64_to_2xi64_perm_mem_mask0(<8 x i64>*
>>> %vp, <2 x i64> %vec2, <2 x i64> %mask) {
>>> ; CHECK-LABEL: test_masked_8xi64_to_2xi64_perm_mem_mask0:
>>> ; CHECK: # %bb.0:
>>> -; CHECK-NEXT: vmovq {{.*#+}} xmm2 = mem[0],zero
>>> -; CHECK-NEXT: vmovdqa 32(%rdi), %xmm3
>>> +; CHECK-NEXT: vmovdqa (%rdi), %xmm2
>>> +; CHECK-NEXT: vpblendd {{.*#+}} xmm2 = mem[0,1],xmm2[2,3]
>>> ; CHECK-NEXT: vptestnmq %xmm1, %xmm1, %k1
>>> -; CHECK-NEXT: vpunpcklqdq {{.*#+}} xmm0 {%k1} = xmm3[0],xmm2[0]
>>> +; CHECK-NEXT: vmovdqa64 %xmm2, %xmm0 {%k1}
>>> ; CHECK-NEXT: retq
>>> %vec = load <8 x i64>, <8 x i64>* %vp
>>> %shuf = shufflevector <8 x i64> %vec, <8 x i64> undef, <2 x i32> <i32
>>> 4, i32 1>
>>> @@ -2579,10 +2578,10 @@ define <2 x i64> @test_masked_8xi64_to_2
>>> define <2 x i64> @test_masked_z_8xi64_to_2xi64_perm_mem_mask0(<8 x
>>> i64>* %vp, <2 x i64> %mask) {
>>> ; CHECK-LABEL: test_masked_z_8xi64_to_2xi64_perm_mem_mask0:
>>> ; CHECK: # %bb.0:
>>> -; CHECK-NEXT: vmovq {{.*#+}} xmm1 = mem[0],zero
>>> -; CHECK-NEXT: vmovdqa 32(%rdi), %xmm2
>>> +; CHECK-NEXT: vmovdqa (%rdi), %xmm1
>>> +; CHECK-NEXT: vpblendd {{.*#+}} xmm1 = mem[0,1],xmm1[2,3]
>>> ; CHECK-NEXT: vptestnmq %xmm0, %xmm0, %k1
>>> -; CHECK-NEXT: vpunpcklqdq {{.*#+}} xmm0 {%k1} {z} = xmm2[0],xmm1[0]
>>> +; CHECK-NEXT: vmovdqa64 %xmm1, %xmm0 {%k1} {z}
>>> ; CHECK-NEXT: retq
>>> %vec = load <8 x i64>, <8 x i64>* %vp
>>> %shuf = shufflevector <8 x i64> %vec, <8 x i64> undef, <2 x i32> <i32
>>> 4, i32 1>
>>>
>>> Modified: llvm/trunk/test/CodeGen/X86/combine-sdiv.ll
>>> URL:
>>> http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/combine-sdiv.ll?rev=353610&r1=353609&r2=353610&view=diff
>>>
>>> ==============================================================================
>>> --- llvm/trunk/test/CodeGen/X86/combine-sdiv.ll (original)
>>> +++ llvm/trunk/test/CodeGen/X86/combine-sdiv.ll Sat Feb 9 05:13:59 2019
>>> @@ -1701,8 +1701,7 @@ define <4 x i64> @combine_vec_sdiv_by_po
>>> ; AVX1-NEXT: vpcmpgtq %xmm0, %xmm2, %xmm2
>>> ; AVX1-NEXT: vpsrlq $62, %xmm2, %xmm2
>>> ; AVX1-NEXT: vpaddq %xmm2, %xmm0, %xmm2
>>> -; AVX1-NEXT: vpsrlq $2, %xmm2, %xmm3
>>> -; AVX1-NEXT: vpblendw {{.*#+}} xmm2 = xmm2[0,1,2,3],xmm3[4,5,6,7]
>>> +; AVX1-NEXT: vpsrlq $2, %xmm2, %xmm2
>>> ; AVX1-NEXT: vmovdqa {{.*#+}} xmm3 =
>>> [9223372036854775808,2305843009213693952]
>>> ; AVX1-NEXT: vpxor %xmm3, %xmm2, %xmm2
>>> ; AVX1-NEXT: vpsubq %xmm3, %xmm2, %xmm2
>>> @@ -1890,8 +1889,7 @@ define <8 x i64> @combine_vec_sdiv_by_po
>>> ; AVX1-NEXT: vpcmpgtq %xmm0, %xmm2, %xmm5
>>> ; AVX1-NEXT: vpsrlq $62, %xmm5, %xmm5
>>> ; AVX1-NEXT: vpaddq %xmm5, %xmm0, %xmm5
>>> -; AVX1-NEXT: vpsrlq $2, %xmm5, %xmm6
>>> -; AVX1-NEXT: vpblendw {{.*#+}} xmm5 = xmm5[0,1,2,3],xmm6[4,5,6,7]
>>> +; AVX1-NEXT: vpsrlq $2, %xmm5, %xmm5
>>> ; AVX1-NEXT: vmovdqa {{.*#+}} xmm6 =
>>> [9223372036854775808,2305843009213693952]
>>> ; AVX1-NEXT: vpxor %xmm6, %xmm5, %xmm5
>>> ; AVX1-NEXT: vpsubq %xmm6, %xmm5, %xmm5
>>> @@ -1911,8 +1909,7 @@ define <8 x i64> @combine_vec_sdiv_by_po
>>> ; AVX1-NEXT: vpcmpgtq %xmm1, %xmm2, %xmm2
>>> ; AVX1-NEXT: vpsrlq $62, %xmm2, %xmm2
>>> ; AVX1-NEXT: vpaddq %xmm2, %xmm1, %xmm2
>>> -; AVX1-NEXT: vpsrlq $2, %xmm2, %xmm4
>>> -; AVX1-NEXT: vpblendw {{.*#+}} xmm2 = xmm2[0,1,2,3],xmm4[4,5,6,7]
>>> +; AVX1-NEXT: vpsrlq $2, %xmm2, %xmm2
>>> ; AVX1-NEXT: vpxor %xmm6, %xmm2, %xmm2
>>> ; AVX1-NEXT: vpsubq %xmm6, %xmm2, %xmm2
>>> ; AVX1-NEXT: vinsertf128 $1, %xmm3, %ymm2, %ymm2
>>>
>>> Modified: llvm/trunk/test/CodeGen/X86/insertelement-ones.ll
>>> URL:
>>> http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/insertelement-ones.ll?rev=353610&r1=353609&r2=353610&view=diff
>>>
>>> ==============================================================================
>>> --- llvm/trunk/test/CodeGen/X86/insertelement-ones.ll (original)
>>> +++ llvm/trunk/test/CodeGen/X86/insertelement-ones.ll Sat Feb 9
>>> 05:13:59 2019
>>> @@ -291,10 +291,7 @@ define <16 x i16> @insert_v16i16_x12345x
>>> ; AVX2-LABEL: insert_v16i16_x12345x789ABCDEx:
>>> ; AVX2: # %bb.0:
>>> ; AVX2-NEXT: vpcmpeqd %ymm1, %ymm1, %ymm1
>>> -; AVX2-NEXT: vpblendw {{.*#+}} ymm2 =
>>> ymm1[0],ymm0[1,2,3,4,5,6,7],ymm1[8],ymm0[9,10,11,12,13,14,15]
>>> -; AVX2-NEXT: vpblendd {{.*#+}} ymm2 = ymm2[0,1,2,3],ymm0[4,5,6,7]
>>> -; AVX2-NEXT: vpblendw {{.*#+}} ymm2 =
>>> ymm2[0,1,2,3,4,5],ymm1[6],ymm2[7,8,9,10,11,12,13],ymm1[14],ymm2[15]
>>> -; AVX2-NEXT: vpblendd {{.*#+}} ymm0 = ymm2[0,1,2,3],ymm0[4,5,6,7]
>>> +; AVX2-NEXT: vpblendw {{.*#+}} ymm2 =
>>> ymm1[0],ymm0[1,2,3,4,5],ymm1[6],ymm0[7],ymm1[8],ymm0[9,10,11,12,13],ymm1[14],ymm0[15]
>>> ; AVX2-NEXT: vpblendw {{.*#+}} ymm0 =
>>> ymm0[0,1,2,3,4,5,6],ymm1[7],ymm0[8,9,10,11,12,13,14],ymm1[15]
>>> ; AVX2-NEXT: vpblendd {{.*#+}} ymm0 = ymm2[0,1,2,3],ymm0[4,5,6,7]
>>> ; AVX2-NEXT: retq
>>> @@ -302,10 +299,7 @@ define <16 x i16> @insert_v16i16_x12345x
>>> ; AVX512-LABEL: insert_v16i16_x12345x789ABCDEx:
>>> ; AVX512: # %bb.0:
>>> ; AVX512-NEXT: vpcmpeqd %ymm1, %ymm1, %ymm1
>>> -; AVX512-NEXT: vpblendw {{.*#+}} ymm2 =
>>> ymm1[0],ymm0[1,2,3,4,5,6,7],ymm1[8],ymm0[9,10,11,12,13,14,15]
>>> -; AVX512-NEXT: vpblendd {{.*#+}} ymm2 = ymm2[0,1,2,3],ymm0[4,5,6,7]
>>> -; AVX512-NEXT: vpblendw {{.*#+}} ymm2 =
>>> ymm2[0,1,2,3,4,5],ymm1[6],ymm2[7,8,9,10,11,12,13],ymm1[14],ymm2[15]
>>> -; AVX512-NEXT: vpblendd {{.*#+}} ymm0 = ymm2[0,1,2,3],ymm0[4,5,6,7]
>>> +; AVX512-NEXT: vpblendw {{.*#+}} ymm2 =
>>> ymm1[0],ymm0[1,2,3,4,5],ymm1[6],ymm0[7],ymm1[8],ymm0[9,10,11,12,13],ymm1[14],ymm0[15]
>>> ; AVX512-NEXT: vpblendw {{.*#+}} ymm0 =
>>> ymm0[0,1,2,3,4,5,6],ymm1[7],ymm0[8,9,10,11,12,13,14],ymm1[15]
>>> ; AVX512-NEXT: vpblendd {{.*#+}} ymm0 = ymm2[0,1,2,3],ymm0[4,5,6,7]
>>> ; AVX512-NEXT: retq
>>>
>>> Modified: llvm/trunk/test/CodeGen/X86/known-signbits-vector.ll
>>> URL:
>>> http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/known-signbits-vector.ll?rev=353610&r1=353609&r2=353610&view=diff
>>>
>>> ==============================================================================
>>> --- llvm/trunk/test/CodeGen/X86/known-signbits-vector.ll (original)
>>> +++ llvm/trunk/test/CodeGen/X86/known-signbits-vector.ll Sat Feb 9
>>> 05:13:59 2019
>>> @@ -89,7 +89,7 @@ define float @signbits_ashr_extract_sito
>>> ; X32: # %bb.0:
>>> ; X32-NEXT: pushl %eax
>>> ; X32-NEXT: vpsrlq $32, %xmm0, %xmm0
>>> -; X32-NEXT: vmovdqa {{.*#+}} xmm1 = [0,32768,0,0,1,0,0,0]
>>> +; X32-NEXT: vmovdqa {{.*#+}} xmm1 = [2147483648,0,1,0]
>>> ; X32-NEXT: vpxor %xmm1, %xmm0, %xmm0
>>> ; X32-NEXT: vpsubq %xmm1, %xmm0, %xmm0
>>> ; X32-NEXT: vcvtdq2ps %xmm0, %xmm0
>>> @@ -115,7 +115,7 @@ define float @signbits_ashr_shl_extract_
>>> ; X32: # %bb.0:
>>> ; X32-NEXT: pushl %eax
>>> ; X32-NEXT: vpsrlq $61, %xmm0, %xmm0
>>> -; X32-NEXT: vmovdqa {{.*#+}} xmm1 = [4,0,0,0,8,0,0,0]
>>> +; X32-NEXT: vmovdqa {{.*#+}} xmm1 = [4,0,8,0]
>>> ; X32-NEXT: vpxor %xmm1, %xmm0, %xmm0
>>> ; X32-NEXT: vpsubq %xmm1, %xmm0, %xmm0
>>> ; X32-NEXT: vpsllq $20, %xmm0, %xmm0
>>> @@ -231,7 +231,7 @@ define float @signbits_ashr_sext_sextinr
>>> ; X32: # %bb.0:
>>> ; X32-NEXT: pushl %eax
>>> ; X32-NEXT: vpsrlq $61, %xmm0, %xmm0
>>> -; X32-NEXT: vmovdqa {{.*#+}} xmm1 = [4,0,0,0,8,0,0,0]
>>> +; X32-NEXT: vmovdqa {{.*#+}} xmm1 = [4,0,8,0]
>>> ; X32-NEXT: vpxor %xmm1, %xmm0, %xmm0
>>> ; X32-NEXT: vpsubq %xmm1, %xmm0, %xmm0
>>> ; X32-NEXT: vmovd {{.*#+}} xmm1 = mem[0],zero,zero,zero
>>> @@ -272,7 +272,7 @@ define float @signbits_ashr_sextvecinreg
>>> ; X32-NEXT: vpsrlq $60, %xmm0, %xmm2
>>> ; X32-NEXT: vpsrlq $61, %xmm0, %xmm0
>>> ; X32-NEXT: vpblendw {{.*#+}} xmm0 = xmm0[0,1,2,3],xmm2[4,5,6,7]
>>> -; X32-NEXT: vmovdqa {{.*#+}} xmm2 = [4,0,0,0,8,0,0,0]
>>> +; X32-NEXT: vmovdqa {{.*#+}} xmm2 = [4,0,8,0]
>>> ; X32-NEXT: vpxor %xmm2, %xmm0, %xmm0
>>> ; X32-NEXT: vpsubq %xmm2, %xmm0, %xmm0
>>> ; X32-NEXT: vpmovsxdq %xmm1, %xmm1
>>> @@ -322,7 +322,7 @@ define <4 x float> @signbits_ashr_sext_s
>>> ; X32-NEXT: vpmovsxdq 8(%ebp), %xmm4
>>> ; X32-NEXT: vextractf128 $1, %ymm2, %xmm5
>>> ; X32-NEXT: vpsrlq $33, %xmm5, %xmm5
>>> -; X32-NEXT: vmovdqa {{.*#+}} xmm6 = [0,16384,0,0,1,0,0,0]
>>> +; X32-NEXT: vmovdqa {{.*#+}} xmm6 = [1073741824,0,1,0]
>>> ; X32-NEXT: vpxor %xmm6, %xmm5, %xmm5
>>> ; X32-NEXT: vpsubq %xmm6, %xmm5, %xmm5
>>> ; X32-NEXT: vpsrlq $33, %xmm2, %xmm2
>>>
>>> Modified: llvm/trunk/test/CodeGen/X86/masked_load.ll
>>> URL:
>>> http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/masked_load.ll?rev=353610&r1=353609&r2=353610&view=diff
>>>
>>> ==============================================================================
>>> --- llvm/trunk/test/CodeGen/X86/masked_load.ll (original)
>>> +++ llvm/trunk/test/CodeGen/X86/masked_load.ll Sat Feb 9 05:13:59 2019
>>> @@ -1261,18 +1261,15 @@ define <2 x float> @load_v2f32_v2i32(<2
>>> ; SSE42-LABEL: load_v2f32_v2i32:
>>> ; SSE42: ## %bb.0:
>>> ; SSE42-NEXT: pxor %xmm2, %xmm2
>>> -; SSE42-NEXT: movdqa %xmm0, %xmm3
>>> -; SSE42-NEXT: pblendw {{.*#+}} xmm3 =
>>> xmm3[0,1],xmm2[2,3],xmm3[4,5],xmm2[6,7]
>>> -; SSE42-NEXT: pcmpeqq %xmm2, %xmm3
>>> -; SSE42-NEXT: pextrb $0, %xmm3, %eax
>>> +; SSE42-NEXT: pblendw {{.*#+}} xmm0 =
>>> xmm0[0,1],xmm2[2,3],xmm0[4,5],xmm2[6,7]
>>> +; SSE42-NEXT: pcmpeqq %xmm2, %xmm0
>>> +; SSE42-NEXT: pextrb $0, %xmm0, %eax
>>> ; SSE42-NEXT: testb $1, %al
>>> ; SSE42-NEXT: je LBB10_2
>>> ; SSE42-NEXT: ## %bb.1: ## %cond.load
>>> -; SSE42-NEXT: movd {{.*#+}} xmm3 = mem[0],zero,zero,zero
>>> -; SSE42-NEXT: pblendw {{.*#+}} xmm1 = xmm3[0,1],xmm1[2,3,4,5,6,7]
>>> +; SSE42-NEXT: movd {{.*#+}} xmm2 = mem[0],zero,zero,zero
>>> +; SSE42-NEXT: pblendw {{.*#+}} xmm1 = xmm2[0,1],xmm1[2,3,4,5,6,7]
>>> ; SSE42-NEXT: LBB10_2: ## %else
>>> -; SSE42-NEXT: pblendw {{.*#+}} xmm0 =
>>> xmm0[0,1],xmm2[2,3],xmm0[4,5],xmm2[6,7]
>>> -; SSE42-NEXT: pcmpeqq %xmm2, %xmm0
>>> ; SSE42-NEXT: pextrb $8, %xmm0, %eax
>>> ; SSE42-NEXT: testb $1, %al
>>> ; SSE42-NEXT: je LBB10_4
>>> @@ -1357,18 +1354,15 @@ define <2 x i32> @load_v2i32_v2i32(<2 x
>>> ; SSE42-LABEL: load_v2i32_v2i32:
>>> ; SSE42: ## %bb.0:
>>> ; SSE42-NEXT: pxor %xmm2, %xmm2
>>> -; SSE42-NEXT: movdqa %xmm0, %xmm3
>>> -; SSE42-NEXT: pblendw {{.*#+}} xmm3 =
>>> xmm3[0,1],xmm2[2,3],xmm3[4,5],xmm2[6,7]
>>> -; SSE42-NEXT: pcmpeqq %xmm2, %xmm3
>>> -; SSE42-NEXT: pextrb $0, %xmm3, %eax
>>> +; SSE42-NEXT: pblendw {{.*#+}} xmm0 =
>>> xmm0[0,1],xmm2[2,3],xmm0[4,5],xmm2[6,7]
>>> +; SSE42-NEXT: pcmpeqq %xmm2, %xmm0
>>> +; SSE42-NEXT: pextrb $0, %xmm0, %eax
>>> ; SSE42-NEXT: testb $1, %al
>>> ; SSE42-NEXT: je LBB11_2
>>> ; SSE42-NEXT: ## %bb.1: ## %cond.load
>>> ; SSE42-NEXT: movl (%rdi), %eax
>>> ; SSE42-NEXT: pinsrq $0, %rax, %xmm1
>>> ; SSE42-NEXT: LBB11_2: ## %else
>>> -; SSE42-NEXT: pblendw {{.*#+}} xmm0 =
>>> xmm0[0,1],xmm2[2,3],xmm0[4,5],xmm2[6,7]
>>> -; SSE42-NEXT: pcmpeqq %xmm2, %xmm0
>>> ; SSE42-NEXT: pextrb $8, %xmm0, %eax
>>> ; SSE42-NEXT: testb $1, %al
>>> ; SSE42-NEXT: je LBB11_4
>>> @@ -1459,18 +1453,16 @@ define <2 x float> @load_undef_v2f32_v2i
>>> ; SSE42-LABEL: load_undef_v2f32_v2i32:
>>> ; SSE42: ## %bb.0:
>>> ; SSE42-NEXT: movdqa %xmm0, %xmm1
>>> -; SSE42-NEXT: pxor %xmm2, %xmm2
>>> -; SSE42-NEXT: pblendw {{.*#+}} xmm0 =
>>> xmm0[0,1],xmm2[2,3],xmm0[4,5],xmm2[6,7]
>>> -; SSE42-NEXT: pcmpeqq %xmm2, %xmm0
>>> -; SSE42-NEXT: pextrb $0, %xmm0, %eax
>>> +; SSE42-NEXT: pxor %xmm0, %xmm0
>>> +; SSE42-NEXT: pblendw {{.*#+}} xmm1 =
>>> xmm1[0,1],xmm0[2,3],xmm1[4,5],xmm0[6,7]
>>> +; SSE42-NEXT: pcmpeqq %xmm0, %xmm1
>>> +; SSE42-NEXT: pextrb $0, %xmm1, %eax
>>> ; SSE42-NEXT: testb $1, %al
>>> ; SSE42-NEXT: ## implicit-def: $xmm0
>>> ; SSE42-NEXT: je LBB12_2
>>> ; SSE42-NEXT: ## %bb.1: ## %cond.load
>>> ; SSE42-NEXT: movss {{.*#+}} xmm0 = mem[0],zero,zero,zero
>>> ; SSE42-NEXT: LBB12_2: ## %else
>>> -; SSE42-NEXT: pblendw {{.*#+}} xmm1 =
>>> xmm1[0,1],xmm2[2,3],xmm1[4,5],xmm2[6,7]
>>> -; SSE42-NEXT: pcmpeqq %xmm2, %xmm1
>>> ; SSE42-NEXT: pextrb $8, %xmm1, %eax
>>> ; SSE42-NEXT: testb $1, %al
>>> ; SSE42-NEXT: je LBB12_4
>>>
>>> Modified: llvm/trunk/test/CodeGen/X86/masked_store.ll
>>> URL:
>>> http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/masked_store.ll?rev=353610&r1=353609&r2=353610&view=diff
>>>
>>> ==============================================================================
>>> --- llvm/trunk/test/CodeGen/X86/masked_store.ll (original)
>>> +++ llvm/trunk/test/CodeGen/X86/masked_store.ll Sat Feb 9 05:13:59 2019
>>> @@ -330,17 +330,14 @@ define void @store_v2f32_v2i32(<2 x i32>
>>> ; SSE4-LABEL: store_v2f32_v2i32:
>>> ; SSE4: ## %bb.0:
>>> ; SSE4-NEXT: pxor %xmm2, %xmm2
>>> -; SSE4-NEXT: movdqa %xmm0, %xmm3
>>> -; SSE4-NEXT: pblendw {{.*#+}} xmm3 =
>>> xmm3[0,1],xmm2[2,3],xmm3[4,5],xmm2[6,7]
>>> -; SSE4-NEXT: pcmpeqq %xmm2, %xmm3
>>> -; SSE4-NEXT: pextrb $0, %xmm3, %eax
>>> +; SSE4-NEXT: pblendw {{.*#+}} xmm0 =
>>> xmm0[0,1],xmm2[2,3],xmm0[4,5],xmm2[6,7]
>>> +; SSE4-NEXT: pcmpeqq %xmm2, %xmm0
>>> +; SSE4-NEXT: pextrb $0, %xmm0, %eax
>>> ; SSE4-NEXT: testb $1, %al
>>> ; SSE4-NEXT: je LBB3_2
>>> ; SSE4-NEXT: ## %bb.1: ## %cond.store
>>> ; SSE4-NEXT: movss %xmm1, (%rdi)
>>> ; SSE4-NEXT: LBB3_2: ## %else
>>> -; SSE4-NEXT: pblendw {{.*#+}} xmm0 =
>>> xmm0[0,1],xmm2[2,3],xmm0[4,5],xmm2[6,7]
>>> -; SSE4-NEXT: pcmpeqq %xmm2, %xmm0
>>> ; SSE4-NEXT: pextrb $8, %xmm0, %eax
>>> ; SSE4-NEXT: testb $1, %al
>>> ; SSE4-NEXT: je LBB3_4
>>> @@ -417,17 +414,14 @@ define void @store_v2i32_v2i32(<2 x i32>
>>> ; SSE4-LABEL: store_v2i32_v2i32:
>>> ; SSE4: ## %bb.0:
>>> ; SSE4-NEXT: pxor %xmm2, %xmm2
>>> -; SSE4-NEXT: movdqa %xmm0, %xmm3
>>> -; SSE4-NEXT: pblendw {{.*#+}} xmm3 =
>>> xmm3[0,1],xmm2[2,3],xmm3[4,5],xmm2[6,7]
>>> -; SSE4-NEXT: pcmpeqq %xmm2, %xmm3
>>> -; SSE4-NEXT: pextrb $0, %xmm3, %eax
>>> +; SSE4-NEXT: pblendw {{.*#+}} xmm0 =
>>> xmm0[0,1],xmm2[2,3],xmm0[4,5],xmm2[6,7]
>>> +; SSE4-NEXT: pcmpeqq %xmm2, %xmm0
>>> +; SSE4-NEXT: pextrb $0, %xmm0, %eax
>>> ; SSE4-NEXT: testb $1, %al
>>> ; SSE4-NEXT: je LBB4_2
>>> ; SSE4-NEXT: ## %bb.1: ## %cond.store
>>> ; SSE4-NEXT: movss %xmm1, (%rdi)
>>> ; SSE4-NEXT: LBB4_2: ## %else
>>> -; SSE4-NEXT: pblendw {{.*#+}} xmm0 =
>>> xmm0[0,1],xmm2[2,3],xmm0[4,5],xmm2[6,7]
>>> -; SSE4-NEXT: pcmpeqq %xmm2, %xmm0
>>> ; SSE4-NEXT: pextrb $8, %xmm0, %eax
>>> ; SSE4-NEXT: testb $1, %al
>>> ; SSE4-NEXT: je LBB4_4
>>>
>>> Modified: llvm/trunk/test/CodeGen/X86/oddshuffles.ll
>>> URL:
>>> http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/oddshuffles.ll?rev=353610&r1=353609&r2=353610&view=diff
>>>
>>> ==============================================================================
>>> --- llvm/trunk/test/CodeGen/X86/oddshuffles.ll (original)
>>> +++ llvm/trunk/test/CodeGen/X86/oddshuffles.ll Sat Feb 9 05:13:59 2019
>>> @@ -1036,7 +1036,7 @@ define void @interleave_24i16_out(<24 x
>>> ; SSE42-NEXT: pshufhw {{.*#+}} xmm3 = xmm3[0,1,2,3,4,5,6,5]
>>> ; SSE42-NEXT: movdqa %xmm0, %xmm4
>>> ; SSE42-NEXT: pblendw {{.*#+}} xmm4 =
>>> xmm4[0],xmm1[1],xmm4[2,3],xmm1[4],xmm4[5,6],xmm1[7]
>>> -; SSE42-NEXT: pshufb {{.*#+}} xmm4 =
>>> xmm4[0,1,6,7,12,13,2,3,8,9,14,15,u,u,u,u]
>>> +; SSE42-NEXT: pshufb {{.*#+}} xmm4 =
>>> xmm4[0,1,6,7,12,13,2,3,8,9,14,15,12,13,14,15]
>>> ; SSE42-NEXT: pblendw {{.*#+}} xmm4 = xmm4[0,1,2,3,4,5],xmm3[6,7]
>>> ; SSE42-NEXT: movdqa %xmm2, %xmm3
>>> ; SSE42-NEXT: pshufb {{.*#+}} xmm3 =
>>> xmm3[0,1,6,7,4,5,6,7,0,1,0,1,6,7,12,13]
>>> @@ -1061,7 +1061,7 @@ define void @interleave_24i16_out(<24 x
>>> ; AVX1-NEXT: vpshufd {{.*#+}} xmm3 = xmm2[0,1,2,1]
>>> ; AVX1-NEXT: vpshufhw {{.*#+}} xmm3 = xmm3[0,1,2,3,4,5,6,5]
>>> ; AVX1-NEXT: vpblendw {{.*#+}} xmm4 =
>>> xmm0[0],xmm1[1],xmm0[2,3],xmm1[4],xmm0[5,6],xmm1[7]
>>> -; AVX1-NEXT: vpshufb {{.*#+}} xmm4 =
>>> xmm4[0,1,6,7,12,13,2,3,8,9,14,15,u,u,u,u]
>>> +; AVX1-NEXT: vpshufb {{.*#+}} xmm4 =
>>> xmm4[0,1,6,7,12,13,2,3,8,9,14,15,12,13,14,15]
>>> ; AVX1-NEXT: vpblendw {{.*#+}} xmm3 = xmm4[0,1,2,3,4,5],xmm3[6,7]
>>> ; AVX1-NEXT: vpshufb {{.*#+}} xmm4 =
>>> xmm2[0,1,6,7,4,5,6,7,0,1,0,1,6,7,12,13]
>>> ; AVX1-NEXT: vpblendw {{.*#+}} xmm5 =
>>> xmm0[0,1],xmm1[2],xmm0[3,4],xmm1[5],xmm0[6,7]
>>> @@ -1583,25 +1583,25 @@ define void @interleave_24i32_in(<24 x i
>>> ; AVX1: # %bb.0:
>>> ; AVX1-NEXT: vmovupd (%rsi), %ymm0
>>> ; AVX1-NEXT: vmovupd (%rcx), %ymm1
>>> -; AVX1-NEXT: vmovups 16(%rcx), %xmm2
>>> -; AVX1-NEXT: vmovups (%rdx), %xmm3
>>> -; AVX1-NEXT: vmovups 16(%rdx), %xmm4
>>> -; AVX1-NEXT: vshufps {{.*#+}} xmm5 = xmm4[3,0],xmm2[3,0]
>>> -; AVX1-NEXT: vshufps {{.*#+}} xmm5 = xmm2[2,1],xmm5[0,2]
>>> -; AVX1-NEXT: vshufps {{.*#+}} xmm2 = xmm2[1,0],xmm4[1,0]
>>> -; AVX1-NEXT: vshufps {{.*#+}} xmm2 = xmm2[2,0],xmm4[2,2]
>>> -; AVX1-NEXT: vinsertf128 $1, %xmm5, %ymm2, %ymm2
>>> -; AVX1-NEXT: vpermilpd {{.*#+}} ymm4 = ymm0[1,1,3,3]
>>> -; AVX1-NEXT: vperm2f128 {{.*#+}} ymm4 = ymm4[2,3,2,3]
>>> -; AVX1-NEXT: vblendps {{.*#+}} ymm2 =
>>> ymm2[0,1],ymm4[2],ymm2[3,4],ymm4[5],ymm2[6,7]
>>> +; AVX1-NEXT: vmovups (%rdx), %xmm2
>>> +; AVX1-NEXT: vmovups 16(%rdx), %xmm3
>>> ; AVX1-NEXT: vmovups (%rsi), %xmm4
>>> -; AVX1-NEXT: vshufps {{.*#+}} xmm5 = xmm4[2,0],xmm3[2,0]
>>> -; AVX1-NEXT: vshufps {{.*#+}} xmm5 = xmm3[1,1],xmm5[0,2]
>>> -; AVX1-NEXT: vshufps {{.*#+}} xmm3 = xmm3[0,0],xmm4[0,0]
>>> -; AVX1-NEXT: vshufps {{.*#+}} xmm3 = xmm3[2,0],xmm4[2,1]
>>> -; AVX1-NEXT: vinsertf128 $1, %xmm5, %ymm3, %ymm3
>>> +; AVX1-NEXT: vshufps {{.*#+}} xmm5 = xmm4[2,0],xmm2[2,0]
>>> +; AVX1-NEXT: vshufps {{.*#+}} xmm5 = xmm2[1,1],xmm5[0,2]
>>> +; AVX1-NEXT: vshufps {{.*#+}} xmm2 = xmm2[0,0],xmm4[0,0]
>>> +; AVX1-NEXT: vshufps {{.*#+}} xmm2 = xmm2[2,0],xmm4[2,1]
>>> +; AVX1-NEXT: vinsertf128 $1, %xmm5, %ymm2, %ymm2
>>> ; AVX1-NEXT: vmovddup {{.*#+}} xmm4 = xmm1[0,0]
>>> ; AVX1-NEXT: vinsertf128 $1, %xmm4, %ymm4, %ymm4
>>> +; AVX1-NEXT: vblendps {{.*#+}} ymm2 =
>>> ymm2[0,1],ymm4[2],ymm2[3,4],ymm4[5],ymm2[6,7]
>>> +; AVX1-NEXT: vmovups 16(%rcx), %xmm4
>>> +; AVX1-NEXT: vshufps {{.*#+}} xmm5 = xmm3[3,0],xmm4[3,0]
>>> +; AVX1-NEXT: vshufps {{.*#+}} xmm5 = xmm4[2,1],xmm5[0,2]
>>> +; AVX1-NEXT: vshufps {{.*#+}} xmm4 = xmm4[1,0],xmm3[1,0]
>>> +; AVX1-NEXT: vshufps {{.*#+}} xmm3 = xmm4[2,0],xmm3[2,2]
>>> +; AVX1-NEXT: vinsertf128 $1, %xmm5, %ymm3, %ymm3
>>> +; AVX1-NEXT: vpermilpd {{.*#+}} ymm4 = ymm0[1,1,3,3]
>>> +; AVX1-NEXT: vperm2f128 {{.*#+}} ymm4 = ymm4[2,3,2,3]
>>> ; AVX1-NEXT: vblendps {{.*#+}} ymm3 =
>>> ymm3[0,1],ymm4[2],ymm3[3,4],ymm4[5],ymm3[6,7]
>>> ; AVX1-NEXT: vpermilpd {{.*#+}} ymm0 = ymm0[1,0,2,2]
>>> ; AVX1-NEXT: vpermilpd {{.*#+}} ymm1 = ymm1[1,1,2,2]
>>> @@ -1609,8 +1609,8 @@ define void @interleave_24i32_in(<24 x i
>>> ; AVX1-NEXT: vpermilps {{.*#+}} ymm1 = mem[0,0,3,3,4,4,7,7]
>>> ; AVX1-NEXT: vblendps {{.*#+}} ymm0 =
>>> ymm0[0,1],ymm1[2],ymm0[3,4],ymm1[5],ymm0[6,7]
>>> ; AVX1-NEXT: vmovups %ymm0, 32(%rdi)
>>> -; AVX1-NEXT: vmovups %ymm3, (%rdi)
>>> -; AVX1-NEXT: vmovups %ymm2, 64(%rdi)
>>> +; AVX1-NEXT: vmovups %ymm3, 64(%rdi)
>>> +; AVX1-NEXT: vmovups %ymm2, (%rdi)
>>> ; AVX1-NEXT: vzeroupper
>>> ; AVX1-NEXT: retq
>>> ;
>>> @@ -1674,32 +1674,32 @@ define void @interleave_24i32_in(<24 x i
>>> ; XOP: # %bb.0:
>>> ; XOP-NEXT: vmovupd (%rsi), %ymm0
>>> ; XOP-NEXT: vmovups (%rcx), %ymm1
>>> -; XOP-NEXT: vmovups 16(%rcx), %xmm2
>>> -; XOP-NEXT: vmovups (%rdx), %xmm3
>>> -; XOP-NEXT: vmovups 16(%rdx), %xmm4
>>> -; XOP-NEXT: vshufps {{.*#+}} xmm5 = xmm4[3,0],xmm2[3,0]
>>> -; XOP-NEXT: vshufps {{.*#+}} xmm5 = xmm2[2,1],xmm5[0,2]
>>> -; XOP-NEXT: vshufps {{.*#+}} xmm2 = xmm2[1,0],xmm4[1,0]
>>> -; XOP-NEXT: vshufps {{.*#+}} xmm2 = xmm2[2,0],xmm4[2,2]
>>> -; XOP-NEXT: vinsertf128 $1, %xmm5, %ymm2, %ymm2
>>> -; XOP-NEXT: vpermilpd {{.*#+}} ymm4 = ymm0[1,1,3,3]
>>> -; XOP-NEXT: vperm2f128 {{.*#+}} ymm4 = ymm4[2,3,2,3]
>>> -; XOP-NEXT: vblendps {{.*#+}} ymm2 =
>>> ymm2[0,1],ymm4[2],ymm2[3,4],ymm4[5],ymm2[6,7]
>>> +; XOP-NEXT: vmovups (%rdx), %xmm2
>>> +; XOP-NEXT: vmovups 16(%rdx), %xmm3
>>> ; XOP-NEXT: vmovups (%rsi), %xmm4
>>> -; XOP-NEXT: vshufps {{.*#+}} xmm5 = xmm4[2,0],xmm3[2,0]
>>> -; XOP-NEXT: vshufps {{.*#+}} xmm5 = xmm3[1,1],xmm5[0,2]
>>> -; XOP-NEXT: vshufps {{.*#+}} xmm3 = xmm3[0,0],xmm4[0,0]
>>> -; XOP-NEXT: vshufps {{.*#+}} xmm3 = xmm3[2,0],xmm4[2,1]
>>> -; XOP-NEXT: vinsertf128 $1, %xmm5, %ymm3, %ymm3
>>> +; XOP-NEXT: vshufps {{.*#+}} xmm5 = xmm4[2,0],xmm2[2,0]
>>> +; XOP-NEXT: vshufps {{.*#+}} xmm5 = xmm2[1,1],xmm5[0,2]
>>> +; XOP-NEXT: vshufps {{.*#+}} xmm2 = xmm2[0,0],xmm4[0,0]
>>> +; XOP-NEXT: vshufps {{.*#+}} xmm2 = xmm2[2,0],xmm4[2,1]
>>> +; XOP-NEXT: vinsertf128 $1, %xmm5, %ymm2, %ymm2
>>> ; XOP-NEXT: vmovddup {{.*#+}} xmm4 = xmm1[0,0]
>>> ; XOP-NEXT: vinsertf128 $1, %xmm4, %ymm4, %ymm4
>>> +; XOP-NEXT: vblendps {{.*#+}} ymm2 =
>>> ymm2[0,1],ymm4[2],ymm2[3,4],ymm4[5],ymm2[6,7]
>>> +; XOP-NEXT: vmovups 16(%rcx), %xmm4
>>> +; XOP-NEXT: vshufps {{.*#+}} xmm5 = xmm3[3,0],xmm4[3,0]
>>> +; XOP-NEXT: vshufps {{.*#+}} xmm5 = xmm4[2,1],xmm5[0,2]
>>> +; XOP-NEXT: vshufps {{.*#+}} xmm4 = xmm4[1,0],xmm3[1,0]
>>> +; XOP-NEXT: vshufps {{.*#+}} xmm3 = xmm4[2,0],xmm3[2,2]
>>> +; XOP-NEXT: vinsertf128 $1, %xmm5, %ymm3, %ymm3
>>> +; XOP-NEXT: vpermilpd {{.*#+}} ymm4 = ymm0[1,1,3,3]
>>> +; XOP-NEXT: vperm2f128 {{.*#+}} ymm4 = ymm4[2,3,2,3]
>>> ; XOP-NEXT: vblendps {{.*#+}} ymm3 =
>>> ymm3[0,1],ymm4[2],ymm3[3,4],ymm4[5],ymm3[6,7]
>>> ; XOP-NEXT: vpermil2ps {{.*#+}} ymm0 =
>>> ymm1[2],ymm0[3],ymm1[2,3],ymm0[4],ymm1[5,4],ymm0[5]
>>> ; XOP-NEXT: vpermilps {{.*#+}} ymm1 = mem[0,0,3,3,4,4,7,7]
>>> ; XOP-NEXT: vblendps {{.*#+}} ymm0 =
>>> ymm0[0,1],ymm1[2],ymm0[3,4],ymm1[5],ymm0[6,7]
>>> ; XOP-NEXT: vmovups %ymm0, 32(%rdi)
>>> -; XOP-NEXT: vmovups %ymm3, (%rdi)
>>> -; XOP-NEXT: vmovups %ymm2, 64(%rdi)
>>> +; XOP-NEXT: vmovups %ymm3, 64(%rdi)
>>> +; XOP-NEXT: vmovups %ymm2, (%rdi)
>>> ; XOP-NEXT: vzeroupper
>>> ; XOP-NEXT: retq
>>> %s1 = load <8 x i32>, <8 x i32>* %q1, align 4
>>>
>>> Modified: llvm/trunk/test/CodeGen/X86/packss.ll
>>> URL:
>>> http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/packss.ll?rev=353610&r1=353609&r2=353610&view=diff
>>>
>>> ==============================================================================
>>> --- llvm/trunk/test/CodeGen/X86/packss.ll (original)
>>> +++ llvm/trunk/test/CodeGen/X86/packss.ll Sat Feb 9 05:13:59 2019
>>> @@ -172,19 +172,19 @@ define <8 x i16> @trunc_ashr_v4i64_deman
>>> ;
>>> ; X86-AVX1-LABEL: trunc_ashr_v4i64_demandedelts:
>>> ; X86-AVX1: # %bb.0:
>>> -; X86-AVX1-NEXT: vpsllq $63, %xmm0, %xmm1
>>> -; X86-AVX1-NEXT: vextractf128 $1, %ymm0, %xmm2
>>> -; X86-AVX1-NEXT: vpsllq $63, %xmm2, %xmm3
>>> -; X86-AVX1-NEXT: vpsrlq $63, %xmm3, %xmm3
>>> -; X86-AVX1-NEXT: vpblendw {{.*#+}} xmm2 = xmm3[0,1,2,3],xmm2[4,5,6,7]
>>> -; X86-AVX1-NEXT: vmovdqa {{.*#+}} xmm3 = [1,0,0,0,0,0,0,32768]
>>> -; X86-AVX1-NEXT: vpxor %xmm3, %xmm2, %xmm2
>>> -; X86-AVX1-NEXT: vpsubq %xmm3, %xmm2, %xmm2
>>> +; X86-AVX1-NEXT: vextractf128 $1, %ymm0, %xmm1
>>> +; X86-AVX1-NEXT: vpsllq $63, %xmm1, %xmm2
>>> +; X86-AVX1-NEXT: vpblendw {{.*#+}} xmm1 = xmm2[0,1,2,3],xmm1[4,5,6,7]
>>> +; X86-AVX1-NEXT: vpsllq $63, %xmm0, %xmm2
>>> ; X86-AVX1-NEXT: vpsrlq $63, %xmm1, %xmm1
>>> -; X86-AVX1-NEXT: vpblendw {{.*#+}} xmm0 = xmm1[0,1,2,3],xmm0[4,5,6,7]
>>> +; X86-AVX1-NEXT: vmovdqa {{.*#+}} xmm3 = [1,0,0,2147483648]
>>> +; X86-AVX1-NEXT: vpxor %xmm3, %xmm1, %xmm1
>>> +; X86-AVX1-NEXT: vpsubq %xmm3, %xmm1, %xmm1
>>> +; X86-AVX1-NEXT: vpsrlq $63, %xmm2, %xmm2
>>> +; X86-AVX1-NEXT: vpblendw {{.*#+}} xmm0 = xmm2[0,1,2,3],xmm0[4,5,6,7]
>>> ; X86-AVX1-NEXT: vpxor %xmm3, %xmm0, %xmm0
>>> ; X86-AVX1-NEXT: vpsubq %xmm3, %xmm0, %xmm0
>>> -; X86-AVX1-NEXT: vinsertf128 $1, %xmm2, %ymm0, %ymm0
>>> +; X86-AVX1-NEXT: vinsertf128 $1, %xmm1, %ymm0, %ymm0
>>> ; X86-AVX1-NEXT: vpermilps {{.*#+}} ymm0 = ymm0[0,0,0,0,4,4,4,4]
>>> ; X86-AVX1-NEXT: vextractf128 $1, %ymm0, %xmm1
>>> ; X86-AVX1-NEXT: vpackssdw %xmm1, %xmm0, %xmm0
>>> @@ -224,19 +224,19 @@ define <8 x i16> @trunc_ashr_v4i64_deman
>>> ;
>>> ; X64-AVX1-LABEL: trunc_ashr_v4i64_demandedelts:
>>> ; X64-AVX1: # %bb.0:
>>> -; X64-AVX1-NEXT: vpsllq $63, %xmm0, %xmm1
>>> -; X64-AVX1-NEXT: vextractf128 $1, %ymm0, %xmm2
>>> -; X64-AVX1-NEXT: vpsllq $63, %xmm2, %xmm3
>>> -; X64-AVX1-NEXT: vpsrlq $63, %xmm3, %xmm3
>>> -; X64-AVX1-NEXT: vpblendw {{.*#+}} xmm2 = xmm3[0,1,2,3],xmm2[4,5,6,7]
>>> -; X64-AVX1-NEXT: vmovdqa {{.*#+}} xmm3 = [1,9223372036854775808]
>>> -; X64-AVX1-NEXT: vpxor %xmm3, %xmm2, %xmm2
>>> -; X64-AVX1-NEXT: vpsubq %xmm3, %xmm2, %xmm2
>>> +; X64-AVX1-NEXT: vextractf128 $1, %ymm0, %xmm1
>>> +; X64-AVX1-NEXT: vpsllq $63, %xmm1, %xmm2
>>> +; X64-AVX1-NEXT: vpblendw {{.*#+}} xmm1 = xmm2[0,1,2,3],xmm1[4,5,6,7]
>>> +; X64-AVX1-NEXT: vpsllq $63, %xmm0, %xmm2
>>> ; X64-AVX1-NEXT: vpsrlq $63, %xmm1, %xmm1
>>> -; X64-AVX1-NEXT: vpblendw {{.*#+}} xmm0 = xmm1[0,1,2,3],xmm0[4,5,6,7]
>>> +; X64-AVX1-NEXT: vmovdqa {{.*#+}} xmm3 = [1,9223372036854775808]
>>> +; X64-AVX1-NEXT: vpxor %xmm3, %xmm1, %xmm1
>>> +; X64-AVX1-NEXT: vpsubq %xmm3, %xmm1, %xmm1
>>> +; X64-AVX1-NEXT: vpsrlq $63, %xmm2, %xmm2
>>> +; X64-AVX1-NEXT: vpblendw {{.*#+}} xmm0 = xmm2[0,1,2,3],xmm0[4,5,6,7]
>>> ; X64-AVX1-NEXT: vpxor %xmm3, %xmm0, %xmm0
>>> ; X64-AVX1-NEXT: vpsubq %xmm3, %xmm0, %xmm0
>>> -; X64-AVX1-NEXT: vinsertf128 $1, %xmm2, %ymm0, %ymm0
>>> +; X64-AVX1-NEXT: vinsertf128 $1, %xmm1, %ymm0, %ymm0
>>> ; X64-AVX1-NEXT: vpermilps {{.*#+}} ymm0 = ymm0[0,0,0,0,4,4,4,4]
>>> ; X64-AVX1-NEXT: vextractf128 $1, %ymm0, %xmm1
>>> ; X64-AVX1-NEXT: vpackssdw %xmm1, %xmm0, %xmm0
>>>
>>> Modified: llvm/trunk/test/CodeGen/X86/pr34592.ll
>>> URL:
>>> http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/pr34592.ll?rev=353610&r1=353609&r2=353610&view=diff
>>>
>>> ==============================================================================
>>> --- llvm/trunk/test/CodeGen/X86/pr34592.ll (original)
>>> +++ llvm/trunk/test/CodeGen/X86/pr34592.ll Sat Feb 9 05:13:59 2019
>>> @@ -19,16 +19,14 @@ define <16 x i64> @pluto(<16 x i64> %arg
>>> ; CHECK-NEXT: vmovaps 80(%rbp), %ymm13
>>> ; CHECK-NEXT: vmovaps 48(%rbp), %ymm14
>>> ; CHECK-NEXT: vmovaps 16(%rbp), %ymm15
>>> -; CHECK-NEXT: vpblendd {{.*#+}} ymm2 = ymm6[0,1,2,3,4,5],ymm2[6,7]
>>> +; CHECK-NEXT: vblendpd {{.*#+}} ymm2 = ymm6[0,1,2],ymm2[3]
>>> ; CHECK-NEXT: vmovaps %xmm9, %xmm6
>>> -; CHECK-NEXT: vmovdqa %xmm6, %xmm9
>>> -; CHECK-NEXT: # kill: def $ymm9 killed $xmm9
>>> ; CHECK-NEXT: vmovaps %ymm0, {{[-0-9]+}}(%r{{[sb]}}p) # 32-byte Spill
>>> ; CHECK-NEXT: # implicit-def: $ymm0
>>> ; CHECK-NEXT: vinserti128 $1, %xmm6, %ymm0, %ymm0
>>> ; CHECK-NEXT: vpalignr {{.*#+}} ymm11 =
>>> ymm2[8,9,10,11,12,13,14,15],ymm11[0,1,2,3,4,5,6,7],ymm2[24,25,26,27,28,29,30,31],ymm11[16,17,18,19,20,21,22,23]
>>> ; CHECK-NEXT: vpermq {{.*#+}} ymm11 = ymm11[2,3,2,0]
>>> -; CHECK-NEXT: vpblendd {{.*#+}} ymm0 =
>>> ymm11[0,1,2,3],ymm0[4,5],ymm11[6,7]
>>> +; CHECK-NEXT: vblendpd {{.*#+}} ymm0 = ymm11[0,1],ymm0[2],ymm11[3]
>>> ; CHECK-NEXT: vmovaps %xmm2, %xmm6
>>> ; CHECK-NEXT: # implicit-def: $ymm2
>>> ; CHECK-NEXT: vinserti128 $1, %xmm6, %ymm2, %ymm2
>>> @@ -36,18 +34,18 @@ define <16 x i64> @pluto(<16 x i64> %arg
>>> ; CHECK-NEXT: vmovq {{.*#+}} xmm6 = xmm6[0],zero
>>> ; CHECK-NEXT: # implicit-def: $ymm11
>>> ; CHECK-NEXT: vmovaps %xmm6, %xmm11
>>> -; CHECK-NEXT: vpblendd {{.*#+}} ymm2 = ymm11[0,1,2,3],ymm2[4,5,6,7]
>>> +; CHECK-NEXT: vblendpd {{.*#+}} ymm2 = ymm11[0,1],ymm2[2,3]
>>> ; CHECK-NEXT: vmovaps %xmm7, %xmm6
>>> ; CHECK-NEXT: vpslldq {{.*#+}} xmm6 =
>>> zero,zero,zero,zero,zero,zero,zero,zero,xmm6[0,1,2,3,4,5,6,7]
>>> ; CHECK-NEXT: # implicit-def: $ymm11
>>> ; CHECK-NEXT: vmovaps %xmm6, %xmm11
>>> ; CHECK-NEXT: vpalignr {{.*#+}} ymm9 =
>>> ymm9[8,9,10,11,12,13,14,15],ymm5[0,1,2,3,4,5,6,7],ymm9[24,25,26,27,28,29,30,31],ymm5[16,17,18,19,20,21,22,23]
>>> ; CHECK-NEXT: vpermq {{.*#+}} ymm9 = ymm9[0,1,0,3]
>>> -; CHECK-NEXT: vpblendd {{.*#+}} ymm9 = ymm11[0,1,2,3],ymm9[4,5,6,7]
>>> -; CHECK-NEXT: vpblendd {{.*#+}} ymm7 =
>>> ymm7[0,1],ymm8[2,3],ymm7[4,5,6,7]
>>> +; CHECK-NEXT: vblendpd {{.*#+}} ymm9 = ymm11[0,1],ymm9[2,3]
>>> +; CHECK-NEXT: vblendpd {{.*#+}} ymm7 = ymm7[0],ymm8[1],ymm7[2,3]
>>> ; CHECK-NEXT: vpermq {{.*#+}} ymm7 = ymm7[2,1,1,3]
>>> ; CHECK-NEXT: vpshufd {{.*#+}} ymm5 = ymm5[0,1,0,1,4,5,4,5]
>>> -; CHECK-NEXT: vpblendd {{.*#+}} ymm5 = ymm7[0,1,2,3,4,5],ymm5[6,7]
>>> +; CHECK-NEXT: vblendpd {{.*#+}} ymm5 = ymm7[0,1,2],ymm5[3]
>>> ; CHECK-NEXT: vmovaps %ymm1, {{[-0-9]+}}(%r{{[sb]}}p) # 32-byte Spill
>>> ; CHECK-NEXT: vmovaps %ymm5, %ymm1
>>> ; CHECK-NEXT: vmovaps %ymm3, {{[-0-9]+}}(%r{{[sb]}}p) # 32-byte Spill
>>>
>>> Modified: llvm/trunk/test/CodeGen/X86/prefer-avx256-mask-shuffle.ll
>>> URL:
>>> http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/prefer-avx256-mask-shuffle.ll?rev=353610&r1=353609&r2=353610&view=diff
>>>
>>> ==============================================================================
>>> --- llvm/trunk/test/CodeGen/X86/prefer-avx256-mask-shuffle.ll (original)
>>> +++ llvm/trunk/test/CodeGen/X86/prefer-avx256-mask-shuffle.ll Sat Feb 9
>>> 05:13:59 2019
>>> @@ -196,14 +196,13 @@ define <32 x i1> @shuf32i1_3_6_22_12_3_7
>>> ; AVX256VLBW: # %bb.0:
>>> ; AVX256VLBW-NEXT: vptestnmb %ymm0, %ymm0, %k0
>>> ; AVX256VLBW-NEXT: vpmovm2b %k0, %ymm0
>>> -; AVX256VLBW-NEXT: vpermq {{.*#+}} ymm1 = ymm0[2,3,0,1]
>>> -; AVX256VLBW-NEXT: vpblendd {{.*#+}} ymm2 =
>>> ymm1[0,1,2,3],ymm0[4,5,6,7]
>>> -; AVX256VLBW-NEXT: vpshufd {{.*#+}} ymm2 = ymm2[1,1,2,1,5,5,6,5]
>>> -; AVX256VLBW-NEXT: vpblendd {{.*#+}} ymm0 =
>>> ymm0[0,1,2,3],ymm1[4,5,6,7]
>>> +; AVX256VLBW-NEXT: vpermq {{.*#+}} ymm1 = ymm0[2,3,2,3]
>>> +; AVX256VLBW-NEXT: vpshufd {{.*#+}} ymm1 = ymm1[1,1,2,1,5,5,6,5]
>>> +; AVX256VLBW-NEXT: vpermq {{.*#+}} ymm0 = ymm0[0,1,0,1]
>>> ; AVX256VLBW-NEXT: vpshufb {{.*#+}} ymm0 =
>>> ymm0[3,6,u,12,3,7,7,0,3,6,1,13,3,u,7,0,19,22,u,28,19,23,23,16,19,22,17,29,19,u,23,16]
>>> ; AVX256VLBW-NEXT: movl $537141252, %eax # imm = 0x20042004
>>> ; AVX256VLBW-NEXT: kmovd %eax, %k1
>>> -; AVX256VLBW-NEXT: vmovdqu8 %ymm2, %ymm0 {%k1}
>>> +; AVX256VLBW-NEXT: vmovdqu8 %ymm1, %ymm0 {%k1}
>>> ; AVX256VLBW-NEXT: vpmovb2m %ymm0, %k0
>>> ; AVX256VLBW-NEXT: vpmovm2b %k0, %ymm0
>>> ; AVX256VLBW-NEXT: retq
>>>
>>> Modified: llvm/trunk/test/CodeGen/X86/sse2.ll
>>> URL:
>>> http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/sse2.ll?rev=353610&r1=353609&r2=353610&view=diff
>>>
>>> ==============================================================================
>>> --- llvm/trunk/test/CodeGen/X86/sse2.ll (original)
>>> +++ llvm/trunk/test/CodeGen/X86/sse2.ll Sat Feb 9 05:13:59 2019
>>> @@ -709,8 +709,7 @@ define <4 x i32> @PR19721(<4 x i32> %i)
>>> ; X64-AVX512-NEXT: movabsq $-4294967296, %rcx # imm =
>>> 0xFFFFFFFF00000000
>>> ; X64-AVX512-NEXT: andq %rax, %rcx
>>> ; X64-AVX512-NEXT: vmovq %rcx, %xmm1
>>> -; X64-AVX512-NEXT: vpshufd {{.*#+}} xmm0 = xmm0[2,3,0,1]
>>> -; X64-AVX512-NEXT: vpunpcklqdq {{.*#+}} xmm0 = xmm1[0],xmm0[0]
>>> +; X64-AVX512-NEXT: vpblendd {{.*#+}} xmm0 = xmm1[0,1],xmm0[2,3]
>>> ; X64-AVX512-NEXT: retq
>>> %bc = bitcast <4 x i32> %i to i128
>>> %insert = and i128 %bc, -4294967296
>>>
>>> Modified: llvm/trunk/test/CodeGen/X86/vector-reduce-smax.ll
>>> URL:
>>> http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/vector-reduce-smax.ll?rev=353610&r1=353609&r2=353610&view=diff
>>>
>>> ==============================================================================
>>> --- llvm/trunk/test/CodeGen/X86/vector-reduce-smax.ll (original)
>>> +++ llvm/trunk/test/CodeGen/X86/vector-reduce-smax.ll Sat Feb 9
>>> 05:13:59 2019
>>> @@ -709,16 +709,17 @@ define i32 @test_v2i32(<2 x i32> %a0) {
>>> ; SSE41-NEXT: pshufd {{.*#+}} xmm3 = xmm0[0,2,2,3]
>>> ; SSE41-NEXT: psrad $31, %xmm3
>>> ; SSE41-NEXT: pblendw {{.*#+}} xmm3 =
>>> xmm2[0,1],xmm3[2,3],xmm2[4,5],xmm3[6,7]
>>> -; SSE41-NEXT: movdqa {{.*#+}} xmm2 = [2147483648,2147483648]
>>> -; SSE41-NEXT: movdqa %xmm3, %xmm0
>>> -; SSE41-NEXT: pxor %xmm2, %xmm0
>>> -; SSE41-NEXT: pxor %xmm1, %xmm2
>>> -; SSE41-NEXT: movdqa %xmm2, %xmm4
>>> -; SSE41-NEXT: pcmpeqd %xmm0, %xmm4
>>> -; SSE41-NEXT: pcmpgtd %xmm0, %xmm2
>>> -; SSE41-NEXT: pshufd {{.*#+}} xmm0 = xmm2[0,0,2,2]
>>> -; SSE41-NEXT: pand %xmm4, %xmm0
>>> -; SSE41-NEXT: por %xmm2, %xmm0
>>> +; SSE41-NEXT: movdqa {{.*#+}} xmm0 = [2147483648,2147483648]
>>> +; SSE41-NEXT: movdqa %xmm3, %xmm2
>>> +; SSE41-NEXT: pxor %xmm0, %xmm2
>>> +; SSE41-NEXT: pxor %xmm1, %xmm0
>>> +; SSE41-NEXT: movdqa %xmm0, %xmm4
>>> +; SSE41-NEXT: pcmpgtd %xmm2, %xmm4
>>> +; SSE41-NEXT: pshufd {{.*#+}} xmm5 = xmm4[0,0,2,2]
>>> +; SSE41-NEXT: pcmpeqd %xmm2, %xmm0
>>> +; SSE41-NEXT: pshufd {{.*#+}} xmm0 = xmm0[1,1,3,3]
>>> +; SSE41-NEXT: pand %xmm5, %xmm0
>>> +; SSE41-NEXT: por %xmm4, %xmm0
>>> ; SSE41-NEXT: blendvpd %xmm0, %xmm1, %xmm3
>>> ; SSE41-NEXT: movd %xmm3, %eax
>>> ; SSE41-NEXT: retq
>>> @@ -1170,11 +1171,12 @@ define i16 @test_v2i16(<2 x i16> %a0) {
>>> ; SSE41-NEXT: pxor %xmm0, %xmm2
>>> ; SSE41-NEXT: pxor %xmm1, %xmm0
>>> ; SSE41-NEXT: movdqa %xmm2, %xmm4
>>> -; SSE41-NEXT: pcmpeqd %xmm0, %xmm4
>>> -; SSE41-NEXT: pcmpgtd %xmm0, %xmm2
>>> -; SSE41-NEXT: pshufd {{.*#+}} xmm0 = xmm2[0,0,2,2]
>>> -; SSE41-NEXT: pand %xmm4, %xmm0
>>> -; SSE41-NEXT: por %xmm2, %xmm0
>>> +; SSE41-NEXT: pcmpgtd %xmm0, %xmm4
>>> +; SSE41-NEXT: pshufd {{.*#+}} xmm5 = xmm4[0,0,2,2]
>>> +; SSE41-NEXT: pcmpeqd %xmm2, %xmm0
>>> +; SSE41-NEXT: pshufd {{.*#+}} xmm0 = xmm0[1,1,3,3]
>>> +; SSE41-NEXT: pand %xmm5, %xmm0
>>> +; SSE41-NEXT: por %xmm4, %xmm0
>>> ; SSE41-NEXT: blendvpd %xmm0, %xmm3, %xmm1
>>> ; SSE41-NEXT: movd %xmm1, %eax
>>> ; SSE41-NEXT: # kill: def $ax killed $ax killed $eax
>>> @@ -1656,11 +1658,12 @@ define i8 @test_v2i8(<2 x i8> %a0) {
>>> ; SSE41-NEXT: pxor %xmm0, %xmm2
>>> ; SSE41-NEXT: pxor %xmm1, %xmm0
>>> ; SSE41-NEXT: movdqa %xmm2, %xmm4
>>> -; SSE41-NEXT: pcmpeqd %xmm0, %xmm4
>>> -; SSE41-NEXT: pcmpgtd %xmm0, %xmm2
>>> -; SSE41-NEXT: pshufd {{.*#+}} xmm0 = xmm2[0,0,2,2]
>>> -; SSE41-NEXT: pand %xmm4, %xmm0
>>> -; SSE41-NEXT: por %xmm2, %xmm0
>>> +; SSE41-NEXT: pcmpgtd %xmm0, %xmm4
>>> +; SSE41-NEXT: pshufd {{.*#+}} xmm5 = xmm4[0,0,2,2]
>>> +; SSE41-NEXT: pcmpeqd %xmm2, %xmm0
>>> +; SSE41-NEXT: pshufd {{.*#+}} xmm0 = xmm0[1,1,3,3]
>>> +; SSE41-NEXT: pand %xmm5, %xmm0
>>> +; SSE41-NEXT: por %xmm4, %xmm0
>>> ; SSE41-NEXT: blendvpd %xmm0, %xmm3, %xmm1
>>> ; SSE41-NEXT: pextrb $0, %xmm1, %eax
>>> ; SSE41-NEXT: # kill: def $al killed $al killed $eax
>>>
>>> Modified: llvm/trunk/test/CodeGen/X86/vector-reduce-smin.ll
>>> URL:
>>> http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/vector-reduce-smin.ll?rev=353610&r1=353609&r2=353610&view=diff
>>>
>>> ==============================================================================
>>> --- llvm/trunk/test/CodeGen/X86/vector-reduce-smin.ll (original)
>>> +++ llvm/trunk/test/CodeGen/X86/vector-reduce-smin.ll Sat Feb 9
>>> 05:13:59 2019
>>> @@ -708,16 +708,17 @@ define i32 @test_v2i32(<2 x i32> %a0) {
>>> ; SSE41-NEXT: pshufd {{.*#+}} xmm3 = xmm0[0,2,2,3]
>>> ; SSE41-NEXT: psrad $31, %xmm3
>>> ; SSE41-NEXT: pblendw {{.*#+}} xmm3 =
>>> xmm2[0,1],xmm3[2,3],xmm2[4,5],xmm3[6,7]
>>> -; SSE41-NEXT: movdqa {{.*#+}} xmm2 = [2147483648,2147483648]
>>> -; SSE41-NEXT: movdqa %xmm1, %xmm0
>>> -; SSE41-NEXT: pxor %xmm2, %xmm0
>>> -; SSE41-NEXT: pxor %xmm3, %xmm2
>>> -; SSE41-NEXT: movdqa %xmm2, %xmm4
>>> -; SSE41-NEXT: pcmpeqd %xmm0, %xmm4
>>> -; SSE41-NEXT: pcmpgtd %xmm0, %xmm2
>>> -; SSE41-NEXT: pshufd {{.*#+}} xmm0 = xmm2[0,0,2,2]
>>> -; SSE41-NEXT: pand %xmm4, %xmm0
>>> -; SSE41-NEXT: por %xmm2, %xmm0
>>> +; SSE41-NEXT: movdqa {{.*#+}} xmm0 = [2147483648,2147483648]
>>> +; SSE41-NEXT: movdqa %xmm1, %xmm2
>>> +; SSE41-NEXT: pxor %xmm0, %xmm2
>>> +; SSE41-NEXT: pxor %xmm3, %xmm0
>>> +; SSE41-NEXT: movdqa %xmm0, %xmm4
>>> +; SSE41-NEXT: pcmpgtd %xmm2, %xmm4
>>> +; SSE41-NEXT: pshufd {{.*#+}} xmm5 = xmm4[0,0,2,2]
>>> +; SSE41-NEXT: pcmpeqd %xmm2, %xmm0
>>> +; SSE41-NEXT: pshufd {{.*#+}} xmm0 = xmm0[1,1,3,3]
>>> +; SSE41-NEXT: pand %xmm5, %xmm0
>>> +; SSE41-NEXT: por %xmm4, %xmm0
>>> ; SSE41-NEXT: blendvpd %xmm0, %xmm1, %xmm3
>>> ; SSE41-NEXT: movd %xmm3, %eax
>>> ; SSE41-NEXT: retq
>>> @@ -1164,16 +1165,17 @@ define i16 @test_v2i16(<2 x i16> %a0) {
>>> ; SSE41-NEXT: psrad $16, %xmm1
>>> ; SSE41-NEXT: pshufd {{.*#+}} xmm1 = xmm1[1,1,3,3]
>>> ; SSE41-NEXT: pblendw {{.*#+}} xmm1 =
>>> xmm1[0,1],xmm0[2,3],xmm1[4,5],xmm0[6,7]
>>> -; SSE41-NEXT: movdqa {{.*#+}} xmm2 = [2147483648,2147483648]
>>> -; SSE41-NEXT: movdqa %xmm3, %xmm0
>>> -; SSE41-NEXT: pxor %xmm2, %xmm0
>>> -; SSE41-NEXT: pxor %xmm1, %xmm2
>>> -; SSE41-NEXT: movdqa %xmm2, %xmm4
>>> -; SSE41-NEXT: pcmpeqd %xmm0, %xmm4
>>> -; SSE41-NEXT: pcmpgtd %xmm0, %xmm2
>>> -; SSE41-NEXT: pshufd {{.*#+}} xmm0 = xmm2[0,0,2,2]
>>> -; SSE41-NEXT: pand %xmm4, %xmm0
>>> -; SSE41-NEXT: por %xmm2, %xmm0
>>> +; SSE41-NEXT: movdqa {{.*#+}} xmm0 = [2147483648,2147483648]
>>> +; SSE41-NEXT: movdqa %xmm3, %xmm2
>>> +; SSE41-NEXT: pxor %xmm0, %xmm2
>>> +; SSE41-NEXT: pxor %xmm1, %xmm0
>>> +; SSE41-NEXT: movdqa %xmm0, %xmm4
>>> +; SSE41-NEXT: pcmpgtd %xmm2, %xmm4
>>> +; SSE41-NEXT: pshufd {{.*#+}} xmm5 = xmm4[0,0,2,2]
>>> +; SSE41-NEXT: pcmpeqd %xmm2, %xmm0
>>> +; SSE41-NEXT: pshufd {{.*#+}} xmm0 = xmm0[1,1,3,3]
>>> +; SSE41-NEXT: pand %xmm5, %xmm0
>>> +; SSE41-NEXT: por %xmm4, %xmm0
>>> ; SSE41-NEXT: blendvpd %xmm0, %xmm3, %xmm1
>>> ; SSE41-NEXT: movd %xmm1, %eax
>>> ; SSE41-NEXT: # kill: def $ax killed $ax killed $eax
>>> @@ -1650,16 +1652,17 @@ define i8 @test_v2i8(<2 x i8> %a0) {
>>> ; SSE41-NEXT: psrad $24, %xmm1
>>> ; SSE41-NEXT: pshufd {{.*#+}} xmm1 = xmm1[1,1,3,3]
>>> ; SSE41-NEXT: pblendw {{.*#+}} xmm1 =
>>> xmm1[0,1],xmm0[2,3],xmm1[4,5],xmm0[6,7]
>>> -; SSE41-NEXT: movdqa {{.*#+}} xmm2 = [2147483648,2147483648]
>>> -; SSE41-NEXT: movdqa %xmm3, %xmm0
>>> -; SSE41-NEXT: pxor %xmm2, %xmm0
>>> -; SSE41-NEXT: pxor %xmm1, %xmm2
>>> -; SSE41-NEXT: movdqa %xmm2, %xmm4
>>> -; SSE41-NEXT: pcmpeqd %xmm0, %xmm4
>>> -; SSE41-NEXT: pcmpgtd %xmm0, %xmm2
>>> -; SSE41-NEXT: pshufd {{.*#+}} xmm0 = xmm2[0,0,2,2]
>>> -; SSE41-NEXT: pand %xmm4, %xmm0
>>> -; SSE41-NEXT: por %xmm2, %xmm0
>>> +; SSE41-NEXT: movdqa {{.*#+}} xmm0 = [2147483648,2147483648]
>>> +; SSE41-NEXT: movdqa %xmm3, %xmm2
>>> +; SSE41-NEXT: pxor %xmm0, %xmm2
>>> +; SSE41-NEXT: pxor %xmm1, %xmm0
>>> +; SSE41-NEXT: movdqa %xmm0, %xmm4
>>> +; SSE41-NEXT: pcmpgtd %xmm2, %xmm4
>>> +; SSE41-NEXT: pshufd {{.*#+}} xmm5 = xmm4[0,0,2,2]
>>> +; SSE41-NEXT: pcmpeqd %xmm2, %xmm0
>>> +; SSE41-NEXT: pshufd {{.*#+}} xmm0 = xmm0[1,1,3,3]
>>> +; SSE41-NEXT: pand %xmm5, %xmm0
>>> +; SSE41-NEXT: por %xmm4, %xmm0
>>> ; SSE41-NEXT: blendvpd %xmm0, %xmm3, %xmm1
>>> ; SSE41-NEXT: pextrb $0, %xmm1, %eax
>>> ; SSE41-NEXT: # kill: def $al killed $al killed $eax
>>>
>>> Modified: llvm/trunk/test/CodeGen/X86/vector-shift-ashr-256.ll
>>> URL:
>>> http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/vector-shift-ashr-256.ll?rev=353610&r1=353609&r2=353610&view=diff
>>>
>>> ==============================================================================
>>> --- llvm/trunk/test/CodeGen/X86/vector-shift-ashr-256.ll (original)
>>> +++ llvm/trunk/test/CodeGen/X86/vector-shift-ashr-256.ll Sat Feb 9
>>> 05:13:59 2019
>>> @@ -1070,13 +1070,13 @@ define <4 x i64> @constant_shift_v4i64(<
>>> ; X32-AVX1-NEXT: vpsrlq $62, %xmm1, %xmm2
>>> ; X32-AVX1-NEXT: vpsrlq $31, %xmm1, %xmm1
>>> ; X32-AVX1-NEXT: vpblendw {{.*#+}} xmm1 = xmm1[0,1,2,3],xmm2[4,5,6,7]
>>> -; X32-AVX1-NEXT: vmovdqa {{.*#+}} xmm2 = [0,0,1,0,2,0,0,0]
>>> +; X32-AVX1-NEXT: vmovdqa {{.*#+}} xmm2 = [0,1,2,0]
>>> ; X32-AVX1-NEXT: vpxor %xmm2, %xmm1, %xmm1
>>> ; X32-AVX1-NEXT: vpsubq %xmm2, %xmm1, %xmm1
>>> ; X32-AVX1-NEXT: vpsrlq $7, %xmm0, %xmm2
>>> ; X32-AVX1-NEXT: vpsrlq $1, %xmm0, %xmm0
>>> ; X32-AVX1-NEXT: vpblendw {{.*#+}} xmm0 = xmm0[0,1,2,3],xmm2[4,5,6,7]
>>> -; X32-AVX1-NEXT: vmovdqa {{.*#+}} xmm2 = [0,0,0,16384,0,0,0,256]
>>> +; X32-AVX1-NEXT: vmovdqa {{.*#+}} xmm2 = [0,1073741824,0,16777216]
>>> ; X32-AVX1-NEXT: vpxor %xmm2, %xmm0, %xmm0
>>> ; X32-AVX1-NEXT: vpsubq %xmm2, %xmm0, %xmm0
>>> ; X32-AVX1-NEXT: vinsertf128 $1, %xmm1, %ymm0, %ymm0
>>> @@ -1184,7 +1184,6 @@ define <16 x i16> @constant_shift_v16i16
>>> ; AVX2: # %bb.0:
>>> ; AVX2-NEXT: vpmulhw {{.*}}(%rip), %ymm0, %ymm1
>>> ; AVX2-NEXT: vpblendw {{.*#+}} ymm2 =
>>> ymm0[0],ymm1[1,2,3,4,5,6,7],ymm0[8],ymm1[9,10,11,12,13,14,15]
>>> -; AVX2-NEXT: vpblendd {{.*#+}} ymm2 = ymm2[0,1,2,3],ymm1[4,5,6,7]
>>> ; AVX2-NEXT: vpsraw $1, %ymm0, %ymm0
>>> ; AVX2-NEXT: vpblendw {{.*#+}} ymm0 =
>>> ymm2[0],ymm0[1],ymm2[2,3,4,5,6,7,8],ymm0[9],ymm2[10,11,12,13,14,15]
>>> ; AVX2-NEXT: vpblendd {{.*#+}} ymm0 = ymm0[0,1,2,3],ymm1[4,5,6,7]
>>> @@ -1248,7 +1247,6 @@ define <16 x i16> @constant_shift_v16i16
>>> ; X32-AVX2: # %bb.0:
>>> ; X32-AVX2-NEXT: vpmulhw {{\.LCPI.*}}, %ymm0, %ymm1
>>> ; X32-AVX2-NEXT: vpblendw {{.*#+}} ymm2 =
>>> ymm0[0],ymm1[1,2,3,4,5,6,7],ymm0[8],ymm1[9,10,11,12,13,14,15]
>>> -; X32-AVX2-NEXT: vpblendd {{.*#+}} ymm2 = ymm2[0,1,2,3],ymm1[4,5,6,7]
>>> ; X32-AVX2-NEXT: vpsraw $1, %ymm0, %ymm0
>>> ; X32-AVX2-NEXT: vpblendw {{.*#+}} ymm0 =
>>> ymm2[0],ymm0[1],ymm2[2,3,4,5,6,7,8],ymm0[9],ymm2[10,11,12,13,14,15]
>>> ; X32-AVX2-NEXT: vpblendd {{.*#+}} ymm0 = ymm0[0,1,2,3],ymm1[4,5,6,7]
>>>
>>> Modified: llvm/trunk/test/CodeGen/X86/vector-shuffle-128-v8.ll
>>> URL:
>>> http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/vector-shuffle-128-v8.ll?rev=353610&r1=353609&r2=353610&view=diff
>>>
>>> ==============================================================================
>>> --- llvm/trunk/test/CodeGen/X86/vector-shuffle-128-v8.ll (original)
>>> +++ llvm/trunk/test/CodeGen/X86/vector-shuffle-128-v8.ll Sat Feb 9
>>> 05:13:59 2019
>>> @@ -1167,10 +1167,9 @@ define <8 x i16> @shuffle_v8i16_0213cedf
>>> ;
>>> ; AVX512VL-SLOW-LABEL: shuffle_v8i16_0213cedf:
>>> ; AVX512VL-SLOW: # %bb.0:
>>> -; AVX512VL-SLOW-NEXT: vpshuflw {{.*#+}} xmm0 = xmm0[0,2,1,3,4,5,6,7]
>>> ; AVX512VL-SLOW-NEXT: vpshufhw {{.*#+}} xmm1 = xmm1[0,1,2,3,4,6,5,7]
>>> -; AVX512VL-SLOW-NEXT: vpshufd {{.*#+}} xmm1 = xmm1[2,3,2,3]
>>> -; AVX512VL-SLOW-NEXT: vpunpcklqdq {{.*#+}} xmm0 = xmm0[0],xmm1[0]
>>> +; AVX512VL-SLOW-NEXT: vpshuflw {{.*#+}} xmm0 = xmm0[0,2,1,3,4,5,6,7]
>>> +; AVX512VL-SLOW-NEXT: vpblendd {{.*#+}} xmm0 = xmm0[0,1],xmm1[2,3]
>>> ; AVX512VL-SLOW-NEXT: retq
>>> ;
>>> ; AVX512VL-FAST-LABEL: shuffle_v8i16_0213cedf:
>>> @@ -1557,14 +1556,14 @@ define <8 x i16> @shuffle_v8i16_XX4X8acX
>>> ;
>>> ; SSE41-LABEL: shuffle_v8i16_XX4X8acX:
>>> ; SSE41: # %bb.0:
>>> -; SSE41-NEXT: pshufb {{.*#+}} xmm1 =
>>> xmm1[u,u,u,u,u,u,u,u,0,1,4,5,8,9,4,5]
>>> +; SSE41-NEXT: pshufb {{.*#+}} xmm1 =
>>> xmm1[0,1,4,5,4,5,6,7,0,1,4,5,8,9,4,5]
>>> ; SSE41-NEXT: pshufd {{.*#+}} xmm0 = xmm0[2,2,3,3]
>>> ; SSE41-NEXT: pblendw {{.*#+}} xmm0 = xmm0[0,1,2,3],xmm1[4,5,6,7]
>>> ; SSE41-NEXT: retq
>>> ;
>>> ; AVX1-LABEL: shuffle_v8i16_XX4X8acX:
>>> ; AVX1: # %bb.0:
>>> -; AVX1-NEXT: vpshufb {{.*#+}} xmm1 =
>>> xmm1[u,u,u,u,u,u,u,u,0,1,4,5,8,9,4,5]
>>> +; AVX1-NEXT: vpshufb {{.*#+}} xmm1 =
>>> xmm1[0,1,4,5,4,5,6,7,0,1,4,5,8,9,4,5]
>>> ; AVX1-NEXT: vpshufd {{.*#+}} xmm0 = xmm0[2,2,3,3]
>>> ; AVX1-NEXT: vpblendw {{.*#+}} xmm0 = xmm0[0,1,2,3],xmm1[4,5,6,7]
>>> ; AVX1-NEXT: retq
>>>
>>> Modified: llvm/trunk/test/CodeGen/X86/vector-shuffle-256-v16.ll
>>> URL:
>>> http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/vector-shuffle-256-v16.ll?rev=353610&r1=353609&r2=353610&view=diff
>>>
>>> ==============================================================================
>>> --- llvm/trunk/test/CodeGen/X86/vector-shuffle-256-v16.ll (original)
>>> +++ llvm/trunk/test/CodeGen/X86/vector-shuffle-256-v16.ll Sat Feb 9
>>> 05:13:59 2019
>>> @@ -313,24 +313,13 @@ define <16 x i16> @shuffle_v16i16_00_00_
>>> ; AVX1-NEXT: vinsertf128 $1, %xmm0, %ymm1, %ymm0
>>> ; AVX1-NEXT: retq
>>> ;
>>> -; AVX2-SLOW-LABEL:
>>> shuffle_v16i16_00_00_00_00_00_00_00_08_00_00_00_00_00_00_00_00:
>>> -; AVX2-SLOW: # %bb.0:
>>> -; AVX2-SLOW-NEXT: vpermq {{.*#+}} ymm1 = ymm0[2,3,0,1]
>>> -; AVX2-SLOW-NEXT: vpblendd {{.*#+}} ymm0 =
>>> ymm0[0,1,2,3],ymm1[4,5,6,7]
>>> -; AVX2-SLOW-NEXT: vpshuflw {{.*#+}} ymm0 =
>>> ymm0[0,0,2,3,4,5,6,7,8,8,10,11,12,13,14,15]
>>> -; AVX2-SLOW-NEXT: vpshufd {{.*#+}} ymm0 = ymm0[0,0,0,0,4,4,4,4]
>>> -; AVX2-SLOW-NEXT: vpslldq {{.*#+}} ymm1 =
>>> zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,ymm1[0,1],zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,ymm1[16,17]
>>> -; AVX2-SLOW-NEXT: vpblendw {{.*#+}} ymm0 =
>>> ymm0[0,1,2,3,4,5,6],ymm1[7],ymm0[8,9,10,11,12,13,14],ymm1[15]
>>> -; AVX2-SLOW-NEXT: retq
>>> -;
>>> -; AVX2-FAST-LABEL:
>>> shuffle_v16i16_00_00_00_00_00_00_00_08_00_00_00_00_00_00_00_00:
>>> -; AVX2-FAST: # %bb.0:
>>> -; AVX2-FAST-NEXT: vpermq {{.*#+}} ymm1 = ymm0[2,3,0,1]
>>> -; AVX2-FAST-NEXT: vpblendd {{.*#+}} ymm0 =
>>> ymm0[0,1,2,3],ymm1[4,5,6,7]
>>> -; AVX2-FAST-NEXT: vpshufb {{.*#+}} ymm0 =
>>> ymm0[0,1,0,1,0,1,0,1,0,1,0,1,0,1,0,1,16,17,16,17,16,17,16,17,16,17,16,17,16,17,16,17]
>>> -; AVX2-FAST-NEXT: vpslldq {{.*#+}} ymm1 =
>>> zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,ymm1[0,1],zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,ymm1[16,17]
>>> -; AVX2-FAST-NEXT: vpblendw {{.*#+}} ymm0 =
>>> ymm0[0,1,2,3,4,5,6],ymm1[7],ymm0[8,9,10,11,12,13,14],ymm1[15]
>>> -; AVX2-FAST-NEXT: retq
>>> +; AVX2-LABEL:
>>> shuffle_v16i16_00_00_00_00_00_00_00_08_00_00_00_00_00_00_00_00:
>>> +; AVX2: # %bb.0:
>>> +; AVX2-NEXT: vpermq {{.*#+}} ymm1 = ymm0[2,3,0,1]
>>> +; AVX2-NEXT: vpslldq {{.*#+}} ymm1 =
>>> zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,ymm1[0,1],zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,ymm1[16,17]
>>> +; AVX2-NEXT: vpbroadcastw %xmm0, %ymm0
>>> +; AVX2-NEXT: vpblendw {{.*#+}} ymm0 =
>>> ymm0[0,1,2,3,4,5,6],ymm1[7],ymm0[8,9,10,11,12,13,14],ymm1[15]
>>> +; AVX2-NEXT: retq
>>> ;
>>> ; AVX512VL-LABEL:
>>> shuffle_v16i16_00_00_00_00_00_00_00_08_00_00_00_00_00_00_00_00:
>>> ; AVX512VL: # %bb.0:
>>> @@ -3908,7 +3897,7 @@ define <16 x i16> @shuffle_v16i16_uu_uu_
>>> ; AVX1-LABEL:
>>> shuffle_v16i16_uu_uu_04_uu_16_18_20_uu_uu_uu_12_uu_24_26_28_uu:
>>> ; AVX1: # %bb.0:
>>> ; AVX1-NEXT: vextractf128 $1, %ymm1, %xmm2
>>> -; AVX1-NEXT: vmovdqa {{.*#+}} xmm3 =
>>> <u,u,u,u,u,u,u,u,0,1,4,5,8,9,4,5>
>>> +; AVX1-NEXT: vmovdqa {{.*#+}} xmm3 =
>>> [0,1,4,5,4,5,6,7,0,1,4,5,8,9,4,5]
>>> ; AVX1-NEXT: vpshufb %xmm3, %xmm2, %xmm2
>>> ; AVX1-NEXT: vextractf128 $1, %ymm0, %xmm4
>>> ; AVX1-NEXT: vpshufd {{.*#+}} xmm4 = xmm4[2,2,3,3]
>>>
>>> Modified: llvm/trunk/test/CodeGen/X86/vector-shuffle-256-v32.ll
>>> URL:
>>> http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/vector-shuffle-256-v32.ll?rev=353610&r1=353609&r2=353610&view=diff
>>>
>>> ==============================================================================
>>> --- llvm/trunk/test/CodeGen/X86/vector-shuffle-256-v32.ll (original)
>>> +++ llvm/trunk/test/CodeGen/X86/vector-shuffle-256-v32.ll Sat Feb 9
>>> 05:13:59 2019
>>> @@ -578,22 +578,18 @@ define <32 x i8> @shuffle_v32i8_00_00_00
>>> ;
>>> ; AVX2-LABEL:
>>> shuffle_v32i8_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_16_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00:
>>> ; AVX2: # %bb.0:
>>> -; AVX2-NEXT: vpermq {{.*#+}} ymm1 = ymm0[2,3,0,1]
>>> -; AVX2-NEXT: vpblendd {{.*#+}} ymm0 = ymm0[0,1,2,3],ymm1[4,5,6,7]
>>> -; AVX2-NEXT: vpxor %xmm2, %xmm2, %xmm2
>>> -; AVX2-NEXT: vpshufb %ymm2, %ymm0, %ymm0
>>> -; AVX2-NEXT: vpslldq {{.*#+}} ymm1 =
>>> zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,ymm1[0],zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,ymm1[16]
>>> +; AVX2-NEXT: vpbroadcastb %xmm0, %ymm1
>>> +; AVX2-NEXT: vpermq {{.*#+}} ymm0 = ymm0[2,3,0,1]
>>> +; AVX2-NEXT: vpslldq {{.*#+}} ymm0 =
>>> zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,ymm0[0],zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,ymm0[16]
>>> ; AVX2-NEXT: vmovdqa {{.*#+}} ymm2 =
>>> [255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,0,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,0]
>>> -; AVX2-NEXT: vpblendvb %ymm2, %ymm0, %ymm1, %ymm0
>>> +; AVX2-NEXT: vpblendvb %ymm2, %ymm1, %ymm0, %ymm0
>>> ; AVX2-NEXT: retq
>>> ;
>>> ; AVX512VLBW-LABEL:
>>> shuffle_v32i8_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_16_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00:
>>> ; AVX512VLBW: # %bb.0:
>>> -; AVX512VLBW-NEXT: vpermq {{.*#+}} ymm1 = ymm0[2,3,0,1]
>>> -; AVX512VLBW-NEXT: vpblendd {{.*#+}} ymm0 =
>>> ymm0[0,1,2,3],ymm1[4,5,6,7]
>>> -; AVX512VLBW-NEXT: vpxor %xmm2, %xmm2, %xmm2
>>> -; AVX512VLBW-NEXT: vpshufb %ymm2, %ymm0, %ymm0
>>> +; AVX512VLBW-NEXT: vpermpd {{.*#+}} ymm1 = ymm0[2,3,0,1]
>>> ; AVX512VLBW-NEXT: vpslldq {{.*#+}} ymm1 =
>>> zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,ymm1[0],zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,zero,ymm1[16]
>>> +; AVX512VLBW-NEXT: vpbroadcastb %xmm0, %ymm0
>>> ; AVX512VLBW-NEXT: movl $-2147450880, %eax # imm = 0x80008000
>>> ; AVX512VLBW-NEXT: kmovd %eax, %k1
>>> ; AVX512VLBW-NEXT: vmovdqu8 %ymm1, %ymm0 {%k1}
>>> @@ -924,18 +920,11 @@ define <32 x i8> @shuffle_v32i8_00_00_00
>>> ; AVX2-NEXT: vpshufb {{.*#+}} ymm0 =
>>> ymm0[0,0,0,0,0,0,0,8,0,0,0,0,0,0,0,0,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16]
>>> ; AVX2-NEXT: retq
>>> ;
>>> -; AVX512VLBW-SLOW-LABEL:
>>> shuffle_v32i8_00_00_00_00_00_00_00_24_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00:
>>> -; AVX512VLBW-SLOW: # %bb.0:
>>> -; AVX512VLBW-SLOW-NEXT: vpermq {{.*#+}} ymm1 = ymm0[2,3,0,1]
>>> -; AVX512VLBW-SLOW-NEXT: vpblendd {{.*#+}} ymm0 =
>>> ymm0[0,1],ymm1[2,3,4,5,6,7]
>>> -; AVX512VLBW-SLOW-NEXT: vpshufb {{.*#+}} ymm0 =
>>> ymm0[0,0,0,0,0,0,0,8,0,0,0,0,0,0,0,0,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16]
>>> -; AVX512VLBW-SLOW-NEXT: retq
>>> -;
>>> -; AVX512VLBW-FAST-LABEL:
>>> shuffle_v32i8_00_00_00_00_00_00_00_24_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00:
>>> -; AVX512VLBW-FAST: # %bb.0:
>>> -; AVX512VLBW-FAST-NEXT: vpermq {{.*#+}} ymm0 = ymm0[0,3,0,1]
>>> -; AVX512VLBW-FAST-NEXT: vpshufb {{.*#+}} ymm0 =
>>> ymm0[0,0,0,0,0,0,0,8,0,0,0,0,0,0,0,0,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16]
>>> -; AVX512VLBW-FAST-NEXT: retq
>>> +; AVX512VLBW-LABEL:
>>> shuffle_v32i8_00_00_00_00_00_00_00_24_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00:
>>> +; AVX512VLBW: # %bb.0:
>>> +; AVX512VLBW-NEXT: vpermq {{.*#+}} ymm0 = ymm0[0,3,0,1]
>>> +; AVX512VLBW-NEXT: vpshufb {{.*#+}} ymm0 =
>>> ymm0[0,0,0,0,0,0,0,8,0,0,0,0,0,0,0,0,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16]
>>> +; AVX512VLBW-NEXT: retq
>>> ;
>>> ; AVX512VLVBMI-LABEL:
>>> shuffle_v32i8_00_00_00_00_00_00_00_24_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00:
>>> ; AVX512VLVBMI: # %bb.0:
>>> @@ -963,18 +952,11 @@ define <32 x i8> @shuffle_v32i8_00_00_00
>>> ; AVX2-NEXT: vpshufb {{.*#+}} ymm0 =
>>> ymm0[0,0,0,0,0,0,9,0,0,0,0,0,0,0,0,0,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16]
>>> ; AVX2-NEXT: retq
>>> ;
>>> -; AVX512VLBW-SLOW-LABEL:
>>> shuffle_v32i8_00_00_00_00_00_00_25_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00:
>>> -; AVX512VLBW-SLOW: # %bb.0:
>>> -; AVX512VLBW-SLOW-NEXT: vpermq {{.*#+}} ymm1 = ymm0[2,3,0,1]
>>> -; AVX512VLBW-SLOW-NEXT: vpblendd {{.*#+}} ymm0 =
>>> ymm0[0,1],ymm1[2,3,4,5,6,7]
>>> -; AVX512VLBW-SLOW-NEXT: vpshufb {{.*#+}} ymm0 =
>>> ymm0[0,0,0,0,0,0,9,0,0,0,0,0,0,0,0,0,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16]
>>> -; AVX512VLBW-SLOW-NEXT: retq
>>> -;
>>> -; AVX512VLBW-FAST-LABEL:
>>> shuffle_v32i8_00_00_00_00_00_00_25_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00:
>>> -; AVX512VLBW-FAST: # %bb.0:
>>> -; AVX512VLBW-FAST-NEXT: vpermq {{.*#+}} ymm0 = ymm0[0,3,0,1]
>>> -; AVX512VLBW-FAST-NEXT: vpshufb {{.*#+}} ymm0 =
>>> ymm0[0,0,0,0,0,0,9,0,0,0,0,0,0,0,0,0,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16]
>>> -; AVX512VLBW-FAST-NEXT: retq
>>> +; AVX512VLBW-LABEL:
>>> shuffle_v32i8_00_00_00_00_00_00_25_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00:
>>> +; AVX512VLBW: # %bb.0:
>>> +; AVX512VLBW-NEXT: vpermq {{.*#+}} ymm0 = ymm0[0,3,0,1]
>>> +; AVX512VLBW-NEXT: vpshufb {{.*#+}} ymm0 =
>>> ymm0[0,0,0,0,0,0,9,0,0,0,0,0,0,0,0,0,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16]
>>> +; AVX512VLBW-NEXT: retq
>>> ;
>>> ; AVX512VLVBMI-LABEL:
>>> shuffle_v32i8_00_00_00_00_00_00_25_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00:
>>> ; AVX512VLVBMI: # %bb.0:
>>> @@ -1002,18 +984,11 @@ define <32 x i8> @shuffle_v32i8_00_00_00
>>> ; AVX2-NEXT: vpshufb {{.*#+}} ymm0 =
>>> ymm0[0,0,0,0,0,10,0,0,0,0,0,0,0,0,0,0,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16]
>>> ; AVX2-NEXT: retq
>>> ;
>>> -; AVX512VLBW-SLOW-LABEL:
>>> shuffle_v32i8_00_00_00_00_00_26_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00:
>>> -; AVX512VLBW-SLOW: # %bb.0:
>>> -; AVX512VLBW-SLOW-NEXT: vpermq {{.*#+}} ymm1 = ymm0[2,3,0,1]
>>> -; AVX512VLBW-SLOW-NEXT: vpblendd {{.*#+}} ymm0 =
>>> ymm0[0,1],ymm1[2,3,4,5,6,7]
>>> -; AVX512VLBW-SLOW-NEXT: vpshufb {{.*#+}} ymm0 =
>>> ymm0[0,0,0,0,0,10,0,0,0,0,0,0,0,0,0,0,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16]
>>> -; AVX512VLBW-SLOW-NEXT: retq
>>> -;
>>> -; AVX512VLBW-FAST-LABEL:
>>> shuffle_v32i8_00_00_00_00_00_26_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00:
>>> -; AVX512VLBW-FAST: # %bb.0:
>>> -; AVX512VLBW-FAST-NEXT: vpermq {{.*#+}} ymm0 = ymm0[0,3,0,1]
>>> -; AVX512VLBW-FAST-NEXT: vpshufb {{.*#+}} ymm0 =
>>> ymm0[0,0,0,0,0,10,0,0,0,0,0,0,0,0,0,0,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16]
>>> -; AVX512VLBW-FAST-NEXT: retq
>>> +; AVX512VLBW-LABEL:
>>> shuffle_v32i8_00_00_00_00_00_26_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00:
>>> +; AVX512VLBW: # %bb.0:
>>> +; AVX512VLBW-NEXT: vpermq {{.*#+}} ymm0 = ymm0[0,3,0,1]
>>> +; AVX512VLBW-NEXT: vpshufb {{.*#+}} ymm0 =
>>> ymm0[0,0,0,0,0,10,0,0,0,0,0,0,0,0,0,0,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16]
>>> +; AVX512VLBW-NEXT: retq
>>> ;
>>> ; AVX512VLVBMI-LABEL:
>>> shuffle_v32i8_00_00_00_00_00_26_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00:
>>> ; AVX512VLVBMI: # %bb.0:
>>> @@ -1041,18 +1016,11 @@ define <32 x i8> @shuffle_v32i8_00_00_00
>>> ; AVX2-NEXT: vpshufb {{.*#+}} ymm0 =
>>> ymm0[0,0,0,0,11,0,0,0,0,0,0,0,0,0,0,0,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16]
>>> ; AVX2-NEXT: retq
>>> ;
>>> -; AVX512VLBW-SLOW-LABEL:
>>> shuffle_v32i8_00_00_00_00_27_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00:
>>> -; AVX512VLBW-SLOW: # %bb.0:
>>> -; AVX512VLBW-SLOW-NEXT: vpermq {{.*#+}} ymm1 = ymm0[2,3,0,1]
>>> -; AVX512VLBW-SLOW-NEXT: vpblendd {{.*#+}} ymm0 =
>>> ymm0[0,1],ymm1[2,3,4,5,6,7]
>>> -; AVX512VLBW-SLOW-NEXT: vpshufb {{.*#+}} ymm0 =
>>> ymm0[0,0,0,0,11,0,0,0,0,0,0,0,0,0,0,0,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16]
>>> -; AVX512VLBW-SLOW-NEXT: retq
>>> -;
>>> -; AVX512VLBW-FAST-LABEL:
>>> shuffle_v32i8_00_00_00_00_27_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00:
>>> -; AVX512VLBW-FAST: # %bb.0:
>>> -; AVX512VLBW-FAST-NEXT: vpermq {{.*#+}} ymm0 = ymm0[0,3,0,1]
>>> -; AVX512VLBW-FAST-NEXT: vpshufb {{.*#+}} ymm0 =
>>> ymm0[0,0,0,0,11,0,0,0,0,0,0,0,0,0,0,0,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16]
>>> -; AVX512VLBW-FAST-NEXT: retq
>>> +; AVX512VLBW-LABEL:
>>> shuffle_v32i8_00_00_00_00_27_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00:
>>> +; AVX512VLBW: # %bb.0:
>>> +; AVX512VLBW-NEXT: vpermq {{.*#+}} ymm0 = ymm0[0,3,0,1]
>>> +; AVX512VLBW-NEXT: vpshufb {{.*#+}} ymm0 =
>>> ymm0[0,0,0,0,11,0,0,0,0,0,0,0,0,0,0,0,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16]
>>> +; AVX512VLBW-NEXT: retq
>>> ;
>>> ; AVX512VLVBMI-LABEL:
>>> shuffle_v32i8_00_00_00_00_27_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00:
>>> ; AVX512VLVBMI: # %bb.0:
>>> @@ -1080,18 +1048,11 @@ define <32 x i8> @shuffle_v32i8_00_00_00
>>> ; AVX2-NEXT: vpshufb {{.*#+}} ymm0 =
>>> ymm0[0,0,0,12,0,0,0,0,0,0,0,0,0,0,0,0,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16]
>>> ; AVX2-NEXT: retq
>>> ;
>>> -; AVX512VLBW-SLOW-LABEL:
>>> shuffle_v32i8_00_00_00_28_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00:
>>> -; AVX512VLBW-SLOW: # %bb.0:
>>> -; AVX512VLBW-SLOW-NEXT: vpermq {{.*#+}} ymm1 = ymm0[2,3,0,1]
>>> -; AVX512VLBW-SLOW-NEXT: vpblendd {{.*#+}} ymm0 =
>>> ymm0[0,1],ymm1[2,3,4,5,6,7]
>>> -; AVX512VLBW-SLOW-NEXT: vpshufb {{.*#+}} ymm0 =
>>> ymm0[0,0,0,12,0,0,0,0,0,0,0,0,0,0,0,0,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16]
>>> -; AVX512VLBW-SLOW-NEXT: retq
>>> -;
>>> -; AVX512VLBW-FAST-LABEL:
>>> shuffle_v32i8_00_00_00_28_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00:
>>> -; AVX512VLBW-FAST: # %bb.0:
>>> -; AVX512VLBW-FAST-NEXT: vpermq {{.*#+}} ymm0 = ymm0[0,3,0,1]
>>> -; AVX512VLBW-FAST-NEXT: vpshufb {{.*#+}} ymm0 =
>>> ymm0[0,0,0,12,0,0,0,0,0,0,0,0,0,0,0,0,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16]
>>> -; AVX512VLBW-FAST-NEXT: retq
>>> +; AVX512VLBW-LABEL:
>>> shuffle_v32i8_00_00_00_28_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00:
>>> +; AVX512VLBW: # %bb.0:
>>> +; AVX512VLBW-NEXT: vpermq {{.*#+}} ymm0 = ymm0[0,3,0,1]
>>> +; AVX512VLBW-NEXT: vpshufb {{.*#+}} ymm0 =
>>> ymm0[0,0,0,12,0,0,0,0,0,0,0,0,0,0,0,0,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16]
>>> +; AVX512VLBW-NEXT: retq
>>> ;
>>> ; AVX512VLVBMI-LABEL:
>>> shuffle_v32i8_00_00_00_28_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00:
>>> ; AVX512VLVBMI: # %bb.0:
>>> @@ -1119,18 +1080,11 @@ define <32 x i8> @shuffle_v32i8_00_00_29
>>> ; AVX2-NEXT: vpshufb {{.*#+}} ymm0 =
>>> ymm0[0,0,13,0,0,0,0,0,0,0,0,0,0,0,0,0,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16]
>>> ; AVX2-NEXT: retq
>>> ;
>>> -; AVX512VLBW-SLOW-LABEL:
>>> shuffle_v32i8_00_00_29_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00:
>>> -; AVX512VLBW-SLOW: # %bb.0:
>>> -; AVX512VLBW-SLOW-NEXT: vpermq {{.*#+}} ymm1 = ymm0[2,3,0,1]
>>> -; AVX512VLBW-SLOW-NEXT: vpblendd {{.*#+}} ymm0 =
>>> ymm0[0,1],ymm1[2,3,4,5,6,7]
>>> -; AVX512VLBW-SLOW-NEXT: vpshufb {{.*#+}} ymm0 =
>>> ymm0[0,0,13,0,0,0,0,0,0,0,0,0,0,0,0,0,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16]
>>> -; AVX512VLBW-SLOW-NEXT: retq
>>> -;
>>> -; AVX512VLBW-FAST-LABEL:
>>> shuffle_v32i8_00_00_29_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00:
>>> -; AVX512VLBW-FAST: # %bb.0:
>>> -; AVX512VLBW-FAST-NEXT: vpermq {{.*#+}} ymm0 = ymm0[0,3,0,1]
>>> -; AVX512VLBW-FAST-NEXT: vpshufb {{.*#+}} ymm0 =
>>> ymm0[0,0,13,0,0,0,0,0,0,0,0,0,0,0,0,0,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16]
>>> -; AVX512VLBW-FAST-NEXT: retq
>>> +; AVX512VLBW-LABEL:
>>> shuffle_v32i8_00_00_29_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00:
>>> +; AVX512VLBW: # %bb.0:
>>> +; AVX512VLBW-NEXT: vpermq {{.*#+}} ymm0 = ymm0[0,3,0,1]
>>> +; AVX512VLBW-NEXT: vpshufb {{.*#+}} ymm0 =
>>> ymm0[0,0,13,0,0,0,0,0,0,0,0,0,0,0,0,0,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16]
>>> +; AVX512VLBW-NEXT: retq
>>> ;
>>> ; AVX512VLVBMI-LABEL:
>>> shuffle_v32i8_00_00_29_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00:
>>> ; AVX512VLVBMI: # %bb.0:
>>> @@ -1158,18 +1112,11 @@ define <32 x i8> @shuffle_v32i8_00_30_00
>>> ; AVX2-NEXT: vpshufb {{.*#+}} ymm0 =
>>> ymm0[0,14,0,0,0,0,0,0,0,0,0,0,0,0,0,0,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16]
>>> ; AVX2-NEXT: retq
>>> ;
>>> -; AVX512VLBW-SLOW-LABEL:
>>> shuffle_v32i8_00_30_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00:
>>> -; AVX512VLBW-SLOW: # %bb.0:
>>> -; AVX512VLBW-SLOW-NEXT: vpermq {{.*#+}} ymm1 = ymm0[2,3,0,1]
>>> -; AVX512VLBW-SLOW-NEXT: vpblendd {{.*#+}} ymm0 =
>>> ymm0[0,1],ymm1[2,3,4,5,6,7]
>>> -; AVX512VLBW-SLOW-NEXT: vpshufb {{.*#+}} ymm0 =
>>> ymm0[0,14,0,0,0,0,0,0,0,0,0,0,0,0,0,0,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16]
>>> -; AVX512VLBW-SLOW-NEXT: retq
>>> -;
>>> -; AVX512VLBW-FAST-LABEL:
>>> shuffle_v32i8_00_30_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00:
>>> -; AVX512VLBW-FAST: # %bb.0:
>>> -; AVX512VLBW-FAST-NEXT: vpermq {{.*#+}} ymm0 = ymm0[0,3,0,1]
>>> -; AVX512VLBW-FAST-NEXT: vpshufb {{.*#+}} ymm0 =
>>> ymm0[0,14,0,0,0,0,0,0,0,0,0,0,0,0,0,0,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16]
>>> -; AVX512VLBW-FAST-NEXT: retq
>>> +; AVX512VLBW-LABEL:
>>> shuffle_v32i8_00_30_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00:
>>> +; AVX512VLBW: # %bb.0:
>>> +; AVX512VLBW-NEXT: vpermq {{.*#+}} ymm0 = ymm0[0,3,0,1]
>>> +; AVX512VLBW-NEXT: vpshufb {{.*#+}} ymm0 =
>>> ymm0[0,14,0,0,0,0,0,0,0,0,0,0,0,0,0,0,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16]
>>> +; AVX512VLBW-NEXT: retq
>>> ;
>>> ; AVX512VLVBMI-LABEL:
>>> shuffle_v32i8_00_30_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00:
>>> ; AVX512VLVBMI: # %bb.0:
>>> @@ -1199,22 +1146,13 @@ define <32 x i8> @shuffle_v32i8_31_00_00
>>> ; AVX2-NEXT: vpshufb %ymm1, %ymm0, %ymm0
>>> ; AVX2-NEXT: retq
>>> ;
>>> -; AVX512VLBW-SLOW-LABEL:
>>> shuffle_v32i8_31_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00:
>>> -; AVX512VLBW-SLOW: # %bb.0:
>>> -; AVX512VLBW-SLOW-NEXT: vpermq {{.*#+}} ymm1 = ymm0[2,3,0,1]
>>> -; AVX512VLBW-SLOW-NEXT: vpblendd {{.*#+}} ymm0 =
>>> ymm0[0,1],ymm1[2,3,4,5,6,7]
>>> -; AVX512VLBW-SLOW-NEXT: movl $15, %eax
>>> -; AVX512VLBW-SLOW-NEXT: vmovd %eax, %xmm1
>>> -; AVX512VLBW-SLOW-NEXT: vpshufb %ymm1, %ymm0, %ymm0
>>> -; AVX512VLBW-SLOW-NEXT: retq
>>> -;
>>> -; AVX512VLBW-FAST-LABEL:
>>> shuffle_v32i8_31_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00:
>>> -; AVX512VLBW-FAST: # %bb.0:
>>> -; AVX512VLBW-FAST-NEXT: vpermq {{.*#+}} ymm0 = ymm0[0,3,0,1]
>>> -; AVX512VLBW-FAST-NEXT: movl $15, %eax
>>> -; AVX512VLBW-FAST-NEXT: vmovd %eax, %xmm1
>>> -; AVX512VLBW-FAST-NEXT: vpshufb %ymm1, %ymm0, %ymm0
>>> -; AVX512VLBW-FAST-NEXT: retq
>>> +; AVX512VLBW-LABEL:
>>> shuffle_v32i8_31_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00:
>>> +; AVX512VLBW: # %bb.0:
>>> +; AVX512VLBW-NEXT: vpermq {{.*#+}} ymm0 = ymm0[0,3,0,1]
>>> +; AVX512VLBW-NEXT: movl $15, %eax
>>> +; AVX512VLBW-NEXT: vmovd %eax, %xmm1
>>> +; AVX512VLBW-NEXT: vpshufb %ymm1, %ymm0, %ymm0
>>> +; AVX512VLBW-NEXT: retq
>>> ;
>>> ; AVX512VLVBMI-LABEL:
>>> shuffle_v32i8_31_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00_00:
>>> ; AVX512VLVBMI: # %bb.0:
>>>
>>>