[llvm] r310822 - Revert "[DAGCombiner] Extending pattern detection for vector shuffle (REAPPLIED)"
Cohen, Elad2 via llvm-commits
llvm-commits at lists.llvm.org
Thu Aug 17 00:41:33 PDT 2017
Sure - will do.
About this specific one, commit r310782 caused PR34175 (https://bugs.llvm.org/show_bug.cgi?id=34175):
test-suite/.../cjpeg and consumer-jpeg fail to compile with "-O3 -march=broadwell", hitting the assertion:
Assertion failed: (Num < NumOperands && "Invalid child # of SDNode!"), function getOperand, file /Users/spatel/myllvm/llvm/include/llvm/CodeGen/SelectionDAGNodes.h, line 831.
More details including a reduced reproducer (by Sanjay Patel) can be found at https://bugs.llvm.org/show_bug.cgi?id=34175.
Patch review reopened at: https://reviews.llvm.org/D35788 (IIUC, the current patch already contains a fix by Jatin).
Thanks, Elad
From: Chandler Carruth [mailto:chandlerc at gmail.com]
Sent: Thursday, August 17, 2017 00:17
To: Cohen, Elad2 <elad2.cohen at intel.com>; llvm-commits at lists.llvm.org
Subject: Re: [llvm] r310822 - Revert "[DAGCombiner] Extending pattern detection for vector shuffle (REAPPLIED)"
On Mon, Aug 14, 2017 at 2:06 AM Elad Cohen via llvm-commits <llvm-commits at lists.llvm.org> wrote:
Author: eladcohen
Date: Mon Aug 14 02:06:00 2017
New Revision: 310822
URL: http://llvm.org/viewvc/llvm-project?rev=310822&view=rev
Log:
Revert "[DAGCombiner] Extending pattern detection for vector shuffle (REAPPLIED)"
This reverts commit r310782.
Please always explain *why* you are reverting something. And please provide a summary in addition to a link to a PR or other source of details so that folks skimming the list have that information immediately available.
Modified:
llvm/trunk/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
llvm/trunk/test/CodeGen/X86/shuffle-vs-trunc-512.ll
llvm/trunk/test/CodeGen/X86/vector-shuffle-512-v16.ll
llvm/trunk/test/CodeGen/X86/vector-shuffle-512-v8.ll
llvm/trunk/test/CodeGen/X86/x86-interleaved-access.ll
Modified: llvm/trunk/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/CodeGen/SelectionDAG/DAGCombiner.cpp?rev=310822&r1=310821&r2=310822&view=diff
==============================================================================
--- llvm/trunk/lib/CodeGen/SelectionDAG/DAGCombiner.cpp (original)
+++ llvm/trunk/lib/CodeGen/SelectionDAG/DAGCombiner.cpp Mon Aug 14 02:06:00 2017
@@ -14186,18 +14186,10 @@ SDValue DAGCombiner::createBuildVecShuff
EVT InVT1 = VecIn1.getValueType();
EVT InVT2 = VecIn2.getNode() ? VecIn2.getValueType() : InVT1;
- unsigned Vec2Offset = 0;
+ unsigned Vec2Offset = InVT1.getVectorNumElements();
unsigned NumElems = VT.getVectorNumElements();
unsigned ShuffleNumElems = NumElems;
- // In case both the input vectors are extracted from same base
- // vector we do not need extra addend (Vec2Offset) while
- // computing shuffle mask.
- if (!VecIn2 || !(VecIn1.getOpcode() == ISD::EXTRACT_SUBVECTOR) ||
- !(VecIn2.getOpcode() == ISD::EXTRACT_SUBVECTOR) ||
- !(VecIn1.getOperand(0) == VecIn2.getOperand(0)))
- Vec2Offset = InVT1.getVectorNumElements();
-
// We can't generate a shuffle node with mismatched input and output types.
// Try to make the types match the type of the output.
if (InVT1 != VT || InVT2 != VT) {
@@ -14344,6 +14336,7 @@ SDValue DAGCombiner::reduceBuildVecToShu
if (Op.getOpcode() != ISD::EXTRACT_VECTOR_ELT ||
!isa<ConstantSDNode>(Op.getOperand(1)))
return SDValue();
+
SDValue ExtractedFromVec = Op.getOperand(0);
// All inputs must have the same element type as the output.
@@ -14366,44 +14359,6 @@ SDValue DAGCombiner::reduceBuildVecToShu
if (VecIn.size() < 2)
return SDValue();
- // If all the Operands of BUILD_VECTOR extract from same
- // vector, then split the vector efficiently based on the maximum
- // vector access index and adjust the VectorMask and
- // VecIn accordingly.
- if (VecIn.size() == 2) {
- unsigned MaxIndex = 0;
- unsigned NearestPow2 = 0;
- SDValue Vec = VecIn.back();
- EVT InVT = Vec.getValueType();
- MVT IdxTy = TLI.getVectorIdxTy(DAG.getDataLayout());
- SmallVector<unsigned, 8> IndexVec(NumElems, 0);
-
- for (unsigned i = 0; i < NumElems; i++) {
- if (VectorMask[i] <= 0)
- continue;
- unsigned Index = N->getOperand(i).getConstantOperandVal(1);
- IndexVec[i] = Index;
- MaxIndex = std::max(MaxIndex, Index);
- }
-
- NearestPow2 = PowerOf2Ceil(MaxIndex);
- if (InVT.isSimple() && (NearestPow2 > 2) && ((NumElems * 2) < NearestPow2)) {
- unsigned SplitSize = NearestPow2 / 2;
- EVT SplitVT = EVT::getVectorVT(*DAG.getContext(),
- InVT.getVectorElementType(), SplitSize);
- SDValue VecIn2 = DAG.getNode(ISD::EXTRACT_SUBVECTOR, DL, SplitVT, Vec,
- DAG.getConstant(SplitSize, DL, IdxTy));
- SDValue VecIn1 = DAG.getNode(ISD::EXTRACT_SUBVECTOR, DL, SplitVT, Vec,
- DAG.getConstant(0, DL, IdxTy));
- VecIn.pop_back();
- VecIn.push_back(VecIn1);
- VecIn.push_back(VecIn2);
-
- for (unsigned i = 0; i < NumElems; i++)
- VectorMask[i] = (IndexVec[i] < SplitSize) ? 1 : 2;
- }
- }
-
// TODO: We want to sort the vectors by descending length, so that adjacent
// pairs have similar length, and the longer vector is always first in the
// pair.
Modified: llvm/trunk/test/CodeGen/X86/shuffle-vs-trunc-512.ll
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/shuffle-vs-trunc-512.ll?rev=310822&r1=310821&r2=310822&view=diff
==============================================================================
--- llvm/trunk/test/CodeGen/X86/shuffle-vs-trunc-512.ll (original)
+++ llvm/trunk/test/CodeGen/X86/shuffle-vs-trunc-512.ll Mon Aug 14 02:06:00 2017
@@ -311,33 +311,81 @@ define <16 x i8> @trunc_shuffle_v64i8_01
;
; AVX512BW-LABEL: trunc_shuffle_v64i8_01_05_09_13_17_21_25_29_33_37_41_45_49_53_57_62:
; AVX512BW: # BB#0:
-; AVX512BW-NEXT: vextracti128 $1, %ymm0, %xmm1
-; AVX512BW-NEXT: vmovdqa {{.*#+}} xmm2 = <1,5,9,13,u,u,u,u,u,u,u,u,u,u,u,u>
-; AVX512BW-NEXT: vpshufb %xmm2, %xmm1, %xmm1
-; AVX512BW-NEXT: vpshufb %xmm2, %xmm0, %xmm2
-; AVX512BW-NEXT: vpunpckldq {{.*#+}} xmm1 = xmm2[0],xmm1[0],xmm2[1],xmm1[1]
-; AVX512BW-NEXT: vextracti64x4 $1, %zmm0, %ymm0
-; AVX512BW-NEXT: vextracti128 $1, %ymm0, %xmm2
-; AVX512BW-NEXT: vpshufb {{.*#+}} xmm2 = xmm2[u,u,u,u,1,5,9,14,u,u,u,u,u,u,u,u]
-; AVX512BW-NEXT: vpshufb {{.*#+}} xmm0 = xmm0[u,u,u,u,1,5,9,13,u,u,u,u,u,u,u,u]
-; AVX512BW-NEXT: vpunpckldq {{.*#+}} xmm0 = xmm0[0],xmm2[0],xmm0[1],xmm2[1]
-; AVX512BW-NEXT: vpblendd {{.*#+}} xmm0 = xmm1[0,1],xmm0[2,3]
+; AVX512BW-NEXT: vpextrb $5, %xmm0, %eax
+; AVX512BW-NEXT: vpextrb $1, %xmm0, %ecx
+; AVX512BW-NEXT: vmovd %ecx, %xmm1
+; AVX512BW-NEXT: vpinsrb $1, %eax, %xmm1, %xmm1
+; AVX512BW-NEXT: vpextrb $9, %xmm0, %eax
+; AVX512BW-NEXT: vpinsrb $2, %eax, %xmm1, %xmm1
+; AVX512BW-NEXT: vpextrb $13, %xmm0, %eax
+; AVX512BW-NEXT: vpinsrb $3, %eax, %xmm1, %xmm1
+; AVX512BW-NEXT: vextracti32x4 $1, %zmm0, %xmm2
+; AVX512BW-NEXT: vpextrb $1, %xmm2, %eax
+; AVX512BW-NEXT: vpinsrb $4, %eax, %xmm1, %xmm1
+; AVX512BW-NEXT: vpextrb $5, %xmm2, %eax
+; AVX512BW-NEXT: vpinsrb $5, %eax, %xmm1, %xmm1
+; AVX512BW-NEXT: vpextrb $9, %xmm2, %eax
+; AVX512BW-NEXT: vpinsrb $6, %eax, %xmm1, %xmm1
+; AVX512BW-NEXT: vpextrb $13, %xmm2, %eax
+; AVX512BW-NEXT: vpinsrb $7, %eax, %xmm1, %xmm1
+; AVX512BW-NEXT: vextracti32x4 $2, %zmm0, %xmm2
+; AVX512BW-NEXT: vpextrb $1, %xmm2, %eax
+; AVX512BW-NEXT: vpinsrb $8, %eax, %xmm1, %xmm1
+; AVX512BW-NEXT: vpextrb $5, %xmm2, %eax
+; AVX512BW-NEXT: vpinsrb $9, %eax, %xmm1, %xmm1
+; AVX512BW-NEXT: vpextrb $9, %xmm2, %eax
+; AVX512BW-NEXT: vpinsrb $10, %eax, %xmm1, %xmm1
+; AVX512BW-NEXT: vpextrb $13, %xmm2, %eax
+; AVX512BW-NEXT: vpinsrb $11, %eax, %xmm1, %xmm1
+; AVX512BW-NEXT: vextracti32x4 $3, %zmm0, %xmm0
+; AVX512BW-NEXT: vpextrb $1, %xmm0, %eax
+; AVX512BW-NEXT: vpinsrb $12, %eax, %xmm1, %xmm1
+; AVX512BW-NEXT: vpextrb $5, %xmm0, %eax
+; AVX512BW-NEXT: vpinsrb $13, %eax, %xmm1, %xmm1
+; AVX512BW-NEXT: vpextrb $9, %xmm0, %eax
+; AVX512BW-NEXT: vpinsrb $14, %eax, %xmm1, %xmm1
+; AVX512BW-NEXT: vpextrb $14, %xmm0, %eax
+; AVX512BW-NEXT: vpinsrb $15, %eax, %xmm1, %xmm0
; AVX512BW-NEXT: vzeroupper
; AVX512BW-NEXT: retq
;
; AVX512BWVL-LABEL: trunc_shuffle_v64i8_01_05_09_13_17_21_25_29_33_37_41_45_49_53_57_62:
; AVX512BWVL: # BB#0:
-; AVX512BWVL-NEXT: vextracti128 $1, %ymm0, %xmm1
-; AVX512BWVL-NEXT: vmovdqa {{.*#+}} xmm2 = <1,5,9,13,u,u,u,u,u,u,u,u,u,u,u,u>
-; AVX512BWVL-NEXT: vpshufb %xmm2, %xmm1, %xmm1
-; AVX512BWVL-NEXT: vpshufb %xmm2, %xmm0, %xmm2
-; AVX512BWVL-NEXT: vpunpckldq {{.*#+}} xmm1 = xmm2[0],xmm1[0],xmm2[1],xmm1[1]
-; AVX512BWVL-NEXT: vextracti64x4 $1, %zmm0, %ymm0
-; AVX512BWVL-NEXT: vextracti128 $1, %ymm0, %xmm2
-; AVX512BWVL-NEXT: vpshufb {{.*#+}} xmm2 = xmm2[u,u,u,u,1,5,9,14,u,u,u,u,u,u,u,u]
-; AVX512BWVL-NEXT: vpshufb {{.*#+}} xmm0 = xmm0[u,u,u,u,1,5,9,13,u,u,u,u,u,u,u,u]
-; AVX512BWVL-NEXT: vpunpckldq {{.*#+}} xmm0 = xmm0[0],xmm2[0],xmm0[1],xmm2[1]
-; AVX512BWVL-NEXT: vpblendd {{.*#+}} xmm0 = xmm1[0,1],xmm0[2,3]
+; AVX512BWVL-NEXT: vpextrb $5, %xmm0, %eax
+; AVX512BWVL-NEXT: vpextrb $1, %xmm0, %ecx
+; AVX512BWVL-NEXT: vmovd %ecx, %xmm1
+; AVX512BWVL-NEXT: vpinsrb $1, %eax, %xmm1, %xmm1
+; AVX512BWVL-NEXT: vpextrb $9, %xmm0, %eax
+; AVX512BWVL-NEXT: vpinsrb $2, %eax, %xmm1, %xmm1
+; AVX512BWVL-NEXT: vpextrb $13, %xmm0, %eax
+; AVX512BWVL-NEXT: vpinsrb $3, %eax, %xmm1, %xmm1
+; AVX512BWVL-NEXT: vextracti32x4 $1, %zmm0, %xmm2
+; AVX512BWVL-NEXT: vpextrb $1, %xmm2, %eax
+; AVX512BWVL-NEXT: vpinsrb $4, %eax, %xmm1, %xmm1
+; AVX512BWVL-NEXT: vpextrb $5, %xmm2, %eax
+; AVX512BWVL-NEXT: vpinsrb $5, %eax, %xmm1, %xmm1
+; AVX512BWVL-NEXT: vpextrb $9, %xmm2, %eax
+; AVX512BWVL-NEXT: vpinsrb $6, %eax, %xmm1, %xmm1
+; AVX512BWVL-NEXT: vpextrb $13, %xmm2, %eax
+; AVX512BWVL-NEXT: vpinsrb $7, %eax, %xmm1, %xmm1
+; AVX512BWVL-NEXT: vextracti32x4 $2, %zmm0, %xmm2
+; AVX512BWVL-NEXT: vpextrb $1, %xmm2, %eax
+; AVX512BWVL-NEXT: vpinsrb $8, %eax, %xmm1, %xmm1
+; AVX512BWVL-NEXT: vpextrb $5, %xmm2, %eax
+; AVX512BWVL-NEXT: vpinsrb $9, %eax, %xmm1, %xmm1
+; AVX512BWVL-NEXT: vpextrb $9, %xmm2, %eax
+; AVX512BWVL-NEXT: vpinsrb $10, %eax, %xmm1, %xmm1
+; AVX512BWVL-NEXT: vpextrb $13, %xmm2, %eax
+; AVX512BWVL-NEXT: vpinsrb $11, %eax, %xmm1, %xmm1
+; AVX512BWVL-NEXT: vextracti32x4 $3, %zmm0, %xmm0
+; AVX512BWVL-NEXT: vpextrb $1, %xmm0, %eax
+; AVX512BWVL-NEXT: vpinsrb $12, %eax, %xmm1, %xmm1
+; AVX512BWVL-NEXT: vpextrb $5, %xmm0, %eax
+; AVX512BWVL-NEXT: vpinsrb $13, %eax, %xmm1, %xmm1
+; AVX512BWVL-NEXT: vpextrb $9, %xmm0, %eax
+; AVX512BWVL-NEXT: vpinsrb $14, %eax, %xmm1, %xmm1
+; AVX512BWVL-NEXT: vpextrb $14, %xmm0, %eax
+; AVX512BWVL-NEXT: vpinsrb $15, %eax, %xmm1, %xmm0
; AVX512BWVL-NEXT: vzeroupper
; AVX512BWVL-NEXT: retq
%res = shufflevector <64 x i8> %x, <64 x i8> %x, <16 x i32> <i32 1, i32 5, i32 9, i32 13, i32 17, i32 21, i32 25, i32 29, i32 33, i32 37, i32 41, i32 45, i32 49, i32 53, i32 57, i32 62>
Modified: llvm/trunk/test/CodeGen/X86/vector-shuffle-512-v16.ll
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/vector-shuffle-512-v16.ll?rev=310822&r1=310821&r2=310822&view=diff
==============================================================================
--- llvm/trunk/test/CodeGen/X86/vector-shuffle-512-v16.ll (original)
+++ llvm/trunk/test/CodeGen/X86/vector-shuffle-512-v16.ll Mon Aug 14 02:06:00 2017
@@ -286,10 +286,13 @@ define <8 x i32> @test_v16i32_1_3_5_7_9_
define <4 x i32> @test_v16i32_0_1_2_12 (<16 x i32> %v) {
; ALL-LABEL: test_v16i32_0_1_2_12:
; ALL: # BB#0:
-; ALL-NEXT: vextracti32x8 $1, %zmm0, %ymm1
-; ALL-NEXT: vextracti128 $1, %ymm1, %xmm1
-; ALL-NEXT: vpbroadcastd %xmm1, %xmm1
-; ALL-NEXT: vpblendd {{.*#+}} xmm0 = xmm0[0,1,2],xmm1[3]
+; ALL-NEXT: vpextrd $1, %xmm0, %eax
+; ALL-NEXT: vpinsrd $1, %eax, %xmm0, %xmm1
+; ALL-NEXT: vpextrd $2, %xmm0, %eax
+; ALL-NEXT: vpinsrd $2, %eax, %xmm1, %xmm1
+; ALL-NEXT: vextracti32x4 $3, %zmm0, %xmm0
+; ALL-NEXT: vmovd %xmm0, %eax
+; ALL-NEXT: vpinsrd $3, %eax, %xmm1, %xmm0
; ALL-NEXT: vzeroupper
; ALL-NEXT: retq
%res = shufflevector <16 x i32> %v, <16 x i32> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 12>
Modified: llvm/trunk/test/CodeGen/X86/vector-shuffle-512-v8.ll
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/vector-shuffle-512-v8.ll?rev=310822&r1=310821&r2=310822&view=diff
==============================================================================
--- llvm/trunk/test/CodeGen/X86/vector-shuffle-512-v8.ll (original)
+++ llvm/trunk/test/CodeGen/X86/vector-shuffle-512-v8.ll Mon Aug 14 02:06:00 2017
@@ -2726,17 +2726,20 @@ define <4 x i64> @test_v8i64_1257 (<8 x
define <2 x i64> @test_v8i64_2_5 (<8 x i64> %v) {
; AVX512F-LABEL: test_v8i64_2_5:
; AVX512F: # BB#0:
-; AVX512F-NEXT: vextracti64x4 $1, %zmm0, %ymm1
-; AVX512F-NEXT: vextracti128 $1, %ymm0, %xmm0
+; AVX512F-NEXT: vextracti32x4 $2, %zmm0, %xmm1
+; AVX512F-NEXT: vextracti32x4 $1, %zmm0, %xmm0
; AVX512F-NEXT: vpblendd {{.*#+}} xmm0 = xmm0[0,1],xmm1[2,3]
; AVX512F-NEXT: vzeroupper
; AVX512F-NEXT: retq
;
; AVX512F-32-LABEL: test_v8i64_2_5:
; AVX512F-32: # BB#0:
-; AVX512F-32-NEXT: vextracti64x4 $1, %zmm0, %ymm1
-; AVX512F-32-NEXT: vextracti128 $1, %ymm0, %xmm0
-; AVX512F-32-NEXT: vpblendd {{.*#+}} xmm0 = xmm0[0,1],xmm1[2,3]
+; AVX512F-32-NEXT: vextracti32x4 $1, %zmm0, %xmm1
+; AVX512F-32-NEXT: vextracti32x4 $2, %zmm0, %xmm0
+; AVX512F-32-NEXT: vpextrd $2, %xmm0, %eax
+; AVX512F-32-NEXT: vpinsrd $2, %eax, %xmm1, %xmm1
+; AVX512F-32-NEXT: vpextrd $3, %xmm0, %eax
+; AVX512F-32-NEXT: vpinsrd $3, %eax, %xmm1, %xmm0
; AVX512F-32-NEXT: vzeroupper
; AVX512F-32-NEXT: retl
%res = shufflevector <8 x i64> %v, <8 x i64> undef, <2 x i32> <i32 2, i32 5>
Modified: llvm/trunk/test/CodeGen/X86/x86-interleaved-access.ll
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/x86-interleaved-access.ll?rev=310822&r1=310821&r2=310822&view=diff
==============================================================================
--- llvm/trunk/test/CodeGen/X86/x86-interleaved-access.ll (original)
+++ llvm/trunk/test/CodeGen/X86/x86-interleaved-access.ll Mon Aug 14 02:06:00 2017
@@ -567,37 +567,37 @@ define <16 x i1> @interleaved_load_vf16_
; AVX2-NEXT: vpermq {{.*#+}} ymm2 = ymm2[0,2,2,3]
; AVX2-NEXT: vpshufb %xmm4, %xmm2, %xmm2
; AVX2-NEXT: vpunpcklqdq {{.*#+}} xmm2 = xmm2[0],xmm3[0]
-; AVX2-NEXT: vmovdqa {{.*#+}} xmm3 = <u,u,u,u,1,5,9,13,u,u,u,u,u,u,u,u>
-; AVX2-NEXT: vextracti128 $1, %ymm1, %xmm4
-; AVX2-NEXT: vpshufb %xmm3, %xmm4, %xmm5
-; AVX2-NEXT: vpshufb %xmm3, %xmm1, %xmm3
-; AVX2-NEXT: vpunpckldq {{.*#+}} xmm3 = xmm3[0],xmm5[0],xmm3[1],xmm5[1]
-; AVX2-NEXT: vmovdqa {{.*#+}} xmm5 = <1,5,9,13,u,u,u,u,u,u,u,u,u,u,u,u>
-; AVX2-NEXT: vextracti128 $1, %ymm0, %xmm6
-; AVX2-NEXT: vpshufb %xmm5, %xmm6, %xmm7
-; AVX2-NEXT: vpshufb %xmm5, %xmm0, %xmm5
-; AVX2-NEXT: vpunpckldq {{.*#+}} xmm5 = xmm5[0],xmm7[0],xmm5[1],xmm7[1]
-; AVX2-NEXT: vpblendd {{.*#+}} xmm3 = xmm5[0,1],xmm3[2,3]
-; AVX2-NEXT: vpcmpeqb %xmm3, %xmm2, %xmm2
-; AVX2-NEXT: vmovdqa {{.*#+}} xmm3 = <u,u,u,u,2,6,10,14,u,u,u,u,u,u,u,u>
-; AVX2-NEXT: vpshufb %xmm3, %xmm4, %xmm5
-; AVX2-NEXT: vpshufb %xmm3, %xmm1, %xmm3
-; AVX2-NEXT: vpunpckldq {{.*#+}} xmm3 = xmm3[0],xmm5[0],xmm3[1],xmm5[1]
-; AVX2-NEXT: vmovdqa {{.*#+}} xmm5 = <2,6,10,14,u,u,u,u,u,u,u,u,u,u,u,u>
-; AVX2-NEXT: vpshufb %xmm5, %xmm6, %xmm7
-; AVX2-NEXT: vpshufb %xmm5, %xmm0, %xmm5
-; AVX2-NEXT: vpunpckldq {{.*#+}} xmm5 = xmm5[0],xmm7[0],xmm5[1],xmm7[1]
-; AVX2-NEXT: vpblendd {{.*#+}} xmm3 = xmm5[0,1],xmm3[2,3]
-; AVX2-NEXT: vmovdqa {{.*#+}} xmm5 = <u,u,u,u,3,7,11,15,u,u,u,u,u,u,u,u>
-; AVX2-NEXT: vpshufb %xmm5, %xmm4, %xmm4
-; AVX2-NEXT: vpshufb %xmm5, %xmm1, %xmm1
-; AVX2-NEXT: vpunpckldq {{.*#+}} xmm1 = xmm1[0],xmm4[0],xmm1[1],xmm4[1]
-; AVX2-NEXT: vmovdqa {{.*#+}} xmm4 = <3,7,11,15,u,u,u,u,u,u,u,u,u,u,u,u>
-; AVX2-NEXT: vpshufb %xmm4, %xmm6, %xmm5
-; AVX2-NEXT: vpshufb %xmm4, %xmm0, %xmm0
+; AVX2-NEXT: vextracti128 $1, %ymm1, %xmm3
+; AVX2-NEXT: vmovdqa {{.*#+}} xmm4 = <u,u,u,u,1,5,9,13,u,u,u,u,u,u,u,u>
+; AVX2-NEXT: vpshufb %xmm4, %xmm3, %xmm5
+; AVX2-NEXT: vpshufb %xmm4, %xmm1, %xmm4
+; AVX2-NEXT: vpunpckldq {{.*#+}} xmm4 = xmm4[0],xmm5[0],xmm4[1],xmm5[1]
+; AVX2-NEXT: vextracti128 $1, %ymm0, %xmm5
+; AVX2-NEXT: vmovdqa {{.*#+}} xmm6 = <1,5,9,13,u,u,u,u,u,u,u,u,u,u,u,u>
+; AVX2-NEXT: vpshufb %xmm6, %xmm5, %xmm7
+; AVX2-NEXT: vpshufb %xmm6, %xmm0, %xmm6
+; AVX2-NEXT: vpunpckldq {{.*#+}} xmm6 = xmm6[0],xmm7[0],xmm6[1],xmm7[1]
+; AVX2-NEXT: vpblendd {{.*#+}} xmm4 = xmm6[0,1],xmm4[2,3]
+; AVX2-NEXT: vpcmpeqb %xmm4, %xmm2, %xmm2
+; AVX2-NEXT: vmovdqa {{.*#+}} xmm4 = <u,u,u,u,2,6,10,14,u,u,u,u,u,u,u,u>
+; AVX2-NEXT: vpshufb %xmm4, %xmm3, %xmm6
+; AVX2-NEXT: vpshufb %xmm4, %xmm1, %xmm4
+; AVX2-NEXT: vpunpckldq {{.*#+}} xmm4 = xmm4[0],xmm6[0],xmm4[1],xmm6[1]
+; AVX2-NEXT: vmovdqa {{.*#+}} xmm6 = <2,6,10,14,u,u,u,u,u,u,u,u,u,u,u,u>
+; AVX2-NEXT: vpshufb %xmm6, %xmm5, %xmm7
+; AVX2-NEXT: vpshufb %xmm6, %xmm0, %xmm6
+; AVX2-NEXT: vpunpckldq {{.*#+}} xmm6 = xmm6[0],xmm7[0],xmm6[1],xmm7[1]
+; AVX2-NEXT: vpblendd {{.*#+}} xmm4 = xmm6[0,1],xmm4[2,3]
+; AVX2-NEXT: vmovdqa {{.*#+}} xmm6 = <u,u,u,u,3,7,11,15,u,u,u,u,u,u,u,u>
+; AVX2-NEXT: vpshufb %xmm6, %xmm3, %xmm3
+; AVX2-NEXT: vpshufb %xmm6, %xmm1, %xmm1
+; AVX2-NEXT: vpunpckldq {{.*#+}} xmm1 = xmm1[0],xmm3[0],xmm1[1],xmm3[1]
+; AVX2-NEXT: vmovdqa {{.*#+}} xmm3 = <3,7,11,15,u,u,u,u,u,u,u,u,u,u,u,u>
+; AVX2-NEXT: vpshufb %xmm3, %xmm5, %xmm5
+; AVX2-NEXT: vpshufb %xmm3, %xmm0, %xmm0
; AVX2-NEXT: vpunpckldq {{.*#+}} xmm0 = xmm0[0],xmm5[0],xmm0[1],xmm5[1]
; AVX2-NEXT: vpblendd {{.*#+}} xmm0 = xmm0[0,1],xmm1[2,3]
-; AVX2-NEXT: vpcmpeqb %xmm0, %xmm3, %xmm0
+; AVX2-NEXT: vpcmpeqb %xmm0, %xmm4, %xmm0
; AVX2-NEXT: vmovdqa {{.*#+}} xmm1 = [1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1]
; AVX2-NEXT: vpand %xmm1, %xmm2, %xmm2
; AVX2-NEXT: vpand %xmm1, %xmm0, %xmm0
@@ -836,15 +836,15 @@ define <32 x i1> @interleaved_load_vf32_
; AVX512-NEXT: vpmovdw %zmm1, %ymm3
; AVX512-NEXT: vinserti64x4 $1, %ymm3, %zmm2, %zmm2
; AVX512-NEXT: vpmovwb %zmm2, %ymm8
+; AVX512-NEXT: vmovdqa {{.*#+}} xmm7 = <u,u,u,u,1,5,9,13,u,u,u,u,u,u,u,u>
; AVX512-NEXT: vextracti64x4 $1, %zmm1, %ymm14
; AVX512-NEXT: vextracti128 $1, %ymm14, %xmm9
-; AVX512-NEXT: vmovdqa {{.*#+}} xmm7 = <u,u,u,u,1,5,9,13,u,u,u,u,u,u,u,u>
; AVX512-NEXT: vpshufb %xmm7, %xmm9, %xmm4
; AVX512-NEXT: vpshufb %xmm7, %xmm14, %xmm5
; AVX512-NEXT: vpunpckldq {{.*#+}} xmm4 = xmm5[0],xmm4[0],xmm5[1],xmm4[1]
; AVX512-NEXT: vinserti128 $1, %xmm4, %ymm0, %ymm5
-; AVX512-NEXT: vextracti128 $1, %ymm1, %xmm10
; AVX512-NEXT: vmovdqa {{.*#+}} xmm3 = <1,5,9,13,u,u,u,u,u,u,u,u,u,u,u,u>
+; AVX512-NEXT: vextracti128 $1, %ymm1, %xmm10
; AVX512-NEXT: vpshufb %xmm3, %xmm10, %xmm6
; AVX512-NEXT: vpshufb %xmm3, %xmm1, %xmm4
; AVX512-NEXT: vpunpckldq {{.*#+}} xmm4 = xmm4[0],xmm6[0],xmm4[1],xmm6[1]