[llvm] r346574 - [X86] In LowerHorizontalByteSum, emit vector_shuffle nodes instead of directly using X86ISD::UNPCKL/X86ISD::UNPCKH.
Craig Topper via llvm-commits
llvm-commits at lists.llvm.org
Fri Nov 9 16:26:42 PST 2018
Author: ctopper
Date: Fri Nov 9 16:26:42 2018
New Revision: 346574
URL: http://llvm.org/viewvc/llvm-project?rev=346574&view=rev
Log:
[X86] In LowerHorizontalByteSum, emit vector_shuffle nodes instead of directly using X86ISD::UNPCKL/X86ISD::UNPCKH.
This gives shuffle lowering the freedom to use zero_extend_vector_inreg for the unpckl shuffle. Shuffle combining usually makes this swap later, but not when AVX512 is enabled it seems.
While there also use DAG.getConstant to create a 0 vector instead of using the helper the forces a specific BUILD_VECTOR. I don't think that helper is usually needed. We're basically free to create a constant build_vector anytime and it will be legalized on its own.
Modified:
llvm/trunk/lib/Target/X86/X86ISelLowering.cpp
llvm/trunk/test/CodeGen/X86/vector-popcnt-128.ll
llvm/trunk/test/CodeGen/X86/vector-tzcnt-128.ll
Modified: llvm/trunk/lib/Target/X86/X86ISelLowering.cpp
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/X86/X86ISelLowering.cpp?rev=346574&r1=346573&r2=346574&view=diff
==============================================================================
--- llvm/trunk/lib/Target/X86/X86ISelLowering.cpp (original)
+++ llvm/trunk/lib/Target/X86/X86ISelLowering.cpp Fri Nov 9 16:26:42 2018
@@ -25038,7 +25038,7 @@ static SDValue LowerHorizontalByteSum(SD
// PSADBW instruction horizontally add all bytes and leave the result in i64
// chunks, thus directly computes the pop count for v2i64 and v4i64.
if (EltVT == MVT::i64) {
- SDValue Zeros = getZeroVector(ByteVecVT, Subtarget, DAG, DL);
+ SDValue Zeros = DAG.getConstant(0, DL, ByteVecVT);
MVT SadVecVT = MVT::getVectorVT(MVT::i64, VecSize / 64);
V = DAG.getNode(X86ISD::PSADBW, DL, SadVecVT, V, Zeros);
return DAG.getBitcast(VT, V);
@@ -25050,13 +25050,13 @@ static SDValue LowerHorizontalByteSum(SD
// this is that it lines up the results of two PSADBW instructions to be
// two v2i64 vectors which concatenated are the 4 population counts. We can
// then use PACKUSWB to shrink and concatenate them into a v4i32 again.
- SDValue Zeros = getZeroVector(VT, Subtarget, DAG, DL);
+ SDValue Zeros = DAG.getConstant(0, DL, VT);
SDValue V32 = DAG.getBitcast(VT, V);
- SDValue Low = DAG.getNode(X86ISD::UNPCKL, DL, VT, V32, Zeros);
- SDValue High = DAG.getNode(X86ISD::UNPCKH, DL, VT, V32, Zeros);
+ SDValue Low = getUnpackl(DAG, DL, VT, V32, Zeros);
+ SDValue High = getUnpackh(DAG, DL, VT, V32, Zeros);
// Do the horizontal sums into two v2i64s.
- Zeros = getZeroVector(ByteVecVT, Subtarget, DAG, DL);
+ Zeros = DAG.getConstant(0, DL, ByteVecVT);
MVT SadVecVT = MVT::getVectorVT(MVT::i64, VecSize / 64);
Low = DAG.getNode(X86ISD::PSADBW, DL, SadVecVT,
DAG.getBitcast(ByteVecVT, Low), Zeros);
Modified: llvm/trunk/test/CodeGen/X86/vector-popcnt-128.ll
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/vector-popcnt-128.ll?rev=346574&r1=346573&r2=346574&view=diff
==============================================================================
--- llvm/trunk/test/CodeGen/X86/vector-popcnt-128.ll (original)
+++ llvm/trunk/test/CodeGen/X86/vector-popcnt-128.ll Fri Nov 9 16:26:42 2018
@@ -308,7 +308,7 @@ define <4 x i32> @testv4i32(<4 x i32> %i
; BITALG-NEXT: vpxor %xmm1, %xmm1, %xmm1
; BITALG-NEXT: vpunpckhdq {{.*#+}} xmm2 = xmm0[2],xmm1[2],xmm0[3],xmm1[3]
; BITALG-NEXT: vpsadbw %xmm1, %xmm2, %xmm2
-; BITALG-NEXT: vpunpckldq {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1]
+; BITALG-NEXT: vpmovzxdq {{.*#+}} xmm0 = xmm0[0],zero,xmm0[1],zero
; BITALG-NEXT: vpsadbw %xmm1, %xmm0, %xmm0
; BITALG-NEXT: vpackuswb %xmm2, %xmm0, %xmm0
; BITALG-NEXT: retq
Modified: llvm/trunk/test/CodeGen/X86/vector-tzcnt-128.ll
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/vector-tzcnt-128.ll?rev=346574&r1=346573&r2=346574&view=diff
==============================================================================
--- llvm/trunk/test/CodeGen/X86/vector-tzcnt-128.ll (original)
+++ llvm/trunk/test/CodeGen/X86/vector-tzcnt-128.ll Fri Nov 9 16:26:42 2018
@@ -633,7 +633,7 @@ define <4 x i32> @testv4i32(<4 x i32> %i
; BITALG-NEXT: vpxor %xmm1, %xmm1, %xmm1
; BITALG-NEXT: vpunpckhdq {{.*#+}} xmm2 = xmm0[2],xmm1[2],xmm0[3],xmm1[3]
; BITALG-NEXT: vpsadbw %xmm1, %xmm2, %xmm2
-; BITALG-NEXT: vpunpckldq {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1]
+; BITALG-NEXT: vpmovzxdq {{.*#+}} xmm0 = xmm0[0],zero,xmm0[1],zero
; BITALG-NEXT: vpsadbw %xmm1, %xmm0, %xmm0
; BITALG-NEXT: vpackuswb %xmm2, %xmm0, %xmm0
; BITALG-NEXT: retq
@@ -876,7 +876,7 @@ define <4 x i32> @testv4i32u(<4 x i32> %
; BITALG-NEXT: vpxor %xmm1, %xmm1, %xmm1
; BITALG-NEXT: vpunpckhdq {{.*#+}} xmm2 = xmm0[2],xmm1[2],xmm0[3],xmm1[3]
; BITALG-NEXT: vpsadbw %xmm1, %xmm2, %xmm2
-; BITALG-NEXT: vpunpckldq {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1]
+; BITALG-NEXT: vpmovzxdq {{.*#+}} xmm0 = xmm0[0],zero,xmm0[1],zero
; BITALG-NEXT: vpsadbw %xmm1, %xmm0, %xmm0
; BITALG-NEXT: vpackuswb %xmm2, %xmm0, %xmm0
; BITALG-NEXT: retq
More information about the llvm-commits
mailing list