[llvm] r295155 - [X86] Don't create VBROADCAST nodes with 256-bit or 512-bit input types
Craig Topper via llvm-commits
llvm-commits at lists.llvm.org
Tue Feb 14 22:58:47 PST 2017
Author: ctopper
Date: Wed Feb 15 00:58:47 2017
New Revision: 295155
URL: http://llvm.org/viewvc/llvm-project?rev=295155&view=rev
Log:
[X86] Don't create VBROADCAST nodes with 256-bit or 512-bit input types
Summary:
We don't seem to have clear rules on what a valid VBROADCAST node looks like, and as a consequence we end up with a lot of patterns to try to catch everything. We have patterns with scalar inputs, 128-bit vector inputs, 256-bit vector inputs, and 512-bit vector inputs.
As the improved tests here show, we are currently missing patterns for 128-bit loads that are extended to 256 bits before the vbroadcast.
I'd like to propose that VBROADCAST should always take a 128-bit vector type as input. As a first step towards that, this patch adds an EXTRACT_SUBVECTOR in front of VBROADCAST when the input is 256 or 512 bits. In the future I would like to add scalar_to_vector around all the scalar operations. And maybe we should consider adding a VBROADCAST+load node to avoid separating loads from the broadcasting operation when the load itself isn't foldable.
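Why narrowing the input is safe can be sketched with a toy model: the broadcast only reads element 0, which always lives in the low 128 bits, so extracting the low 128-bit subvector first does not change the result. The `Lanes` type and the helpers `extractLow128` and `broadcastLane0` below are hypothetical stand-ins written for illustration, not LLVM APIs, and the model assumes 32-bit lanes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy model (assumption): a vector register is a list of 32-bit lanes.
using Lanes = std::vector<std::uint32_t>;

// Hypothetical stand-in for extracting the low 128 bits of a wider vector:
// keep only the first four 32-bit lanes.
Lanes extractLow128(const Lanes &V) {
  assert(V.size() >= 4 && "need at least 128 bits of input");
  return Lanes(V.begin(), V.begin() + 4);
}

// Hypothetical stand-in for a broadcast: splat lane 0 across NumLanes lanes.
Lanes broadcastLane0(const Lanes &V, std::size_t NumLanes) {
  assert(!V.empty() && "need at least one lane to broadcast");
  return Lanes(NumLanes, V[0]);
}
```

In this model, broadcasting from a 256-bit value and broadcasting from its low 128-bit half give identical results, which is the property that lets isel get by with 128-bit input patterns only.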
This requires an additional change in target shuffle combining to look for the extract subvector and look through it to find the original operand. I'm sure this change isn't perfect, but it was enough to fix a few test failures that the lowering change was causing.
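The look-through can be sketched roughly as follows. This is a simplified model under stated assumptions, not the code in X86ISelLowering.cpp: `Node`, `Opcode`, and `decodeBroadcastInput` are hypothetical names, and the sketch compares bit widths where the real check compares value types.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified node type; this is not LLVM's SDNode/SDValue API.
enum class Opcode { Other, ExtractSubvector };

struct Node {
  Opcode Op;
  unsigned SizeInBits; // width of the value this node produces
  const Node *Src;     // operand 0, or nullptr if there is none
  unsigned Index;      // starting element index for ExtractSubvector
};

// Returns true if a broadcast of N0 into a ResultBits-wide vector can be
// decoded as a unary shuffle. If N0 extracts the low subvector of a value
// that already has the result width, that wider value is pushed onto Ops.
bool decodeBroadcastInput(const Node &N0, unsigned ResultBits,
                          std::vector<const Node *> &Ops) {
  assert(ResultBits != 0 && "result type must have a size");
  if (N0.Op == Opcode::ExtractSubvector && N0.Src != nullptr &&
      N0.Src->SizeInBits == ResultBits && N0.Index == 0)
    Ops.push_back(N0.Src);

  // Decode same-sized inputs, or inputs recovered through the extract.
  return N0.SizeInBits == ResultBits || !Ops.empty();
}
```

An extract at a non-zero index is deliberately not looked through here, since the broadcast would then read a different element of the original operand.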
Another interesting thing I noticed is that the changes in masked_gather_scatter.ll show cases where we don't remove a useless insert into element 1 before broadcasting element 0.
Reviewers: delena, RKSimon, zvi
Reviewed By: zvi
Subscribers: igorb, llvm-commits
Differential Revision: https://reviews.llvm.org/D28747
Modified:
llvm/trunk/lib/Target/X86/X86ISelLowering.cpp
llvm/trunk/test/CodeGen/X86/masked_gather_scatter.ll
llvm/trunk/test/CodeGen/X86/vector-shuffle-avx512.ll
llvm/trunk/test/CodeGen/X86/widened-broadcast.ll
Modified: llvm/trunk/lib/Target/X86/X86ISelLowering.cpp
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/X86/X86ISelLowering.cpp?rev=295155&r1=295154&r2=295155&view=diff
==============================================================================
--- llvm/trunk/lib/Target/X86/X86ISelLowering.cpp (original)
+++ llvm/trunk/lib/Target/X86/X86ISelLowering.cpp Wed Feb 15 00:58:47 2017
@@ -5394,8 +5394,18 @@ static bool getTargetShuffleMask(SDNode
IsUnary = true;
break;
case X86ISD::VBROADCAST: {
- // We only decode broadcasts of same-sized vectors at the moment.
- if (N->getOperand(0).getValueType() == VT) {
+ SDValue N0 = N->getOperand(0);
+ // See if we're broadcasting from index 0 of an EXTRACT_SUBVECTOR. If so,
+ // add the pre-extracted value to the Ops vector.
+ if (N0.getOpcode() == ISD::EXTRACT_SUBVECTOR &&
+ N0.getOperand(0).getValueType() == VT &&
+ N0.getConstantOperandVal(1) == 0)
+ Ops.push_back(N0.getOperand(0));
+
+ // We only decode broadcasts of same-sized vectors, unless the broadcast
+ // came from an extract from the original width. If we found one, we
+ // pushed it the Ops vector above.
+ if (N0.getValueType() == VT || !Ops.empty()) {
DecodeVectorBroadcast(VT, Mask);
IsUnary = true;
break;
@@ -9729,6 +9739,12 @@ static SDValue lowerVectorShuffleAsBroad
BroadcastVT = MVT::getVectorVT(MVT::f64, NumBroadcastElts);
}
+ // We only support broadcasting from 128-bit vectors to minimize the
+ // number of patterns we need to deal with in isel. So extract down to
+ // 128-bits.
+ if (SrcVT.getSizeInBits() > 128)
+ V = extract128BitVector(V, 0, DAG, DL);
+
return DAG.getBitcast(VT, DAG.getNode(Opcode, DL, BroadcastVT, V));
}
Modified: llvm/trunk/test/CodeGen/X86/masked_gather_scatter.ll
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/masked_gather_scatter.ll?rev=295155&r1=295154&r2=295155&view=diff
==============================================================================
--- llvm/trunk/test/CodeGen/X86/masked_gather_scatter.ll (original)
+++ llvm/trunk/test/CodeGen/X86/masked_gather_scatter.ll Wed Feb 15 00:58:47 2017
@@ -714,8 +714,7 @@ define <16 x float> @test13(float* %base
define <16 x float> @test14(float* %base, i32 %ind, <16 x float*> %vec) {
; KNL_64-LABEL: test14:
; KNL_64: # BB#0:
-; KNL_64-NEXT: vpinsrq $1, %rdi, %xmm0, %xmm1
-; KNL_64-NEXT: vinserti32x4 $0, %xmm1, %zmm0, %zmm0
+; KNL_64-NEXT: vpinsrq $1, %rdi, %xmm0, %xmm0
; KNL_64-NEXT: vpbroadcastq %xmm0, %zmm0
; KNL_64-NEXT: vmovd %esi, %xmm1
; KNL_64-NEXT: vpbroadcastd %xmm1, %ymm1
@@ -731,8 +730,7 @@ define <16 x float> @test14(float* %base
;
; KNL_32-LABEL: test14:
; KNL_32: # BB#0:
-; KNL_32-NEXT: vpinsrd $1, {{[0-9]+}}(%esp), %xmm0, %xmm1
-; KNL_32-NEXT: vinserti32x4 $0, %xmm1, %zmm0, %zmm0
+; KNL_32-NEXT: vpinsrd $1, {{[0-9]+}}(%esp), %xmm0, %xmm0
; KNL_32-NEXT: vpbroadcastd %xmm0, %zmm0
; KNL_32-NEXT: vpslld $2, {{[0-9]+}}(%esp){1to16}, %zmm1
; KNL_32-NEXT: vpaddd %zmm1, %zmm0, %zmm1
@@ -742,8 +740,7 @@ define <16 x float> @test14(float* %base
;
; SKX-LABEL: test14:
; SKX: # BB#0:
-; SKX-NEXT: vpinsrq $1, %rdi, %xmm0, %xmm1
-; SKX-NEXT: vinserti64x2 $0, %xmm1, %zmm0, %zmm0
+; SKX-NEXT: vpinsrq $1, %rdi, %xmm0, %xmm0
; SKX-NEXT: vpbroadcastq %xmm0, %zmm0
; SKX-NEXT: vpbroadcastd %esi, %ymm1
; SKX-NEXT: vpmovsxdq %ymm1, %zmm1
@@ -758,8 +755,7 @@ define <16 x float> @test14(float* %base
;
; SKX_32-LABEL: test14:
; SKX_32: # BB#0:
-; SKX_32-NEXT: vpinsrd $1, {{[0-9]+}}(%esp), %xmm0, %xmm1
-; SKX_32-NEXT: vinserti32x4 $0, %xmm1, %zmm0, %zmm0
+; SKX_32-NEXT: vpinsrd $1, {{[0-9]+}}(%esp), %xmm0, %xmm0
; SKX_32-NEXT: vpbroadcastd %xmm0, %zmm0
; SKX_32-NEXT: vpslld $2, {{[0-9]+}}(%esp){1to16}, %zmm1
; SKX_32-NEXT: vpaddd %zmm1, %zmm0, %zmm1
Modified: llvm/trunk/test/CodeGen/X86/vector-shuffle-avx512.ll
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/vector-shuffle-avx512.ll?rev=295155&r1=295154&r2=295155&view=diff
==============================================================================
--- llvm/trunk/test/CodeGen/X86/vector-shuffle-avx512.ll (original)
+++ llvm/trunk/test/CodeGen/X86/vector-shuffle-avx512.ll Wed Feb 15 00:58:47 2017
@@ -126,7 +126,6 @@ define <8 x i32> @expand3(<4 x i32> %a )
;
; KNL64-LABEL: expand3:
; KNL64: # BB#0:
-; KNL64-NEXT: # kill: %XMM0<def> %XMM0<kill> %YMM0<def>
; KNL64-NEXT: vpbroadcastq %xmm0, %ymm0
; KNL64-NEXT: vpxor %ymm1, %ymm1, %ymm1
; KNL64-NEXT: vpblendd {{.*#+}} ymm0 = ymm0[0],ymm1[1,2,3,4,5,6],ymm0[7]
@@ -142,7 +141,6 @@ define <8 x i32> @expand3(<4 x i32> %a )
;
; KNL32-LABEL: expand3:
; KNL32: # BB#0:
-; KNL32-NEXT: # kill: %XMM0<def> %XMM0<kill> %YMM0<def>
; KNL32-NEXT: vpbroadcastq %xmm0, %ymm0
; KNL32-NEXT: vpxor %ymm1, %ymm1, %ymm1
; KNL32-NEXT: vpblendd {{.*#+}} ymm0 = ymm0[0],ymm1[1,2,3,4,5,6],ymm0[7]
Modified: llvm/trunk/test/CodeGen/X86/widened-broadcast.ll
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/widened-broadcast.ll?rev=295155&r1=295154&r2=295155&view=diff
==============================================================================
--- llvm/trunk/test/CodeGen/X86/widened-broadcast.ll (original)
+++ llvm/trunk/test/CodeGen/X86/widened-broadcast.ll Wed Feb 15 00:58:47 2017
@@ -51,14 +51,12 @@ define <8 x float> @load_splat_8f32_4f32
;
; AVX2-LABEL: load_splat_8f32_4f32_01010101:
; AVX2: # BB#0: # %entry
-; AVX2-NEXT: vmovaps (%rdi), %xmm0
-; AVX2-NEXT: vbroadcastsd %xmm0, %ymm0
+; AVX2-NEXT: vbroadcastsd (%rdi), %ymm0
; AVX2-NEXT: retq
;
; AVX512-LABEL: load_splat_8f32_4f32_01010101:
; AVX512: # BB#0: # %entry
-; AVX512-NEXT: vmovaps (%rdi), %xmm0
-; AVX512-NEXT: vbroadcastsd %xmm0, %ymm0
+; AVX512-NEXT: vbroadcastsd (%rdi), %ymm0
; AVX512-NEXT: retq
entry:
%ld = load <4 x float>, <4 x float>* %ptr
@@ -131,14 +129,12 @@ define <8 x i32> @load_splat_8i32_4i32_0
;
; AVX2-LABEL: load_splat_8i32_4i32_01010101:
; AVX2: # BB#0: # %entry
-; AVX2-NEXT: vmovaps (%rdi), %xmm0
-; AVX2-NEXT: vbroadcastsd %xmm0, %ymm0
+; AVX2-NEXT: vbroadcastsd (%rdi), %ymm0
; AVX2-NEXT: retq
;
; AVX512-LABEL: load_splat_8i32_4i32_01010101:
; AVX512: # BB#0: # %entry
-; AVX512-NEXT: vmovaps (%rdi), %xmm0
-; AVX512-NEXT: vbroadcastsd %xmm0, %ymm0
+; AVX512-NEXT: vbroadcastsd (%rdi), %ymm0
; AVX512-NEXT: retq
entry:
%ld = load <4 x i32>, <4 x i32>* %ptr
@@ -242,14 +238,12 @@ define <16 x i16> @load_splat_16i16_8i16
;
; AVX2-LABEL: load_splat_16i16_8i16_0101010101010101:
; AVX2: # BB#0: # %entry
-; AVX2-NEXT: vmovaps (%rdi), %xmm0
-; AVX2-NEXT: vbroadcastss %xmm0, %ymm0
+; AVX2-NEXT: vbroadcastss (%rdi), %ymm0
; AVX2-NEXT: retq
;
; AVX512-LABEL: load_splat_16i16_8i16_0101010101010101:
; AVX512: # BB#0: # %entry
-; AVX512-NEXT: vmovaps (%rdi), %xmm0
-; AVX512-NEXT: vbroadcastss %xmm0, %ymm0
+; AVX512-NEXT: vbroadcastss (%rdi), %ymm0
; AVX512-NEXT: retq
entry:
%ld = load <8 x i16>, <8 x i16>* %ptr
@@ -272,14 +266,12 @@ define <16 x i16> @load_splat_16i16_8i16
;
; AVX2-LABEL: load_splat_16i16_8i16_0123012301230123:
; AVX2: # BB#0: # %entry
-; AVX2-NEXT: vmovaps (%rdi), %xmm0
-; AVX2-NEXT: vbroadcastsd %xmm0, %ymm0
+; AVX2-NEXT: vbroadcastsd (%rdi), %ymm0
; AVX2-NEXT: retq
;
; AVX512-LABEL: load_splat_16i16_8i16_0123012301230123:
; AVX512: # BB#0: # %entry
-; AVX512-NEXT: vmovaps (%rdi), %xmm0
-; AVX512-NEXT: vbroadcastsd %xmm0, %ymm0
+; AVX512-NEXT: vbroadcastsd (%rdi), %ymm0
; AVX512-NEXT: retq
entry:
%ld = load <8 x i16>, <8 x i16>* %ptr
@@ -442,14 +434,12 @@ define <32 x i8> @load_splat_32i8_16i8_0
;
; AVX2-LABEL: load_splat_32i8_16i8_01010101010101010101010101010101:
; AVX2: # BB#0: # %entry
-; AVX2-NEXT: vmovdqa (%rdi), %xmm0
-; AVX2-NEXT: vpbroadcastw %xmm0, %ymm0
+; AVX2-NEXT: vpbroadcastw (%rdi), %ymm0
; AVX2-NEXT: retq
;
; AVX512-LABEL: load_splat_32i8_16i8_01010101010101010101010101010101:
; AVX512: # BB#0: # %entry
-; AVX512-NEXT: vmovdqa (%rdi), %xmm0
-; AVX512-NEXT: vpbroadcastw %xmm0, %ymm0
+; AVX512-NEXT: vpbroadcastw (%rdi), %ymm0
; AVX512-NEXT: retq
entry:
%ld = load <16 x i8>, <16 x i8>* %ptr
@@ -472,14 +462,12 @@ define <32 x i8> @load_splat_32i8_16i8_0
;
; AVX2-LABEL: load_splat_32i8_16i8_01230123012301230123012301230123:
; AVX2: # BB#0: # %entry
-; AVX2-NEXT: vmovaps (%rdi), %xmm0
-; AVX2-NEXT: vbroadcastss %xmm0, %ymm0
+; AVX2-NEXT: vbroadcastss (%rdi), %ymm0
; AVX2-NEXT: retq
;
; AVX512-LABEL: load_splat_32i8_16i8_01230123012301230123012301230123:
; AVX512: # BB#0: # %entry
-; AVX512-NEXT: vmovaps (%rdi), %xmm0
-; AVX512-NEXT: vbroadcastss %xmm0, %ymm0
+; AVX512-NEXT: vbroadcastss (%rdi), %ymm0
; AVX512-NEXT: retq
entry:
%ld = load <16 x i8>, <16 x i8>* %ptr
@@ -502,14 +490,12 @@ define <32 x i8> @load_splat_32i8_16i8_0
;
; AVX2-LABEL: load_splat_32i8_16i8_01234567012345670123456701234567:
; AVX2: # BB#0: # %entry
-; AVX2-NEXT: vmovaps (%rdi), %xmm0
-; AVX2-NEXT: vbroadcastsd %xmm0, %ymm0
+; AVX2-NEXT: vbroadcastsd (%rdi), %ymm0
; AVX2-NEXT: retq
;
; AVX512-LABEL: load_splat_32i8_16i8_01234567012345670123456701234567:
; AVX512: # BB#0: # %entry
-; AVX512-NEXT: vmovaps (%rdi), %xmm0
-; AVX512-NEXT: vbroadcastsd %xmm0, %ymm0
+; AVX512-NEXT: vbroadcastsd (%rdi), %ymm0
; AVX512-NEXT: retq
entry:
%ld = load <16 x i8>, <16 x i8>* %ptr