<div dir="ltr">Hi Simon,<div><br></div><div>This is causing an infinite loop during legalization on the following code (only with AVX; SSE and AVX2 are fine):<br><br><div>target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"</div><div>target triple = "x86_64-pc-linux"</div><div><br></div><div>define void @foo(double* %p, <4 x double>* %q) #0 {</div><div>entry:</div><div> %0 = load double, double* %p, align 8</div><div> %1 = insertelement <4 x double> zeroinitializer, double %0, i32 1</div><div> %2 = insertelement <4 x double> %1, double %0, i32 2</div><div> %3 = insertelement <4 x double> %2, double %0, i32 3</div><div> store <4 x double> %3, <4 x double>* %q, align 16</div><div> ret void</div><div>}</div><div><br></div><div>attributes #0 = { norecurse nounwind "target-features"="+avx" }</div></div><div><br></div><div>We get a legalization oscillation that looks something like:</div><div><br></div><div><div>Legalizing: t19698: v4f64 = vector_shuffle<0,1,1,1> t19697, undef:v4f64</div><div> ... replacing: t19698: v4f64 = vector_shuffle<0,1,1,1> t19697, undef:v4f64</div><div> with: t19705: v4f64 = BUILD_VECTOR ConstantFP:f64<0.000000e+00>, t7, t7, t7</div><div><br></div><div>Legalizing: t19705: v4f64 = BUILD_VECTOR ConstantFP:f64<0.000000e+00>, t7, t7, t7</div><div> ... 
replacing: t19705: v4f64 = BUILD_VECTOR ConstantFP:f64<0.000000e+00>, t7, t7, t7</div><div> with: t19708: v4f64 = vector_shuffle<0,1,1,1> t19707, undef:v4f64</div></div><div><br></div><div>I'm not entirely sure what the right thing to do here is, so I'm going to revert this for now.</div><div><br></div><div>Thanks,</div><div> Michael</div></div><div class="gmail_extra"><br><div class="gmail_quote">On Mon, Apr 3, 2017 at 2:06 PM, Simon Pilgrim via llvm-commits <span dir="ltr"><<a href="mailto:llvm-commits@lists.llvm.org" target="_blank">llvm-commits@lists.llvm.org</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Author: rksimon<br>
Date: Mon Apr 3 16:06:51 2017<br>
New Revision: 299387<br>
<br>
URL: <a href="http://llvm.org/viewvc/llvm-project?rev=299387&view=rev" rel="noreferrer" target="_blank">http://llvm.org/viewvc/llvm-<wbr>project?rev=299387&view=rev</a><br>
Log:<br>
[X86][SSE] Lower BUILD_VECTOR with repeated elts as BUILD_VECTOR + VECTOR_SHUFFLE<br>
<br>
It can be costly to transfer from the gprs to the xmm registers and can prevent loads merging.<br>
<br>
This patch splits vXi16/vXi32/vXi64 BUILD_VECTORS that use the same operand in multiple elements into a BUILD_VECTOR with only a single insertion of each of those elements and then performs a unary shuffle to duplicate the values.<br>
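<br>
As a rough illustration of the split described above, here is a standalone sketch rather than the LLVM code itself. It assumes operands can be modelled as strings, with "" standing in for undef and "0" for a zero node; splitRepeatedElts, SplitResult and kUndefLane are names invented for this example.<br>

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Illustration only: these names are invented for this sketch and are not
// LLVM APIs. Operands are strings; "" stands in for undef, "0" for a zero node.
constexpr int kUndefLane = -1;

struct SplitResult {
  bool hasRepeats = false;
  std::vector<std::string> uniques; // operands left in the build vector
  std::vector<int> mask;            // unary shuffle mask; kUndefLane = undef
};

SplitResult splitRepeatedElts(std::vector<std::string> ops) {
  SplitResult r;
  r.uniques = std::move(ops);
  r.mask.assign(r.uniques.size(), kUndefLane);

  for (std::size_t i = 0; i != r.uniques.size(); ++i) {
    const std::string op = r.uniques[i]; // copy: later slots get overwritten
    if (op.empty())
      continue; // undef, or a duplicate already folded into an earlier lane
    r.mask[i] = static_cast<int>(i);

    if (op == "0")
      continue; // zeros are cheap to repeat, so leave them in place

    for (std::size_t j = i + 1; j != r.uniques.size(); ++j) {
      if (r.uniques[j] == op) {
        r.hasRepeats = true;
        r.mask[j] = static_cast<int>(i); // lane j copies lane i
        r.uniques[j].clear();            // drop the duplicate insertion
      }
    }
  }
  assert(r.mask.size() == r.uniques.size());
  return r;
}
```

For the operands 0.0, t7, t7, t7 from the report at the top of this mail, the sketch leaves 0.0, t7, undef, undef in the build vector and produces the mask <0,1,1,1>, which matches the shuffle mask in the legalization log.<br>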
<br>
There are a couple of minor regressions this patch unearths due to some missing MOVDDUP/BROADCAST folds that I will address in a future patch.<br>
<br>
Note: Now that vector shuffle lowering and combining is pretty good we should be reusing that instead of duplicating so much in LowerBUILD_VECTOR - this is the first of several patches to address this.<br>
<br>
Differential Revision: <a href="https://reviews.llvm.org/D31373" rel="noreferrer" target="_blank">https://reviews.llvm.org/<wbr>D31373</a><br>
<br>
Modified:<br>
llvm/trunk/lib/Target/X86/<wbr>X86ISelLowering.cpp<br>
llvm/trunk/test/CodeGen/X86/<wbr>avx-intrinsics-fast-isel.ll<br>
llvm/trunk/test/CodeGen/X86/<wbr>avx-vbroadcast.ll<br>
llvm/trunk/test/CodeGen/X86/<wbr>avx2-vbroadcast.ll<br>
llvm/trunk/test/CodeGen/X86/<wbr>merge-consecutive-loads-128.ll<br>
llvm/trunk/test/CodeGen/X86/<wbr>sse2-intrinsics-fast-isel.ll<br>
llvm/trunk/test/CodeGen/X86/<wbr>vec_fp_to_int.ll<br>
llvm/trunk/test/CodeGen/X86/<wbr>vec_int_to_fp.ll<br>
llvm/trunk/test/CodeGen/X86/<wbr>vector-sext.ll<br>
llvm/trunk/test/CodeGen/X86/<wbr>vector-shuffle-combining-xop.<wbr>ll<br>
llvm/trunk/test/CodeGen/X86/<wbr>vshift-1.ll<br>
llvm/trunk/test/CodeGen/X86/<wbr>vshift-2.ll<br>
<br>
Modified: llvm/trunk/lib/Target/X86/<wbr>X86ISelLowering.cpp<br>
URL: <a href="http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/X86/X86ISelLowering.cpp?rev=299387&r1=299386&r2=299387&view=diff" rel="noreferrer" target="_blank">http://llvm.org/viewvc/llvm-<wbr>project/llvm/trunk/lib/Target/<wbr>X86/X86ISelLowering.cpp?rev=<wbr>299387&r1=299386&r2=299387&<wbr>view=diff</a><br>
==============================<wbr>==============================<wbr>==================<br>
--- llvm/trunk/lib/Target/X86/<wbr>X86ISelLowering.cpp (original)<br>
+++ llvm/trunk/lib/Target/X86/<wbr>X86ISelLowering.cpp Mon Apr 3 16:06:51 2017<br>
@@ -6120,6 +6120,54 @@ static SDValue getShuffleScalarElt(SDNod<br>
return SDValue();<br>
}<br>
<br>
+// Attempt to lower a build vector of repeated elts as a build vector of unique<br>
+// ops followed by a shuffle.<br>
+static SDValue<br>
+lowerBuildVectorWithRepeatedEltsUsingShuffle(SDValue V, SelectionDAG &DAG,<br>
+ const X86Subtarget &Subtarget) {<br>
+ MVT VT = V.getSimpleValueType();<br>
+ unsigned NumElts = VT.getVectorNumElements();<br>
+<br>
+ // TODO - vXi8 insertions+shuffles often cause PSHUFBs which can lead to<br>
+ // excessive/bulky shuffle mask creation.<br>
+ if (VT.getScalarSizeInBits() < 16)<br>
+ return SDValue();<br>
+<br>
+ // Create list of unique operands to be passed to a build vector and a shuffle<br>
+ // mask describing the repetitions.<br>
+ // TODO - we currently insert the first occurrences in place - sometimes it<br>
+ // might be better to insert them in other locations for shuffle efficiency.<br>
+ bool HasRepeatedElts = false;<br>
+ SmallVector<int, 16> Mask(NumElts, SM_SentinelUndef);<br>
+ SmallVector<SDValue, 16> Uniques(V->op_begin(), V->op_end());<br>
+ for (unsigned i = 0; i != NumElts; ++i) {<br>
+ SDValue Op = Uniques[i];<br>
+ if (Op.isUndef())<br>
+ continue;<br>
+ Mask[i] = i;<br>
+<br>
+ // Zeros can be efficiently repeated, so don't shuffle these.<br>
+ if (X86::isZeroNode(Op))<br>
+ continue;<br>
+<br>
+ // If any repeated operands are found then mark the build vector entry as<br>
+ // undef and setup a copy in the shuffle mask.<br>
+ for (unsigned j = i + 1; j != NumElts; ++j)<br>
+ if (Op == Uniques[j]) {<br>
+ HasRepeatedElts = true;<br>
+ Mask[j] = i;<br>
+ Uniques[j] = DAG.getUNDEF(VT.getScalarType());<br>
+ }<br>
+ }<br>
+<br>
+ if (!HasRepeatedElts)<br>
+ return SDValue();<br>
+<br>
+ SDLoc DL(V);<br>
+ return DAG.getVectorShuffle(VT, DL, DAG.getBuildVector(VT, DL, Uniques),<br>
+ DAG.getUNDEF(VT), Mask);<br>
+}<br>
+<br>
/// Custom lower build_vector of v16i8.<br>
static SDValue LowerBuildVectorv16i8(SDValue Op, unsigned NonZeros,<br>
unsigned NumNonZero, unsigned NumZero,<br>
@@ -7752,11 +7800,17 @@ X86TargetLowering::LowerBUILD_VECTOR(SDV<br>
if (IsAllConstants)<br>
return SDValue();<br>
<br>
- // See if we can use a vector load to get all of the elements.<br>
if (VT.is128BitVector() || VT.is256BitVector() || VT.is512BitVector()) {<br>
+ // See if we can use a vector load to get all of the elements.<br>
SmallVector<SDValue, 64> Ops(Op->op_begin(), Op->op_begin() + NumElems);<br>
if (SDValue LD = EltsFromConsecutiveLoads(VT, Ops, dl, DAG, false))<br>
return LD;<br>
+<br>
+ // Attempt to lower a build vector of repeated elts as single insertions<br>
+ // followed by a shuffle.<br>
+ if (SDValue V =<br>
+ lowerBuildVectorWithRepeatedEltsUsingShuffle(Op, DAG, Subtarget))<br>
+ return V;<br>
}<br>
<br>
// For AVX-length vectors, build the individual 128-bit pieces and use<br>
<br>
Modified: llvm/trunk/test/CodeGen/X86/<wbr>avx-intrinsics-fast-isel.ll<br>
URL: <a href="http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/avx-intrinsics-fast-isel.ll?rev=299387&r1=299386&r2=299387&view=diff" rel="noreferrer" target="_blank">http://llvm.org/viewvc/llvm-<wbr>project/llvm/trunk/test/<wbr>CodeGen/X86/avx-intrinsics-<wbr>fast-isel.ll?rev=299387&r1=<wbr>299386&r2=299387&view=diff</a><br>
==============================<wbr>==============================<wbr>==================<br>
--- llvm/trunk/test/CodeGen/X86/<wbr>avx-intrinsics-fast-isel.ll (original)<br>
+++ llvm/trunk/test/CodeGen/X86/<wbr>avx-intrinsics-fast-isel.ll Mon Apr 3 16:06:51 2017<br>
@@ -2425,12 +2425,9 @@ define <4 x i64> @test_mm256_set1_epi32(<br>
define <4 x i64> @test_mm256_set1_epi64x(i64 %a0) nounwind {<br>
; X32-LABEL: test_mm256_set1_epi64x:<br>
; X32: # BB#0:<br>
-; X32-NEXT: movl {{[0-9]+}}(%esp), %eax<br>
-; X32-NEXT: movl {{[0-9]+}}(%esp), %ecx<br>
-; X32-NEXT: vmovd %ecx, %xmm0<br>
-; X32-NEXT: vpinsrd $1, %eax, %xmm0, %xmm0<br>
-; X32-NEXT: vpinsrd $2, %ecx, %xmm0, %xmm0<br>
-; X32-NEXT: vpinsrd $3, %eax, %xmm0, %xmm0<br>
+; X32-NEXT: vmovd {{.*#+}} xmm0 = mem[0],zero,zero,zero<br>
+; X32-NEXT: vpinsrd $1, {{[0-9]+}}(%esp), %xmm0, %xmm0<br>
+; X32-NEXT: vpshufd {{.*#+}} xmm0 = xmm0[0,1,0,1]<br>
; X32-NEXT: vinsertf128 $1, %xmm0, %ymm0, %ymm0<br>
; X32-NEXT: retl<br>
;<br>
<br>
Modified: llvm/trunk/test/CodeGen/X86/<wbr>avx-vbroadcast.ll<br>
URL: <a href="http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/avx-vbroadcast.ll?rev=299387&r1=299386&r2=299387&view=diff" rel="noreferrer" target="_blank">http://llvm.org/viewvc/llvm-<wbr>project/llvm/trunk/test/<wbr>CodeGen/X86/avx-vbroadcast.ll?<wbr>rev=299387&r1=299386&r2=<wbr>299387&view=diff</a><br>
==============================<wbr>==============================<wbr>==================<br>
--- llvm/trunk/test/CodeGen/X86/<wbr>avx-vbroadcast.ll (original)<br>
+++ llvm/trunk/test/CodeGen/X86/<wbr>avx-vbroadcast.ll Mon Apr 3 16:06:51 2017<br>
@@ -6,12 +6,8 @@ define <4 x i64> @A(i64* %ptr) nounwind<br>
; X32-LABEL: A:<br>
; X32: ## BB#0: ## %entry<br>
; X32-NEXT: movl {{[0-9]+}}(%esp), %eax<br>
-; X32-NEXT: movl (%eax), %ecx<br>
-; X32-NEXT: movl 4(%eax), %eax<br>
-; X32-NEXT: vmovd %ecx, %xmm0<br>
-; X32-NEXT: vpinsrd $1, %eax, %xmm0, %xmm0<br>
-; X32-NEXT: vpinsrd $2, %ecx, %xmm0, %xmm0<br>
-; X32-NEXT: vpinsrd $3, %eax, %xmm0, %xmm0<br>
+; X32-NEXT: vmovsd {{.*#+}} xmm0 = mem[0],zero<br>
+; X32-NEXT: vmovddup {{.*#+}} xmm0 = xmm0[0,0]<br>
; X32-NEXT: vinsertf128 $1, %xmm0, %ymm0, %ymm0<br>
; X32-NEXT: retl<br>
;<br>
@@ -31,17 +27,21 @@ entry:<br>
define <4 x i64> @A2(i64* %ptr, i64* %ptr2) nounwind uwtable readnone ssp {<br>
; X32-LABEL: A2:<br>
; X32: ## BB#0: ## %entry<br>
+; X32-NEXT: pushl %esi<br>
+; X32-NEXT: Lcfi0:<br>
+; X32-NEXT: .cfi_def_cfa_offset 8<br>
+; X32-NEXT: Lcfi1:<br>
+; X32-NEXT: .cfi_offset %esi, -8<br>
; X32-NEXT: movl {{[0-9]+}}(%esp), %eax<br>
; X32-NEXT: movl {{[0-9]+}}(%esp), %ecx<br>
; X32-NEXT: movl (%ecx), %edx<br>
-; X32-NEXT: movl 4(%ecx), %ecx<br>
-; X32-NEXT: movl %ecx, 4(%eax)<br>
+; X32-NEXT: movl 4(%ecx), %esi<br>
+; X32-NEXT: vmovsd {{.*#+}} xmm0 = mem[0],zero<br>
+; X32-NEXT: movl %esi, 4(%eax)<br>
; X32-NEXT: movl %edx, (%eax)<br>
-; X32-NEXT: vmovd %edx, %xmm0<br>
-; X32-NEXT: vpinsrd $1, %ecx, %xmm0, %xmm0<br>
-; X32-NEXT: vpinsrd $2, %edx, %xmm0, %xmm0<br>
-; X32-NEXT: vpinsrd $3, %ecx, %xmm0, %xmm0<br>
+; X32-NEXT: vmovddup {{.*#+}} xmm0 = xmm0[0,0]<br>
; X32-NEXT: vinsertf128 $1, %xmm0, %ymm0, %ymm0<br>
+; X32-NEXT: popl %esi<br>
; X32-NEXT: retl<br>
;<br>
; X64-LABEL: A2:<br>
@@ -592,12 +592,8 @@ define <2 x i64> @G(i64* %ptr) nounwind<br>
; X32-LABEL: G:<br>
; X32: ## BB#0: ## %entry<br>
; X32-NEXT: movl {{[0-9]+}}(%esp), %eax<br>
-; X32-NEXT: movl (%eax), %ecx<br>
-; X32-NEXT: movl 4(%eax), %eax<br>
-; X32-NEXT: vmovd %ecx, %xmm0<br>
-; X32-NEXT: vpinsrd $1, %eax, %xmm0, %xmm0<br>
-; X32-NEXT: vpinsrd $2, %ecx, %xmm0, %xmm0<br>
-; X32-NEXT: vpinsrd $3, %eax, %xmm0, %xmm0<br>
+; X32-NEXT: vmovq {{.*#+}} xmm0 = mem[0],zero<br>
+; X32-NEXT: vpshufd {{.*#+}} xmm0 = xmm0[0,1,0,1]<br>
; X32-NEXT: retl<br>
;<br>
; X64-LABEL: G:<br>
@@ -615,16 +611,20 @@ entry:<br>
define <2 x i64> @G2(i64* %ptr, i64* %ptr2) nounwind uwtable readnone ssp {<br>
; X32-LABEL: G2:<br>
; X32: ## BB#0: ## %entry<br>
+; X32-NEXT: pushl %esi<br>
+; X32-NEXT: Lcfi2:<br>
+; X32-NEXT: .cfi_def_cfa_offset 8<br>
+; X32-NEXT: Lcfi3:<br>
+; X32-NEXT: .cfi_offset %esi, -8<br>
; X32-NEXT: movl {{[0-9]+}}(%esp), %eax<br>
; X32-NEXT: movl {{[0-9]+}}(%esp), %ecx<br>
; X32-NEXT: movl (%ecx), %edx<br>
-; X32-NEXT: movl 4(%ecx), %ecx<br>
-; X32-NEXT: movl %ecx, 4(%eax)<br>
+; X32-NEXT: movl 4(%ecx), %esi<br>
+; X32-NEXT: vmovq {{.*#+}} xmm0 = mem[0],zero<br>
+; X32-NEXT: movl %esi, 4(%eax)<br>
; X32-NEXT: movl %edx, (%eax)<br>
-; X32-NEXT: vmovd %edx, %xmm0<br>
-; X32-NEXT: vpinsrd $1, %ecx, %xmm0, %xmm0<br>
-; X32-NEXT: vpinsrd $2, %edx, %xmm0, %xmm0<br>
-; X32-NEXT: vpinsrd $3, %ecx, %xmm0, %xmm0<br>
+; X32-NEXT: vpshufd {{.*#+}} xmm0 = xmm0[0,1,0,1]<br>
+; X32-NEXT: popl %esi<br>
; X32-NEXT: retl<br>
;<br>
; X64-LABEL: G2:<br>
<br>
Modified: llvm/trunk/test/CodeGen/X86/<wbr>avx2-vbroadcast.ll<br>
URL: <a href="http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/avx2-vbroadcast.ll?rev=299387&r1=299386&r2=299387&view=diff" rel="noreferrer" target="_blank">http://llvm.org/viewvc/llvm-<wbr>project/llvm/trunk/test/<wbr>CodeGen/X86/avx2-vbroadcast.<wbr>ll?rev=299387&r1=299386&r2=<wbr>299387&view=diff</a><br>
==============================<wbr>==============================<wbr>==================<br>
--- llvm/trunk/test/CodeGen/X86/<wbr>avx2-vbroadcast.ll (original)<br>
+++ llvm/trunk/test/CodeGen/X86/<wbr>avx2-vbroadcast.ll Mon Apr 3 16:06:51 2017<br>
@@ -189,12 +189,7 @@ define <2 x i64> @Q64(i64* %ptr) nounwin<br>
; X32-LABEL: Q64:<br>
; X32: ## BB#0: ## %entry<br>
; X32-NEXT: movl {{[0-9]+}}(%esp), %eax<br>
-; X32-NEXT: movl (%eax), %ecx<br>
-; X32-NEXT: movl 4(%eax), %eax<br>
-; X32-NEXT: vmovd %ecx, %xmm0<br>
-; X32-NEXT: vpinsrd $1, %eax, %xmm0, %xmm0<br>
-; X32-NEXT: vpinsrd $2, %ecx, %xmm0, %xmm0<br>
-; X32-NEXT: vpinsrd $3, %eax, %xmm0, %xmm0<br>
+; X32-NEXT: vpbroadcastq (%eax), %xmm0<br>
; X32-NEXT: retl<br>
;<br>
; X64-LABEL: Q64:<br>
@@ -212,13 +207,8 @@ define <4 x i64> @QQ64(i64* %ptr) nounwi<br>
; X32-LABEL: QQ64:<br>
; X32: ## BB#0: ## %entry<br>
; X32-NEXT: movl {{[0-9]+}}(%esp), %eax<br>
-; X32-NEXT: movl (%eax), %ecx<br>
-; X32-NEXT: movl 4(%eax), %eax<br>
-; X32-NEXT: vmovd %ecx, %xmm0<br>
-; X32-NEXT: vpinsrd $1, %eax, %xmm0, %xmm0<br>
-; X32-NEXT: vpinsrd $2, %ecx, %xmm0, %xmm0<br>
-; X32-NEXT: vpinsrd $3, %eax, %xmm0, %xmm0<br>
-; X32-NEXT: vinserti128 $1, %xmm0, %ymm0, %ymm0<br>
+; X32-NEXT: vmovsd {{.*#+}} xmm0 = mem[0],zero<br>
+; X32-NEXT: vbroadcastsd %xmm0, %ymm0<br>
; X32-NEXT: retl<br>
;<br>
; X64-LABEL: QQ64:<br>
@@ -1440,12 +1430,8 @@ define void @isel_crash_2q(i64* %cV_R.ad<br>
; X32-NEXT: movl {{[0-9]+}}(%esp), %eax<br>
; X32-NEXT: vxorps %xmm0, %xmm0, %xmm0<br>
; X32-NEXT: vmovaps %xmm0, (%esp)<br>
-; X32-NEXT: movl (%eax), %ecx<br>
-; X32-NEXT: movl 4(%eax), %eax<br>
-; X32-NEXT: vmovd %ecx, %xmm1<br>
-; X32-NEXT: vpinsrd $1, %eax, %xmm1, %xmm1<br>
-; X32-NEXT: vpinsrd $2, %ecx, %xmm1, %xmm1<br>
-; X32-NEXT: vpinsrd $3, %eax, %xmm1, %xmm1<br>
+; X32-NEXT: vmovq {{.*#+}} xmm1 = mem[0],zero<br>
+; X32-NEXT: vpbroadcastq %xmm1, %xmm1<br>
; X32-NEXT: vmovaps %xmm0, {{[0-9]+}}(%esp)<br>
; X32-NEXT: vmovdqa %xmm1, {{[0-9]+}}(%esp)<br>
; X32-NEXT: addl $60, %esp<br>
@@ -1501,15 +1487,10 @@ define void @isel_crash_4q(i64* %cV_R.ad<br>
; X32-NEXT: movl 8(%ebp), %eax<br>
; X32-NEXT: vxorps %ymm0, %ymm0, %ymm0<br>
; X32-NEXT: vmovaps %ymm0, (%esp)<br>
-; X32-NEXT: movl (%eax), %ecx<br>
-; X32-NEXT: movl 4(%eax), %eax<br>
-; X32-NEXT: vmovd %ecx, %xmm1<br>
-; X32-NEXT: vpinsrd $1, %eax, %xmm1, %xmm1<br>
-; X32-NEXT: vpinsrd $2, %ecx, %xmm1, %xmm1<br>
-; X32-NEXT: vpinsrd $3, %eax, %xmm1, %xmm1<br>
-; X32-NEXT: vinserti128 $1, %xmm1, %ymm1, %ymm1<br>
+; X32-NEXT: vmovsd {{.*#+}} xmm1 = mem[0],zero<br>
+; X32-NEXT: vbroadcastsd %xmm1, %ymm1<br>
; X32-NEXT: vmovaps %ymm0, {{[0-9]+}}(%esp)<br>
-; X32-NEXT: vmovdqa %ymm1, {{[0-9]+}}(%esp)<br>
+; X32-NEXT: vmovaps %ymm1, {{[0-9]+}}(%esp)<br>
; X32-NEXT: movl %ebp, %esp<br>
; X32-NEXT: popl %ebp<br>
; X32-NEXT: vzeroupper<br>
<br>
Modified: llvm/trunk/test/CodeGen/X86/<wbr>merge-consecutive-loads-128.ll<br>
URL: <a href="http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/merge-consecutive-loads-128.ll?rev=299387&r1=299386&r2=299387&view=diff" rel="noreferrer" target="_blank">http://llvm.org/viewvc/llvm-<wbr>project/llvm/trunk/test/<wbr>CodeGen/X86/merge-consecutive-<wbr>loads-128.ll?rev=299387&r1=<wbr>299386&r2=299387&view=diff</a><br>
==============================<wbr>==============================<wbr>==================<br>
--- llvm/trunk/test/CodeGen/X86/<wbr>merge-consecutive-loads-128.ll (original)<br>
+++ llvm/trunk/test/CodeGen/X86/<wbr>merge-consecutive-loads-128.ll Mon Apr 3 16:06:51 2017<br>
@@ -1102,28 +1102,44 @@ define <4 x float> @merge_4f32_f32_2345_<br>
;<br>
<br>
define <4 x float> @merge_4f32_f32_X0YY(float* %ptr0, float* %ptr1) nounwind uwtable noinline ssp {<br>
-; SSE-LABEL: merge_4f32_f32_X0YY:<br>
-; SSE: # BB#0:<br>
-; SSE-NEXT: movss {{.*#+}} xmm1 = mem[0],zero,zero,zero<br>
-; SSE-NEXT: movss {{.*#+}} xmm0 = mem[0],zero,zero,zero<br>
-; SSE-NEXT: shufps {{.*#+}} xmm0 = xmm0[0,1],xmm1[0,0]<br>
-; SSE-NEXT: retq<br>
+; SSE2-LABEL: merge_4f32_f32_X0YY:<br>
+; SSE2: # BB#0:<br>
+; SSE2-NEXT: movss {{.*#+}} xmm1 = mem[0],zero,zero,zero<br>
+; SSE2-NEXT: movss {{.*#+}} xmm0 = mem[0],zero,zero,zero<br>
+; SSE2-NEXT: shufps {{.*#+}} xmm0 = xmm0[0,1],xmm1[0,0]<br>
+; SSE2-NEXT: retq<br>
+;<br>
+; SSE41-LABEL: merge_4f32_f32_X0YY:<br>
+; SSE41: # BB#0:<br>
+; SSE41-NEXT: movss {{.*#+}} xmm0 = mem[0],zero,zero,zero<br>
+; SSE41-NEXT: insertps {{.*#+}} xmm0 = xmm0[0],zero,mem[0],zero<br>
+; SSE41-NEXT: shufps {{.*#+}} xmm0 = xmm0[0,1,2,2]<br>
+; SSE41-NEXT: retq<br>
;<br>
; AVX-LABEL: merge_4f32_f32_X0YY:<br>
; AVX: # BB#0:<br>
; AVX-NEXT: vmovss {{.*#+}} xmm0 = mem[0],zero,zero,zero<br>
-; AVX-NEXT: vmovss {{.*#+}} xmm1 = mem[0],zero,zero,zero<br>
-; AVX-NEXT: vshufps {{.*#+}} xmm0 = xmm1[0,1],xmm0[0,0]<br>
+; AVX-NEXT: vinsertps {{.*#+}} xmm0 = xmm0[0],zero,mem[0],zero<br>
+; AVX-NEXT: vpermilps {{.*#+}} xmm0 = xmm0[0,1,2,2]<br>
; AVX-NEXT: retq<br>
;<br>
-; X32-SSE-LABEL: merge_4f32_f32_X0YY:<br>
-; X32-SSE: # BB#0:<br>
-; X32-SSE-NEXT: movl {{[0-9]+}}(%esp), %eax<br>
-; X32-SSE-NEXT: movl {{[0-9]+}}(%esp), %ecx<br>
-; X32-SSE-NEXT: movss {{.*#+}} xmm1 = mem[0],zero,zero,zero<br>
-; X32-SSE-NEXT: movss {{.*#+}} xmm0 = mem[0],zero,zero,zero<br>
-; X32-SSE-NEXT: shufps {{.*#+}} xmm0 = xmm0[0,1],xmm1[0,0]<br>
-; X32-SSE-NEXT: retl<br>
+; X32-SSE1-LABEL: merge_4f32_f32_X0YY:<br>
+; X32-SSE1: # BB#0:<br>
+; X32-SSE1-NEXT: movl {{[0-9]+}}(%esp), %eax<br>
+; X32-SSE1-NEXT: movl {{[0-9]+}}(%esp), %ecx<br>
+; X32-SSE1-NEXT: movss {{.*#+}} xmm1 = mem[0],zero,zero,zero<br>
+; X32-SSE1-NEXT: movss {{.*#+}} xmm0 = mem[0],zero,zero,zero<br>
+; X32-SSE1-NEXT: shufps {{.*#+}} xmm0 = xmm0[0,1],xmm1[0,0]<br>
+; X32-SSE1-NEXT: retl<br>
+;<br>
+; X32-SSE41-LABEL: merge_4f32_f32_X0YY:<br>
+; X32-SSE41: # BB#0:<br>
+; X32-SSE41-NEXT: movl {{[0-9]+}}(%esp), %eax<br>
+; X32-SSE41-NEXT: movl {{[0-9]+}}(%esp), %ecx<br>
+; X32-SSE41-NEXT: movss {{.*#+}} xmm0 = mem[0],zero,zero,zero<br>
+; X32-SSE41-NEXT: insertps {{.*#+}} xmm0 = xmm0[0],zero,mem[0],zero<br>
+; X32-SSE41-NEXT: shufps {{.*#+}} xmm0 = xmm0[0,1,2,2]<br>
+; X32-SSE41-NEXT: retl<br>
%val0 = load float, float* %ptr0, align 4<br>
%val1 = load float, float* %ptr1, align 4<br>
%res0 = insertelement <4 x float> undef, float %val0, i32 0<br>
<br>
Modified: llvm/trunk/test/CodeGen/X86/<wbr>sse2-intrinsics-fast-isel.ll<br>
URL: <a href="http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/sse2-intrinsics-fast-isel.ll?rev=299387&r1=299386&r2=299387&view=diff" rel="noreferrer" target="_blank">http://llvm.org/viewvc/llvm-<wbr>project/llvm/trunk/test/<wbr>CodeGen/X86/sse2-intrinsics-<wbr>fast-isel.ll?rev=299387&r1=<wbr>299386&r2=299387&view=diff</a><br>
==============================<wbr>==============================<wbr>==================<br>
--- llvm/trunk/test/CodeGen/X86/<wbr>sse2-intrinsics-fast-isel.ll (original)<br>
+++ llvm/trunk/test/CodeGen/X86/<wbr>sse2-intrinsics-fast-isel.ll Mon Apr 3 16:06:51 2017<br>
@@ -2425,10 +2425,9 @@ define <2 x i64> @test_mm_set1_epi64x(i6<br>
; X32-LABEL: test_mm_set1_epi64x:<br>
; X32: # BB#0:<br>
; X32-NEXT: movd {{.*#+}} xmm0 = mem[0],zero,zero,zero<br>
-; X32-NEXT: pshufd {{.*#+}} xmm0 = xmm0[0,0,1,1]<br>
; X32-NEXT: movd {{.*#+}} xmm1 = mem[0],zero,zero,zero<br>
-; X32-NEXT: pshufd {{.*#+}} xmm1 = xmm1[0,0,1,1]<br>
; X32-NEXT: punpckldq {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1]<br>
+; X32-NEXT: pshufd {{.*#+}} xmm0 = xmm0[0,1,0,1]<br>
; X32-NEXT: retl<br>
;<br>
; X64-LABEL: test_mm_set1_epi64x:<br>
<br>
Modified: llvm/trunk/test/CodeGen/X86/<wbr>vec_fp_to_int.ll<br>
URL: <a href="http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/vec_fp_to_int.ll?rev=299387&r1=299386&r2=299387&view=diff" rel="noreferrer" target="_blank">http://llvm.org/viewvc/llvm-<wbr>project/llvm/trunk/test/<wbr>CodeGen/X86/vec_fp_to_int.ll?<wbr>rev=299387&r1=299386&r2=<wbr>299387&view=diff</a><br>
==============================<wbr>==============================<wbr>==================<br>
--- llvm/trunk/test/CodeGen/X86/<wbr>vec_fp_to_int.ll (original)<br>
+++ llvm/trunk/test/CodeGen/X86/<wbr>vec_fp_to_int.ll Mon Apr 3 16:06:51 2017<br>
@@ -537,7 +537,7 @@ define <4 x i32> @fptoui_4f64_to_2i32(<2<br>
; VEX-NEXT: vpinsrd $1, %eax, %xmm0, %xmm0<br>
; VEX-NEXT: vcvttsd2si %xmm0, %rax<br>
; VEX-NEXT: vpinsrd $2, %eax, %xmm0, %xmm0<br>
-; VEX-NEXT: vpinsrd $3, %eax, %xmm0, %xmm0<br>
+; VEX-NEXT: vpshufd {{.*#+}} xmm0 = xmm0[0,1,2,2]<br>
; VEX-NEXT: retq<br>
;<br>
; AVX512F-LABEL: fptoui_4f64_to_2i32:<br>
<br>
Modified: llvm/trunk/test/CodeGen/X86/<wbr>vec_int_to_fp.ll<br>
URL: <a href="http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/vec_int_to_fp.ll?rev=299387&r1=299386&r2=299387&view=diff" rel="noreferrer" target="_blank">http://llvm.org/viewvc/llvm-<wbr>project/llvm/trunk/test/<wbr>CodeGen/X86/vec_int_to_fp.ll?<wbr>rev=299387&r1=299386&r2=<wbr>299387&view=diff</a><br>
==============================<wbr>==============================<wbr>==================<br>
--- llvm/trunk/test/CodeGen/X86/<wbr>vec_int_to_fp.ll (original)<br>
+++ llvm/trunk/test/CodeGen/X86/<wbr>vec_int_to_fp.ll Mon Apr 3 16:06:51 2017<br>
@@ -1177,8 +1177,8 @@ define <4 x float> @sitofp_4i64_to_4f32_<br>
; SSE-NEXT: movd %xmm0, %rax<br>
; SSE-NEXT: xorps %xmm0, %xmm0<br>
; SSE-NEXT: cvtsi2ssq %rax, %xmm0<br>
-; SSE-NEXT: unpcklps {{.*#+}} xmm0 = xmm0[0],xmm2[0],xmm0[1],xmm2[1]<br>
; SSE-NEXT: unpcklps {{.*#+}} xmm1 = xmm1[0],xmm0[0],xmm1[1],xmm0[1]<br>
+; SSE-NEXT: shufps {{.*#+}} xmm1 = xmm1[0,1,2,2]<br>
; SSE-NEXT: movaps %xmm1, %xmm0<br>
; SSE-NEXT: retq<br>
;<br>
@@ -1879,8 +1879,8 @@ define <4 x float> @uitofp_4i64_to_4f32_<br>
; SSE-NEXT: cvtsi2ssq %rax, %xmm1<br>
; SSE-NEXT: addss %xmm1, %xmm1<br>
; SSE-NEXT: .LBB41_8:<br>
-; SSE-NEXT: unpcklps {{.*#+}} xmm1 = xmm1[0],xmm2[0],xmm1[1],xmm2[1]<br>
; SSE-NEXT: unpcklps {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[1],xmm1[1]<br>
+; SSE-NEXT: shufps {{.*#+}} xmm0 = xmm0[0,1,2,2]<br>
; SSE-NEXT: retq<br>
;<br>
; VEX-LABEL: uitofp_4i64_to_4f32_undef:<br>
<br>
Modified: llvm/trunk/test/CodeGen/X86/<wbr>vector-sext.ll<br>
URL: <a href="http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/vector-sext.ll?rev=299387&r1=299386&r2=299387&view=diff" rel="noreferrer" target="_blank">http://llvm.org/viewvc/llvm-<wbr>project/llvm/trunk/test/<wbr>CodeGen/X86/vector-sext.ll?<wbr>rev=299387&r1=299386&r2=<wbr>299387&view=diff</a><br>
==============================<wbr>==============================<wbr>==================<br>
--- llvm/trunk/test/CodeGen/X86/<wbr>vector-sext.ll (original)<br>
+++ llvm/trunk/test/CodeGen/X86/<wbr>vector-sext.ll Mon Apr 3 16:06:51 2017<br>
@@ -1263,14 +1263,13 @@ define <2 x i64> @load_sext_2i1_to_2i64(<br>
; X32-SSE41-NEXT: movl {{[0-9]+}}(%esp), %eax<br>
; X32-SSE41-NEXT: movzbl (%eax), %eax<br>
; X32-SSE41-NEXT: movl %eax, %ecx<br>
-; X32-SSE41-NEXT: shll $31, %ecx<br>
+; X32-SSE41-NEXT: shll $30, %ecx<br>
; X32-SSE41-NEXT: sarl $31, %ecx<br>
-; X32-SSE41-NEXT: movd %ecx, %xmm0<br>
-; X32-SSE41-NEXT: pinsrd $1, %ecx, %xmm0<br>
-; X32-SSE41-NEXT: shll $30, %eax<br>
+; X32-SSE41-NEXT: shll $31, %eax<br>
; X32-SSE41-NEXT: sarl $31, %eax<br>
-; X32-SSE41-NEXT: pinsrd $2, %eax, %xmm0<br>
-; X32-SSE41-NEXT: pinsrd $3, %eax, %xmm0<br>
+; X32-SSE41-NEXT: movd %eax, %xmm0<br>
+; X32-SSE41-NEXT: pinsrd $2, %ecx, %xmm0<br>
+; X32-SSE41-NEXT: pshufd {{.*#+}} xmm0 = xmm0[0,0,2,2]<br>
; X32-SSE41-NEXT: retl<br>
entry:<br>
%X = load <2 x i1>, <2 x i1>* %ptr<br>
<br>
Modified: llvm/trunk/test/CodeGen/X86/<wbr>vector-shuffle-combining-xop.<wbr>ll<br>
URL: <a href="http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/vector-shuffle-combining-xop.ll?rev=299387&r1=299386&r2=299387&view=diff" rel="noreferrer" target="_blank">http://llvm.org/viewvc/llvm-<wbr>project/llvm/trunk/test/<wbr>CodeGen/X86/vector-shuffle-<wbr>combining-xop.ll?rev=299387&<wbr>r1=299386&r2=299387&view=diff</a><br>
==============================<wbr>==============================<wbr>==================<br>
--- llvm/trunk/test/CodeGen/X86/<wbr>vector-shuffle-combining-xop.<wbr>ll (original)<br>
+++ llvm/trunk/test/CodeGen/X86/<wbr>vector-shuffle-combining-xop.<wbr>ll Mon Apr 3 16:06:51 2017<br>
@@ -318,21 +318,20 @@ define <4 x i32> @combine_vpperm_10zz32B<br>
ret <4 x i32> %res3<br>
}<br>
<br>
-; FIXME: Duplicated load in i686<br>
define void @buildvector_v4f32_0404(float %a, float %b, <4 x float>* %ptr) {<br>
; X32-LABEL: buildvector_v4f32_0404:<br>
; X32: # BB#0:<br>
; X32-NEXT: movl {{[0-9]+}}(%esp), %eax<br>
; X32-NEXT: vmovsd {{.*#+}} xmm0 = mem[0],zero<br>
-; X32-NEXT: vinsertps {{.*#+}} xmm0 = xmm0[0,1],mem[0],xmm0[3]<br>
-; X32-NEXT: vinsertps {{.*#+}} xmm0 = xmm0[0,1,2],mem[0]<br>
-; X32-NEXT: vmovaps %xmm0, (%eax)<br>
+; X32-NEXT: vmovddup {{.*#+}} xmm0 = xmm0[0,0]<br>
+; X32-NEXT: vmovapd %xmm0, (%eax)<br>
; X32-NEXT: retl<br>
;<br>
; X64-LABEL: buildvector_v4f32_0404:<br>
; X64: # BB#0:<br>
-; X64-NEXT: vpermil2ps {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[0],xmm1[0]<br>
-; X64-NEXT: vmovaps %xmm0, (%rdi)<br>
+; X64-NEXT: vinsertps {{.*#+}} xmm0 = xmm0[0],xmm1[0],xmm0[2,3]<br>
+; X64-NEXT: vmovddup {{.*#+}} xmm0 = xmm0[0,0]<br>
+; X64-NEXT: vmovapd %xmm0, (%rdi)<br>
; X64-NEXT: retq<br>
%v0 = insertelement <4 x float> undef, float %a, i32 0<br>
%v1 = insertelement <4 x float> %v0, float %b, i32 1<br>
<br>
Modified: llvm/trunk/test/CodeGen/X86/<wbr>vshift-1.ll<br>
URL: <a href="http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/vshift-1.ll?rev=299387&r1=299386&r2=299387&view=diff" rel="noreferrer" target="_blank">http://llvm.org/viewvc/llvm-<wbr>project/llvm/trunk/test/<wbr>CodeGen/X86/vshift-1.ll?rev=<wbr>299387&r1=299386&r2=299387&<wbr>view=diff</a><br>
==============================<wbr>==============================<wbr>==================<br>
--- llvm/trunk/test/CodeGen/X86/<wbr>vshift-1.ll (original)<br>
+++ llvm/trunk/test/CodeGen/X86/<wbr>vshift-1.ll Mon Apr 3 16:06:51 2017<br>
@@ -28,12 +28,9 @@ define void @shift1b(<2 x i64> %val, <2<br>
; X32-LABEL: shift1b:<br>
; X32: # BB#0: # %entry<br>
; X32-NEXT: movl {{[0-9]+}}(%esp), %eax<br>
-; X32-NEXT: movd {{.*#+}} xmm1 = mem[0],zero,zero,zero<br>
-; X32-NEXT: pshufd {{.*#+}} xmm1 = xmm1[0,0,1,1]<br>
-; X32-NEXT: movd {{.*#+}} xmm2 = mem[0],zero,zero,zero<br>
-; X32-NEXT: pshufd {{.*#+}} xmm2 = xmm2[0,0,1,1]<br>
-; X32-NEXT: punpckldq {{.*#+}} xmm2 = xmm2[0],xmm1[0],xmm2[1],xmm1[1]<br>
-; X32-NEXT: psllq %xmm2, %xmm0<br>
+; X32-NEXT: movq {{.*#+}} xmm1 = mem[0],zero<br>
+; X32-NEXT: pshufd {{.*#+}} xmm1 = xmm1[0,1,0,1]<br>
+; X32-NEXT: psllq %xmm1, %xmm0<br>
; X32-NEXT: movdqa %xmm0, (%eax)<br>
; X32-NEXT: retl<br>
;<br>
<br>
Modified: llvm/trunk/test/CodeGen/X86/<wbr>vshift-2.ll<br>
URL: <a href="http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/vshift-2.ll?rev=299387&r1=299386&r2=299387&view=diff" rel="noreferrer" target="_blank">http://llvm.org/viewvc/llvm-<wbr>project/llvm/trunk/test/<wbr>CodeGen/X86/vshift-2.ll?rev=<wbr>299387&r1=299386&r2=299387&<wbr>view=diff</a><br>
==============================<wbr>==============================<wbr>==================<br>
--- llvm/trunk/test/CodeGen/X86/<wbr>vshift-2.ll (original)<br>
+++ llvm/trunk/test/CodeGen/X86/<wbr>vshift-2.ll Mon Apr 3 16:06:51 2017<br>
@@ -28,12 +28,9 @@ define void @shift1b(<2 x i64> %val, <2<br>
; X32-LABEL: shift1b:<br>
; X32: # BB#0: # %entry<br>
; X32-NEXT: movl {{[0-9]+}}(%esp), %eax<br>
-; X32-NEXT: movd {{.*#+}} xmm1 = mem[0],zero,zero,zero<br>
-; X32-NEXT: pshufd {{.*#+}} xmm1 = xmm1[0,0,1,1]<br>
-; X32-NEXT: movd {{.*#+}} xmm2 = mem[0],zero,zero,zero<br>
-; X32-NEXT: pshufd {{.*#+}} xmm2 = xmm2[0,0,1,1]<br>
-; X32-NEXT: punpckldq {{.*#+}} xmm2 = xmm2[0],xmm1[0],xmm2[1],xmm1[1]<br>
-; X32-NEXT: psrlq %xmm2, %xmm0<br>
+; X32-NEXT: movq {{.*#+}} xmm1 = mem[0],zero<br>
+; X32-NEXT: pshufd {{.*#+}} xmm1 = xmm1[0,1,0,1]<br>
+; X32-NEXT: psrlq %xmm1, %xmm0<br>
; X32-NEXT: movdqa %xmm0, (%eax)<br>
; X32-NEXT: retl<br>
;<br>
<br>
<br>
</blockquote></div><br></div>