[llvm] r353923 - [X86][SLP] Enable SLP vectorization for 128-bit horizontal X86 instructions (add, sub)

Eric Christopher via llvm-commits llvm-commits at lists.llvm.org
Wed Feb 20 10:35:20 PST 2019


On Wed, Feb 20, 2019 at 12:11 AM Anton Afanasyev
<anton.a.afanasyev at gmail.com> wrote:
>
> Oh, ok.
> I've been stuck reproducing the 3-stage build while migrating to the new monorepo.
> It looks like this is almost done -- I will have a fix for these two tests. Do you think I should fix the benchmark regressions in this commit as well?

Probably, since this is a performance optimization that's
unnecessarily penalizing some tests?

> This commit generally fixes a common issue for x86; special side effects are to be fixed separately.
> Can I open an issue for the benchmark regression and fix it in a separate commit?
> Could you please also give me a link to the automated testing results for this benchmark, if you have one?
>

I don't have an external link unfortunately, but the code gen for the
test should be pretty clear. It resides in projects/test-suite.
Perhaps IACA can be helpful there, since the benchmarks are pretty small?

-eric

>   Thanks, Anton
>
> On Wed, Feb 20, 2019 at 07:30, Eric Christopher <echristo at gmail.com> wrote:
>>
>> Hi All,
>>
>> This has broken the LTO bootstrap build for 3 days and is showing a
>> significant regression on the Dither_benchmark results (from the LLVM
>> benchmark suite) -- specifically on BENCHMARK_FLOYD_DITHER_128,
>> BENCHMARK_FLOYD_DITHER_256, and BENCHMARK_FLOYD_DITHER_512; the others
>> are unchanged. These have regressed by about 28% on Skylake, 34% on
>> Haswell, and over 40% on Sandybridge. Given that, I'm going to go
>> ahead and revert this now; we can reapply it after these problems have
>> been resolved.
>>
>> Thanks!
>>
>> -eric
>>
>> On Mon, Feb 18, 2019 at 10:41 AM Galina Kistanova via llvm-commits
>> <llvm-commits at lists.llvm.org> wrote:
>> >
>> > Hello Anton,
>> >
>> > Not sure what you mean by the word "regular". `ninja check-all` may or may not be enough, depending on the exact configuration and toolchain you build with. In this particular case, stage 3 builds with LTO using the same version of LLVM/Clang.
>> >
>> > In some cases, multi-stage builds also detect non-determinism in the compiler. That doesn't seem to be the case this time.
>> >
>> > Please let me know if you need some help, intermediate files from that bot, and such.
>> >
>> > Thanks
>> >
>> > Galina
>> >
>> >
>> > On Sat, Feb 16, 2019 at 10:48 PM Anton Afanasyev <anton.a.afanasyev at gmail.com> wrote:
>> >>
>> >> Hello Galina,
>> >>
>> >> yes, I see the build failed at the third stage (compiling with a clang that was itself compiled by clang). Does this mean that a regular `ninja check-all` is not enough to make sure a commit doesn't break something?
>> >> Ok, I'm working on it.
>> >>
>> >>    Thanks, Anton
>> >>
>> >> On Sun, Feb 17, 2019 at 07:30, Galina Kistanova <gkistanova at gmail.com> wrote:
>> >>>
>> >>> Hello Anton,
>> >>>
>> >>> This commit broke tests on clang-with-lto-ubuntu builder:
>> >>> http://lab.llvm.org:8011/builders/clang-with-lto-ubuntu/builds/12214
>> >>>
>> >>> . . .
>> >>> Failing Tests (2):
>> >>>     LLVM :: CodeGen/ARM/special-reg-v8m-main.ll
>> >>>     LLVM :: MC/ARM/thumbv8m.s
>> >>>
>> >>> The previous revision r353922 builds green:
>> >>> http://lab.llvm.org:8011/builders/clang-with-lto-ubuntu/builds/12258
>> >>>
>> >>> Could you please have a look ASAP?
>> >>>
>> >>> Thanks
>> >>>
>> >>> Galina
>> >>>
>> >>>
>> >>> On Wed, Feb 13, 2019 at 12:26 AM Anton Afanasyev via llvm-commits <llvm-commits at lists.llvm.org> wrote:
>> >>>>
>> >>>> Author: anton-afanasyev
>> >>>> Date: Wed Feb 13 00:26:43 2019
>> >>>> New Revision: 353923
>> >>>>
>> >>>> URL: http://llvm.org/viewvc/llvm-project?rev=353923&view=rev
>> >>>> Log:
>> >>>> [X86][SLP] Enable SLP vectorization for 128-bit horizontal X86 instructions (add, sub)
>> >>>>
>> >>>> Try to use 64-bit SLP vectorization. In addition to horizontal instructions,
>> >>>> this change triggers optimizations for partial vector operations (for instance,
>> >>>> using the low halves of 128-bit registers xmm0 and xmm1 to multiply <2 x float> by
>> >>>> <2 x float>).
>> >>>>
>> >>>> Fixes llvm.org/PR32433
>> >>>>
>> >>>> Added:
>> >>>>     llvm/trunk/test/Transforms/SLPVectorizer/X86/vec-reg-64bit.ll
>> >>>> Modified:
>> >>>>     llvm/trunk/lib/Target/X86/X86TargetTransformInfo.cpp
>> >>>>     llvm/trunk/lib/Target/X86/X86TargetTransformInfo.h
>> >>>>     llvm/trunk/test/Transforms/SLPVectorizer/X86/addsub.ll
>> >>>>     llvm/trunk/test/Transforms/SLPVectorizer/X86/alternate-fp.ll
>> >>>>     llvm/trunk/test/Transforms/SLPVectorizer/X86/alternate-int.ll
>> >>>>     llvm/trunk/test/Transforms/SLPVectorizer/X86/crash_7zip.ll
>> >>>>     llvm/trunk/test/Transforms/SLPVectorizer/X86/crash_bullet.ll
>> >>>>     llvm/trunk/test/Transforms/SLPVectorizer/X86/crash_bullet3.ll
>> >>>>     llvm/trunk/test/Transforms/SLPVectorizer/X86/crash_cmpop.ll
>> >>>>     llvm/trunk/test/Transforms/SLPVectorizer/X86/crash_sim4b1.ll
>> >>>>     llvm/trunk/test/Transforms/SLPVectorizer/X86/fptosi.ll
>> >>>>     llvm/trunk/test/Transforms/SLPVectorizer/X86/fptoui.ll
>> >>>>     llvm/trunk/test/Transforms/SLPVectorizer/X86/insertvalue.ll
>> >>>>     llvm/trunk/test/Transforms/SLPVectorizer/X86/phi.ll
>> >>>>     llvm/trunk/test/Transforms/SLPVectorizer/X86/remark_not_all_parts.ll
>> >>>>     llvm/trunk/test/Transforms/SLPVectorizer/X86/reorder_phi.ll
>> >>>>     llvm/trunk/test/Transforms/SLPVectorizer/X86/resched.ll
>> >>>>     llvm/trunk/test/Transforms/SLPVectorizer/X86/rgb_phi.ll
>> >>>>     llvm/trunk/test/Transforms/SLPVectorizer/X86/saxpy.ll
>> >>>>     llvm/trunk/test/Transforms/SLPVectorizer/X86/schedule-bundle.ll
>> >>>>     llvm/trunk/test/Transforms/SLPVectorizer/X86/sext.ll
>> >>>>     llvm/trunk/test/Transforms/SLPVectorizer/X86/shift-lshr.ll
>> >>>>     llvm/trunk/test/Transforms/SLPVectorizer/X86/shift-shl.ll
>> >>>>     llvm/trunk/test/Transforms/SLPVectorizer/X86/sitofp.ll
>> >>>>     llvm/trunk/test/Transforms/SLPVectorizer/X86/tiny-tree.ll
>> >>>>     llvm/trunk/test/Transforms/SLPVectorizer/X86/uitofp.ll
>> >>>>     llvm/trunk/test/Transforms/SLPVectorizer/X86/vect_copyable_in_binops.ll
>> >>>>     llvm/trunk/test/Transforms/SLPVectorizer/X86/zext.ll
>> >>>>
>> >>>> Modified: llvm/trunk/lib/Target/X86/X86TargetTransformInfo.cpp
>> >>>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/X86/X86TargetTransformInfo.cpp?rev=353923&r1=353922&r2=353923&view=diff
>> >>>> ==============================================================================
>> >>>> --- llvm/trunk/lib/Target/X86/X86TargetTransformInfo.cpp (original)
>> >>>> +++ llvm/trunk/lib/Target/X86/X86TargetTransformInfo.cpp Wed Feb 13 00:26:43 2019
>> >>>> @@ -146,6 +146,13 @@ unsigned X86TTIImpl::getRegisterBitWidth
>> >>>>    return 32;
>> >>>>  }
>> >>>>
>> >>>> +// Use horizontal 128-bit operations, which use low and high
>> >>>> +// 64-bit parts of vector register. This also allows vectorizer
>> >>>> +// to use partial vector operations.
>> >>>> +unsigned X86TTIImpl::getMinVectorRegisterBitWidth() const {
>> >>>> +  return 64;
>> >>>> +}
>> >>>> +
>> >>>>  unsigned X86TTIImpl::getLoadStoreVecRegBitWidth(unsigned) const {
>> >>>>    return getRegisterBitWidth(true);
>> >>>>  }
>> >>>>
>> >>>> Modified: llvm/trunk/lib/Target/X86/X86TargetTransformInfo.h
>> >>>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/X86/X86TargetTransformInfo.h?rev=353923&r1=353922&r2=353923&view=diff
>> >>>> ==============================================================================
>> >>>> --- llvm/trunk/lib/Target/X86/X86TargetTransformInfo.h (original)
>> >>>> +++ llvm/trunk/lib/Target/X86/X86TargetTransformInfo.h Wed Feb 13 00:26:43 2019
>> >>>> @@ -59,6 +59,7 @@ public:
>> >>>>
>> >>>>    unsigned getNumberOfRegisters(bool Vector);
>> >>>>    unsigned getRegisterBitWidth(bool Vector) const;
>> >>>> +  unsigned getMinVectorRegisterBitWidth() const;
>> >>>>    unsigned getLoadStoreVecRegBitWidth(unsigned AS) const;
>> >>>>    unsigned getMaxInterleaveFactor(unsigned VF);
>> >>>>    int getArithmeticInstrCost(
>> >>>>
>> >>>> Modified: llvm/trunk/test/Transforms/SLPVectorizer/X86/addsub.ll
>> >>>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Transforms/SLPVectorizer/X86/addsub.ll?rev=353923&r1=353922&r2=353923&view=diff
>> >>>> ==============================================================================
>> >>>> --- llvm/trunk/test/Transforms/SLPVectorizer/X86/addsub.ll (original)
>> >>>> +++ llvm/trunk/test/Transforms/SLPVectorizer/X86/addsub.ll Wed Feb 13 00:26:43 2019
>> >>>> @@ -348,22 +348,18 @@ define void @reorder_alt_rightsubTree(do
>> >>>>
>> >>>>  define void @no_vec_shuff_reorder() #0 {
>> >>>>  ; CHECK-LABEL: @no_vec_shuff_reorder(
>> >>>> -; CHECK-NEXT:    [[TMP1:%.*]] = load float, float* getelementptr inbounds ([4 x float], [4 x float]* @fb, i32 0, i64 0), align 4
>> >>>> -; CHECK-NEXT:    [[TMP2:%.*]] = load float, float* getelementptr inbounds ([4 x float], [4 x float]* @fa, i32 0, i64 0), align 4
>> >>>> -; CHECK-NEXT:    [[TMP3:%.*]] = fadd float [[TMP1]], [[TMP2]]
>> >>>> -; CHECK-NEXT:    store float [[TMP3]], float* getelementptr inbounds ([4 x float], [4 x float]* @fc, i32 0, i64 0), align 4
>> >>>> -; CHECK-NEXT:    [[TMP4:%.*]] = load float, float* getelementptr inbounds ([4 x float], [4 x float]* @fa, i32 0, i64 1), align 4
>> >>>> -; CHECK-NEXT:    [[TMP5:%.*]] = load float, float* getelementptr inbounds ([4 x float], [4 x float]* @fb, i32 0, i64 1), align 4
>> >>>> -; CHECK-NEXT:    [[TMP6:%.*]] = fsub float [[TMP4]], [[TMP5]]
>> >>>> -; CHECK-NEXT:    store float [[TMP6]], float* getelementptr inbounds ([4 x float], [4 x float]* @fc, i32 0, i64 1), align 4
>> >>>> -; CHECK-NEXT:    [[TMP7:%.*]] = load float, float* getelementptr inbounds ([4 x float], [4 x float]* @fa, i32 0, i64 2), align 4
>> >>>> -; CHECK-NEXT:    [[TMP8:%.*]] = load float, float* getelementptr inbounds ([4 x float], [4 x float]* @fb, i32 0, i64 2), align 4
>> >>>> -; CHECK-NEXT:    [[TMP9:%.*]] = fadd float [[TMP7]], [[TMP8]]
>> >>>> -; CHECK-NEXT:    store float [[TMP9]], float* getelementptr inbounds ([4 x float], [4 x float]* @fc, i32 0, i64 2), align 4
>> >>>> -; CHECK-NEXT:    [[TMP10:%.*]] = load float, float* getelementptr inbounds ([4 x float], [4 x float]* @fb, i32 0, i64 3), align 4
>> >>>> -; CHECK-NEXT:    [[TMP11:%.*]] = load float, float* getelementptr inbounds ([4 x float], [4 x float]* @fa, i32 0, i64 3), align 4
>> >>>> -; CHECK-NEXT:    [[TMP12:%.*]] = fsub float [[TMP10]], [[TMP11]]
>> >>>> -; CHECK-NEXT:    store float [[TMP12]], float* getelementptr inbounds ([4 x float], [4 x float]* @fc, i32 0, i64 3), align 4
>> >>>> +; CHECK-NEXT:    [[TMP1:%.*]] = load <2 x float>, <2 x float>* bitcast ([4 x float]* @fa to <2 x float>*), align 4
>> >>>> +; CHECK-NEXT:    [[TMP2:%.*]] = load <2 x float>, <2 x float>* bitcast ([4 x float]* @fb to <2 x float>*), align 4
>> >>>> +; CHECK-NEXT:    [[TMP3:%.*]] = fadd <2 x float> [[TMP1]], [[TMP2]]
>> >>>> +; CHECK-NEXT:    [[TMP4:%.*]] = fsub <2 x float> [[TMP1]], [[TMP2]]
>> >>>> +; CHECK-NEXT:    [[TMP5:%.*]] = shufflevector <2 x float> [[TMP3]], <2 x float> [[TMP4]], <2 x i32> <i32 0, i32 3>
>> >>>> +; CHECK-NEXT:    store <2 x float> [[TMP5]], <2 x float>* bitcast ([4 x float]* @fc to <2 x float>*), align 4
>> >>>> +; CHECK-NEXT:    [[TMP6:%.*]] = load <2 x float>, <2 x float>* bitcast (float* getelementptr inbounds ([4 x float], [4 x float]* @fb, i32 0, i64 2) to <2 x float>*), align 4
>> >>>> +; CHECK-NEXT:    [[TMP7:%.*]] = load <2 x float>, <2 x float>* bitcast (float* getelementptr inbounds ([4 x float], [4 x float]* @fa, i32 0, i64 2) to <2 x float>*), align 4
>> >>>> +; CHECK-NEXT:    [[TMP8:%.*]] = fadd <2 x float> [[TMP6]], [[TMP7]]
>> >>>> +; CHECK-NEXT:    [[TMP9:%.*]] = fsub <2 x float> [[TMP6]], [[TMP7]]
>> >>>> +; CHECK-NEXT:    [[TMP10:%.*]] = shufflevector <2 x float> [[TMP8]], <2 x float> [[TMP9]], <2 x i32> <i32 0, i32 3>
>> >>>> +; CHECK-NEXT:    store <2 x float> [[TMP10]], <2 x float>* bitcast (float* getelementptr inbounds ([4 x float], [4 x float]* @fc, i32 0, i64 2) to <2 x float>*), align 4
>> >>>>  ; CHECK-NEXT:    ret void
>> >>>>  ;
>> >>>>    %1 = load float, float* getelementptr inbounds ([4 x float], [4 x float]* @fb, i32 0, i64 0), align 4
>> >>>>
>> >>>> Modified: llvm/trunk/test/Transforms/SLPVectorizer/X86/alternate-fp.ll
>> >>>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Transforms/SLPVectorizer/X86/alternate-fp.ll?rev=353923&r1=353922&r2=353923&view=diff
>> >>>> ==============================================================================
>> >>>> --- llvm/trunk/test/Transforms/SLPVectorizer/X86/alternate-fp.ll (original)
>> >>>> +++ llvm/trunk/test/Transforms/SLPVectorizer/X86/alternate-fp.ll Wed Feb 13 00:26:43 2019
>> >>>> @@ -125,14 +125,15 @@ define <4 x float> @fmul_fdiv_v4f32_cons
>> >>>>  ; SSE-NEXT:    ret <4 x float> [[TMP1]]
>> >>>>  ;
>> >>>>  ; SLM-LABEL: @fmul_fdiv_v4f32_const(
>> >>>> -; SLM-NEXT:    [[A0:%.*]] = extractelement <4 x float> [[A:%.*]], i32 0
>> >>>> -; SLM-NEXT:    [[A1:%.*]] = extractelement <4 x float> [[A]], i32 1
>> >>>> -; SLM-NEXT:    [[A2:%.*]] = extractelement <4 x float> [[A]], i32 2
>> >>>> +; SLM-NEXT:    [[A2:%.*]] = extractelement <4 x float> [[A:%.*]], i32 2
>> >>>>  ; SLM-NEXT:    [[A3:%.*]] = extractelement <4 x float> [[A]], i32 3
>> >>>> -; SLM-NEXT:    [[AB0:%.*]] = fmul float [[A0]], 2.000000e+00
>> >>>> +; SLM-NEXT:    [[TMP1:%.*]] = shufflevector <4 x float> [[A]], <4 x float> undef, <2 x i32> <i32 0, i32 1>
>> >>>> +; SLM-NEXT:    [[TMP2:%.*]] = fmul <2 x float> [[TMP1]], <float 2.000000e+00, float 1.000000e+00>
>> >>>>  ; SLM-NEXT:    [[AB3:%.*]] = fmul float [[A3]], 2.000000e+00
>> >>>> -; SLM-NEXT:    [[R0:%.*]] = insertelement <4 x float> undef, float [[AB0]], i32 0
>> >>>> -; SLM-NEXT:    [[R1:%.*]] = insertelement <4 x float> [[R0]], float [[A1]], i32 1
>> >>>> +; SLM-NEXT:    [[TMP3:%.*]] = extractelement <2 x float> [[TMP2]], i32 0
>> >>>> +; SLM-NEXT:    [[R0:%.*]] = insertelement <4 x float> undef, float [[TMP3]], i32 0
>> >>>> +; SLM-NEXT:    [[TMP4:%.*]] = extractelement <2 x float> [[TMP2]], i32 1
>> >>>> +; SLM-NEXT:    [[R1:%.*]] = insertelement <4 x float> [[R0]], float [[TMP4]], i32 1
>> >>>>  ; SLM-NEXT:    [[R2:%.*]] = insertelement <4 x float> [[R1]], float [[A2]], i32 2
>> >>>>  ; SLM-NEXT:    [[R3:%.*]] = insertelement <4 x float> [[R2]], float [[AB3]], i32 3
>> >>>>  ; SLM-NEXT:    ret <4 x float> [[R3]]
>> >>>>
>> >>>> Modified: llvm/trunk/test/Transforms/SLPVectorizer/X86/alternate-int.ll
>> >>>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Transforms/SLPVectorizer/X86/alternate-int.ll?rev=353923&r1=353922&r2=353923&view=diff
>> >>>> ==============================================================================
>> >>>> --- llvm/trunk/test/Transforms/SLPVectorizer/X86/alternate-int.ll (original)
>> >>>> +++ llvm/trunk/test/Transforms/SLPVectorizer/X86/alternate-int.ll Wed Feb 13 00:26:43 2019
>> >>>> @@ -83,20 +83,17 @@ define <4 x i32> @add_mul_v4i32(<4 x i32
>> >>>>  ;
>> >>>>  ; SLM-LABEL: @add_mul_v4i32(
>> >>>>  ; SLM-NEXT:    [[A0:%.*]] = extractelement <4 x i32> [[A:%.*]], i32 0
>> >>>> -; SLM-NEXT:    [[A1:%.*]] = extractelement <4 x i32> [[A]], i32 1
>> >>>> -; SLM-NEXT:    [[A2:%.*]] = extractelement <4 x i32> [[A]], i32 2
>> >>>>  ; SLM-NEXT:    [[A3:%.*]] = extractelement <4 x i32> [[A]], i32 3
>> >>>>  ; SLM-NEXT:    [[B0:%.*]] = extractelement <4 x i32> [[B:%.*]], i32 0
>> >>>> -; SLM-NEXT:    [[B1:%.*]] = extractelement <4 x i32> [[B]], i32 1
>> >>>> -; SLM-NEXT:    [[B2:%.*]] = extractelement <4 x i32> [[B]], i32 2
>> >>>>  ; SLM-NEXT:    [[B3:%.*]] = extractelement <4 x i32> [[B]], i32 3
>> >>>>  ; SLM-NEXT:    [[AB0:%.*]] = mul i32 [[A0]], [[B0]]
>> >>>> -; SLM-NEXT:    [[AB1:%.*]] = add i32 [[A1]], [[B1]]
>> >>>> -; SLM-NEXT:    [[AB2:%.*]] = add i32 [[A2]], [[B2]]
>> >>>> +; SLM-NEXT:    [[TMP1:%.*]] = add <4 x i32> [[A]], [[B]]
>> >>>>  ; SLM-NEXT:    [[AB3:%.*]] = mul i32 [[A3]], [[B3]]
>> >>>>  ; SLM-NEXT:    [[R0:%.*]] = insertelement <4 x i32> undef, i32 [[AB0]], i32 0
>> >>>> -; SLM-NEXT:    [[R1:%.*]] = insertelement <4 x i32> [[R0]], i32 [[AB1]], i32 1
>> >>>> -; SLM-NEXT:    [[R2:%.*]] = insertelement <4 x i32> [[R1]], i32 [[AB2]], i32 2
>> >>>> +; SLM-NEXT:    [[TMP2:%.*]] = extractelement <4 x i32> [[TMP1]], i32 1
>> >>>> +; SLM-NEXT:    [[R1:%.*]] = insertelement <4 x i32> [[R0]], i32 [[TMP2]], i32 1
>> >>>> +; SLM-NEXT:    [[TMP3:%.*]] = extractelement <4 x i32> [[TMP1]], i32 2
>> >>>> +; SLM-NEXT:    [[R2:%.*]] = insertelement <4 x i32> [[R1]], i32 [[TMP3]], i32 2
>> >>>>  ; SLM-NEXT:    [[R3:%.*]] = insertelement <4 x i32> [[R2]], i32 [[AB3]], i32 3
>> >>>>  ; SLM-NEXT:    ret <4 x i32> [[R3]]
>> >>>>  ;
>> >>>> @@ -274,34 +271,28 @@ define <8 x i32> @ashr_lshr_shl_v8i32(<8
>> >>>>  ; SSE-LABEL: @ashr_lshr_shl_v8i32(
>> >>>>  ; SSE-NEXT:    [[A0:%.*]] = extractelement <8 x i32> [[A:%.*]], i32 0
>> >>>>  ; SSE-NEXT:    [[A1:%.*]] = extractelement <8 x i32> [[A]], i32 1
>> >>>> -; SSE-NEXT:    [[A2:%.*]] = extractelement <8 x i32> [[A]], i32 2
>> >>>> -; SSE-NEXT:    [[A3:%.*]] = extractelement <8 x i32> [[A]], i32 3
>> >>>> -; SSE-NEXT:    [[A4:%.*]] = extractelement <8 x i32> [[A]], i32 4
>> >>>> -; SSE-NEXT:    [[A5:%.*]] = extractelement <8 x i32> [[A]], i32 5
>> >>>>  ; SSE-NEXT:    [[A6:%.*]] = extractelement <8 x i32> [[A]], i32 6
>> >>>>  ; SSE-NEXT:    [[A7:%.*]] = extractelement <8 x i32> [[A]], i32 7
>> >>>>  ; SSE-NEXT:    [[B0:%.*]] = extractelement <8 x i32> [[B:%.*]], i32 0
>> >>>>  ; SSE-NEXT:    [[B1:%.*]] = extractelement <8 x i32> [[B]], i32 1
>> >>>> -; SSE-NEXT:    [[B2:%.*]] = extractelement <8 x i32> [[B]], i32 2
>> >>>> -; SSE-NEXT:    [[B3:%.*]] = extractelement <8 x i32> [[B]], i32 3
>> >>>> -; SSE-NEXT:    [[B4:%.*]] = extractelement <8 x i32> [[B]], i32 4
>> >>>> -; SSE-NEXT:    [[B5:%.*]] = extractelement <8 x i32> [[B]], i32 5
>> >>>>  ; SSE-NEXT:    [[B6:%.*]] = extractelement <8 x i32> [[B]], i32 6
>> >>>>  ; SSE-NEXT:    [[B7:%.*]] = extractelement <8 x i32> [[B]], i32 7
>> >>>>  ; SSE-NEXT:    [[AB0:%.*]] = ashr i32 [[A0]], [[B0]]
>> >>>>  ; SSE-NEXT:    [[AB1:%.*]] = ashr i32 [[A1]], [[B1]]
>> >>>> -; SSE-NEXT:    [[AB2:%.*]] = lshr i32 [[A2]], [[B2]]
>> >>>> -; SSE-NEXT:    [[AB3:%.*]] = lshr i32 [[A3]], [[B3]]
>> >>>> -; SSE-NEXT:    [[AB4:%.*]] = lshr i32 [[A4]], [[B4]]
>> >>>> -; SSE-NEXT:    [[AB5:%.*]] = lshr i32 [[A5]], [[B5]]
>> >>>> +; SSE-NEXT:    [[TMP1:%.*]] = lshr <8 x i32> [[A]], [[B]]
>> >>>> +; SSE-NEXT:    [[TMP2:%.*]] = lshr <8 x i32> [[A]], [[B]]
>> >>>>  ; SSE-NEXT:    [[AB6:%.*]] = shl i32 [[A6]], [[B6]]
>> >>>>  ; SSE-NEXT:    [[AB7:%.*]] = shl i32 [[A7]], [[B7]]
>> >>>>  ; SSE-NEXT:    [[R0:%.*]] = insertelement <8 x i32> undef, i32 [[AB0]], i32 0
>> >>>>  ; SSE-NEXT:    [[R1:%.*]] = insertelement <8 x i32> [[R0]], i32 [[AB1]], i32 1
>> >>>> -; SSE-NEXT:    [[R2:%.*]] = insertelement <8 x i32> [[R1]], i32 [[AB2]], i32 2
>> >>>> -; SSE-NEXT:    [[R3:%.*]] = insertelement <8 x i32> [[R2]], i32 [[AB3]], i32 3
>> >>>> -; SSE-NEXT:    [[R4:%.*]] = insertelement <8 x i32> [[R3]], i32 [[AB4]], i32 4
>> >>>> -; SSE-NEXT:    [[R5:%.*]] = insertelement <8 x i32> [[R4]], i32 [[AB5]], i32 5
>> >>>> +; SSE-NEXT:    [[TMP3:%.*]] = extractelement <8 x i32> [[TMP1]], i32 2
>> >>>> +; SSE-NEXT:    [[R2:%.*]] = insertelement <8 x i32> [[R1]], i32 [[TMP3]], i32 2
>> >>>> +; SSE-NEXT:    [[TMP4:%.*]] = extractelement <8 x i32> [[TMP1]], i32 3
>> >>>> +; SSE-NEXT:    [[R3:%.*]] = insertelement <8 x i32> [[R2]], i32 [[TMP4]], i32 3
>> >>>> +; SSE-NEXT:    [[TMP5:%.*]] = extractelement <8 x i32> [[TMP2]], i32 4
>> >>>> +; SSE-NEXT:    [[R4:%.*]] = insertelement <8 x i32> [[R3]], i32 [[TMP5]], i32 4
>> >>>> +; SSE-NEXT:    [[TMP6:%.*]] = extractelement <8 x i32> [[TMP2]], i32 5
>> >>>> +; SSE-NEXT:    [[R5:%.*]] = insertelement <8 x i32> [[R4]], i32 [[TMP6]], i32 5
>> >>>>  ; SSE-NEXT:    [[R6:%.*]] = insertelement <8 x i32> [[R5]], i32 [[AB6]], i32 6
>> >>>>  ; SSE-NEXT:    [[R7:%.*]] = insertelement <8 x i32> [[R6]], i32 [[AB7]], i32 7
>> >>>>  ; SSE-NEXT:    ret <8 x i32> [[R7]]
>> >>>> @@ -486,26 +477,110 @@ define <8 x i32> @add_v8i32_undefs(<8 x
>> >>>>  }
>> >>>>
>> >>>>  define <8 x i32> @sdiv_v8i32_undefs(<8 x i32> %a) {
>> >>>> -; CHECK-LABEL: @sdiv_v8i32_undefs(
>> >>>> -; CHECK-NEXT:    [[A1:%.*]] = extractelement <8 x i32> [[A:%.*]], i32 1
>> >>>> -; CHECK-NEXT:    [[A2:%.*]] = extractelement <8 x i32> [[A]], i32 2
>> >>>> -; CHECK-NEXT:    [[A3:%.*]] = extractelement <8 x i32> [[A]], i32 3
>> >>>> -; CHECK-NEXT:    [[A5:%.*]] = extractelement <8 x i32> [[A]], i32 5
>> >>>> -; CHECK-NEXT:    [[A6:%.*]] = extractelement <8 x i32> [[A]], i32 6
>> >>>> -; CHECK-NEXT:    [[A7:%.*]] = extractelement <8 x i32> [[A]], i32 7
>> >>>> -; CHECK-NEXT:    [[AB1:%.*]] = sdiv i32 [[A1]], 4
>> >>>> -; CHECK-NEXT:    [[AB2:%.*]] = sdiv i32 [[A2]], 8
>> >>>> -; CHECK-NEXT:    [[AB3:%.*]] = sdiv i32 [[A3]], 16
>> >>>> -; CHECK-NEXT:    [[AB5:%.*]] = sdiv i32 [[A5]], 4
>> >>>> -; CHECK-NEXT:    [[AB6:%.*]] = sdiv i32 [[A6]], 8
>> >>>> -; CHECK-NEXT:    [[AB7:%.*]] = sdiv i32 [[A7]], 16
>> >>>> -; CHECK-NEXT:    [[R1:%.*]] = insertelement <8 x i32> undef, i32 [[AB1]], i32 1
>> >>>> -; CHECK-NEXT:    [[R2:%.*]] = insertelement <8 x i32> [[R1]], i32 [[AB2]], i32 2
>> >>>> -; CHECK-NEXT:    [[R3:%.*]] = insertelement <8 x i32> [[R2]], i32 [[AB3]], i32 3
>> >>>> -; CHECK-NEXT:    [[R5:%.*]] = insertelement <8 x i32> [[R3]], i32 [[AB5]], i32 5
>> >>>> -; CHECK-NEXT:    [[R6:%.*]] = insertelement <8 x i32> [[R5]], i32 [[AB6]], i32 6
>> >>>> -; CHECK-NEXT:    [[R7:%.*]] = insertelement <8 x i32> [[R6]], i32 [[AB7]], i32 7
>> >>>> -; CHECK-NEXT:    ret <8 x i32> [[R7]]
>> >>>> +; SSE-LABEL: @sdiv_v8i32_undefs(
>> >>>> +; SSE-NEXT:    [[A1:%.*]] = extractelement <8 x i32> [[A:%.*]], i32 1
>> >>>> +; SSE-NEXT:    [[A2:%.*]] = extractelement <8 x i32> [[A]], i32 2
>> >>>> +; SSE-NEXT:    [[A3:%.*]] = extractelement <8 x i32> [[A]], i32 3
>> >>>> +; SSE-NEXT:    [[A5:%.*]] = extractelement <8 x i32> [[A]], i32 5
>> >>>> +; SSE-NEXT:    [[A6:%.*]] = extractelement <8 x i32> [[A]], i32 6
>> >>>> +; SSE-NEXT:    [[A7:%.*]] = extractelement <8 x i32> [[A]], i32 7
>> >>>> +; SSE-NEXT:    [[AB1:%.*]] = sdiv i32 [[A1]], 4
>> >>>> +; SSE-NEXT:    [[AB2:%.*]] = sdiv i32 [[A2]], 8
>> >>>> +; SSE-NEXT:    [[AB3:%.*]] = sdiv i32 [[A3]], 16
>> >>>> +; SSE-NEXT:    [[AB5:%.*]] = sdiv i32 [[A5]], 4
>> >>>> +; SSE-NEXT:    [[AB6:%.*]] = sdiv i32 [[A6]], 8
>> >>>> +; SSE-NEXT:    [[AB7:%.*]] = sdiv i32 [[A7]], 16
>> >>>> +; SSE-NEXT:    [[R1:%.*]] = insertelement <8 x i32> undef, i32 [[AB1]], i32 1
>> >>>> +; SSE-NEXT:    [[R2:%.*]] = insertelement <8 x i32> [[R1]], i32 [[AB2]], i32 2
>> >>>> +; SSE-NEXT:    [[R3:%.*]] = insertelement <8 x i32> [[R2]], i32 [[AB3]], i32 3
>> >>>> +; SSE-NEXT:    [[R5:%.*]] = insertelement <8 x i32> [[R3]], i32 [[AB5]], i32 5
>> >>>> +; SSE-NEXT:    [[R6:%.*]] = insertelement <8 x i32> [[R5]], i32 [[AB6]], i32 6
>> >>>> +; SSE-NEXT:    [[R7:%.*]] = insertelement <8 x i32> [[R6]], i32 [[AB7]], i32 7
>> >>>> +; SSE-NEXT:    ret <8 x i32> [[R7]]
>> >>>> +;
>> >>>> +; SLM-LABEL: @sdiv_v8i32_undefs(
>> >>>> +; SLM-NEXT:    [[A1:%.*]] = extractelement <8 x i32> [[A:%.*]], i32 1
>> >>>> +; SLM-NEXT:    [[A2:%.*]] = extractelement <8 x i32> [[A]], i32 2
>> >>>> +; SLM-NEXT:    [[A3:%.*]] = extractelement <8 x i32> [[A]], i32 3
>> >>>> +; SLM-NEXT:    [[A5:%.*]] = extractelement <8 x i32> [[A]], i32 5
>> >>>> +; SLM-NEXT:    [[A6:%.*]] = extractelement <8 x i32> [[A]], i32 6
>> >>>> +; SLM-NEXT:    [[A7:%.*]] = extractelement <8 x i32> [[A]], i32 7
>> >>>> +; SLM-NEXT:    [[AB1:%.*]] = sdiv i32 [[A1]], 4
>> >>>> +; SLM-NEXT:    [[AB2:%.*]] = sdiv i32 [[A2]], 8
>> >>>> +; SLM-NEXT:    [[AB3:%.*]] = sdiv i32 [[A3]], 16
>> >>>> +; SLM-NEXT:    [[AB5:%.*]] = sdiv i32 [[A5]], 4
>> >>>> +; SLM-NEXT:    [[AB6:%.*]] = sdiv i32 [[A6]], 8
>> >>>> +; SLM-NEXT:    [[AB7:%.*]] = sdiv i32 [[A7]], 16
>> >>>> +; SLM-NEXT:    [[R1:%.*]] = insertelement <8 x i32> undef, i32 [[AB1]], i32 1
>> >>>> +; SLM-NEXT:    [[R2:%.*]] = insertelement <8 x i32> [[R1]], i32 [[AB2]], i32 2
>> >>>> +; SLM-NEXT:    [[R3:%.*]] = insertelement <8 x i32> [[R2]], i32 [[AB3]], i32 3
>> >>>> +; SLM-NEXT:    [[R5:%.*]] = insertelement <8 x i32> [[R3]], i32 [[AB5]], i32 5
>> >>>> +; SLM-NEXT:    [[R6:%.*]] = insertelement <8 x i32> [[R5]], i32 [[AB6]], i32 6
>> >>>> +; SLM-NEXT:    [[R7:%.*]] = insertelement <8 x i32> [[R6]], i32 [[AB7]], i32 7
>> >>>> +; SLM-NEXT:    ret <8 x i32> [[R7]]
>> >>>> +;
>> >>>> +; AVX1-LABEL: @sdiv_v8i32_undefs(
>> >>>> +; AVX1-NEXT:    [[A1:%.*]] = extractelement <8 x i32> [[A:%.*]], i32 1
>> >>>> +; AVX1-NEXT:    [[A2:%.*]] = extractelement <8 x i32> [[A]], i32 2
>> >>>> +; AVX1-NEXT:    [[A3:%.*]] = extractelement <8 x i32> [[A]], i32 3
>> >>>> +; AVX1-NEXT:    [[A5:%.*]] = extractelement <8 x i32> [[A]], i32 5
>> >>>> +; AVX1-NEXT:    [[A6:%.*]] = extractelement <8 x i32> [[A]], i32 6
>> >>>> +; AVX1-NEXT:    [[A7:%.*]] = extractelement <8 x i32> [[A]], i32 7
>> >>>> +; AVX1-NEXT:    [[AB1:%.*]] = sdiv i32 [[A1]], 4
>> >>>> +; AVX1-NEXT:    [[AB2:%.*]] = sdiv i32 [[A2]], 8
>> >>>> +; AVX1-NEXT:    [[AB3:%.*]] = sdiv i32 [[A3]], 16
>> >>>> +; AVX1-NEXT:    [[AB5:%.*]] = sdiv i32 [[A5]], 4
>> >>>> +; AVX1-NEXT:    [[AB6:%.*]] = sdiv i32 [[A6]], 8
>> >>>> +; AVX1-NEXT:    [[AB7:%.*]] = sdiv i32 [[A7]], 16
>> >>>> +; AVX1-NEXT:    [[R1:%.*]] = insertelement <8 x i32> undef, i32 [[AB1]], i32 1
>> >>>> +; AVX1-NEXT:    [[R2:%.*]] = insertelement <8 x i32> [[R1]], i32 [[AB2]], i32 2
>> >>>> +; AVX1-NEXT:    [[R3:%.*]] = insertelement <8 x i32> [[R2]], i32 [[AB3]], i32 3
>> >>>> +; AVX1-NEXT:    [[R5:%.*]] = insertelement <8 x i32> [[R3]], i32 [[AB5]], i32 5
>> >>>> +; AVX1-NEXT:    [[R6:%.*]] = insertelement <8 x i32> [[R5]], i32 [[AB6]], i32 6
>> >>>> +; AVX1-NEXT:    [[R7:%.*]] = insertelement <8 x i32> [[R6]], i32 [[AB7]], i32 7
>> >>>> +; AVX1-NEXT:    ret <8 x i32> [[R7]]
>> >>>> +;
>> >>>> +; AVX2-LABEL: @sdiv_v8i32_undefs(
>> >>>> +; AVX2-NEXT:    [[A3:%.*]] = extractelement <8 x i32> [[A:%.*]], i32 3
>> >>>> +; AVX2-NEXT:    [[A7:%.*]] = extractelement <8 x i32> [[A]], i32 7
>> >>>> +; AVX2-NEXT:    [[TMP1:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> undef, <2 x i32> <i32 1, i32 2>
>> >>>> +; AVX2-NEXT:    [[TMP2:%.*]] = sdiv <2 x i32> [[TMP1]], <i32 4, i32 8>
>> >>>> +; AVX2-NEXT:    [[AB3:%.*]] = sdiv i32 [[A3]], 16
>> >>>> +; AVX2-NEXT:    [[TMP3:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> undef, <2 x i32> <i32 5, i32 6>
>> >>>> +; AVX2-NEXT:    [[TMP4:%.*]] = sdiv <2 x i32> [[TMP3]], <i32 4, i32 8>
>> >>>> +; AVX2-NEXT:    [[AB7:%.*]] = sdiv i32 [[A7]], 16
>> >>>> +; AVX2-NEXT:    [[TMP5:%.*]] = extractelement <2 x i32> [[TMP2]], i32 0
>> >>>> +; AVX2-NEXT:    [[R1:%.*]] = insertelement <8 x i32> undef, i32 [[TMP5]], i32 1
>> >>>> +; AVX2-NEXT:    [[TMP6:%.*]] = extractelement <2 x i32> [[TMP2]], i32 1
>> >>>> +; AVX2-NEXT:    [[R2:%.*]] = insertelement <8 x i32> [[R1]], i32 [[TMP6]], i32 2
>> >>>> +; AVX2-NEXT:    [[R3:%.*]] = insertelement <8 x i32> [[R2]], i32 [[AB3]], i32 3
>> >>>> +; AVX2-NEXT:    [[TMP7:%.*]] = extractelement <2 x i32> [[TMP4]], i32 0
>> >>>> +; AVX2-NEXT:    [[R5:%.*]] = insertelement <8 x i32> [[R3]], i32 [[TMP7]], i32 5
>> >>>> +; AVX2-NEXT:    [[TMP8:%.*]] = extractelement <2 x i32> [[TMP4]], i32 1
>> >>>> +; AVX2-NEXT:    [[R6:%.*]] = insertelement <8 x i32> [[R5]], i32 [[TMP8]], i32 6
>> >>>> +; AVX2-NEXT:    [[R7:%.*]] = insertelement <8 x i32> [[R6]], i32 [[AB7]], i32 7
>> >>>> +; AVX2-NEXT:    ret <8 x i32> [[R7]]
>> >>>> +;
>> >>>> +; AVX512-LABEL: @sdiv_v8i32_undefs(
>> >>>> +; AVX512-NEXT:    [[A3:%.*]] = extractelement <8 x i32> [[A:%.*]], i32 3
>> >>>> +; AVX512-NEXT:    [[A7:%.*]] = extractelement <8 x i32> [[A]], i32 7
>> >>>> +; AVX512-NEXT:    [[TMP1:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> undef, <2 x i32> <i32 1, i32 2>
>> >>>> +; AVX512-NEXT:    [[TMP2:%.*]] = sdiv <2 x i32> [[TMP1]], <i32 4, i32 8>
>> >>>> +; AVX512-NEXT:    [[AB3:%.*]] = sdiv i32 [[A3]], 16
>> >>>> +; AVX512-NEXT:    [[TMP3:%.*]] = shufflevector <8 x i32> [[A]], <8 x i32> undef, <2 x i32> <i32 5, i32 6>
>> >>>> +; AVX512-NEXT:    [[TMP4:%.*]] = sdiv <2 x i32> [[TMP3]], <i32 4, i32 8>
>> >>>> +; AVX512-NEXT:    [[AB7:%.*]] = sdiv i32 [[A7]], 16
>> >>>> +; AVX512-NEXT:    [[TMP5:%.*]] = extractelement <2 x i32> [[TMP2]], i32 0
>> >>>> +; AVX512-NEXT:    [[R1:%.*]] = insertelement <8 x i32> undef, i32 [[TMP5]], i32 1
>> >>>> +; AVX512-NEXT:    [[TMP6:%.*]] = extractelement <2 x i32> [[TMP2]], i32 1
>> >>>> +; AVX512-NEXT:    [[R2:%.*]] = insertelement <8 x i32> [[R1]], i32 [[TMP6]], i32 2
>> >>>> +; AVX512-NEXT:    [[R3:%.*]] = insertelement <8 x i32> [[R2]], i32 [[AB3]], i32 3
>> >>>> +; AVX512-NEXT:    [[TMP7:%.*]] = extractelement <2 x i32> [[TMP4]], i32 0
>> >>>> +; AVX512-NEXT:    [[R5:%.*]] = insertelement <8 x i32> [[R3]], i32 [[TMP7]], i32 5
>> >>>> +; AVX512-NEXT:    [[TMP8:%.*]] = extractelement <2 x i32> [[TMP4]], i32 1
>> >>>> +; AVX512-NEXT:    [[R6:%.*]] = insertelement <8 x i32> [[R5]], i32 [[TMP8]], i32 6
>> >>>> +; AVX512-NEXT:    [[R7:%.*]] = insertelement <8 x i32> [[R6]], i32 [[AB7]], i32 7
>> >>>> +; AVX512-NEXT:    ret <8 x i32> [[R7]]
>> >>>>  ;
>> >>>>    %a0 = extractelement <8 x i32> %a, i32 0
>> >>>>    %a1 = extractelement <8 x i32> %a, i32 1
>> >>>>
>> >>>> Modified: llvm/trunk/test/Transforms/SLPVectorizer/X86/crash_7zip.ll
>> >>>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Transforms/SLPVectorizer/X86/crash_7zip.ll?rev=353923&r1=353922&r2=353923&view=diff
>> >>>> ==============================================================================
>> >>>> --- llvm/trunk/test/Transforms/SLPVectorizer/X86/crash_7zip.ll (original)
>> >>>> +++ llvm/trunk/test/Transforms/SLPVectorizer/X86/crash_7zip.ll Wed Feb 13 00:26:43 2019
>> >>>> @@ -11,27 +11,23 @@ define fastcc void @LzmaDec_DecodeReal2(
>> >>>>  ; CHECK-LABEL: @LzmaDec_DecodeReal2(
>> >>>>  ; CHECK-NEXT:  entry:
>> >>>>  ; CHECK-NEXT:    [[RANGE20_I:%.*]] = getelementptr inbounds [[STRUCT_CLZMADEC_1_28_55_82_103_124_145_166_181_196_229_259_334:%.*]], %struct.CLzmaDec.1.28.55.82.103.124.145.166.181.196.229.259.334* [[P:%.*]], i64 0, i32 4
>> >>>> -; CHECK-NEXT:    [[CODE21_I:%.*]] = getelementptr inbounds [[STRUCT_CLZMADEC_1_28_55_82_103_124_145_166_181_196_229_259_334]], %struct.CLzmaDec.1.28.55.82.103.124.145.166.181.196.229.259.334* [[P]], i64 0, i32 5
>> >>>>  ; CHECK-NEXT:    br label [[DO_BODY66_I:%.*]]
>> >>>>  ; CHECK:       do.body66.i:
>> >>>> -; CHECK-NEXT:    [[RANGE_2_I:%.*]] = phi i32 [ [[RANGE_4_I:%.*]], [[DO_COND_I:%.*]] ], [ undef, [[ENTRY:%.*]] ]
>> >>>> -; CHECK-NEXT:    [[CODE_2_I:%.*]] = phi i32 [ [[CODE_4_I:%.*]], [[DO_COND_I]] ], [ undef, [[ENTRY]] ]
>> >>>> -; CHECK-NEXT:    [[DOTRANGE_2_I:%.*]] = select i1 undef, i32 undef, i32 [[RANGE_2_I]]
>> >>>> -; CHECK-NEXT:    [[DOTCODE_2_I:%.*]] = select i1 undef, i32 undef, i32 [[CODE_2_I]]
>> >>>> +; CHECK-NEXT:    [[TMP0:%.*]] = phi <2 x i32> [ [[TMP5:%.*]], [[DO_COND_I:%.*]] ], [ undef, [[ENTRY:%.*]] ]
>> >>>> +; CHECK-NEXT:    [[TMP1:%.*]] = select <2 x i1> undef, <2 x i32> undef, <2 x i32> [[TMP0]]
>> >>>> +; CHECK-NEXT:    [[TMP2:%.*]] = extractelement <2 x i32> [[TMP1]], i32 1
>> >>>> +; CHECK-NEXT:    [[TMP3:%.*]] = insertelement <2 x i32> undef, i32 [[TMP2]], i32 1
>> >>>>  ; CHECK-NEXT:    br i1 undef, label [[DO_COND_I]], label [[IF_ELSE_I:%.*]]
>> >>>>  ; CHECK:       if.else.i:
>> >>>> -; CHECK-NEXT:    [[SUB91_I:%.*]] = sub i32 [[DOTRANGE_2_I]], undef
>> >>>> -; CHECK-NEXT:    [[SUB92_I:%.*]] = sub i32 [[DOTCODE_2_I]], undef
>> >>>> +; CHECK-NEXT:    [[TMP4:%.*]] = sub <2 x i32> [[TMP1]], undef
>> >>>>  ; CHECK-NEXT:    br label [[DO_COND_I]]
>> >>>>  ; CHECK:       do.cond.i:
>> >>>> -; CHECK-NEXT:    [[RANGE_4_I]] = phi i32 [ [[SUB91_I]], [[IF_ELSE_I]] ], [ undef, [[DO_BODY66_I]] ]
>> >>>> -; CHECK-NEXT:    [[CODE_4_I]] = phi i32 [ [[SUB92_I]], [[IF_ELSE_I]] ], [ [[DOTCODE_2_I]], [[DO_BODY66_I]] ]
>> >>>> +; CHECK-NEXT:    [[TMP5]] = phi <2 x i32> [ [[TMP4]], [[IF_ELSE_I]] ], [ [[TMP3]], [[DO_BODY66_I]] ]
>> >>>>  ; CHECK-NEXT:    br i1 undef, label [[DO_BODY66_I]], label [[DO_END1006_I:%.*]]
>> >>>>  ; CHECK:       do.end1006.i:
>> >>>> -; CHECK-NEXT:    [[DOTRANGE_4_I:%.*]] = select i1 undef, i32 undef, i32 [[RANGE_4_I]]
>> >>>> -; CHECK-NEXT:    [[DOTCODE_4_I:%.*]] = select i1 undef, i32 undef, i32 [[CODE_4_I]]
>> >>>> -; CHECK-NEXT:    store i32 [[DOTRANGE_4_I]], i32* [[RANGE20_I]], align 4
>> >>>> -; CHECK-NEXT:    store i32 [[DOTCODE_4_I]], i32* [[CODE21_I]], align 4
>> >>>> +; CHECK-NEXT:    [[TMP6:%.*]] = select <2 x i1> undef, <2 x i32> undef, <2 x i32> [[TMP5]]
>> >>>> +; CHECK-NEXT:    [[TMP7:%.*]] = bitcast i32* [[RANGE20_I]] to <2 x i32>*
>> >>>> +; CHECK-NEXT:    store <2 x i32> [[TMP6]], <2 x i32>* [[TMP7]], align 4
>> >>>>  ; CHECK-NEXT:    ret void
>> >>>>  ;
>> >>>>  entry:
>> >>>>
>> >>>> Modified: llvm/trunk/test/Transforms/SLPVectorizer/X86/crash_bullet.ll
>> >>>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Transforms/SLPVectorizer/X86/crash_bullet.ll?rev=353923&r1=353922&r2=353923&view=diff
>> >>>> ==============================================================================
>> >>>> --- llvm/trunk/test/Transforms/SLPVectorizer/X86/crash_bullet.ll (original)
>> >>>> +++ llvm/trunk/test/Transforms/SLPVectorizer/X86/crash_bullet.ll Wed Feb 13 00:26:43 2019
>> >>>> @@ -14,23 +14,18 @@ define void @_ZN23btGeneric6DofConstrain
>> >>>>  ; CHECK-NEXT:    ret void
>> >>>>  ; CHECK:       if.else:
>> >>>>  ; CHECK-NEXT:    [[M_NUMCONSTRAINTROWS4:%.*]] = getelementptr inbounds %"struct.btTypedConstraint::btConstraintInfo1.17.157.357.417.477.960", %"struct.btTypedConstraint::btConstraintInfo1.17.157.357.417.477.960"* [[INFO:%.*]], i64 0, i32 0
>> >>>> -; CHECK-NEXT:    [[NUB5:%.*]] = getelementptr inbounds %"struct.btTypedConstraint::btConstraintInfo1.17.157.357.417.477.960", %"struct.btTypedConstraint::btConstraintInfo1.17.157.357.417.477.960"* [[INFO]], i64 0, i32 1
>> >>>>  ; CHECK-NEXT:    br i1 undef, label [[LAND_LHS_TRUE_I_1:%.*]], label [[IF_THEN7_1:%.*]]
>> >>>>  ; CHECK:       land.lhs.true.i.1:
>> >>>>  ; CHECK-NEXT:    br i1 undef, label [[FOR_INC_1:%.*]], label [[IF_THEN7_1]]
>> >>>>  ; CHECK:       if.then7.1:
>> >>>> -; CHECK-NEXT:    [[INC_1:%.*]] = add nsw i32 0, 1
>> >>>> -; CHECK-NEXT:    store i32 [[INC_1]], i32* [[M_NUMCONSTRAINTROWS4]], align 4
>> >>>> -; CHECK-NEXT:    [[DEC_1:%.*]] = add nsw i32 6, -1
>> >>>> -; CHECK-NEXT:    store i32 [[DEC_1]], i32* [[NUB5]], align 4
>> >>>> +; CHECK-NEXT:    [[TMP0:%.*]] = bitcast i32* [[M_NUMCONSTRAINTROWS4]] to <2 x i32>*
>> >>>> +; CHECK-NEXT:    store <2 x i32> <i32 1, i32 5>, <2 x i32>* [[TMP0]], align 4
>> >>>>  ; CHECK-NEXT:    br label [[FOR_INC_1]]
>> >>>>  ; CHECK:       for.inc.1:
>> >>>> -; CHECK-NEXT:    [[TMP0:%.*]] = phi i32 [ [[DEC_1]], [[IF_THEN7_1]] ], [ 6, [[LAND_LHS_TRUE_I_1]] ]
>> >>>> -; CHECK-NEXT:    [[TMP1:%.*]] = phi i32 [ [[INC_1]], [[IF_THEN7_1]] ], [ 0, [[LAND_LHS_TRUE_I_1]] ]
>> >>>> -; CHECK-NEXT:    [[INC_2:%.*]] = add nsw i32 [[TMP1]], 1
>> >>>> -; CHECK-NEXT:    store i32 [[INC_2]], i32* [[M_NUMCONSTRAINTROWS4]], align 4
>> >>>> -; CHECK-NEXT:    [[DEC_2:%.*]] = add nsw i32 [[TMP0]], -1
>> >>>> -; CHECK-NEXT:    store i32 [[DEC_2]], i32* [[NUB5]], align 4
>> >>>> +; CHECK-NEXT:    [[TMP1:%.*]] = phi <2 x i32> [ <i32 1, i32 5>, [[IF_THEN7_1]] ], [ <i32 0, i32 6>, [[LAND_LHS_TRUE_I_1]] ]
>> >>>> +; CHECK-NEXT:    [[TMP2:%.*]] = add nsw <2 x i32> <i32 1, i32 -1>, [[TMP1]]
>> >>>> +; CHECK-NEXT:    [[TMP3:%.*]] = bitcast i32* [[M_NUMCONSTRAINTROWS4]] to <2 x i32>*
>> >>>> +; CHECK-NEXT:    store <2 x i32> [[TMP2]], <2 x i32>* [[TMP3]], align 4
>> >>>>  ; CHECK-NEXT:    unreachable
>> >>>>  ;
>> >>>>  entry:
>> >>>> @@ -74,15 +69,14 @@ define void @_ZN30GIM_TRIANGLE_CALCULATI
>> >>>>  ; CHECK-NEXT:    [[ARRAYIDX26:%.*]] = getelementptr inbounds [[CLASS_GIM_TRIANGLE_CALCULATION_CACHE_9_34_69_94_119_144_179_189_264_284_332:%.*]], %class.GIM_TRIANGLE_CALCULATION_CACHE.9.34.69.94.119.144.179.189.264.284.332* [[THIS:%.*]], i64 0, i32 2, i64 0, i32 0, i64 1
>> >>>>  ; CHECK-NEXT:    [[ARRAYIDX36:%.*]] = getelementptr inbounds [[CLASS_GIM_TRIANGLE_CALCULATION_CACHE_9_34_69_94_119_144_179_189_264_284_332]], %class.GIM_TRIANGLE_CALCULATION_CACHE.9.34.69.94.119.144.179.189.264.284.332* [[THIS]], i64 0, i32 2, i64 0, i32 0, i64 2
>> >>>>  ; CHECK-NEXT:    [[TMP0:%.*]] = load float, float* [[ARRAYIDX36]], align 4
>> >>>> -; CHECK-NEXT:    [[ADD587:%.*]] = fadd float undef, undef
>> >>>> -; CHECK-NEXT:    [[SUB600:%.*]] = fsub float [[ADD587]], undef
>> >>>> -; CHECK-NEXT:    store float [[SUB600]], float* undef, align 4
>> >>>> -; CHECK-NEXT:    [[SUB613:%.*]] = fsub float [[ADD587]], [[SUB600]]
>> >>>> -; CHECK-NEXT:    store float [[SUB613]], float* [[ARRAYIDX26]], align 4
>> >>>> -; CHECK-NEXT:    [[ADD626:%.*]] = fadd float [[TMP0]], undef
>> >>>> -; CHECK-NEXT:    [[SUB639:%.*]] = fsub float [[ADD626]], undef
>> >>>> -; CHECK-NEXT:    [[SUB652:%.*]] = fsub float [[ADD626]], [[SUB639]]
>> >>>> -; CHECK-NEXT:    store float [[SUB652]], float* [[ARRAYIDX36]], align 4
>> >>>> +; CHECK-NEXT:    [[TMP1:%.*]] = insertelement <2 x float> undef, float [[TMP0]], i32 1
>> >>>> +; CHECK-NEXT:    [[TMP2:%.*]] = fadd <2 x float> [[TMP1]], undef
>> >>>> +; CHECK-NEXT:    [[TMP3:%.*]] = fsub <2 x float> [[TMP2]], undef
>> >>>> +; CHECK-NEXT:    [[TMP4:%.*]] = extractelement <2 x float> [[TMP3]], i32 0
>> >>>> +; CHECK-NEXT:    store float [[TMP4]], float* undef, align 4
>> >>>> +; CHECK-NEXT:    [[TMP5:%.*]] = fsub <2 x float> [[TMP2]], [[TMP3]]
>> >>>> +; CHECK-NEXT:    [[TMP6:%.*]] = bitcast float* [[ARRAYIDX26]] to <2 x float>*
>> >>>> +; CHECK-NEXT:    store <2 x float> [[TMP5]], <2 x float>* [[TMP6]], align 4
>> >>>>  ; CHECK-NEXT:    br i1 undef, label [[IF_ELSE1609:%.*]], label [[IF_THEN1595:%.*]]
>> >>>>  ; CHECK:       if.then1595:
>> >>>>  ; CHECK-NEXT:    br i1 undef, label [[RETURN:%.*]], label [[FOR_BODY_LR_PH_I_I1702:%.*]]
>> >>>>
>> >>>> Modified: llvm/trunk/test/Transforms/SLPVectorizer/X86/crash_bullet3.ll
>> >>>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Transforms/SLPVectorizer/X86/crash_bullet3.ll?rev=353923&r1=353922&r2=353923&view=diff
>> >>>> ==============================================================================
>> >>>> --- llvm/trunk/test/Transforms/SLPVectorizer/X86/crash_bullet3.ll (original)
>> >>>> +++ llvm/trunk/test/Transforms/SLPVectorizer/X86/crash_bullet3.ll Wed Feb 13 00:26:43 2019
>> >>>> @@ -24,34 +24,30 @@ define void @_ZN11HullLibrary15CleanupVe
>> >>>>  ; CHECK:       for.body233:
>> >>>>  ; CHECK-NEXT:    br i1 undef, label [[FOR_BODY233]], label [[FOR_END271]]
>> >>>>  ; CHECK:       for.end271:
>> >>>> -; CHECK-NEXT:    [[TMP0:%.*]] = phi float [ 0x47EFFFFFE0000000, [[FOR_END227]] ], [ undef, [[FOR_BODY233]] ]
>> >>>> -; CHECK-NEXT:    [[TMP1:%.*]] = phi float [ 0x47EFFFFFE0000000, [[FOR_END227]] ], [ undef, [[FOR_BODY233]] ]
>> >>>> -; CHECK-NEXT:    [[SUB275:%.*]] = fsub float undef, [[TMP1]]
>> >>>> -; CHECK-NEXT:    [[SUB279:%.*]] = fsub float undef, [[TMP0]]
>> >>>> +; CHECK-NEXT:    [[TMP0:%.*]] = phi <2 x float> [ <float 0x47EFFFFFE0000000, float 0x47EFFFFFE0000000>, [[FOR_END227]] ], [ undef, [[FOR_BODY233]] ]
>> >>>> +; CHECK-NEXT:    [[TMP1:%.*]] = fsub <2 x float> undef, [[TMP0]]
>> >>>>  ; CHECK-NEXT:    br i1 undef, label [[IF_THEN291:%.*]], label [[RETURN]]
>> >>>>  ; CHECK:       if.then291:
>> >>>> -; CHECK-NEXT:    [[MUL292:%.*]] = fmul float [[SUB275]], 5.000000e-01
>> >>>> -; CHECK-NEXT:    [[ADD294:%.*]] = fadd float [[TMP1]], [[MUL292]]
>> >>>> -; CHECK-NEXT:    [[MUL295:%.*]] = fmul float [[SUB279]], 5.000000e-01
>> >>>> -; CHECK-NEXT:    [[ADD297:%.*]] = fadd float [[TMP0]], [[MUL295]]
>> >>>> +; CHECK-NEXT:    [[TMP2:%.*]] = fmul <2 x float> <float 5.000000e-01, float 5.000000e-01>, [[TMP1]]
>> >>>> +; CHECK-NEXT:    [[TMP3:%.*]] = fadd <2 x float> [[TMP0]], [[TMP2]]
>> >>>>  ; CHECK-NEXT:    br i1 undef, label [[IF_END332:%.*]], label [[IF_ELSE319:%.*]]
>> >>>>  ; CHECK:       if.else319:
>> >>>>  ; CHECK-NEXT:    br i1 undef, label [[IF_THEN325:%.*]], label [[IF_END327:%.*]]
>> >>>>  ; CHECK:       if.then325:
>> >>>>  ; CHECK-NEXT:    br label [[IF_END327]]
>> >>>>  ; CHECK:       if.end327:
>> >>>> +; CHECK-NEXT:    [[TMP4:%.*]] = extractelement <2 x float> [[TMP1]], i32 0
>> >>>> +; CHECK-NEXT:    [[TMP5:%.*]] = insertelement <2 x float> undef, float [[TMP4]], i32 0
>> >>>> +; CHECK-NEXT:    [[TMP6:%.*]] = insertelement <2 x float> [[TMP5]], float undef, i32 1
>> >>>>  ; CHECK-NEXT:    br i1 undef, label [[IF_THEN329:%.*]], label [[IF_END332]]
>> >>>>  ; CHECK:       if.then329:
>> >>>>  ; CHECK-NEXT:    br label [[IF_END332]]
>> >>>>  ; CHECK:       if.end332:
>> >>>> -; CHECK-NEXT:    [[DX272_1:%.*]] = phi float [ [[SUB275]], [[IF_THEN329]] ], [ [[SUB275]], [[IF_END327]] ], [ 0x3F847AE140000000, [[IF_THEN291]] ]
>> >>>> -; CHECK-NEXT:    [[DY276_1:%.*]] = phi float [ undef, [[IF_THEN329]] ], [ undef, [[IF_END327]] ], [ 0x3F847AE140000000, [[IF_THEN291]] ]
>> >>>> -; CHECK-NEXT:    [[SUB334:%.*]] = fsub float [[ADD294]], [[DX272_1]]
>> >>>> -; CHECK-NEXT:    [[SUB338:%.*]] = fsub float [[ADD297]], [[DY276_1]]
>> >>>> +; CHECK-NEXT:    [[TMP7:%.*]] = phi <2 x float> [ [[TMP6]], [[IF_THEN329]] ], [ [[TMP6]], [[IF_END327]] ], [ <float 0x3F847AE140000000, float 0x3F847AE140000000>, [[IF_THEN291]] ]
>> >>>> +; CHECK-NEXT:    [[TMP8:%.*]] = fsub <2 x float> [[TMP3]], [[TMP7]]
>> >>>>  ; CHECK-NEXT:    [[ARRAYIDX_I_I606:%.*]] = getelementptr inbounds [[CLASS_BTVECTOR3_23_221_463_485_507_573_595_683_727_749_815_837_991_1585_1607_1629_1651_1849_2047_2069_2091_2113:%.*]], %class.btVector3.23.221.463.485.507.573.595.683.727.749.815.837.991.1585.1607.1629.1651.1849.2047.2069.2091.2113* [[VERTICES:%.*]], i64 0, i32 0, i64 0
>> >>>> -; CHECK-NEXT:    store float [[SUB334]], float* [[ARRAYIDX_I_I606]], align 4
>> >>>> -; CHECK-NEXT:    [[ARRAYIDX3_I607:%.*]] = getelementptr inbounds [[CLASS_BTVECTOR3_23_221_463_485_507_573_595_683_727_749_815_837_991_1585_1607_1629_1651_1849_2047_2069_2091_2113]], %class.btVector3.23.221.463.485.507.573.595.683.727.749.815.837.991.1585.1607.1629.1651.1849.2047.2069.2091.2113* [[VERTICES]], i64 0, i32 0, i64 1
>> >>>> -; CHECK-NEXT:    store float [[SUB338]], float* [[ARRAYIDX3_I607]], align 4
>> >>>> +; CHECK-NEXT:    [[TMP9:%.*]] = bitcast float* [[ARRAYIDX_I_I606]] to <2 x float>*
>> >>>> +; CHECK-NEXT:    store <2 x float> [[TMP8]], <2 x float>* [[TMP9]], align 4
>> >>>>  ; CHECK-NEXT:    br label [[RETURN]]
>> >>>>  ; CHECK:       return:
>> >>>>  ; CHECK-NEXT:    ret void
>> >>>>
>> >>>> Modified: llvm/trunk/test/Transforms/SLPVectorizer/X86/crash_cmpop.ll
>> >>>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Transforms/SLPVectorizer/X86/crash_cmpop.ll?rev=353923&r1=353922&r2=353923&view=diff
>> >>>> ==============================================================================
>> >>>> --- llvm/trunk/test/Transforms/SLPVectorizer/X86/crash_cmpop.ll (original)
>> >>>> +++ llvm/trunk/test/Transforms/SLPVectorizer/X86/crash_cmpop.ll Wed Feb 13 00:26:43 2019
>> >>>> @@ -55,35 +55,32 @@ define void @testfunc(float* nocapture %
>> >>>>  ; AVX:       for.body:
>> >>>>  ; AVX-NEXT:    [[INDVARS_IV:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY]] ]
>> >>>>  ; AVX-NEXT:    [[ACC1_056:%.*]] = phi float [ 0.000000e+00, [[ENTRY]] ], [ [[ADD13:%.*]], [[FOR_BODY]] ]
>> >>>> -; AVX-NEXT:    [[TMP0:%.*]] = phi <2 x float> [ zeroinitializer, [[ENTRY]] ], [ [[TMP23:%.*]], [[FOR_BODY]] ]
>> >>>> +; AVX-NEXT:    [[TMP0:%.*]] = phi <2 x float> [ zeroinitializer, [[ENTRY]] ], [ [[TMP19:%.*]], [[FOR_BODY]] ]
>> >>>>  ; AVX-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds float, float* [[SRC:%.*]], i64 [[INDVARS_IV]]
>> >>>>  ; AVX-NEXT:    [[TMP1:%.*]] = load float, float* [[ARRAYIDX]], align 4
>> >>>>  ; AVX-NEXT:    [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1
>> >>>>  ; AVX-NEXT:    [[ARRAYIDX2:%.*]] = getelementptr inbounds float, float* [[DEST:%.*]], i64 [[INDVARS_IV]]
>> >>>>  ; AVX-NEXT:    store float [[ACC1_056]], float* [[ARRAYIDX2]], align 4
>> >>>> -; AVX-NEXT:    [[TMP2:%.*]] = extractelement <2 x float> [[TMP0]], i32 1
>> >>>> -; AVX-NEXT:    [[TMP3:%.*]] = insertelement <2 x float> undef, float [[TMP2]], i32 0
>> >>>> -; AVX-NEXT:    [[TMP4:%.*]] = extractelement <2 x float> [[TMP0]], i32 0
>> >>>> -; AVX-NEXT:    [[TMP5:%.*]] = insertelement <2 x float> [[TMP3]], float [[TMP4]], i32 1
>> >>>> -; AVX-NEXT:    [[TMP6:%.*]] = insertelement <2 x float> undef, float [[TMP1]], i32 0
>> >>>> -; AVX-NEXT:    [[TMP7:%.*]] = insertelement <2 x float> [[TMP6]], float [[TMP1]], i32 1
>> >>>> -; AVX-NEXT:    [[TMP8:%.*]] = fadd <2 x float> [[TMP5]], [[TMP7]]
>> >>>> -; AVX-NEXT:    [[TMP9:%.*]] = fmul <2 x float> zeroinitializer, [[TMP0]]
>> >>>> -; AVX-NEXT:    [[TMP10:%.*]] = fadd <2 x float> [[TMP9]], [[TMP8]]
>> >>>> -; AVX-NEXT:    [[TMP11:%.*]] = fcmp olt <2 x float> [[TMP10]], <float 1.000000e+00, float 1.000000e+00>
>> >>>> -; AVX-NEXT:    [[TMP12:%.*]] = select <2 x i1> [[TMP11]], <2 x float> [[TMP10]], <2 x float> <float 1.000000e+00, float 1.000000e+00>
>> >>>> -; AVX-NEXT:    [[TMP13:%.*]] = fcmp olt <2 x float> [[TMP12]], <float -1.000000e+00, float -1.000000e+00>
>> >>>> -; AVX-NEXT:    [[TMP14:%.*]] = fmul <2 x float> zeroinitializer, [[TMP12]]
>> >>>> -; AVX-NEXT:    [[TMP15:%.*]] = select <2 x i1> [[TMP13]], <2 x float> <float -0.000000e+00, float -0.000000e+00>, <2 x float> [[TMP14]]
>> >>>> -; AVX-NEXT:    [[TMP16:%.*]] = extractelement <2 x float> [[TMP15]], i32 0
>> >>>> -; AVX-NEXT:    [[TMP17:%.*]] = extractelement <2 x float> [[TMP15]], i32 1
>> >>>> -; AVX-NEXT:    [[ADD13]] = fadd float [[TMP16]], [[TMP17]]
>> >>>> -; AVX-NEXT:    [[TMP18:%.*]] = insertelement <2 x float> undef, float [[TMP17]], i32 0
>> >>>> -; AVX-NEXT:    [[TMP19:%.*]] = insertelement <2 x float> [[TMP18]], float [[ADD13]], i32 1
>> >>>> -; AVX-NEXT:    [[TMP20:%.*]] = fcmp olt <2 x float> [[TMP19]], <float 1.000000e+00, float 1.000000e+00>
>> >>>> -; AVX-NEXT:    [[TMP21:%.*]] = select <2 x i1> [[TMP20]], <2 x float> [[TMP19]], <2 x float> <float 1.000000e+00, float 1.000000e+00>
>> >>>> -; AVX-NEXT:    [[TMP22:%.*]] = fcmp olt <2 x float> [[TMP21]], <float -1.000000e+00, float -1.000000e+00>
>> >>>> -; AVX-NEXT:    [[TMP23]] = select <2 x i1> [[TMP22]], <2 x float> <float -1.000000e+00, float -1.000000e+00>, <2 x float> [[TMP21]]
>> >>>> +; AVX-NEXT:    [[REORDER_SHUFFLE:%.*]] = shufflevector <2 x float> [[TMP0]], <2 x float> undef, <2 x i32> <i32 1, i32 0>
>> >>>> +; AVX-NEXT:    [[TMP2:%.*]] = insertelement <2 x float> undef, float [[TMP1]], i32 0
>> >>>> +; AVX-NEXT:    [[TMP3:%.*]] = insertelement <2 x float> [[TMP2]], float [[TMP1]], i32 1
>> >>>> +; AVX-NEXT:    [[TMP4:%.*]] = fadd <2 x float> [[REORDER_SHUFFLE]], [[TMP3]]
>> >>>> +; AVX-NEXT:    [[TMP5:%.*]] = fmul <2 x float> zeroinitializer, [[TMP0]]
>> >>>> +; AVX-NEXT:    [[TMP6:%.*]] = fadd <2 x float> [[TMP5]], [[TMP4]]
>> >>>> +; AVX-NEXT:    [[TMP7:%.*]] = fcmp olt <2 x float> [[TMP6]], <float 1.000000e+00, float 1.000000e+00>
>> >>>> +; AVX-NEXT:    [[TMP8:%.*]] = select <2 x i1> [[TMP7]], <2 x float> [[TMP6]], <2 x float> <float 1.000000e+00, float 1.000000e+00>
>> >>>> +; AVX-NEXT:    [[TMP9:%.*]] = fcmp olt <2 x float> [[TMP8]], <float -1.000000e+00, float -1.000000e+00>
>> >>>> +; AVX-NEXT:    [[TMP10:%.*]] = fmul <2 x float> zeroinitializer, [[TMP8]]
>> >>>> +; AVX-NEXT:    [[TMP11:%.*]] = select <2 x i1> [[TMP9]], <2 x float> <float -0.000000e+00, float -0.000000e+00>, <2 x float> [[TMP10]]
>> >>>> +; AVX-NEXT:    [[TMP12:%.*]] = extractelement <2 x float> [[TMP11]], i32 0
>> >>>> +; AVX-NEXT:    [[TMP13:%.*]] = extractelement <2 x float> [[TMP11]], i32 1
>> >>>> +; AVX-NEXT:    [[ADD13]] = fadd float [[TMP12]], [[TMP13]]
>> >>>> +; AVX-NEXT:    [[TMP14:%.*]] = insertelement <2 x float> undef, float [[TMP13]], i32 0
>> >>>> +; AVX-NEXT:    [[TMP15:%.*]] = insertelement <2 x float> [[TMP14]], float [[ADD13]], i32 1
>> >>>> +; AVX-NEXT:    [[TMP16:%.*]] = fcmp olt <2 x float> [[TMP15]], <float 1.000000e+00, float 1.000000e+00>
>> >>>> +; AVX-NEXT:    [[TMP17:%.*]] = select <2 x i1> [[TMP16]], <2 x float> [[TMP15]], <2 x float> <float 1.000000e+00, float 1.000000e+00>
>> >>>> +; AVX-NEXT:    [[TMP18:%.*]] = fcmp olt <2 x float> [[TMP17]], <float -1.000000e+00, float -1.000000e+00>
>> >>>> +; AVX-NEXT:    [[TMP19]] = select <2 x i1> [[TMP18]], <2 x float> <float -1.000000e+00, float -1.000000e+00>, <2 x float> [[TMP17]]
>> >>>>  ; AVX-NEXT:    [[EXITCOND:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], 32
>> >>>>  ; AVX-NEXT:    br i1 [[EXITCOND]], label [[FOR_END:%.*]], label [[FOR_BODY]]
>> >>>>  ; AVX:       for.end:
>> >>>>
>> >>>> Modified: llvm/trunk/test/Transforms/SLPVectorizer/X86/crash_sim4b1.ll
>> >>>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Transforms/SLPVectorizer/X86/crash_sim4b1.ll?rev=353923&r1=353922&r2=353923&view=diff
>> >>>> ==============================================================================
>> >>>> --- llvm/trunk/test/Transforms/SLPVectorizer/X86/crash_sim4b1.ll (original)
>> >>>> +++ llvm/trunk/test/Transforms/SLPVectorizer/X86/crash_sim4b1.ll Wed Feb 13 00:26:43 2019
>> >>>> @@ -27,25 +27,24 @@ define void @SIM4() {
>> >>>>  ; CHECK:       land.rhs.lr.ph:
>> >>>>  ; CHECK-NEXT:    unreachable
>> >>>>  ; CHECK:       if.end98:
>> >>>> -; CHECK-NEXT:    [[FROM299:%.*]] = getelementptr inbounds [[STRUCT__EXON_T_12_103_220_363_480_649_740_857_1039_1065_1078_1091_1117_1130_1156_1169_1195_1221_1234_1286_1299_1312_1338_1429_1455_1468_1494_1520_1884_1897_1975_2066_2105_2170_2171:%.*]], %struct._exon_t.12.103.220.363.480.649.740.857.1039.1065.1078.1091.1117.1130.1156.1169.1195.1221.1234.1286.1299.1312.1338.1429.1455.1468.1494.1520.1884.1897.1975.2066.2105.2170.2171* undef, i64 0, i32 1
>> >>>>  ; CHECK-NEXT:    br i1 undef, label [[LAND_LHS_TRUE167]], label [[IF_THEN103:%.*]]
>> >>>>  ; CHECK:       if.then103:
>> >>>>  ; CHECK-NEXT:    [[DOTSUB100:%.*]] = select i1 undef, i32 250, i32 undef
>> >>>>  ; CHECK-NEXT:    [[MUL114:%.*]] = shl nsw i32 [[DOTSUB100]], 2
>> >>>> -; CHECK-NEXT:    [[FROM1115:%.*]] = getelementptr inbounds [[STRUCT__EXON_T_12_103_220_363_480_649_740_857_1039_1065_1078_1091_1117_1130_1156_1169_1195_1221_1234_1286_1299_1312_1338_1429_1455_1468_1494_1520_1884_1897_1975_2066_2105_2170_2171]], %struct._exon_t.12.103.220.363.480.649.740.857.1039.1065.1078.1091.1117.1130.1156.1169.1195.1221.1234.1286.1299.1312.1338.1429.1455.1468.1494.1520.1884.1897.1975.2066.2105.2170.2171* undef, i64 0, i32 0
>> >>>> +; CHECK-NEXT:    [[FROM1115:%.*]] = getelementptr inbounds [[STRUCT__EXON_T_12_103_220_363_480_649_740_857_1039_1065_1078_1091_1117_1130_1156_1169_1195_1221_1234_1286_1299_1312_1338_1429_1455_1468_1494_1520_1884_1897_1975_2066_2105_2170_2171:%.*]], %struct._exon_t.12.103.220.363.480.649.740.857.1039.1065.1078.1091.1117.1130.1156.1169.1195.1221.1234.1286.1299.1312.1338.1429.1455.1468.1494.1520.1884.1897.1975.2066.2105.2170.2171* undef, i64 0, i32 0
>> >>>>  ; CHECK-NEXT:    [[COND125:%.*]] = select i1 undef, i32 undef, i32 [[MUL114]]
>> >>>> +; CHECK-NEXT:    [[TMP0:%.*]] = insertelement <2 x i32> undef, i32 [[COND125]], i32 0
>> >>>> +; CHECK-NEXT:    [[TMP1:%.*]] = insertelement <2 x i32> [[TMP0]], i32 [[DOTSUB100]], i32 1
>> >>>>  ; CHECK-NEXT:    br label [[FOR_COND_I:%.*]]
>> >>>>  ; CHECK:       for.cond.i:
>> >>>> -; CHECK-NEXT:    [[ROW_0_I:%.*]] = phi i32 [ undef, [[LAND_RHS_I874:%.*]] ], [ [[DOTSUB100]], [[IF_THEN103]] ]
>> >>>> -; CHECK-NEXT:    [[COL_0_I:%.*]] = phi i32 [ undef, [[LAND_RHS_I874]] ], [ [[COND125]], [[IF_THEN103]] ]
>> >>>> +; CHECK-NEXT:    [[TMP2:%.*]] = phi <2 x i32> [ undef, [[LAND_RHS_I874:%.*]] ], [ [[TMP1]], [[IF_THEN103]] ]
>> >>>>  ; CHECK-NEXT:    br i1 undef, label [[LAND_RHS_I874]], label [[FOR_END_I:%.*]]
>> >>>>  ; CHECK:       land.rhs.i874:
>> >>>>  ; CHECK-NEXT:    br i1 undef, label [[FOR_COND_I]], label [[FOR_END_I]]
>> >>>>  ; CHECK:       for.end.i:
>> >>>>  ; CHECK-NEXT:    br i1 undef, label [[IF_THEN_I:%.*]], label [[IF_END_I:%.*]]
>> >>>>  ; CHECK:       if.then.i:
>> >>>> -; CHECK-NEXT:    [[ADD14_I:%.*]] = add nsw i32 [[ROW_0_I]], undef
>> >>>> -; CHECK-NEXT:    [[ADD15_I:%.*]] = add nsw i32 [[COL_0_I]], undef
>> >>>> +; CHECK-NEXT:    [[TMP3:%.*]] = add nsw <2 x i32> undef, [[TMP2]]
>> >>>>  ; CHECK-NEXT:    br label [[EXTEND_BW_EXIT:%.*]]
>> >>>>  ; CHECK:       if.end.i:
>> >>>>  ; CHECK-NEXT:    [[ADD16_I:%.*]] = add i32 [[COND125]], [[DOTSUB100]]
>> >>>> @@ -66,14 +65,12 @@ define void @SIM4() {
>> >>>>  ; CHECK:       while.end275.i:
>> >>>>  ; CHECK-NEXT:    br label [[EXTEND_BW_EXIT]]
>> >>>>  ; CHECK:       extend_bw.exit:
>> >>>> -; CHECK-NEXT:    [[ADD14_I1262:%.*]] = phi i32 [ [[ADD14_I]], [[IF_THEN_I]] ], [ undef, [[WHILE_END275_I]] ]
>> >>>> -; CHECK-NEXT:    [[ADD15_I1261:%.*]] = phi i32 [ [[ADD15_I]], [[IF_THEN_I]] ], [ undef, [[WHILE_END275_I]] ]
>> >>>> +; CHECK-NEXT:    [[TMP4:%.*]] = phi <2 x i32> [ [[TMP3]], [[IF_THEN_I]] ], [ undef, [[WHILE_END275_I]] ]
>> >>>>  ; CHECK-NEXT:    br i1 false, label [[IF_THEN157:%.*]], label [[LAND_LHS_TRUE167]]
>> >>>>  ; CHECK:       if.then157:
>> >>>> -; CHECK-NEXT:    [[ADD158:%.*]] = add nsw i32 [[ADD14_I1262]], 1
>> >>>> -; CHECK-NEXT:    store i32 [[ADD158]], i32* [[FROM299]], align 4
>> >>>> -; CHECK-NEXT:    [[ADD160:%.*]] = add nsw i32 [[ADD15_I1261]], 1
>> >>>> -; CHECK-NEXT:    store i32 [[ADD160]], i32* [[FROM1115]], align 4
>> >>>> +; CHECK-NEXT:    [[TMP5:%.*]] = add nsw <2 x i32> <i32 1, i32 1>, [[TMP4]]
>> >>>> +; CHECK-NEXT:    [[TMP6:%.*]] = bitcast i32* [[FROM1115]] to <2 x i32>*
>> >>>> +; CHECK-NEXT:    store <2 x i32> [[TMP5]], <2 x i32>* [[TMP6]], align 4
>> >>>>  ; CHECK-NEXT:    br label [[LAND_LHS_TRUE167]]
>> >>>>  ; CHECK:       land.lhs.true167:
>> >>>>  ; CHECK-NEXT:    unreachable
>> >>>>
>> >>>> Modified: llvm/trunk/test/Transforms/SLPVectorizer/X86/fptosi.ll
>> >>>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Transforms/SLPVectorizer/X86/fptosi.ll?rev=353923&r1=353922&r2=353923&view=diff
>> >>>> ==============================================================================
>> >>>> --- llvm/trunk/test/Transforms/SLPVectorizer/X86/fptosi.ll (original)
>> >>>> +++ llvm/trunk/test/Transforms/SLPVectorizer/X86/fptosi.ll Wed Feb 13 00:26:43 2019
>> >>>> @@ -221,32 +221,38 @@ define void @fptosi_8f64_8i16() #0 {
>> >>>>  }
>> >>>>
>> >>>>  define void @fptosi_8f64_8i8() #0 {
>> >>>> -; CHECK-LABEL: @fptosi_8f64_8i8(
>> >>>> -; CHECK-NEXT:    [[A0:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 0), align 8
>> >>>> -; CHECK-NEXT:    [[A1:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 1), align 8
>> >>>> -; CHECK-NEXT:    [[A2:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 2), align 8
>> >>>> -; CHECK-NEXT:    [[A3:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 3), align 8
>> >>>> -; CHECK-NEXT:    [[A4:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 4), align 8
>> >>>> -; CHECK-NEXT:    [[A5:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 5), align 8
>> >>>> -; CHECK-NEXT:    [[A6:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 6), align 8
>> >>>> -; CHECK-NEXT:    [[A7:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 7), align 8
>> >>>> -; CHECK-NEXT:    [[CVT0:%.*]] = fptosi double [[A0]] to i8
>> >>>> -; CHECK-NEXT:    [[CVT1:%.*]] = fptosi double [[A1]] to i8
>> >>>> -; CHECK-NEXT:    [[CVT2:%.*]] = fptosi double [[A2]] to i8
>> >>>> -; CHECK-NEXT:    [[CVT3:%.*]] = fptosi double [[A3]] to i8
>> >>>> -; CHECK-NEXT:    [[CVT4:%.*]] = fptosi double [[A4]] to i8
>> >>>> -; CHECK-NEXT:    [[CVT5:%.*]] = fptosi double [[A5]] to i8
>> >>>> -; CHECK-NEXT:    [[CVT6:%.*]] = fptosi double [[A6]] to i8
>> >>>> -; CHECK-NEXT:    [[CVT7:%.*]] = fptosi double [[A7]] to i8
>> >>>> -; CHECK-NEXT:    store i8 [[CVT0]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 0), align 1
>> >>>> -; CHECK-NEXT:    store i8 [[CVT1]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 1), align 1
>> >>>> -; CHECK-NEXT:    store i8 [[CVT2]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 2), align 1
>> >>>> -; CHECK-NEXT:    store i8 [[CVT3]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 3), align 1
>> >>>> -; CHECK-NEXT:    store i8 [[CVT4]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 4), align 1
>> >>>> -; CHECK-NEXT:    store i8 [[CVT5]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 5), align 1
>> >>>> -; CHECK-NEXT:    store i8 [[CVT6]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 6), align 1
>> >>>> -; CHECK-NEXT:    store i8 [[CVT7]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 7), align 1
>> >>>> -; CHECK-NEXT:    ret void
>> >>>> +; SSE-LABEL: @fptosi_8f64_8i8(
>> >>>> +; SSE-NEXT:    [[A0:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 0), align 8
>> >>>> +; SSE-NEXT:    [[A1:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 1), align 8
>> >>>> +; SSE-NEXT:    [[A2:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 2), align 8
>> >>>> +; SSE-NEXT:    [[A3:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 3), align 8
>> >>>> +; SSE-NEXT:    [[A4:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 4), align 8
>> >>>> +; SSE-NEXT:    [[A5:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 5), align 8
>> >>>> +; SSE-NEXT:    [[A6:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 6), align 8
>> >>>> +; SSE-NEXT:    [[A7:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 7), align 8
>> >>>> +; SSE-NEXT:    [[CVT0:%.*]] = fptosi double [[A0]] to i8
>> >>>> +; SSE-NEXT:    [[CVT1:%.*]] = fptosi double [[A1]] to i8
>> >>>> +; SSE-NEXT:    [[CVT2:%.*]] = fptosi double [[A2]] to i8
>> >>>> +; SSE-NEXT:    [[CVT3:%.*]] = fptosi double [[A3]] to i8
>> >>>> +; SSE-NEXT:    [[CVT4:%.*]] = fptosi double [[A4]] to i8
>> >>>> +; SSE-NEXT:    [[CVT5:%.*]] = fptosi double [[A5]] to i8
>> >>>> +; SSE-NEXT:    [[CVT6:%.*]] = fptosi double [[A6]] to i8
>> >>>> +; SSE-NEXT:    [[CVT7:%.*]] = fptosi double [[A7]] to i8
>> >>>> +; SSE-NEXT:    store i8 [[CVT0]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 0), align 1
>> >>>> +; SSE-NEXT:    store i8 [[CVT1]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 1), align 1
>> >>>> +; SSE-NEXT:    store i8 [[CVT2]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 2), align 1
>> >>>> +; SSE-NEXT:    store i8 [[CVT3]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 3), align 1
>> >>>> +; SSE-NEXT:    store i8 [[CVT4]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 4), align 1
>> >>>> +; SSE-NEXT:    store i8 [[CVT5]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 5), align 1
>> >>>> +; SSE-NEXT:    store i8 [[CVT6]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 6), align 1
>> >>>> +; SSE-NEXT:    store i8 [[CVT7]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 7), align 1
>> >>>> +; SSE-NEXT:    ret void
>> >>>> +;
>> >>>> +; AVX-LABEL: @fptosi_8f64_8i8(
>> >>>> +; AVX-NEXT:    [[TMP1:%.*]] = load <8 x double>, <8 x double>* bitcast ([8 x double]* @src64 to <8 x double>*), align 8
>> >>>> +; AVX-NEXT:    [[TMP2:%.*]] = fptosi <8 x double> [[TMP1]] to <8 x i8>
>> >>>> +; AVX-NEXT:    store <8 x i8> [[TMP2]], <8 x i8>* bitcast ([64 x i8]* @dst8 to <8 x i8>*), align 1
>> >>>> +; AVX-NEXT:    ret void
>> >>>>  ;
>> >>>>    %a0 = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 0), align 8
>> >>>>    %a1 = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 1), align 8
>> >>>> @@ -455,30 +461,9 @@ define void @fptosi_8f32_8i16() #0 {
>> >>>>
>> >>>>  define void @fptosi_8f32_8i8() #0 {
>> >>>>  ; CHECK-LABEL: @fptosi_8f32_8i8(
>> >>>> -; CHECK-NEXT:    [[A0:%.*]] = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 0), align 4
>> >>>> -; CHECK-NEXT:    [[A1:%.*]] = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 1), align 4
>> >>>> -; CHECK-NEXT:    [[A2:%.*]] = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 2), align 4
>> >>>> -; CHECK-NEXT:    [[A3:%.*]] = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 3), align 4
>> >>>> -; CHECK-NEXT:    [[A4:%.*]] = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 4), align 4
>> >>>> -; CHECK-NEXT:    [[A5:%.*]] = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 5), align 4
>> >>>> -; CHECK-NEXT:    [[A6:%.*]] = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 6), align 4
>> >>>> -; CHECK-NEXT:    [[A7:%.*]] = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 7), align 4
>> >>>> -; CHECK-NEXT:    [[CVT0:%.*]] = fptosi float [[A0]] to i8
>> >>>> -; CHECK-NEXT:    [[CVT1:%.*]] = fptosi float [[A1]] to i8
>> >>>> -; CHECK-NEXT:    [[CVT2:%.*]] = fptosi float [[A2]] to i8
>> >>>> -; CHECK-NEXT:    [[CVT3:%.*]] = fptosi float [[A3]] to i8
>> >>>> -; CHECK-NEXT:    [[CVT4:%.*]] = fptosi float [[A4]] to i8
>> >>>> -; CHECK-NEXT:    [[CVT5:%.*]] = fptosi float [[A5]] to i8
>> >>>> -; CHECK-NEXT:    [[CVT6:%.*]] = fptosi float [[A6]] to i8
>> >>>> -; CHECK-NEXT:    [[CVT7:%.*]] = fptosi float [[A7]] to i8
>> >>>> -; CHECK-NEXT:    store i8 [[CVT0]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 0), align 1
>> >>>> -; CHECK-NEXT:    store i8 [[CVT1]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 1), align 1
>> >>>> -; CHECK-NEXT:    store i8 [[CVT2]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 2), align 1
>> >>>> -; CHECK-NEXT:    store i8 [[CVT3]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 3), align 1
>> >>>> -; CHECK-NEXT:    store i8 [[CVT4]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 4), align 1
>> >>>> -; CHECK-NEXT:    store i8 [[CVT5]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 5), align 1
>> >>>> -; CHECK-NEXT:    store i8 [[CVT6]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 6), align 1
>> >>>> -; CHECK-NEXT:    store i8 [[CVT7]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 7), align 1
>> >>>> +; CHECK-NEXT:    [[TMP1:%.*]] = load <8 x float>, <8 x float>* bitcast ([16 x float]* @src32 to <8 x float>*), align 4
>> >>>> +; CHECK-NEXT:    [[TMP2:%.*]] = fptosi <8 x float> [[TMP1]] to <8 x i8>
>> >>>> +; CHECK-NEXT:    store <8 x i8> [[TMP2]], <8 x i8>* bitcast ([64 x i8]* @dst8 to <8 x i8>*), align 1
>> >>>>  ; CHECK-NEXT:    ret void
>> >>>>  ;
>> >>>>    %a0 = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 0), align 4
>> >>>>
>> >>>> Modified: llvm/trunk/test/Transforms/SLPVectorizer/X86/fptoui.ll
>> >>>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Transforms/SLPVectorizer/X86/fptoui.ll?rev=353923&r1=353922&r2=353923&view=diff
>> >>>> ==============================================================================
>> >>>> --- llvm/trunk/test/Transforms/SLPVectorizer/X86/fptoui.ll (original)
>> >>>> +++ llvm/trunk/test/Transforms/SLPVectorizer/X86/fptoui.ll Wed Feb 13 00:26:43 2019
>> >>>> @@ -305,32 +305,71 @@ define void @fptoui_8f64_8i16() #0 {
>> >>>>  }
>> >>>>
>> >>>>  define void @fptoui_8f64_8i8() #0 {
>> >>>> -; CHECK-LABEL: @fptoui_8f64_8i8(
>> >>>> -; CHECK-NEXT:    [[A0:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 0), align 8
>> >>>> -; CHECK-NEXT:    [[A1:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 1), align 8
>> >>>> -; CHECK-NEXT:    [[A2:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 2), align 8
>> >>>> -; CHECK-NEXT:    [[A3:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 3), align 8
>> >>>> -; CHECK-NEXT:    [[A4:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 4), align 8
>> >>>> -; CHECK-NEXT:    [[A5:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 5), align 8
>> >>>> -; CHECK-NEXT:    [[A6:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 6), align 8
>> >>>> -; CHECK-NEXT:    [[A7:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 7), align 8
>> >>>> -; CHECK-NEXT:    [[CVT0:%.*]] = fptoui double [[A0]] to i8
>> >>>> -; CHECK-NEXT:    [[CVT1:%.*]] = fptoui double [[A1]] to i8
>> >>>> -; CHECK-NEXT:    [[CVT2:%.*]] = fptoui double [[A2]] to i8
>> >>>> -; CHECK-NEXT:    [[CVT3:%.*]] = fptoui double [[A3]] to i8
>> >>>> -; CHECK-NEXT:    [[CVT4:%.*]] = fptoui double [[A4]] to i8
>> >>>> -; CHECK-NEXT:    [[CVT5:%.*]] = fptoui double [[A5]] to i8
>> >>>> -; CHECK-NEXT:    [[CVT6:%.*]] = fptoui double [[A6]] to i8
>> >>>> -; CHECK-NEXT:    [[CVT7:%.*]] = fptoui double [[A7]] to i8
>> >>>> -; CHECK-NEXT:    store i8 [[CVT0]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 0), align 1
>> >>>> -; CHECK-NEXT:    store i8 [[CVT1]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 1), align 1
>> >>>> -; CHECK-NEXT:    store i8 [[CVT2]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 2), align 1
>> >>>> -; CHECK-NEXT:    store i8 [[CVT3]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 3), align 1
>> >>>> -; CHECK-NEXT:    store i8 [[CVT4]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 4), align 1
>> >>>> -; CHECK-NEXT:    store i8 [[CVT5]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 5), align 1
>> >>>> -; CHECK-NEXT:    store i8 [[CVT6]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 6), align 1
>> >>>> -; CHECK-NEXT:    store i8 [[CVT7]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 7), align 1
>> >>>> -; CHECK-NEXT:    ret void
>> >>>> +; SSE-LABEL: @fptoui_8f64_8i8(
>> >>>> +; SSE-NEXT:    [[A0:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 0), align 8
>> >>>> +; SSE-NEXT:    [[A1:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 1), align 8
>> >>>> +; SSE-NEXT:    [[A2:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 2), align 8
>> >>>> +; SSE-NEXT:    [[A3:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 3), align 8
>> >>>> +; SSE-NEXT:    [[A4:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 4), align 8
>> >>>> +; SSE-NEXT:    [[A5:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 5), align 8
>> >>>> +; SSE-NEXT:    [[A6:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 6), align 8
>> >>>> +; SSE-NEXT:    [[A7:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 7), align 8
>> >>>> +; SSE-NEXT:    [[CVT0:%.*]] = fptoui double [[A0]] to i8
>> >>>> +; SSE-NEXT:    [[CVT1:%.*]] = fptoui double [[A1]] to i8
>> >>>> +; SSE-NEXT:    [[CVT2:%.*]] = fptoui double [[A2]] to i8
>> >>>> +; SSE-NEXT:    [[CVT3:%.*]] = fptoui double [[A3]] to i8
>> >>>> +; SSE-NEXT:    [[CVT4:%.*]] = fptoui double [[A4]] to i8
>> >>>> +; SSE-NEXT:    [[CVT5:%.*]] = fptoui double [[A5]] to i8
>> >>>> +; SSE-NEXT:    [[CVT6:%.*]] = fptoui double [[A6]] to i8
>> >>>> +; SSE-NEXT:    [[CVT7:%.*]] = fptoui double [[A7]] to i8
>> >>>> +; SSE-NEXT:    store i8 [[CVT0]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 0), align 1
>> >>>> +; SSE-NEXT:    store i8 [[CVT1]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 1), align 1
>> >>>> +; SSE-NEXT:    store i8 [[CVT2]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 2), align 1
>> >>>> +; SSE-NEXT:    store i8 [[CVT3]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 3), align 1
>> >>>> +; SSE-NEXT:    store i8 [[CVT4]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 4), align 1
>> >>>> +; SSE-NEXT:    store i8 [[CVT5]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 5), align 1
>> >>>> +; SSE-NEXT:    store i8 [[CVT6]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 6), align 1
>> >>>> +; SSE-NEXT:    store i8 [[CVT7]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 7), align 1
>> >>>> +; SSE-NEXT:    ret void
>> >>>> +;
>> >>>> +; AVX256NODQ-LABEL: @fptoui_8f64_8i8(
>> >>>> +; AVX256NODQ-NEXT:    [[A0:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 0), align 8
>> >>>> +; AVX256NODQ-NEXT:    [[A1:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 1), align 8
>> >>>> +; AVX256NODQ-NEXT:    [[A2:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 2), align 8
>> >>>> +; AVX256NODQ-NEXT:    [[A3:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 3), align 8
>> >>>> +; AVX256NODQ-NEXT:    [[A4:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 4), align 8
>> >>>> +; AVX256NODQ-NEXT:    [[A5:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 5), align 8
>> >>>> +; AVX256NODQ-NEXT:    [[A6:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 6), align 8
>> >>>> +; AVX256NODQ-NEXT:    [[A7:%.*]] = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 7), align 8
>> >>>> +; AVX256NODQ-NEXT:    [[CVT0:%.*]] = fptoui double [[A0]] to i8
>> >>>> +; AVX256NODQ-NEXT:    [[CVT1:%.*]] = fptoui double [[A1]] to i8
>> >>>> +; AVX256NODQ-NEXT:    [[CVT2:%.*]] = fptoui double [[A2]] to i8
>> >>>> +; AVX256NODQ-NEXT:    [[CVT3:%.*]] = fptoui double [[A3]] to i8
>> >>>> +; AVX256NODQ-NEXT:    [[CVT4:%.*]] = fptoui double [[A4]] to i8
>> >>>> +; AVX256NODQ-NEXT:    [[CVT5:%.*]] = fptoui double [[A5]] to i8
>> >>>> +; AVX256NODQ-NEXT:    [[CVT6:%.*]] = fptoui double [[A6]] to i8
>> >>>> +; AVX256NODQ-NEXT:    [[CVT7:%.*]] = fptoui double [[A7]] to i8
>> >>>> +; AVX256NODQ-NEXT:    store i8 [[CVT0]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 0), align 1
>> >>>> +; AVX256NODQ-NEXT:    store i8 [[CVT1]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 1), align 1
>> >>>> +; AVX256NODQ-NEXT:    store i8 [[CVT2]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 2), align 1
>> >>>> +; AVX256NODQ-NEXT:    store i8 [[CVT3]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 3), align 1
>> >>>> +; AVX256NODQ-NEXT:    store i8 [[CVT4]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 4), align 1
>> >>>> +; AVX256NODQ-NEXT:    store i8 [[CVT5]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 5), align 1
>> >>>> +; AVX256NODQ-NEXT:    store i8 [[CVT6]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 6), align 1
>> >>>> +; AVX256NODQ-NEXT:    store i8 [[CVT7]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 7), align 1
>> >>>> +; AVX256NODQ-NEXT:    ret void
>> >>>> +;
>> >>>> +; AVX512-LABEL: @fptoui_8f64_8i8(
>> >>>> +; AVX512-NEXT:    [[TMP1:%.*]] = load <8 x double>, <8 x double>* bitcast ([8 x double]* @src64 to <8 x double>*), align 8
>> >>>> +; AVX512-NEXT:    [[TMP2:%.*]] = fptoui <8 x double> [[TMP1]] to <8 x i8>
>> >>>> +; AVX512-NEXT:    store <8 x i8> [[TMP2]], <8 x i8>* bitcast ([64 x i8]* @dst8 to <8 x i8>*), align 1
>> >>>> +; AVX512-NEXT:    ret void
>> >>>> +;
>> >>>> +; AVX256DQ-LABEL: @fptoui_8f64_8i8(
>> >>>> +; AVX256DQ-NEXT:    [[TMP1:%.*]] = load <8 x double>, <8 x double>* bitcast ([8 x double]* @src64 to <8 x double>*), align 8
>> >>>> +; AVX256DQ-NEXT:    [[TMP2:%.*]] = fptoui <8 x double> [[TMP1]] to <8 x i8>
>> >>>> +; AVX256DQ-NEXT:    store <8 x i8> [[TMP2]], <8 x i8>* bitcast ([64 x i8]* @dst8 to <8 x i8>*), align 1
>> >>>> +; AVX256DQ-NEXT:    ret void
>> >>>>  ;
>> >>>>    %a0 = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 0), align 8
>> >>>>    %a1 = load double, double* getelementptr inbounds ([8 x double], [8 x double]* @src64, i32 0, i64 1), align 8
>> >>>> @@ -616,32 +655,38 @@ define void @fptoui_8f32_8i16() #0 {
>> >>>>  }
>> >>>>
>> >>>>  define void @fptoui_8f32_8i8() #0 {
>> >>>> -; CHECK-LABEL: @fptoui_8f32_8i8(
>> >>>> -; CHECK-NEXT:    [[A0:%.*]] = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 0), align 4
>> >>>> -; CHECK-NEXT:    [[A1:%.*]] = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 1), align 4
>> >>>> -; CHECK-NEXT:    [[A2:%.*]] = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 2), align 4
>> >>>> -; CHECK-NEXT:    [[A3:%.*]] = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 3), align 4
>> >>>> -; CHECK-NEXT:    [[A4:%.*]] = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 4), align 4
>> >>>> -; CHECK-NEXT:    [[A5:%.*]] = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 5), align 4
>> >>>> -; CHECK-NEXT:    [[A6:%.*]] = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 6), align 4
>> >>>> -; CHECK-NEXT:    [[A7:%.*]] = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 7), align 4
>> >>>> -; CHECK-NEXT:    [[CVT0:%.*]] = fptoui float [[A0]] to i8
>> >>>> -; CHECK-NEXT:    [[CVT1:%.*]] = fptoui float [[A1]] to i8
>> >>>> -; CHECK-NEXT:    [[CVT2:%.*]] = fptoui float [[A2]] to i8
>> >>>> -; CHECK-NEXT:    [[CVT3:%.*]] = fptoui float [[A3]] to i8
>> >>>> -; CHECK-NEXT:    [[CVT4:%.*]] = fptoui float [[A4]] to i8
>> >>>> -; CHECK-NEXT:    [[CVT5:%.*]] = fptoui float [[A5]] to i8
>> >>>> -; CHECK-NEXT:    [[CVT6:%.*]] = fptoui float [[A6]] to i8
>> >>>> -; CHECK-NEXT:    [[CVT7:%.*]] = fptoui float [[A7]] to i8
>> >>>> -; CHECK-NEXT:    store i8 [[CVT0]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 0), align 1
>> >>>> -; CHECK-NEXT:    store i8 [[CVT1]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 1), align 1
>> >>>> -; CHECK-NEXT:    store i8 [[CVT2]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 2), align 1
>> >>>> -; CHECK-NEXT:    store i8 [[CVT3]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 3), align 1
>> >>>> -; CHECK-NEXT:    store i8 [[CVT4]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 4), align 1
>> >>>> -; CHECK-NEXT:    store i8 [[CVT5]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 5), align 1
>> >>>> -; CHECK-NEXT:    store i8 [[CVT6]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 6), align 1
>> >>>> -; CHECK-NEXT:    store i8 [[CVT7]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 7), align 1
>> >>>> -; CHECK-NEXT:    ret void
>> >>>> +; SSE-LABEL: @fptoui_8f32_8i8(
>> >>>> +; SSE-NEXT:    [[A0:%.*]] = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 0), align 4
>> >>>> +; SSE-NEXT:    [[A1:%.*]] = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 1), align 4
>> >>>> +; SSE-NEXT:    [[A2:%.*]] = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 2), align 4
>> >>>> +; SSE-NEXT:    [[A3:%.*]] = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 3), align 4
>> >>>> +; SSE-NEXT:    [[A4:%.*]] = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 4), align 4
>> >>>> +; SSE-NEXT:    [[A5:%.*]] = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 5), align 4
>> >>>> +; SSE-NEXT:    [[A6:%.*]] = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 6), align 4
>> >>>> +; SSE-NEXT:    [[A7:%.*]] = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 7), align 4
>> >>>> +; SSE-NEXT:    [[CVT0:%.*]] = fptoui float [[A0]] to i8
>> >>>> +; SSE-NEXT:    [[CVT1:%.*]] = fptoui float [[A1]] to i8
>> >>>> +; SSE-NEXT:    [[CVT2:%.*]] = fptoui float [[A2]] to i8
>> >>>> +; SSE-NEXT:    [[CVT3:%.*]] = fptoui float [[A3]] to i8
>> >>>> +; SSE-NEXT:    [[CVT4:%.*]] = fptoui float [[A4]] to i8
>> >>>> +; SSE-NEXT:    [[CVT5:%.*]] = fptoui float [[A5]] to i8
>> >>>> +; SSE-NEXT:    [[CVT6:%.*]] = fptoui float [[A6]] to i8
>> >>>> +; SSE-NEXT:    [[CVT7:%.*]] = fptoui float [[A7]] to i8
>> >>>> +; SSE-NEXT:    store i8 [[CVT0]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 0), align 1
>> >>>> +; SSE-NEXT:    store i8 [[CVT1]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 1), align 1
>> >>>> +; SSE-NEXT:    store i8 [[CVT2]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 2), align 1
>> >>>> +; SSE-NEXT:    store i8 [[CVT3]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 3), align 1
>> >>>> +; SSE-NEXT:    store i8 [[CVT4]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 4), align 1
>> >>>> +; SSE-NEXT:    store i8 [[CVT5]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 5), align 1
>> >>>> +; SSE-NEXT:    store i8 [[CVT6]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 6), align 1
>> >>>> +; SSE-NEXT:    store i8 [[CVT7]], i8* getelementptr inbounds ([64 x i8], [64 x i8]* @dst8, i32 0, i64 7), align 1
>> >>>> +; SSE-NEXT:    ret void
>> >>>> +;
>> >>>> +; AVX-LABEL: @fptoui_8f32_8i8(
>> >>>> +; AVX-NEXT:    [[TMP1:%.*]] = load <8 x float>, <8 x float>* bitcast ([16 x float]* @src32 to <8 x float>*), align 4
>> >>>> +; AVX-NEXT:    [[TMP2:%.*]] = fptoui <8 x float> [[TMP1]] to <8 x i8>
>> >>>> +; AVX-NEXT:    store <8 x i8> [[TMP2]], <8 x i8>* bitcast ([64 x i8]* @dst8 to <8 x i8>*), align 1
>> >>>> +; AVX-NEXT:    ret void
>> >>>>  ;
>> >>>>    %a0 = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 0), align 4
>> >>>>    %a1 = load float, float* getelementptr inbounds ([16 x float], [16 x float]* @src32, i32 0, i64 1), align 4
>> >>>>
>> >>>> Modified: llvm/trunk/test/Transforms/SLPVectorizer/X86/insertvalue.ll
>> >>>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Transforms/SLPVectorizer/X86/insertvalue.ll?rev=353923&r1=353922&r2=353923&view=diff
>> >>>> ==============================================================================
>> >>>> --- llvm/trunk/test/Transforms/SLPVectorizer/X86/insertvalue.ll (original)
>> >>>> +++ llvm/trunk/test/Transforms/SLPVectorizer/X86/insertvalue.ll Wed Feb 13 00:26:43 2019
>> >>>> @@ -216,24 +216,21 @@ top:
>> >>>>  define void @julia_load_array_of_i16([4 x i16]* %a, [4 x i16]* %b, [4 x i16]* %c) {
>> >>>>  ; CHECK-LABEL: @julia_load_array_of_i16(
>> >>>>  ; CHECK-NEXT:  top:
>> >>>> -; CHECK-NEXT:    [[A_ARR:%.*]] = load [4 x i16], [4 x i16]* [[A:%.*]], align 4
>> >>>> -; CHECK-NEXT:    [[A0:%.*]] = extractvalue [4 x i16] [[A_ARR]], 0
>> >>>> -; CHECK-NEXT:    [[A2:%.*]] = extractvalue [4 x i16] [[A_ARR]], 2
>> >>>> -; CHECK-NEXT:    [[A1:%.*]] = extractvalue [4 x i16] [[A_ARR]], 1
>> >>>> -; CHECK-NEXT:    [[B_ARR:%.*]] = load [4 x i16], [4 x i16]* [[B:%.*]], align 4
>> >>>> -; CHECK-NEXT:    [[B0:%.*]] = extractvalue [4 x i16] [[B_ARR]], 0
>> >>>> -; CHECK-NEXT:    [[B2:%.*]] = extractvalue [4 x i16] [[B_ARR]], 2
>> >>>> -; CHECK-NEXT:    [[B1:%.*]] = extractvalue [4 x i16] [[B_ARR]], 1
>> >>>> -; CHECK-NEXT:    [[A3:%.*]] = extractvalue [4 x i16] [[A_ARR]], 3
>> >>>> -; CHECK-NEXT:    [[C1:%.*]] = sub i16 [[A1]], [[B1]]
>> >>>> -; CHECK-NEXT:    [[B3:%.*]] = extractvalue [4 x i16] [[B_ARR]], 3
>> >>>> -; CHECK-NEXT:    [[C0:%.*]] = sub i16 [[A0]], [[B0]]
>> >>>> -; CHECK-NEXT:    [[C2:%.*]] = sub i16 [[A2]], [[B2]]
>> >>>> -; CHECK-NEXT:    [[C_ARR0:%.*]] = insertvalue [4 x i16] undef, i16 [[C0]], 0
>> >>>> -; CHECK-NEXT:    [[C_ARR1:%.*]] = insertvalue [4 x i16] [[C_ARR0]], i16 [[C1]], 1
>> >>>> -; CHECK-NEXT:    [[C3:%.*]] = sub i16 [[A3]], [[B3]]
>> >>>> -; CHECK-NEXT:    [[C_ARR2:%.*]] = insertvalue [4 x i16] [[C_ARR1]], i16 [[C2]], 2
>> >>>> -; CHECK-NEXT:    [[C_ARR3:%.*]] = insertvalue [4 x i16] [[C_ARR2]], i16 [[C3]], 3
>> >>>> +; CHECK-NEXT:    [[TMP0:%.*]] = bitcast [4 x i16]* [[A:%.*]] to <4 x i16>*
>> >>>> +; CHECK-NEXT:    [[TMP1:%.*]] = load <4 x i16>, <4 x i16>* [[TMP0]], align 4
>> >>>> +; CHECK-NEXT:    [[A_ARR:%.*]] = load [4 x i16], [4 x i16]* [[A]], align 4
>> >>>> +; CHECK-NEXT:    [[TMP2:%.*]] = bitcast [4 x i16]* [[B:%.*]] to <4 x i16>*
>> >>>> +; CHECK-NEXT:    [[TMP3:%.*]] = load <4 x i16>, <4 x i16>* [[TMP2]], align 4
>> >>>> +; CHECK-NEXT:    [[B_ARR:%.*]] = load [4 x i16], [4 x i16]* [[B]], align 4
>> >>>> +; CHECK-NEXT:    [[TMP4:%.*]] = sub <4 x i16> [[TMP1]], [[TMP3]]
>> >>>> +; CHECK-NEXT:    [[TMP5:%.*]] = extractelement <4 x i16> [[TMP4]], i32 0
>> >>>> +; CHECK-NEXT:    [[C_ARR0:%.*]] = insertvalue [4 x i16] undef, i16 [[TMP5]], 0
>> >>>> +; CHECK-NEXT:    [[TMP6:%.*]] = extractelement <4 x i16> [[TMP4]], i32 1
>> >>>> +; CHECK-NEXT:    [[C_ARR1:%.*]] = insertvalue [4 x i16] [[C_ARR0]], i16 [[TMP6]], 1
>> >>>> +; CHECK-NEXT:    [[TMP7:%.*]] = extractelement <4 x i16> [[TMP4]], i32 2
>> >>>> +; CHECK-NEXT:    [[C_ARR2:%.*]] = insertvalue [4 x i16] [[C_ARR1]], i16 [[TMP7]], 2
>> >>>> +; CHECK-NEXT:    [[TMP8:%.*]] = extractelement <4 x i16> [[TMP4]], i32 3
>> >>>> +; CHECK-NEXT:    [[C_ARR3:%.*]] = insertvalue [4 x i16] [[C_ARR2]], i16 [[TMP8]], 3
>> >>>>  ; CHECK-NEXT:    store [4 x i16] [[C_ARR3]], [4 x i16]* [[C:%.*]], align 4
>> >>>>  ; CHECK-NEXT:    ret void
>> >>>>  ;
>> >>>>
>> >>>> Modified: llvm/trunk/test/Transforms/SLPVectorizer/X86/phi.ll
>> >>>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Transforms/SLPVectorizer/X86/phi.ll?rev=353923&r1=353922&r2=353923&view=diff
>> >>>> ==============================================================================
>> >>>> --- llvm/trunk/test/Transforms/SLPVectorizer/X86/phi.ll (original)
>> >>>> +++ llvm/trunk/test/Transforms/SLPVectorizer/X86/phi.ll Wed Feb 13 00:26:43 2019
>> >>>> @@ -146,44 +146,49 @@ define float @foo3(float* nocapture read
>> >>>>  ; CHECK-NEXT:    [[TMP2:%.*]] = load <4 x float>, <4 x float>* [[TMP1]], align 4
>> >>>>  ; CHECK-NEXT:    [[REORDER_SHUFFLE:%.*]] = shufflevector <4 x float> [[TMP2]], <4 x float> undef, <4 x i32> <i32 3, i32 2, i32 1, i32 0>
>> >>>>  ; CHECK-NEXT:    [[TMP3:%.*]] = extractelement <4 x float> [[REORDER_SHUFFLE]], i32 3
>> >>>> +; CHECK-NEXT:    [[TMP4:%.*]] = insertelement <2 x float> undef, float [[TMP0]], i32 0
>> >>>> +; CHECK-NEXT:    [[TMP5:%.*]] = insertelement <2 x float> [[TMP4]], float [[TMP3]], i32 1
>> >>>>  ; CHECK-NEXT:    br label [[FOR_BODY:%.*]]
>> >>>>  ; CHECK:       for.body:
>> >>>>  ; CHECK-NEXT:    [[INDVARS_IV:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY]] ]
>> >>>> -; CHECK-NEXT:    [[R_052:%.*]] = phi float [ [[TMP0]], [[ENTRY]] ], [ [[ADD6:%.*]], [[FOR_BODY]] ]
>> >>>> -; CHECK-NEXT:    [[TMP4:%.*]] = phi float [ [[TMP3]], [[ENTRY]] ], [ [[TMP12:%.*]], [[FOR_BODY]] ]
>> >>>> -; CHECK-NEXT:    [[TMP5:%.*]] = phi float [ [[TMP0]], [[ENTRY]] ], [ [[TMP14:%.*]], [[FOR_BODY]] ]
>> >>>> -; CHECK-NEXT:    [[TMP6:%.*]] = phi <4 x float> [ [[REORDER_SHUFFLE]], [[ENTRY]] ], [ [[TMP19:%.*]], [[FOR_BODY]] ]
>> >>>> -; CHECK-NEXT:    [[MUL:%.*]] = fmul float [[TMP5]], 7.000000e+00
>> >>>> -; CHECK-NEXT:    [[ADD6]] = fadd float [[R_052]], [[MUL]]
>> >>>> -; CHECK-NEXT:    [[TMP7:%.*]] = add nsw i64 [[INDVARS_IV]], 2
>> >>>> -; CHECK-NEXT:    [[ARRAYIDX14:%.*]] = getelementptr inbounds float, float* [[A]], i64 [[TMP7]]
>> >>>> -; CHECK-NEXT:    [[TMP8:%.*]] = load float, float* [[ARRAYIDX14]], align 4
>> >>>> +; CHECK-NEXT:    [[TMP6:%.*]] = phi float [ [[TMP0]], [[ENTRY]] ], [ [[TMP18:%.*]], [[FOR_BODY]] ]
>> >>>> +; CHECK-NEXT:    [[TMP7:%.*]] = phi <4 x float> [ [[REORDER_SHUFFLE]], [[ENTRY]] ], [ [[TMP23:%.*]], [[FOR_BODY]] ]
>> >>>> +; CHECK-NEXT:    [[TMP8:%.*]] = phi <2 x float> [ [[TMP5]], [[ENTRY]] ], [ [[TMP26:%.*]], [[FOR_BODY]] ]
>> >>>> +; CHECK-NEXT:    [[MUL:%.*]] = fmul float [[TMP6]], 7.000000e+00
>> >>>> +; CHECK-NEXT:    [[TMP9:%.*]] = extractelement <2 x float> [[TMP8]], i32 0
>> >>>> +; CHECK-NEXT:    [[ADD6:%.*]] = fadd float [[TMP9]], [[MUL]]
>> >>>> +; CHECK-NEXT:    [[TMP10:%.*]] = add nsw i64 [[INDVARS_IV]], 2
>> >>>> +; CHECK-NEXT:    [[ARRAYIDX14:%.*]] = getelementptr inbounds float, float* [[A]], i64 [[TMP10]]
>> >>>> +; CHECK-NEXT:    [[TMP11:%.*]] = load float, float* [[ARRAYIDX14]], align 4
>> >>>>  ; CHECK-NEXT:    [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 3
>> >>>>  ; CHECK-NEXT:    [[ARRAYIDX19:%.*]] = getelementptr inbounds float, float* [[A]], i64 [[INDVARS_IV_NEXT]]
>> >>>> -; CHECK-NEXT:    [[TMP9:%.*]] = bitcast float* [[ARRAYIDX19]] to <2 x float>*
>> >>>> -; CHECK-NEXT:    [[TMP10:%.*]] = load <2 x float>, <2 x float>* [[TMP9]], align 4
>> >>>> -; CHECK-NEXT:    [[REORDER_SHUFFLE1:%.*]] = shufflevector <2 x float> [[TMP10]], <2 x float> undef, <2 x i32> <i32 1, i32 0>
>> >>>> -; CHECK-NEXT:    [[TMP11:%.*]] = insertelement <4 x float> <float 1.100000e+01, float 1.000000e+01, float 9.000000e+00, float undef>, float [[TMP4]], i32 3
>> >>>> -; CHECK-NEXT:    [[TMP12]] = extractelement <2 x float> [[REORDER_SHUFFLE1]], i32 0
>> >>>> -; CHECK-NEXT:    [[TMP13:%.*]] = insertelement <4 x float> undef, float [[TMP12]], i32 0
>> >>>> -; CHECK-NEXT:    [[TMP14]] = extractelement <2 x float> [[REORDER_SHUFFLE1]], i32 1
>> >>>> -; CHECK-NEXT:    [[TMP15:%.*]] = insertelement <4 x float> [[TMP13]], float [[TMP14]], i32 1
>> >>>> -; CHECK-NEXT:    [[TMP16:%.*]] = insertelement <4 x float> [[TMP15]], float [[TMP8]], i32 2
>> >>>> -; CHECK-NEXT:    [[TMP17:%.*]] = insertelement <4 x float> [[TMP16]], float 8.000000e+00, i32 3
>> >>>> -; CHECK-NEXT:    [[TMP18:%.*]] = fmul <4 x float> [[TMP11]], [[TMP17]]
>> >>>> -; CHECK-NEXT:    [[TMP19]] = fadd <4 x float> [[TMP6]], [[TMP18]]
>> >>>> -; CHECK-NEXT:    [[TMP20:%.*]] = trunc i64 [[INDVARS_IV_NEXT]] to i32
>> >>>> -; CHECK-NEXT:    [[CMP:%.*]] = icmp slt i32 [[TMP20]], 121
>> >>>> +; CHECK-NEXT:    [[TMP12:%.*]] = bitcast float* [[ARRAYIDX19]] to <2 x float>*
>> >>>> +; CHECK-NEXT:    [[TMP13:%.*]] = load <2 x float>, <2 x float>* [[TMP12]], align 4
>> >>>> +; CHECK-NEXT:    [[REORDER_SHUFFLE1:%.*]] = shufflevector <2 x float> [[TMP13]], <2 x float> undef, <2 x i32> <i32 1, i32 0>
>> >>>> +; CHECK-NEXT:    [[TMP14:%.*]] = extractelement <2 x float> [[TMP8]], i32 1
>> >>>> +; CHECK-NEXT:    [[TMP15:%.*]] = insertelement <4 x float> <float 1.100000e+01, float 1.000000e+01, float 9.000000e+00, float undef>, float [[TMP14]], i32 3
>> >>>> +; CHECK-NEXT:    [[TMP16:%.*]] = extractelement <2 x float> [[REORDER_SHUFFLE1]], i32 0
>> >>>> +; CHECK-NEXT:    [[TMP17:%.*]] = insertelement <4 x float> undef, float [[TMP16]], i32 0
>> >>>> +; CHECK-NEXT:    [[TMP18]] = extractelement <2 x float> [[REORDER_SHUFFLE1]], i32 1
>> >>>> +; CHECK-NEXT:    [[TMP19:%.*]] = insertelement <4 x float> [[TMP17]], float [[TMP18]], i32 1
>> >>>> +; CHECK-NEXT:    [[TMP20:%.*]] = insertelement <4 x float> [[TMP19]], float [[TMP11]], i32 2
>> >>>> +; CHECK-NEXT:    [[TMP21:%.*]] = insertelement <4 x float> [[TMP20]], float 8.000000e+00, i32 3
>> >>>> +; CHECK-NEXT:    [[TMP22:%.*]] = fmul <4 x float> [[TMP15]], [[TMP21]]
>> >>>> +; CHECK-NEXT:    [[TMP23]] = fadd <4 x float> [[TMP7]], [[TMP22]]
>> >>>> +; CHECK-NEXT:    [[TMP24:%.*]] = trunc i64 [[INDVARS_IV_NEXT]] to i32
>> >>>> +; CHECK-NEXT:    [[CMP:%.*]] = icmp slt i32 [[TMP24]], 121
>> >>>> +; CHECK-NEXT:    [[TMP25:%.*]] = insertelement <2 x float> undef, float [[ADD6]], i32 0
>> >>>> +; CHECK-NEXT:    [[TMP26]] = insertelement <2 x float> [[TMP25]], float [[TMP16]], i32 1
>> >>>>  ; CHECK-NEXT:    br i1 [[CMP]], label [[FOR_BODY]], label [[FOR_END:%.*]]
>> >>>>  ; CHECK:       for.end:
>> >>>> -; CHECK-NEXT:    [[TMP21:%.*]] = extractelement <4 x float> [[TMP19]], i32 3
>> >>>> -; CHECK-NEXT:    [[ADD28:%.*]] = fadd float [[ADD6]], [[TMP21]]
>> >>>> -; CHECK-NEXT:    [[TMP22:%.*]] = extractelement <4 x float> [[TMP19]], i32 2
>> >>>> -; CHECK-NEXT:    [[ADD29:%.*]] = fadd float [[ADD28]], [[TMP22]]
>> >>>> -; CHECK-NEXT:    [[TMP23:%.*]] = extractelement <4 x float> [[TMP19]], i32 1
>> >>>> -; CHECK-NEXT:    [[ADD30:%.*]] = fadd float [[ADD29]], [[TMP23]]
>> >>>> -; CHECK-NEXT:    [[TMP24:%.*]] = extractelement <4 x float> [[TMP19]], i32 0
>> >>>> -; CHECK-NEXT:    [[ADD31:%.*]] = fadd float [[ADD30]], [[TMP24]]
>> >>>> +; CHECK-NEXT:    [[TMP27:%.*]] = extractelement <4 x float> [[TMP23]], i32 3
>> >>>> +; CHECK-NEXT:    [[ADD28:%.*]] = fadd float [[ADD6]], [[TMP27]]
>> >>>> +; CHECK-NEXT:    [[TMP28:%.*]] = extractelement <4 x float> [[TMP23]], i32 2
>> >>>> +; CHECK-NEXT:    [[ADD29:%.*]] = fadd float [[ADD28]], [[TMP28]]
>> >>>> +; CHECK-NEXT:    [[TMP29:%.*]] = extractelement <4 x float> [[TMP23]], i32 1
>> >>>> +; CHECK-NEXT:    [[ADD30:%.*]] = fadd float [[ADD29]], [[TMP29]]
>> >>>> +; CHECK-NEXT:    [[TMP30:%.*]] = extractelement <4 x float> [[TMP23]], i32 0
>> >>>> +; CHECK-NEXT:    [[ADD31:%.*]] = fadd float [[ADD30]], [[TMP30]]
>> >>>>  ; CHECK-NEXT:    ret float [[ADD31]]
>> >>>>  ;
>> >>>>  entry:
>> >>>>
>> >>>> Modified: llvm/trunk/test/Transforms/SLPVectorizer/X86/remark_not_all_parts.ll
>> >>>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Transforms/SLPVectorizer/X86/remark_not_all_parts.ll?rev=353923&r1=353922&r2=353923&view=diff
>> >>>> ==============================================================================
>> >>>> --- llvm/trunk/test/Transforms/SLPVectorizer/X86/remark_not_all_parts.ll (original)
>> >>>> +++ llvm/trunk/test/Transforms/SLPVectorizer/X86/remark_not_all_parts.ll Wed Feb 13 00:26:43 2019
>> >>>> @@ -1,5 +1,5 @@
>> >>>>  ; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
>> >>>> -; RUN: opt -S -mtriple=x86_64-pc-linux-gnu -mcpu=generic -slp-vectorizer -pass-remarks-output=%t < %s | FileCheck %s
>> >>>> +; RUN: opt -S -mtriple=x86_64-pc-linux-gnu -mcpu=generic -slp-vectorizer -slp-min-reg-size=128 -pass-remarks-output=%t < %s | FileCheck %s
>> >>>>  ; RUN: FileCheck --input-file=%t --check-prefix=YAML %s
>> >>>>
>> >>>>  define i32 @foo(i32* nocapture readonly %diff) #0 {
>> >>>>
>> >>>> Modified: llvm/trunk/test/Transforms/SLPVectorizer/X86/reorder_phi.ll
>> >>>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Transforms/SLPVectorizer/X86/reorder_phi.ll?rev=353923&r1=353922&r2=353923&view=diff
>> >>>> ==============================================================================
>> >>>> --- llvm/trunk/test/Transforms/SLPVectorizer/X86/reorder_phi.ll (original)
>> >>>> +++ llvm/trunk/test/Transforms/SLPVectorizer/X86/reorder_phi.ll Wed Feb 13 00:26:43 2019
>> >>>> @@ -9,33 +9,38 @@ define  void @foo (%struct.complex* %A,
>> >>>>  ; CHECK-NEXT:    [[TMP0:%.*]] = add i64 256, 0
>> >>>>  ; CHECK-NEXT:    br label [[LOOP:%.*]]
>> >>>>  ; CHECK:       loop:
>> >>>> -; CHECK-NEXT:    [[TMP1:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ], [ [[TMP20:%.*]], [[LOOP]] ]
>> >>>> -; CHECK-NEXT:    [[TMP2:%.*]] = phi float [ 0.000000e+00, [[ENTRY]] ], [ [[TMP19:%.*]], [[LOOP]] ]
>> >>>> -; CHECK-NEXT:    [[TMP3:%.*]] = phi float [ 0.000000e+00, [[ENTRY]] ], [ [[TMP18:%.*]], [[LOOP]] ]
>> >>>> -; CHECK-NEXT:    [[TMP4:%.*]] = getelementptr inbounds [[STRUCT_COMPLEX:%.*]], %struct.complex* [[A:%.*]], i64 [[TMP1]], i32 0
>> >>>> -; CHECK-NEXT:    [[TMP5:%.*]] = load float, float* [[TMP4]], align 4
>> >>>> -; CHECK-NEXT:    [[TMP6:%.*]] = getelementptr inbounds [[STRUCT_COMPLEX]], %struct.complex* [[A]], i64 [[TMP1]], i32 1
>> >>>> -; CHECK-NEXT:    [[TMP7:%.*]] = load float, float* [[TMP6]], align 4
>> >>>> -; CHECK-NEXT:    [[TMP8:%.*]] = getelementptr inbounds [[STRUCT_COMPLEX]], %struct.complex* [[B:%.*]], i64 [[TMP1]], i32 0
>> >>>> -; CHECK-NEXT:    [[TMP9:%.*]] = load float, float* [[TMP8]], align 4
>> >>>> -; CHECK-NEXT:    [[TMP10:%.*]] = getelementptr inbounds [[STRUCT_COMPLEX]], %struct.complex* [[B]], i64 [[TMP1]], i32 1
>> >>>> -; CHECK-NEXT:    [[TMP11:%.*]] = load float, float* [[TMP10]], align 4
>> >>>> -; CHECK-NEXT:    [[TMP12:%.*]] = fmul float [[TMP5]], [[TMP9]]
>> >>>> -; CHECK-NEXT:    [[TMP13:%.*]] = fmul float [[TMP7]], [[TMP11]]
>> >>>> -; CHECK-NEXT:    [[TMP14:%.*]] = fsub float [[TMP12]], [[TMP13]]
>> >>>> -; CHECK-NEXT:    [[TMP15:%.*]] = fmul float [[TMP7]], [[TMP9]]
>> >>>> -; CHECK-NEXT:    [[TMP16:%.*]] = fmul float [[TMP5]], [[TMP11]]
>> >>>> -; CHECK-NEXT:    [[TMP17:%.*]] = fadd float [[TMP15]], [[TMP16]]
>> >>>> -; CHECK-NEXT:    [[TMP18]] = fadd float [[TMP3]], [[TMP14]]
>> >>>> -; CHECK-NEXT:    [[TMP19]] = fadd float [[TMP2]], [[TMP17]]
>> >>>> -; CHECK-NEXT:    [[TMP20]] = add nuw nsw i64 [[TMP1]], 1
>> >>>> -; CHECK-NEXT:    [[TMP21:%.*]] = icmp eq i64 [[TMP20]], [[TMP0]]
>> >>>> -; CHECK-NEXT:    br i1 [[TMP21]], label [[EXIT:%.*]], label [[LOOP]]
>> >>>> +; CHECK-NEXT:    [[TMP1:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ], [ [[TMP25:%.*]], [[LOOP]] ]
>> >>>> +; CHECK-NEXT:    [[TMP2:%.*]] = phi <2 x float> [ zeroinitializer, [[ENTRY]] ], [ [[TMP24:%.*]], [[LOOP]] ]
>> >>>> +; CHECK-NEXT:    [[TMP3:%.*]] = getelementptr inbounds [[STRUCT_COMPLEX:%.*]], %struct.complex* [[A:%.*]], i64 [[TMP1]], i32 0
>> >>>> +; CHECK-NEXT:    [[TMP4:%.*]] = getelementptr inbounds [[STRUCT_COMPLEX]], %struct.complex* [[A]], i64 [[TMP1]], i32 1
>> >>>> +; CHECK-NEXT:    [[TMP5:%.*]] = bitcast float* [[TMP3]] to <2 x float>*
>> >>>> +; CHECK-NEXT:    [[TMP6:%.*]] = load <2 x float>, <2 x float>* [[TMP5]], align 4
>> >>>> +; CHECK-NEXT:    [[TMP7:%.*]] = getelementptr inbounds [[STRUCT_COMPLEX]], %struct.complex* [[B:%.*]], i64 [[TMP1]], i32 0
>> >>>> +; CHECK-NEXT:    [[TMP8:%.*]] = load float, float* [[TMP7]], align 4
>> >>>> +; CHECK-NEXT:    [[TMP9:%.*]] = getelementptr inbounds [[STRUCT_COMPLEX]], %struct.complex* [[B]], i64 [[TMP1]], i32 1
>> >>>> +; CHECK-NEXT:    [[TMP10:%.*]] = load float, float* [[TMP9]], align 4
>> >>>> +; CHECK-NEXT:    [[TMP11:%.*]] = insertelement <2 x float> undef, float [[TMP8]], i32 0
>> >>>> +; CHECK-NEXT:    [[TMP12:%.*]] = insertelement <2 x float> [[TMP11]], float [[TMP8]], i32 1
>> >>>> +; CHECK-NEXT:    [[TMP13:%.*]] = fmul <2 x float> [[TMP6]], [[TMP12]]
>> >>>> +; CHECK-NEXT:    [[TMP14:%.*]] = extractelement <2 x float> [[TMP6]], i32 1
>> >>>> +; CHECK-NEXT:    [[TMP15:%.*]] = insertelement <2 x float> undef, float [[TMP14]], i32 0
>> >>>> +; CHECK-NEXT:    [[TMP16:%.*]] = extractelement <2 x float> [[TMP6]], i32 0
>> >>>> +; CHECK-NEXT:    [[TMP17:%.*]] = insertelement <2 x float> [[TMP15]], float [[TMP16]], i32 1
>> >>>> +; CHECK-NEXT:    [[TMP18:%.*]] = insertelement <2 x float> undef, float [[TMP10]], i32 0
>> >>>> +; CHECK-NEXT:    [[TMP19:%.*]] = insertelement <2 x float> [[TMP18]], float [[TMP10]], i32 1
>> >>>> +; CHECK-NEXT:    [[TMP20:%.*]] = fmul <2 x float> [[TMP17]], [[TMP19]]
>> >>>> +; CHECK-NEXT:    [[TMP21:%.*]] = fsub <2 x float> [[TMP13]], [[TMP20]]
>> >>>> +; CHECK-NEXT:    [[TMP22:%.*]] = fadd <2 x float> [[TMP13]], [[TMP20]]
>> >>>> +; CHECK-NEXT:    [[TMP23:%.*]] = shufflevector <2 x float> [[TMP21]], <2 x float> [[TMP22]], <2 x i32> <i32 0, i32 3>
>> >>>> +; CHECK-NEXT:    [[TMP24]] = fadd <2 x float> [[TMP2]], [[TMP23]]
>> >>>> +; CHECK-NEXT:    [[TMP25]] = add nuw nsw i64 [[TMP1]], 1
>> >>>> +; CHECK-NEXT:    [[TMP26:%.*]] = icmp eq i64 [[TMP25]], [[TMP0]]
>> >>>> +; CHECK-NEXT:    br i1 [[TMP26]], label [[EXIT:%.*]], label [[LOOP]]
>> >>>>  ; CHECK:       exit:
>> >>>> -; CHECK-NEXT:    [[TMP22:%.*]] = getelementptr inbounds [[STRUCT_COMPLEX]], %struct.complex* [[RESULT:%.*]], i32 0, i32 0
>> >>>> -; CHECK-NEXT:    store float [[TMP18]], float* [[TMP22]], align 4
>> >>>> -; CHECK-NEXT:    [[TMP23:%.*]] = getelementptr inbounds [[STRUCT_COMPLEX]], %struct.complex* [[RESULT]], i32 0, i32 1
>> >>>> -; CHECK-NEXT:    store float [[TMP19]], float* [[TMP23]], align 4
>> >>>> +; CHECK-NEXT:    [[TMP27:%.*]] = getelementptr inbounds [[STRUCT_COMPLEX]], %struct.complex* [[RESULT:%.*]], i32 0, i32 0
>> >>>> +; CHECK-NEXT:    [[TMP28:%.*]] = getelementptr inbounds [[STRUCT_COMPLEX]], %struct.complex* [[RESULT]], i32 0, i32 1
>> >>>> +; CHECK-NEXT:    [[TMP29:%.*]] = bitcast float* [[TMP27]] to <2 x float>*
>> >>>> +; CHECK-NEXT:    store <2 x float> [[TMP24]], <2 x float>* [[TMP29]], align 4
>> >>>>  ; CHECK-NEXT:    ret void
>> >>>>  ;
>> >>>>  entry:
>> >>>>
>> >>>> Modified: llvm/trunk/test/Transforms/SLPVectorizer/X86/resched.ll
>> >>>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Transforms/SLPVectorizer/X86/resched.ll?rev=353923&r1=353922&r2=353923&view=diff
>> >>>> ==============================================================================
>> >>>> --- llvm/trunk/test/Transforms/SLPVectorizer/X86/resched.ll (original)
>> >>>> +++ llvm/trunk/test/Transforms/SLPVectorizer/X86/resched.ll Wed Feb 13 00:26:43 2019
>> >>>> @@ -38,44 +38,47 @@ define fastcc void @_ZN12_GLOBAL__N_127P
>> >>>>  ; CHECK-NEXT:    [[TMP13:%.*]] = insertelement <4 x i32> [[TMP12]], i32 [[CONV31_I]], i32 3
>> >>>>  ; CHECK-NEXT:    [[TMP14:%.*]] = lshr <4 x i32> [[TMP13]], <i32 9, i32 10, i32 11, i32 12>
>> >>>>  ; CHECK-NEXT:    [[ARRAYIDX_I_I7_12_I_I:%.*]] = getelementptr inbounds %"struct.std::array", %"struct.std::array"* undef, i64 0, i32 0, i64 12
>> >>>> -; CHECK-NEXT:    [[SHR_12_I_I:%.*]] = lshr i32 [[CONV31_I]], 13
>> >>>>  ; CHECK-NEXT:    [[ARRAYIDX_I_I7_13_I_I:%.*]] = getelementptr inbounds %"struct.std::array", %"struct.std::array"* undef, i64 0, i32 0, i64 13
>> >>>> -; CHECK-NEXT:    [[SHR_13_I_I:%.*]] = lshr i32 [[CONV31_I]], 14
>> >>>> +; CHECK-NEXT:    [[TMP15:%.*]] = insertelement <2 x i32> undef, i32 [[CONV31_I]], i32 0
>> >>>> +; CHECK-NEXT:    [[TMP16:%.*]] = insertelement <2 x i32> [[TMP15]], i32 [[CONV31_I]], i32 1
>> >>>> +; CHECK-NEXT:    [[TMP17:%.*]] = lshr <2 x i32> [[TMP16]], <i32 13, i32 14>
>> >>>>  ; CHECK-NEXT:    [[ARRAYIDX_I_I7_14_I_I:%.*]] = getelementptr inbounds %"struct.std::array", %"struct.std::array"* undef, i64 0, i32 0, i64 14
>> >>>>  ; CHECK-NEXT:    [[SHR_14_I_I:%.*]] = lshr i32 [[CONV31_I]], 15
>> >>>> -; CHECK-NEXT:    [[TMP15:%.*]] = insertelement <16 x i32> undef, i32 [[SUB_I]], i32 0
>> >>>> -; CHECK-NEXT:    [[TMP16:%.*]] = extractelement <8 x i32> [[TMP9]], i32 0
>> >>>> -; CHECK-NEXT:    [[TMP17:%.*]] = insertelement <16 x i32> [[TMP15]], i32 [[TMP16]], i32 1
>> >>>> -; CHECK-NEXT:    [[TMP18:%.*]] = extractelement <8 x i32> [[TMP9]], i32 1
>> >>>> -; CHECK-NEXT:    [[TMP19:%.*]] = insertelement <16 x i32> [[TMP17]], i32 [[TMP18]], i32 2
>> >>>> -; CHECK-NEXT:    [[TMP20:%.*]] = extractelement <8 x i32> [[TMP9]], i32 2
>> >>>> -; CHECK-NEXT:    [[TMP21:%.*]] = insertelement <16 x i32> [[TMP19]], i32 [[TMP20]], i32 3
>> >>>> -; CHECK-NEXT:    [[TMP22:%.*]] = extractelement <8 x i32> [[TMP9]], i32 3
>> >>>> -; CHECK-NEXT:    [[TMP23:%.*]] = insertelement <16 x i32> [[TMP21]], i32 [[TMP22]], i32 4
>> >>>> -; CHECK-NEXT:    [[TMP24:%.*]] = extractelement <8 x i32> [[TMP9]], i32 4
>> >>>> -; CHECK-NEXT:    [[TMP25:%.*]] = insertelement <16 x i32> [[TMP23]], i32 [[TMP24]], i32 5
>> >>>> -; CHECK-NEXT:    [[TMP26:%.*]] = extractelement <8 x i32> [[TMP9]], i32 5
>> >>>> -; CHECK-NEXT:    [[TMP27:%.*]] = insertelement <16 x i32> [[TMP25]], i32 [[TMP26]], i32 6
>> >>>> -; CHECK-NEXT:    [[TMP28:%.*]] = extractelement <8 x i32> [[TMP9]], i32 6
>> >>>> -; CHECK-NEXT:    [[TMP29:%.*]] = insertelement <16 x i32> [[TMP27]], i32 [[TMP28]], i32 7
>> >>>> -; CHECK-NEXT:    [[TMP30:%.*]] = extractelement <8 x i32> [[TMP9]], i32 7
>> >>>> -; CHECK-NEXT:    [[TMP31:%.*]] = insertelement <16 x i32> [[TMP29]], i32 [[TMP30]], i32 8
>> >>>> -; CHECK-NEXT:    [[TMP32:%.*]] = extractelement <4 x i32> [[TMP14]], i32 0
>> >>>> -; CHECK-NEXT:    [[TMP33:%.*]] = insertelement <16 x i32> [[TMP31]], i32 [[TMP32]], i32 9
>> >>>> -; CHECK-NEXT:    [[TMP34:%.*]] = extractelement <4 x i32> [[TMP14]], i32 1
>> >>>> -; CHECK-NEXT:    [[TMP35:%.*]] = insertelement <16 x i32> [[TMP33]], i32 [[TMP34]], i32 10
>> >>>> -; CHECK-NEXT:    [[TMP36:%.*]] = extractelement <4 x i32> [[TMP14]], i32 2
>> >>>> -; CHECK-NEXT:    [[TMP37:%.*]] = insertelement <16 x i32> [[TMP35]], i32 [[TMP36]], i32 11
>> >>>> -; CHECK-NEXT:    [[TMP38:%.*]] = extractelement <4 x i32> [[TMP14]], i32 3
>> >>>> -; CHECK-NEXT:    [[TMP39:%.*]] = insertelement <16 x i32> [[TMP37]], i32 [[TMP38]], i32 12
>> >>>> -; CHECK-NEXT:    [[TMP40:%.*]] = insertelement <16 x i32> [[TMP39]], i32 [[SHR_12_I_I]], i32 13
>> >>>> -; CHECK-NEXT:    [[TMP41:%.*]] = insertelement <16 x i32> [[TMP40]], i32 [[SHR_13_I_I]], i32 14
>> >>>> -; CHECK-NEXT:    [[TMP42:%.*]] = insertelement <16 x i32> [[TMP41]], i32 [[SHR_14_I_I]], i32 15
>> >>>> -; CHECK-NEXT:    [[TMP43:%.*]] = trunc <16 x i32> [[TMP42]] to <16 x i8>
>> >>>> -; CHECK-NEXT:    [[TMP44:%.*]] = and <16 x i8> <i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1>, [[TMP43]]
>> >>>> +; CHECK-NEXT:    [[TMP18:%.*]] = insertelement <16 x i32> undef, i32 [[SUB_I]], i32 0
>> >>>> +; CHECK-NEXT:    [[TMP19:%.*]] = extractelement <8 x i32> [[TMP9]], i32 0
>> >>>> +; CHECK-NEXT:    [[TMP20:%.*]] = insertelement <16 x i32> [[TMP18]], i32 [[TMP19]], i32 1
>> >>>> +; CHECK-NEXT:    [[TMP21:%.*]] = extractelement <8 x i32> [[TMP9]], i32 1
>> >>>> +; CHECK-NEXT:    [[TMP22:%.*]] = insertelement <16 x i32> [[TMP20]], i32 [[TMP21]], i32 2
>> >>>> +; CHECK-NEXT:    [[TMP23:%.*]] = extractelement <8 x i32> [[TMP9]], i32 2
>> >>>> +; CHECK-NEXT:    [[TMP24:%.*]] = insertelement <16 x i32> [[TMP22]], i32 [[TMP23]], i32 3
>> >>>> +; CHECK-NEXT:    [[TMP25:%.*]] = extractelement <8 x i32> [[TMP9]], i32 3
>> >>>> +; CHECK-NEXT:    [[TMP26:%.*]] = insertelement <16 x i32> [[TMP24]], i32 [[TMP25]], i32 4
>> >>>> +; CHECK-NEXT:    [[TMP27:%.*]] = extractelement <8 x i32> [[TMP9]], i32 4
>> >>>> +; CHECK-NEXT:    [[TMP28:%.*]] = insertelement <16 x i32> [[TMP26]], i32 [[TMP27]], i32 5
>> >>>> +; CHECK-NEXT:    [[TMP29:%.*]] = extractelement <8 x i32> [[TMP9]], i32 5
>> >>>> +; CHECK-NEXT:    [[TMP30:%.*]] = insertelement <16 x i32> [[TMP28]], i32 [[TMP29]], i32 6
>> >>>> +; CHECK-NEXT:    [[TMP31:%.*]] = extractelement <8 x i32> [[TMP9]], i32 6
>> >>>> +; CHECK-NEXT:    [[TMP32:%.*]] = insertelement <16 x i32> [[TMP30]], i32 [[TMP31]], i32 7
>> >>>> +; CHECK-NEXT:    [[TMP33:%.*]] = extractelement <8 x i32> [[TMP9]], i32 7
>> >>>> +; CHECK-NEXT:    [[TMP34:%.*]] = insertelement <16 x i32> [[TMP32]], i32 [[TMP33]], i32 8
>> >>>> +; CHECK-NEXT:    [[TMP35:%.*]] = extractelement <4 x i32> [[TMP14]], i32 0
>> >>>> +; CHECK-NEXT:    [[TMP36:%.*]] = insertelement <16 x i32> [[TMP34]], i32 [[TMP35]], i32 9
>> >>>> +; CHECK-NEXT:    [[TMP37:%.*]] = extractelement <4 x i32> [[TMP14]], i32 1
>> >>>> +; CHECK-NEXT:    [[TMP38:%.*]] = insertelement <16 x i32> [[TMP36]], i32 [[TMP37]], i32 10
>> >>>> +; CHECK-NEXT:    [[TMP39:%.*]] = extractelement <4 x i32> [[TMP14]], i32 2
>> >>>> +; CHECK-NEXT:    [[TMP40:%.*]] = insertelement <16 x i32> [[TMP38]], i32 [[TMP39]], i32 11
>> >>>> +; CHECK-NEXT:    [[TMP41:%.*]] = extractelement <4 x i32> [[TMP14]], i32 3
>> >>>> +; CHECK-NEXT:    [[TMP42:%.*]] = insertelement <16 x i32> [[TMP40]], i32 [[TMP41]], i32 12
>> >>>> +; CHECK-NEXT:    [[TMP43:%.*]] = extractelement <2 x i32> [[TMP17]], i32 0
>> >>>> +; CHECK-NEXT:    [[TMP44:%.*]] = insertelement <16 x i32> [[TMP42]], i32 [[TMP43]], i32 13
>> >>>> +; CHECK-NEXT:    [[TMP45:%.*]] = extractelement <2 x i32> [[TMP17]], i32 1
>> >>>> +; CHECK-NEXT:    [[TMP46:%.*]] = insertelement <16 x i32> [[TMP44]], i32 [[TMP45]], i32 14
>> >>>> +; CHECK-NEXT:    [[TMP47:%.*]] = insertelement <16 x i32> [[TMP46]], i32 [[SHR_14_I_I]], i32 15
>> >>>> +; CHECK-NEXT:    [[TMP48:%.*]] = trunc <16 x i32> [[TMP47]] to <16 x i8>
>> >>>> +; CHECK-NEXT:    [[TMP49:%.*]] = and <16 x i8> <i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1>, [[TMP48]]
>> >>>>  ; CHECK-NEXT:    [[ARRAYIDX_I_I7_15_I_I:%.*]] = getelementptr inbounds %"struct.std::array", %"struct.std::array"* undef, i64 0, i32 0, i64 15
>> >>>> -; CHECK-NEXT:    [[TMP45:%.*]] = bitcast i8* [[TMP0]] to <16 x i8>*
>> >>>> -; CHECK-NEXT:    store <16 x i8> [[TMP44]], <16 x i8>* [[TMP45]], align 1
>> >>>> +; CHECK-NEXT:    [[TMP50:%.*]] = bitcast i8* [[TMP0]] to <16 x i8>*
>> >>>> +; CHECK-NEXT:    store <16 x i8> [[TMP49]], <16 x i8>* [[TMP50]], align 1
>> >>>>  ; CHECK-NEXT:    unreachable
>> >>>>  ; CHECK:       if.end50.i:
>> >>>>  ; CHECK-NEXT:    ret void
>> >>>>
>> >>>> Modified: llvm/trunk/test/Transforms/SLPVectorizer/X86/rgb_phi.ll
>> >>>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Transforms/SLPVectorizer/X86/rgb_phi.ll?rev=353923&r1=353922&r2=353923&view=diff
>> >>>> ==============================================================================
>> >>>> --- llvm/trunk/test/Transforms/SLPVectorizer/X86/rgb_phi.ll (original)
>> >>>> +++ llvm/trunk/test/Transforms/SLPVectorizer/X86/rgb_phi.ll Wed Feb 13 00:26:43 2019
>> >>>> @@ -25,39 +25,37 @@ define float @foo(float* nocapture reado
>> >>>>  ; CHECK-NEXT:  entry:
>> >>>>  ; CHECK-NEXT:    [[TMP0:%.*]] = load float, float* [[A:%.*]], align 4
>> >>>>  ; CHECK-NEXT:    [[ARRAYIDX1:%.*]] = getelementptr inbounds float, float* [[A]], i64 1
>> >>>> -; CHECK-NEXT:    [[TMP1:%.*]] = load float, float* [[ARRAYIDX1]], align 4
>> >>>> -; CHECK-NEXT:    [[ARRAYIDX2:%.*]] = getelementptr inbounds float, float* [[A]], i64 2
>> >>>> -; CHECK-NEXT:    [[TMP2:%.*]] = load float, float* [[ARRAYIDX2]], align 4
>> >>>> +; CHECK-NEXT:    [[TMP1:%.*]] = bitcast float* [[ARRAYIDX1]] to <2 x float>*
>> >>>> +; CHECK-NEXT:    [[TMP2:%.*]] = load <2 x float>, <2 x float>* [[TMP1]], align 4
>> >>>> +; CHECK-NEXT:    [[REORDER_SHUFFLE:%.*]] = shufflevector <2 x float> [[TMP2]], <2 x float> undef, <2 x i32> <i32 1, i32 0>
>> >>>>  ; CHECK-NEXT:    br label [[FOR_BODY:%.*]]
>> >>>>  ; CHECK:       for.body:
>> >>>>  ; CHECK-NEXT:    [[TMP3:%.*]] = phi float [ [[TMP0]], [[ENTRY:%.*]] ], [ [[DOTPRE:%.*]], [[FOR_BODY_FOR_BODY_CRIT_EDGE:%.*]] ]
>> >>>>  ; CHECK-NEXT:    [[INDVARS_IV:%.*]] = phi i64 [ 0, [[ENTRY]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY_FOR_BODY_CRIT_EDGE]] ]
>> >>>> -; CHECK-NEXT:    [[B_032:%.*]] = phi float [ [[TMP2]], [[ENTRY]] ], [ [[ADD14:%.*]], [[FOR_BODY_FOR_BODY_CRIT_EDGE]] ]
>> >>>> -; CHECK-NEXT:    [[G_031:%.*]] = phi float [ [[TMP1]], [[ENTRY]] ], [ [[ADD9:%.*]], [[FOR_BODY_FOR_BODY_CRIT_EDGE]] ]
>> >>>>  ; CHECK-NEXT:    [[R_030:%.*]] = phi float [ [[TMP0]], [[ENTRY]] ], [ [[ADD4:%.*]], [[FOR_BODY_FOR_BODY_CRIT_EDGE]] ]
>> >>>> +; CHECK-NEXT:    [[TMP4:%.*]] = phi <2 x float> [ [[REORDER_SHUFFLE]], [[ENTRY]] ], [ [[TMP9:%.*]], [[FOR_BODY_FOR_BODY_CRIT_EDGE]] ]
>> >>>>  ; CHECK-NEXT:    [[MUL:%.*]] = fmul float [[TMP3]], 7.000000e+00
>> >>>>  ; CHECK-NEXT:    [[ADD4]] = fadd float [[R_030]], [[MUL]]
>> >>>> -; CHECK-NEXT:    [[TMP4:%.*]] = add nsw i64 [[INDVARS_IV]], 1
>> >>>> -; CHECK-NEXT:    [[ARRAYIDX7:%.*]] = getelementptr inbounds float, float* [[A]], i64 [[TMP4]]
>> >>>> -; CHECK-NEXT:    [[TMP5:%.*]] = load float, float* [[ARRAYIDX7]], align 4
>> >>>> -; CHECK-NEXT:    [[MUL8:%.*]] = fmul float [[TMP5]], 8.000000e+00
>> >>>> -; CHECK-NEXT:    [[ADD9]] = fadd float [[G_031]], [[MUL8]]
>> >>>> -; CHECK-NEXT:    [[TMP6:%.*]] = add nsw i64 [[INDVARS_IV]], 2
>> >>>> -; CHECK-NEXT:    [[ARRAYIDX12:%.*]] = getelementptr inbounds float, float* [[A]], i64 [[TMP6]]
>> >>>> -; CHECK-NEXT:    [[TMP7:%.*]] = load float, float* [[ARRAYIDX12]], align 4
>> >>>> -; CHECK-NEXT:    [[MUL13:%.*]] = fmul float [[TMP7]], 9.000000e+00
>> >>>> -; CHECK-NEXT:    [[ADD14]] = fadd float [[B_032]], [[MUL13]]
>> >>>> +; CHECK-NEXT:    [[TMP5:%.*]] = add nsw i64 [[INDVARS_IV]], 1
>> >>>> +; CHECK-NEXT:    [[ARRAYIDX7:%.*]] = getelementptr inbounds float, float* [[A]], i64 [[TMP5]]
>> >>>> +; CHECK-NEXT:    [[TMP6:%.*]] = bitcast float* [[ARRAYIDX7]] to <2 x float>*
>> >>>> +; CHECK-NEXT:    [[TMP7:%.*]] = load <2 x float>, <2 x float>* [[TMP6]], align 4
>> >>>> +; CHECK-NEXT:    [[REORDER_SHUFFLE1:%.*]] = shufflevector <2 x float> [[TMP7]], <2 x float> undef, <2 x i32> <i32 1, i32 0>
>> >>>> +; CHECK-NEXT:    [[TMP8:%.*]] = fmul <2 x float> <float 9.000000e+00, float 8.000000e+00>, [[REORDER_SHUFFLE1]]
>> >>>> +; CHECK-NEXT:    [[TMP9]] = fadd <2 x float> [[TMP4]], [[TMP8]]
>> >>>>  ; CHECK-NEXT:    [[INDVARS_IV_NEXT]] = add i64 [[INDVARS_IV]], 3
>> >>>> -; CHECK-NEXT:    [[TMP8:%.*]] = trunc i64 [[INDVARS_IV_NEXT]] to i32
>> >>>> -; CHECK-NEXT:    [[CMP:%.*]] = icmp slt i32 [[TMP8]], 121
>> >>>> +; CHECK-NEXT:    [[TMP10:%.*]] = trunc i64 [[INDVARS_IV_NEXT]] to i32
>> >>>> +; CHECK-NEXT:    [[CMP:%.*]] = icmp slt i32 [[TMP10]], 121
>> >>>>  ; CHECK-NEXT:    br i1 [[CMP]], label [[FOR_BODY_FOR_BODY_CRIT_EDGE]], label [[FOR_END:%.*]]
>> >>>>  ; CHECK:       for.body.for.body_crit_edge:
>> >>>>  ; CHECK-NEXT:    [[ARRAYIDX3_PHI_TRANS_INSERT:%.*]] = getelementptr inbounds float, float* [[A]], i64 [[INDVARS_IV_NEXT]]
>> >>>>  ; CHECK-NEXT:    [[DOTPRE]] = load float, float* [[ARRAYIDX3_PHI_TRANS_INSERT]], align 4
>> >>>>  ; CHECK-NEXT:    br label [[FOR_BODY]]
>> >>>>  ; CHECK:       for.end:
>> >>>> -; CHECK-NEXT:    [[ADD16:%.*]] = fadd float [[ADD4]], [[ADD9]]
>> >>>> -; CHECK-NEXT:    [[ADD17:%.*]] = fadd float [[ADD16]], [[ADD14]]
>> >>>> +; CHECK-NEXT:    [[TMP11:%.*]] = extractelement <2 x float> [[TMP9]], i32 1
>> >>>> +; CHECK-NEXT:    [[ADD16:%.*]] = fadd float [[ADD4]], [[TMP11]]
>> >>>> +; CHECK-NEXT:    [[TMP12:%.*]] = extractelement <2 x float> [[TMP9]], i32 0
>> >>>> +; CHECK-NEXT:    [[ADD17:%.*]] = fadd float [[ADD16]], [[TMP12]]
>> >>>>  ; CHECK-NEXT:    ret float [[ADD17]]
>> >>>>  ;
>> >>>>  entry:
>> >>>>
>> >>>> Modified: llvm/trunk/test/Transforms/SLPVectorizer/X86/saxpy.ll
>> >>>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Transforms/SLPVectorizer/X86/saxpy.ll?rev=353923&r1=353922&r2=353923&view=diff
>> >>>> ==============================================================================
>> >>>> --- llvm/trunk/test/Transforms/SLPVectorizer/X86/saxpy.ll (original)
>> >>>> +++ llvm/trunk/test/Transforms/SLPVectorizer/X86/saxpy.ll Wed Feb 13 00:26:43 2019
>> >>>> @@ -63,15 +63,11 @@ define void @SAXPY_crash(i32* noalias no
>> >>>>  ; CHECK-NEXT:    [[TMP1:%.*]] = add i64 [[I:%.*]], 1
>> >>>>  ; CHECK-NEXT:    [[TMP2:%.*]] = getelementptr inbounds i32, i32* [[X:%.*]], i64 [[TMP1]]
>> >>>>  ; CHECK-NEXT:    [[TMP3:%.*]] = getelementptr inbounds i32, i32* [[Y:%.*]], i64 [[TMP1]]
>> >>>> -; CHECK-NEXT:    [[TMP4:%.*]] = load i32, i32* [[TMP3]], align 4
>> >>>> -; CHECK-NEXT:    [[TMP5:%.*]] = add nsw i32 undef, [[TMP4]]
>> >>>> -; CHECK-NEXT:    store i32 [[TMP5]], i32* [[TMP2]], align 4
>> >>>> -; CHECK-NEXT:    [[TMP6:%.*]] = add i64 [[I]], 2
>> >>>> -; CHECK-NEXT:    [[TMP7:%.*]] = getelementptr inbounds i32, i32* [[X]], i64 [[TMP6]]
>> >>>> -; CHECK-NEXT:    [[TMP8:%.*]] = getelementptr inbounds i32, i32* [[Y]], i64 [[TMP6]]
>> >>>> -; CHECK-NEXT:    [[TMP9:%.*]] = load i32, i32* [[TMP8]], align 4
>> >>>> -; CHECK-NEXT:    [[TMP10:%.*]] = add nsw i32 undef, [[TMP9]]
>> >>>> -; CHECK-NEXT:    store i32 [[TMP10]], i32* [[TMP7]], align 4
>> >>>> +; CHECK-NEXT:    [[TMP4:%.*]] = bitcast i32* [[TMP3]] to <2 x i32>*
>> >>>> +; CHECK-NEXT:    [[TMP5:%.*]] = load <2 x i32>, <2 x i32>* [[TMP4]], align 4
>> >>>> +; CHECK-NEXT:    [[TMP6:%.*]] = add nsw <2 x i32> undef, [[TMP5]]
>> >>>> +; CHECK-NEXT:    [[TMP7:%.*]] = bitcast i32* [[TMP2]] to <2 x i32>*
>> >>>> +; CHECK-NEXT:    store <2 x i32> [[TMP6]], <2 x i32>* [[TMP7]], align 4
>> >>>>  ; CHECK-NEXT:    ret void
>> >>>>  ;
>> >>>>    %1 = add i64 %i, 1
>> >>>>
>> >>>> Modified: llvm/trunk/test/Transforms/SLPVectorizer/X86/schedule-bundle.ll
>> >>>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Transforms/SLPVectorizer/X86/schedule-bundle.ll?rev=353923&r1=353922&r2=353923&view=diff
>> >>>> ==============================================================================
>> >>>> --- llvm/trunk/test/Transforms/SLPVectorizer/X86/schedule-bundle.ll (original)
>> >>>> +++ llvm/trunk/test/Transforms/SLPVectorizer/X86/schedule-bundle.ll Wed Feb 13 00:26:43 2019
>> >>>> @@ -14,14 +14,10 @@ define i32 @slp_schedule_bundle() local_
>> >>>>  ; CHECK-NEXT:    [[TMP1:%.*]] = lshr <4 x i32> [[TMP0]], <i32 31, i32 31, i32 31, i32 31>
>> >>>>  ; CHECK-NEXT:    [[TMP2:%.*]] = xor <4 x i32> <i32 1, i32 1, i32 1, i32 1>, [[TMP1]]
>> >>>>  ; CHECK-NEXT:    store <4 x i32> [[TMP2]], <4 x i32>* bitcast ([1 x i32]* @a to <4 x i32>*), align 4
>> >>>> -; CHECK-NEXT:    [[TMP3:%.*]] = load i32, i32* getelementptr ([1 x i32], [1 x i32]* @b, i64 4, i64 0), align 4
>> >>>> -; CHECK-NEXT:    [[DOTLOBIT_4:%.*]] = lshr i32 [[TMP3]], 31
>> >>>> -; CHECK-NEXT:    [[DOTLOBIT_NOT_4:%.*]] = xor i32 [[DOTLOBIT_4]], 1
>> >>>> -; CHECK-NEXT:    store i32 [[DOTLOBIT_NOT_4]], i32* getelementptr ([1 x i32], [1 x i32]* @a, i64 4, i64 0), align 4
>> >>>> -; CHECK-NEXT:    [[TMP4:%.*]] = load i32, i32* getelementptr ([1 x i32], [1 x i32]* @b, i64 5, i64 0), align 4
>> >>>> -; CHECK-NEXT:    [[DOTLOBIT_5:%.*]] = lshr i32 [[TMP4]], 31
>> >>>> -; CHECK-NEXT:    [[DOTLOBIT_NOT_5:%.*]] = xor i32 [[DOTLOBIT_5]], 1
>> >>>> -; CHECK-NEXT:    store i32 [[DOTLOBIT_NOT_5]], i32* getelementptr ([1 x i32], [1 x i32]* @a, i64 5, i64 0), align 4
>> >>>> +; CHECK-NEXT:    [[TMP3:%.*]] = load <2 x i32>, <2 x i32>* bitcast (i32* getelementptr ([1 x i32], [1 x i32]* @b, i64 4, i64 0) to <2 x i32>*), align 4
>> >>>> +; CHECK-NEXT:    [[TMP4:%.*]] = lshr <2 x i32> [[TMP3]], <i32 31, i32 31>
>> >>>> +; CHECK-NEXT:    [[TMP5:%.*]] = xor <2 x i32> <i32 1, i32 1>, [[TMP4]]
>> >>>> +; CHECK-NEXT:    store <2 x i32> [[TMP5]], <2 x i32>* bitcast (i32* getelementptr ([1 x i32], [1 x i32]* @a, i64 4, i64 0) to <2 x i32>*), align 4
>> >>>>  ; CHECK-NEXT:    ret i32 undef
>> >>>>  ;
>> >>>>  entry:
>> >>>>
>> >>>> Modified: llvm/trunk/test/Transforms/SLPVectorizer/X86/sext.ll
>> >>>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Transforms/SLPVectorizer/X86/sext.ll?rev=353923&r1=353922&r2=353923&view=diff
>> >>>> ==============================================================================
>> >>>> --- llvm/trunk/test/Transforms/SLPVectorizer/X86/sext.ll (original)
>> >>>> +++ llvm/trunk/test/Transforms/SLPVectorizer/X86/sext.ll Wed Feb 13 00:26:43 2019
>> >>>> @@ -808,18 +808,20 @@ define <4 x i64> @loadext_4i32_to_4i64(i
>> >>>>  ; SSE2-NEXT:    [[P1:%.*]] = getelementptr inbounds i32, i32* [[P0:%.*]], i64 1
>> >>>>  ; SSE2-NEXT:    [[P2:%.*]] = getelementptr inbounds i32, i32* [[P0]], i64 2
>> >>>>  ; SSE2-NEXT:    [[P3:%.*]] = getelementptr inbounds i32, i32* [[P0]], i64 3
>> >>>> -; SSE2-NEXT:    [[I0:%.*]] = load i32, i32* [[P0]], align 1
>> >>>> -; SSE2-NEXT:    [[I1:%.*]] = load i32, i32* [[P1]], align 1
>> >>>> -; SSE2-NEXT:    [[I2:%.*]] = load i32, i32* [[P2]], align 1
>> >>>> -; SSE2-NEXT:    [[I3:%.*]] = load i32, i32* [[P3]], align 1
>> >>>> -; SSE2-NEXT:    [[X0:%.*]] = sext i32 [[I0]] to i64
>> >>>> -; SSE2-NEXT:    [[X1:%.*]] = sext i32 [[I1]] to i64
>> >>>> -; SSE2-NEXT:    [[X2:%.*]] = sext i32 [[I2]] to i64
>> >>>> -; SSE2-NEXT:    [[X3:%.*]] = sext i32 [[I3]] to i64
>> >>>> -; SSE2-NEXT:    [[V0:%.*]] = insertelement <4 x i64> undef, i64 [[X0]], i32 0
>> >>>> -; SSE2-NEXT:    [[V1:%.*]] = insertelement <4 x i64> [[V0]], i64 [[X1]], i32 1
>> >>>> -; SSE2-NEXT:    [[V2:%.*]] = insertelement <4 x i64> [[V1]], i64 [[X2]], i32 2
>> >>>> -; SSE2-NEXT:    [[V3:%.*]] = insertelement <4 x i64> [[V2]], i64 [[X3]], i32 3
>> >>>> +; SSE2-NEXT:    [[TMP1:%.*]] = bitcast i32* [[P0]] to <2 x i32>*
>> >>>> +; SSE2-NEXT:    [[TMP2:%.*]] = load <2 x i32>, <2 x i32>* [[TMP1]], align 1
>> >>>> +; SSE2-NEXT:    [[TMP3:%.*]] = bitcast i32* [[P2]] to <2 x i32>*
>> >>>> +; SSE2-NEXT:    [[TMP4:%.*]] = load <2 x i32>, <2 x i32>* [[TMP3]], align 1
>> >>>> +; SSE2-NEXT:    [[TMP5:%.*]] = sext <2 x i32> [[TMP2]] to <2 x i64>
>> >>>> +; SSE2-NEXT:    [[TMP6:%.*]] = sext <2 x i32> [[TMP4]] to <2 x i64>
>> >>>> +; SSE2-NEXT:    [[TMP7:%.*]] = extractelement <2 x i64> [[TMP5]], i32 0
>> >>>> +; SSE2-NEXT:    [[V0:%.*]] = insertelement <4 x i64> undef, i64 [[TMP7]], i32 0
>> >>>> +; SSE2-NEXT:    [[TMP8:%.*]] = extractelement <2 x i64> [[TMP5]], i32 1
>> >>>> +; SSE2-NEXT:    [[V1:%.*]] = insertelement <4 x i64> [[V0]], i64 [[TMP8]], i32 1
>> >>>> +; SSE2-NEXT:    [[TMP9:%.*]] = extractelement <2 x i64> [[TMP6]], i32 0
>> >>>> +; SSE2-NEXT:    [[V2:%.*]] = insertelement <4 x i64> [[V1]], i64 [[TMP9]], i32 2
>> >>>> +; SSE2-NEXT:    [[TMP10:%.*]] = extractelement <2 x i64> [[TMP6]], i32 1
>> >>>> +; SSE2-NEXT:    [[V3:%.*]] = insertelement <4 x i64> [[V2]], i64 [[TMP10]], i32 3
>> >>>>  ; SSE2-NEXT:    ret <4 x i64> [[V3]]
>> >>>>  ;
>> >>>>  ; SLM-LABEL: @loadext_4i32_to_4i64(
>> >>>> @@ -845,17 +847,18 @@ define <4 x i64> @loadext_4i32_to_4i64(i
>> >>>>  ; AVX1-NEXT:    [[P3:%.*]] = getelementptr inbounds i32, i32* [[P0]], i64 3
>> >>>>  ; AVX1-NEXT:    [[TMP1:%.*]] = bitcast i32* [[P0]] to <2 x i32>*
>> >>>>  ; AVX1-NEXT:    [[TMP2:%.*]] = load <2 x i32>, <2 x i32>* [[TMP1]], align 1
>> >>>> -; AVX1-NEXT:    [[I2:%.*]] = load i32, i32* [[P2]], align 1
>> >>>> -; AVX1-NEXT:    [[I3:%.*]] = load i32, i32* [[P3]], align 1
>> >>>> -; AVX1-NEXT:    [[TMP3:%.*]] = sext <2 x i32> [[TMP2]] to <2 x i64>
>> >>>> -; AVX1-NEXT:    [[X2:%.*]] = sext i32 [[I2]] to i64
>> >>>> -; AVX1-NEXT:    [[X3:%.*]] = sext i32 [[I3]] to i64
>> >>>> -; AVX1-NEXT:    [[TMP4:%.*]] = extractelement <2 x i64> [[TMP3]], i32 0
>> >>>> -; AVX1-NEXT:    [[V0:%.*]] = insertelement <4 x i64> undef, i64 [[TMP4]], i32 0
>> >>>> -; AVX1-NEXT:    [[TMP5:%.*]] = extractelement <2 x i64> [[TMP3]], i32 1
>> >>>> -; AVX1-NEXT:    [[V1:%.*]] = insertelement <4 x i64> [[V0]], i64 [[TMP5]], i32 1
>> >>>> -; AVX1-NEXT:    [[V2:%.*]] = insertelement <4 x i64> [[V1]], i64 [[X2]], i32 2
>> >>>> -; AVX1-NEXT:    [[V3:%.*]] = insertelement <4 x i64> [[V2]], i64 [[X3]], i32 3
>> >>>> +; AVX1-NEXT:    [[TMP3:%.*]] = bitcast i32* [[P2]] to <2 x i32>*
>> >>>> +; AVX1-NEXT:    [[TMP4:%.*]] = load <2 x i32>, <2 x i32>* [[TMP3]], align 1
>> >>>> +; AVX1-NEXT:    [[TMP5:%.*]] = sext <2 x i32> [[TMP2]] to <2 x i64>
>> >>>> +; AVX1-NEXT:    [[TMP6:%.*]] = sext <2 x i32> [[TMP4]] to <2 x i64>
>> >>>> +; AVX1-NEXT:    [[TMP7:%.*]] = extractelement <2 x i64> [[TMP5]], i32 0
>> >>>> +; AVX1-NEXT:    [[V0:%.*]] = insertelement <4 x i64> undef, i64 [[TMP7]], i32 0
>> >>>> +; AVX1-NEXT:    [[TMP8:%.*]] = extractelement <2 x i64> [[TMP5]], i32 1
>> >>>> +; AVX1-NEXT:    [[V1:%.*]] = insertelement <4 x i64> [[V0]], i64 [[TMP8]], i32 1
>> >>>> +; AVX1-NEXT:    [[TMP9:%.*]] = extractelement <2 x i64> [[TMP6]], i32 0
>> >>>> +; AVX1-NEXT:    [[V2:%.*]] = insertelement <4 x i64> [[V1]], i64 [[TMP9]], i32 2
>> >>>> +; AVX1-NEXT:    [[TMP10:%.*]] = extractelement <2 x i64> [[TMP6]], i32 1
>> >>>> +; AVX1-NEXT:    [[V3:%.*]] = insertelement <4 x i64> [[V2]], i64 [[TMP10]], i32 3
>> >>>>  ; AVX1-NEXT:    ret <4 x i64> [[V3]]
>> >>>>  ;
>> >>>>  ; AVX2-LABEL: @loadext_4i32_to_4i64(
>> >>>>
>> >>>> Modified: llvm/trunk/test/Transforms/SLPVectorizer/X86/shift-lshr.ll
>> >>>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Transforms/SLPVectorizer/X86/shift-lshr.ll?rev=353923&r1=353922&r2=353923&view=diff
>> >>>> ==============================================================================
>> >>>> --- llvm/trunk/test/Transforms/SLPVectorizer/X86/shift-lshr.ll (original)
>> >>>> +++ llvm/trunk/test/Transforms/SLPVectorizer/X86/shift-lshr.ll Wed Feb 13 00:26:43 2019
>> >>>> @@ -125,70 +125,38 @@ define void @lshr_v8i64() {
>> >>>>
>> >>>>  define void @lshr_v16i32() {
>> >>>>  ; SSE-LABEL: @lshr_v16i32(
>> >>>> -; SSE-NEXT:    [[A0:%.*]] = load i32, i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 0), align 4
>> >>>> -; SSE-NEXT:    [[A1:%.*]] = load i32, i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 1), align 4
>> >>>> -; SSE-NEXT:    [[A2:%.*]] = load i32, i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 2), align 4
>> >>>> -; SSE-NEXT:    [[A3:%.*]] = load i32, i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 3), align 4
>> >>>> -; SSE-NEXT:    [[A4:%.*]] = load i32, i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 4), align 4
>> >>>> -; SSE-NEXT:    [[A5:%.*]] = load i32, i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 5), align 4
>> >>>> -; SSE-NEXT:    [[A6:%.*]] = load i32, i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 6), align 4
>> >>>> -; SSE-NEXT:    [[A7:%.*]] = load i32, i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 7), align 4
>> >>>> -; SSE-NEXT:    [[A8:%.*]] = load i32, i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 8), align 4
>> >>>> -; SSE-NEXT:    [[A9:%.*]] = load i32, i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 9), align 4
>> >>>> -; SSE-NEXT:    [[A10:%.*]] = load i32, i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 10), align 4
>> >>>> -; SSE-NEXT:    [[A11:%.*]] = load i32, i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 11), align 4
>> >>>> -; SSE-NEXT:    [[A12:%.*]] = load i32, i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 12), align 4
>> >>>> -; SSE-NEXT:    [[A13:%.*]] = load i32, i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 13), align 4
>> >>>> -; SSE-NEXT:    [[A14:%.*]] = load i32, i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 14), align 4
>> >>>> -; SSE-NEXT:    [[A15:%.*]] = load i32, i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 15), align 4
>> >>>> -; SSE-NEXT:    [[B0:%.*]] = load i32, i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 0), align 4
>> >>>> -; SSE-NEXT:    [[B1:%.*]] = load i32, i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 1), align 4
>> >>>> -; SSE-NEXT:    [[B2:%.*]] = load i32, i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 2), align 4
>> >>>> -; SSE-NEXT:    [[B3:%.*]] = load i32, i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 3), align 4
>> >>>> -; SSE-NEXT:    [[B4:%.*]] = load i32, i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 4), align 4
>> >>>> -; SSE-NEXT:    [[B5:%.*]] = load i32, i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 5), align 4
>> >>>> -; SSE-NEXT:    [[B6:%.*]] = load i32, i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 6), align 4
>> >>>> -; SSE-NEXT:    [[B7:%.*]] = load i32, i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 7), align 4
>> >>>> -; SSE-NEXT:    [[B8:%.*]] = load i32, i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 8), align 4
>> >>>> -; SSE-NEXT:    [[B9:%.*]] = load i32, i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 9), align 4
>> >>>> -; SSE-NEXT:    [[B10:%.*]] = load i32, i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 10), align 4
>> >>>> -; SSE-NEXT:    [[B11:%.*]] = load i32, i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 11), align 4
>> >>>> -; SSE-NEXT:    [[B12:%.*]] = load i32, i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 12), align 4
>> >>>> -; SSE-NEXT:    [[B13:%.*]] = load i32, i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 13), align 4
>> >>>> -; SSE-NEXT:    [[B14:%.*]] = load i32, i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 14), align 4
>> >>>> -; SSE-NEXT:    [[B15:%.*]] = load i32, i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 15), align 4
>> >>>> -; SSE-NEXT:    [[R0:%.*]] = lshr i32 [[A0]], [[B0]]
>> >>>> -; SSE-NEXT:    [[R1:%.*]] = lshr i32 [[A1]], [[B1]]
>> >>>> -; SSE-NEXT:    [[R2:%.*]] = lshr i32 [[A2]], [[B2]]
>> >>>> -; SSE-NEXT:    [[R3:%.*]] = lshr i32 [[A3]], [[B3]]
>> >>>> -; SSE-NEXT:    [[R4:%.*]] = lshr i32 [[A4]], [[B4]]
>> >>>> -; SSE-NEXT:    [[R5:%.*]] = lshr i32 [[A5]], [[B5]]
>> >>>> -; SSE-NEXT:    [[R6:%.*]] = lshr i32 [[A6]], [[B6]]
>> >>>> -; SSE-NEXT:    [[R7:%.*]] = lshr i32 [[A7]], [[B7]]
>> >>>> -; SSE-NEXT:    [[R8:%.*]] = lshr i32 [[A8]], [[B8]]
>> >>>> -; SSE-NEXT:    [[R9:%.*]] = lshr i32 [[A9]], [[B9]]
>> >>>> -; SSE-NEXT:    [[R10:%.*]] = lshr i32 [[A10]], [[B10]]
>> >>>> -; SSE-NEXT:    [[R11:%.*]] = lshr i32 [[A11]], [[B11]]
>> >>>> -; SSE-NEXT:    [[R12:%.*]] = lshr i32 [[A12]], [[B12]]
>> >>>> -; SSE-NEXT:    [[R13:%.*]] = lshr i32 [[A13]], [[B13]]
>> >>>> -; SSE-NEXT:    [[R14:%.*]] = lshr i32 [[A14]], [[B14]]
>> >>>> -; SSE-NEXT:    [[R15:%.*]] = lshr i32 [[A15]], [[B15]]
>> >>>> -; SSE-NEXT:    store i32 [[R0]], i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 0), align 4
>> >>>> -; SSE-NEXT:    store i32 [[R1]], i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 1), align 4
>> >>>> -; SSE-NEXT:    store i32 [[R2]], i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 2), align 4
>> >>>> -; SSE-NEXT:    store i32 [[R3]], i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 3), align 4
>> >>>> -; SSE-NEXT:    store i32 [[R4]], i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 4), align 4
>> >>>> -; SSE-NEXT:    store i32 [[R5]], i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 5), align 4
>> >>>> -; SSE-NEXT:    store i32 [[R6]], i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 6), align 4
>> >>>> -; SSE-NEXT:    store i32 [[R7]], i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 7), align 4
>> >>>> -; SSE-NEXT:    store i32 [[R8]], i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 8), align 4
>> >>>> -; SSE-NEXT:    store i32 [[R9]], i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 9), align 4
>> >>>> -; SSE-NEXT:    store i32 [[R10]], i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 10), align 4
>> >>>> -; SSE-NEXT:    store i32 [[R11]], i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 11), align 4
>> >>>> -; SSE-NEXT:    store i32 [[R12]], i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 12), align 4
>> >>>> -; SSE-NEXT:    store i32 [[R13]], i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 13), align 4
>> >>>> -; SSE-NEXT:    store i32 [[R14]], i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 14), align 4
>> >>>> -; SSE-NEXT:    store i32 [[R15]], i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 15), align 4
>> >>>> +; SSE-NEXT:    [[TMP1:%.*]] = load <2 x i32>, <2 x i32>* bitcast ([16 x i32]* @a32 to <2 x i32>*), align 4
>> >>>> +; SSE-NEXT:    [[TMP2:%.*]] = load <2 x i32>, <2 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 2) to <2 x i32>*), align 4
>> >>>> +; SSE-NEXT:    [[TMP3:%.*]] = load <2 x i32>, <2 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 4) to <2 x i32>*), align 4
>> >>>> +; SSE-NEXT:    [[TMP4:%.*]] = load <2 x i32>, <2 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 6) to <2 x i32>*), align 4
>> >>>> +; SSE-NEXT:    [[TMP5:%.*]] = load <2 x i32>, <2 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 8) to <2 x i32>*), align 4
>> >>>> +; SSE-NEXT:    [[TMP6:%.*]] = load <2 x i32>, <2 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 10) to <2 x i32>*), align 4
>> >>>> +; SSE-NEXT:    [[TMP7:%.*]] = load <2 x i32>, <2 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 12) to <2 x i32>*), align 4
>> >>>> +; SSE-NEXT:    [[TMP8:%.*]] = load <2 x i32>, <2 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @a32, i32 0, i64 14) to <2 x i32>*), align 4
>> >>>> +; SSE-NEXT:    [[TMP9:%.*]] = load <2 x i32>, <2 x i32>* bitcast ([16 x i32]* @b32 to <2 x i32>*), align 4
>> >>>> +; SSE-NEXT:    [[TMP10:%.*]] = load <2 x i32>, <2 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 2) to <2 x i32>*), align 4
>> >>>> +; SSE-NEXT:    [[TMP11:%.*]] = load <2 x i32>, <2 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 4) to <2 x i32>*), align 4
>> >>>> +; SSE-NEXT:    [[TMP12:%.*]] = load <2 x i32>, <2 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 6) to <2 x i32>*), align 4
>> >>>> +; SSE-NEXT:    [[TMP13:%.*]] = load <2 x i32>, <2 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 8) to <2 x i32>*), align 4
>> >>>> +; SSE-NEXT:    [[TMP14:%.*]] = load <2 x i32>, <2 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 10) to <2 x i32>*), align 4
>> >>>> +; SSE-NEXT:    [[TMP15:%.*]] = load <2 x i32>, <2 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 12) to <2 x i32>*), align 4
>> >>>> +; SSE-NEXT:    [[TMP16:%.*]] = load <2 x i32>, <2 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @b32, i32 0, i64 14) to <2 x i32>*), align 4
>> >>>> +; SSE-NEXT:    [[TMP17:%.*]] = lshr <2 x i32> [[TMP1]], [[TMP9]]
>> >>>> +; SSE-NEXT:    [[TMP18:%.*]] = lshr <2 x i32> [[TMP2]], [[TMP10]]
>> >>>> +; SSE-NEXT:    [[TMP19:%.*]] = lshr <2 x i32> [[TMP3]], [[TMP11]]
>> >>>> +; SSE-NEXT:    [[TMP20:%.*]] = lshr <2 x i32> [[TMP4]], [[TMP12]]
>> >>>> +; SSE-NEXT:    [[TMP21:%.*]] = lshr <2 x i32> [[TMP5]], [[TMP13]]
>> >>>> +; SSE-NEXT:    [[TMP22:%.*]] = lshr <2 x i32> [[TMP6]], [[TMP14]]
>> >>>> +; SSE-NEXT:    [[TMP23:%.*]] = lshr <2 x i32> [[TMP7]], [[TMP15]]
>> >>>> +; SSE-NEXT:    [[TMP24:%.*]] = lshr <2 x i32> [[TMP8]], [[TMP16]]
>> >>>> +; SSE-NEXT:    store <2 x i32> [[TMP17]], <2 x i32>* bitcast ([16 x i32]* @c32 to <2 x i32>*), align 4
>> >>>> +; SSE-NEXT:    store <2 x i32> [[TMP18]], <2 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 2) to <2 x i32>*), align 4
>> >>>> +; SSE-NEXT:    store <2 x i32> [[TMP19]], <2 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 4) to <2 x i32>*), align 4
>> >>>> +; SSE-NEXT:    store <2 x i32> [[TMP20]], <2 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 6) to <2 x i32>*), align 4
>> >>>> +; SSE-NEXT:    store <2 x i32> [[TMP21]], <2 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 8) to <2 x i32>*), align 4
>> >>>> +; SSE-NEXT:    store <2 x i32> [[TMP22]], <2 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 10) to <2 x i32>*), align 4
>> >>>> +; SSE-NEXT:    store <2 x i32> [[TMP23]], <2 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 12) to <2 x i32>*), align 4
>> >>>> +; SSE-NEXT:    store <2 x i32> [[TMP24]], <2 x i32>* bitcast (i32* getelementptr inbounds ([16 x i32], [16 x i32]* @c32, i32 0, i64 14) to <2 x i32>*), align 4
>> >>>>  ; SSE-NEXT:    ret void
>> >>>>  ;
>> >>>>  ; AVX-LABEL: @lshr_v16i32(
>> >>>>
>> >>>> Modified: llvm/trunk/test/Transforms/SLPVectorizer/X86/shift-shl.ll
>> >>>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Transforms/SLPVectorizer/X86/shift-shl.ll?rev=353923&r1=353922&r2=353923&view=diff
>> >>>> ==============================================================================
>> >>>> --- llvm/trunk/test/Transforms/SLPVectorizer/X86/shift-shl.ll (original)
>> >>>> +++ llvm/trunk/test/Transforms/SLPVectorizer/X86/shift-shl.ll Wed Feb 13 00:26:43 2019
>> >>>> @@ -241,134 +241,38 @@ define void @shl_v16i32() {
>> >>>>
>> >>>>  define void @shl_v32i16() {
>> >>>>  ; SSE-LABEL: @shl_v32i16(
>> >>>> -; SSE-NEXT:    [[A0:%.*]] = load i16, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 0), align 2
>> >>>> -; SSE-NEXT:    [[A1:%.*]] = load i16, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 1), align 2
>> >>>> -; SSE-NEXT:    [[A2:%.*]] = load i16, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 2), align 2
>> >>>> -; SSE-NEXT:    [[A3:%.*]] = load i16, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 3), align 2
>> >>>> -; SSE-NEXT:    [[A4:%.*]] = load i16, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 4), align 2
>> >>>> -; SSE-NEXT:    [[A5:%.*]] = load i16, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 5), align 2
>> >>>> -; SSE-NEXT:    [[A6:%.*]] = load i16, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 6), align 2
>> >>>> -; SSE-NEXT:    [[A7:%.*]] = load i16, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 7), align 2
>> >>>> -; SSE-NEXT:    [[A8:%.*]] = load i16, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 8), align 2
>> >>>> -; SSE-NEXT:    [[A9:%.*]] = load i16, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 9), align 2
>> >>>> -; SSE-NEXT:    [[A10:%.*]] = load i16, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 10), align 2
>> >>>> -; SSE-NEXT:    [[A11:%.*]] = load i16, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 11), align 2
>> >>>> -; SSE-NEXT:    [[A12:%.*]] = load i16, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 12), align 2
>> >>>> -; SSE-NEXT:    [[A13:%.*]] = load i16, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 13), align 2
>> >>>> -; SSE-NEXT:    [[A14:%.*]] = load i16, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 14), align 2
>> >>>> -; SSE-NEXT:    [[A15:%.*]] = load i16, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 15), align 2
>> >>>> -; SSE-NEXT:    [[A16:%.*]] = load i16, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 16), align 2
>> >>>> -; SSE-NEXT:    [[A17:%.*]] = load i16, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 17), align 2
>> >>>> -; SSE-NEXT:    [[A18:%.*]] = load i16, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 18), align 2
>> >>>> -; SSE-NEXT:    [[A19:%.*]] = load i16, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 19), align 2
>> >>>> -; SSE-NEXT:    [[A20:%.*]] = load i16, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 20), align 2
>> >>>> -; SSE-NEXT:    [[A21:%.*]] = load i16, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 21), align 2
>> >>>> -; SSE-NEXT:    [[A22:%.*]] = load i16, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 22), align 2
>> >>>> -; SSE-NEXT:    [[A23:%.*]] = load i16, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 23), align 2
>> >>>> -; SSE-NEXT:    [[A24:%.*]] = load i16, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 24), align 2
>> >>>> -; SSE-NEXT:    [[A25:%.*]] = load i16, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 25), align 2
>> >>>> -; SSE-NEXT:    [[A26:%.*]] = load i16, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 26), align 2
>> >>>> -; SSE-NEXT:    [[A27:%.*]] = load i16, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 27), align 2
>> >>>> -; SSE-NEXT:    [[A28:%.*]] = load i16, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 28), align 2
>> >>>> -; SSE-NEXT:    [[A29:%.*]] = load i16, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 29), align 2
>> >>>> -; SSE-NEXT:    [[A30:%.*]] = load i16, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 30), align 2
>> >>>> -; SSE-NEXT:    [[A31:%.*]] = load i16, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 31), align 2
>> >>>> -; SSE-NEXT:    [[B0:%.*]] = load i16, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 0), align 2
>> >>>> -; SSE-NEXT:    [[B1:%.*]] = load i16, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 1), align 2
>> >>>> -; SSE-NEXT:    [[B2:%.*]] = load i16, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 2), align 2
>> >>>> -; SSE-NEXT:    [[B3:%.*]] = load i16, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 3), align 2
>> >>>> -; SSE-NEXT:    [[B4:%.*]] = load i16, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 4), align 2
>> >>>> -; SSE-NEXT:    [[B5:%.*]] = load i16, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 5), align 2
>> >>>> -; SSE-NEXT:    [[B6:%.*]] = load i16, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 6), align 2
>> >>>> -; SSE-NEXT:    [[B7:%.*]] = load i16, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 7), align 2
>> >>>> -; SSE-NEXT:    [[B8:%.*]] = load i16, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 8), align 2
>> >>>> -; SSE-NEXT:    [[B9:%.*]] = load i16, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 9), align 2
>> >>>> -; SSE-NEXT:    [[B10:%.*]] = load i16, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 10), align 2
>> >>>> -; SSE-NEXT:    [[B11:%.*]] = load i16, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 11), align 2
>> >>>> -; SSE-NEXT:    [[B12:%.*]] = load i16, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 12), align 2
>> >>>> -; SSE-NEXT:    [[B13:%.*]] = load i16, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 13), align 2
>> >>>> -; SSE-NEXT:    [[B14:%.*]] = load i16, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 14), align 2
>> >>>> -; SSE-NEXT:    [[B15:%.*]] = load i16, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 15), align 2
>> >>>> -; SSE-NEXT:    [[B16:%.*]] = load i16, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 16), align 2
>> >>>> -; SSE-NEXT:    [[B17:%.*]] = load i16, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 17), align 2
>> >>>> -; SSE-NEXT:    [[B18:%.*]] = load i16, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 18), align 2
>> >>>> -; SSE-NEXT:    [[B19:%.*]] = load i16, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 19), align 2
>> >>>> -; SSE-NEXT:    [[B20:%.*]] = load i16, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 20), align 2
>> >>>> -; SSE-NEXT:    [[B21:%.*]] = load i16, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 21), align 2
>> >>>> -; SSE-NEXT:    [[B22:%.*]] = load i16, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 22), align 2
>> >>>> -; SSE-NEXT:    [[B23:%.*]] = load i16, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 23), align 2
>> >>>> -; SSE-NEXT:    [[B24:%.*]] = load i16, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 24), align 2
>> >>>> -; SSE-NEXT:    [[B25:%.*]] = load i16, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 25), align 2
>> >>>> -; SSE-NEXT:    [[B26:%.*]] = load i16, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 26), align 2
>> >>>> -; SSE-NEXT:    [[B27:%.*]] = load i16, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 27), align 2
>> >>>> -; SSE-NEXT:    [[B28:%.*]] = load i16, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 28), align 2
>> >>>> -; SSE-NEXT:    [[B29:%.*]] = load i16, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 29), align 2
>> >>>> -; SSE-NEXT:    [[B30:%.*]] = load i16, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 30), align 2
>> >>>> -; SSE-NEXT:    [[B31:%.*]] = load i16, i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 31), align 2
>> >>>> -; SSE-NEXT:    [[R0:%.*]] = shl i16 [[A0]], [[B0]]
>> >>>> -; SSE-NEXT:    [[R1:%.*]] = shl i16 [[A1]], [[B1]]
>> >>>> -; SSE-NEXT:    [[R2:%.*]] = shl i16 [[A2]], [[B2]]
>> >>>> -; SSE-NEXT:    [[R3:%.*]] = shl i16 [[A3]], [[B3]]
>> >>>> -; SSE-NEXT:    [[R4:%.*]] = shl i16 [[A4]], [[B4]]
>> >>>> -; SSE-NEXT:    [[R5:%.*]] = shl i16 [[A5]], [[B5]]
>> >>>> -; SSE-NEXT:    [[R6:%.*]] = shl i16 [[A6]], [[B6]]
>> >>>> -; SSE-NEXT:    [[R7:%.*]] = shl i16 [[A7]], [[B7]]
>> >>>> -; SSE-NEXT:    [[R8:%.*]] = shl i16 [[A8]], [[B8]]
>> >>>> -; SSE-NEXT:    [[R9:%.*]] = shl i16 [[A9]], [[B9]]
>> >>>> -; SSE-NEXT:    [[R10:%.*]] = shl i16 [[A10]], [[B10]]
>> >>>> -; SSE-NEXT:    [[R11:%.*]] = shl i16 [[A11]], [[B11]]
>> >>>> -; SSE-NEXT:    [[R12:%.*]] = shl i16 [[A12]], [[B12]]
>> >>>> -; SSE-NEXT:    [[R13:%.*]] = shl i16 [[A13]], [[B13]]
>> >>>> -; SSE-NEXT:    [[R14:%.*]] = shl i16 [[A14]], [[B14]]
>> >>>> -; SSE-NEXT:    [[R15:%.*]] = shl i16 [[A15]], [[B15]]
>> >>>> -; SSE-NEXT:    [[R16:%.*]] = shl i16 [[A16]], [[B16]]
>> >>>> -; SSE-NEXT:    [[R17:%.*]] = shl i16 [[A17]], [[B17]]
>> >>>> -; SSE-NEXT:    [[R18:%.*]] = shl i16 [[A18]], [[B18]]
>> >>>> -; SSE-NEXT:    [[R19:%.*]] = shl i16 [[A19]], [[B19]]
>> >>>> -; SSE-NEXT:    [[R20:%.*]] = shl i16 [[A20]], [[B20]]
>> >>>> -; SSE-NEXT:    [[R21:%.*]] = shl i16 [[A21]], [[B21]]
>> >>>> -; SSE-NEXT:    [[R22:%.*]] = shl i16 [[A22]], [[B22]]
>> >>>> -; SSE-NEXT:    [[R23:%.*]] = shl i16 [[A23]], [[B23]]
>> >>>> -; SSE-NEXT:    [[R24:%.*]] = shl i16 [[A24]], [[B24]]
>> >>>> -; SSE-NEXT:    [[R25:%.*]] = shl i16 [[A25]], [[B25]]
>> >>>> -; SSE-NEXT:    [[R26:%.*]] = shl i16 [[A26]], [[B26]]
>> >>>> -; SSE-NEXT:    [[R27:%.*]] = shl i16 [[A27]], [[B27]]
>> >>>> -; SSE-NEXT:    [[R28:%.*]] = shl i16 [[A28]], [[B28]]
>> >>>> -; SSE-NEXT:    [[R29:%.*]] = shl i16 [[A29]], [[B29]]
>> >>>> -; SSE-NEXT:    [[R30:%.*]] = shl i16 [[A30]], [[B30]]
>> >>>> -; SSE-NEXT:    [[R31:%.*]] = shl i16 [[A31]], [[B31]]
>> >>>> -; SSE-NEXT:    store i16 [[R0]], i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 0), align 2
>> >>>> -; SSE-NEXT:    store i16 [[R1]], i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 1), align 2
>> >>>> -; SSE-NEXT:    store i16 [[R2]], i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 2), align 2
>> >>>> -; SSE-NEXT:    store i16 [[R3]], i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 3), align 2
>> >>>> -; SSE-NEXT:    store i16 [[R4]], i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 4), align 2
>> >>>> -; SSE-NEXT:    store i16 [[R5]], i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 5), align 2
>> >>>> -; SSE-NEXT:    store i16 [[R6]], i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 6), align 2
>> >>>> -; SSE-NEXT:    store i16 [[R7]], i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 7), align 2
>> >>>> -; SSE-NEXT:    store i16 [[R8]], i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 8), align 2
>> >>>> -; SSE-NEXT:    store i16 [[R9]], i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 9), align 2
>> >>>> -; SSE-NEXT:    store i16 [[R10]], i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 10), align 2
>> >>>> -; SSE-NEXT:    store i16 [[R11]], i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 11), align 2
>> >>>> -; SSE-NEXT:    store i16 [[R12]], i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 12), align 2
>> >>>> -; SSE-NEXT:    store i16 [[R13]], i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 13), align 2
>> >>>> -; SSE-NEXT:    store i16 [[R14]], i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 14), align 2
>> >>>> -; SSE-NEXT:    store i16 [[R15]], i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 15), align 2
>> >>>> -; SSE-NEXT:    store i16 [[R16]], i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 16), align 2
>> >>>> -; SSE-NEXT:    store i16 [[R17]], i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 17), align 2
>> >>>> -; SSE-NEXT:    store i16 [[R18]], i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 18), align 2
>> >>>> -; SSE-NEXT:    store i16 [[R19]], i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 19), align 2
>> >>>> -; SSE-NEXT:    store i16 [[R20]], i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 20), align 2
>> >>>> -; SSE-NEXT:    store i16 [[R21]], i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 21), align 2
>> >>>> -; SSE-NEXT:    store i16 [[R22]], i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 22), align 2
>> >>>> -; SSE-NEXT:    store i16 [[R23]], i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 23), align 2
>> >>>> -; SSE-NEXT:    store i16 [[R24]], i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 24), align 2
>> >>>> -; SSE-NEXT:    store i16 [[R25]], i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 25), align 2
>> >>>> -; SSE-NEXT:    store i16 [[R26]], i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 26), align 2
>> >>>> -; SSE-NEXT:    store i16 [[R27]], i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 27), align 2
>> >>>> -; SSE-NEXT:    store i16 [[R28]], i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 28), align 2
>> >>>> -; SSE-NEXT:    store i16 [[R29]], i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 29), align 2
>> >>>> -; SSE-NEXT:    store i16 [[R30]], i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 30), align 2
>> >>>> -; SSE-NEXT:    store i16 [[R31]], i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 31), align 2
>> >>>> +; SSE-NEXT:    [[TMP1:%.*]] = load <4 x i16>, <4 x i16>* bitcast ([32 x i16]* @a16 to <4 x i16>*), align 2
>> >>>> +; SSE-NEXT:    [[TMP2:%.*]] = load <4 x i16>, <4 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 4) to <4 x i16>*), align 2
>> >>>> +; SSE-NEXT:    [[TMP3:%.*]] = load <4 x i16>, <4 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 8) to <4 x i16>*), align 2
>> >>>> +; SSE-NEXT:    [[TMP4:%.*]] = load <4 x i16>, <4 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 12) to <4 x i16>*), align 2
>> >>>> +; SSE-NEXT:    [[TMP5:%.*]] = load <4 x i16>, <4 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 16) to <4 x i16>*), align 2
>> >>>> +; SSE-NEXT:    [[TMP6:%.*]] = load <4 x i16>, <4 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 20) to <4 x i16>*), align 2
>> >>>> +; SSE-NEXT:    [[TMP7:%.*]] = load <4 x i16>, <4 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 24) to <4 x i16>*), align 2
>> >>>> +; SSE-NEXT:    [[TMP8:%.*]] = load <4 x i16>, <4 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @a16, i32 0, i64 28) to <4 x i16>*), align 2
>> >>>> +; SSE-NEXT:    [[TMP9:%.*]] = load <4 x i16>, <4 x i16>* bitcast ([32 x i16]* @b16 to <4 x i16>*), align 2
>> >>>> +; SSE-NEXT:    [[TMP10:%.*]] = load <4 x i16>, <4 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 4) to <4 x i16>*), align 2
>> >>>> +; SSE-NEXT:    [[TMP11:%.*]] = load <4 x i16>, <4 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 8) to <4 x i16>*), align 2
>> >>>> +; SSE-NEXT:    [[TMP12:%.*]] = load <4 x i16>, <4 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 12) to <4 x i16>*), align 2
>> >>>> +; SSE-NEXT:    [[TMP13:%.*]] = load <4 x i16>, <4 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 16) to <4 x i16>*), align 2
>> >>>> +; SSE-NEXT:    [[TMP14:%.*]] = load <4 x i16>, <4 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 20) to <4 x i16>*), align 2
>> >>>> +; SSE-NEXT:    [[TMP15:%.*]] = load <4 x i16>, <4 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 24) to <4 x i16>*), align 2
>> >>>> +; SSE-NEXT:    [[TMP16:%.*]] = load <4 x i16>, <4 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @b16, i32 0, i64 28) to <4 x i16>*), align 2
>> >>>> +; SSE-NEXT:    [[TMP17:%.*]] = shl <4 x i16> [[TMP1]], [[TMP9]]
>> >>>> +; SSE-NEXT:    [[TMP18:%.*]] = shl <4 x i16> [[TMP2]], [[TMP10]]
>> >>>> +; SSE-NEXT:    [[TMP19:%.*]] = shl <4 x i16> [[TMP3]], [[TMP11]]
>> >>>> +; SSE-NEXT:    [[TMP20:%.*]] = shl <4 x i16> [[TMP4]], [[TMP12]]
>> >>>> +; SSE-NEXT:    [[TMP21:%.*]] = shl <4 x i16> [[TMP5]], [[TMP13]]
>> >>>> +; SSE-NEXT:    [[TMP22:%.*]] = shl <4 x i16> [[TMP6]], [[TMP14]]
>> >>>> +; SSE-NEXT:    [[TMP23:%.*]] = shl <4 x i16> [[TMP7]], [[TMP15]]
>> >>>> +; SSE-NEXT:    [[TMP24:%.*]] = shl <4 x i16> [[TMP8]], [[TMP16]]
>> >>>> +; SSE-NEXT:    store <4 x i16> [[TMP17]], <4 x i16>* bitcast ([32 x i16]* @c16 to <4 x i16>*), align 2
>> >>>> +; SSE-NEXT:    store <4 x i16> [[TMP18]], <4 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 4) to <4 x i16>*), align 2
>> >>>> +; SSE-NEXT:    store <4 x i16> [[TMP19]], <4 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 8) to <4 x i16>*), align 2
>> >>>> +; SSE-NEXT:    store <4 x i16> [[TMP20]], <4 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 12) to <4 x i16>*), align 2
>> >>>> +; SSE-NEXT:    store <4 x i16> [[TMP21]], <4 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 16) to <4 x i16>*), align 2
>> >>>> +; SSE-NEXT:    store <4 x i16> [[TMP22]], <4 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 20) to <4 x i16>*), align 2
>> >>>> +; SSE-NEXT:    store <4 x i16> [[TMP23]], <4 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 24) to <4 x i16>*), align 2
>> >>>> +; SSE-NEXT:    store <4 x i16> [[TMP24]], <4 x i16>* bitcast (i16* getelementptr inbounds ([32 x i16], [32 x i16]* @c16, i32 0, i64 28) to <4 x i16>*), align 2
>> >>>>  ; SSE-NEXT:    ret void
>> >>>>  ;
>> >>>>  ; AVX-LABEL: @shl_v32i16(
>> >>>>
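[Likewise for the `shl_v32i16` hunk: the 32 scalar `shl i16` operations collapse into eight `<4 x i16>` shifts (four i16 lanes fit in 64 bits). A hedged C sketch of the pattern — again, the actual test is unrolled IR and the array names are just the test globals:]

```c
#include <stdint.h>

/* Globals mirroring @a16/@b16/@c16 (illustrative only). */
static uint16_t a16[32], b16[32], c16[32];

/* Scalar form: 32 independent i16 left shifts. The new SLP behavior
 * groups the unrolled equivalent into eight <4 x i16> shl operations,
 * as the "+; SSE-NEXT" lines above show. */
void shl_v32i16(void) {
    for (int i = 0; i < 32; ++i)
        c16[i] = (uint16_t)(a16[i] << b16[i]);
}
```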
>> >>>> Modified: llvm/trunk/test/Transforms/SLPVectorizer/X86/sitofp.ll
>> >>>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Transforms/SLPVectorizer/X86/sitofp.ll?rev=353923&r1=353922&r2=353923&view=diff
>> >>>> ==============================================================================
>> >>>> --- llvm/trunk/test/Transforms/SLPVectorizer/X86/sitofp.ll (original)
>> >>>> +++ llvm/trunk/test/Transforms/SLPVectorizer/X86/sitofp.ll Wed Feb 13 00:26:43 2019
>> >>>> @@ -598,14 +598,35 @@ define void @sitofp_8i8_8f64() #0 {
>> >>>>  ;
>> >>>>
>> >>>>  define void @sitofp_2i64_2f32() #0 {
>> >>>> -; CHECK-LABEL: @sitofp_2i64_2f32(
>> >>>> -; CHECK-NEXT:    [[LD0:%.*]] = load i64, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @src64, i32 0, i64 0), align 64
>> >>>> -; CHECK-NEXT:    [[LD1:%.*]] = load i64, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @src64, i32 0, i64 1), align 8
>> >>>> -; CHECK-NEXT:    [[CVT0:%.*]] = sitofp i64 [[LD0]] to float
>> >>>> -; CHECK-NEXT:    [[CVT1:%.*]] = sitofp i64 [[LD1]] to float
>> >>>> -; CHECK-NEXT:    store float [[CVT0]], float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 0), align 64
>> >>>> -; CHECK-NEXT:    store float [[CVT1]], float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 1), align 4
>> >>>> -; CHECK-NEXT:    ret void
>> >>>> +; SSE-LABEL: @sitofp_2i64_2f32(
>> >>>> +; SSE-NEXT:    [[LD0:%.*]] = load i64, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @src64, i32 0, i64 0), align 64
>> >>>> +; SSE-NEXT:    [[LD1:%.*]] = load i64, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @src64, i32 0, i64 1), align 8
>> >>>> +; SSE-NEXT:    [[CVT0:%.*]] = sitofp i64 [[LD0]] to float
>> >>>> +; SSE-NEXT:    [[CVT1:%.*]] = sitofp i64 [[LD1]] to float
>> >>>> +; SSE-NEXT:    store float [[CVT0]], float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 0), align 64
>> >>>> +; SSE-NEXT:    store float [[CVT1]], float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 1), align 4
>> >>>> +; SSE-NEXT:    ret void
>> >>>> +;
>> >>>> +; AVX256NODQ-LABEL: @sitofp_2i64_2f32(
>> >>>> +; AVX256NODQ-NEXT:    [[LD0:%.*]] = load i64, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @src64, i32 0, i64 0), align 64
>> >>>> +; AVX256NODQ-NEXT:    [[LD1:%.*]] = load i64, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @src64, i32 0, i64 1), align 8
>> >>>> +; AVX256NODQ-NEXT:    [[CVT0:%.*]] = sitofp i64 [[LD0]] to float
>> >>>> +; AVX256NODQ-NEXT:    [[CVT1:%.*]] = sitofp i64 [[LD1]] to float
>> >>>> +; AVX256NODQ-NEXT:    store float [[CVT0]], float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 0), align 64
>> >>>> +; AVX256NODQ-NEXT:    store float [[CVT1]], float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 1), align 4
>> >>>> +; AVX256NODQ-NEXT:    ret void
>> >>>> +;
>> >>>> +; AVX512-LABEL: @sitofp_2i64_2f32(
>> >>>> +; AVX512-NEXT:    [[TMP1:%.*]] = load <2 x i64>, <2 x i64>* bitcast ([8 x i64]* @src64 to <2 x i64>*), align 64
>> >>>> +; AVX512-NEXT:    [[TMP2:%.*]] = sitofp <2 x i64> [[TMP1]] to <2 x float>
>> >>>> +; AVX512-NEXT:    store <2 x float> [[TMP2]], <2 x float>* bitcast ([16 x float]* @dst32 to <2 x float>*), align 64
>> >>>> +; AVX512-NEXT:    ret void
>> >>>> +;
>> >>>> +; AVX256DQ-LABEL: @sitofp_2i64_2f32(
>> >>>> +; AVX256DQ-NEXT:    [[TMP1:%.*]] = load <2 x i64>, <2 x i64>* bitcast ([8 x i64]* @src64 to <2 x i64>*), align 64
>> >>>> +; AVX256DQ-NEXT:    [[TMP2:%.*]] = sitofp <2 x i64> [[TMP1]] to <2 x float>
>> >>>> +; AVX256DQ-NEXT:    store <2 x float> [[TMP2]], <2 x float>* bitcast ([16 x float]* @dst32 to <2 x float>*), align 64
>> >>>> +; AVX256DQ-NEXT:    ret void
>> >>>>  ;
>> >>>>    %ld0 = load i64, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @src64, i32 0, i64 0), align 64
>> >>>>    %ld1 = load i64, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @src64, i32 0, i64 1), align 8
>> >>>>
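[The `sitofp_2i64_2f32` hunk is the interesting target split: the single CHECK prefix becomes per-target prefixes because i64-to-f32 conversion has no cheap packed form on plain SSE or AVX without DQ, while AVX512/AVX256DQ targets can use a packed qword-to-ps conversion (my understanding: `vcvtqq2ps` from AVX512DQ) and so now vectorize to `<2 x float>`. A minimal C sketch of the scalar pattern, with names mirroring the test globals:]

```c
#include <stdint.h>

/* Globals mirroring @src64/@dst32 (illustrative only). */
static int64_t src64[8];
static float dst32[16];

/* Two independent i64 -> float conversions. Per the diff above, only
 * the DQ-capable targets get the <2 x i64> -> <2 x float> vector form;
 * SSE and AVX256NODQ keep the two scalar sitofp conversions. */
void sitofp_2i64_2f32(void) {
    dst32[0] = (float)src64[0];
    dst32[1] = (float)src64[1];
}
```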
>> >>>> Modified: llvm/trunk/test/Transforms/SLPVectorizer/X86/tiny-tree.ll
>> >>>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Transforms/SLPVectorizer/X86/tiny-tree.ll?rev=353923&r1=353922&r2=353923&view=diff
>> >>>> ==============================================================================
>> >>>> --- llvm/trunk/test/Transforms/SLPVectorizer/X86/tiny-tree.ll (original)
>> >>>> +++ llvm/trunk/test/Transforms/SLPVectorizer/X86/tiny-tree.ll Wed Feb 13 00:26:43 2019
>> >>>> @@ -172,13 +172,13 @@ define void @tiny_tree_not_fully_vectori
>> >>>>  ; CHECK-NEXT:    [[ARRAYIDX3:%.*]] = getelementptr inbounds float, float* [[DST_ADDR_022]], i64 1
>> >>>>  ; CHECK-NEXT:    store float [[TMP1]], float* [[ARRAYIDX3]], align 4
>> >>>>  ; CHECK-NEXT:    [[ARRAYIDX4:%.*]] = getelementptr inbounds float, float* [[SRC_ADDR_021]], i64 2
>> >>>> -; CHECK-NEXT:    [[TMP2:%.*]] = load float, float* [[ARRAYIDX4]], align 4
>> >>>>  ; CHECK-NEXT:    [[ARRAYIDX5:%.*]] = getelementptr inbounds float, float* [[DST_ADDR_022]], i64 2
>> >>>> -; CHECK-NEXT:    store float [[TMP2]], float* [[ARRAYIDX5]], align 4
>> >>>>  ; CHECK-NEXT:    [[ARRAYIDX6:%.*]] = getelementptr inbounds float, float* [[SRC_ADDR_021]], i64 3
>> >>>> -; CHECK-NEXT:    [[TMP3:%.*]] = load float, float* [[ARRAYIDX6]], align 4
>> >>>> +; CHECK-NEXT:    [[TMP2:%.*]] = bitcast float* [[ARRAYIDX4]] to <2 x float>*
>> >>>> +; CHECK-NEXT:    [[TMP3:%.*]] = load <2 x float>, <2 x float>* [[TMP2]], align 4
>> >>>>  ; CHECK-NEXT:    [[ARRAYIDX7:%.*]] = getelementptr inbounds float, float* [[DST_ADDR_022]], i64 3
>> >>>> -; CHECK-NEXT:    store float [[TMP3]], float* [[ARRAYIDX7]], align 4
>> >>>> +; CHECK-NEXT:    [[TMP4:%.*]] = bitcast float* [[ARRAYIDX5]] to <2 x float>*
>> >>>> +; CHECK-NEXT:    store <2 x float> [[TMP3]], <2 x float>* [[TMP4]], align 4
>> >>>>  ; CHECK-NEXT:    [[ADD_PTR]] = getelementptr inbounds float, float* [[SRC_ADDR_021]], i64 [[I_023]]
>> >>>>  ; CHECK-NEXT:    [[ADD_PTR8]] = getelementptr inbounds float, float* [[DST_ADDR_022]], i64 [[I_023]]
>> >>>>  ; CHECK-NEXT:    [[INC]] = add i64 [[I_023]], 1
>> >>>>
>> >>>> Modified: llvm/trunk/test/Transforms/SLPVectorizer/X86/uitofp.ll
>> >>>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Transforms/SLPVectorizer/X86/uitofp.ll?rev=353923&r1=353922&r2=353923&view=diff
>> >>>> ==============================================================================
>> >>>> --- llvm/trunk/test/Transforms/SLPVectorizer/X86/uitofp.ll (original)
>> >>>> +++ llvm/trunk/test/Transforms/SLPVectorizer/X86/uitofp.ll Wed Feb 13 00:26:43 2019
>> >>>> @@ -550,14 +550,35 @@ define void @uitofp_8i8_8f64() #0 {
>> >>>>  ;
>> >>>>
>> >>>>  define void @uitofp_2i64_2f32() #0 {
>> >>>> -; CHECK-LABEL: @uitofp_2i64_2f32(
>> >>>> -; CHECK-NEXT:    [[LD0:%.*]] = load i64, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @src64, i32 0, i64 0), align 64
>> >>>> -; CHECK-NEXT:    [[LD1:%.*]] = load i64, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @src64, i32 0, i64 1), align 8
>> >>>> -; CHECK-NEXT:    [[CVT0:%.*]] = uitofp i64 [[LD0]] to float
>> >>>> -; CHECK-NEXT:    [[CVT1:%.*]] = uitofp i64 [[LD1]] to float
>> >>>> -; CHECK-NEXT:    store float [[CVT0]], float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 0), align 64
>> >>>> -; CHECK-NEXT:    store float [[CVT1]], float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 1), align 4
>> >>>> -; CHECK-NEXT:    ret void
>> >>>> +; SSE-LABEL: @uitofp_2i64_2f32(
>> >>>> +; SSE-NEXT:    [[LD0:%.*]] = load i64, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @src64, i32 0, i64 0), align 64
>> >>>> +; SSE-NEXT:    [[LD1:%.*]] = load i64, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @src64, i32 0, i64 1), align 8
>> >>>> +; SSE-NEXT:    [[CVT0:%.*]] = uitofp i64 [[LD0]] to float
>> >>>> +; SSE-NEXT:    [[CVT1:%.*]] = uitofp i64 [[LD1]] to float
>> >>>> +; SSE-NEXT:    store float [[CVT0]], float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 0), align 64
>> >>>> +; SSE-NEXT:    store float [[CVT1]], float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 1), align 4
>> >>>> +; SSE-NEXT:    ret void
>> >>>> +;
>> >>>> +; AVX256NODQ-LABEL: @uitofp_2i64_2f32(
>> >>>> +; AVX256NODQ-NEXT:    [[LD0:%.*]] = load i64, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @src64, i32 0, i64 0), align 64
>> >>>> +; AVX256NODQ-NEXT:    [[LD1:%.*]] = load i64, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @src64, i32 0, i64 1), align 8
>> >>>> +; AVX256NODQ-NEXT:    [[CVT0:%.*]] = uitofp i64 [[LD0]] to float
>> >>>> +; AVX256NODQ-NEXT:    [[CVT1:%.*]] = uitofp i64 [[LD1]] to float
>> >>>> +; AVX256NODQ-NEXT:    store float [[CVT0]], float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 0), align 64
>> >>>> +; AVX256NODQ-NEXT:    store float [[CVT1]], float* getelementptr inbounds ([16 x float], [16 x float]* @dst32, i32 0, i64 1), align 4
>> >>>> +; AVX256NODQ-NEXT:    ret void
>> >>>> +;
>> >>>> +; AVX512-LABEL: @uitofp_2i64_2f32(
>> >>>> +; AVX512-NEXT:    [[TMP1:%.*]] = load <2 x i64>, <2 x i64>* bitcast ([8 x i64]* @src64 to <2 x i64>*), align 64
>> >>>> +; AVX512-NEXT:    [[TMP2:%.*]] = uitofp <2 x i64> [[TMP1]] to <2 x float>
>> >>>> +; AVX512-NEXT:    store <2 x float> [[TMP2]], <2 x float>* bitcast ([16 x float]* @dst32 to <2 x float>*), align 64
>> >>>> +; AVX512-NEXT:    ret void
>> >>>> +;
>> >>>> +; AVX256DQ-LABEL: @uitofp_2i64_2f32(
>> >>>> +; AVX256DQ-NEXT:    [[TMP1:%.*]] = load <2 x i64>, <2 x i64>* bitcast ([8 x i64]* @src64 to <2 x i64>*), align 64
>> >>>> +; AVX256DQ-NEXT:    [[TMP2:%.*]] = uitofp <2 x i64> [[TMP1]] to <2 x float>
>> >>>> +; AVX256DQ-NEXT:    store <2 x float> [[TMP2]], <2 x float>* bitcast ([16 x float]* @dst32 to <2 x float>*), align 64
>> >>>> +; AVX256DQ-NEXT:    ret void
>> >>>>  ;
>> >>>>    %ld0 = load i64, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @src64, i32 0, i64 0), align 64
>> >>>>    %ld1 = load i64, i64* getelementptr inbounds ([8 x i64], [8 x i64]* @src64, i32 0, i64 1), align 8
>> >>>>
>> >>>> Added: llvm/trunk/test/Transforms/SLPVectorizer/X86/vec-reg-64bit.ll
>> >>>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Transforms/SLPVectorizer/X86/vec-reg-64bit.ll?rev=353923&view=auto
>> >>>> ==============================================================================
>> >>>> --- llvm/trunk/test/Transforms/SLPVectorizer/X86/vec-reg-64bit.ll (added)
>> >>>> +++ llvm/trunk/test/Transforms/SLPVectorizer/X86/vec-reg-64bit.ll Wed Feb 13 00:26:43 2019
>> >>>> @@ -0,0 +1,51 @@
>> >>>> +; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
>> >>>> +; RUN: opt < %s -basicaa -slp-vectorizer -mcpu=btver2 -S | FileCheck %s --check-prefix=VECT
>> >>>> +; RUN: opt < %s -basicaa -slp-vectorizer -mcpu=btver2 -slp-min-reg-size=128 -S | FileCheck %s --check-prefix=NOVECT
>> >>>> +
>> >>>> +; Check SLPVectorizer works for packed horizontal 128-bit instrs.
>> >>>> +; See llvm.org/PR32433
>> >>>> +
>> >>>> +target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
>> >>>> +target triple = "x86_64-unknown-linux-gnu"
>> >>>> +
>> >>>> +define void @add_pairs_128(<4 x float>, float* nocapture) #0 {
>> >>>> +; VECT-LABEL: @add_pairs_128(
>> >>>> +; VECT-NEXT:    [[TMP3:%.*]] = extractelement <4 x float> [[TMP0:%.*]], i32 0
>> >>>> +; VECT-NEXT:    [[TMP4:%.*]] = extractelement <4 x float> [[TMP0]], i32 1
>> >>>> +; VECT-NEXT:    [[TMP5:%.*]] = extractelement <4 x float> [[TMP0]], i32 2
>> >>>> +; VECT-NEXT:    [[TMP6:%.*]] = extractelement <4 x float> [[TMP0]], i32 3
>> >>>> +; VECT-NEXT:    [[TMP7:%.*]] = insertelement <2 x float> undef, float [[TMP3]], i32 0
>> >>>> +; VECT-NEXT:    [[TMP8:%.*]] = insertelement <2 x float> [[TMP7]], float [[TMP5]], i32 1
>> >>>> +; VECT-NEXT:    [[TMP9:%.*]] = insertelement <2 x float> undef, float [[TMP4]], i32 0
>> >>>> +; VECT-NEXT:    [[TMP10:%.*]] = insertelement <2 x float> [[TMP9]], float [[TMP6]], i32 1
>> >>>> +; VECT-NEXT:    [[TMP11:%.*]] = fadd <2 x float> [[TMP8]], [[TMP10]]
>> >>>> +; VECT-NEXT:    [[TMP12:%.*]] = getelementptr inbounds float, float* [[TMP1:%.*]], i64 1
>> >>>> +; VECT-NEXT:    [[TMP13:%.*]] = bitcast float* [[TMP1]] to <2 x float>*
>> >>>> +; VECT-NEXT:    store <2 x float> [[TMP11]], <2 x float>* [[TMP13]], align 4
>> >>>> +; VECT-NEXT:    ret void
>> >>>> +;
>> >>>> +; NOVECT-LABEL: @add_pairs_128(
>> >>>> +; NOVECT-NEXT:    [[TMP3:%.*]] = extractelement <4 x float> [[TMP0:%.*]], i32 0
>> >>>> +; NOVECT-NEXT:    [[TMP4:%.*]] = extractelement <4 x float> [[TMP0]], i32 1
>> >>>> +; NOVECT-NEXT:    [[TMP5:%.*]] = fadd float [[TMP3]], [[TMP4]]
>> >>>> +; NOVECT-NEXT:    store float [[TMP5]], float* [[TMP1:%.*]], align 4
>> >>>> +; NOVECT-NEXT:    [[TMP6:%.*]] = extractelement <4 x float> [[TMP0]], i32 2
>> >>>> +; NOVECT-NEXT:    [[TMP7:%.*]] = extractelement <4 x float> [[TMP0]], i32 3
>> >>>> +; NOVECT-NEXT:    [[TMP8:%.*]] = fadd float [[TMP6]], [[TMP7]]
>> >>>> +; NOVECT-NEXT:    [[TMP9:%.*]] = getelementptr inbounds float, float* [[TMP1]], i64 1
>> >>>> +; NOVECT-NEXT:    store float [[TMP8]], float* [[TMP9]], align 4
>> >>>> +; NOVECT-NEXT:    ret void
>> >>>> +;
>> >>>> +  %3 = extractelement <4 x float> %0, i32 0
>> >>>> +  %4 = extractelement <4 x float> %0, i32 1
>> >>>> +  %5 = fadd float %3, %4
>> >>>> +  store float %5, float* %1, align 4
>> >>>> +  %6 = extractelement <4 x float> %0, i32 2
>> >>>> +  %7 = extractelement <4 x float> %0, i32 3
>> >>>> +  %8 = fadd float %6, %7
>> >>>> +  %9 = getelementptr inbounds float, float* %1, i64 1
>> >>>> +  store float %8, float* %9, align 4
>> >>>> +  ret void
>> >>>> +}
>> >>>> +
>> >>>> +attributes #0 = { nounwind }
>> >>>>
>> >>>> Modified: llvm/trunk/test/Transforms/SLPVectorizer/X86/vect_copyable_in_binops.ll
>> >>>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Transforms/SLPVectorizer/X86/vect_copyable_in_binops.ll?rev=353923&r1=353922&r2=353923&view=diff
>> >>>> ==============================================================================
>> >>>> --- llvm/trunk/test/Transforms/SLPVectorizer/X86/vect_copyable_in_binops.ll (original)
>> >>>> +++ llvm/trunk/test/Transforms/SLPVectorizer/X86/vect_copyable_in_binops.ll Wed Feb 13 00:26:43 2019
>> >>>> @@ -47,17 +47,16 @@ define void @add1(i32* noalias %dst, i32
>> >>>>  ; CHECK-NEXT:    [[INCDEC_PTR1:%.*]] = getelementptr inbounds i32, i32* [[DST:%.*]], i64 1
>> >>>>  ; CHECK-NEXT:    store i32 [[TMP0]], i32* [[DST]], align 4
>> >>>>  ; CHECK-NEXT:    [[INCDEC_PTR2:%.*]] = getelementptr inbounds i32, i32* [[SRC]], i64 2
>> >>>> -; CHECK-NEXT:    [[TMP1:%.*]] = load i32, i32* [[INCDEC_PTR]], align 4
>> >>>> -; CHECK-NEXT:    [[ADD3:%.*]] = add nsw i32 [[TMP1]], 1
>> >>>>  ; CHECK-NEXT:    [[INCDEC_PTR4:%.*]] = getelementptr inbounds i32, i32* [[DST]], i64 2
>> >>>> -; CHECK-NEXT:    store i32 [[ADD3]], i32* [[INCDEC_PTR1]], align 4
>> >>>>  ; CHECK-NEXT:    [[INCDEC_PTR5:%.*]] = getelementptr inbounds i32, i32* [[SRC]], i64 3
>> >>>> -; CHECK-NEXT:    [[TMP2:%.*]] = load i32, i32* [[INCDEC_PTR2]], align 4
>> >>>> -; CHECK-NEXT:    [[ADD6:%.*]] = add nsw i32 [[TMP2]], 2
>> >>>> +; CHECK-NEXT:    [[TMP1:%.*]] = bitcast i32* [[INCDEC_PTR]] to <2 x i32>*
>> >>>> +; CHECK-NEXT:    [[TMP2:%.*]] = load <2 x i32>, <2 x i32>* [[TMP1]], align 4
>> >>>> +; CHECK-NEXT:    [[TMP3:%.*]] = add nsw <2 x i32> <i32 1, i32 2>, [[TMP2]]
>> >>>>  ; CHECK-NEXT:    [[INCDEC_PTR7:%.*]] = getelementptr inbounds i32, i32* [[DST]], i64 3
>> >>>> -; CHECK-NEXT:    store i32 [[ADD6]], i32* [[INCDEC_PTR4]], align 4
>> >>>> -; CHECK-NEXT:    [[TMP3:%.*]] = load i32, i32* [[INCDEC_PTR5]], align 4
>> >>>> -; CHECK-NEXT:    [[ADD9:%.*]] = add nsw i32 [[TMP3]], 3
>> >>>> +; CHECK-NEXT:    [[TMP4:%.*]] = bitcast i32* [[INCDEC_PTR1]] to <2 x i32>*
>> >>>> +; CHECK-NEXT:    store <2 x i32> [[TMP3]], <2 x i32>* [[TMP4]], align 4
>> >>>> +; CHECK-NEXT:    [[TMP5:%.*]] = load i32, i32* [[INCDEC_PTR5]], align 4
>> >>>> +; CHECK-NEXT:    [[ADD9:%.*]] = add nsw i32 [[TMP5]], 3
>> >>>>  ; CHECK-NEXT:    store i32 [[ADD9]], i32* [[INCDEC_PTR7]], align 4
>> >>>>  ; CHECK-NEXT:    ret void
>> >>>>  ;
>> >>>> @@ -95,13 +94,12 @@ define void @sub0(i32* noalias %dst, i32
>> >>>>  ; CHECK-NEXT:    [[INCDEC_PTR3:%.*]] = getelementptr inbounds i32, i32* [[DST]], i64 2
>> >>>>  ; CHECK-NEXT:    store i32 [[TMP1]], i32* [[INCDEC_PTR1]], align 4
>> >>>>  ; CHECK-NEXT:    [[INCDEC_PTR4:%.*]] = getelementptr inbounds i32, i32* [[SRC]], i64 3
>> >>>> -; CHECK-NEXT:    [[TMP2:%.*]] = load i32, i32* [[INCDEC_PTR2]], align 4
>> >>>> -; CHECK-NEXT:    [[SUB5:%.*]] = add nsw i32 [[TMP2]], -2
>> >>>>  ; CHECK-NEXT:    [[INCDEC_PTR6:%.*]] = getelementptr inbounds i32, i32* [[DST]], i64 3
>> >>>> -; CHECK-NEXT:    store i32 [[SUB5]], i32* [[INCDEC_PTR3]], align 4
>> >>>> -; CHECK-NEXT:    [[TMP3:%.*]] = load i32, i32* [[INCDEC_PTR4]], align 4
>> >>>> -; CHECK-NEXT:    [[SUB8:%.*]] = add nsw i32 [[TMP3]], -3
>> >>>> -; CHECK-NEXT:    store i32 [[SUB8]], i32* [[INCDEC_PTR6]], align 4
>> >>>> +; CHECK-NEXT:    [[TMP2:%.*]] = bitcast i32* [[INCDEC_PTR2]] to <2 x i32>*
>> >>>> +; CHECK-NEXT:    [[TMP3:%.*]] = load <2 x i32>, <2 x i32>* [[TMP2]], align 4
>> >>>> +; CHECK-NEXT:    [[TMP4:%.*]] = add nsw <2 x i32> <i32 -2, i32 -3>, [[TMP3]]
>> >>>> +; CHECK-NEXT:    [[TMP5:%.*]] = bitcast i32* [[INCDEC_PTR3]] to <2 x i32>*
>> >>>> +; CHECK-NEXT:    store <2 x i32> [[TMP4]], <2 x i32>* [[TMP5]], align 4
>> >>>>  ; CHECK-NEXT:    ret void
>> >>>>  ;
>> >>>>  entry:
>> >>>> @@ -214,13 +212,14 @@ define void @addsub0(i32* noalias %dst,
>> >>>>  ; CHECK-NEXT:    [[INCDEC_PTR3:%.*]] = getelementptr inbounds i32, i32* [[DST]], i64 2
>> >>>>  ; CHECK-NEXT:    store i32 [[TMP1]], i32* [[INCDEC_PTR1]], align 4
>> >>>>  ; CHECK-NEXT:    [[INCDEC_PTR4:%.*]] = getelementptr inbounds i32, i32* [[SRC]], i64 3
>> >>>> -; CHECK-NEXT:    [[TMP2:%.*]] = load i32, i32* [[INCDEC_PTR2]], align 4
>> >>>> -; CHECK-NEXT:    [[SUB5:%.*]] = add nsw i32 [[TMP2]], -2
>> >>>>  ; CHECK-NEXT:    [[INCDEC_PTR6:%.*]] = getelementptr inbounds i32, i32* [[DST]], i64 3
>> >>>> -; CHECK-NEXT:    store i32 [[SUB5]], i32* [[INCDEC_PTR3]], align 4
>> >>>> -; CHECK-NEXT:    [[TMP3:%.*]] = load i32, i32* [[INCDEC_PTR4]], align 4
>> >>>> -; CHECK-NEXT:    [[SUB8:%.*]] = sub nsw i32 [[TMP3]], -3
>> >>>> -; CHECK-NEXT:    store i32 [[SUB8]], i32* [[INCDEC_PTR6]], align 4
>> >>>> +; CHECK-NEXT:    [[TMP2:%.*]] = bitcast i32* [[INCDEC_PTR2]] to <2 x i32>*
>> >>>> +; CHECK-NEXT:    [[TMP3:%.*]] = load <2 x i32>, <2 x i32>* [[TMP2]], align 4
>> >>>> +; CHECK-NEXT:    [[TMP4:%.*]] = add nsw <2 x i32> [[TMP3]], <i32 -2, i32 -3>
>> >>>> +; CHECK-NEXT:    [[TMP5:%.*]] = sub nsw <2 x i32> [[TMP3]], <i32 -2, i32 -3>
>> >>>> +; CHECK-NEXT:    [[TMP6:%.*]] = shufflevector <2 x i32> [[TMP4]], <2 x i32> [[TMP5]], <2 x i32> <i32 0, i32 3>
>> >>>> +; CHECK-NEXT:    [[TMP7:%.*]] = bitcast i32* [[INCDEC_PTR3]] to <2 x i32>*
>> >>>> +; CHECK-NEXT:    store <2 x i32> [[TMP6]], <2 x i32>* [[TMP7]], align 4
>> >>>>  ; CHECK-NEXT:    ret void
>> >>>>  ;
>> >>>>  entry:
>> >>>> @@ -248,21 +247,22 @@ define void @addsub1(i32* noalias %dst,
>> >>>>  ; CHECK-LABEL: @addsub1(
>> >>>>  ; CHECK-NEXT:  entry:
>> >>>>  ; CHECK-NEXT:    [[INCDEC_PTR:%.*]] = getelementptr inbounds i32, i32* [[SRC:%.*]], i64 1
>> >>>> -; CHECK-NEXT:    [[TMP0:%.*]] = load i32, i32* [[SRC]], align 4
>> >>>> -; CHECK-NEXT:    [[SUB:%.*]] = add nsw i32 [[TMP0]], -1
>> >>>>  ; CHECK-NEXT:    [[INCDEC_PTR1:%.*]] = getelementptr inbounds i32, i32* [[DST:%.*]], i64 1
>> >>>> -; CHECK-NEXT:    store i32 [[SUB]], i32* [[DST]], align 4
>> >>>>  ; CHECK-NEXT:    [[INCDEC_PTR2:%.*]] = getelementptr inbounds i32, i32* [[SRC]], i64 2
>> >>>> -; CHECK-NEXT:    [[TMP1:%.*]] = load i32, i32* [[INCDEC_PTR]], align 4
>> >>>> -; CHECK-NEXT:    [[SUB1:%.*]] = sub nsw i32 [[TMP1]], -1
>> >>>> +; CHECK-NEXT:    [[TMP0:%.*]] = bitcast i32* [[SRC]] to <2 x i32>*
>> >>>> +; CHECK-NEXT:    [[TMP1:%.*]] = load <2 x i32>, <2 x i32>* [[TMP0]], align 4
>> >>>> +; CHECK-NEXT:    [[TMP2:%.*]] = add nsw <2 x i32> [[TMP1]], <i32 -1, i32 -1>
>> >>>> +; CHECK-NEXT:    [[TMP3:%.*]] = sub nsw <2 x i32> [[TMP1]], <i32 -1, i32 -1>
>> >>>> +; CHECK-NEXT:    [[TMP4:%.*]] = shufflevector <2 x i32> [[TMP2]], <2 x i32> [[TMP3]], <2 x i32> <i32 0, i32 3>
>> >>>>  ; CHECK-NEXT:    [[INCDEC_PTR3:%.*]] = getelementptr inbounds i32, i32* [[DST]], i64 2
>> >>>> -; CHECK-NEXT:    store i32 [[SUB1]], i32* [[INCDEC_PTR1]], align 4
>> >>>> +; CHECK-NEXT:    [[TMP5:%.*]] = bitcast i32* [[DST]] to <2 x i32>*
>> >>>> +; CHECK-NEXT:    store <2 x i32> [[TMP4]], <2 x i32>* [[TMP5]], align 4
>> >>>>  ; CHECK-NEXT:    [[INCDEC_PTR4:%.*]] = getelementptr inbounds i32, i32* [[SRC]], i64 3
>> >>>> -; CHECK-NEXT:    [[TMP2:%.*]] = load i32, i32* [[INCDEC_PTR2]], align 4
>> >>>> +; CHECK-NEXT:    [[TMP6:%.*]] = load i32, i32* [[INCDEC_PTR2]], align 4
>> >>>>  ; CHECK-NEXT:    [[INCDEC_PTR6:%.*]] = getelementptr inbounds i32, i32* [[DST]], i64 3
>> >>>> -; CHECK-NEXT:    store i32 [[TMP2]], i32* [[INCDEC_PTR3]], align 4
>> >>>> -; CHECK-NEXT:    [[TMP3:%.*]] = load i32, i32* [[INCDEC_PTR4]], align 4
>> >>>> -; CHECK-NEXT:    [[SUB8:%.*]] = sub nsw i32 [[TMP3]], -3
>> >>>> +; CHECK-NEXT:    store i32 [[TMP6]], i32* [[INCDEC_PTR3]], align 4
>> >>>> +; CHECK-NEXT:    [[TMP7:%.*]] = load i32, i32* [[INCDEC_PTR4]], align 4
>> >>>> +; CHECK-NEXT:    [[SUB8:%.*]] = sub nsw i32 [[TMP7]], -3
>> >>>>  ; CHECK-NEXT:    store i32 [[SUB8]], i32* [[INCDEC_PTR6]], align 4
>> >>>>  ; CHECK-NEXT:    ret void
>> >>>>  ;
>> >>>> @@ -338,17 +338,16 @@ define void @shl0(i32* noalias %dst, i32
>> >>>>  ; CHECK-NEXT:    [[INCDEC_PTR1:%.*]] = getelementptr inbounds i32, i32* [[DST:%.*]], i64 1
>> >>>>  ; CHECK-NEXT:    store i32 [[TMP0]], i32* [[DST]], align 4
>> >>>>  ; CHECK-NEXT:    [[INCDEC_PTR2:%.*]] = getelementptr inbounds i32, i32* [[SRC]], i64 2
>> >>>> -; CHECK-NEXT:    [[TMP1:%.*]] = load i32, i32* [[INCDEC_PTR]], align 4
>> >>>> -; CHECK-NEXT:    [[SHL:%.*]] = shl i32 [[TMP1]], 1
>> >>>>  ; CHECK-NEXT:    [[INCDEC_PTR3:%.*]] = getelementptr inbounds i32, i32* [[DST]], i64 2
>> >>>> -; CHECK-NEXT:    store i32 [[SHL]], i32* [[INCDEC_PTR1]], align 4
>> >>>>  ; CHECK-NEXT:    [[INCDEC_PTR4:%.*]] = getelementptr inbounds i32, i32* [[SRC]], i64 3
>> >>>> -; CHECK-NEXT:    [[TMP2:%.*]] = load i32, i32* [[INCDEC_PTR2]], align 4
>> >>>> -; CHECK-NEXT:    [[SHL5:%.*]] = shl i32 [[TMP2]], 2
>> >>>> +; CHECK-NEXT:    [[TMP1:%.*]] = bitcast i32* [[INCDEC_PTR]] to <2 x i32>*
>> >>>> +; CHECK-NEXT:    [[TMP2:%.*]] = load <2 x i32>, <2 x i32>* [[TMP1]], align 4
>> >>>> +; CHECK-NEXT:    [[TMP3:%.*]] = shl <2 x i32> [[TMP2]], <i32 1, i32 2>
>> >>>>  ; CHECK-NEXT:    [[INCDEC_PTR6:%.*]] = getelementptr inbounds i32, i32* [[DST]], i64 3
>> >>>> -; CHECK-NEXT:    store i32 [[SHL5]], i32* [[INCDEC_PTR3]], align 4
>> >>>> -; CHECK-NEXT:    [[TMP3:%.*]] = load i32, i32* [[INCDEC_PTR4]], align 4
>> >>>> -; CHECK-NEXT:    [[SHL8:%.*]] = shl i32 [[TMP3]], 3
>> >>>> +; CHECK-NEXT:    [[TMP4:%.*]] = bitcast i32* [[INCDEC_PTR1]] to <2 x i32>*
>> >>>> +; CHECK-NEXT:    store <2 x i32> [[TMP3]], <2 x i32>* [[TMP4]], align 4
>> >>>> +; CHECK-NEXT:    [[TMP5:%.*]] = load i32, i32* [[INCDEC_PTR4]], align 4
>> >>>> +; CHECK-NEXT:    [[SHL8:%.*]] = shl i32 [[TMP5]], 3
>> >>>>  ; CHECK-NEXT:    store i32 [[SHL8]], i32* [[INCDEC_PTR6]], align 4
>> >>>>  ; CHECK-NEXT:    ret void
>> >>>>  ;
>> >>>> @@ -457,17 +456,16 @@ define void @add1f(float* noalias %dst,
>> >>>>  ; CHECK-NEXT:    [[INCDEC_PTR1:%.*]] = getelementptr inbounds float, float* [[DST:%.*]], i64 1
>> >>>>  ; CHECK-NEXT:    store float [[TMP0]], float* [[DST]], align 4
>> >>>>  ; CHECK-NEXT:    [[INCDEC_PTR2:%.*]] = getelementptr inbounds float, float* [[SRC]], i64 2
>> >>>> -; CHECK-NEXT:    [[TMP1:%.*]] = load float, float* [[INCDEC_PTR]], align 4
>> >>>> -; CHECK-NEXT:    [[ADD3:%.*]] = fadd fast float [[TMP1]], 1.000000e+00
>> >>>>  ; CHECK-NEXT:    [[INCDEC_PTR4:%.*]] = getelementptr inbounds float, float* [[DST]], i64 2
>> >>>> -; CHECK-NEXT:    store float [[ADD3]], float* [[INCDEC_PTR1]], align 4
>> >>>>  ; CHECK-NEXT:    [[INCDEC_PTR5:%.*]] = getelementptr inbounds float, float* [[SRC]], i64 3
>> >>>> -; CHECK-NEXT:    [[TMP2:%.*]] = load float, float* [[INCDEC_PTR2]], align 4
>> >>>> -; CHECK-NEXT:    [[ADD6:%.*]] = fadd fast float [[TMP2]], 2.000000e+00
>> >>>> +; CHECK-NEXT:    [[TMP1:%.*]] = bitcast float* [[INCDEC_PTR]] to <2 x float>*
>> >>>> +; CHECK-NEXT:    [[TMP2:%.*]] = load <2 x float>, <2 x float>* [[TMP1]], align 4
>> >>>> +; CHECK-NEXT:    [[TMP3:%.*]] = fadd fast <2 x float> <float 1.000000e+00, float 2.000000e+00>, [[TMP2]]
>> >>>>  ; CHECK-NEXT:    [[INCDEC_PTR7:%.*]] = getelementptr inbounds float, float* [[DST]], i64 3
>> >>>> -; CHECK-NEXT:    store float [[ADD6]], float* [[INCDEC_PTR4]], align 4
>> >>>> -; CHECK-NEXT:    [[TMP3:%.*]] = load float, float* [[INCDEC_PTR5]], align 4
>> >>>> -; CHECK-NEXT:    [[ADD9:%.*]] = fadd fast float [[TMP3]], 3.000000e+00
>> >>>> +; CHECK-NEXT:    [[TMP4:%.*]] = bitcast float* [[INCDEC_PTR1]] to <2 x float>*
>> >>>> +; CHECK-NEXT:    store <2 x float> [[TMP3]], <2 x float>* [[TMP4]], align 4
>> >>>> +; CHECK-NEXT:    [[TMP5:%.*]] = load float, float* [[INCDEC_PTR5]], align 4
>> >>>> +; CHECK-NEXT:    [[ADD9:%.*]] = fadd fast float [[TMP5]], 3.000000e+00
>> >>>>  ; CHECK-NEXT:    store float [[ADD9]], float* [[INCDEC_PTR7]], align 4
>> >>>>  ; CHECK-NEXT:    ret void
>> >>>>  ;
>> >>>> @@ -505,13 +503,12 @@ define void @sub0f(float* noalias %dst,
>> >>>>  ; CHECK-NEXT:    [[INCDEC_PTR4:%.*]] = getelementptr inbounds float, float* [[DST]], i64 2
>> >>>>  ; CHECK-NEXT:    store float [[TMP1]], float* [[INCDEC_PTR1]], align 4
>> >>>>  ; CHECK-NEXT:    [[INCDEC_PTR5:%.*]] = getelementptr inbounds float, float* [[SRC]], i64 3
>> >>>> -; CHECK-NEXT:    [[TMP2:%.*]] = load float, float* [[INCDEC_PTR2]], align 4
>> >>>> -; CHECK-NEXT:    [[ADD6:%.*]] = fadd fast float [[TMP2]], -2.000000e+00
>> >>>>  ; CHECK-NEXT:    [[INCDEC_PTR7:%.*]] = getelementptr inbounds float, float* [[DST]], i64 3
>> >>>> -; CHECK-NEXT:    store float [[ADD6]], float* [[INCDEC_PTR4]], align 4
>> >>>> -; CHECK-NEXT:    [[TMP3:%.*]] = load float, float* [[INCDEC_PTR5]], align 4
>> >>>> -; CHECK-NEXT:    [[ADD9:%.*]] = fadd fast float [[TMP3]], -3.000000e+00
>> >>>> -; CHECK-NEXT:    store float [[ADD9]], float* [[INCDEC_PTR7]], align 4
>> >>>> +; CHECK-NEXT:    [[TMP2:%.*]] = bitcast float* [[INCDEC_PTR2]] to <2 x float>*
>> >>>> +; CHECK-NEXT:    [[TMP3:%.*]] = load <2 x float>, <2 x float>* [[TMP2]], align 4
>> >>>> +; CHECK-NEXT:    [[TMP4:%.*]] = fadd fast <2 x float> <float -2.000000e+00, float -3.000000e+00>, [[TMP3]]
>> >>>> +; CHECK-NEXT:    [[TMP5:%.*]] = bitcast float* [[INCDEC_PTR4]] to <2 x float>*
>> >>>> +; CHECK-NEXT:    store <2 x float> [[TMP4]], <2 x float>* [[TMP5]], align 4
>> >>>>  ; CHECK-NEXT:    ret void
>> >>>>  ;
>> >>>>  entry:
>> >>>> @@ -624,13 +621,14 @@ define void @addsub0f(float* noalias %ds
>> >>>>  ; CHECK-NEXT:    [[INCDEC_PTR3:%.*]] = getelementptr inbounds float, float* [[DST]], i64 2
>> >>>>  ; CHECK-NEXT:    store float [[TMP1]], float* [[INCDEC_PTR1]], align 4
>> >>>>  ; CHECK-NEXT:    [[INCDEC_PTR4:%.*]] = getelementptr inbounds float, float* [[SRC]], i64 3
>> >>>> -; CHECK-NEXT:    [[TMP2:%.*]] = load float, float* [[INCDEC_PTR2]], align 4
>> >>>> -; CHECK-NEXT:    [[SUB5:%.*]] = fadd fast float [[TMP2]], -2.000000e+00
>> >>>>  ; CHECK-NEXT:    [[INCDEC_PTR6:%.*]] = getelementptr inbounds float, float* [[DST]], i64 3
>> >>>> -; CHECK-NEXT:    store float [[SUB5]], float* [[INCDEC_PTR3]], align 4
>> >>>> -; CHECK-NEXT:    [[TMP3:%.*]] = load float, float* [[INCDEC_PTR4]], align 4
>> >>>> -; CHECK-NEXT:    [[SUB8:%.*]] = fsub fast float [[TMP3]], -3.000000e+00
>> >>>> -; CHECK-NEXT:    store float [[SUB8]], float* [[INCDEC_PTR6]], align 4
>> >>>> +; CHECK-NEXT:    [[TMP2:%.*]] = bitcast float* [[INCDEC_PTR2]] to <2 x float>*
>> >>>> +; CHECK-NEXT:    [[TMP3:%.*]] = load <2 x float>, <2 x float>* [[TMP2]], align 4
>> >>>> +; CHECK-NEXT:    [[TMP4:%.*]] = fadd fast <2 x float> [[TMP3]], <float -2.000000e+00, float -3.000000e+00>
>> >>>> +; CHECK-NEXT:    [[TMP5:%.*]] = fsub fast <2 x float> [[TMP3]], <float -2.000000e+00, float -3.000000e+00>
>> >>>> +; CHECK-NEXT:    [[TMP6:%.*]] = shufflevector <2 x float> [[TMP4]], <2 x float> [[TMP5]], <2 x i32> <i32 0, i32 3>
>> >>>> +; CHECK-NEXT:    [[TMP7:%.*]] = bitcast float* [[INCDEC_PTR3]] to <2 x float>*
>> >>>> +; CHECK-NEXT:    store <2 x float> [[TMP6]], <2 x float>* [[TMP7]], align 4
>> >>>>  ; CHECK-NEXT:    ret void
>> >>>>  ;
>> >>>>  entry:
>> >>>> @@ -658,21 +656,22 @@ define void @addsub1f(float* noalias %ds
>> >>>>  ; CHECK-LABEL: @addsub1f(
>> >>>>  ; CHECK-NEXT:  entry:
>> >>>>  ; CHECK-NEXT:    [[INCDEC_PTR:%.*]] = getelementptr inbounds float, float* [[SRC:%.*]], i64 1
>> >>>> -; CHECK-NEXT:    [[TMP0:%.*]] = load float, float* [[SRC]], align 4
>> >>>> -; CHECK-NEXT:    [[SUB:%.*]] = fadd fast float [[TMP0]], -1.000000e+00
>> >>>>  ; CHECK-NEXT:    [[INCDEC_PTR1:%.*]] = getelementptr inbounds float, float* [[DST:%.*]], i64 1
>> >>>> -; CHECK-NEXT:    store float [[SUB]], float* [[DST]], align 4
>> >>>>  ; CHECK-NEXT:    [[INCDEC_PTR2:%.*]] = getelementptr inbounds float, float* [[SRC]], i64 2
>> >>>> -; CHECK-NEXT:    [[TMP1:%.*]] = load float, float* [[INCDEC_PTR]], align 4
>> >>>> -; CHECK-NEXT:    [[SUB1:%.*]] = fsub fast float [[TMP1]], -1.000000e+00
>> >>>> +; CHECK-NEXT:    [[TMP0:%.*]] = bitcast float* [[SRC]] to <2 x float>*
>> >>>> +; CHECK-NEXT:    [[TMP1:%.*]] = load <2 x float>, <2 x float>* [[TMP0]], align 4
>> >>>> +; CHECK-NEXT:    [[TMP2:%.*]] = fadd fast <2 x float> [[TMP1]], <float -1.000000e+00, float -1.000000e+00>
>> >>>> +; CHECK-NEXT:    [[TMP3:%.*]] = fsub fast <2 x float> [[TMP1]], <float -1.000000e+00, float -1.000000e+00>
>> >>>> +; CHECK-NEXT:    [[TMP4:%.*]] = shufflevector <2 x float> [[TMP2]], <2 x float> [[TMP3]], <2 x i32> <i32 0, i32 3>
>> >>>>  ; CHECK-NEXT:    [[INCDEC_PTR3:%.*]] = getelementptr inbounds float, float* [[DST]], i64 2
>> >>>> -; CHECK-NEXT:    store float [[SUB1]], float* [[INCDEC_PTR1]], align 4
>> >>>> +; CHECK-NEXT:    [[TMP5:%.*]] = bitcast float* [[DST]] to <2 x float>*
>> >>>> +; CHECK-NEXT:    store <2 x float> [[TMP4]], <2 x float>* [[TMP5]], align 4
>> >>>>  ; CHECK-NEXT:    [[INCDEC_PTR4:%.*]] = getelementptr inbounds float, float* [[SRC]], i64 3
>> >>>> -; CHECK-NEXT:    [[TMP2:%.*]] = load float, float* [[INCDEC_PTR2]], align 4
>> >>>> +; CHECK-NEXT:    [[TMP6:%.*]] = load float, float* [[INCDEC_PTR2]], align 4
>> >>>>  ; CHECK-NEXT:    [[INCDEC_PTR6:%.*]] = getelementptr inbounds float, float* [[DST]], i64 3
>> >>>> -; CHECK-NEXT:    store float [[TMP2]], float* [[INCDEC_PTR3]], align 4
>> >>>> -; CHECK-NEXT:    [[TMP3:%.*]] = load float, float* [[INCDEC_PTR4]], align 4
>> >>>> -; CHECK-NEXT:    [[SUB8:%.*]] = fsub fast float [[TMP3]], -3.000000e+00
>> >>>> +; CHECK-NEXT:    store float [[TMP6]], float* [[INCDEC_PTR3]], align 4
>> >>>> +; CHECK-NEXT:    [[TMP7:%.*]] = load float, float* [[INCDEC_PTR4]], align 4
>> >>>> +; CHECK-NEXT:    [[SUB8:%.*]] = fsub fast float [[TMP7]], -3.000000e+00
>> >>>>  ; CHECK-NEXT:    store float [[SUB8]], float* [[INCDEC_PTR6]], align 4
>> >>>>  ; CHECK-NEXT:    ret void
>> >>>>  ;
>> >>>> @@ -701,21 +700,20 @@ define void @mulf(float* noalias %dst, f
>> >>>>  ; CHECK-LABEL: @mulf(
>> >>>>  ; CHECK-NEXT:  entry:
>> >>>>  ; CHECK-NEXT:    [[INCDEC_PTR:%.*]] = getelementptr inbounds float, float* [[SRC:%.*]], i64 1
>> >>>> -; CHECK-NEXT:    [[TMP0:%.*]] = load float, float* [[SRC]], align 4
>> >>>> -; CHECK-NEXT:    [[SUB:%.*]] = fmul fast float [[TMP0]], 2.570000e+02
>> >>>>  ; CHECK-NEXT:    [[INCDEC_PTR1:%.*]] = getelementptr inbounds float, float* [[DST:%.*]], i64 1
>> >>>> -; CHECK-NEXT:    store float [[SUB]], float* [[DST]], align 4
>> >>>>  ; CHECK-NEXT:    [[INCDEC_PTR2:%.*]] = getelementptr inbounds float, float* [[SRC]], i64 2
>> >>>> -; CHECK-NEXT:    [[TMP1:%.*]] = load float, float* [[INCDEC_PTR]], align 4
>> >>>> -; CHECK-NEXT:    [[SUB3:%.*]] = fmul fast float [[TMP1]], -3.000000e+00
>> >>>> +; CHECK-NEXT:    [[TMP0:%.*]] = bitcast float* [[SRC]] to <2 x float>*
>> >>>> +; CHECK-NEXT:    [[TMP1:%.*]] = load <2 x float>, <2 x float>* [[TMP0]], align 4
>> >>>> +; CHECK-NEXT:    [[TMP2:%.*]] = fmul fast <2 x float> <float 2.570000e+02, float -3.000000e+00>, [[TMP1]]
>> >>>>  ; CHECK-NEXT:    [[INCDEC_PTR4:%.*]] = getelementptr inbounds float, float* [[DST]], i64 2
>> >>>> -; CHECK-NEXT:    store float [[SUB3]], float* [[INCDEC_PTR1]], align 4
>> >>>> +; CHECK-NEXT:    [[TMP3:%.*]] = bitcast float* [[DST]] to <2 x float>*
>> >>>> +; CHECK-NEXT:    store <2 x float> [[TMP2]], <2 x float>* [[TMP3]], align 4
>> >>>>  ; CHECK-NEXT:    [[INCDEC_PTR5:%.*]] = getelementptr inbounds float, float* [[SRC]], i64 3
>> >>>> -; CHECK-NEXT:    [[TMP2:%.*]] = load float, float* [[INCDEC_PTR2]], align 4
>> >>>> +; CHECK-NEXT:    [[TMP4:%.*]] = load float, float* [[INCDEC_PTR2]], align 4
>> >>>>  ; CHECK-NEXT:    [[INCDEC_PTR7:%.*]] = getelementptr inbounds float, float* [[DST]], i64 3
>> >>>> -; CHECK-NEXT:    store float [[TMP2]], float* [[INCDEC_PTR4]], align 4
>> >>>> -; CHECK-NEXT:    [[TMP3:%.*]] = load float, float* [[INCDEC_PTR5]], align 4
>> >>>> -; CHECK-NEXT:    [[SUB9:%.*]] = fmul fast float [[TMP3]], -9.000000e+00
>> >>>> +; CHECK-NEXT:    store float [[TMP4]], float* [[INCDEC_PTR4]], align 4
>> >>>> +; CHECK-NEXT:    [[TMP5:%.*]] = load float, float* [[INCDEC_PTR5]], align 4
>> >>>> +; CHECK-NEXT:    [[SUB9:%.*]] = fmul fast float [[TMP5]], -9.000000e+00
>> >>>>  ; CHECK-NEXT:    store float [[SUB9]], float* [[INCDEC_PTR7]], align 4
>> >>>>  ; CHECK-NEXT:    ret void
>> >>>>  ;
>> >>>> @@ -786,17 +784,16 @@ define void @add1fn(float* noalias %dst,
>> >>>>  ; CHECK-NEXT:    [[INCDEC_PTR1:%.*]] = getelementptr inbounds float, float* [[DST:%.*]], i64 1
>> >>>>  ; CHECK-NEXT:    store float [[TMP0]], float* [[DST]], align 4
>> >>>>  ; CHECK-NEXT:    [[INCDEC_PTR2:%.*]] = getelementptr inbounds float, float* [[SRC]], i64 2
>> >>>> -; CHECK-NEXT:    [[TMP1:%.*]] = load float, float* [[INCDEC_PTR]], align 4
>> >>>> -; CHECK-NEXT:    [[ADD3:%.*]] = fadd float [[TMP1]], 1.000000e+00
>> >>>>  ; CHECK-NEXT:    [[INCDEC_PTR4:%.*]] = getelementptr inbounds float, float* [[DST]], i64 2
>> >>>> -; CHECK-NEXT:    store float [[ADD3]], float* [[INCDEC_PTR1]], align 4
>> >>>>  ; CHECK-NEXT:    [[INCDEC_PTR5:%.*]] = getelementptr inbounds float, float* [[SRC]], i64 3
>> >>>> -; CHECK-NEXT:    [[TMP2:%.*]] = load float, float* [[INCDEC_PTR2]], align 4
>> >>>> -; CHECK-NEXT:    [[ADD6:%.*]] = fadd float [[TMP2]], 2.000000e+00
>> >>>> +; CHECK-NEXT:    [[TMP1:%.*]] = bitcast float* [[INCDEC_PTR]] to <2 x float>*
>> >>>> +; CHECK-NEXT:    [[TMP2:%.*]] = load <2 x float>, <2 x float>* [[TMP1]], align 4
>> >>>> +; CHECK-NEXT:    [[TMP3:%.*]] = fadd <2 x float> <float 1.000000e+00, float 2.000000e+00>, [[TMP2]]
>> >>>>  ; CHECK-NEXT:    [[INCDEC_PTR7:%.*]] = getelementptr inbounds float, float* [[DST]], i64 3
>> >>>> -; CHECK-NEXT:    store float [[ADD6]], float* [[INCDEC_PTR4]], align 4
>> >>>> -; CHECK-NEXT:    [[TMP3:%.*]] = load float, float* [[INCDEC_PTR5]], align 4
>> >>>> -; CHECK-NEXT:    [[ADD9:%.*]] = fadd float [[TMP3]], 3.000000e+00
>> >>>> +; CHECK-NEXT:    [[TMP4:%.*]] = bitcast float* [[INCDEC_PTR1]] to <2 x float>*
>> >>>> +; CHECK-NEXT:    store <2 x float> [[TMP3]], <2 x float>* [[TMP4]], align 4
>> >>>> +; CHECK-NEXT:    [[TMP5:%.*]] = load float, float* [[INCDEC_PTR5]], align 4
>> >>>> +; CHECK-NEXT:    [[ADD9:%.*]] = fadd float [[TMP5]], 3.000000e+00
>> >>>>  ; CHECK-NEXT:    store float [[ADD9]], float* [[INCDEC_PTR7]], align 4
>> >>>>  ; CHECK-NEXT:    ret void
>> >>>>  ;
>> >>>> @@ -834,13 +831,12 @@ define void @sub0fn(float* noalias %dst,
>> >>>>  ; CHECK-NEXT:    [[INCDEC_PTR4:%.*]] = getelementptr inbounds float, float* [[DST]], i64 2
>> >>>>  ; CHECK-NEXT:    store float [[TMP1]], float* [[INCDEC_PTR1]], align 4
>> >>>>  ; CHECK-NEXT:    [[INCDEC_PTR5:%.*]] = getelementptr inbounds float, float* [[SRC]], i64 3
>> >>>> -; CHECK-NEXT:    [[TMP2:%.*]] = load float, float* [[INCDEC_PTR2]], align 4
>> >>>> -; CHECK-NEXT:    [[ADD6:%.*]] = fadd float [[TMP2]], -2.000000e+00
>> >>>>  ; CHECK-NEXT:    [[INCDEC_PTR7:%.*]] = getelementptr inbounds float, float* [[DST]], i64 3
>> >>>> -; CHECK-NEXT:    store float [[ADD6]], float* [[INCDEC_PTR4]], align 4
>> >>>> -; CHECK-NEXT:    [[TMP3:%.*]] = load float, float* [[INCDEC_PTR5]], align 4
>> >>>> -; CHECK-NEXT:    [[ADD9:%.*]] = fadd float [[TMP3]], -3.000000e+00
>> >>>> -; CHECK-NEXT:    store float [[ADD9]], float* [[INCDEC_PTR7]], align 4
>> >>>> +; CHECK-NEXT:    [[TMP2:%.*]] = bitcast float* [[INCDEC_PTR2]] to <2 x float>*
>> >>>> +; CHECK-NEXT:    [[TMP3:%.*]] = load <2 x float>, <2 x float>* [[TMP2]], align 4
>> >>>> +; CHECK-NEXT:    [[TMP4:%.*]] = fadd <2 x float> <float -2.000000e+00, float -3.000000e+00>, [[TMP3]]
>> >>>> +; CHECK-NEXT:    [[TMP5:%.*]] = bitcast float* [[INCDEC_PTR4]] to <2 x float>*
>> >>>> +; CHECK-NEXT:    store <2 x float> [[TMP4]], <2 x float>* [[TMP5]], align 4
>> >>>>  ; CHECK-NEXT:    ret void
>> >>>>  ;
>> >>>>  entry:
>> >>>> @@ -944,21 +940,20 @@ define void @mulfn(float* noalias %dst,
>> >>>>  ; CHECK-LABEL: @mulfn(
>> >>>>  ; CHECK-NEXT:  entry:
>> >>>>  ; CHECK-NEXT:    [[INCDEC_PTR:%.*]] = getelementptr inbounds float, float* [[SRC:%.*]], i64 1
>> >>>> -; CHECK-NEXT:    [[TMP0:%.*]] = load float, float* [[SRC]], align 4
>> >>>> -; CHECK-NEXT:    [[SUB:%.*]] = fmul float [[TMP0]], 2.570000e+02
>> >>>>  ; CHECK-NEXT:    [[INCDEC_PTR1:%.*]] = getelementptr inbounds float, float* [[DST:%.*]], i64 1
>> >>>> -; CHECK-NEXT:    store float [[SUB]], float* [[DST]], align 4
>> >>>>  ; CHECK-NEXT:    [[INCDEC_PTR2:%.*]] = getelementptr inbounds float, float* [[SRC]], i64 2
>> >>>> -; CHECK-NEXT:    [[TMP1:%.*]] = load float, float* [[INCDEC_PTR]], align 4
>> >>>> -; CHECK-NEXT:    [[SUB3:%.*]] = fmul float [[TMP1]], -3.000000e+00
>> >>>> +; CHECK-NEXT:    [[TMP0:%.*]] = bitcast float* [[SRC]] to <2 x float>*
>> >>>> +; CHECK-NEXT:    [[TMP1:%.*]] = load <2 x float>, <2 x float>* [[TMP0]], align 4
>> >>>> +; CHECK-NEXT:    [[TMP2:%.*]] = fmul <2 x float> <float 2.570000e+02, float -3.000000e+00>, [[TMP1]]
>> >>>>  ; CHECK-NEXT:    [[INCDEC_PTR4:%.*]] = getelementptr inbounds float, float* [[DST]], i64 2
>> >>>> -; CHECK-NEXT:    store float [[SUB3]], float* [[INCDEC_PTR1]], align 4
>> >>>> +; CHECK-NEXT:    [[TMP3:%.*]] = bitcast float* [[DST]] to <2 x float>*
>> >>>> +; CHECK-NEXT:    store <2 x float> [[TMP2]], <2 x float>* [[TMP3]], align 4
>> >>>>  ; CHECK-NEXT:    [[INCDEC_PTR5:%.*]] = getelementptr inbounds float, float* [[SRC]], i64 3
>> >>>> -; CHECK-NEXT:    [[TMP2:%.*]] = load float, float* [[INCDEC_PTR2]], align 4
>> >>>> +; CHECK-NEXT:    [[TMP4:%.*]] = load float, float* [[INCDEC_PTR2]], align 4
>> >>>>  ; CHECK-NEXT:    [[INCDEC_PTR7:%.*]] = getelementptr inbounds float, float* [[DST]], i64 3
>> >>>> -; CHECK-NEXT:    store float [[TMP2]], float* [[INCDEC_PTR4]], align 4
>> >>>> -; CHECK-NEXT:    [[TMP3:%.*]] = load float, float* [[INCDEC_PTR5]], align 4
>> >>>> -; CHECK-NEXT:    [[SUB9:%.*]] = fmul fast float [[TMP3]], -9.000000e+00
>> >>>> +; CHECK-NEXT:    store float [[TMP4]], float* [[INCDEC_PTR4]], align 4
>> >>>> +; CHECK-NEXT:    [[TMP5:%.*]] = load float, float* [[INCDEC_PTR5]], align 4
>> >>>> +; CHECK-NEXT:    [[SUB9:%.*]] = fmul fast float [[TMP5]], -9.000000e+00
>> >>>>  ; CHECK-NEXT:    store float [[SUB9]], float* [[INCDEC_PTR7]], align 4
>> >>>>  ; CHECK-NEXT:    ret void
>> >>>>  ;
>> >>>>
>> >>>> Modified: llvm/trunk/test/Transforms/SLPVectorizer/X86/zext.ll
>> >>>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Transforms/SLPVectorizer/X86/zext.ll?rev=353923&r1=353922&r2=353923&view=diff
>> >>>> ==============================================================================
>> >>>> --- llvm/trunk/test/Transforms/SLPVectorizer/X86/zext.ll (original)
>> >>>> +++ llvm/trunk/test/Transforms/SLPVectorizer/X86/zext.ll Wed Feb 13 00:26:43 2019
>> >>>> @@ -682,18 +682,20 @@ define <4 x i64> @loadext_4i32_to_4i64(i
>> >>>>  ; SSE2-NEXT:    [[P1:%.*]] = getelementptr inbounds i32, i32* [[P0:%.*]], i64 1
>> >>>>  ; SSE2-NEXT:    [[P2:%.*]] = getelementptr inbounds i32, i32* [[P0]], i64 2
>> >>>>  ; SSE2-NEXT:    [[P3:%.*]] = getelementptr inbounds i32, i32* [[P0]], i64 3
>> >>>> -; SSE2-NEXT:    [[I0:%.*]] = load i32, i32* [[P0]], align 1
>> >>>> -; SSE2-NEXT:    [[I1:%.*]] = load i32, i32* [[P1]], align 1
>> >>>> -; SSE2-NEXT:    [[I2:%.*]] = load i32, i32* [[P2]], align 1
>> >>>> -; SSE2-NEXT:    [[I3:%.*]] = load i32, i32* [[P3]], align 1
>> >>>> -; SSE2-NEXT:    [[X0:%.*]] = zext i32 [[I0]] to i64
>> >>>> -; SSE2-NEXT:    [[X1:%.*]] = zext i32 [[I1]] to i64
>> >>>> -; SSE2-NEXT:    [[X2:%.*]] = zext i32 [[I2]] to i64
>> >>>> -; SSE2-NEXT:    [[X3:%.*]] = zext i32 [[I3]] to i64
>> >>>> -; SSE2-NEXT:    [[V0:%.*]] = insertelement <4 x i64> undef, i64 [[X0]], i32 0
>> >>>> -; SSE2-NEXT:    [[V1:%.*]] = insertelement <4 x i64> [[V0]], i64 [[X1]], i32 1
>> >>>> -; SSE2-NEXT:    [[V2:%.*]] = insertelement <4 x i64> [[V1]], i64 [[X2]], i32 2
>> >>>> -; SSE2-NEXT:    [[V3:%.*]] = insertelement <4 x i64> [[V2]], i64 [[X3]], i32 3
>> >>>> +; SSE2-NEXT:    [[TMP1:%.*]] = bitcast i32* [[P0]] to <2 x i32>*
>> >>>> +; SSE2-NEXT:    [[TMP2:%.*]] = load <2 x i32>, <2 x i32>* [[TMP1]], align 1
>> >>>> +; SSE2-NEXT:    [[TMP3:%.*]] = bitcast i32* [[P2]] to <2 x i32>*
>> >>>> +; SSE2-NEXT:    [[TMP4:%.*]] = load <2 x i32>, <2 x i32>* [[TMP3]], align 1
>> >>>> +; SSE2-NEXT:    [[TMP5:%.*]] = zext <2 x i32> [[TMP2]] to <2 x i64>
>> >>>> +; SSE2-NEXT:    [[TMP6:%.*]] = zext <2 x i32> [[TMP4]] to <2 x i64>
>> >>>> +; SSE2-NEXT:    [[TMP7:%.*]] = extractelement <2 x i64> [[TMP5]], i32 0
>> >>>> +; SSE2-NEXT:    [[V0:%.*]] = insertelement <4 x i64> undef, i64 [[TMP7]], i32 0
>> >>>> +; SSE2-NEXT:    [[TMP8:%.*]] = extractelement <2 x i64> [[TMP5]], i32 1
>> >>>> +; SSE2-NEXT:    [[V1:%.*]] = insertelement <4 x i64> [[V0]], i64 [[TMP8]], i32 1
>> >>>> +; SSE2-NEXT:    [[TMP9:%.*]] = extractelement <2 x i64> [[TMP6]], i32 0
>> >>>> +; SSE2-NEXT:    [[V2:%.*]] = insertelement <4 x i64> [[V1]], i64 [[TMP9]], i32 2
>> >>>> +; SSE2-NEXT:    [[TMP10:%.*]] = extractelement <2 x i64> [[TMP6]], i32 1
>> >>>> +; SSE2-NEXT:    [[V3:%.*]] = insertelement <4 x i64> [[V2]], i64 [[TMP10]], i32 3
>> >>>>  ; SSE2-NEXT:    ret <4 x i64> [[V3]]
>> >>>>  ;
>> >>>>  ; SLM-LABEL: @loadext_4i32_to_4i64(
>> >>>> @@ -719,17 +721,18 @@ define <4 x i64> @loadext_4i32_to_4i64(i
>> >>>>  ; AVX1-NEXT:    [[P3:%.*]] = getelementptr inbounds i32, i32* [[P0]], i64 3
>> >>>>  ; AVX1-NEXT:    [[TMP1:%.*]] = bitcast i32* [[P0]] to <2 x i32>*
>> >>>>  ; AVX1-NEXT:    [[TMP2:%.*]] = load <2 x i32>, <2 x i32>* [[TMP1]], align 1
>> >>>> -; AVX1-NEXT:    [[I2:%.*]] = load i32, i32* [[P2]], align 1
>> >>>> -; AVX1-NEXT:    [[I3:%.*]] = load i32, i32* [[P3]], align 1
>> >>>> -; AVX1-NEXT:    [[TMP3:%.*]] = zext <2 x i32> [[TMP2]] to <2 x i64>
>> >>>> -; AVX1-NEXT:    [[X2:%.*]] = zext i32 [[I2]] to i64
>> >>>> -; AVX1-NEXT:    [[X3:%.*]] = zext i32 [[I3]] to i64
>> >>>> -; AVX1-NEXT:    [[TMP4:%.*]] = extractelement <2 x i64> [[TMP3]], i32 0
>> >>>> -; AVX1-NEXT:    [[V0:%.*]] = insertelement <4 x i64> undef, i64 [[TMP4]], i32 0
>> >>>> -; AVX1-NEXT:    [[TMP5:%.*]] = extractelement <2 x i64> [[TMP3]], i32 1
>> >>>> -; AVX1-NEXT:    [[V1:%.*]] = insertelement <4 x i64> [[V0]], i64 [[TMP5]], i32 1
>> >>>> -; AVX1-NEXT:    [[V2:%.*]] = insertelement <4 x i64> [[V1]], i64 [[X2]], i32 2
>> >>>> -; AVX1-NEXT:    [[V3:%.*]] = insertelement <4 x i64> [[V2]], i64 [[X3]], i32 3
>> >>>> +; AVX1-NEXT:    [[TMP3:%.*]] = bitcast i32* [[P2]] to <2 x i32>*
>> >>>> +; AVX1-NEXT:    [[TMP4:%.*]] = load <2 x i32>, <2 x i32>* [[TMP3]], align 1
>> >>>> +; AVX1-NEXT:    [[TMP5:%.*]] = zext <2 x i32> [[TMP2]] to <2 x i64>
>> >>>> +; AVX1-NEXT:    [[TMP6:%.*]] = zext <2 x i32> [[TMP4]] to <2 x i64>
>> >>>> +; AVX1-NEXT:    [[TMP7:%.*]] = extractelement <2 x i64> [[TMP5]], i32 0
>> >>>> +; AVX1-NEXT:    [[V0:%.*]] = insertelement <4 x i64> undef, i64 [[TMP7]], i32 0
>> >>>> +; AVX1-NEXT:    [[TMP8:%.*]] = extractelement <2 x i64> [[TMP5]], i32 1
>> >>>> +; AVX1-NEXT:    [[V1:%.*]] = insertelement <4 x i64> [[V0]], i64 [[TMP8]], i32 1
>> >>>> +; AVX1-NEXT:    [[TMP9:%.*]] = extractelement <2 x i64> [[TMP6]], i32 0
>> >>>> +; AVX1-NEXT:    [[V2:%.*]] = insertelement <4 x i64> [[V1]], i64 [[TMP9]], i32 2
>> >>>> +; AVX1-NEXT:    [[TMP10:%.*]] = extractelement <2 x i64> [[TMP6]], i32 1
>> >>>> +; AVX1-NEXT:    [[V3:%.*]] = insertelement <4 x i64> [[V2]], i64 [[TMP10]], i32 3
>> >>>>  ; AVX1-NEXT:    ret <4 x i64> [[V3]]
>> >>>>  ;
>> >>>>  ; AVX2-LABEL: @loadext_4i32_to_4i64(
>> >>>>
>> >>>>
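[Editor's note: the SSE2/AVX1 hunks above show SLP merging four scalar i32 loads and zexts into two <2 x i32> vector loads followed by vector zexts. A hypothetical C analogue of the `loadext_4i32_to_4i64` test (the function name and signature are inferred from the IR, not taken from the test source) would be:]

```c
#include <stdint.h>

/* Four scalar 32-bit loads, each zero-extended to 64 bits.
 * After SLP vectorization, pairs of loads become <2 x i32> vector
 * loads, and the zero-extensions become <2 x i32> -> <2 x i64>
 * vector zexts, matching the CHECK lines in the diff above. */
void loadext_4i32_to_4i64(const uint32_t *src, uint64_t dst[4]) {
    dst[0] = (uint64_t)src[0];
    dst[1] = (uint64_t)src[1];
    dst[2] = (uint64_t)src[2];
    dst[3] = (uint64_t)src[3];
}
```

This is a sketch of the pattern being vectorized, not the actual test-suite source.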
>> >>>> _______________________________________________
>> >>>> llvm-commits mailing list
>> >>>> llvm-commits at lists.llvm.org
>> >>>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits
