[llvm] r368183 - Recommit r367901 "[X86] Enable -x86-experimental-vector-widening-legalization by default."

Jordan Rupprecht via llvm-commits llvm-commits at lists.llvm.org
Thu Aug 29 11:44:07 PDT 2019


Your r369628 patch is vindicated: our regressions seem to be coming from
r369664 (assuming this patch affects all the regressed benchmarks we saw).

On Thu, Aug 29, 2019 at 9:33 AM Jordan Rupprecht <rupprecht at google.com>
wrote:

> Hi Craig,
>
> It looks like we don't see any regressions in polybench with a
> release @r369822.
>
> However, we have a new regression (-20% in stanford puzzle/shootout sieve
> in llvm singlesource benchmarks) somewhere between r369600 and r369679. I
> hope it's not r369628 :(
>
> I'll continue root-causing today.
>
> On Fri, Aug 23, 2019 at 6:42 PM Craig Topper via llvm-commits <
> llvm-commits at lists.llvm.org> wrote:
>
>> Eric,
>>
>> Did that fix recover any of the performance issues? I think it fixed the
>> big issue in polybench.
>>
>> ~Craig
>>
>>
>> On Thu, Aug 22, 2019 at 1:19 AM Craig Topper <craig.topper at gmail.com>
>> wrote:
>>
>>> I just committed r369628 which should hopefully fix trisolv from that set
>>> of tests. I haven't investigated any other tests yet.
>>>
>>> ~Craig
>>>
>>>
>>> On Wed, Aug 21, 2019 at 5:48 PM Eric Christopher <echristo at gmail.com>
>>> wrote:
>>>
>>>> FWIW one set of benchmarks that regressed for us were the polybench
>>>> ones in the testsuite on Haswell and Sandybridge (Skylake was a net
>>>> win iirc). Something immediately public you could take a look at.
>>>>
>>>> -eric
>>>>
>>>> On Tue, Aug 20, 2019 at 10:11 AM Eric Christopher <echristo at gmail.com>
>>>> wrote:
>>>> >
>>>> > Thanks Craig!
>>>> >
>>>> > -eric
>>>> >
>>>> > On Mon, Aug 19, 2019 at 11:59 PM Craig Topper <craig.topper at gmail.com>
>>>> wrote:
>>>> > >
>>>> > > The -x86-experimental-vector-widening-legalization command line flag
>>>> > > has been restored in r369332. It defaults to true, which enables the new
>>>> > > behavior. Setting it to false should hopefully restore the old behavior. I
>>>> > > can't guarantee that, though, since we have no tests for it. And I have no
>>>> > > way of knowing if we break anything going forward. So hopefully we can get
>>>> > > the issues resolved quickly and not have to depend on this for long.
>>>> > >
>>>> > > ~Craig
>>>> > >
>>>> > >
>>>> > > On Mon, Aug 19, 2019 at 10:28 PM Eric Christopher <
>>>> echristo at gmail.com> wrote:
>>>> > >>
>>>> > >> I do apologize, we've only just gotten to performance testing and the
>>>> > >> numbers are pretty exciting, but unfortunately many more negatives
>>>> > >> than positives. We'll definitely work with you on testing and
>>>> > >> performance analysis if that will help?
>>>> > >>
>>>> > >> Thanks!
>>>> > >>
>>>> > >> -eric
>>>> > >>
>>>> > >> On Mon, Aug 19, 2019 at 10:23 PM Craig Topper <
>>>> craig.topper at gmail.com> wrote:
>>>> > >> >
>>>> > >> > There have been quite a lot of follow-on patches to this. A lot of
>>>> > >> > them would need to be reverted to get back to the old state. I can
>>>> > >> > start trying to put that together.
>>>> > >> >
>>>> > >> > ~Craig
>>>> > >> >
>>>> > >> >
>>>> > >> > On Mon, Aug 19, 2019 at 9:55 PM Eric Christopher via
>>>> llvm-commits <llvm-commits at lists.llvm.org> wrote:
>>>> > >> >>
>>>> > >> >> Hi Craig,
>>>> > >> >>
>>>> > >> >> We're seeing rather a lot of performance regressions with this enabled
>>>> > >> >> by default. Is it possible to get it turned on under a command flag
>>>> > >> >> for the near term while we work on getting you a pile of testcases
>>>> > >> >> (some of it is Eigen and those will at least be easier as you have
>>>> > >> >> access to that source :)
>>>> > >> >>
>>>> > >> >> Thoughts?
>>>> > >> >>
>>>> > >> >> Thanks!
>>>> > >> >>
>>>> > >> >> -eric
>>>> > >> >>
>>>> > >> >> On Wed, Aug 7, 2019 at 9:23 AM Craig Topper via llvm-commits
>>>> > >> >> <llvm-commits at lists.llvm.org> wrote:
>>>> > >> >> >
>>>> > >> >> > Author: ctopper
>>>> > >> >> > Date: Wed Aug  7 09:24:26 2019
>>>> > >> >> > New Revision: 368183
>>>> > >> >> >
>>>> > >> >> > URL: http://llvm.org/viewvc/llvm-project?rev=368183&view=rev
>>>> > >> >> > Log:
>>>> > >> >> > Recommit r367901 "[X86] Enable -x86-experimental-vector-widening-legalization by default."
>>>> > >> >> >
>>>> > >> >> > The assert that caused this to be reverted should be fixed now.
>>>> > >> >> >
>>>> > >> >> > Original commit message:
>>>> > >> >> >
>>>> > >> >> > This patch changes our default legalization behavior for 16, 32, and
>>>> > >> >> > 64 bit vectors with i8/i16/i32/i64 scalar types from promotion to
>>>> > >> >> > widening. For example, v8i8 will now be widened to v16i8 instead of
>>>> > >> >> > promoted to v8i16. This keeps the element widths the same and pads
>>>> > >> >> > with undef elements. We believe this is a better legalization strategy.
>>>> > >> >> > But it carries some issues due to the fragmented vector ISA. For
>>>> > >> >> > example, i8 shifts and multiplies get widened and then later have
>>>> > >> >> > to be promoted/split into vXi16 vectors.
>>>> > >> >> >
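As a rough illustration of the promotion-versus-widening distinction in the commit message above, the following C++ sketch models both strategies for integer vectors narrower than 128 bits. The names VecTy, promoteTo128, and widenTo128 are hypothetical and invented for this example; it is a simplified model that assumes a 128-bit target register, not LLVM's actual legalizer code.

```cpp
#include <cassert>
#include <string>

// Hypothetical model of a fixed-width integer vector type such as v8i8.
struct VecTy {
  unsigned NumElts; // number of lanes
  unsigned EltBits; // width of each lane in bits
  unsigned totalBits() const { return NumElts * EltBits; }
  std::string str() const {
    return "v" + std::to_string(NumElts) + "i" + std::to_string(EltBits);
  }
};

// Promotion (old default): keep the lane count and grow each lane until the
// vector fills 128 bits.
VecTy promoteTo128(VecTy T) {
  assert(T.totalBits() < 128 && 128 % T.NumElts == 0);
  return VecTy{T.NumElts, 128 / T.NumElts};
}

// Widening (new default): keep the lane width and add lanes until the vector
// fills 128 bits. The extra lanes correspond to the undef padding elements.
VecTy widenTo128(VecTy T) {
  assert(T.totalBits() < 128 && 128 % T.EltBits == 0);
  return VecTy{128 / T.EltBits, T.EltBits};
}
```

Under this model, v8i8 promotes to v8i16 but widens to v16i8, and v2i32 promotes to v2i64 but widens to v4i32, which matches the examples given in the commit message and in the updated getShuffleCost comment.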
>>>> > >> >> > This has the potential to cause regressions so we wanted to get
>>>> > >> >> > it in early in the 10.0 cycle so we have plenty of time to
>>>> > >> >> > address them.
>>>> > >> >> >
>>>> > >> >> > Next steps will be to merge tests that explicitly test the command
>>>> > >> >> > line option. And then we can remove the option and its associated
>>>> > >> >> > code.
>>>> > >> >> >
>>>> > >> >> > Removed:
>>>> > >> >> >     llvm/trunk/test/Analysis/CostModel/X86/reduce-add-widen.ll
>>>> > >> >> > Modified:
>>>> > >> >> >     llvm/trunk/lib/Target/X86/X86ISelLowering.cpp
>>>> > >> >> >     llvm/trunk/lib/Target/X86/X86TargetTransformInfo.cpp
>>>> > >> >> >     llvm/trunk/test/Analysis/CostModel/X86/alternate-shuffle-cost.ll
>>>> > >> >> >     llvm/trunk/test/Analysis/CostModel/X86/arith.ll
>>>> > >> >> >     llvm/trunk/test/Analysis/CostModel/X86/cast.ll
>>>> > >> >> >     llvm/trunk/test/Analysis/CostModel/X86/fptosi.ll
>>>> > >> >> >     llvm/trunk/test/Analysis/CostModel/X86/fptoui.ll
>>>> > >> >> >     llvm/trunk/test/Analysis/CostModel/X86/masked-intrinsic-cost.ll
>>>> > >> >> >     llvm/trunk/test/Analysis/CostModel/X86/reduce-add.ll
>>>> > >> >> >     llvm/trunk/test/Analysis/CostModel/X86/reduce-and.ll
>>>> > >> >> >     llvm/trunk/test/Analysis/CostModel/X86/reduce-mul.ll
>>>> > >> >> >     llvm/trunk/test/Analysis/CostModel/X86/reduce-or.ll
>>>> > >> >> >     llvm/trunk/test/Analysis/CostModel/X86/reduce-smax.ll
>>>> > >> >> >     llvm/trunk/test/Analysis/CostModel/X86/reduce-smin.ll
>>>> > >> >> >     llvm/trunk/test/Analysis/CostModel/X86/reduce-umax.ll
>>>> > >> >> >     llvm/trunk/test/Analysis/CostModel/X86/reduce-umin.ll
>>>> > >> >> >     llvm/trunk/test/Analysis/CostModel/X86/reduce-xor.ll
>>>> > >> >> >     llvm/trunk/test/Analysis/CostModel/X86/shuffle-transpose.ll
>>>> > >> >> >     llvm/trunk/test/Analysis/CostModel/X86/sitofp.ll
>>>> > >> >> >     llvm/trunk/test/Analysis/CostModel/X86/slm-arith-costs.ll
>>>> > >> >> >     llvm/trunk/test/Analysis/CostModel/X86/testshiftashr.ll
>>>> > >> >> >     llvm/trunk/test/Analysis/CostModel/X86/testshiftlshr.ll
>>>> > >> >> >     llvm/trunk/test/Analysis/CostModel/X86/testshiftshl.ll
>>>> > >> >> >     llvm/trunk/test/Analysis/CostModel/X86/uitofp.ll
>>>> > >> >> >     llvm/trunk/test/CodeGen/X86/2008-09-05-sinttofp-2xi32.ll
>>>> > >> >> >     llvm/trunk/test/CodeGen/X86/2009-06-05-VZextByteShort.ll
>>>> > >> >> >     llvm/trunk/test/CodeGen/X86/2011-10-19-LegelizeLoad.ll
>>>> > >> >> >     llvm/trunk/test/CodeGen/X86/2011-12-28-vselecti8.ll
>>>> > >> >> >     llvm/trunk/test/CodeGen/X86/2011-12-8-bitcastintprom.ll
>>>> > >> >> >     llvm/trunk/test/CodeGen/X86/2012-01-18-vbitcast.ll
>>>> > >> >> >     llvm/trunk/test/CodeGen/X86/2012-03-15-build_vector_wl.ll
>>>> > >> >> >     llvm/trunk/test/CodeGen/X86/2012-07-10-extload64.ll
>>>> > >> >> >     llvm/trunk/test/CodeGen/X86/3dnow-intrinsics.ll
>>>> > >> >> >     llvm/trunk/test/CodeGen/X86/4char-promote.ll
>>>> > >> >> >     llvm/trunk/test/CodeGen/X86/and-load-fold.ll
>>>> > >> >> >     llvm/trunk/test/CodeGen/X86/atomic-unordered.ll
>>>> > >> >> >     llvm/trunk/test/CodeGen/X86/avg.ll
>>>> > >> >> >     llvm/trunk/test/CodeGen/X86/avx-cvt-2.ll
>>>> > >> >> >     llvm/trunk/test/CodeGen/X86/avx-fp2int.ll
>>>> > >> >> >     llvm/trunk/test/CodeGen/X86/avx2-conversions.ll
>>>> > >> >> >     llvm/trunk/test/CodeGen/X86/avx2-masked-gather.ll
>>>> > >> >> >     llvm/trunk/test/CodeGen/X86/avx2-vbroadcast.ll
>>>> > >> >> >     llvm/trunk/test/CodeGen/X86/avx512-any_extend_load.ll
>>>> > >> >> >     llvm/trunk/test/CodeGen/X86/avx512-cvt.ll
>>>> > >> >> >     llvm/trunk/test/CodeGen/X86/avx512-ext.ll
>>>> > >> >> >     llvm/trunk/test/CodeGen/X86/avx512-intrinsics-upgrade.ll
>>>> > >> >> >     llvm/trunk/test/CodeGen/X86/avx512-mask-op.ll
>>>> > >> >> >     llvm/trunk/test/CodeGen/X86/avx512-trunc.ll
>>>> > >> >> >     llvm/trunk/test/CodeGen/X86/avx512-vec-cmp.ll
>>>> > >> >> >     llvm/trunk/test/CodeGen/X86/avx512-vec3-crash.ll
>>>> > >> >> >     llvm/trunk/test/CodeGen/X86/avx512bwvl-intrinsics-upgrade.ll
>>>> > >> >> >     llvm/trunk/test/CodeGen/X86/avx512vl-intrinsics-fast-isel.ll
>>>> > >> >> >     llvm/trunk/test/CodeGen/X86/avx512vl-intrinsics-upgrade.ll
>>>> > >> >> >     llvm/trunk/test/CodeGen/X86/bitcast-and-setcc-128.ll
>>>> > >> >> >     llvm/trunk/test/CodeGen/X86/bitcast-setcc-128.ll
>>>> > >> >> >     llvm/trunk/test/CodeGen/X86/bitcast-vector-bool.ll
>>>> > >> >> >     llvm/trunk/test/CodeGen/X86/bitreverse.ll
>>>> > >> >> >     llvm/trunk/test/CodeGen/X86/bswap-vector.ll
>>>> > >> >> >     llvm/trunk/test/CodeGen/X86/buildvec-insertvec.ll
>>>> > >> >> >     llvm/trunk/test/CodeGen/X86/combine-64bit-vec-binop.ll
>>>> > >> >> >     llvm/trunk/test/CodeGen/X86/combine-or.ll
>>>> > >> >> >     llvm/trunk/test/CodeGen/X86/complex-fastmath.ll
>>>> > >> >> >     llvm/trunk/test/CodeGen/X86/cvtv2f32.ll
>>>> > >> >> >     llvm/trunk/test/CodeGen/X86/extract-concat.ll
>>>> > >> >> >     llvm/trunk/test/CodeGen/X86/extract-insert.ll
>>>> > >> >> >     llvm/trunk/test/CodeGen/X86/f16c-intrinsics.ll
>>>> > >> >> >     llvm/trunk/test/CodeGen/X86/fold-vector-sext-zext.ll
>>>> > >> >> >     llvm/trunk/test/CodeGen/X86/insertelement-shuffle.ll
>>>> > >> >> >     llvm/trunk/test/CodeGen/X86/known-bits.ll
>>>> > >> >> >     llvm/trunk/test/CodeGen/X86/load-partial.ll
>>>> > >> >> >     llvm/trunk/test/CodeGen/X86/lower-bitcast.ll
>>>> > >> >> >     llvm/trunk/test/CodeGen/X86/madd.ll
>>>> > >> >> >     llvm/trunk/test/CodeGen/X86/masked_compressstore.ll
>>>> > >> >> >     llvm/trunk/test/CodeGen/X86/masked_expandload.ll
>>>> > >> >> >     llvm/trunk/test/CodeGen/X86/masked_gather_scatter.ll
>>>> > >> >> >     llvm/trunk/test/CodeGen/X86/masked_gather_scatter_widen.ll
>>>> > >> >> >     llvm/trunk/test/CodeGen/X86/masked_load.ll
>>>> > >> >> >     llvm/trunk/test/CodeGen/X86/masked_store.ll
>>>> > >> >> >     llvm/trunk/test/CodeGen/X86/masked_store_trunc.ll
>>>> > >> >> >     llvm/trunk/test/CodeGen/X86/masked_store_trunc_ssat.ll
>>>> > >> >> >     llvm/trunk/test/CodeGen/X86/masked_store_trunc_usat.ll
>>>> > >> >> >     llvm/trunk/test/CodeGen/X86/merge-consecutive-loads-256.ll
>>>> > >> >> >     llvm/trunk/test/CodeGen/X86/mmx-arg-passing-x86-64.ll
>>>> > >> >> >     llvm/trunk/test/CodeGen/X86/mmx-arith.ll
>>>> > >> >> >     llvm/trunk/test/CodeGen/X86/mmx-cvt.ll
>>>> > >> >> >     llvm/trunk/test/CodeGen/X86/mulvi32.ll
>>>> > >> >> >     llvm/trunk/test/CodeGen/X86/oddshuffles.ll
>>>> > >> >> >     llvm/trunk/test/CodeGen/X86/oddsubvector.ll
>>>> > >> >> >     llvm/trunk/test/CodeGen/X86/pmaddubsw.ll
>>>> > >> >> >     llvm/trunk/test/CodeGen/X86/pmulh.ll
>>>> > >> >> >     llvm/trunk/test/CodeGen/X86/pointer-vector.ll
>>>> > >> >> >     llvm/trunk/test/CodeGen/X86/pr14161.ll
>>>> > >> >> >     llvm/trunk/test/CodeGen/X86/pr35918.ll
>>>> > >> >> >     llvm/trunk/test/CodeGen/X86/pr40994.ll
>>>> > >> >> >     llvm/trunk/test/CodeGen/X86/promote-vec3.ll
>>>> > >> >> >     llvm/trunk/test/CodeGen/X86/promote.ll
>>>> > >> >> >     llvm/trunk/test/CodeGen/X86/psubus.ll
>>>> > >> >> >     llvm/trunk/test/CodeGen/X86/ret-mmx.ll
>>>> > >> >> >     llvm/trunk/test/CodeGen/X86/sad.ll
>>>> > >> >> >     llvm/trunk/test/CodeGen/X86/sadd_sat_vec.ll
>>>> > >> >> >     llvm/trunk/test/CodeGen/X86/scalar_widen_div.ll
>>>> > >> >> >     llvm/trunk/test/CodeGen/X86/select.ll
>>>> > >> >> >     llvm/trunk/test/CodeGen/X86/shift-combine.ll
>>>> > >> >> >     llvm/trunk/test/CodeGen/X86/shrink_vmul.ll
>>>> > >> >> >     llvm/trunk/test/CodeGen/X86/shuffle-strided-with-offset-128.ll
>>>> > >> >> >     llvm/trunk/test/CodeGen/X86/shuffle-strided-with-offset-256.ll
>>>> > >> >> >     llvm/trunk/test/CodeGen/X86/shuffle-strided-with-offset-512.ll
>>>> > >> >> >     llvm/trunk/test/CodeGen/X86/shuffle-vs-trunc-128.ll
>>>> > >> >> >     llvm/trunk/test/CodeGen/X86/shuffle-vs-trunc-256.ll
>>>> > >> >> >     llvm/trunk/test/CodeGen/X86/shuffle-vs-trunc-512.ll
>>>> > >> >> >     llvm/trunk/test/CodeGen/X86/slow-pmulld.ll
>>>> > >> >> >     llvm/trunk/test/CodeGen/X86/sse2-intrinsics-canonical.ll
>>>> > >> >> >     llvm/trunk/test/CodeGen/X86/sse2-vector-shifts.ll
>>>> > >> >> >     llvm/trunk/test/CodeGen/X86/ssub_sat_vec.ll
>>>> > >> >> >     llvm/trunk/test/CodeGen/X86/test-shrink-bug.ll
>>>> > >> >> >     llvm/trunk/test/CodeGen/X86/trunc-ext-ld-st.ll
>>>> > >> >> >     llvm/trunk/test/CodeGen/X86/trunc-subvector.ll
>>>> > >> >> >     llvm/trunk/test/CodeGen/X86/uadd_sat_vec.ll
>>>> > >> >> >     llvm/trunk/test/CodeGen/X86/unfold-masked-merge-vector-variablemask.ll
>>>> > >> >> >     llvm/trunk/test/CodeGen/X86/usub_sat_vec.ll
>>>> > >> >> >     llvm/trunk/test/CodeGen/X86/vec_cast2.ll
>>>> > >> >> >     llvm/trunk/test/CodeGen/X86/vec_cast3.ll
>>>> > >> >> >     llvm/trunk/test/CodeGen/X86/vec_ctbits.ll
>>>> > >> >> >     llvm/trunk/test/CodeGen/X86/vec_extract-mmx.ll
>>>> > >> >> >     llvm/trunk/test/CodeGen/X86/vec_fp_to_int.ll
>>>> > >> >> >     llvm/trunk/test/CodeGen/X86/vec_insert-5.ll
>>>> > >> >> >     llvm/trunk/test/CodeGen/X86/vec_insert-7.ll
>>>> > >> >> >     llvm/trunk/test/CodeGen/X86/vec_insert-mmx.ll
>>>> > >> >> >     llvm/trunk/test/CodeGen/X86/vec_int_to_fp.ll
>>>> > >> >> >     llvm/trunk/test/CodeGen/X86/vec_saddo.ll
>>>> > >> >> >     llvm/trunk/test/CodeGen/X86/vec_smulo.ll
>>>> > >> >> >     llvm/trunk/test/CodeGen/X86/vec_ssubo.ll
>>>> > >> >> >     llvm/trunk/test/CodeGen/X86/vec_uaddo.ll
>>>> > >> >> >     llvm/trunk/test/CodeGen/X86/vec_umulo.ll
>>>> > >> >> >     llvm/trunk/test/CodeGen/X86/vec_usubo.ll
>>>> > >> >> >     llvm/trunk/test/CodeGen/X86/vector-blend.ll
>>>> > >> >> >     llvm/trunk/test/CodeGen/X86/vector-ext-logic.ll
>>>> > >> >> >     llvm/trunk/test/CodeGen/X86/vector-gep.ll
>>>> > >> >> >     llvm/trunk/test/CodeGen/X86/vector-half-conversions.ll
>>>> > >> >> >     llvm/trunk/test/CodeGen/X86/vector-idiv-v2i32.ll
>>>> > >> >> >     llvm/trunk/test/CodeGen/X86/vector-narrow-binop.ll
>>>> > >> >> >     llvm/trunk/test/CodeGen/X86/vector-reduce-add.ll
>>>> > >> >> >     llvm/trunk/test/CodeGen/X86/vector-reduce-and-bool.ll
>>>> > >> >> >     llvm/trunk/test/CodeGen/X86/vector-reduce-and.ll
>>>> > >> >> >     llvm/trunk/test/CodeGen/X86/vector-reduce-mul.ll
>>>> > >> >> >     llvm/trunk/test/CodeGen/X86/vector-reduce-or-bool.ll
>>>> > >> >> >     llvm/trunk/test/CodeGen/X86/vector-reduce-or.ll
>>>> > >> >> >     llvm/trunk/test/CodeGen/X86/vector-reduce-smax.ll
>>>> > >> >> >     llvm/trunk/test/CodeGen/X86/vector-reduce-smin.ll
>>>> > >> >> >     llvm/trunk/test/CodeGen/X86/vector-reduce-umax.ll
>>>> > >> >> >     llvm/trunk/test/CodeGen/X86/vector-reduce-umin.ll
>>>> > >> >> >     llvm/trunk/test/CodeGen/X86/vector-reduce-xor-bool.ll
>>>> > >> >> >     llvm/trunk/test/CodeGen/X86/vector-reduce-xor.ll
>>>> > >> >> >     llvm/trunk/test/CodeGen/X86/vector-sext.ll
>>>> > >> >> >     llvm/trunk/test/CodeGen/X86/vector-shift-ashr-sub128.ll
>>>> > >> >> >     llvm/trunk/test/CodeGen/X86/vector-shift-by-select-loop.ll
>>>> > >> >> >     llvm/trunk/test/CodeGen/X86/vector-shift-lshr-sub128.ll
>>>> > >> >> >     llvm/trunk/test/CodeGen/X86/vector-shift-shl-sub128.ll
>>>> > >> >> >     llvm/trunk/test/CodeGen/X86/vector-shuffle-128-v16.ll
>>>> > >> >> >     llvm/trunk/test/CodeGen/X86/vector-shuffle-combining.ll
>>>> > >> >> >     llvm/trunk/test/CodeGen/X86/vector-trunc-packus.ll
>>>> > >> >> >     llvm/trunk/test/CodeGen/X86/vector-trunc-ssat.ll
>>>> > >> >> >     llvm/trunk/test/CodeGen/X86/vector-trunc-usat.ll
>>>> > >> >> >     llvm/trunk/test/CodeGen/X86/vector-trunc.ll
>>>> > >> >> >     llvm/trunk/test/CodeGen/X86/vector-truncate-combine.ll
>>>> > >> >> >     llvm/trunk/test/CodeGen/X86/vector-zext.ll
>>>> > >> >> >     llvm/trunk/test/CodeGen/X86/vsel-cmp-load.ll
>>>> > >> >> >     llvm/trunk/test/CodeGen/X86/vselect-avx.ll
>>>> > >> >> >     llvm/trunk/test/CodeGen/X86/vselect.ll
>>>> > >> >> >     llvm/trunk/test/CodeGen/X86/vshift-4.ll
>>>> > >> >> >     llvm/trunk/test/CodeGen/X86/widen_arith-1.ll
>>>> > >> >> >     llvm/trunk/test/CodeGen/X86/widen_arith-2.ll
>>>> > >> >> >     llvm/trunk/test/CodeGen/X86/widen_arith-3.ll
>>>> > >> >> >     llvm/trunk/test/CodeGen/X86/widen_bitops-0.ll
>>>> > >> >> >     llvm/trunk/test/CodeGen/X86/widen_cast-1.ll
>>>> > >> >> >     llvm/trunk/test/CodeGen/X86/widen_cast-2.ll
>>>> > >> >> >     llvm/trunk/test/CodeGen/X86/widen_cast-3.ll
>>>> > >> >> >     llvm/trunk/test/CodeGen/X86/widen_cast-4.ll
>>>> > >> >> >     llvm/trunk/test/CodeGen/X86/widen_cast-5.ll
>>>> > >> >> >     llvm/trunk/test/CodeGen/X86/widen_cast-6.ll
>>>> > >> >> >     llvm/trunk/test/CodeGen/X86/widen_compare-1.ll
>>>> > >> >> >     llvm/trunk/test/CodeGen/X86/widen_conv-1.ll
>>>> > >> >> >     llvm/trunk/test/CodeGen/X86/widen_conv-2.ll
>>>> > >> >> >     llvm/trunk/test/CodeGen/X86/widen_conv-3.ll
>>>> > >> >> >     llvm/trunk/test/CodeGen/X86/widen_conv-4.ll
>>>> > >> >> >     llvm/trunk/test/CodeGen/X86/widen_load-2.ll
>>>> > >> >> >     llvm/trunk/test/CodeGen/X86/widen_shuffle-1.ll
>>>> > >> >> >     llvm/trunk/test/CodeGen/X86/x86-interleaved-access.ll
>>>> > >> >> >     llvm/trunk/test/CodeGen/X86/x86-shifts.ll
>>>> > >> >> >     llvm/trunk/test/Transforms/SLPVectorizer/X86/blending-shuffle.ll
>>>> > >> >> >     llvm/trunk/test/Transforms/SLPVectorizer/X86/fptosi.ll
>>>> > >> >> >     llvm/trunk/test/Transforms/SLPVectorizer/X86/fptoui.ll
>>>> > >> >> >     llvm/trunk/test/Transforms/SLPVectorizer/X86/insert-element-build-vector.ll
>>>> > >> >> >     llvm/trunk/test/Transforms/SLPVectorizer/X86/sitofp.ll
>>>> > >> >> >     llvm/trunk/test/Transforms/SLPVectorizer/X86/uitofp.ll
>>>> > >> >> >
>>>> > >> >> > Modified: llvm/trunk/lib/Target/X86/X86ISelLowering.cpp
>>>> > >> >> > URL:
>>>> http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/X86/X86ISelLowering.cpp?rev=368183&r1=368182&r2=368183&view=diff
>>>> > >> >> >
>>>> ==============================================================================
>>>> > >> >> > --- llvm/trunk/lib/Target/X86/X86ISelLowering.cpp (original)
>>>> > >> >> > +++ llvm/trunk/lib/Target/X86/X86ISelLowering.cpp Wed Aug  7
>>>> 09:24:26 2019
>>>> > >> >> > @@ -66,7 +66,7 @@ using namespace llvm;
>>>> > >> >> >  STATISTIC(NumTailCalls, "Number of tail calls");
>>>> > >> >> >
>>>> > >> >> >  static cl::opt<bool> ExperimentalVectorWideningLegalization(
>>>> > >> >> > -    "x86-experimental-vector-widening-legalization", cl::init(false),
>>>> > >> >> > +    "x86-experimental-vector-widening-legalization", cl::init(true),
>>>> > >> >> >      cl::desc("Enable an experimental vector type legalization through widening "
>>>> > >> >> >               "rather than promotion."),
>>>> > >> >> >      cl::Hidden);
>>>> > >> >> > @@ -40453,8 +40453,7 @@ static SDValue combineStore(SDNode *N, S
>>>> > >> >> >    bool NoImplicitFloatOps = F.hasFnAttribute(Attribute::NoImplicitFloat);
>>>> > >> >> >    bool F64IsLegal =
>>>> > >> >> >        !Subtarget.useSoftFloat() && !NoImplicitFloatOps && Subtarget.hasSSE2();
>>>> > >> >> > -  if (((VT.isVector() && !VT.isFloatingPoint()) ||
>>>> > >> >> > -       (VT == MVT::i64 && F64IsLegal && !Subtarget.is64Bit())) &&
>>>> > >> >> > +  if ((VT == MVT::i64 && F64IsLegal && !Subtarget.is64Bit()) &&
>>>> > >> >> >        isa<LoadSDNode>(St->getValue()) &&
>>>> > >> >> >        !cast<LoadSDNode>(St->getValue())->isVolatile() &&
>>>> > >> >> >        St->getChain().hasOneUse() && !St->isVolatile()) {
>>>> > >> >> >
>>>> > >> >> > Modified: llvm/trunk/lib/Target/X86/X86TargetTransformInfo.cpp
>>>> > >> >> > URL:
>>>> http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/X86/X86TargetTransformInfo.cpp?rev=368183&r1=368182&r2=368183&view=diff
>>>> > >> >> >
>>>> ==============================================================================
>>>> > >> >> > --- llvm/trunk/lib/Target/X86/X86TargetTransformInfo.cpp
>>>> (original)
>>>> > >> >> > +++ llvm/trunk/lib/Target/X86/X86TargetTransformInfo.cpp Wed
>>>> Aug  7 09:24:26 2019
>>>> > >> >> > @@ -887,7 +887,7 @@ int X86TTIImpl::getArithmeticInstrCost(
>>>> > >> >> >  int X86TTIImpl::getShuffleCost(TTI::ShuffleKind Kind, Type
>>>> *Tp, int Index,
>>>> > >> >> >                                 Type *SubTp) {
>>>> > >> >> >    // 64-bit packed float vectors (v2f32) are widened to type
>>>> v4f32.
>>>> > >> >> > -  // 64-bit packed integer vectors (v2i32) are promoted to
>>>> type v2i64.
>>>> > >> >> > +  // 64-bit packed integer vectors (v2i32) are widened to
>>>> type v4i32.
>>>> > >> >> >    std::pair<int, MVT> LT = TLI->getTypeLegalizationCost(DL,
>>>> Tp);
>>>> > >> >> >
>>>> > >> >> >    // Treat Transpose as 2-op shuffles - there's no
>>>> difference in lowering.
>>>> > >> >> > @@ -2425,14 +2425,6 @@ int
>>>> X86TTIImpl::getAddressComputationCos
>>>> > >> >> >
>>>> > >> >> >  int X86TTIImpl::getArithmeticReductionCost(unsigned Opcode,
>>>> Type *ValTy,
>>>> > >> >> >                                             bool IsPairwise) {
>>>> > >> >> > -
>>>> > >> >> > -  std::pair<int, MVT> LT = TLI->getTypeLegalizationCost(DL,
>>>> ValTy);
>>>> > >> >> > -
>>>> > >> >> > -  MVT MTy = LT.second;
>>>> > >> >> > -
>>>> > >> >> > -  int ISD = TLI->InstructionOpcodeToISD(Opcode);
>>>> > >> >> > -  assert(ISD && "Invalid opcode");
>>>> > >> >> > -
>>>> > >> >> >    // We use the Intel Architecture Code Analyzer(IACA) to
>>>> measure the throughput
>>>> > >> >> >    // and make it as the cost.
>>>> > >> >> >
>>>> > >> >> > @@ -2440,7 +2432,10 @@ int
>>>> X86TTIImpl::getArithmeticReductionCo
>>>> > >> >> >      { ISD::FADD,  MVT::v2f64,   2 },
>>>> > >> >> >      { ISD::FADD,  MVT::v4f32,   4 },
>>>> > >> >> >      { ISD::ADD,   MVT::v2i64,   2 },      // The data
>>>> reported by the IACA tool is "1.6".
>>>> > >> >> > +    { ISD::ADD,   MVT::v2i32,   2 }, // FIXME: chosen to be
>>>> less than v4i32.
>>>> > >> >> >      { ISD::ADD,   MVT::v4i32,   3 },      // The data
>>>> reported by the IACA tool is "3.5".
>>>> > >> >> > +    { ISD::ADD,   MVT::v2i16,   3 }, // FIXME: chosen to be
>>>> less than v4i16
>>>> > >> >> > +    { ISD::ADD,   MVT::v4i16,   4 }, // FIXME: chosen to be
>>>> less than v8i16
>>>> > >> >> >      { ISD::ADD,   MVT::v8i16,   5 },
>>>> > >> >> >    };
>>>> > >> >> >
>>>> > >> >> > @@ -2449,8 +2444,11 @@ int
>>>> X86TTIImpl::getArithmeticReductionCo
>>>> > >> >> >      { ISD::FADD,  MVT::v4f64,   5 },
>>>> > >> >> >      { ISD::FADD,  MVT::v8f32,   7 },
>>>> > >> >> >      { ISD::ADD,   MVT::v2i64,   1 },      // The data
>>>> reported by the IACA tool is "1.5".
>>>> > >> >> > +    { ISD::ADD,   MVT::v2i32,   2 }, // FIXME: chosen to be
>>>> less than v4i32
>>>> > >> >> >      { ISD::ADD,   MVT::v4i32,   3 },      // The data
>>>> reported by the IACA tool is "3.5".
>>>> > >> >> >      { ISD::ADD,   MVT::v4i64,   5 },      // The data
>>>> reported by the IACA tool is "4.8".
>>>> > >> >> > +    { ISD::ADD,   MVT::v2i16,   3 }, // FIXME: chosen to be
>>>> less than v4i16
>>>> > >> >> > +    { ISD::ADD,   MVT::v4i16,   4 }, // FIXME: chosen to be
>>>> less than v8i16
>>>> > >> >> >      { ISD::ADD,   MVT::v8i16,   5 },
>>>> > >> >> >      { ISD::ADD,   MVT::v8i32,   5 },
>>>> > >> >> >    };
>>>> > >> >> > @@ -2459,7 +2457,10 @@ int
>>>> X86TTIImpl::getArithmeticReductionCo
>>>> > >> >> >      { ISD::FADD,  MVT::v2f64,   2 },
>>>> > >> >> >      { ISD::FADD,  MVT::v4f32,   4 },
>>>> > >> >> >      { ISD::ADD,   MVT::v2i64,   2 },      // The data
>>>> reported by the IACA tool is "1.6".
>>>> > >> >> > +    { ISD::ADD,   MVT::v2i32,   2 }, // FIXME: chosen to be
>>>> less than v4i32
>>>> > >> >> >      { ISD::ADD,   MVT::v4i32,   3 },      // The data
>>>> reported by the IACA tool is "3.3".
>>>> > >> >> > +    { ISD::ADD,   MVT::v2i16,   2 },      // The data
>>>> reported by the IACA tool is "4.3".
>>>> > >> >> > +    { ISD::ADD,   MVT::v4i16,   3 },      // The data
>>>> reported by the IACA tool is "4.3".
>>>> > >> >> >      { ISD::ADD,   MVT::v8i16,   4 },      // The data
>>>> reported by the IACA tool is "4.3".
>>>> > >> >> >    };
>>>> > >> >> >
>>>> > >> >> > @@ -2468,12 +2469,47 @@ int
>>>> X86TTIImpl::getArithmeticReductionCo
>>>> > >> >> >      { ISD::FADD,  MVT::v4f64,   3 },
>>>> > >> >> >      { ISD::FADD,  MVT::v8f32,   4 },
>>>> > >> >> >      { ISD::ADD,   MVT::v2i64,   1 },      // The data
>>>> reported by the IACA tool is "1.5".
>>>> > >> >> > +    { ISD::ADD,   MVT::v2i32,   2 }, // FIXME: chosen to be
>>>> less than v4i32
>>>> > >> >> >      { ISD::ADD,   MVT::v4i32,   3 },      // The data
>>>> reported by the IACA tool is "2.8".
>>>> > >> >> >      { ISD::ADD,   MVT::v4i64,   3 },
>>>> > >> >> > +    { ISD::ADD,   MVT::v2i16,   2 },      // The data
>>>> reported by the IACA tool is "4.3".
>>>> > >> >> > +    { ISD::ADD,   MVT::v4i16,   3 },      // The data
>>>> reported by the IACA tool is "4.3".
>>>> > >> >> >      { ISD::ADD,   MVT::v8i16,   4 },
>>>> > >> >> >      { ISD::ADD,   MVT::v8i32,   5 },
>>>> > >> >> >    };
>>>> > >> >> >
>>>> > >> >> > +  int ISD = TLI->InstructionOpcodeToISD(Opcode);
>>>> > >> >> > +  assert(ISD && "Invalid opcode");
>>>> > >> >> > +
>>>> > >> >> > +  // Before legalizing the type, give a chance to look up illegal narrow types
>>>> > >> >> > +  // in the table.
>>>> > >> >> > +  // FIXME: Is there a better way to do this?
>>>> > >> >> > +  EVT VT = TLI->getValueType(DL, ValTy);
>>>> > >> >> > +  if (VT.isSimple()) {
>>>> > >> >> > +    MVT MTy = VT.getSimpleVT();
>>>> > >> >> > +    if (IsPairwise) {
>>>> > >> >> > +      if (ST->hasAVX())
>>>> > >> >> > +        if (const auto *Entry = CostTableLookup(AVX1CostTblPairWise, ISD, MTy))
>>>> > >> >> > +          return Entry->Cost;
>>>> > >> >> > +
>>>> > >> >> > +      if (ST->hasSSE42())
>>>> > >> >> > +        if (const auto *Entry = CostTableLookup(SSE42CostTblPairWise, ISD, MTy))
>>>> > >> >> > +          return Entry->Cost;
>>>> > >> >> > +    } else {
>>>> > >> >> > +      if (ST->hasAVX())
>>>> > >> >> > +        if (const auto *Entry = CostTableLookup(AVX1CostTblNoPairWise, ISD, MTy))
>>>> > >> >> > +          return Entry->Cost;
>>>> > >> >> > +
>>>> > >> >> > +      if (ST->hasSSE42())
>>>> > >> >> > +        if (const auto *Entry = CostTableLookup(SSE42CostTblNoPairWise, ISD, MTy))
>>>> > >> >> > +          return Entry->Cost;
>>>> > >> >> > +    }
>>>> > >> >> > +  }
>>>> > >> >> > +
>>>> > >> >> > +  std::pair<int, MVT> LT = TLI->getTypeLegalizationCost(DL, ValTy);
>>>> > >> >> > +
>>>> > >> >> > +  MVT MTy = LT.second;
>>>> > >> >> > +
>>>> > >> >> >    if (IsPairwise) {
>>>> > >> >> >      if (ST->hasAVX())
>>>> > >> >> >        if (const auto *Entry = CostTableLookup(AVX1CostTblPairWise, ISD, MTy))
>>>> > >> >> >
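The getArithmeticReductionCost change above consults the cost tables with the original, possibly illegal, narrow type before falling back to the legalized type. The standalone C++ sketch below shows that two-stage lookup in isolation. CostEntry, ExampleTbl, lookup, and reductionCost are hypothetical names, types are modeled as plain strings rather than MVTs, and the caller-supplied fallback value is an assumption of this sketch; only the ADD costs are copied from the first table in the diff.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical stand-in for a cost table row; types are plain strings here.
struct CostEntry {
  const char *Op;
  const char *Ty;
  int Cost;
};

// Integer ADD rows copied from the first reduction table in the diff above.
static const CostEntry ExampleTbl[] = {
    {"ADD", "v2i64", 2}, {"ADD", "v2i32", 2}, {"ADD", "v4i32", 3},
    {"ADD", "v2i16", 3}, {"ADD", "v4i16", 4}, {"ADD", "v8i16", 5},
};

// Returns the matching row, or nullptr when the (Op, Ty) pair is absent.
const CostEntry *lookup(const char *Op, const char *Ty) {
  for (const CostEntry &E : ExampleTbl)
    if (std::strcmp(E.Op, Op) == 0 && std::strcmp(E.Ty, Ty) == 0)
      return &E;
  return nullptr;
}

// Try the original (possibly illegal) type first, then the legalized type,
// then give up and return the caller-supplied fallback cost.
int reductionCost(const char *Op, const char *OrigTy, const char *LegalTy,
                  int Fallback) {
  if (const CostEntry *E = lookup(Op, OrigTy))
    return E->Cost;
  if (const CostEntry *E = lookup(Op, LegalTy))
    return E->Cost;
  return Fallback;
}
```

With this table, a v2i32 add reduction that legalizes to v4i32 gets the dedicated narrow-type cost of 2 instead of the v4i32 cost of 3, which appears to be the purpose of the new v2i32, v2i16, and v4i16 rows.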
>>>> > >> >> > Modified:
>>>> llvm/trunk/test/Analysis/CostModel/X86/alternate-shuffle-cost.ll
>>>> > >> >> > URL:
>>>> http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Analysis/CostModel/X86/alternate-shuffle-cost.ll?rev=368183&r1=368182&r2=368183&view=diff
>>>> > >> >> >
>>>> ==============================================================================
>>>> > >> >> > ---
>>>> llvm/trunk/test/Analysis/CostModel/X86/alternate-shuffle-cost.ll (original)
>>>> > >> >> > +++
>>>> llvm/trunk/test/Analysis/CostModel/X86/alternate-shuffle-cost.ll Wed Aug  7
>>>> 09:24:26 2019
>>>> > >> >> > @@ -18,9 +18,21 @@
>>>> > >> >> >  ; 64-bit packed float vectors (v2f32) are widened to type
>>>> v4f32.
>>>> > >> >> >
>>>> > >> >> >  define <2 x i32> @test_v2i32(<2 x i32> %a, <2 x i32> %b) {
>>>> > >> >> > -; CHECK-LABEL: 'test_v2i32'
>>>> > >> >> > -; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for
>>>> instruction: %1 = shufflevector <2 x i32> %a, <2 x i32> %b, <2 x i32> <i32
>>>> 0, i32 3>
>>>> > >> >> > -; CHECK-NEXT:  Cost Model: Found an estimated cost of 0 for
>>>> instruction: ret <2 x i32> %1
>>>> > >> >> > +; SSE2-LABEL: 'test_v2i32'
>>>> > >> >> > +; SSE2-NEXT:  Cost Model: Found an estimated cost of 2 for
>>>> instruction: %1 = shufflevector <2 x i32> %a, <2 x i32> %b, <2 x i32> <i32
>>>> 0, i32 3>
>>>> > >> >> > +; SSE2-NEXT:  Cost Model: Found an estimated cost of 0 for
>>>> instruction: ret <2 x i32> %1
>>>> > >> >> > +;
>>>> > >> >> > +; SSSE3-LABEL: 'test_v2i32'
>>>> > >> >> > +; SSSE3-NEXT:  Cost Model: Found an estimated cost of 2 for
>>>> instruction: %1 = shufflevector <2 x i32> %a, <2 x i32> %b, <2 x i32> <i32
>>>> 0, i32 3>
>>>> > >> >> > +; SSSE3-NEXT:  Cost Model: Found an estimated cost of 0 for
>>>> instruction: ret <2 x i32> %1
>>>> > >> >> > +;
>>>> > >> >> > +; SSE42-LABEL: 'test_v2i32'
>>>> > >> >> > +; SSE42-NEXT:  Cost Model: Found an estimated cost of 1 for
>>>> instruction: %1 = shufflevector <2 x i32> %a, <2 x i32> %b, <2 x i32> <i32
>>>> 0, i32 3>
>>>> > >> >> > +; SSE42-NEXT:  Cost Model: Found an estimated cost of 0 for
>>>> instruction: ret <2 x i32> %1
>>>> > >> >> > +;
>>>> > >> >> > +; AVX-LABEL: 'test_v2i32'
>>>> > >> >> > +; AVX-NEXT:  Cost Model: Found an estimated cost of 1 for
>>>> instruction: %1 = shufflevector <2 x i32> %a, <2 x i32> %b, <2 x i32> <i32
>>>> 0, i32 3>
>>>> > >> >> > +; AVX-NEXT:  Cost Model: Found an estimated cost of 0 for
>>>> instruction: ret <2 x i32> %1
>>>> > >> >> >  ;
>>>> > >> >> >  ; BTVER2-LABEL: 'test_v2i32'
>>>> > >> >> >  ; BTVER2-NEXT:  Cost Model: Found an estimated cost of 1 for
>>>> instruction: %1 = shufflevector <2 x i32> %a, <2 x i32> %b, <2 x i32> <i32
>>>> 0, i32 3>
>>>> > >> >> > @@ -56,9 +68,21 @@ define <2 x float> @test_v2f32(<2 x floa
>>>> > >> >> >  }
>>>> > >> >> >
>>>> > >> >> >  define <2 x i32> @test_v2i32_2(<2 x i32> %a, <2 x i32> %b) {
>>>> > >> >> > -; CHECK-LABEL: 'test_v2i32_2'
>>>> > >> >> > -; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for
>>>> instruction: %1 = shufflevector <2 x i32> %a, <2 x i32> %b, <2 x i32> <i32
>>>> 2, i32 1>
>>>> > >> >> > -; CHECK-NEXT:  Cost Model: Found an estimated cost of 0 for
>>>> instruction: ret <2 x i32> %1
>>>> > >> >> > +; SSE2-LABEL: 'test_v2i32_2'
>>>> > >> >> > +; SSE2-NEXT:  Cost Model: Found an estimated cost of 2 for
>>>> instruction: %1 = shufflevector <2 x i32> %a, <2 x i32> %b, <2 x i32> <i32
>>>> 2, i32 1>
>>>> > >> >> > +; SSE2-NEXT:  Cost Model: Found an estimated cost of 0 for
>>>> instruction: ret <2 x i32> %1
>>>> > >> >> > +;
>>>> > >> >> > +; SSSE3-LABEL: 'test_v2i32_2'
>>>> > >> >> > +; SSSE3-NEXT:  Cost Model: Found an estimated cost of 2 for
>>>> instruction: %1 = shufflevector <2 x i32> %a, <2 x i32> %b, <2 x i32> <i32
>>>> 2, i32 1>
>>>> > >> >> > +; SSSE3-NEXT:  Cost Model: Found an estimated cost of 0 for
>>>> instruction: ret <2 x i32> %1
>>>> > >> >> > +;
>>>> > >> >> > +; SSE42-LABEL: 'test_v2i32_2'
>>>> > >> >> > +; SSE42-NEXT:  Cost Model: Found an estimated cost of 1 for
>>>> instruction: %1 = shufflevector <2 x i32> %a, <2 x i32> %b, <2 x i32> <i32
>>>> 2, i32 1>
>>>> > >> >> > +; SSE42-NEXT:  Cost Model: Found an estimated cost of 0 for
>>>> instruction: ret <2 x i32> %1
>>>> > >> >> > +;
>>>> > >> >> > +; AVX-LABEL: 'test_v2i32_2'
>>>> > >> >> > +; AVX-NEXT:  Cost Model: Found an estimated cost of 1 for
>>>> instruction: %1 = shufflevector <2 x i32> %a, <2 x i32> %b, <2 x i32> <i32
>>>> 2, i32 1>
>>>> > >> >> > +; AVX-NEXT:  Cost Model: Found an estimated cost of 0 for
>>>> instruction: ret <2 x i32> %1
>>>> > >> >> >  ;
>>>> > >> >> >  ; BTVER2-LABEL: 'test_v2i32_2'
>>>> > >> >> >  ; BTVER2-NEXT:  Cost Model: Found an estimated cost of 1 for
>>>> instruction: %1 = shufflevector <2 x i32> %a, <2 x i32> %b, <2 x i32> <i32
>>>> 2, i32 1>
>>>> > >> >> >
>>>> > >> >> > Modified: llvm/trunk/test/Analysis/CostModel/X86/arith.ll
>>>> > >> >> > URL:
>>>> http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Analysis/CostModel/X86/arith.ll?rev=368183&r1=368182&r2=368183&view=diff
>>>> > >> >> >
>>>> ==============================================================================
>>>> > >> >> > --- llvm/trunk/test/Analysis/CostModel/X86/arith.ll (original)
>>>> > >> >> > +++ llvm/trunk/test/Analysis/CostModel/X86/arith.ll Wed Aug
>>>> 7 09:24:26 2019
>>>> > >> >> > @@ -1342,36 +1342,32 @@ define i32 @mul(i32 %arg) {
>>>> > >> >> >  ; A <2 x i64> vector multiply is implemented using
>>>> > >> >> >  ; 3 PMULUDQ and 2 PADDS and 4 shifts.
>>>> > >> >> >  define void @mul_2i32() {
>>>> > >> >> > -; SSE-LABEL: 'mul_2i32'
>>>> > >> >> > -; SSE-NEXT:  Cost Model: Found an estimated cost of 8 for
>>>> instruction: %A0 = mul <2 x i32> undef, undef
>>>> > >> >> > -; SSE-NEXT:  Cost Model: Found an estimated cost of 0 for
>>>> instruction: ret void
>>>> > >> >> > +; SSSE3-LABEL: 'mul_2i32'
>>>> > >> >> > +; SSSE3-NEXT:  Cost Model: Found an estimated cost of 6 for
>>>> instruction: %A0 = mul <2 x i32> undef, undef
>>>> > >> >> > +; SSSE3-NEXT:  Cost Model: Found an estimated cost of 0 for
>>>> instruction: ret void
>>>> > >> >> > +;
>>>> > >> >> > +; SSE42-LABEL: 'mul_2i32'
>>>> > >> >> > +; SSE42-NEXT:  Cost Model: Found an estimated cost of 2 for
>>>> instruction: %A0 = mul <2 x i32> undef, undef
>>>> > >> >> > +; SSE42-NEXT:  Cost Model: Found an estimated cost of 0 for
>>>> instruction: ret void
>>>> > >> >> >  ;
>>>> > >> >> >  ; AVX-LABEL: 'mul_2i32'
>>>> > >> >> > -; AVX-NEXT:  Cost Model: Found an estimated cost of 8 for
>>>> instruction: %A0 = mul <2 x i32> undef, undef
>>>> > >> >> > +; AVX-NEXT:  Cost Model: Found an estimated cost of 2 for
>>>> instruction: %A0 = mul <2 x i32> undef, undef
>>>> > >> >> >  ; AVX-NEXT:  Cost Model: Found an estimated cost of 0 for
>>>> instruction: ret void
>>>> > >> >> >  ;
>>>> > >> >> > -; AVX512F-LABEL: 'mul_2i32'
>>>> > >> >> > -; AVX512F-NEXT:  Cost Model: Found an estimated cost of 8
>>>> for instruction: %A0 = mul <2 x i32> undef, undef
>>>> > >> >> > -; AVX512F-NEXT:  Cost Model: Found an estimated cost of 0
>>>> for instruction: ret void
>>>> > >> >> > -;
>>>> > >> >> > -; AVX512BW-LABEL: 'mul_2i32'
>>>> > >> >> > -; AVX512BW-NEXT:  Cost Model: Found an estimated cost of 8
>>>> for instruction: %A0 = mul <2 x i32> undef, undef
>>>> > >> >> > -; AVX512BW-NEXT:  Cost Model: Found an estimated cost of 0
>>>> for instruction: ret void
>>>> > >> >> > -;
>>>> > >> >> > -; AVX512DQ-LABEL: 'mul_2i32'
>>>> > >> >> > -; AVX512DQ-NEXT:  Cost Model: Found an estimated cost of 1
>>>> for instruction: %A0 = mul <2 x i32> undef, undef
>>>> > >> >> > -; AVX512DQ-NEXT:  Cost Model: Found an estimated cost of 0
>>>> for instruction: ret void
>>>> > >> >> > +; AVX512-LABEL: 'mul_2i32'
>>>> > >> >> > +; AVX512-NEXT:  Cost Model: Found an estimated cost of 1 for
>>>> instruction: %A0 = mul <2 x i32> undef, undef
>>>> > >> >> > +; AVX512-NEXT:  Cost Model: Found an estimated cost of 0 for
>>>> instruction: ret void
>>>> > >> >> >  ;
>>>> > >> >> >  ; SLM-LABEL: 'mul_2i32'
>>>> > >> >> > -; SLM-NEXT:  Cost Model: Found an estimated cost of 17 for
>>>> instruction: %A0 = mul <2 x i32> undef, undef
>>>> > >> >> > +; SLM-NEXT:  Cost Model: Found an estimated cost of 11 for
>>>> instruction: %A0 = mul <2 x i32> undef, undef
>>>> > >> >> >  ; SLM-NEXT:  Cost Model: Found an estimated cost of 0 for
>>>> instruction: ret void
>>>> > >> >> >  ;
>>>> > >> >> >  ; GLM-LABEL: 'mul_2i32'
>>>> > >> >> > -; GLM-NEXT:  Cost Model: Found an estimated cost of 8 for
>>>> instruction: %A0 = mul <2 x i32> undef, undef
>>>> > >> >> > +; GLM-NEXT:  Cost Model: Found an estimated cost of 2 for
>>>> instruction: %A0 = mul <2 x i32> undef, undef
>>>> > >> >> >  ; GLM-NEXT:  Cost Model: Found an estimated cost of 0 for
>>>> instruction: ret void
>>>> > >> >> >  ;
>>>> > >> >> >  ; BTVER2-LABEL: 'mul_2i32'
>>>> > >> >> > -; BTVER2-NEXT:  Cost Model: Found an estimated cost of 8 for
>>>> instruction: %A0 = mul <2 x i32> undef, undef
>>>> > >> >> > +; BTVER2-NEXT:  Cost Model: Found an estimated cost of 2 for
>>>> instruction: %A0 = mul <2 x i32> undef, undef
>>>> > >> >> >  ; BTVER2-NEXT:  Cost Model: Found an estimated cost of 0 for
>>>> instruction: ret void
>>>> > >> >> >  ;
>>>> > >> >> >    %A0 = mul <2 x i32> undef, undef
>>>> > >> >> >
>>>> > >> >> > Modified: llvm/trunk/test/Analysis/CostModel/X86/cast.ll
>>>> > >> >> > URL:
>>>> http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Analysis/CostModel/X86/cast.ll?rev=368183&r1=368182&r2=368183&view=diff
>>>> > >> >> >
>>>> ==============================================================================
>>>> > >> >> > --- llvm/trunk/test/Analysis/CostModel/X86/cast.ll (original)
>>>> > >> >> > +++ llvm/trunk/test/Analysis/CostModel/X86/cast.ll Wed Aug  7
>>>> 09:24:26 2019
>>>> > >> >> > @@ -315,10 +315,10 @@ define void @sitofp4(<4 x i1> %a, <4 x i
>>>> > >> >> >  ; SSE-LABEL: 'sitofp4'
>>>> > >> >> >  ; SSE-NEXT:  Cost Model: Found an estimated cost of 5 for
>>>> instruction: %A1 = sitofp <4 x i1> %a to <4 x float>
>>>> > >> >> >  ; SSE-NEXT:  Cost Model: Found an estimated cost of 40 for
>>>> instruction: %A2 = sitofp <4 x i1> %a to <4 x double>
>>>> > >> >> > -; SSE-NEXT:  Cost Model: Found an estimated cost of 5 for
>>>> instruction: %B1 = sitofp <4 x i8> %b to <4 x float>
>>>> > >> >> > -; SSE-NEXT:  Cost Model: Found an estimated cost of 40 for
>>>> instruction: %B2 = sitofp <4 x i8> %b to <4 x double>
>>>> > >> >> > -; SSE-NEXT:  Cost Model: Found an estimated cost of 5 for
>>>> instruction: %C1 = sitofp <4 x i16> %c to <4 x float>
>>>> > >> >> > -; SSE-NEXT:  Cost Model: Found an estimated cost of 40 for
>>>> instruction: %C2 = sitofp <4 x i16> %c to <4 x double>
>>>> > >> >> > +; SSE-NEXT:  Cost Model: Found an estimated cost of 8 for
>>>> instruction: %B1 = sitofp <4 x i8> %b to <4 x float>
>>>> > >> >> > +; SSE-NEXT:  Cost Model: Found an estimated cost of 160 for
>>>> instruction: %B2 = sitofp <4 x i8> %b to <4 x double>
>>>> > >> >> > +; SSE-NEXT:  Cost Model: Found an estimated cost of 15 for
>>>> instruction: %C1 = sitofp <4 x i16> %c to <4 x float>
>>>> > >> >> > +; SSE-NEXT:  Cost Model: Found an estimated cost of 80 for
>>>> instruction: %C2 = sitofp <4 x i16> %c to <4 x double>
>>>> > >> >> >  ; SSE-NEXT:  Cost Model: Found an estimated cost of 5 for
>>>> instruction: %D1 = sitofp <4 x i32> %d to <4 x float>
>>>> > >> >> >  ; SSE-NEXT:  Cost Model: Found an estimated cost of 40 for
>>>> instruction: %D2 = sitofp <4 x i32> %d to <4 x double>
>>>> > >> >> >  ; SSE-NEXT:  Cost Model: Found an estimated cost of 0 for
>>>> instruction: ret void
>>>> > >> >> > @@ -359,7 +359,7 @@ define void @sitofp4(<4 x i1> %a, <4 x i
>>>> > >> >> >  define void @sitofp8(<8 x i1> %a, <8 x i8> %b, <8 x i16> %c,
>>>> <8 x i32> %d) {
>>>> > >> >> >  ; SSE-LABEL: 'sitofp8'
>>>> > >> >> >  ; SSE-NEXT:  Cost Model: Found an estimated cost of 15 for
>>>> instruction: %A1 = sitofp <8 x i1> %a to <8 x float>
>>>> > >> >> > -; SSE-NEXT:  Cost Model: Found an estimated cost of 15 for
>>>> instruction: %B1 = sitofp <8 x i8> %b to <8 x float>
>>>> > >> >> > +; SSE-NEXT:  Cost Model: Found an estimated cost of 8 for
>>>> instruction: %B1 = sitofp <8 x i8> %b to <8 x float>
>>>> > >> >> >  ; SSE-NEXT:  Cost Model: Found an estimated cost of 15 for
>>>> instruction: %C1 = sitofp <8 x i16> %c to <8 x float>
>>>> > >> >> >  ; SSE-NEXT:  Cost Model: Found an estimated cost of 10 for
>>>> instruction: %D1 = sitofp <8 x i32> %d to <8 x float>
>>>> > >> >> >  ; SSE-NEXT:  Cost Model: Found an estimated cost of 0 for
>>>> instruction: ret void
>>>> > >> >> > @@ -390,9 +390,9 @@ define void @uitofp4(<4 x i1> %a, <4 x i
>>>> > >> >> >  ; SSE-NEXT:  Cost Model: Found an estimated cost of 8 for
>>>> instruction: %A1 = uitofp <4 x i1> %a to <4 x float>
>>>> > >> >> >  ; SSE-NEXT:  Cost Model: Found an estimated cost of 40 for
>>>> instruction: %A2 = uitofp <4 x i1> %a to <4 x double>
>>>> > >> >> >  ; SSE-NEXT:  Cost Model: Found an estimated cost of 8 for
>>>> instruction: %B1 = uitofp <4 x i8> %b to <4 x float>
>>>> > >> >> > -; SSE-NEXT:  Cost Model: Found an estimated cost of 40 for
>>>> instruction: %B2 = uitofp <4 x i8> %b to <4 x double>
>>>> > >> >> > -; SSE-NEXT:  Cost Model: Found an estimated cost of 8 for
>>>> instruction: %C1 = uitofp <4 x i16> %c to <4 x float>
>>>> > >> >> > -; SSE-NEXT:  Cost Model: Found an estimated cost of 40 for
>>>> instruction: %C2 = uitofp <4 x i16> %c to <4 x double>
>>>> > >> >> > +; SSE-NEXT:  Cost Model: Found an estimated cost of 160 for
>>>> instruction: %B2 = uitofp <4 x i8> %b to <4 x double>
>>>> > >> >> > +; SSE-NEXT:  Cost Model: Found an estimated cost of 15 for
>>>> instruction: %C1 = uitofp <4 x i16> %c to <4 x float>
>>>> > >> >> > +; SSE-NEXT:  Cost Model: Found an estimated cost of 80 for
>>>> instruction: %C2 = uitofp <4 x i16> %c to <4 x double>
>>>> > >> >> >  ; SSE-NEXT:  Cost Model: Found an estimated cost of 8 for
>>>> instruction: %D1 = uitofp <4 x i32> %d to <4 x float>
>>>> > >> >> >  ; SSE-NEXT:  Cost Model: Found an estimated cost of 40 for
>>>> instruction: %D2 = uitofp <4 x i32> %d to <4 x double>
>>>> > >> >> >  ; SSE-NEXT:  Cost Model: Found an estimated cost of 0 for
>>>> instruction: ret void
>>>> > >> >> > @@ -433,7 +433,7 @@ define void @uitofp4(<4 x i1> %a, <4 x i
>>>> > >> >> >  define void @uitofp8(<8 x i1> %a, <8 x i8> %b, <8 x i16> %c,
>>>> <8 x i32> %d) {
>>>> > >> >> >  ; SSE-LABEL: 'uitofp8'
>>>> > >> >> >  ; SSE-NEXT:  Cost Model: Found an estimated cost of 15 for
>>>> instruction: %A1 = uitofp <8 x i1> %a to <8 x float>
>>>> > >> >> > -; SSE-NEXT:  Cost Model: Found an estimated cost of 15 for
>>>> instruction: %B1 = uitofp <8 x i8> %b to <8 x float>
>>>> > >> >> > +; SSE-NEXT:  Cost Model: Found an estimated cost of 8 for
>>>> instruction: %B1 = uitofp <8 x i8> %b to <8 x float>
>>>> > >> >> >  ; SSE-NEXT:  Cost Model: Found an estimated cost of 15 for
>>>> instruction: %C1 = uitofp <8 x i16> %c to <8 x float>
>>>> > >> >> >  ; SSE-NEXT:  Cost Model: Found an estimated cost of 16 for
>>>> instruction: %D1 = uitofp <8 x i32> %d to <8 x float>
>>>> > >> >> >  ; SSE-NEXT:  Cost Model: Found an estimated cost of 0 for
>>>> instruction: ret void
>>>> > >> >> >
>>>> > >> >> > Modified: llvm/trunk/test/Analysis/CostModel/X86/fptosi.ll
>>>> > >> >> > URL:
>>>> http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Analysis/CostModel/X86/fptosi.ll?rev=368183&r1=368182&r2=368183&view=diff
>>>> > >> >> >
>>>> ==============================================================================
>>>> > >> >> > --- llvm/trunk/test/Analysis/CostModel/X86/fptosi.ll
>>>> (original)
>>>> > >> >> > +++ llvm/trunk/test/Analysis/CostModel/X86/fptosi.ll Wed Aug
>>>> 7 09:24:26 2019
>>>> > >> >> > @@ -92,35 +92,28 @@ define i32 @fptosi_double_i32(i32 %arg)
>>>> > >> >> >  define i32 @fptosi_double_i16(i32 %arg) {
>>>> > >> >> >  ; SSE-LABEL: 'fptosi_double_i16'
>>>> > >> >> >  ; SSE-NEXT:  Cost Model: Found an estimated cost of 1 for
>>>> instruction: %I16 = fptosi double undef to i16
>>>> > >> >> > -; SSE-NEXT:  Cost Model: Found an estimated cost of 6 for
>>>> instruction: %V2I16 = fptosi <2 x double> undef to <2 x i16>
>>>> > >> >> > -; SSE-NEXT:  Cost Model: Found an estimated cost of 13 for
>>>> instruction: %V4I16 = fptosi <4 x double> undef to <4 x i16>
>>>> > >> >> > -; SSE-NEXT:  Cost Model: Found an estimated cost of 27 for
>>>> instruction: %V8I16 = fptosi <8 x double> undef to <8 x i16>
>>>> > >> >> > +; SSE-NEXT:  Cost Model: Found an estimated cost of 1 for
>>>> instruction: %V2I16 = fptosi <2 x double> undef to <2 x i16>
>>>> > >> >> > +; SSE-NEXT:  Cost Model: Found an estimated cost of 3 for
>>>> instruction: %V4I16 = fptosi <4 x double> undef to <4 x i16>
>>>> > >> >> > +; SSE-NEXT:  Cost Model: Found an estimated cost of 7 for
>>>> instruction: %V8I16 = fptosi <8 x double> undef to <8 x i16>
>>>> > >> >> >  ; SSE-NEXT:  Cost Model: Found an estimated cost of 0 for
>>>> instruction: ret i32 undef
>>>> > >> >> >  ;
>>>> > >> >> >  ; AVX-LABEL: 'fptosi_double_i16'
>>>> > >> >> >  ; AVX-NEXT:  Cost Model: Found an estimated cost of 1 for
>>>> instruction: %I16 = fptosi double undef to i16
>>>> > >> >> > -; AVX-NEXT:  Cost Model: Found an estimated cost of 6 for
>>>> instruction: %V2I16 = fptosi <2 x double> undef to <2 x i16>
>>>> > >> >> > +; AVX-NEXT:  Cost Model: Found an estimated cost of 1 for
>>>> instruction: %V2I16 = fptosi <2 x double> undef to <2 x i16>
>>>> > >> >> >  ; AVX-NEXT:  Cost Model: Found an estimated cost of 1 for
>>>> instruction: %V4I16 = fptosi <4 x double> undef to <4 x i16>
>>>> > >> >> >  ; AVX-NEXT:  Cost Model: Found an estimated cost of 3 for
>>>> instruction: %V8I16 = fptosi <8 x double> undef to <8 x i16>
>>>> > >> >> >  ; AVX-NEXT:  Cost Model: Found an estimated cost of 0 for
>>>> instruction: ret i32 undef
>>>> > >> >> >  ;
>>>> > >> >> > -; AVX512F-LABEL: 'fptosi_double_i16'
>>>> > >> >> > -; AVX512F-NEXT:  Cost Model: Found an estimated cost of 1
>>>> for instruction: %I16 = fptosi double undef to i16
>>>> > >> >> > -; AVX512F-NEXT:  Cost Model: Found an estimated cost of 6
>>>> for instruction: %V2I16 = fptosi <2 x double> undef to <2 x i16>
>>>> > >> >> > -; AVX512F-NEXT:  Cost Model: Found an estimated cost of 1
>>>> for instruction: %V4I16 = fptosi <4 x double> undef to <4 x i16>
>>>> > >> >> > -; AVX512F-NEXT:  Cost Model: Found an estimated cost of 1
>>>> for instruction: %V8I16 = fptosi <8 x double> undef to <8 x i16>
>>>> > >> >> > -; AVX512F-NEXT:  Cost Model: Found an estimated cost of 0
>>>> for instruction: ret i32 undef
>>>> > >> >> > -;
>>>> > >> >> > -; AVX512DQ-LABEL: 'fptosi_double_i16'
>>>> > >> >> > -; AVX512DQ-NEXT:  Cost Model: Found an estimated cost of 1
>>>> for instruction: %I16 = fptosi double undef to i16
>>>> > >> >> > -; AVX512DQ-NEXT:  Cost Model: Found an estimated cost of 1
>>>> for instruction: %V2I16 = fptosi <2 x double> undef to <2 x i16>
>>>> > >> >> > -; AVX512DQ-NEXT:  Cost Model: Found an estimated cost of 1
>>>> for instruction: %V4I16 = fptosi <4 x double> undef to <4 x i16>
>>>> > >> >> > -; AVX512DQ-NEXT:  Cost Model: Found an estimated cost of 1
>>>> for instruction: %V8I16 = fptosi <8 x double> undef to <8 x i16>
>>>> > >> >> > -; AVX512DQ-NEXT:  Cost Model: Found an estimated cost of 0
>>>> for instruction: ret i32 undef
>>>> > >> >> > +; AVX512-LABEL: 'fptosi_double_i16'
>>>> > >> >> > +; AVX512-NEXT:  Cost Model: Found an estimated cost of 1 for
>>>> instruction: %I16 = fptosi double undef to i16
>>>> > >> >> > +; AVX512-NEXT:  Cost Model: Found an estimated cost of 1 for
>>>> instruction: %V2I16 = fptosi <2 x double> undef to <2 x i16>
>>>> > >> >> > +; AVX512-NEXT:  Cost Model: Found an estimated cost of 1 for
>>>> instruction: %V4I16 = fptosi <4 x double> undef to <4 x i16>
>>>> > >> >> > +; AVX512-NEXT:  Cost Model: Found an estimated cost of 1 for
>>>> instruction: %V8I16 = fptosi <8 x double> undef to <8 x i16>
>>>> > >> >> > +; AVX512-NEXT:  Cost Model: Found an estimated cost of 0 for
>>>> instruction: ret i32 undef
>>>> > >> >> >  ;
>>>> > >> >> >  ; BTVER2-LABEL: 'fptosi_double_i16'
>>>> > >> >> >  ; BTVER2-NEXT:  Cost Model: Found an estimated cost of 1 for
>>>> instruction: %I16 = fptosi double undef to i16
>>>> > >> >> > -; BTVER2-NEXT:  Cost Model: Found an estimated cost of 6 for
>>>> instruction: %V2I16 = fptosi <2 x double> undef to <2 x i16>
>>>> > >> >> > +; BTVER2-NEXT:  Cost Model: Found an estimated cost of 1 for
>>>> instruction: %V2I16 = fptosi <2 x double> undef to <2 x i16>
>>>> > >> >> >  ; BTVER2-NEXT:  Cost Model: Found an estimated cost of 1 for
>>>> instruction: %V4I16 = fptosi <4 x double> undef to <4 x i16>
>>>> > >> >> >  ; BTVER2-NEXT:  Cost Model: Found an estimated cost of 3 for
>>>> instruction: %V8I16 = fptosi <8 x double> undef to <8 x i16>
>>>> > >> >> >  ; BTVER2-NEXT:  Cost Model: Found an estimated cost of 0 for
>>>> instruction: ret i32 undef
>>>> > >> >> > @@ -143,29 +136,22 @@ define i32 @fptosi_double_i8(i32 %arg) {
>>>> > >> >> >  ; AVX-LABEL: 'fptosi_double_i8'
>>>> > >> >> >  ; AVX-NEXT:  Cost Model: Found an estimated cost of 1 for
>>>> instruction: %I8 = fptosi double undef to i8
>>>> > >> >> >  ; AVX-NEXT:  Cost Model: Found an estimated cost of 6 for
>>>> instruction: %V2I8 = fptosi <2 x double> undef to <2 x i8>
>>>> > >> >> > -; AVX-NEXT:  Cost Model: Found an estimated cost of 1 for
>>>> instruction: %V4I8 = fptosi <4 x double> undef to <4 x i8>
>>>> > >> >> > -; AVX-NEXT:  Cost Model: Found an estimated cost of 3 for
>>>> instruction: %V8I8 = fptosi <8 x double> undef to <8 x i8>
>>>> > >> >> > +; AVX-NEXT:  Cost Model: Found an estimated cost of 12 for
>>>> instruction: %V4I8 = fptosi <4 x double> undef to <4 x i8>
>>>> > >> >> > +; AVX-NEXT:  Cost Model: Found an estimated cost of 25 for
>>>> instruction: %V8I8 = fptosi <8 x double> undef to <8 x i8>
>>>> > >> >> >  ; AVX-NEXT:  Cost Model: Found an estimated cost of 0 for
>>>> instruction: ret i32 undef
>>>> > >> >> >  ;
>>>> > >> >> > -; AVX512F-LABEL: 'fptosi_double_i8'
>>>> > >> >> > -; AVX512F-NEXT:  Cost Model: Found an estimated cost of 1
>>>> for instruction: %I8 = fptosi double undef to i8
>>>> > >> >> > -; AVX512F-NEXT:  Cost Model: Found an estimated cost of 6
>>>> for instruction: %V2I8 = fptosi <2 x double> undef to <2 x i8>
>>>> > >> >> > -; AVX512F-NEXT:  Cost Model: Found an estimated cost of 1
>>>> for instruction: %V4I8 = fptosi <4 x double> undef to <4 x i8>
>>>> > >> >> > -; AVX512F-NEXT:  Cost Model: Found an estimated cost of 1
>>>> for instruction: %V8I8 = fptosi <8 x double> undef to <8 x i8>
>>>> > >> >> > -; AVX512F-NEXT:  Cost Model: Found an estimated cost of 0
>>>> for instruction: ret i32 undef
>>>> > >> >> > -;
>>>> > >> >> > -; AVX512DQ-LABEL: 'fptosi_double_i8'
>>>> > >> >> > -; AVX512DQ-NEXT:  Cost Model: Found an estimated cost of 1
>>>> for instruction: %I8 = fptosi double undef to i8
>>>> > >> >> > -; AVX512DQ-NEXT:  Cost Model: Found an estimated cost of 1
>>>> for instruction: %V2I8 = fptosi <2 x double> undef to <2 x i8>
>>>> > >> >> > -; AVX512DQ-NEXT:  Cost Model: Found an estimated cost of 1
>>>> for instruction: %V4I8 = fptosi <4 x double> undef to <4 x i8>
>>>> > >> >> > -; AVX512DQ-NEXT:  Cost Model: Found an estimated cost of 1
>>>> for instruction: %V8I8 = fptosi <8 x double> undef to <8 x i8>
>>>> > >> >> > -; AVX512DQ-NEXT:  Cost Model: Found an estimated cost of 0
>>>> for instruction: ret i32 undef
>>>> > >> >> > +; AVX512-LABEL: 'fptosi_double_i8'
>>>> > >> >> > +; AVX512-NEXT:  Cost Model: Found an estimated cost of 1 for
>>>> instruction: %I8 = fptosi double undef to i8
>>>> > >> >> > +; AVX512-NEXT:  Cost Model: Found an estimated cost of 1 for
>>>> instruction: %V2I8 = fptosi <2 x double> undef to <2 x i8>
>>>> > >> >> > +; AVX512-NEXT:  Cost Model: Found an estimated cost of 1 for
>>>> instruction: %V4I8 = fptosi <4 x double> undef to <4 x i8>
>>>> > >> >> > +; AVX512-NEXT:  Cost Model: Found an estimated cost of 1 for
>>>> instruction: %V8I8 = fptosi <8 x double> undef to <8 x i8>
>>>> > >> >> > +; AVX512-NEXT:  Cost Model: Found an estimated cost of 0 for
>>>> instruction: ret i32 undef
>>>> > >> >> >  ;
>>>> > >> >> >  ; BTVER2-LABEL: 'fptosi_double_i8'
>>>> > >> >> >  ; BTVER2-NEXT:  Cost Model: Found an estimated cost of 1 for
>>>> instruction: %I8 = fptosi double undef to i8
>>>> > >> >> >  ; BTVER2-NEXT:  Cost Model: Found an estimated cost of 6 for
>>>> instruction: %V2I8 = fptosi <2 x double> undef to <2 x i8>
>>>> > >> >> > -; BTVER2-NEXT:  Cost Model: Found an estimated cost of 1 for
>>>> instruction: %V4I8 = fptosi <4 x double> undef to <4 x i8>
>>>> > >> >> > -; BTVER2-NEXT:  Cost Model: Found an estimated cost of 3 for
>>>> instruction: %V8I8 = fptosi <8 x double> undef to <8 x i8>
>>>> > >> >> > +; BTVER2-NEXT:  Cost Model: Found an estimated cost of 12
>>>> for instruction: %V4I8 = fptosi <4 x double> undef to <4 x i8>
>>>> > >> >> > +; BTVER2-NEXT:  Cost Model: Found an estimated cost of 25
>>>> for instruction: %V8I8 = fptosi <8 x double> undef to <8 x i8>
>>>> > >> >> >  ; BTVER2-NEXT:  Cost Model: Found an estimated cost of 0 for
>>>> instruction: ret i32 undef
>>>> > >> >> >  ;
>>>> > >> >> >    %I8 = fptosi double undef to i8
>>>> > >> >> > @@ -285,9 +271,9 @@ define i32 @fptosi_float_i16(i32 %arg) {
>>>> > >> >> >  define i32 @fptosi_float_i8(i32 %arg) {
>>>> > >> >> >  ; SSE-LABEL: 'fptosi_float_i8'
>>>> > >> >> >  ; SSE-NEXT:  Cost Model: Found an estimated cost of 1 for
>>>> instruction: %I8 = fptosi float undef to i8
>>>> > >> >> > -; SSE-NEXT:  Cost Model: Found an estimated cost of 1 for
>>>> instruction: %V4I8 = fptosi <4 x float> undef to <4 x i8>
>>>> > >> >> > -; SSE-NEXT:  Cost Model: Found an estimated cost of 3 for
>>>> instruction: %V8I8 = fptosi <8 x float> undef to <8 x i8>
>>>> > >> >> > -; SSE-NEXT:  Cost Model: Found an estimated cost of 7 for
>>>> instruction: %V16I8 = fptosi <16 x float> undef to <16 x i8>
>>>> > >> >> > +; SSE-NEXT:  Cost Model: Found an estimated cost of 12 for
>>>> instruction: %V4I8 = fptosi <4 x float> undef to <4 x i8>
>>>> > >> >> > +; SSE-NEXT:  Cost Model: Found an estimated cost of 25 for
>>>> instruction: %V8I8 = fptosi <8 x float> undef to <8 x i8>
>>>> > >> >> > +; SSE-NEXT:  Cost Model: Found an estimated cost of 51 for
>>>> instruction: %V16I8 = fptosi <16 x float> undef to <16 x i8>
>>>> > >> >> >  ; SSE-NEXT:  Cost Model: Found an estimated cost of 0 for
>>>> instruction: ret i32 undef
>>>> > >> >> >  ;
>>>> > >> >> >  ; AVX-LABEL: 'fptosi_float_i8'
>>>> > >> >> >
>>>> > >> >> > Modified: llvm/trunk/test/Analysis/CostModel/X86/fptoui.ll
>>>> > >> >> > URL:
>>>> http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Analysis/CostModel/X86/fptoui.ll?rev=368183&r1=368182&r2=368183&view=diff
>>>> > >> >> >
>>>> ==============================================================================
>>>> > >> >> > --- llvm/trunk/test/Analysis/CostModel/X86/fptoui.ll
>>>> (original)
>>>> > >> >> > +++ llvm/trunk/test/Analysis/CostModel/X86/fptoui.ll Wed Aug
>>>> 7 09:24:26 2019
>>>> > >> >> > @@ -68,19 +68,12 @@ define i32 @fptoui_double_i32(i32 %arg)
>>>> > >> >> >  ; AVX-NEXT:  Cost Model: Found an estimated cost of 33 for
>>>> instruction: %V8I32 = fptoui <8 x double> undef to <8 x i32>
>>>> > >> >> >  ; AVX-NEXT:  Cost Model: Found an estimated cost of 0 for
>>>> instruction: ret i32 undef
>>>> > >> >> >  ;
>>>> > >> >> > -; AVX512F-LABEL: 'fptoui_double_i32'
>>>> > >> >> > -; AVX512F-NEXT:  Cost Model: Found an estimated cost of 1
>>>> for instruction: %I32 = fptoui double undef to i32
>>>> > >> >> > -; AVX512F-NEXT:  Cost Model: Found an estimated cost of 6
>>>> for instruction: %V2I32 = fptoui <2 x double> undef to <2 x i32>
>>>> > >> >> > -; AVX512F-NEXT:  Cost Model: Found an estimated cost of 1
>>>> for instruction: %V4I32 = fptoui <4 x double> undef to <4 x i32>
>>>> > >> >> > -; AVX512F-NEXT:  Cost Model: Found an estimated cost of 1
>>>> for instruction: %V8I32 = fptoui <8 x double> undef to <8 x i32>
>>>> > >> >> > -; AVX512F-NEXT:  Cost Model: Found an estimated cost of 0
>>>> for instruction: ret i32 undef
>>>> > >> >> > -;
>>>> > >> >> > -; AVX512DQ-LABEL: 'fptoui_double_i32'
>>>> > >> >> > -; AVX512DQ-NEXT:  Cost Model: Found an estimated cost of 1
>>>> for instruction: %I32 = fptoui double undef to i32
>>>> > >> >> > -; AVX512DQ-NEXT:  Cost Model: Found an estimated cost of 1
>>>> for instruction: %V2I32 = fptoui <2 x double> undef to <2 x i32>
>>>> > >> >> > -; AVX512DQ-NEXT:  Cost Model: Found an estimated cost of 1
>>>> for instruction: %V4I32 = fptoui <4 x double> undef to <4 x i32>
>>>> > >> >> > -; AVX512DQ-NEXT:  Cost Model: Found an estimated cost of 1
>>>> for instruction: %V8I32 = fptoui <8 x double> undef to <8 x i32>
>>>> > >> >> > -; AVX512DQ-NEXT:  Cost Model: Found an estimated cost of 0
>>>> for instruction: ret i32 undef
>>>> > >> >> > +; AVX512-LABEL: 'fptoui_double_i32'
>>>> > >> >> > +; AVX512-NEXT:  Cost Model: Found an estimated cost of 1 for
>>>> instruction: %I32 = fptoui double undef to i32
>>>> > >> >> > +; AVX512-NEXT:  Cost Model: Found an estimated cost of 1 for
>>>> instruction: %V2I32 = fptoui <2 x double> undef to <2 x i32>
>>>> > >> >> > +; AVX512-NEXT:  Cost Model: Found an estimated cost of 1 for
>>>> instruction: %V4I32 = fptoui <4 x double> undef to <4 x i32>
>>>> > >> >> > +; AVX512-NEXT:  Cost Model: Found an estimated cost of 1 for
>>>> instruction: %V8I32 = fptoui <8 x double> undef to <8 x i32>
>>>> > >> >> > +; AVX512-NEXT:  Cost Model: Found an estimated cost of 0 for
>>>> instruction: ret i32 undef
>>>> > >> >> >  ;
>>>> > >> >> >  ; BTVER2-LABEL: 'fptoui_double_i32'
>>>> > >> >> >  ; BTVER2-NEXT:  Cost Model: Found an estimated cost of 1 for
>>>> instruction: %I32 = fptoui double undef to i32
>>>> > >> >> > @@ -106,30 +99,23 @@ define i32 @fptoui_double_i16(i32 %arg)
>>>> > >> >> >  ;
>>>> > >> >> >  ; AVX-LABEL: 'fptoui_double_i16'
>>>> > >> >> >  ; AVX-NEXT:  Cost Model: Found an estimated cost of 1 for
>>>> instruction: %I16 = fptoui double undef to i16
>>>> > >> >> > -; AVX-NEXT:  Cost Model: Found an estimated cost of 6 for
>>>> instruction: %V2I16 = fptoui <2 x double> undef to <2 x i16>
>>>> > >> >> > -; AVX-NEXT:  Cost Model: Found an estimated cost of 12 for
>>>> instruction: %V4I16 = fptoui <4 x double> undef to <4 x i16>
>>>> > >> >> > -; AVX-NEXT:  Cost Model: Found an estimated cost of 25 for
>>>> instruction: %V8I16 = fptoui <8 x double> undef to <8 x i16>
>>>> > >> >> > +; AVX-NEXT:  Cost Model: Found an estimated cost of 1 for
>>>> instruction: %V2I16 = fptoui <2 x double> undef to <2 x i16>
>>>> > >> >> > +; AVX-NEXT:  Cost Model: Found an estimated cost of 1 for
>>>> instruction: %V4I16 = fptoui <4 x double> undef to <4 x i16>
>>>> > >> >> > +; AVX-NEXT:  Cost Model: Found an estimated cost of 3 for
>>>> instruction: %V8I16 = fptoui <8 x double> undef to <8 x i16>
>>>> > >> >> >  ; AVX-NEXT:  Cost Model: Found an estimated cost of 0 for
>>>> instruction: ret i32 undef
>>>> > >> >> >  ;
>>>> > >> >> > -; AVX512F-LABEL: 'fptoui_double_i16'
>>>> > >> >> > -; AVX512F-NEXT:  Cost Model: Found an estimated cost of 1
>>>> for instruction: %I16 = fptoui double undef to i16
>>>> > >> >> > -; AVX512F-NEXT:  Cost Model: Found an estimated cost of 6
>>>> for instruction: %V2I16 = fptoui <2 x double> undef to <2 x i16>
>>>> > >> >> > -; AVX512F-NEXT:  Cost Model: Found an estimated cost of 1
>>>> for instruction: %V4I16 = fptoui <4 x double> undef to <4 x i16>
>>>> > >> >> > -; AVX512F-NEXT:  Cost Model: Found an estimated cost of 2
>>>> for instruction: %V8I16 = fptoui <8 x double> undef to <8 x i16>
>>>> > >> >> > -; AVX512F-NEXT:  Cost Model: Found an estimated cost of 0
>>>> for instruction: ret i32 undef
>>>> > >> >> > -;
>>>> > >> >> > -; AVX512DQ-LABEL: 'fptoui_double_i16'
>>>> > >> >> > -; AVX512DQ-NEXT:  Cost Model: Found an estimated cost of 1
>>>> for instruction: %I16 = fptoui double undef to i16
>>>> > >> >> > -; AVX512DQ-NEXT:  Cost Model: Found an estimated cost of 1
>>>> for instruction: %V2I16 = fptoui <2 x double> undef to <2 x i16>
>>>> > >> >> > -; AVX512DQ-NEXT:  Cost Model: Found an estimated cost of 1
>>>> for instruction: %V4I16 = fptoui <4 x double> undef to <4 x i16>
>>>> > >> >> > -; AVX512DQ-NEXT:  Cost Model: Found an estimated cost of 2
>>>> for instruction: %V8I16 = fptoui <8 x double> undef to <8 x i16>
>>>> > >> >> > -; AVX512DQ-NEXT:  Cost Model: Found an estimated cost of 0
>>>> for instruction: ret i32 undef
>>>> > >> >> > +; AVX512-LABEL: 'fptoui_double_i16'
>>>> > >> >> > +; AVX512-NEXT:  Cost Model: Found an estimated cost of 1 for
>>>> instruction: %I16 = fptoui double undef to i16
>>>> > >> >> > +; AVX512-NEXT:  Cost Model: Found an estimated cost of 1 for
>>>> instruction: %V2I16 = fptoui <2 x double> undef to <2 x i16>
>>>> > >> >> > +; AVX512-NEXT:  Cost Model: Found an estimated cost of 1 for
>>>> instruction: %V4I16 = fptoui <4 x double> undef to <4 x i16>
>>>> > >> >> > +; AVX512-NEXT:  Cost Model: Found an estimated cost of 2 for
>>>> instruction: %V8I16 = fptoui <8 x double> undef to <8 x i16>
>>>> > >> >> > +; AVX512-NEXT:  Cost Model: Found an estimated cost of 0 for
>>>> instruction: ret i32 undef
>>>> > >> >> >  ;
>>>> > >> >> >  ; BTVER2-LABEL: 'fptoui_double_i16'
>>>> > >> >> >  ; BTVER2-NEXT:  Cost Model: Found an estimated cost of 1 for
>>>> instruction: %I16 = fptoui double undef to i16
>>>> > >> >> > -; BTVER2-NEXT:  Cost Model: Found an estimated cost of 6 for
>>>> instruction: %V2I16 = fptoui <2 x double> undef to <2 x i16>
>>>> > >> >> > -; BTVER2-NEXT:  Cost Model: Found an estimated cost of 12
>>>> for instruction: %V4I16 = fptoui <4 x double> undef to <4 x i16>
>>>> > >> >> > -; BTVER2-NEXT:  Cost Model: Found an estimated cost of 25
>>>> for instruction: %V8I16 = fptoui <8 x double> undef to <8 x i16>
>>>> > >> >> > +; BTVER2-NEXT:  Cost Model: Found an estimated cost of 1 for
>>>> instruction: %V2I16 = fptoui <2 x double> undef to <2 x i16>
>>>> > >> >> > +; BTVER2-NEXT:  Cost Model: Found an estimated cost of 1 for
>>>> instruction: %V4I16 = fptoui <4 x double> undef to <4 x i16>
>>>> > >> >> > +; BTVER2-NEXT:  Cost Model: Found an estimated cost of 3 for
>>>> instruction: %V8I16 = fptoui <8 x double> undef to <8 x i16>
>>>> > >> >> >  ; BTVER2-NEXT:  Cost Model: Found an estimated cost of 0 for
>>>> instruction: ret i32 undef
>>>> > >> >> >  ;
>>>> > >> >> >    %I16 = fptoui double undef to i16
>>>> > >> >> > @@ -154,19 +140,12 @@ define i32 @fptoui_double_i8(i32 %arg) {
>>>> > >> >> >  ; AVX-NEXT:  Cost Model: Found an estimated cost of 25 for
>>>> instruction: %V8I8 = fptoui <8 x double> undef to <8 x i8>
>>>> > >> >> >  ; AVX-NEXT:  Cost Model: Found an estimated cost of 0 for
>>>> instruction: ret i32 undef
>>>> > >> >> >  ;
>>>> > >> >> > -; AVX512F-LABEL: 'fptoui_double_i8'
>>>> > >> >> > -; AVX512F-NEXT:  Cost Model: Found an estimated cost of 1
>>>> for instruction: %I8 = fptoui double undef to i8
>>>> > >> >> > -; AVX512F-NEXT:  Cost Model: Found an estimated cost of 6
>>>> for instruction: %V2I8 = fptoui <2 x double> undef to <2 x i8>
>>>> > >> >> > -; AVX512F-NEXT:  Cost Model: Found an estimated cost of 1
>>>> for instruction: %V4I8 = fptoui <4 x double> undef to <4 x i8>
>>>> > >> >> > -; AVX512F-NEXT:  Cost Model: Found an estimated cost of 2
>>>> for instruction: %V8I8 = fptoui <8 x double> undef to <8 x i8>
>>>> > >> >> > -; AVX512F-NEXT:  Cost Model: Found an estimated cost of 0
>>>> for instruction: ret i32 undef
>>>> > >> >> > -;
>>>> > >> >> > -; AVX512DQ-LABEL: 'fptoui_double_i8'
>>>> > >> >> > -; AVX512DQ-NEXT:  Cost Model: Found an estimated cost of 1
>>>> for instruction: %I8 = fptoui double undef to i8
>>>> > >> >> > -; AVX512DQ-NEXT:  Cost Model: Found an estimated cost of 1
>>>> for instruction: %V2I8 = fptoui <2 x double> undef to <2 x i8>
>>>> > >> >> > -; AVX512DQ-NEXT:  Cost Model: Found an estimated cost of 1
>>>> for instruction: %V4I8 = fptoui <4 x double> undef to <4 x i8>
>>>> > >> >> > -; AVX512DQ-NEXT:  Cost Model: Found an estimated cost of 2
>>>> for instruction: %V8I8 = fptoui <8 x double> undef to <8 x i8>
>>>> > >> >> > -; AVX512DQ-NEXT:  Cost Model: Found an estimated cost of 0
>>>> for instruction: ret i32 undef
>>>> > >> >> > +; AVX512-LABEL: 'fptoui_double_i8'
>>>> > >> >> > +; AVX512-NEXT:  Cost Model: Found an estimated cost of 1 for
>>>> instruction: %I8 = fptoui double undef to i8
>>>> > >> >> > +; AVX512-NEXT:  Cost Model: Found an estimated cost of 1 for
>>>> instruction: %V2I8 = fptoui <2 x double> undef to <2 x i8>
>>>> > >> >> > +; AVX512-NEXT:  Cost Model: Found an estimated cost of 1 for
>>>> instruction: %V4I8 = fptoui <4 x double> undef to <4 x i8>
>>>> > >> >> > +; AVX512-NEXT:  Cost Model: Found an estimated cost of 2 for
>>>> instruction: %V8I8 = fptoui <8 x double> undef to <8 x i8>
>>>> > >> >> > +; AVX512-NEXT:  Cost Model: Found an estimated cost of 0 for
>>>> instruction: ret i32 undef
>>>> > >> >> >  ;
>>>> > >> >> >  ; BTVER2-LABEL: 'fptoui_double_i8'
>>>> > >> >> >  ; BTVER2-NEXT:  Cost Model: Found an estimated cost of 1 for
>>>> instruction: %I8 = fptoui double undef to i8
>>>> > >> >> > @@ -277,7 +256,7 @@ define i32 @fptoui_float_i16(i32 %arg) {
>>>> > >> >> >  ;
>>>> > >> >> >  ; AVX-LABEL: 'fptoui_float_i16'
>>>> > >> >> >  ; AVX-NEXT:  Cost Model: Found an estimated cost of 1 for
>>>> instruction: %I16 = fptoui float undef to i16
>>>> > >> >> > -; AVX-NEXT:  Cost Model: Found an estimated cost of 12 for
>>>> instruction: %V4I16 = fptoui <4 x float> undef to <4 x i16>
>>>> > >> >> > +; AVX-NEXT:  Cost Model: Found an estimated cost of 1 for
>>>> instruction: %V4I16 = fptoui <4 x float> undef to <4 x i16>
>>>> > >> >> >  ; AVX-NEXT:  Cost Model: Found an estimated cost of 1 for
>>>> instruction: %V8I16 = fptoui <8 x float> undef to <8 x i16>
>>>> > >> >> >  ; AVX-NEXT:  Cost Model: Found an estimated cost of 3 for
>>>> instruction: %V16I16 = fptoui <16 x float> undef to <16 x i16>
>>>> > >> >> >  ; AVX-NEXT:  Cost Model: Found an estimated cost of 0 for
>>>> instruction: ret i32 undef
>>>> > >> >> > @@ -291,7 +270,7 @@ define i32 @fptoui_float_i16(i32 %arg) {
>>>> > >> >> >  ;
>>>> > >> >> >  ; BTVER2-LABEL: 'fptoui_float_i16'
>>>> > >> >> >  ; BTVER2-NEXT:  Cost Model: Found an estimated cost of 1 for
>>>> instruction: %I16 = fptoui float undef to i16
>>>> > >> >> > -; BTVER2-NEXT:  Cost Model: Found an estimated cost of 12
>>>> for instruction: %V4I16 = fptoui <4 x float> undef to <4 x i16>
>>>> > >> >> > +; BTVER2-NEXT:  Cost Model: Found an estimated cost of 1 for
>>>> instruction: %V4I16 = fptoui <4 x float> undef to <4 x i16>
>>>> > >> >> >  ; BTVER2-NEXT:  Cost Model: Found an estimated cost of 1 for
>>>> instruction: %V8I16 = fptoui <8 x float> undef to <8 x i16>
>>>> > >> >> >  ; BTVER2-NEXT:  Cost Model: Found an estimated cost of 3 for
>>>> instruction: %V16I16 = fptoui <16 x float> undef to <16 x i16>
>>>> > >> >> >  ; BTVER2-NEXT:  Cost Model: Found an estimated cost of 0 for
>>>> instruction: ret i32 undef
>>>> > >> >> > @@ -314,8 +293,8 @@ define i32 @fptoui_float_i8(i32 %arg) {
>>>> > >> >> >  ; AVX-LABEL: 'fptoui_float_i8'
>>>> > >> >> >  ; AVX-NEXT:  Cost Model: Found an estimated cost of 1 for
>>>> instruction: %I8 = fptoui float undef to i8
>>>> > >> >> >  ; AVX-NEXT:  Cost Model: Found an estimated cost of 12 for
>>>> instruction: %V4I8 = fptoui <4 x float> undef to <4 x i8>
>>>> > >> >> > -; AVX-NEXT:  Cost Model: Found an estimated cost of 1 for
>>>> instruction: %V8I8 = fptoui <8 x float> undef to <8 x i8>
>>>> > >> >> > -; AVX-NEXT:  Cost Model: Found an estimated cost of 3 for
>>>> instruction: %V16I8 = fptoui <16 x float> undef to <16 x i8>
>>>> > >> >> > +; AVX-NEXT:  Cost Model: Found an estimated cost of 24 for
>>>> instruction: %V8I8 = fptoui <8 x float> undef to <8 x i8>
>>>> > >> >> > +; AVX-NEXT:  Cost Model: Found an estimated cost of 49 for
>>>> instruction: %V16I8 = fptoui <16 x float> undef to <16 x i8>
>>>> > >> >> >  ; AVX-NEXT:  Cost Model: Found an estimated cost of 0 for
>>>> instruction: ret i32 undef
>>>> > >> >> >  ;
>>>> > >> >> >  ; AVX512-LABEL: 'fptoui_float_i8'
>>>> > >> >> > @@ -328,8 +307,8 @@ define i32 @fptoui_float_i8(i32 %arg) {
>>>> > >> >> >  ; BTVER2-LABEL: 'fptoui_float_i8'
>>>> > >> >> >  ; BTVER2-NEXT:  Cost Model: Found an estimated cost of 1 for
>>>> instruction: %I8 = fptoui float undef to i8
>>>> > >> >> >  ; BTVER2-NEXT:  Cost Model: Found an estimated cost of 12
>>>> for instruction: %V4I8 = fptoui <4 x float> undef to <4 x i8>
>>>> > >> >> > -; BTVER2-NEXT:  Cost Model: Found an estimated cost of 1 for
>>>> instruction: %V8I8 = fptoui <8 x float> undef to <8 x i8>
>>>> > >> >> > -; BTVER2-NEXT:  Cost Model: Found an estimated cost of 3 for
>>>> instruction: %V16I8 = fptoui <16 x float> undef to <16 x i8>
>>>> > >> >> > +; BTVER2-NEXT:  Cost Model: Found an estimated cost of 24
>>>> for instruction: %V8I8 = fptoui <8 x float> undef to <8 x i8>
>>>> > >> >> > +; BTVER2-NEXT:  Cost Model: Found an estimated cost of 49
>>>> for instruction: %V16I8 = fptoui <16 x float> undef to <16 x i8>
>>>> > >> >> >  ; BTVER2-NEXT:  Cost Model: Found an estimated cost of 0 for
>>>> instruction: ret i32 undef
>>>> > >> >> >  ;
>>>> > >> >> >    %I8 = fptoui float undef to i8
>>>> > >> >> >
>>>> > >> >> > Modified:
>>>> llvm/trunk/test/Analysis/CostModel/X86/masked-intrinsic-cost.ll
>>>> > >> >> > URL:
>>>> http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Analysis/CostModel/X86/masked-intrinsic-cost.ll?rev=368183&r1=368182&r2=368183&view=diff
>>>> > >> >> >
>>>> ==============================================================================
>>>> > >> >> > ---
>>>> llvm/trunk/test/Analysis/CostModel/X86/masked-intrinsic-cost.ll (original)
>>>> > >> >> > +++
>>>> llvm/trunk/test/Analysis/CostModel/X86/masked-intrinsic-cost.ll Wed Aug  7
>>>> 09:24:26 2019
>>>> > >> >> > @@ -52,7 +52,7 @@ define i32 @masked_load() {
>>>> > >> >> >  ; AVX-NEXT:  Cost Model: Found an estimated cost of 4 for
>>>> instruction: %V16I32 = call <16 x i32>
>>>> @llvm.masked.load.v16i32.p0v16i32(<16 x i32>* undef, i32 1, <16 x i1>
>>>> undef, <16 x i32> undef)
>>>> > >> >> >  ; AVX-NEXT:  Cost Model: Found an estimated cost of 2 for
>>>> instruction: %V8I32 = call <8 x i32> @llvm.masked.load.v8i32.p0v8i32(<8 x
>>>> i32>* undef, i32 1, <8 x i1> undef, <8 x i32> undef)
>>>> > >> >> >  ; AVX-NEXT:  Cost Model: Found an estimated cost of 2 for
>>>> instruction: %V4I32 = call <4 x i32> @llvm.masked.load.v4i32.p0v4i32(<4 x
>>>> i32>* undef, i32 1, <4 x i1> undef, <4 x i32> undef)
>>>> > >> >> > -; AVX-NEXT:  Cost Model: Found an estimated cost of 4 for
>>>> instruction: %V2I32 = call <2 x i32> @llvm.masked.load.v2i32.p0v2i32(<2 x
>>>> i32>* undef, i32 1, <2 x i1> undef, <2 x i32> undef)
>>>> > >> >> > +; AVX-NEXT:  Cost Model: Found an estimated cost of 6 for
>>>> instruction: %V2I32 = call <2 x i32> @llvm.masked.load.v2i32.p0v2i32(<2 x
>>>> i32>* undef, i32 1, <2 x i1> undef, <2 x i32> undef)
>>>> > >> >> >  ; AVX-NEXT:  Cost Model: Found an estimated cost of 128 for
>>>> instruction: %V32I16 = call <32 x i16>
>>>> @llvm.masked.load.v32i16.p0v32i16(<32 x i16>* undef, i32 1, <32 x i1>
>>>> undef, <32 x i16> undef)
>>>> > >> >> >  ; AVX-NEXT:  Cost Model: Found an estimated cost of 64 for
>>>> instruction: %V16I16 = call <16 x i16>
>>>> @llvm.masked.load.v16i16.p0v16i16(<16 x i16>* undef, i32 1, <16 x i1>
>>>> undef, <16 x i16> undef)
>>>> > >> >> >  ; AVX-NEXT:  Cost Model: Found an estimated cost of 32 for
>>>> instruction: %V8I16 = call <8 x i16> @llvm.masked.load.v8i16.p0v8i16(<8 x
>>>> i16>* undef, i32 1, <8 x i1> undef, <8 x i16> undef)
>>>> > >> >> > @@ -79,7 +79,7 @@ define i32 @masked_load() {
>>>> > >> >> >  ; KNL-NEXT:  Cost Model: Found an estimated cost of 1 for
>>>> instruction: %V16I32 = call <16 x i32>
>>>> @llvm.masked.load.v16i32.p0v16i32(<16 x i32>* undef, i32 1, <16 x i1>
>>>> undef, <16 x i32> undef)
>>>> > >> >> >  ; KNL-NEXT:  Cost Model: Found an estimated cost of 1 for
>>>> instruction: %V8I32 = call <8 x i32> @llvm.masked.load.v8i32.p0v8i32(<8 x
>>>> i32>* undef, i32 1, <8 x i1> undef, <8 x i32> undef)
>>>> > >> >> >  ; KNL-NEXT:  Cost Model: Found an estimated cost of 1 for
>>>> instruction: %V4I32 = call <4 x i32> @llvm.masked.load.v4i32.p0v4i32(<4 x
>>>> i32>* undef, i32 1, <4 x i1> undef, <4 x i32> undef)
>>>> > >> >> > -; KNL-NEXT:  Cost Model: Found an estimated cost of 3 for
>>>> instruction: %V2I32 = call <2 x i32> @llvm.masked.load.v2i32.p0v2i32(<2 x
>>>> i32>* undef, i32 1, <2 x i1> undef, <2 x i32> undef)
>>>> > >> >> > +; KNL-NEXT:  Cost Model: Found an estimated cost of 5 for
>>>> instruction: %V2I32 = call <2 x i32> @llvm.masked.load.v2i32.p0v2i32(<2 x
>>>> i32>* undef, i32 1, <2 x i1> undef, <2 x i32> undef)
>>>> > >> >> >  ; KNL-NEXT:  Cost Model: Found an estimated cost of 128 for
>>>> instruction: %V32I16 = call <32 x i16>
>>>> @llvm.masked.load.v32i16.p0v32i16(<32 x i16>* undef, i32 1, <32 x i1>
>>>> undef, <32 x i16> undef)
>>>> > >> >> >  ; KNL-NEXT:  Cost Model: Found an estimated cost of 64 for
>>>> instruction: %V16I16 = call <16 x i16>
>>>> @llvm.masked.load.v16i16.p0v16i16(<16 x i16>* undef, i32 1, <16 x i1>
>>>> undef, <16 x i16> undef)
>>>> > >> >> >  ; KNL-NEXT:  Cost Model: Found an estimated cost of 32 for
>>>> instruction: %V8I16 = call <8 x i16> @llvm.masked.load.v8i16.p0v8i16(<8 x
>>>> i16>* undef, i32 1, <8 x i1> undef, <8 x i16> undef)
>>>> > >> >> > @@ -106,15 +106,15 @@ define i32 @masked_load() {
>>>> > >> >> >  ; SKX-NEXT:  Cost Model: Found an estimated cost of 1 for
>>>> instruction: %V16I32 = call <16 x i32>
>>>> @llvm.masked.load.v16i32.p0v16i32(<16 x i32>* undef, i32 1, <16 x i1>
>>>> undef, <16 x i32> undef)
>>>> > >> >> >  ; SKX-NEXT:  Cost Model: Found an estimated cost of 1 for
>>>> instruction: %V8I32 = call <8 x i32> @llvm.masked.load.v8i32.p0v8i32(<8 x
>>>> i32>* undef, i32 1, <8 x i1> undef, <8 x i32> undef)
>>>> > >> >> >  ; SKX-NEXT:  Cost Model: Found an estimated cost of 1 for
>>>> instruction: %V4I32 = call <4 x i32> @llvm.masked.load.v4i32.p0v4i32(<4 x
>>>> i32>* undef, i32 1, <4 x i1> undef, <4 x i32> undef)
>>>> > >> >> > -; SKX-NEXT:  Cost Model: Found an estimated cost of 3 for
>>>> instruction: %V2I32 = call <2 x i32> @llvm.masked.load.v2i32.p0v2i32(<2 x
>>>> i32>* undef, i32 1, <2 x i1> undef, <2 x i32> undef)
>>>> > >> >> > +; SKX-NEXT:  Cost Model: Found an estimated cost of 5 for
>>>> instruction: %V2I32 = call <2 x i32> @llvm.masked.load.v2i32.p0v2i32(<2 x
>>>> i32>* undef, i32 1, <2 x i1> undef, <2 x i32> undef)
>>>> > >> >> >  ; SKX-NEXT:  Cost Model: Found an estimated cost of 1 for
>>>> instruction: %V32I16 = call <32 x i16>
>>>> @llvm.masked.load.v32i16.p0v32i16(<32 x i16>* undef, i32 1, <32 x i1>
>>>> undef, <32 x i16> undef)
>>>> > >> >> >  ; SKX-NEXT:  Cost Model: Found an estimated cost of 1 for
>>>> instruction: %V16I16 = call <16 x i16>
>>>> @llvm.masked.load.v16i16.p0v16i16(<16 x i16>* undef, i32 1, <16 x i1>
>>>> undef, <16 x i16> undef)
>>>> > >> >> >  ; SKX-NEXT:  Cost Model: Found an estimated cost of 1 for
>>>> instruction: %V8I16 = call <8 x i16> @llvm.masked.load.v8i16.p0v8i16(<8 x
>>>> i16>* undef, i32 1, <8 x i1> undef, <8 x i16> undef)
>>>> > >> >> > -; SKX-NEXT:  Cost Model: Found an estimated cost of 3 for
>>>> instruction: %V4I16 = call <4 x i16> @llvm.masked.load.v4i16.p0v4i16(<4 x
>>>> i16>* undef, i32 1, <4 x i1> undef, <4 x i16> undef)
>>>> > >> >> > +; SKX-NEXT:  Cost Model: Found an estimated cost of 9 for
>>>> instruction: %V4I16 = call <4 x i16> @llvm.masked.load.v4i16.p0v4i16(<4 x
>>>> i16>* undef, i32 1, <4 x i1> undef, <4 x i16> undef)
>>>> > >> >> >  ; SKX-NEXT:  Cost Model: Found an estimated cost of 1 for
>>>> instruction: %V64I8 = call <64 x i8> @llvm.masked.load.v64i8.p0v64i8(<64 x
>>>> i8>* undef, i32 1, <64 x i1> undef, <64 x i8> undef)
>>>> > >> >> >  ; SKX-NEXT:  Cost Model: Found an estimated cost of 1 for
>>>> instruction: %V32I8 = call <32 x i8> @llvm.masked.load.v32i8.p0v32i8(<32 x
>>>> i8>* undef, i32 1, <32 x i1> undef, <32 x i8> undef)
>>>> > >> >> >  ; SKX-NEXT:  Cost Model: Found an estimated cost of 1 for
>>>> instruction: %V16I8 = call <16 x i8> @llvm.masked.load.v16i8.p0v16i8(<16 x
>>>> i8>* undef, i32 1, <16 x i1> undef, <16 x i8> undef)
>>>> > >> >> > -; SKX-NEXT:  Cost Model: Found an estimated cost of 3 for
>>>> instruction: %V8I8 = call <8 x i8> @llvm.masked.load.v8i8.p0v8i8(<8 x i8>*
>>>> undef, i32 1, <8 x i1> undef, <8 x i8> undef)
>>>> > >> >> > +; SKX-NEXT:  Cost Model: Found an estimated cost of 17 for
>>>> instruction: %V8I8 = call <8 x i8> @llvm.masked.load.v8i8.p0v8i8(<8 x i8>*
>>>> undef, i32 1, <8 x i1> undef, <8 x i8> undef)
>>>> > >> >> >  ; SKX-NEXT:  Cost Model: Found an estimated cost of 0 for
>>>> instruction: ret i32 0
>>>> > >> >> >  ;
>>>> > >> >> >    %V8F64 = call <8 x double>
>>>> @llvm.masked.load.v8f64.p0v8f64(<8 x double>* undef, i32 1, <8 x i1> undef,
>>>> <8 x double> undef)
>>>> > >> >> > @@ -194,7 +194,7 @@ define i32 @masked_store() {
>>>> > >> >> >  ; AVX-NEXT:  Cost Model: Found an estimated cost of 16 for
>>>> instruction: call void @llvm.masked.store.v16i32.p0v16i32(<16 x i32> undef,
>>>> <16 x i32>* undef, i32 1, <16 x i1> undef)
>>>> > >> >> >  ; AVX-NEXT:  Cost Model: Found an estimated cost of 8 for
>>>> instruction: call void @llvm.masked.store.v8i32.p0v8i32(<8 x i32> undef, <8
>>>> x i32>* undef, i32 1, <8 x i1> undef)
>>>> > >> >> >  ; AVX-NEXT:  Cost Model: Found an estimated cost of 8 for
>>>> instruction: call void @llvm.masked.store.v4i32.p0v4i32(<4 x i32> undef, <4
>>>> x i32>* undef, i32 1, <4 x i1> undef)
>>>> > >> >> > -; AVX-NEXT:  Cost Model: Found an estimated cost of 10 for
>>>> instruction: call void @llvm.masked.store.v2i32.p0v2i32(<2 x i32> undef, <2
>>>> x i32>* undef, i32 1, <2 x i1> undef)
>>>> > >> >> > +; AVX-NEXT:  Cost Model: Found an estimated cost of 12 for
>>>> instruction: call void @llvm.masked.store.v2i32.p0v2i32(<2 x i32> undef, <2
>>>> x i32>* undef, i32 1, <2 x i1> undef)
>>>> > >> >> >  ; AVX-NEXT:  Cost Model: Found an estimated cost of 128 for
>>>> instruction: call void @llvm.masked.store.v32i16.p0v32i16(<32 x i16> undef,
>>>> <32 x i16>* undef, i32 1, <32 x i1> undef)
>>>> > >> >> >  ; AVX-NEXT:  Cost Model: Found an estimated cost of 64 for
>>>> instruction: call void @llvm.masked.store.v16i16.p0v16i16(<16 x i16> undef,
>>>> <16 x i16>* undef, i32 1, <16 x i1> undef)
>>>> > >> >> >  ; AVX-NEXT:  Cost Model: Found an estimated cost of 32 for
>>>> instruction: call void @llvm.masked.store.v8i16.p0v8i16(<8 x i16> undef, <8
>>>> x i16>* undef, i32 1, <8 x i1> undef)
>>>> > >> >> > @@ -221,7 +221,7 @@ define i32 @masked_store() {
>>>> > >> >> >  ; KNL-NEXT:  Cost Model: Found an estimated cost of 1 for
>>>> instruction: call void @llvm.masked.store.v16i32.p0v16i32(<16 x i32> undef,
>>>> <16 x i32>* undef, i32 1, <16 x i1> undef)
>>>> > >> >> >  ; KNL-NEXT:  Cost Model: Found an estimated cost of 1 for
>>>> instruction: call void @llvm.masked.store.v8i32.p0v8i32(<8 x i32> undef, <8
>>>> x i32>* undef, i32 1, <8 x i1> undef)
>>>> > >> >> >  ; KNL-NEXT:  Cost Model: Found an estimated cost of 1 for
>>>> instruction: call void @llvm.masked.store.v4i32.p0v4i32(<4 x i32> undef, <4
>>>> x i32>* undef, i32 1, <4 x i1> undef)
>>>> > >> >> > -; KNL-NEXT:  Cost Model: Found an estimated cost of 3 for
>>>> instruction: call void @llvm.masked.store.v2i32.p0v2i32(<2 x i32> undef, <2
>>>> x i32>* undef, i32 1, <2 x i1> undef)
>>>> > >> >> > +; KNL-NEXT:  Cost Model: Found an estimated cost of 5 for
>>>> instruction: call void @llvm.masked.store.v2i32.p0v2i32(<2 x i32> undef, <2
>>>> x i32>* undef, i32 1, <2 x i1> undef)
>>>> > >> >> >  ; KNL-NEXT:  Cost Model: Found an estimated cost of 128 for
>>>> instruction: call void @llvm.masked.store.v32i16.p0v32i16(<32 x i16> undef,
>>>> <32 x i16>* undef, i32 1, <32 x i1> undef)
>>>> > >> >> >  ; KNL-NEXT:  Cost Model: Found an estimated cost of 64 for
>>>> instruction: call void @llvm.masked.store.v16i16.p0v16i16(<16 x i16> undef,
>>>> <16 x i16>* undef, i32 1, <16 x i1> undef)
>>>> > >> >> >  ; KNL-NEXT:  Cost Model: Found an estimated cost of 32 for
>>>> instruction: call void @llvm.masked.store.v8i16.p0v8i16(<8 x i16> undef, <8
>>>> x i16>* undef, i32 1, <8 x i1> undef)
>>>> > >> >> > @@ -248,15 +248,15 @@ define i32 @masked_store() {
>>>> > >> >> >  ; SKX-NEXT:  Cost Model: Found an estimated cost of 1 for
>>>> instruction: call void @llvm.masked.store.v16i32.p0v16i32(<16 x i32> undef,
>>>> <16 x i32>* undef, i32 1, <16 x i1> undef)
>>>> > >> >> >  ; SKX-NEXT:  Cost Model: Found an estimated cost of 1 for
>>>> instruction: call void @llvm.masked.store.v8i32.p0v8i32(<8 x i32> undef, <8
>>>> x i32>* undef, i32 1, <8 x i1> undef)
>>>> > >> >> >  ; SKX-NEXT:  Cost Model: Found an estimated cost of 1 for
>>>> instruction: call void @llvm.masked.store.v4i32.p0v4i32(<4 x i32> undef, <4
>>>> x i32>* undef, i32 1, <4 x i1> undef)
>>>> > >> >> > -; SKX-NEXT:  Cost Model: Found an estimated cost of 3 for
>>>> instruction: call void @llvm.masked.store.v2i32.p0v2i32(<2 x i32> undef, <2
>>>> x i32>* undef, i32 1, <2 x i1> undef)
>>>> > >> >> > +; SKX-NEXT:  Cost Model: Found an estimated cost of 5 for
>>>> instruction: call void @llvm.masked.store.v2i32.p0v2i32(<2 x i32> undef, <2
>>>> x i32>* undef, i32 1, <2 x i1> undef)
>>>> > >> >> >  ; SKX-NEXT:  Cost Model: Found an estimated cost of 1 for
>>>> instruction: call void @llvm.masked.store.v32i16.p0v32i16(<32 x i16> undef,
>>>> <32 x i16>* undef, i32 1, <32 x i1> undef)
>>>> > >> >> >  ; SKX-NEXT:  Cost Model: Found an estimated cost of 1 for
>>>> instruction: call void @llvm.masked.store.v16i16.p0v16i16(<16 x i16> undef,
>>>> <16 x i16>* undef, i32 1, <16 x i1> undef)
>>>> > >> >> >  ; SKX-NEXT:  Cost Model: Found an estimated cost of 1 for
>>>> instruction: call void @llvm.masked.store.v8i16.p0v8i16(<8 x i16> undef, <8
>>>> x i16>* undef, i32 1, <8 x i1> undef)
>>>> > >> >> > -; SKX-NEXT:  Cost Model: Found an estimated cost of 3 for
>>>> instruction: call void @llvm.masked.store.v4i16.p0v4i16(<4 x i16> undef, <4
>>>> x i16>* undef, i32 1, <4 x i1> undef)
>>>> > >> >> > +; SKX-NEXT:  Cost Model: Found an estimated cost of 9 for
>>>> instruction: call void @llvm.masked.store.v4i16.p0v4i16(<4 x i16> undef, <4
>>>> x i16>* undef, i32 1, <4 x i1> undef)
>>>> > >> >> >  ; SKX-NEXT:  Cost Model: Found an estimated cost of 1 for
>>>> instruction: call void @llvm.masked.store.v64i8.p0v64i8(<64 x i8> undef,
>>>> <64 x i8>* undef, i32 1, <64 x i1> undef)
>>>> > >> >> >  ; SKX-NEXT:  Cost Model: Found an estimated cost of 1 for
>>>> instruction: call void @llvm.masked.store.v32i8.p0v32i8(<32 x i8> undef,
>>>> <32 x i8>* undef, i32 1, <32 x i1> undef)
>>>> > >> >> >  ; SKX-NEXT:  Cost Model: Found an estimated cost of 1 for
>>>> instruction: call void @llvm.masked.store.v16i8.p0v16i8(<16 x i8> undef,
>>>> <16 x i8>* undef, i32 1, <16 x i1> undef)
>>>> > >> >> > -; SKX-NEXT:  Cost Model: Found an estimated cost of 3 for
>>>> instruction: call void @llvm.masked.store.v8i8.p0v8i8(<8 x i8> undef, <8 x
>>>> i8>* undef, i32 1, <8 x i1> undef)
>>>> > >> >> > +; SKX-NEXT:  Cost Model: Found an estimated cost of 17 for
>>>> instruction: call void @llvm.masked.store.v8i8.p0v8i8(<8 x i8> undef, <8 x
>>>> i8>* undef, i32 1, <8 x i1> undef)
>>>> > >> >> >  ; SKX-NEXT:  Cost Model: Found an estimated cost of 0 for
>>>> instruction: ret i32 0
>>>> > >> >> >  ;
>>>> > >> >> >    call void @llvm.masked.store.v8f64.p0v8f64(<8 x double>
>>>> undef, <8 x double>* undef, i32 1, <8 x i1> undef)
>>>> > >> >> > @@ -960,15 +960,10 @@ define <8 x float> @test4(<8 x i32> %tri
>>>> > >> >> >  }
>>>> > >> >> >
>>>> > >> >> >  define void @test5(<2 x i32> %trigger, <2 x float>* %addr,
>>>> <2 x float> %val) {
>>>> > >> >> > -; SSE2-LABEL: 'test5'
>>>> > >> >> > -; SSE2-NEXT:  Cost Model: Found an estimated cost of 8 for
>>>> instruction: %mask = icmp eq <2 x i32> %trigger, zeroinitializer
>>>> > >> >> > -; SSE2-NEXT:  Cost Model: Found an estimated cost of 7 for
>>>> instruction: call void @llvm.masked.store.v2f32.p0v2f32(<2 x float> %val,
>>>> <2 x float>* %addr, i32 4, <2 x i1> %mask)
>>>> > >> >> > -; SSE2-NEXT:  Cost Model: Found an estimated cost of 0 for
>>>> instruction: ret void
>>>> > >> >> > -;
>>>> > >> >> > -; SSE42-LABEL: 'test5'
>>>> > >> >> > -; SSE42-NEXT:  Cost Model: Found an estimated cost of 1 for
>>>> instruction: %mask = icmp eq <2 x i32> %trigger, zeroinitializer
>>>> > >> >> > -; SSE42-NEXT:  Cost Model: Found an estimated cost of 7 for
>>>> instruction: call void @llvm.masked.store.v2f32.p0v2f32(<2 x float> %val,
>>>> <2 x float>* %addr, i32 4, <2 x i1> %mask)
>>>> > >> >> > -; SSE42-NEXT:  Cost Model: Found an estimated cost of 0 for
>>>> instruction: ret void
>>>> > >> >> > +; SSE-LABEL: 'test5'
>>>> > >> >> > +; SSE-NEXT:  Cost Model: Found an estimated cost of 1 for
>>>> instruction: %mask = icmp eq <2 x i32> %trigger, zeroinitializer
>>>> > >> >> > +; SSE-NEXT:  Cost Model: Found an estimated cost of 7 for
>>>> instruction: call void @llvm.masked.store.v2f32.p0v2f32(<2 x float> %val,
>>>> <2 x float>* %addr, i32 4, <2 x i1> %mask)
>>>> > >> >> > +; SSE-NEXT:  Cost Model: Found an estimated cost of 0 for
>>>> instruction: ret void
>>>> > >> >> >  ;
>>>> > >> >> >  ; AVX-LABEL: 'test5'
>>>> > >> >> >  ; AVX-NEXT:  Cost Model: Found an estimated cost of 1 for
>>>> instruction: %mask = icmp eq <2 x i32> %trigger, zeroinitializer
>>>> > >> >> > @@ -986,24 +981,19 @@ define void @test5(<2 x i32> %trigger, <
>>>> > >> >> >  }
>>>> > >> >> >
>>>> > >> >> >  define void @test6(<2 x i32> %trigger, <2 x i32>* %addr, <2
>>>> x i32> %val) {
>>>> > >> >> > -; SSE2-LABEL: 'test6'
>>>> > >> >> > -; SSE2-NEXT:  Cost Model: Found an estimated cost of 8 for
>>>> instruction: %mask = icmp eq <2 x i32> %trigger, zeroinitializer
>>>> > >> >> > -; SSE2-NEXT:  Cost Model: Found an estimated cost of 8 for
>>>> instruction: call void @llvm.masked.store.v2i32.p0v2i32(<2 x i32> %val, <2
>>>> x i32>* %addr, i32 4, <2 x i1> %mask)
>>>> > >> >> > -; SSE2-NEXT:  Cost Model: Found an estimated cost of 0 for
>>>> instruction: ret void
>>>> > >> >> > -;
>>>> > >> >> > -; SSE42-LABEL: 'test6'
>>>> > >> >> > -; SSE42-NEXT:  Cost Model: Found an estimated cost of 1 for
>>>> instruction: %mask = icmp eq <2 x i32> %trigger, zeroinitializer
>>>> > >> >> > -; SSE42-NEXT:  Cost Model: Found an estimated cost of 8 for
>>>> instruction: call void @llvm.masked.store.v2i32.p0v2i32(<2 x i32> %val, <2
>>>> x i32>* %addr, i32 4, <2 x i1> %mask)
>>>> > >> >> > -; SSE42-NEXT:  Cost Model: Found an estimated cost of 0 for
>>>> instruction: ret void
>>>> > >> >> > +; SSE-LABEL: 'test6'
>>>> > >> >> > +; SSE-NEXT:  Cost Model: Found an estimated cost of 1 for
>>>> instruction: %mask = icmp eq <2 x i32> %trigger, zeroinitializer
>>>> > >> >> > +; SSE-NEXT:  Cost Model: Found an estimated cost of 8 for
>>>> instruction: call void @llvm.masked.store.v2i32.p0v2i32(<2 x i32> %val, <2
>>>> x i32>* %addr, i32 4, <2 x i1> %mask)
>>>> > >> >> > +; SSE-NEXT:  Cost Model: Found an estimated cost of 0 for
>>>> instruction: ret void
>>>> > >> >> >  ;
>>>> > >> >> >  ; AVX-LABEL: 'test6'
>>>> > >> >> >  ; AVX-NEXT:  Cost Model: Found an estimated cost of 1 for
>>>> instruction: %mask = icmp eq <2 x i32> %trigger, zeroinitializer
>>>> > >> >> > -; AVX-NEXT:  Cost Model: Found an estimated cost of 10 for
>>>> instruction: call void @llvm.masked.store.v2i32.p0v2i32(<2 x i32> %val, <2
>>>> x i32>* %addr, i32 4, <2 x i1> %mask)
>>>> > >> >> > +; AVX-NEXT:  Cost Model: Found an estimated cost of 12 for
>>>> instruction: call void @llvm.masked.store.v2i32.p0v2i32(<2 x i32> %val, <2
>>>> x i32>* %addr, i32 4, <2 x i1> %mask)
>>>> > >> >> >  ; AVX-NEXT:  Cost Model: Found an estimated cost of 0 for
>>>> instruction: ret void
>>>> > >> >> >  ;
>>>> > >> >> >  ; AVX512-LABEL: 'test6'
>>>> > >> >> >  ; AVX512-NEXT:  Cost Model: Found an estimated cost of 1 for
>>>> instruction: %mask = icmp eq <2 x i32> %trigger, zeroinitializer
>>>> > >> >> > -; AVX512-NEXT:  Cost Model: Found an estimated cost of 3 for
>>>> instruction: call void @llvm.masked.store.v2i32.p0v2i32(<2 x i32> %val, <2
>>>> x i32>* %addr, i32 4, <2 x i1> %mask)
>>>> > >> >> > +; AVX512-NEXT:  Cost Model: Found an estimated cost of 5 for
>>>> instruction: call void @llvm.masked.store.v2i32.p0v2i32(<2 x i32> %val, <2
>>>> x i32>* %addr, i32 4, <2 x i1> %mask)
>>>> > >> >> >  ; AVX512-NEXT:  Cost Model: Found an estimated cost of 0 for
>>>> instruction: ret void
>>>> > >> >> >  ;
>>>> > >> >> >    %mask = icmp eq <2 x i32> %trigger, zeroinitializer
>>>> > >> >> > @@ -1012,15 +1002,10 @@ define void @test6(<2 x i32>
>>>> %trigger, <
>>>> > >> >> >  }
>>>> > >> >> >
>>>> > >> >> >  define <2 x float> @test7(<2 x i32> %trigger, <2 x float>*
>>>> %addr, <2 x float> %dst) {
>>>> > >> >> > -; SSE2-LABEL: 'test7'
>>>> > >> >> > -; SSE2-NEXT:  Cost Model: Found an estimated cost of 8 for
>>>> instruction: %mask = icmp eq <2 x i32> %trigger, zeroinitializer
>>>> > >> >> > -; SSE2-NEXT:  Cost Model: Found an estimated cost of 7 for
>>>> instruction: %res = call <2 x float> @llvm.masked.load.v2f32.p0v2f32(<2 x
>>>> float>* %addr, i32 4, <2 x i1> %mask, <2 x float> %dst)
>>>> > >> >> > -; SSE2-NEXT:  Cost Model: Found an estimated cost of 0 for
>>>> instruction: ret <2 x float> %res
>>>> > >> >> > -;
>>>> > >> >> > -; SSE42-LABEL: 'test7'
>>>> > >> >> > -; SSE42-NEXT:  Cost Model: Found an estimated cost of 1 for
>>>> instruction: %mask = icmp eq <2 x i32> %trigger, zeroinitializer
>>>> > >> >> > -; SSE42-NEXT:  Cost Model: Found an estimated cost of 7 for
>>>> instruction: %res = call <2 x float> @llvm.masked.load.v2f32.p0v2f32(<2 x
>>>> float>* %addr, i32 4, <2 x i1> %mask, <2 x float> %dst)
>>>> > >> >> > -; SSE42-NEXT:  Cost Model: Found an estimated cost of 0 for
>>>> instruction: ret <2 x float> %res
>>>> > >> >> > +; SSE-LABEL: 'test7'
>>>> > >> >> > +; SSE-NEXT:  Cost Model: Found an estimated cost of 1 for
>>>> instruction: %mask = icmp eq <2 x i32> %trigger, zeroinitializer
>>>> > >> >> > +; SSE-NEXT:  Cost Model: Found an estimated cost of 7 for
>>>> instruction: %res = call <2 x float> @llvm.masked.load.v2f32.p0v2f32(<2 x
>>>> float>* %addr, i32 4, <2 x i1> %mask, <2 x float> %dst)
>>>> > >> >> > +; SSE-NEXT:  Cost Model: Found an estimated cost of 0 for
>>>> instruction: ret <2 x float> %res
>>>> > >> >> >  ;
>>>> > >> >> >  ; AVX-LABEL: 'test7'
>>>> > >> >> >  ; AVX-NEXT:  Cost Model: Found an estimated cost of 1 for
>>>> instruction: %mask = icmp eq <2 x i32> %trigger, zeroinitializer
>>>> > >> >> > @@ -1038,24 +1023,19 @@ define <2 x float> @test7(<2 x i32>
>>>> %tri
>>>> > >> >> >  }
>>>> > >> >> >
>>>> > >> >> >  define <2 x i32> @test8(<2 x i32> %trigger, <2 x i32>*
>>>> %addr, <2 x i32> %dst) {
>>>> > >> >> > -; SSE2-LABEL: 'test8'
>>>> > >> >> > -; SSE2-NEXT:  Cost Model: Found an estimated cost of 8 for
>>>> instruction: %mask = icmp eq <2 x i32> %trigger, zeroinitializer
>>>> > >> >> > -; SSE2-NEXT:  Cost Model: Found an estimated cost of 8 for
>>>> instruction: %res = call <2 x i32> @llvm.masked.load.v2i32.p0v2i32(<2 x
>>>> i32>* %addr, i32 4, <2 x i1> %mask, <2 x i32> %dst)
>>>> > >> >> > -; SSE2-NEXT:  Cost Model: Found an estimated cost of 0 for
>>>> instruction: ret <2 x i32> %res
>>>> > >> >> > -;
>>>> > >> >> > -; SSE42-LABEL: 'test8'
>>>> > >> >> > -; SSE42-NEXT:  Cost Model: Found an estimated cost of 1 for
>>>> instruction: %mask = icmp eq <2 x i32> %trigger, zeroinitializer
>>>> > >> >> > -; SSE42-NEXT:  Cost Model: Found an estimated cost of 8 for
>>>> instruction: %res = call <2 x i32> @llvm.masked.load.v2i32.p0v2i32(<2 x
>>>> i32>* %addr, i32 4, <2 x i1> %mask, <2 x i32> %dst)
>>>> > >> >> > -; SSE42-NEXT:  Cost Model: Found an estimated cost of 0 for
>>>> instruction: ret <2 x i32> %res
>>>> > >> >> > +; SSE-LABEL: 'test8'
>>>> > >> >> > +; SSE-NEXT:  Cost Model: Found an estimated cost of 1 for
>>>> instruction: %mask = icmp eq <2 x i32> %trigger, zeroinitializer
>>>> > >> >> > +; SSE-NEXT:  Cost Model: Found an estimated cost of 8 for
>>>> instruction: %res = call <2 x i32> @llvm.masked.load.v2i32.p0v2i32(<2 x
>>>> i32>* %addr, i32 4, <2 x i1> %mask, <2 x i32> %dst)
>>>> > >> >> > +; SSE-NEXT:  Cost Model: Found an estimated cost of 0 for
>>>> instruction: ret <2 x i32> %res
>>>> > >> >> >  ;
>>>> > >> >> >  ; AVX-LABEL: 'test8'
>>>> > >> >> >  ; AVX-NEXT:  Cost Model: Found an estimated cost of 1 for
>>>> instruction: %mask = icmp eq <2 x i32> %trigger, zeroinitializer
>>>> > >> >> > -; AVX-NEXT:  Cost Model: Found an estimated cost of 4 for
>>>> instruction: %res = call <2 x i32> @llvm.masked.load.v2i32.p0v2i32(<2 x
>>>> i32>* %addr, i32 4, <2 x i1> %mask, <2 x i32> %dst)
>>>> > >> >> > +; AVX-NEXT:  Cost Model: Found an estimated cost of 6 for
>>>> instruction: %res = call <2 x i32> @llvm.masked.load.v2i32.p0v2i32(<2 x
>>>> i32>* %addr, i32 4, <2 x i1> %mask, <2 x i32> %dst)
>>>> > >> >> >  ; AVX-NEXT:  Cost Model: Found an estimated cost of 0 for
>>>> instruction: ret <2 x i32> %res
>>>> > >> >> >  ;
>>>> > >> >> >  ; AVX512-LABEL: 'test8'
>>>> > >> >> >  ; AVX512-NEXT:  Cost Model: Found an estimated cost of 1 for
>>>> instruction: %mask = icmp eq <2 x i32> %trigger, zeroinitializer
>>>> > >> >> > -; AVX512-NEXT:  Cost Model: Found an estimated cost of 3 for
>>>> instruction: %res = call <2 x i32> @llvm.masked.load.v2i32.p0v2i32(<2 x
>>>> i32>* %addr, i32 4, <2 x i1> %mask, <2 x i32> %dst)
>>>> > >> >> > +; AVX512-NEXT:  Cost Model: Found an estimated cost of 5 for
>>>> instruction: %res = call <2 x i32> @llvm.masked.load.v2i32.p0v2i32(<2 x
>>>> i32>* %addr, i32 4, <2 x i1> %mask, <2 x i32> %dst)
>>>> > >> >> >  ; AVX512-NEXT:  Cost Model: Found an estimated cost of 0 for
>>>> instruction: ret <2 x i32> %res
>>>> > >> >> >  ;
>>>> > >> >> >    %mask = icmp eq <2 x i32> %trigger, zeroinitializer
>>>> > >> >> >
>>>> > >> >> > Removed:
>>>> llvm/trunk/test/Analysis/CostModel/X86/reduce-add-widen.ll
>>>> > >> >> > URL:
>>>> http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Analysis/CostModel/X86/reduce-add-widen.ll?rev=368182&view=auto
>>>> > >> >> >
>>>> ==============================================================================
>>>> > >> >> > ---
>>>> llvm/trunk/test/Analysis/CostModel/X86/reduce-add-widen.ll (original)
>>>> > >> >> > +++
>>>> llvm/trunk/test/Analysis/CostModel/X86/reduce-add-widen.ll (removed)
>>>> > >> >> > @@ -1,307 +0,0 @@
>>>> > >> >> > -; NOTE: Assertions have been autogenerated by
>>>> utils/update_analyze_test_checks.py
>>>> > >> >> > -; RUN: opt < %s
>>>> -x86-experimental-vector-widening-legalization -cost-model
>>>> -mtriple=x86_64-apple-darwin -analyze -mattr=+sse2 | FileCheck %s
>>>> --check-prefixes=CHECK,SSE,SSE2
>>>> > >> >> > -; RUN: opt < %s
>>>> -x86-experimental-vector-widening-legalization -cost-model
>>>> -mtriple=x86_64-apple-darwin -analyze -mattr=+ssse3 | FileCheck %s
>>>> --check-prefixes=CHECK,SSE,SSSE3
>>>> > >> >> > -; RUN: opt < %s
>>>> -x86-experimental-vector-widening-legalization -cost-model
>>>> -mtriple=x86_64-apple-darwin -analyze -mattr=+sse4.2 | FileCheck %s
>>>> --check-prefixes=CHECK,SSE,SSE42
>>>> > >> >> > -; RUN: opt < %s
>>>> -x86-experimental-vector-widening-legalization -cost-model
>>>> -mtriple=x86_64-apple-darwin -analyze -mattr=+avx | FileCheck %s
>>>> --check-prefixes=CHECK,AVX,AVX1
>>>> > >> >> > -; RUN: opt < %s
>>>> -x86-experimental-vector-widening-legalization -cost-model
>>>> -mtriple=x86_64-apple-darwin -analyze -mattr=+avx2 | FileCheck %s
>>>> --check-prefixes=CHECK,AVX,AVX2
>>>> > >> >> > -; RUN: opt < %s
>>>> -x86-experimental-vector-widening-legalization -cost-model
>>>> -mtriple=x86_64-apple-darwin -analyze -mattr=+avx512f | FileCheck %s
>>>> --check-prefixes=CHECK,AVX512,AVX512F
>>>> > >> >> > -; RUN: opt < %s
>>>> -x86-experimental-vector-widening-legalization -cost-model
>>>> -mtriple=x86_64-apple-darwin -analyze -mattr=+avx512f,+avx512bw | FileCheck
>>>> %s --check-prefixes=CHECK,AVX512,AVX512BW
>>>> > >> >> > -; RUN: opt < %s
>>>> -x86-experimental-vector-widening-legalization -cost-model
>>>> -mtriple=x86_64-apple-darwin -analyze -mattr=+avx512f,+avx512dq | FileCheck
>>>> %s --check-prefixes=CHECK,AVX512,AVX512DQ
>>>> > >> >> > -
>>>> > >> >> > -define i32 @reduce_i64(i32 %arg) {
>>>> > >> >> > -; SSE2-LABEL: 'reduce_i64'
>>>> > >> >> > -; SSE2-NEXT:  Cost Model: Found an estimated cost of 0 for
>>>> instruction: %V1 = call i64 @llvm.experimental.vector.reduce.add.v1i64(<1 x
>>>> i64> undef)
>>>> > >> >> > -; SSE2-NEXT:  Cost Model: Found an estimated cost of 3 for
>>>> instruction: %V2 = call i64 @llvm.experimental.vector.reduce.add.v2i64(<2 x
>>>> i64> undef)
>>>> > >> >> > -; SSE2-NEXT:  Cost Model: Found an estimated cost of 4 for
>>>> instruction: %V4 = call i64 @llvm.experimental.vector.reduce.add.v4i64(<4 x
>>>> i64> undef)
>>>> > >> >> > -; SSE2-NEXT:  Cost Model: Found an estimated cost of 6 for
>>>> instruction: %V8 = call i64 @llvm.experimental.vector.reduce.add.v8i64(<8 x
>>>> i64> undef)
>>>> > >> >> > -; SSE2-NEXT:  Cost Model: Found an estimated cost of 10 for
>>>> instruction: %V16 = call i64
>>>> @llvm.experimental.vector.reduce.add.v16i64(<16 x i64> undef)
>>>> > >> >> > -; SSE2-NEXT:  Cost Model: Found an estimated cost of 0 for
>>>> instruction: ret i32 undef
>>>> > >> >> > -;
>>>> > >> >> > -; SSSE3-LABEL: 'reduce_i64'
>>>> > >> >> > -; SSSE3-NEXT:  Cost Model: Found an estimated cost of 0 for
>>>> instruction: %V1 = call i64 @llvm.experimental.vector.reduce.add.v1i64(<1 x
>>>> i64> undef)
>>>> > >> >> > -; SSSE3-NEXT:  Cost Model: Found an estimated cost of 3 for
>>>> instruction: %V2 = call i64 @llvm.experimental.vector.reduce.add.v2i64(<2 x
>>>> i64> undef)
>>>> > >> >> > -; SSSE3-NEXT:  Cost Model: Found an estimated cost of 4 for
>>>> instruction: %V4 = call i64 @llvm.experimental.vector.reduce.add.v4i64(<4 x
>>>> i64> undef)
>>>> > >> >> > -; SSSE3-NEXT:  Cost Model: Found an estimated cost of 6 for
>>>> instruction: %V8 = call i64 @llvm.experimental.vector.reduce.add.v8i64(<8 x
>>>> i64> undef)
>>>> > >> >> > -; SSSE3-NEXT:  Cost Model: Found an estimated cost of 10 for
>>>> instruction: %V16 = call i64
>>>> @llvm.experimental.vector.reduce.add.v16i64(<16 x i64> undef)
>>>> > >> >> > -; SSSE3-NEXT:  Cost Model: Found an estimated cost of 0 for
>>>> instruction: ret i32 undef
>>>> > >> >> > -;
>>>> > >> >> > -; SSE42-LABEL: 'reduce_i64'
>>>> > >> >> > -; SSE42-NEXT:  Cost Model: Found an estimated cost of 0 for
>>>> instruction: %V1 = call i64 @llvm.experimental.vector.reduce.add.v1i64(<1 x
>>>> i64> undef)
>>>> > >> >> > -; SSE42-NEXT:  Cost Model: Found an estimated cost of 2 for
>>>> instruction: %V2 = call i64 @llvm.experimental.vector.reduce.add.v2i64(<2 x
>>>> i64> undef)
>>>> > >> >> > -; SSE42-NEXT:  Cost Model: Found an estimated cost of 4 for
>>>> instruction: %V4 = call i64 @llvm.experimental.vector.reduce.add.v4i64(<4 x
>>>> i64> undef)
>>>> > >> >> > -; SSE42-NEXT:  Cost Model: Found an estimated cost of 8 for
>>>> instruction: %V8 = call i64 @llvm.experimental.vector.reduce.add.v8i64(<8 x
>>>> i64> undef)
>>>> > >> >> > -; SSE42-NEXT:  Cost Model: Found an estimated cost of 16 for
>>>> instruction: %V16 = call i64
>>>> @llvm.experimental.vector.reduce.add.v16i64(<16 x i64> undef)
>>>> > >> >> > -; SSE42-NEXT:  Cost Model: Found an estimated cost of 0 for
>>>> instruction: ret i32 undef
>>>> > >> >> > -;
>>>> > >> >> > -; AVX-LABEL: 'reduce_i64'
>>>> > >> >> > -; AVX-NEXT:  Cost Model: Found an estimated cost of 0 for
>>>> instruction: %V1 = call i64 @llvm.experimental.vector.reduce.add.v1i64(<1 x
>>>> i64> undef)
>>>> > >> >> > -; AVX-NEXT:  Cost Model: Found an estimated cost of 1 for
>>>> instruction: %V2 = call i64 @llvm.experimental.vector.reduce.add.v2i64(<2 x
>>>> i64> undef)
>>>> > >> >> > -; AVX-NEXT:  Cost Model: Found an estimated cost of 3 for
>>>> instruction: %V4 = call i64 @llvm.experimental.vector.reduce.add.v4i64(<4 x
>>>> i64> undef)
>>>> > >> >> > -; AVX-NEXT:  Cost Model: Found an estimated cost of 6 for
>>>> instruction: %V8 = call i64 @llvm.experimental.vector.reduce.add.v8i64(<8 x
>>>> i64> undef)
>>>> > >> >> > -; AVX-NEXT:  Cost Model: Found an estimated cost of 12 for
>>>> instruction: %V16 = call i64
>>>> @llvm.experimental.vector.reduce.add.v16i64(<16 x i64> undef)
>>>> > >> >> > -; AVX-NEXT:  Cost Model: Found an estimated cost of 0 for
>>>> instruction: ret i32 undef
>>>> > >> >> > -;
>>>> > >> >> > -; AVX512-LABEL: 'reduce_i64'
>>>> > >> >> > -; AVX512-NEXT:  Cost Model: Found an estimated cost of 0 for
>>>> instruction: %V1 = call i64 @llvm.experimental.vector.reduce.add.v1i64(<1 x
>>>> i64> undef)
>>>> > >> >> > -; AVX512-NEXT:  Cost Model: Found an estimated cost of 1 for
>>>> instruction: %V2 = call i64 @llvm.experimental.vector.reduce.add.v2i64(<2 x
>>>> i64> undef)
>>>> > >> >> > -; AVX512-NEXT:  Cost Model: Found an estimated cost of 3 for
>>>> instruction: %V4 = call i64 @llvm.experimental.vector.reduce.add.v4i64(<4 x
>>>> i64> undef)
>>>> > >> >> > -; AVX512-NEXT:  Cost Model: Found an estimated cost of 7 for
>>>> instruction: %V8 = call i64 @llvm.experimental.vector.reduce.add.v8i64(<8 x
>>>> i64> undef)
>>>> > >> >> > -; AVX512-NEXT:  Cost Model: Found an estimated cost of 8 for
>>>> instruction: %V16 = call i64
>>>> @llvm.experimental.vector.reduce.add.v16i64(<16 x i64> undef)
>>>> > >> >> > -; AVX512-NEXT:  Cost Model: Found an estimated cost of 0 for
>>>> instruction: ret i32 undef
>>>> > >> >> > -;
>>>> > >> >> > -  %V1  = call i64
>>>> @llvm.experimental.vector.reduce.add.v1i64(<1 x i64> undef)
>>>> > >> >> > -  %V2  = call i64
>>>> @llvm.experimental.vector.reduce.add.v2i64(<2 x i64> undef)
>>>> > >> >> > -  %V4  = call i64
>>>> @llvm.experimental.vector.reduce.add.v4i64(<4 x i64> undef)
>>>> > >> >> > -  %V8  = call i64
>>>> @llvm.experimental.vector.reduce.add.v8i64(<8 x i64> undef)
>>>> > >> >> > -  %V16 = call i64
>>>> @llvm.experimental.vector.reduce.add.v16i64(<16 x i64> undef)
>>>> > >> >> > -  ret i32 undef
>>>> > >> >> > -}
>>>> > >> >> > -
>>>> > >> >> > -define i32 @reduce_i32(i32 %arg) {
>>>> > >> >> > -; SSE2-LABEL: 'reduce_i32'
>>>> > >> >> > -; SSE2-NEXT:  Cost Model: Found an estimated cost of 3 for
>>>> instruction: %V2 = call i32 @llvm.experimental.vector.reduce.add.v2i32(<2 x
>>>> i32> undef)
>>>> > >> >> > -; SSE2-NEXT:  Cost Model: Found an estimated cost of 5 for
>>>> instruction: %V4 = call i32 @llvm.experimental.vector.reduce.add.v4i32(<4 x
>>>> i32> undef)
>>>> > >> >> > -; SSE2-NEXT:  Cost Model: Found an estimated cost of 6 for
>>>> instruction: %V8 = call i32 @llvm.experimental.vector.reduce.add.v8i32(<8 x
>>>> i32> undef)
>>>> > >> >> > -; SSE2-NEXT:  Cost Model: Found an estimated cost of 8 for
>>>> instruction: %V16 = call i32
>>>> @llvm.experimental.vector.reduce.add.v16i32(<16 x i32> undef)
>>>> > >> >> > -; SSE2-NEXT:  Cost Model: Found an estimated cost of 12 for
>>>> instruction: %V32 = call i32
>>>> @llvm.experimental.vector.reduce.add.v32i32(<32 x i32> undef)
>>>> > >> >> > -; SSE2-NEXT:  Cost Model: Found an estimated cost of 0 for
>>>> instruction: ret i32 undef
>>>> > >> >> > -;
>>>> > >> >> > -; SSSE3-LABEL: 'reduce_i32'
>>>> > >> >> > -; SSSE3-NEXT:  Cost Model: Found an estimated cost of 3 for
>>>> instruction: %V2 = call i32 @llvm.experimental.vector.reduce.add.v2i32(<2 x
>>>> i32> undef)
>>>> > >> >> > -; SSSE3-NEXT:  Cost Model: Found an estimated cost of 5 for
>>>> instruction: %V4 = call i32 @llvm.experimental.vector.reduce.add.v4i32(<4 x
>>>> i32> undef)
>>>> > >> >> > -; SSSE3-NEXT:  Cost Model: Found an estimated cost of 6 for
>>>> instruction: %V8 = call i32 @llvm.experimental.vector.reduce.add.v8i32(<8 x
>>>> i32> undef)
>>>> > >> >> > -; SSSE3-NEXT:  Cost Model: Found an estimated cost of 8 for
>>>> instruction: %V16 = call i32
>>>> @llvm.experimental.vector.reduce.add.v16i32(<16 x i32> undef)
>>>> > >> >> > -; SSSE3-NEXT:  Cost Model: Found an estimated cost of 12 for
>>>> instruction: %V32 = call i32
>>>> @llvm.experimental.vector.reduce.add.v32i32(<32 x i32> undef)
>>>> > >> >> > -; SSSE3-NEXT:  Cost Model: Found an estimated cost of 0 for
>>>> instruction: ret i32 undef
>>>> > >> >> > -;
>>>> > >> >> > -; SSE42-LABEL: 'reduce_i32'
>>>> > >> >> > -; SSE42-NEXT:  Cost Model: Found an estimated cost of 3 for
>>>> instruction: %V2 = call i32 @llvm.experimental.vector.reduce.add.v2i32(<2 x
>>>> i32> undef)
>>>> > >> >> > -; SSE42-NEXT:  Cost Model: Found an estimated cost of 3 for
>>>> instruction: %V4 = call i32 @llvm.experimental.vector.reduce.add.v4i32(<4 x
>>>> i32> undef)
>>>> > >> >> > -; SSE42-NEXT:  Cost Model: Found an estimated cost of 6 for
>>>> instruction: %V8 = call i32 @llvm.experimental.vector.reduce.add.v8i32(<8 x
>>>> i32> undef)
>>>> > >> >> > -; SSE42-NEXT:  Cost Model: Found an estimated cost of 12 for
>>>> instruction: %V16 = call i32
>>>> @llvm.experimental.vector.reduce.add.v16i32(<16 x i32> undef)
>>>> > >> >> > -; SSE42-NEXT:  Cost Model: Found an estimated cost of 24 for
>>>> instruction: %V32 = call i32
>>>> @llvm.experimental.vector.reduce.add.v32i32(<32 x i32> undef)
>>>> > >> >> > -; SSE42-NEXT:  Cost Model: Found an estimated cost of 0 for
>>>> instruction: ret i32 undef
>>>> > >> >> > -;
>>>> > >> >> > -; AVX-LABEL: 'reduce_i32'
>>>> > >> >> > -; AVX-NEXT:  Cost Model: Found an estimated cost of 3 for
>>>> instruction: %V2 = call i32 @llvm.experimental.vector.reduce.add.v2i32(<2 x
>>>> i32> undef)
>>>> > >> >> > -; AVX-NEXT:  Cost Model: Found an estimated cost of 3 for
>>>> instruction: %V4 = call i32 @llvm.experimental.vector.reduce.add.v4i32(<4 x
>>>> i32> undef)
>>>> > >> >> > -; AVX-NEXT:  Cost Model: Found an estimated cost of 5 for
>>>> instruction: %V8 = call i32 @llvm.experimental.vector.reduce.add.v8i32(<8 x
>>>> i32> undef)
>>>> > >> >> > -; AVX-NEXT:  Cost Model: Found an estimated cost of 10 for
>>>> instruction: %V16 = call i32
>>>> @llvm.experimental.vector.reduce.add.v16i32(<16 x i32> undef)
>>>> > >> >> > -; AVX-NEXT:  Cost Model: Found an estimated cost of 20 for
>>>> instruction: %V32 = call i32
>>>> @llvm.experimental.vector.reduce.add.v32i32(<32 x i32> undef)
>>>> > >> >> > -; AVX-NEXT:  Cost Model: Found an estimated cost of 0 for
>>>> instruction: ret i32 undef
>>>> > >> >> > -;
>>>> > >> >> > -; AVX512-LABEL: 'reduce_i32'
>>>> > >> >> > -; AVX512-NEXT:  Cost Model: Found an estimated cost of 3 for
>>>> instruction: %V2 = call i32 @llvm.experimental.vector.reduce.add.v2i32(<2 x
>>>> i32> undef)
>>>> > >> >> > -; AVX512-NEXT:  Cost Model: Found an estimated cost of 3 for
>>>> i
>>>>
>>> _______________________________________________
>> llvm-commits mailing list
>> llvm-commits at lists.llvm.org
>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits
>>
>