[llvm] r368183 - Recommit r367901 "[X86] Enable -x86-experimental-vector-widening-legalization by default."

Topper, Craig via llvm-commits llvm-commits at lists.llvm.org
Thu Aug 29 12:41:00 PDT 2019


That’s good news, for me anyway. Are you still having other performance issues from the widening patch that weren’t fixed by r369628?

From: Jordan Rupprecht <rupprecht at google.com>
Sent: Thursday, August 29, 2019 11:44 AM
To: Craig Topper <craig.topper at gmail.com>
Cc: Eric Christopher <echristo at gmail.com>; llvm-commits <llvm-commits at lists.llvm.org>; Topper, Craig <craig.topper at intel.com>
Subject: Re: [llvm] r368183 - Recommit r367901 "[X86] Enable -x86-experimental-vector-widening-legalization by default."

Your r369628 patch is vindicated; our regressions seem to be coming from r369664 (assuming this patch affects all the regressed benchmarks we saw).

On Thu, Aug 29, 2019 at 9:33 AM Jordan Rupprecht <rupprecht at google.com> wrote:
Hi Craig,

It looks like we don't see any regressions in polybench with a release @r369822.

However, we have a new regression (-20% in the Stanford Puzzle and Shootout sieve benchmarks from the LLVM SingleSource suite) somewhere between r369600 and r369679. I hope it's not r369628 :(

I'll continue root-causing today.

On Fri, Aug 23, 2019 at 6:42 PM Craig Topper via llvm-commits <llvm-commits at lists.llvm.org> wrote:
Eric,

Did that fix recover any of the performance issues? I think it fixed the big issue in polybench.

~Craig


On Thu, Aug 22, 2019 at 1:19 AM Craig Topper <craig.topper at gmail.com> wrote:
I just committed r369628, which should hopefully fix trisolv from that set of tests. I haven't investigated any other tests yet.

~Craig


On Wed, Aug 21, 2019 at 5:48 PM Eric Christopher <echristo at gmail.com> wrote:
FWIW, one set of benchmarks that regressed for us was PolyBench in the
test-suite, on Haswell and Sandy Bridge (Skylake was a net win, IIRC).
That's something public you could look at right away.

-eric

On Tue, Aug 20, 2019 at 10:11 AM Eric Christopher <echristo at gmail.com> wrote:
>
> Thanks Craig!
>
> -eric
>
> On Mon, Aug 19, 2019 at 11:59 PM Craig Topper <craig.topper at gmail.com> wrote:
> >
> > The -x86-experimental-vector-widening-legalization command-line flag has been restored in r369332. It defaults to true, which enables the new behavior. Setting it to false should hopefully restore the old behavior, but I can't guarantee that, since we have no tests for it and I have no way of knowing if we break anything going forward. So hopefully we can get the issues resolved quickly and not have to depend on this for long.
> >
> > ~Craig
> >
> >
> > On Mon, Aug 19, 2019 at 10:28 PM Eric Christopher <echristo at gmail.com> wrote:
> >>
> >> I do apologize; we've only just gotten to performance testing, and the
> >> numbers are pretty exciting, but unfortunately there are many more
> >> negatives than positives. We'll definitely work with you on testing and
> >> performance analysis if that will help.
> >>
> >> Thanks!
> >>
> >> -eric
> >>
> >> On Mon, Aug 19, 2019 at 10:23 PM Craig Topper <craig.topper at gmail.com> wrote:
> >> >
> >> > There have been quite a lot of follow-on patches to this. A lot of them would need to be reverted to get back to the old state. I can start trying to put that together.
> >> >
> >> > ~Craig
> >> >
> >> >
> >> > On Mon, Aug 19, 2019 at 9:55 PM Eric Christopher via llvm-commits <llvm-commits at lists.llvm.org> wrote:
> >> >>
> >> >> Hi Craig,
> >> >>
> >> >> We're seeing quite a lot of performance regressions with this enabled
> >> >> by default. Is it possible to put it behind a command-line flag
> >> >> for the near term while we work on getting you a pile of test cases?
> >> >> (Some of them are Eigen, and those will at least be easier, as you have
> >> >> access to that source :)
> >> >>
> >> >> Thoughts?
> >> >>
> >> >> Thanks!
> >> >>
> >> >> -eric
> >> >>
> >> >> On Wed, Aug 7, 2019 at 9:23 AM Craig Topper via llvm-commits
> >> >> <llvm-commits at lists.llvm.org> wrote:
> >> >> >
> >> >> > Author: ctopper
> >> >> > Date: Wed Aug  7 09:24:26 2019
> >> >> > New Revision: 368183
> >> >> >
> >> >> > URL: http://llvm.org/viewvc/llvm-project?rev=368183&view=rev
> >> >> > Log:
> >> >> > Recommit r367901 "[X86] Enable -x86-experimental-vector-widening-legalization by default."
> >> >> >
> >> >> > The assert that caused this to be reverted should be fixed now.
> >> >> >
> >> >> > Original commit message:
> >> >> >
> >> >> > This patch changes our default legalization behavior for 16, 32, and
> >> >> > 64 bit vectors with i8/i16/i32/i64 scalar types from promotion to
> >> >> > widening. For example, v8i8 will now be widened to v16i8 instead of
> >> >> > promoted to v8i16. This keeps the element widths the same and pads
> >> >> > with undef elements. We believe this is a better legalization strategy.
> >> >> > But it carries some issues due to the fragmented vector ISA. For
> >> >> > example, i8 shifts and multiplies get widened and then later have
> >> >> > to be promoted/split into vXi16 vectors.
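To make the two strategies concrete, here is a toy C++ sketch (not LLVM code; `VecTy`, `promote`, `widen`, and `name` are hypothetical helpers) that computes the legalized type of a narrow integer vector under each scheme. It assumes a plain SSE2 target where the 128-bit types v16i8, v8i16, v4i32, and v2i64 are the legal ones, and inputs with at least two power-of-two-sized elements.

```cpp
#include <string>

// Toy model of promotion vs. widening for integer vectors narrower than one
// 128-bit XMM register. Assumption: the 128-bit types are the legal targets,
// as on a plain SSE2 subtarget, and NumElts >= 2.
constexpr unsigned RegBits = 128;

struct VecTy {
  unsigned NumElts;
  unsigned EltBits;
};

// Spell a type the way MVT names read, e.g. {8, 8} -> "v8i8".
std::string name(VecTy T) {
  return "v" + std::to_string(T.NumElts) + "i" + std::to_string(T.EltBits);
}

// Old default (promotion): keep the element count and widen each element
// until the vector fills the register, e.g. v8i8 -> v8i16, v2i32 -> v2i64.
VecTy promote(VecTy T) { return {T.NumElts, RegBits / T.NumElts}; }

// New default (widening): keep the element type and append undef lanes
// until the vector fills the register, e.g. v8i8 -> v16i8, v2i32 -> v4i32.
VecTy widen(VecTy T) { return {RegBits / T.EltBits, T.EltBits}; }
```

For example, `widen({2, 32})` gives v4i32 while `promote({2, 32})` gives v2i64, matching the updated getShuffleCost comment in this diff. The model stops at type legalization: as the commit message notes, operations such as i8 shifts and multiplies, for which x86 has no byte-element vector instructions, still have to be promoted or split into vXi16 afterwards.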
> >> >> >
> >> >> > This has the potential to cause regressions so we wanted to get
> >> >> > it in early in the 10.0 cycle so we have plenty of time to
> >> >> > address them.
> >> >> >
> >> >> > Next steps will be to merge tests that explicitly test the command
> >> >> > line option. And then we can remove the option and its associated
> >> >> > code.
> >> >> >
> >> >> > Removed:
> >> >> >     llvm/trunk/test/Analysis/CostModel/X86/reduce-add-widen.ll
> >> >> > Modified:
> >> >> >     llvm/trunk/lib/Target/X86/X86ISelLowering.cpp
> >> >> >     llvm/trunk/lib/Target/X86/X86TargetTransformInfo.cpp
> >> >> >     llvm/trunk/test/Analysis/CostModel/X86/alternate-shuffle-cost.ll
> >> >> >     llvm/trunk/test/Analysis/CostModel/X86/arith.ll
> >> >> >     llvm/trunk/test/Analysis/CostModel/X86/cast.ll
> >> >> >     llvm/trunk/test/Analysis/CostModel/X86/fptosi.ll
> >> >> >     llvm/trunk/test/Analysis/CostModel/X86/fptoui.ll
> >> >> >     llvm/trunk/test/Analysis/CostModel/X86/masked-intrinsic-cost.ll
> >> >> >     llvm/trunk/test/Analysis/CostModel/X86/reduce-add.ll
> >> >> >     llvm/trunk/test/Analysis/CostModel/X86/reduce-and.ll
> >> >> >     llvm/trunk/test/Analysis/CostModel/X86/reduce-mul.ll
> >> >> >     llvm/trunk/test/Analysis/CostModel/X86/reduce-or.ll
> >> >> >     llvm/trunk/test/Analysis/CostModel/X86/reduce-smax.ll
> >> >> >     llvm/trunk/test/Analysis/CostModel/X86/reduce-smin.ll
> >> >> >     llvm/trunk/test/Analysis/CostModel/X86/reduce-umax.ll
> >> >> >     llvm/trunk/test/Analysis/CostModel/X86/reduce-umin.ll
> >> >> >     llvm/trunk/test/Analysis/CostModel/X86/reduce-xor.ll
> >> >> >     llvm/trunk/test/Analysis/CostModel/X86/shuffle-transpose.ll
> >> >> >     llvm/trunk/test/Analysis/CostModel/X86/sitofp.ll
> >> >> >     llvm/trunk/test/Analysis/CostModel/X86/slm-arith-costs.ll
> >> >> >     llvm/trunk/test/Analysis/CostModel/X86/testshiftashr.ll
> >> >> >     llvm/trunk/test/Analysis/CostModel/X86/testshiftlshr.ll
> >> >> >     llvm/trunk/test/Analysis/CostModel/X86/testshiftshl.ll
> >> >> >     llvm/trunk/test/Analysis/CostModel/X86/uitofp.ll
> >> >> >     llvm/trunk/test/CodeGen/X86/2008-09-05-sinttofp-2xi32.ll
> >> >> >     llvm/trunk/test/CodeGen/X86/2009-06-05-VZextByteShort.ll
> >> >> >     llvm/trunk/test/CodeGen/X86/2011-10-19-LegelizeLoad.ll
> >> >> >     llvm/trunk/test/CodeGen/X86/2011-12-28-vselecti8.ll
> >> >> >     llvm/trunk/test/CodeGen/X86/2011-12-8-bitcastintprom.ll
> >> >> >     llvm/trunk/test/CodeGen/X86/2012-01-18-vbitcast.ll
> >> >> >     llvm/trunk/test/CodeGen/X86/2012-03-15-build_vector_wl.ll
> >> >> >     llvm/trunk/test/CodeGen/X86/2012-07-10-extload64.ll
> >> >> >     llvm/trunk/test/CodeGen/X86/3dnow-intrinsics.ll
> >> >> >     llvm/trunk/test/CodeGen/X86/4char-promote.ll
> >> >> >     llvm/trunk/test/CodeGen/X86/and-load-fold.ll
> >> >> >     llvm/trunk/test/CodeGen/X86/atomic-unordered.ll
> >> >> >     llvm/trunk/test/CodeGen/X86/avg.ll
> >> >> >     llvm/trunk/test/CodeGen/X86/avx-cvt-2.ll
> >> >> >     llvm/trunk/test/CodeGen/X86/avx-fp2int.ll
> >> >> >     llvm/trunk/test/CodeGen/X86/avx2-conversions.ll
> >> >> >     llvm/trunk/test/CodeGen/X86/avx2-masked-gather.ll
> >> >> >     llvm/trunk/test/CodeGen/X86/avx2-vbroadcast.ll
> >> >> >     llvm/trunk/test/CodeGen/X86/avx512-any_extend_load.ll
> >> >> >     llvm/trunk/test/CodeGen/X86/avx512-cvt.ll
> >> >> >     llvm/trunk/test/CodeGen/X86/avx512-ext.ll
> >> >> >     llvm/trunk/test/CodeGen/X86/avx512-intrinsics-upgrade.ll
> >> >> >     llvm/trunk/test/CodeGen/X86/avx512-mask-op.ll
> >> >> >     llvm/trunk/test/CodeGen/X86/avx512-trunc.ll
> >> >> >     llvm/trunk/test/CodeGen/X86/avx512-vec-cmp.ll
> >> >> >     llvm/trunk/test/CodeGen/X86/avx512-vec3-crash.ll
> >> >> >     llvm/trunk/test/CodeGen/X86/avx512bwvl-intrinsics-upgrade.ll
> >> >> >     llvm/trunk/test/CodeGen/X86/avx512vl-intrinsics-fast-isel.ll
> >> >> >     llvm/trunk/test/CodeGen/X86/avx512vl-intrinsics-upgrade.ll
> >> >> >     llvm/trunk/test/CodeGen/X86/bitcast-and-setcc-128.ll
> >> >> >     llvm/trunk/test/CodeGen/X86/bitcast-setcc-128.ll
> >> >> >     llvm/trunk/test/CodeGen/X86/bitcast-vector-bool.ll
> >> >> >     llvm/trunk/test/CodeGen/X86/bitreverse.ll
> >> >> >     llvm/trunk/test/CodeGen/X86/bswap-vector.ll
> >> >> >     llvm/trunk/test/CodeGen/X86/buildvec-insertvec.ll
> >> >> >     llvm/trunk/test/CodeGen/X86/combine-64bit-vec-binop.ll
> >> >> >     llvm/trunk/test/CodeGen/X86/combine-or.ll
> >> >> >     llvm/trunk/test/CodeGen/X86/complex-fastmath.ll
> >> >> >     llvm/trunk/test/CodeGen/X86/cvtv2f32.ll
> >> >> >     llvm/trunk/test/CodeGen/X86/extract-concat.ll
> >> >> >     llvm/trunk/test/CodeGen/X86/extract-insert.ll
> >> >> >     llvm/trunk/test/CodeGen/X86/f16c-intrinsics.ll
> >> >> >     llvm/trunk/test/CodeGen/X86/fold-vector-sext-zext.ll
> >> >> >     llvm/trunk/test/CodeGen/X86/insertelement-shuffle.ll
> >> >> >     llvm/trunk/test/CodeGen/X86/known-bits.ll
> >> >> >     llvm/trunk/test/CodeGen/X86/load-partial.ll
> >> >> >     llvm/trunk/test/CodeGen/X86/lower-bitcast.ll
> >> >> >     llvm/trunk/test/CodeGen/X86/madd.ll
> >> >> >     llvm/trunk/test/CodeGen/X86/masked_compressstore.ll
> >> >> >     llvm/trunk/test/CodeGen/X86/masked_expandload.ll
> >> >> >     llvm/trunk/test/CodeGen/X86/masked_gather_scatter.ll
> >> >> >     llvm/trunk/test/CodeGen/X86/masked_gather_scatter_widen.ll
> >> >> >     llvm/trunk/test/CodeGen/X86/masked_load.ll
> >> >> >     llvm/trunk/test/CodeGen/X86/masked_store.ll
> >> >> >     llvm/trunk/test/CodeGen/X86/masked_store_trunc.ll
> >> >> >     llvm/trunk/test/CodeGen/X86/masked_store_trunc_ssat.ll
> >> >> >     llvm/trunk/test/CodeGen/X86/masked_store_trunc_usat.ll
> >> >> >     llvm/trunk/test/CodeGen/X86/merge-consecutive-loads-256.ll
> >> >> >     llvm/trunk/test/CodeGen/X86/mmx-arg-passing-x86-64.ll
> >> >> >     llvm/trunk/test/CodeGen/X86/mmx-arith.ll
> >> >> >     llvm/trunk/test/CodeGen/X86/mmx-cvt.ll
> >> >> >     llvm/trunk/test/CodeGen/X86/mulvi32.ll
> >> >> >     llvm/trunk/test/CodeGen/X86/oddshuffles.ll
> >> >> >     llvm/trunk/test/CodeGen/X86/oddsubvector.ll
> >> >> >     llvm/trunk/test/CodeGen/X86/pmaddubsw.ll
> >> >> >     llvm/trunk/test/CodeGen/X86/pmulh.ll
> >> >> >     llvm/trunk/test/CodeGen/X86/pointer-vector.ll
> >> >> >     llvm/trunk/test/CodeGen/X86/pr14161.ll
> >> >> >     llvm/trunk/test/CodeGen/X86/pr35918.ll
> >> >> >     llvm/trunk/test/CodeGen/X86/pr40994.ll
> >> >> >     llvm/trunk/test/CodeGen/X86/promote-vec3.ll
> >> >> >     llvm/trunk/test/CodeGen/X86/promote.ll
> >> >> >     llvm/trunk/test/CodeGen/X86/psubus.ll
> >> >> >     llvm/trunk/test/CodeGen/X86/ret-mmx.ll
> >> >> >     llvm/trunk/test/CodeGen/X86/sad.ll
> >> >> >     llvm/trunk/test/CodeGen/X86/sadd_sat_vec.ll
> >> >> >     llvm/trunk/test/CodeGen/X86/scalar_widen_div.ll
> >> >> >     llvm/trunk/test/CodeGen/X86/select.ll
> >> >> >     llvm/trunk/test/CodeGen/X86/shift-combine.ll
> >> >> >     llvm/trunk/test/CodeGen/X86/shrink_vmul.ll
> >> >> >     llvm/trunk/test/CodeGen/X86/shuffle-strided-with-offset-128.ll
> >> >> >     llvm/trunk/test/CodeGen/X86/shuffle-strided-with-offset-256.ll
> >> >> >     llvm/trunk/test/CodeGen/X86/shuffle-strided-with-offset-512.ll
> >> >> >     llvm/trunk/test/CodeGen/X86/shuffle-vs-trunc-128.ll
> >> >> >     llvm/trunk/test/CodeGen/X86/shuffle-vs-trunc-256.ll
> >> >> >     llvm/trunk/test/CodeGen/X86/shuffle-vs-trunc-512.ll
> >> >> >     llvm/trunk/test/CodeGen/X86/slow-pmulld.ll
> >> >> >     llvm/trunk/test/CodeGen/X86/sse2-intrinsics-canonical.ll
> >> >> >     llvm/trunk/test/CodeGen/X86/sse2-vector-shifts.ll
> >> >> >     llvm/trunk/test/CodeGen/X86/ssub_sat_vec.ll
> >> >> >     llvm/trunk/test/CodeGen/X86/test-shrink-bug.ll
> >> >> >     llvm/trunk/test/CodeGen/X86/trunc-ext-ld-st.ll
> >> >> >     llvm/trunk/test/CodeGen/X86/trunc-subvector.ll
> >> >> >     llvm/trunk/test/CodeGen/X86/uadd_sat_vec.ll
> >> >> >     llvm/trunk/test/CodeGen/X86/unfold-masked-merge-vector-variablemask.ll
> >> >> >     llvm/trunk/test/CodeGen/X86/usub_sat_vec.ll
> >> >> >     llvm/trunk/test/CodeGen/X86/vec_cast2.ll
> >> >> >     llvm/trunk/test/CodeGen/X86/vec_cast3.ll
> >> >> >     llvm/trunk/test/CodeGen/X86/vec_ctbits.ll
> >> >> >     llvm/trunk/test/CodeGen/X86/vec_extract-mmx.ll
> >> >> >     llvm/trunk/test/CodeGen/X86/vec_fp_to_int.ll
> >> >> >     llvm/trunk/test/CodeGen/X86/vec_insert-5.ll
> >> >> >     llvm/trunk/test/CodeGen/X86/vec_insert-7.ll
> >> >> >     llvm/trunk/test/CodeGen/X86/vec_insert-mmx.ll
> >> >> >     llvm/trunk/test/CodeGen/X86/vec_int_to_fp.ll
> >> >> >     llvm/trunk/test/CodeGen/X86/vec_saddo.ll
> >> >> >     llvm/trunk/test/CodeGen/X86/vec_smulo.ll
> >> >> >     llvm/trunk/test/CodeGen/X86/vec_ssubo.ll
> >> >> >     llvm/trunk/test/CodeGen/X86/vec_uaddo.ll
> >> >> >     llvm/trunk/test/CodeGen/X86/vec_umulo.ll
> >> >> >     llvm/trunk/test/CodeGen/X86/vec_usubo.ll
> >> >> >     llvm/trunk/test/CodeGen/X86/vector-blend.ll
> >> >> >     llvm/trunk/test/CodeGen/X86/vector-ext-logic.ll
> >> >> >     llvm/trunk/test/CodeGen/X86/vector-gep.ll
> >> >> >     llvm/trunk/test/CodeGen/X86/vector-half-conversions.ll
> >> >> >     llvm/trunk/test/CodeGen/X86/vector-idiv-v2i32.ll
> >> >> >     llvm/trunk/test/CodeGen/X86/vector-narrow-binop.ll
> >> >> >     llvm/trunk/test/CodeGen/X86/vector-reduce-add.ll
> >> >> >     llvm/trunk/test/CodeGen/X86/vector-reduce-and-bool.ll
> >> >> >     llvm/trunk/test/CodeGen/X86/vector-reduce-and.ll
> >> >> >     llvm/trunk/test/CodeGen/X86/vector-reduce-mul.ll
> >> >> >     llvm/trunk/test/CodeGen/X86/vector-reduce-or-bool.ll
> >> >> >     llvm/trunk/test/CodeGen/X86/vector-reduce-or.ll
> >> >> >     llvm/trunk/test/CodeGen/X86/vector-reduce-smax.ll
> >> >> >     llvm/trunk/test/CodeGen/X86/vector-reduce-smin.ll
> >> >> >     llvm/trunk/test/CodeGen/X86/vector-reduce-umax.ll
> >> >> >     llvm/trunk/test/CodeGen/X86/vector-reduce-umin.ll
> >> >> >     llvm/trunk/test/CodeGen/X86/vector-reduce-xor-bool.ll
> >> >> >     llvm/trunk/test/CodeGen/X86/vector-reduce-xor.ll
> >> >> >     llvm/trunk/test/CodeGen/X86/vector-sext.ll
> >> >> >     llvm/trunk/test/CodeGen/X86/vector-shift-ashr-sub128.ll
> >> >> >     llvm/trunk/test/CodeGen/X86/vector-shift-by-select-loop.ll
> >> >> >     llvm/trunk/test/CodeGen/X86/vector-shift-lshr-sub128.ll
> >> >> >     llvm/trunk/test/CodeGen/X86/vector-shift-shl-sub128.ll
> >> >> >     llvm/trunk/test/CodeGen/X86/vector-shuffle-128-v16.ll
> >> >> >     llvm/trunk/test/CodeGen/X86/vector-shuffle-combining.ll
> >> >> >     llvm/trunk/test/CodeGen/X86/vector-trunc-packus.ll
> >> >> >     llvm/trunk/test/CodeGen/X86/vector-trunc-ssat.ll
> >> >> >     llvm/trunk/test/CodeGen/X86/vector-trunc-usat.ll
> >> >> >     llvm/trunk/test/CodeGen/X86/vector-trunc.ll
> >> >> >     llvm/trunk/test/CodeGen/X86/vector-truncate-combine.ll
> >> >> >     llvm/trunk/test/CodeGen/X86/vector-zext.ll
> >> >> >     llvm/trunk/test/CodeGen/X86/vsel-cmp-load.ll
> >> >> >     llvm/trunk/test/CodeGen/X86/vselect-avx.ll
> >> >> >     llvm/trunk/test/CodeGen/X86/vselect.ll
> >> >> >     llvm/trunk/test/CodeGen/X86/vshift-4.ll
> >> >> >     llvm/trunk/test/CodeGen/X86/widen_arith-1.ll
> >> >> >     llvm/trunk/test/CodeGen/X86/widen_arith-2.ll
> >> >> >     llvm/trunk/test/CodeGen/X86/widen_arith-3.ll
> >> >> >     llvm/trunk/test/CodeGen/X86/widen_bitops-0.ll
> >> >> >     llvm/trunk/test/CodeGen/X86/widen_cast-1.ll
> >> >> >     llvm/trunk/test/CodeGen/X86/widen_cast-2.ll
> >> >> >     llvm/trunk/test/CodeGen/X86/widen_cast-3.ll
> >> >> >     llvm/trunk/test/CodeGen/X86/widen_cast-4.ll
> >> >> >     llvm/trunk/test/CodeGen/X86/widen_cast-5.ll
> >> >> >     llvm/trunk/test/CodeGen/X86/widen_cast-6.ll
> >> >> >     llvm/trunk/test/CodeGen/X86/widen_compare-1.ll
> >> >> >     llvm/trunk/test/CodeGen/X86/widen_conv-1.ll
> >> >> >     llvm/trunk/test/CodeGen/X86/widen_conv-2.ll
> >> >> >     llvm/trunk/test/CodeGen/X86/widen_conv-3.ll
> >> >> >     llvm/trunk/test/CodeGen/X86/widen_conv-4.ll
> >> >> >     llvm/trunk/test/CodeGen/X86/widen_load-2.ll
> >> >> >     llvm/trunk/test/CodeGen/X86/widen_shuffle-1.ll
> >> >> >     llvm/trunk/test/CodeGen/X86/x86-interleaved-access.ll
> >> >> >     llvm/trunk/test/CodeGen/X86/x86-shifts.ll
> >> >> >     llvm/trunk/test/Transforms/SLPVectorizer/X86/blending-shuffle.ll
> >> >> >     llvm/trunk/test/Transforms/SLPVectorizer/X86/fptosi.ll
> >> >> >     llvm/trunk/test/Transforms/SLPVectorizer/X86/fptoui.ll
> >> >> >     llvm/trunk/test/Transforms/SLPVectorizer/X86/insert-element-build-vector.ll
> >> >> >     llvm/trunk/test/Transforms/SLPVectorizer/X86/sitofp.ll
> >> >> >     llvm/trunk/test/Transforms/SLPVectorizer/X86/uitofp.ll
> >> >> >
> >> >> > Modified: llvm/trunk/lib/Target/X86/X86ISelLowering.cpp
> >> >> > URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/X86/X86ISelLowering.cpp?rev=368183&r1=368182&r2=368183&view=diff
> >> >> > ==============================================================================
> >> >> > --- llvm/trunk/lib/Target/X86/X86ISelLowering.cpp (original)
> >> >> > +++ llvm/trunk/lib/Target/X86/X86ISelLowering.cpp Wed Aug  7 09:24:26 2019
> >> >> > @@ -66,7 +66,7 @@ using namespace llvm;
> >> >> >  STATISTIC(NumTailCalls, "Number of tail calls");
> >> >> >
> >> >> >  static cl::opt<bool> ExperimentalVectorWideningLegalization(
> >> >> > -    "x86-experimental-vector-widening-legalization", cl::init(false),
> >> >> > +    "x86-experimental-vector-widening-legalization", cl::init(true),
> >> >> >      cl::desc("Enable an experimental vector type legalization through widening "
> >> >> >               "rather than promotion."),
> >> >> >      cl::Hidden);
> >> >> > @@ -40453,8 +40453,7 @@ static SDValue combineStore(SDNode *N, S
> >> >> >    bool NoImplicitFloatOps = F.hasFnAttribute(Attribute::NoImplicitFloat);
> >> >> >    bool F64IsLegal =
> >> >> >        !Subtarget.useSoftFloat() && !NoImplicitFloatOps && Subtarget.hasSSE2();
> >> >> > -  if (((VT.isVector() && !VT.isFloatingPoint()) ||
> >> >> > -       (VT == MVT::i64 && F64IsLegal && !Subtarget.is64Bit())) &&
> >> >> > +  if ((VT == MVT::i64 && F64IsLegal && !Subtarget.is64Bit()) &&
> >> >> >        isa<LoadSDNode>(St->getValue()) &&
> >> >> >        !cast<LoadSDNode>(St->getValue())->isVolatile() &&
> >> >> >        St->getChain().hasOneUse() && !St->isVolatile()) {
> >> >> >
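The combineStore hunk removes the integer-vector clause from the condition checked for a store whose value comes straight from a load. The sketch below restates that boolean condition before and after the change, with each clause reduced to a flag; `StoreInfo` and its fields are hypothetical and illustrative, not LLVM APIs. It models only the guard, not the rewrite that the guard enables.

```cpp
// Simplified model of the guard in combineStore before and after r368183.
// Each field is a hypothetical stand-in for one clause of the real condition.
struct StoreInfo {
  bool IsIntVector;      // VT.isVector() && !VT.isFloatingPoint()
  bool IsI64;            // VT == MVT::i64
  bool F64IsLegal;       // SSE2, no soft-float, no noimplicitfloat attribute
  bool Is64Bit;          // Subtarget.is64Bit()
  bool PlainLoadToStore; // value is a non-volatile load, the store is
                         // non-volatile, and the store's chain has one use
};

// Condition as written before the change.
bool combineAppliedBefore(const StoreInfo &S) {
  return (S.IsIntVector || (S.IsI64 && S.F64IsLegal && !S.Is64Bit)) &&
         S.PlainLoadToStore;
}

// Condition after the change: the integer-vector clause is gone.
bool combineAppliesAfter(const StoreInfo &S) {
  return S.IsI64 && S.F64IsLegal && !S.Is64Bit && S.PlainLoadToStore;
}
```

An integer vector copy (for example a <2 x i32> load feeding a store) satisfied the old condition but not the new one, while the i64-on-a-32-bit-target case is unaffected.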
> >> >> > Modified: llvm/trunk/lib/Target/X86/X86TargetTransformInfo.cpp
> >> >> > URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/X86/X86TargetTransformInfo.cpp?rev=368183&r1=368182&r2=368183&view=diff
> >> >> > ==============================================================================
> >> >> > --- llvm/trunk/lib/Target/X86/X86TargetTransformInfo.cpp (original)
> >> >> > +++ llvm/trunk/lib/Target/X86/X86TargetTransformInfo.cpp Wed Aug  7 09:24:26 2019
> >> >> > @@ -887,7 +887,7 @@ int X86TTIImpl::getArithmeticInstrCost(
> >> >> >  int X86TTIImpl::getShuffleCost(TTI::ShuffleKind Kind, Type *Tp, int Index,
> >> >> >                                 Type *SubTp) {
> >> >> >    // 64-bit packed float vectors (v2f32) are widened to type v4f32.
> >> >> > -  // 64-bit packed integer vectors (v2i32) are promoted to type v2i64.
> >> >> > +  // 64-bit packed integer vectors (v2i32) are widened to type v4i32.
> >> >> >    std::pair<int, MVT> LT = TLI->getTypeLegalizationCost(DL, Tp);
> >> >> >
> >> >> >    // Treat Transpose as 2-op shuffles - there's no difference in lowering.
> >> >> > @@ -2425,14 +2425,6 @@ int X86TTIImpl::getAddressComputationCos
> >> >> >
> >> >> >  int X86TTIImpl::getArithmeticReductionCost(unsigned Opcode, Type *ValTy,
> >> >> >                                             bool IsPairwise) {
> >> >> > -
> >> >> > -  std::pair<int, MVT> LT = TLI->getTypeLegalizationCost(DL, ValTy);
> >> >> > -
> >> >> > -  MVT MTy = LT.second;
> >> >> > -
> >> >> > -  int ISD = TLI->InstructionOpcodeToISD(Opcode);
> >> >> > -  assert(ISD && "Invalid opcode");
> >> >> > -
> >> >> >    // We use the Intel Architecture Code Analyzer(IACA) to measure the throughput
> >> >> >    // and make it as the cost.
> >> >> >
> >> >> > @@ -2440,7 +2432,10 @@ int X86TTIImpl::getArithmeticReductionCo
> >> >> >      { ISD::FADD,  MVT::v2f64,   2 },
> >> >> >      { ISD::FADD,  MVT::v4f32,   4 },
> >> >> >      { ISD::ADD,   MVT::v2i64,   2 },      // The data reported by the IACA tool is "1.6".
> >> >> > +    { ISD::ADD,   MVT::v2i32,   2 }, // FIXME: chosen to be less than v4i32.
> >> >> >      { ISD::ADD,   MVT::v4i32,   3 },      // The data reported by the IACA tool is "3.5".
> >> >> > +    { ISD::ADD,   MVT::v2i16,   3 }, // FIXME: chosen to be less than v4i16
> >> >> > +    { ISD::ADD,   MVT::v4i16,   4 }, // FIXME: chosen to be less than v8i16
> >> >> >      { ISD::ADD,   MVT::v8i16,   5 },
> >> >> >    };
> >> >> >
> >> >> > @@ -2449,8 +2444,11 @@ int X86TTIImpl::getArithmeticReductionCo
> >> >> >      { ISD::FADD,  MVT::v4f64,   5 },
> >> >> >      { ISD::FADD,  MVT::v8f32,   7 },
> >> >> >      { ISD::ADD,   MVT::v2i64,   1 },      // The data reported by the IACA tool is "1.5".
> >> >> > +    { ISD::ADD,   MVT::v2i32,   2 }, // FIXME: chosen to be less than v4i32
> >> >> >      { ISD::ADD,   MVT::v4i32,   3 },      // The data reported by the IACA tool is "3.5".
> >> >> >      { ISD::ADD,   MVT::v4i64,   5 },      // The data reported by the IACA tool is "4.8".
> >> >> > +    { ISD::ADD,   MVT::v2i16,   3 }, // FIXME: chosen to be less than v4i16
> >> >> > +    { ISD::ADD,   MVT::v4i16,   4 }, // FIXME: chosen to be less than v8i16
> >> >> >      { ISD::ADD,   MVT::v8i16,   5 },
> >> >> >      { ISD::ADD,   MVT::v8i32,   5 },
> >> >> >    };
> >> >> > @@ -2459,7 +2457,10 @@ int X86TTIImpl::getArithmeticReductionCo
> >> >> >      { ISD::FADD,  MVT::v2f64,   2 },
> >> >> >      { ISD::FADD,  MVT::v4f32,   4 },
> >> >> >      { ISD::ADD,   MVT::v2i64,   2 },      // The data reported by the IACA tool is "1.6".
> >> >> > +    { ISD::ADD,   MVT::v2i32,   2 }, // FIXME: chosen to be less than v4i32
> >> >> >      { ISD::ADD,   MVT::v4i32,   3 },      // The data reported by the IACA tool is "3.3".
> >> >> > +    { ISD::ADD,   MVT::v2i16,   2 },      // The data reported by the IACA tool is "4.3".
> >> >> > +    { ISD::ADD,   MVT::v4i16,   3 },      // The data reported by the IACA tool is "4.3".
> >> >> >      { ISD::ADD,   MVT::v8i16,   4 },      // The data reported by the IACA tool is "4.3".
> >> >> >    };
> >> >> >
> >> >> > @@ -2468,12 +2469,47 @@ int X86TTIImpl::getArithmeticReductionCo
> >> >> >      { ISD::FADD,  MVT::v4f64,   3 },
> >> >> >      { ISD::FADD,  MVT::v8f32,   4 },
> >> >> >      { ISD::ADD,   MVT::v2i64,   1 },      // The data reported by the IACA tool is "1.5".
> >> >> > +    { ISD::ADD,   MVT::v2i32,   2 }, // FIXME: chosen to be less than v4i32
> >> >> >      { ISD::ADD,   MVT::v4i32,   3 },      // The data reported by the IACA tool is "2.8".
> >> >> >      { ISD::ADD,   MVT::v4i64,   3 },
> >> >> > +    { ISD::ADD,   MVT::v2i16,   2 },      // The data reported by the IACA tool is "4.3".
> >> >> > +    { ISD::ADD,   MVT::v4i16,   3 },      // The data reported by the IACA tool is "4.3".
> >> >> >      { ISD::ADD,   MVT::v8i16,   4 },
> >> >> >      { ISD::ADD,   MVT::v8i32,   5 },
> >> >> >    };
> >> >> >
> >> >> > +  int ISD = TLI->InstructionOpcodeToISD(Opcode);
> >> >> > +  assert(ISD && "Invalid opcode");
> >> >> > +
> >> >> > +  // Before legalizing the type, give a chance to look up illegal narrow types
> >> >> > +  // in the table.
> >> >> > +  // FIXME: Is there a better way to do this?
> >> >> > +  EVT VT = TLI->getValueType(DL, ValTy);
> >> >> > +  if (VT.isSimple()) {
> >> >> > +    MVT MTy = VT.getSimpleVT();
> >> >> > +    if (IsPairwise) {
> >> >> > +      if (ST->hasAVX())
> >> >> > +        if (const auto *Entry = CostTableLookup(AVX1CostTblPairWise, ISD, MTy))
> >> >> > +          return Entry->Cost;
> >> >> > +
> >> >> > +      if (ST->hasSSE42())
> >> >> > +        if (const auto *Entry = CostTableLookup(SSE42CostTblPairWise, ISD, MTy))
> >> >> > +          return Entry->Cost;
> >> >> > +    } else {
> >> >> > +      if (ST->hasAVX())
> >> >> > +        if (const auto *Entry = CostTableLookup(AVX1CostTblNoPairWise, ISD, MTy))
> >> >> > +          return Entry->Cost;
> >> >> > +
> >> >> > +      if (ST->hasSSE42())
> >> >> > +        if (const auto *Entry = CostTableLookup(SSE42CostTblNoPairWise, ISD, MTy))
> >> >> > +          return Entry->Cost;
> >> >> > +    }
> >> >> > +  }
> >> >> > +
> >> >> > +  std::pair<int, MVT> LT = TLI->getTypeLegalizationCost(DL, ValTy);
> >> >> > +
> >> >> > +  MVT MTy = LT.second;
> >> >> > +
> >> >> >    if (IsPairwise) {
> >> >> >      if (ST->hasAVX())
> >> >> >        if (const auto *Entry = CostTableLookup(AVX1CostTblPairWise, ISD, MTy))
> >> >> >
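The reworked getArithmeticReductionCost consults the cost tables with the original, possibly illegal narrow type before falling back to the legalized type. Under widening, a v2i32 or v4i16 reduction would otherwise only ever match the v4i32 or v8i16 row, which the new FIXME entries plus the pre-legalization lookup avoid. The simplified C++ model below assumes string type names instead of MVTs, a single table holding the ISD::ADD rows of the third cost-table hunk in this diff, and a hypothetical `legalize` helper in place of getTypeLegalizationCost; it ignores the AVX vs. SSE4.2 and pairwise vs. non-pairwise table choice.

```cpp
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Simplified model of the lookup order in getArithmeticReductionCost.
struct CostEntry {
  std::string Type;
  int Cost;
};

// ISD::ADD rows copied from the third cost-table hunk of the diff.
static const std::vector<CostEntry> AddCostTbl = {
    {"v2i64", 2}, {"v2i32", 2}, {"v4i32", 3},
    {"v2i16", 2}, {"v4i16", 3}, {"v8i16", 4},
};

std::optional<int> lookup(const std::string &Ty) {
  for (const CostEntry &E : AddCostTbl)
    if (E.Type == Ty)
      return E.Cost;
  return std::nullopt;
}

// Hypothetical stand-in for type legalization with widening enabled:
// narrow integer vectors keep their element type and grow to 128 bits.
std::string legalize(const std::string &Ty) {
  static const std::vector<std::pair<std::string, std::string>> Widened = {
      {"v2i32", "v4i32"}, {"v2i16", "v8i16"}, {"v4i16", "v8i16"},
      {"v2i8", "v16i8"},  {"v4i8", "v16i8"},  {"v8i8", "v16i8"},
  };
  for (const auto &P : Widened)
    if (P.first == Ty)
      return P.second;
  return Ty; // already legal, or not covered by this sketch
}

// New order: narrow type first, then the legalized type.
// std::nullopt means the real function would continue past the tables.
std::optional<int> reductionCost(const std::string &Ty) {
  if (std::optional<int> C = lookup(Ty))
    return C;
  return lookup(legalize(Ty));
}

// Legalize-first order: a narrow type can only hit its widened type's row.
std::optional<int> reductionCostLegalizeFirst(const std::string &Ty) {
  return lookup(legalize(Ty));
}
```

Here `reductionCost("v2i32")` yields 2, while the legalize-first order yields the v4i32 cost of 3.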
> >> >> > Modified: llvm/trunk/test/Analysis/CostModel/X86/alternate-shuffle-cost.ll
> >> >> > URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Analysis/CostModel/X86/alternate-shuffle-cost.ll?rev=368183&r1=368182&r2=368183&view=diff
> >> >> > ==============================================================================
> >> >> > --- llvm/trunk/test/Analysis/CostModel/X86/alternate-shuffle-cost.ll (original)
> >> >> > +++ llvm/trunk/test/Analysis/CostModel/X86/alternate-shuffle-cost.ll Wed Aug  7 09:24:26 2019
> >> >> > @@ -18,9 +18,21 @@
> >> >> >  ; 64-bit packed float vectors (v2f32) are widened to type v4f32.
> >> >> >
> >> >> >  define <2 x i32> @test_v2i32(<2 x i32> %a, <2 x i32> %b) {
> >> >> > -; CHECK-LABEL: 'test_v2i32'
> >> >> > -; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %1 = shufflevector <2 x i32> %a, <2 x i32> %b, <2 x i32> <i32 0, i32 3>
> >> >> > -; CHECK-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret <2 x i32> %1
> >> >> > +; SSE2-LABEL: 'test_v2i32'
> >> >> > +; SSE2-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %1 = shufflevector <2 x i32> %a, <2 x i32> %b, <2 x i32> <i32 0, i32 3>
> >> >> > +; SSE2-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret <2 x i32> %1
> >> >> > +;
> >> >> > +; SSSE3-LABEL: 'test_v2i32'
> >> >> > +; SSSE3-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %1 = shufflevector <2 x i32> %a, <2 x i32> %b, <2 x i32> <i32 0, i32 3>
> >> >> > +; SSSE3-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret <2 x i32> %1
> >> >> > +;
> >> >> > +; SSE42-LABEL: 'test_v2i32'
> >> >> > +; SSE42-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %1 = shufflevector <2 x i32> %a, <2 x i32> %b, <2 x i32> <i32 0, i32 3>
> >> >> > +; SSE42-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret <2 x i32> %1
> >> >> > +;
> >> >> > +; AVX-LABEL: 'test_v2i32'
> >> >> > +; AVX-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %1 = shufflevector <2 x i32> %a, <2 x i32> %b, <2 x i32> <i32 0, i32 3>
> >> >> > +; AVX-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret <2 x i32> %1
> >> >> >  ;
> >> >> >  ; BTVER2-LABEL: 'test_v2i32'
> >> >> >  ; BTVER2-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %1 = shufflevector <2 x i32> %a, <2 x i32> %b, <2 x i32> <i32 0, i32 3>
> >> >> > @@ -56,9 +68,21 @@ define <2 x float> @test_v2f32(<2 x floa
> >> >> >  }
> >> >> >
> >> >> >  define <2 x i32> @test_v2i32_2(<2 x i32> %a, <2 x i32> %b) {
> >> >> > -; CHECK-LABEL: 'test_v2i32_2'
> >> >> > -; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %1 = shufflevector <2 x i32> %a, <2 x i32> %b, <2 x i32> <i32 2, i32 1>
> >> >> > -; CHECK-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret <2 x i32> %1
> >> >> > +; SSE2-LABEL: 'test_v2i32_2'
> >> >> > +; SSE2-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %1 = shufflevector <2 x i32> %a, <2 x i32> %b, <2 x i32> <i32 2, i32 1>
> >> >> > +; SSE2-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret <2 x i32> %1
> >> >> > +;
> >> >> > +; SSSE3-LABEL: 'test_v2i32_2'
> >> >> > +; SSSE3-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %1 = shufflevector <2 x i32> %a, <2 x i32> %b, <2 x i32> <i32 2, i32 1>
> >> >> > +; SSSE3-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret <2 x i32> %1
> >> >> > +;
> >> >> > +; SSE42-LABEL: 'test_v2i32_2'
> >> >> > +; SSE42-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %1 = shufflevector <2 x i32> %a, <2 x i32> %b, <2 x i32> <i32 2, i32 1>
> >> >> > +; SSE42-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret <2 x i32> %1
> >> >> > +;
> >> >> > +; AVX-LABEL: 'test_v2i32_2'
> >> >> > +; AVX-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %1 = shufflevector <2 x i32> %a, <2 x i32> %b, <2 x i32> <i32 2, i32 1>
> >> >> > +; AVX-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret <2 x i32> %1
> >> >> >  ;
> >> >> >  ; BTVER2-LABEL: 'test_v2i32_2'
> >> >> >  ; BTVER2-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %1 = shufflevector <2 x i32> %a, <2 x i32> %b, <2 x i32> <i32 2, i32 1>
> >> >> >
> >> >> > Modified: llvm/trunk/test/Analysis/CostModel/X86/arith.ll
> >> >> > URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Analysis/CostModel/X86/arith.ll?rev=368183&r1=368182&r2=368183&view=diff
> >> >> > ==============================================================================
> >> >> > --- llvm/trunk/test/Analysis/CostModel/X86/arith.ll (original)
> >> >> > +++ llvm/trunk/test/Analysis/CostModel/X86/arith.ll Wed Aug  7 09:24:26 2019
> >> >> > @@ -1342,36 +1342,32 @@ define i32 @mul(i32 %arg) {
> >> >> >  ; A <2 x i64> vector multiply is implemented using
> >> >> >  ; 3 PMULUDQ and 2 PADDS and 4 shifts.
> >> >> >  define void @mul_2i32() {
> >> >> > -; SSE-LABEL: 'mul_2i32'
> >> >> > -; SSE-NEXT:  Cost Model: Found an estimated cost of 8 for instruction: %A0 = mul <2 x i32> undef, undef
> >> >> > -; SSE-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret void
> >> >> > +; SSSE3-LABEL: 'mul_2i32'
> >> >> > +; SSSE3-NEXT:  Cost Model: Found an estimated cost of 6 for instruction: %A0 = mul <2 x i32> undef, undef
> >> >> > +; SSSE3-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret void
> >> >> > +;
> >> >> > +; SSE42-LABEL: 'mul_2i32'
> >> >> > +; SSE42-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %A0 = mul <2 x i32> undef, undef
> >> >> > +; SSE42-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret void
> >> >> >  ;
> >> >> >  ; AVX-LABEL: 'mul_2i32'
> >> >> > -; AVX-NEXT:  Cost Model: Found an estimated cost of 8 for instruction: %A0 = mul <2 x i32> undef, undef
> >> >> > +; AVX-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %A0 = mul <2 x i32> undef, undef
> >> >> >  ; AVX-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret void
> >> >> >  ;
> >> >> > -; AVX512F-LABEL: 'mul_2i32'
> >> >> > -; AVX512F-NEXT:  Cost Model: Found an estimated cost of 8 for instruction: %A0 = mul <2 x i32> undef, undef
> >> >> > -; AVX512F-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret void
> >> >> > -;
> >> >> > -; AVX512BW-LABEL: 'mul_2i32'
> >> >> > -; AVX512BW-NEXT:  Cost Model: Found an estimated cost of 8 for instruction: %A0 = mul <2 x i32> undef, undef
> >> >> > -; AVX512BW-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret void
> >> >> > -;
> >> >> > -; AVX512DQ-LABEL: 'mul_2i32'
> >> >> > -; AVX512DQ-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %A0 = mul <2 x i32> undef, undef
> >> >> > -; AVX512DQ-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret void
> >> >> > +; AVX512-LABEL: 'mul_2i32'
> >> >> > +; AVX512-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %A0 = mul <2 x i32> undef, undef
> >> >> > +; AVX512-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret void
> >> >> >  ;
> >> >> >  ; SLM-LABEL: 'mul_2i32'
> >> >> > -; SLM-NEXT:  Cost Model: Found an estimated cost of 17 for instruction: %A0 = mul <2 x i32> undef, undef
> >> >> > +; SLM-NEXT:  Cost Model: Found an estimated cost of 11 for instruction: %A0 = mul <2 x i32> undef, undef
> >> >> >  ; SLM-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret void
> >> >> >  ;
> >> >> >  ; GLM-LABEL: 'mul_2i32'
> >> >> > -; GLM-NEXT:  Cost Model: Found an estimated cost of 8 for instruction: %A0 = mul <2 x i32> undef, undef
> >> >> > +; GLM-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %A0 = mul <2 x i32> undef, undef
> >> >> >  ; GLM-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret void
> >> >> >  ;
> >> >> >  ; BTVER2-LABEL: 'mul_2i32'
> >> >> > -; BTVER2-NEXT:  Cost Model: Found an estimated cost of 8 for instruction: %A0 = mul <2 x i32> undef, undef
> >> >> > +; BTVER2-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %A0 = mul <2 x i32> undef, undef
> >> >> >  ; BTVER2-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret void
> >> >> >  ;
> >> >> >    %A0 = mul <2 x i32> undef, undef
> >> >> >
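The context comment in the arith.ll hunk above says a <2 x i64> vector multiply is built from 3 PMULUDQ, 2 adds and 4 shifts. The old 8-cost figure for `mul <2 x i32>` is consistent with the <2 x i32> being promoted to <2 x i64> before widening was the default. The sketch below is a scalar model of one 64-bit lane of that decomposition. It is not LLVM's actual lowering code. The helper names `pmuludq_lane` and `mul64_via_pmuludq` are hypothetical, and `pmuludq_lane` only models one lane's semantics: low 32 bits of each operand, full 64-bit product.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: models one 64-bit lane of PMULUDQ, which multiplies
   the low 32 bits of each operand and keeps the full 64-bit product. */
static uint64_t pmuludq_lane(uint64_t a, uint64_t b) {
    return (uint64_t)(uint32_t)a * (uint64_t)(uint32_t)b;
}

/* Hypothetical helper: 64x64->64-bit multiply from three 32x32->64
   multiplies. With a = ahi*2^32 + alo and b = bhi*2^32 + blo,
   a*b mod 2^64 = alo*blo + ((ahi*blo) << 32) + ((alo*bhi) << 32);
   the ahi*bhi term is shifted out entirely and can be dropped. */
static uint64_t mul64_via_pmuludq(uint64_t a, uint64_t b) {
    uint64_t lo_lo = pmuludq_lane(a, b);        /* multiply 1: alo * blo */
    uint64_t hi_lo = pmuludq_lane(a >> 32, b);  /* shift 1, multiply 2: ahi * blo */
    uint64_t lo_hi = pmuludq_lane(a, b >> 32);  /* shift 2, multiply 3: alo * bhi */
    /* shifts 3 and 4, adds 1 and 2; unsigned wraparound gives mod 2^64 */
    return lo_lo + (hi_lo << 32) + (lo_hi << 32);
}
```

Shifting each cross product separately is what makes the count come out to 2 adds and 4 shifts, as in the test comment. Adding the two cross products first and shifting once would save one shift and give the same result.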
> >> >> > Modified: llvm/trunk/test/Analysis/CostModel/X86/cast.ll
> >> >> > URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Analysis/CostModel/X86/cast.ll?rev=368183&r1=368182&r2=368183&view=diff
> >> >> > ==============================================================================
> >> >> > --- llvm/trunk/test/Analysis/CostModel/X86/cast.ll (original)
> >> >> > +++ llvm/trunk/test/Analysis/CostModel/X86/cast.ll Wed Aug  7 09:24:26 2019
> >> >> > @@ -315,10 +315,10 @@ define void @sitofp4(<4 x i1> %a, <4 x i
> >> >> >  ; SSE-LABEL: 'sitofp4'
> >> >> >  ; SSE-NEXT:  Cost Model: Found an estimated cost of 5 for instruction: %A1 = sitofp <4 x i1> %a to <4 x float>
> >> >> >  ; SSE-NEXT:  Cost Model: Found an estimated cost of 40 for instruction: %A2 = sitofp <4 x i1> %a to <4 x double>
> >> >> > -; SSE-NEXT:  Cost Model: Found an estimated cost of 5 for instruction: %B1 = sitofp <4 x i8> %b to <4 x float>
> >> >> > -; SSE-NEXT:  Cost Model: Found an estimated cost of 40 for instruction: %B2 = sitofp <4 x i8> %b to <4 x double>
> >> >> > -; SSE-NEXT:  Cost Model: Found an estimated cost of 5 for instruction: %C1 = sitofp <4 x i16> %c to <4 x float>
> >> >> > -; SSE-NEXT:  Cost Model: Found an estimated cost of 40 for instruction: %C2 = sitofp <4 x i16> %c to <4 x double>
> >> >> > +; SSE-NEXT:  Cost Model: Found an estimated cost of 8 for instruction: %B1 = sitofp <4 x i8> %b to <4 x float>
> >> >> > +; SSE-NEXT:  Cost Model: Found an estimated cost of 160 for instruction: %B2 = sitofp <4 x i8> %b to <4 x double>
> >> >> > +; SSE-NEXT:  Cost Model: Found an estimated cost of 15 for instruction: %C1 = sitofp <4 x i16> %c to <4 x float>
> >> >> > +; SSE-NEXT:  Cost Model: Found an estimated cost of 80 for instruction: %C2 = sitofp <4 x i16> %c to <4 x double>
> >> >> >  ; SSE-NEXT:  Cost Model: Found an estimated cost of 5 for instruction: %D1 = sitofp <4 x i32> %d to <4 x float>
> >> >> >  ; SSE-NEXT:  Cost Model: Found an estimated cost of 40 for instruction: %D2 = sitofp <4 x i32> %d to <4 x double>
> >> >> >  ; SSE-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret void
> >> >> > @@ -359,7 +359,7 @@ define void @sitofp4(<4 x i1> %a, <4 x i
> >> >> >  define void @sitofp8(<8 x i1> %a, <8 x i8> %b, <8 x i16> %c, <8 x i32> %d) {
> >> >> >  ; SSE-LABEL: 'sitofp8'
> >> >> >  ; SSE-NEXT:  Cost Model: Found an estimated cost of 15 for instruction: %A1 = sitofp <8 x i1> %a to <8 x float>
> >> >> > -; SSE-NEXT:  Cost Model: Found an estimated cost of 15 for instruction: %B1 = sitofp <8 x i8> %b to <8 x float>
> >> >> > +; SSE-NEXT:  Cost Model: Found an estimated cost of 8 for instruction: %B1 = sitofp <8 x i8> %b to <8 x float>
> >> >> >  ; SSE-NEXT:  Cost Model: Found an estimated cost of 15 for instruction: %C1 = sitofp <8 x i16> %c to <8 x float>
> >> >> >  ; SSE-NEXT:  Cost Model: Found an estimated cost of 10 for instruction: %D1 = sitofp <8 x i32> %d to <8 x float>
> >> >> >  ; SSE-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret void
> >> >> > @@ -390,9 +390,9 @@ define void @uitofp4(<4 x i1> %a, <4 x i
> >> >> >  ; SSE-NEXT:  Cost Model: Found an estimated cost of 8 for instruction: %A1 = uitofp <4 x i1> %a to <4 x float>
> >> >> >  ; SSE-NEXT:  Cost Model: Found an estimated cost of 40 for instruction: %A2 = uitofp <4 x i1> %a to <4 x double>
> >> >> >  ; SSE-NEXT:  Cost Model: Found an estimated cost of 8 for instruction: %B1 = uitofp <4 x i8> %b to <4 x float>
> >> >> > -; SSE-NEXT:  Cost Model: Found an estimated cost of 40 for instruction: %B2 = uitofp <4 x i8> %b to <4 x double>
> >> >> > -; SSE-NEXT:  Cost Model: Found an estimated cost of 8 for instruction: %C1 = uitofp <4 x i16> %c to <4 x float>
> >> >> > -; SSE-NEXT:  Cost Model: Found an estimated cost of 40 for instruction: %C2 = uitofp <4 x i16> %c to <4 x double>
> >> >> > +; SSE-NEXT:  Cost Model: Found an estimated cost of 160 for instruction: %B2 = uitofp <4 x i8> %b to <4 x double>
> >> >> > +; SSE-NEXT:  Cost Model: Found an estimated cost of 15 for instruction: %C1 = uitofp <4 x i16> %c to <4 x float>
> >> >> > +; SSE-NEXT:  Cost Model: Found an estimated cost of 80 for instruction: %C2 = uitofp <4 x i16> %c to <4 x double>
> >> >> >  ; SSE-NEXT:  Cost Model: Found an estimated cost of 8 for instruction: %D1 = uitofp <4 x i32> %d to <4 x float>
> >> >> >  ; SSE-NEXT:  Cost Model: Found an estimated cost of 40 for instruction: %D2 = uitofp <4 x i32> %d to <4 x double>
> >> >> >  ; SSE-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret void
> >> >> > @@ -433,7 +433,7 @@ define void @uitofp4(<4 x i1> %a, <4 x i
> >> >> >  define void @uitofp8(<8 x i1> %a, <8 x i8> %b, <8 x i16> %c, <8 x i32> %d) {
> >> >> >  ; SSE-LABEL: 'uitofp8'
> >> >> >  ; SSE-NEXT:  Cost Model: Found an estimated cost of 15 for instruction: %A1 = uitofp <8 x i1> %a to <8 x float>
> >> >> > -; SSE-NEXT:  Cost Model: Found an estimated cost of 15 for instruction: %B1 = uitofp <8 x i8> %b to <8 x float>
> >> >> > +; SSE-NEXT:  Cost Model: Found an estimated cost of 8 for instruction: %B1 = uitofp <8 x i8> %b to <8 x float>
> >> >> >  ; SSE-NEXT:  Cost Model: Found an estimated cost of 15 for instruction: %C1 = uitofp <8 x i16> %c to <8 x float>
> >> >> >  ; SSE-NEXT:  Cost Model: Found an estimated cost of 16 for instruction: %D1 = uitofp <8 x i32> %d to <8 x float>
> >> >> >  ; SSE-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret void
> >> >> >
> >> >> > Modified: llvm/trunk/test/Analysis/CostModel/X86/fptosi.ll
> >> >> > URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Analysis/CostModel/X86/fptosi.ll?rev=368183&r1=368182&r2=368183&view=diff
> >> >> > ==============================================================================
> >> >> > --- llvm/trunk/test/Analysis/CostModel/X86/fptosi.ll (original)
> >> >> > +++ llvm/trunk/test/Analysis/CostModel/X86/fptosi.ll Wed Aug  7 09:24:26 2019
> >> >> > @@ -92,35 +92,28 @@ define i32 @fptosi_double_i32(i32 %arg)
> >> >> >  define i32 @fptosi_double_i16(i32 %arg) {
> >> >> >  ; SSE-LABEL: 'fptosi_double_i16'
> >> >> >  ; SSE-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %I16 = fptosi double undef to i16
> >> >> > -; SSE-NEXT:  Cost Model: Found an estimated cost of 6 for instruction: %V2I16 = fptosi <2 x double> undef to <2 x i16>
> >> >> > -; SSE-NEXT:  Cost Model: Found an estimated cost of 13 for instruction: %V4I16 = fptosi <4 x double> undef to <4 x i16>
> >> >> > -; SSE-NEXT:  Cost Model: Found an estimated cost of 27 for instruction: %V8I16 = fptosi <8 x double> undef to <8 x i16>
> >> >> > +; SSE-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V2I16 = fptosi <2 x double> undef to <2 x i16>
> >> >> > +; SSE-NEXT:  Cost Model: Found an estimated cost of 3 for instruction: %V4I16 = fptosi <4 x double> undef to <4 x i16>
> >> >> > +; SSE-NEXT:  Cost Model: Found an estimated cost of 7 for instruction: %V8I16 = fptosi <8 x double> undef to <8 x i16>
> >> >> >  ; SSE-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
> >> >> >  ;
> >> >> >  ; AVX-LABEL: 'fptosi_double_i16'
> >> >> >  ; AVX-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %I16 = fptosi double undef to i16
> >> >> > -; AVX-NEXT:  Cost Model: Found an estimated cost of 6 for instruction: %V2I16 = fptosi <2 x double> undef to <2 x i16>
> >> >> > +; AVX-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V2I16 = fptosi <2 x double> undef to <2 x i16>
> >> >> >  ; AVX-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V4I16 = fptosi <4 x double> undef to <4 x i16>
> >> >> >  ; AVX-NEXT:  Cost Model: Found an estimated cost of 3 for instruction: %V8I16 = fptosi <8 x double> undef to <8 x i16>
> >> >> >  ; AVX-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
> >> >> >  ;
> >> >> > -; AVX512F-LABEL: 'fptosi_double_i16'
> >> >> > -; AVX512F-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %I16 = fptosi double undef to i16
> >> >> > -; AVX512F-NEXT:  Cost Model: Found an estimated cost of 6 for instruction: %V2I16 = fptosi <2 x double> undef to <2 x i16>
> >> >> > -; AVX512F-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V4I16 = fptosi <4 x double> undef to <4 x i16>
> >> >> > -; AVX512F-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V8I16 = fptosi <8 x double> undef to <8 x i16>
> >> >> > -; AVX512F-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
> >> >> > -;
> >> >> > -; AVX512DQ-LABEL: 'fptosi_double_i16'
> >> >> > -; AVX512DQ-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %I16 = fptosi double undef to i16
> >> >> > -; AVX512DQ-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V2I16 = fptosi <2 x double> undef to <2 x i16>
> >> >> > -; AVX512DQ-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V4I16 = fptosi <4 x double> undef to <4 x i16>
> >> >> > -; AVX512DQ-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V8I16 = fptosi <8 x double> undef to <8 x i16>
> >> >> > -; AVX512DQ-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
> >> >> > +; AVX512-LABEL: 'fptosi_double_i16'
> >> >> > +; AVX512-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %I16 = fptosi double undef to i16
> >> >> > +; AVX512-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V2I16 = fptosi <2 x double> undef to <2 x i16>
> >> >> > +; AVX512-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V4I16 = fptosi <4 x double> undef to <4 x i16>
> >> >> > +; AVX512-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V8I16 = fptosi <8 x double> undef to <8 x i16>
> >> >> > +; AVX512-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
> >> >> >  ;
> >> >> >  ; BTVER2-LABEL: 'fptosi_double_i16'
> >> >> >  ; BTVER2-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %I16 = fptosi double undef to i16
> >> >> > -; BTVER2-NEXT:  Cost Model: Found an estimated cost of 6 for instruction: %V2I16 = fptosi <2 x double> undef to <2 x i16>
> >> >> > +; BTVER2-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V2I16 = fptosi <2 x double> undef to <2 x i16>
> >> >> >  ; BTVER2-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V4I16 = fptosi <4 x double> undef to <4 x i16>
> >> >> >  ; BTVER2-NEXT:  Cost Model: Found an estimated cost of 3 for instruction: %V8I16 = fptosi <8 x double> undef to <8 x i16>
> >> >> >  ; BTVER2-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
> >> >> > @@ -143,29 +136,22 @@ define i32 @fptosi_double_i8(i32 %arg) {
> >> >> >  ; AVX-LABEL: 'fptosi_double_i8'
> >> >> >  ; AVX-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %I8 = fptosi double undef to i8
> >> >> >  ; AVX-NEXT:  Cost Model: Found an estimated cost of 6 for instruction: %V2I8 = fptosi <2 x double> undef to <2 x i8>
> >> >> > -; AVX-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V4I8 = fptosi <4 x double> undef to <4 x i8>
> >> >> > -; AVX-NEXT:  Cost Model: Found an estimated cost of 3 for instruction: %V8I8 = fptosi <8 x double> undef to <8 x i8>
> >> >> > +; AVX-NEXT:  Cost Model: Found an estimated cost of 12 for instruction: %V4I8 = fptosi <4 x double> undef to <4 x i8>
> >> >> > +; AVX-NEXT:  Cost Model: Found an estimated cost of 25 for instruction: %V8I8 = fptosi <8 x double> undef to <8 x i8>
> >> >> >  ; AVX-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
> >> >> >  ;
> >> >> > -; AVX512F-LABEL: 'fptosi_double_i8'
> >> >> > -; AVX512F-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %I8 = fptosi double undef to i8
> >> >> > -; AVX512F-NEXT:  Cost Model: Found an estimated cost of 6 for instruction: %V2I8 = fptosi <2 x double> undef to <2 x i8>
> >> >> > -; AVX512F-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V4I8 = fptosi <4 x double> undef to <4 x i8>
> >> >> > -; AVX512F-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V8I8 = fptosi <8 x double> undef to <8 x i8>
> >> >> > -; AVX512F-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
> >> >> > -;
> >> >> > -; AVX512DQ-LABEL: 'fptosi_double_i8'
> >> >> > -; AVX512DQ-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %I8 = fptosi double undef to i8
> >> >> > -; AVX512DQ-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V2I8 = fptosi <2 x double> undef to <2 x i8>
> >> >> > -; AVX512DQ-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V4I8 = fptosi <4 x double> undef to <4 x i8>
> >> >> > -; AVX512DQ-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V8I8 = fptosi <8 x double> undef to <8 x i8>
> >> >> > -; AVX512DQ-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
> >> >> > +; AVX512-LABEL: 'fptosi_double_i8'
> >> >> > +; AVX512-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %I8 = fptosi double undef to i8
> >> >> > +; AVX512-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V2I8 = fptosi <2 x double> undef to <2 x i8>
> >> >> > +; AVX512-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V4I8 = fptosi <4 x double> undef to <4 x i8>
> >> >> > +; AVX512-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V8I8 = fptosi <8 x double> undef to <8 x i8>
> >> >> > +; AVX512-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
> >> >> >  ;
> >> >> >  ; BTVER2-LABEL: 'fptosi_double_i8'
> >> >> >  ; BTVER2-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %I8 = fptosi double undef to i8
> >> >> >  ; BTVER2-NEXT:  Cost Model: Found an estimated cost of 6 for instruction: %V2I8 = fptosi <2 x double> undef to <2 x i8>
> >> >> > -; BTVER2-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V4I8 = fptosi <4 x double> undef to <4 x i8>
> >> >> > -; BTVER2-NEXT:  Cost Model: Found an estimated cost of 3 for instruction: %V8I8 = fptosi <8 x double> undef to <8 x i8>
> >> >> > +; BTVER2-NEXT:  Cost Model: Found an estimated cost of 12 for instruction: %V4I8 = fptosi <4 x double> undef to <4 x i8>
> >> >> > +; BTVER2-NEXT:  Cost Model: Found an estimated cost of 25 for instruction: %V8I8 = fptosi <8 x double> undef to <8 x i8>
> >> >> >  ; BTVER2-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
> >> >> >  ;
> >> >> >    %I8 = fptosi double undef to i8
> >> >> > @@ -285,9 +271,9 @@ define i32 @fptosi_float_i16(i32 %arg) {
> >> >> >  define i32 @fptosi_float_i8(i32 %arg) {
> >> >> >  ; SSE-LABEL: 'fptosi_float_i8'
> >> >> >  ; SSE-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %I8 = fptosi float undef to i8
> >> >> > -; SSE-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V4I8 = fptosi <4 x float> undef to <4 x i8>
> >> >> > -; SSE-NEXT:  Cost Model: Found an estimated cost of 3 for instruction: %V8I8 = fptosi <8 x float> undef to <8 x i8>
> >> >> > -; SSE-NEXT:  Cost Model: Found an estimated cost of 7 for instruction: %V16I8 = fptosi <16 x float> undef to <16 x i8>
> >> >> > +; SSE-NEXT:  Cost Model: Found an estimated cost of 12 for instruction: %V4I8 = fptosi <4 x float> undef to <4 x i8>
> >> >> > +; SSE-NEXT:  Cost Model: Found an estimated cost of 25 for instruction: %V8I8 = fptosi <8 x float> undef to <8 x i8>
> >> >> > +; SSE-NEXT:  Cost Model: Found an estimated cost of 51 for instruction: %V16I8 = fptosi <16 x float> undef to <16 x i8>
> >> >> >  ; SSE-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
> >> >> >  ;
> >> >> >  ; AVX-LABEL: 'fptosi_float_i8'
> >> >> >
> >> >> > Modified: llvm/trunk/test/Analysis/CostModel/X86/fptoui.ll
> >> >> > URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Analysis/CostModel/X86/fptoui.ll?rev=368183&r1=368182&r2=368183&view=diff
> >> >> > ==============================================================================
> >> >> > --- llvm/trunk/test/Analysis/CostModel/X86/fptoui.ll (original)
> >> >> > +++ llvm/trunk/test/Analysis/CostModel/X86/fptoui.ll Wed Aug  7 09:24:26 2019
> >> >> > @@ -68,19 +68,12 @@ define i32 @fptoui_double_i32(i32 %arg)
> >> >> >  ; AVX-NEXT:  Cost Model: Found an estimated cost of 33 for instruction: %V8I32 = fptoui <8 x double> undef to <8 x i32>
> >> >> >  ; AVX-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
> >> >> >  ;
> >> >> > -; AVX512F-LABEL: 'fptoui_double_i32'
> >> >> > -; AVX512F-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %I32 = fptoui double undef to i32
> >> >> > -; AVX512F-NEXT:  Cost Model: Found an estimated cost of 6 for instruction: %V2I32 = fptoui <2 x double> undef to <2 x i32>
> >> >> > -; AVX512F-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V4I32 = fptoui <4 x double> undef to <4 x i32>
> >> >> > -; AVX512F-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V8I32 = fptoui <8 x double> undef to <8 x i32>
> >> >> > -; AVX512F-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
> >> >> > -;
> >> >> > -; AVX512DQ-LABEL: 'fptoui_double_i32'
> >> >> > -; AVX512DQ-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %I32 = fptoui double undef to i32
> >> >> > -; AVX512DQ-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V2I32 = fptoui <2 x double> undef to <2 x i32>
> >> >> > -; AVX512DQ-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V4I32 = fptoui <4 x double> undef to <4 x i32>
> >> >> > -; AVX512DQ-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V8I32 = fptoui <8 x double> undef to <8 x i32>
> >> >> > -; AVX512DQ-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
> >> >> > +; AVX512-LABEL: 'fptoui_double_i32'
> >> >> > +; AVX512-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %I32 = fptoui double undef to i32
> >> >> > +; AVX512-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V2I32 = fptoui <2 x double> undef to <2 x i32>
> >> >> > +; AVX512-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V4I32 = fptoui <4 x double> undef to <4 x i32>
> >> >> > +; AVX512-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V8I32 = fptoui <8 x double> undef to <8 x i32>
> >> >> > +; AVX512-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
> >> >> >  ;
> >> >> >  ; BTVER2-LABEL: 'fptoui_double_i32'
> >> >> >  ; BTVER2-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %I32 = fptoui double undef to i32
> >> >> > @@ -106,30 +99,23 @@ define i32 @fptoui_double_i16(i32 %arg)
> >> >> >  ;
> >> >> >  ; AVX-LABEL: 'fptoui_double_i16'
> >> >> >  ; AVX-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %I16 = fptoui double undef to i16
> >> >> > -; AVX-NEXT:  Cost Model: Found an estimated cost of 6 for instruction: %V2I16 = fptoui <2 x double> undef to <2 x i16>
> >> >> > -; AVX-NEXT:  Cost Model: Found an estimated cost of 12 for instruction: %V4I16 = fptoui <4 x double> undef to <4 x i16>
> >> >> > -; AVX-NEXT:  Cost Model: Found an estimated cost of 25 for instruction: %V8I16 = fptoui <8 x double> undef to <8 x i16>
> >> >> > +; AVX-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V2I16 = fptoui <2 x double> undef to <2 x i16>
> >> >> > +; AVX-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V4I16 = fptoui <4 x double> undef to <4 x i16>
> >> >> > +; AVX-NEXT:  Cost Model: Found an estimated cost of 3 for instruction: %V8I16 = fptoui <8 x double> undef to <8 x i16>
> >> >> >  ; AVX-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
> >> >> >  ;
> >> >> > -; AVX512F-LABEL: 'fptoui_double_i16'
> >> >> > -; AVX512F-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %I16 = fptoui double undef to i16
> >> >> > -; AVX512F-NEXT:  Cost Model: Found an estimated cost of 6 for instruction: %V2I16 = fptoui <2 x double> undef to <2 x i16>
> >> >> > -; AVX512F-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V4I16 = fptoui <4 x double> undef to <4 x i16>
> >> >> > -; AVX512F-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %V8I16 = fptoui <8 x double> undef to <8 x i16>
> >> >> > -; AVX512F-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
> >> >> > -;
> >> >> > -; AVX512DQ-LABEL: 'fptoui_double_i16'
> >> >> > -; AVX512DQ-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %I16 = fptoui double undef to i16
> >> >> > -; AVX512DQ-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V2I16 = fptoui <2 x double> undef to <2 x i16>
> >> >> > -; AVX512DQ-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V4I16 = fptoui <4 x double> undef to <4 x i16>
> >> >> > -; AVX512DQ-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %V8I16 = fptoui <8 x double> undef to <8 x i16>
> >> >> > -; AVX512DQ-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
> >> >> > +; AVX512-LABEL: 'fptoui_double_i16'
> >> >> > +; AVX512-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %I16 = fptoui double undef to i16
> >> >> > +; AVX512-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V2I16 = fptoui <2 x double> undef to <2 x i16>
> >> >> > +; AVX512-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V4I16 = fptoui <4 x double> undef to <4 x i16>
> >> >> > +; AVX512-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %V8I16 = fptoui <8 x double> undef to <8 x i16>
> >> >> > +; AVX512-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
> >> >> >  ;
> >> >> >  ; BTVER2-LABEL: 'fptoui_double_i16'
> >> >> >  ; BTVER2-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %I16 = fptoui double undef to i16
> >> >> > -; BTVER2-NEXT:  Cost Model: Found an estimated cost of 6 for instruction: %V2I16 = fptoui <2 x double> undef to <2 x i16>
> >> >> > -; BTVER2-NEXT:  Cost Model: Found an estimated cost of 12 for instruction: %V4I16 = fptoui <4 x double> undef to <4 x i16>
> >> >> > -; BTVER2-NEXT:  Cost Model: Found an estimated cost of 25 for instruction: %V8I16 = fptoui <8 x double> undef to <8 x i16>
> >> >> > +; BTVER2-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V2I16 = fptoui <2 x double> undef to <2 x i16>
> >> >> > +; BTVER2-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V4I16 = fptoui <4 x double> undef to <4 x i16>
> >> >> > +; BTVER2-NEXT:  Cost Model: Found an estimated cost of 3 for instruction: %V8I16 = fptoui <8 x double> undef to <8 x i16>
> >> >> >  ; BTVER2-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
> >> >> >  ;
> >> >> >    %I16 = fptoui double undef to i16
> >> >> > @@ -154,19 +140,12 @@ define i32 @fptoui_double_i8(i32 %arg) {
> >> >> >  ; AVX-NEXT:  Cost Model: Found an estimated cost of 25 for instruction: %V8I8 = fptoui <8 x double> undef to <8 x i8>
> >> >> >  ; AVX-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
> >> >> >  ;
> >> >> > -; AVX512F-LABEL: 'fptoui_double_i8'
> >> >> > -; AVX512F-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %I8 = fptoui double undef to i8
> >> >> > -; AVX512F-NEXT:  Cost Model: Found an estimated cost of 6 for instruction: %V2I8 = fptoui <2 x double> undef to <2 x i8>
> >> >> > -; AVX512F-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V4I8 = fptoui <4 x double> undef to <4 x i8>
> >> >> > -; AVX512F-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %V8I8 = fptoui <8 x double> undef to <8 x i8>
> >> >> > -; AVX512F-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
> >> >> > -;
> >> >> > -; AVX512DQ-LABEL: 'fptoui_double_i8'
> >> >> > -; AVX512DQ-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %I8 = fptoui double undef to i8
> >> >> > -; AVX512DQ-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V2I8 = fptoui <2 x double> undef to <2 x i8>
> >> >> > -; AVX512DQ-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V4I8 = fptoui <4 x double> undef to <4 x i8>
> >> >> > -; AVX512DQ-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %V8I8 = fptoui <8 x double> undef to <8 x i8>
> >> >> > -; AVX512DQ-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
> >> >> > +; AVX512-LABEL: 'fptoui_double_i8'
> >> >> > +; AVX512-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %I8 = fptoui double undef to i8
> >> >> > +; AVX512-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V2I8 = fptoui <2 x double> undef to <2 x i8>
> >> >> > +; AVX512-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V4I8 = fptoui <4 x double> undef to <4 x i8>
> >> >> > +; AVX512-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %V8I8 = fptoui <8 x double> undef to <8 x i8>
> >> >> > +; AVX512-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
> >> >> >  ;
> >> >> >  ; BTVER2-LABEL: 'fptoui_double_i8'
> >> >> >  ; BTVER2-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %I8 = fptoui double undef to i8
> >> >> > @@ -277,7 +256,7 @@ define i32 @fptoui_float_i16(i32 %arg) {
> >> >> >  ;
> >> >> >  ; AVX-LABEL: 'fptoui_float_i16'
> >> >> >  ; AVX-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %I16 = fptoui float undef to i16
> >> >> > -; AVX-NEXT:  Cost Model: Found an estimated cost of 12 for instruction: %V4I16 = fptoui <4 x float> undef to <4 x i16>
> >> >> > +; AVX-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V4I16 = fptoui <4 x float> undef to <4 x i16>
> >> >> >  ; AVX-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V8I16 = fptoui <8 x float> undef to <8 x i16>
> >> >> >  ; AVX-NEXT:  Cost Model: Found an estimated cost of 3 for instruction: %V16I16 = fptoui <16 x float> undef to <16 x i16>
> >> >> >  ; AVX-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
> >> >> > @@ -291,7 +270,7 @@ define i32 @fptoui_float_i16(i32 %arg) {
> >> >> >  ;
> >> >> >  ; BTVER2-LABEL: 'fptoui_float_i16'
> >> >> >  ; BTVER2-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %I16 = fptoui float undef to i16
> >> >> > -; BTVER2-NEXT:  Cost Model: Found an estimated cost of 12 for instruction: %V4I16 = fptoui <4 x float> undef to <4 x i16>
> >> >> > +; BTVER2-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V4I16 = fptoui <4 x float> undef to <4 x i16>
> >> >> >  ; BTVER2-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V8I16 = fptoui <8 x float> undef to <8 x i16>
> >> >> >  ; BTVER2-NEXT:  Cost Model: Found an estimated cost of 3 for instruction: %V16I16 = fptoui <16 x float> undef to <16 x i16>
> >> >> >  ; BTVER2-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
> >> >> > @@ -314,8 +293,8 @@ define i32 @fptoui_float_i8(i32 %arg) {
> >> >> >  ; AVX-LABEL: 'fptoui_float_i8'
> >> >> >  ; AVX-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %I8 = fptoui float undef to i8
> >> >> >  ; AVX-NEXT:  Cost Model: Found an estimated cost of 12 for instruction: %V4I8 = fptoui <4 x float> undef to <4 x i8>
> >> >> > -; AVX-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V8I8 = fptoui <8 x float> undef to <8 x i8>
> >> >> > -; AVX-NEXT:  Cost Model: Found an estimated cost of 3 for instruction: %V16I8 = fptoui <16 x float> undef to <16 x i8>
> >> >> > +; AVX-NEXT:  Cost Model: Found an estimated cost of 24 for instruction: %V8I8 = fptoui <8 x float> undef to <8 x i8>
> >> >> > +; AVX-NEXT:  Cost Model: Found an estimated cost of 49 for instruction: %V16I8 = fptoui <16 x float> undef to <16 x i8>
> >> >> >  ; AVX-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
> >> >> >  ;
> >> >> >  ; AVX512-LABEL: 'fptoui_float_i8'
> >> >> > @@ -328,8 +307,8 @@ define i32 @fptoui_float_i8(i32 %arg) {
> >> >> >  ; BTVER2-LABEL: 'fptoui_float_i8'
> >> >> >  ; BTVER2-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %I8 = fptoui float undef to i8
> >> >> >  ; BTVER2-NEXT:  Cost Model: Found an estimated cost of 12 for instruction: %V4I8 = fptoui <4 x float> undef to <4 x i8>
> >> >> > -; BTVER2-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V8I8 = fptoui <8 x float> undef to <8 x i8>
> >> >> > -; BTVER2-NEXT:  Cost Model: Found an estimated cost of 3 for instruction: %V16I8 = fptoui <16 x float> undef to <16 x i8>
> >> >> > +; BTVER2-NEXT:  Cost Model: Found an estimated cost of 24 for instruction: %V8I8 = fptoui <8 x float> undef to <8 x i8>
> >> >> > +; BTVER2-NEXT:  Cost Model: Found an estimated cost of 49 for instruction: %V16I8 = fptoui <16 x float> undef to <16 x i8>
> >> >> >  ; BTVER2-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
> >> >> >  ;
> >> >> >    %I8 = fptoui float undef to i8
> >> >> >
> >> >> > Modified: llvm/trunk/test/Analysis/CostModel/X86/masked-intrinsic-cost.ll
> >> >> > URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Analysis/CostModel/X86/masked-intrinsic-cost.ll?rev=368183&r1=368182&r2=368183&view=diff
> >> >> > ==============================================================================
> >> >> > --- llvm/trunk/test/Analysis/CostModel/X86/masked-intrinsic-cost.ll (original)
> >> >> > +++ llvm/trunk/test/Analysis/CostModel/X86/masked-intrinsic-cost.ll Wed Aug  7 09:24:26 2019
> >> >> > @@ -52,7 +52,7 @@ define i32 @masked_load() {
> >> >> >  ; AVX-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %V16I32 = call <16 x i32> @llvm.masked.load.v16i32.p0v16i32(<16 x i32>* undef, i32 1, <16 x i1> undef, <16 x i32> undef)
> >> >> >  ; AVX-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %V8I32 = call <8 x i32> @llvm.masked.load.v8i32.p0v8i32(<8 x i32>* undef, i32 1, <8 x i1> undef, <8 x i32> undef)
> >> >> >  ; AVX-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %V4I32 = call <4 x i32> @llvm.masked.load.v4i32.p0v4i32(<4 x i32>* undef, i32 1, <4 x i1> undef, <4 x i32> undef)
> >> >> > -; AVX-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %V2I32 = call <2 x i32> @llvm.masked.load.v2i32.p0v2i32(<2 x i32>* undef, i32 1, <2 x i1> undef, <2 x i32> undef)
> >> >> > +; AVX-NEXT:  Cost Model: Found an estimated cost of 6 for instruction: %V2I32 = call <2 x i32> @llvm.masked.load.v2i32.p0v2i32(<2 x i32>* undef, i32 1, <2 x i1> undef, <2 x i32> undef)
> >> >> >  ; AVX-NEXT:  Cost Model: Found an estimated cost of 128 for instruction: %V32I16 = call <32 x i16> @llvm.masked.load.v32i16.p0v32i16(<32 x i16>* undef, i32 1, <32 x i1> undef, <32 x i16> undef)
> >> >> >  ; AVX-NEXT:  Cost Model: Found an estimated cost of 64 for instruction: %V16I16 = call <16 x i16> @llvm.masked.load.v16i16.p0v16i16(<16 x i16>* undef, i32 1, <16 x i1> undef, <16 x i16> undef)
> >> >> >  ; AVX-NEXT:  Cost Model: Found an estimated cost of 32 for instruction: %V8I16 = call <8 x i16> @llvm.masked.load.v8i16.p0v8i16(<8 x i16>* undef, i32 1, <8 x i1> undef, <8 x i16> undef)
> >> >> > @@ -79,7 +79,7 @@ define i32 @masked_load() {
> >> >> >  ; KNL-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V16I32 = call <16 x i32> @llvm.masked.load.v16i32.p0v16i32(<16 x i32>* undef, i32 1, <16 x i1> undef, <16 x i32> undef)
> >> >> >  ; KNL-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V8I32 = call <8 x i32> @llvm.masked.load.v8i32.p0v8i32(<8 x i32>* undef, i32 1, <8 x i1> undef, <8 x i32> undef)
> >> >> >  ; KNL-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V4I32 = call <4 x i32> @llvm.masked.load.v4i32.p0v4i32(<4 x i32>* undef, i32 1, <4 x i1> undef, <4 x i32> undef)
> >> >> > -; KNL-NEXT:  Cost Model: Found an estimated cost of 3 for instruction: %V2I32 = call <2 x i32> @llvm.masked.load.v2i32.p0v2i32(<2 x i32>* undef, i32 1, <2 x i1> undef, <2 x i32> undef)
> >> >> > +; KNL-NEXT:  Cost Model: Found an estimated cost of 5 for instruction: %V2I32 = call <2 x i32> @llvm.masked.load.v2i32.p0v2i32(<2 x i32>* undef, i32 1, <2 x i1> undef, <2 x i32> undef)
> >> >> >  ; KNL-NEXT:  Cost Model: Found an estimated cost of 128 for instruction: %V32I16 = call <32 x i16> @llvm.masked.load.v32i16.p0v32i16(<32 x i16>* undef, i32 1, <32 x i1> undef, <32 x i16> undef)
> >> >> >  ; KNL-NEXT:  Cost Model: Found an estimated cost of 64 for instruction: %V16I16 = call <16 x i16> @llvm.masked.load.v16i16.p0v16i16(<16 x i16>* undef, i32 1, <16 x i1> undef, <16 x i16> undef)
> >> >> >  ; KNL-NEXT:  Cost Model: Found an estimated cost of 32 for instruction: %V8I16 = call <8 x i16> @llvm.masked.load.v8i16.p0v8i16(<8 x i16>* undef, i32 1, <8 x i1> undef, <8 x i16> undef)
> >> >> > @@ -106,15 +106,15 @@ define i32 @masked_load() {
> >> >> >  ; SKX-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V16I32 = call <16 x i32> @llvm.masked.load.v16i32.p0v16i32(<16 x i32>* undef, i32 1, <16 x i1> undef, <16 x i32> undef)
> >> >> >  ; SKX-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V8I32 = call <8 x i32> @llvm.masked.load.v8i32.p0v8i32(<8 x i32>* undef, i32 1, <8 x i1> undef, <8 x i32> undef)
> >> >> >  ; SKX-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V4I32 = call <4 x i32> @llvm.masked.load.v4i32.p0v4i32(<4 x i32>* undef, i32 1, <4 x i1> undef, <4 x i32> undef)
> >> >> > -; SKX-NEXT:  Cost Model: Found an estimated cost of 3 for instruction: %V2I32 = call <2 x i32> @llvm.masked.load.v2i32.p0v2i32(<2 x i32>* undef, i32 1, <2 x i1> undef, <2 x i32> undef)
> >> >> > +; SKX-NEXT:  Cost Model: Found an estimated cost of 5 for instruction: %V2I32 = call <2 x i32> @llvm.masked.load.v2i32.p0v2i32(<2 x i32>* undef, i32 1, <2 x i1> undef, <2 x i32> undef)
> >> >> >  ; SKX-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V32I16 = call <32 x i16> @llvm.masked.load.v32i16.p0v32i16(<32 x i16>* undef, i32 1, <32 x i1> undef, <32 x i16> undef)
> >> >> >  ; SKX-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V16I16 = call <16 x i16> @llvm.masked.load.v16i16.p0v16i16(<16 x i16>* undef, i32 1, <16 x i1> undef, <16 x i16> undef)
> >> >> >  ; SKX-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V8I16 = call <8 x i16> @llvm.masked.load.v8i16.p0v8i16(<8 x i16>* undef, i32 1, <8 x i1> undef, <8 x i16> undef)
> >> >> > -; SKX-NEXT:  Cost Model: Found an estimated cost of 3 for instruction: %V4I16 = call <4 x i16> @llvm.masked.load.v4i16.p0v4i16(<4 x i16>* undef, i32 1, <4 x i1> undef, <4 x i16> undef)
> >> >> > +; SKX-NEXT:  Cost Model: Found an estimated cost of 9 for instruction: %V4I16 = call <4 x i16> @llvm.masked.load.v4i16.p0v4i16(<4 x i16>* undef, i32 1, <4 x i1> undef, <4 x i16> undef)
> >> >> >  ; SKX-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V64I8 = call <64 x i8> @llvm.masked.load.v64i8.p0v64i8(<64 x i8>* undef, i32 1, <64 x i1> undef, <64 x i8> undef)
> >> >> >  ; SKX-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V32I8 = call <32 x i8> @llvm.masked.load.v32i8.p0v32i8(<32 x i8>* undef, i32 1, <32 x i1> undef, <32 x i8> undef)
> >> >> >  ; SKX-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V16I8 = call <16 x i8> @llvm.masked.load.v16i8.p0v16i8(<16 x i8>* undef, i32 1, <16 x i1> undef, <16 x i8> undef)
> >> >> > -; SKX-NEXT:  Cost Model: Found an estimated cost of 3 for instruction: %V8I8 = call <8 x i8> @llvm.masked.load.v8i8.p0v8i8(<8 x i8>* undef, i32 1, <8 x i1> undef, <8 x i8> undef)
> >> >> > +; SKX-NEXT:  Cost Model: Found an estimated cost of 17 for instruction: %V8I8 = call <8 x i8> @llvm.masked.load.v8i8.p0v8i8(<8 x i8>* undef, i32 1, <8 x i1> undef, <8 x i8> undef)
> >> >> >  ; SKX-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret i32 0
> >> >> >  ;
> >> >> >    %V8F64 = call <8 x double> @llvm.masked.load.v8f64.p0v8f64(<8 x double>* undef, i32 1, <8 x i1> undef, <8 x double> undef)
> >> >> > @@ -194,7 +194,7 @@ define i32 @masked_store() {
> >> >> >  ; AVX-NEXT:  Cost Model: Found an estimated cost of 16 for instruction: call void @llvm.masked.store.v16i32.p0v16i32(<16 x i32> undef, <16 x i32>* undef, i32 1, <16 x i1> undef)
> >> >> >  ; AVX-NEXT:  Cost Model: Found an estimated cost of 8 for instruction: call void @llvm.masked.store.v8i32.p0v8i32(<8 x i32> undef, <8 x i32>* undef, i32 1, <8 x i1> undef)
> >> >> >  ; AVX-NEXT:  Cost Model: Found an estimated cost of 8 for instruction: call void @llvm.masked.store.v4i32.p0v4i32(<4 x i32> undef, <4 x i32>* undef, i32 1, <4 x i1> undef)
> >> >> > -; AVX-NEXT:  Cost Model: Found an estimated cost of 10 for instruction: call void @llvm.masked.store.v2i32.p0v2i32(<2 x i32> undef, <2 x i32>* undef, i32 1, <2 x i1> undef)
> >> >> > +; AVX-NEXT:  Cost Model: Found an estimated cost of 12 for instruction: call void @llvm.masked.store.v2i32.p0v2i32(<2 x i32> undef, <2 x i32>* undef, i32 1, <2 x i1> undef)
> >> >> >  ; AVX-NEXT:  Cost Model: Found an estimated cost of 128 for instruction: call void @llvm.masked.store.v32i16.p0v32i16(<32 x i16> undef, <32 x i16>* undef, i32 1, <32 x i1> undef)
> >> >> >  ; AVX-NEXT:  Cost Model: Found an estimated cost of 64 for instruction: call void @llvm.masked.store.v16i16.p0v16i16(<16 x i16> undef, <16 x i16>* undef, i32 1, <16 x i1> undef)
> >> >> >  ; AVX-NEXT:  Cost Model: Found an estimated cost of 32 for instruction: call void @llvm.masked.store.v8i16.p0v8i16(<8 x i16> undef, <8 x i16>* undef, i32 1, <8 x i1> undef)
> >> >> > @@ -221,7 +221,7 @@ define i32 @masked_store() {
> >> >> >  ; KNL-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: call void @llvm.masked.store.v16i32.p0v16i32(<16 x i32> undef, <16 x i32>* undef, i32 1, <16 x i1> undef)
> >> >> >  ; KNL-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: call void @llvm.masked.store.v8i32.p0v8i32(<8 x i32> undef, <8 x i32>* undef, i32 1, <8 x i1> undef)
> >> >> >  ; KNL-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: call void @llvm.masked.store.v4i32.p0v4i32(<4 x i32> undef, <4 x i32>* undef, i32 1, <4 x i1> undef)
> >> >> > -; KNL-NEXT:  Cost Model: Found an estimated cost of 3 for instruction: call void @llvm.masked.store.v2i32.p0v2i32(<2 x i32> undef, <2 x i32>* undef, i32 1, <2 x i1> undef)
> >> >> > +; KNL-NEXT:  Cost Model: Found an estimated cost of 5 for instruction: call void @llvm.masked.store.v2i32.p0v2i32(<2 x i32> undef, <2 x i32>* undef, i32 1, <2 x i1> undef)
> >> >> >  ; KNL-NEXT:  Cost Model: Found an estimated cost of 128 for instruction: call void @llvm.masked.store.v32i16.p0v32i16(<32 x i16> undef, <32 x i16>* undef, i32 1, <32 x i1> undef)
> >> >> >  ; KNL-NEXT:  Cost Model: Found an estimated cost of 64 for instruction: call void @llvm.masked.store.v16i16.p0v16i16(<16 x i16> undef, <16 x i16>* undef, i32 1, <16 x i1> undef)
> >> >> >  ; KNL-NEXT:  Cost Model: Found an estimated cost of 32 for instruction: call void @llvm.masked.store.v8i16.p0v8i16(<8 x i16> undef, <8 x i16>* undef, i32 1, <8 x i1> undef)
> >> >> > @@ -248,15 +248,15 @@ define i32 @masked_store() {
> >> >> >  ; SKX-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: call void @llvm.masked.store.v16i32.p0v16i32(<16 x i32> undef, <16 x i32>* undef, i32 1, <16 x i1> undef)
> >> >> >  ; SKX-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: call void @llvm.masked.store.v8i32.p0v8i32(<8 x i32> undef, <8 x i32>* undef, i32 1, <8 x i1> undef)
> >> >> >  ; SKX-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: call void @llvm.masked.store.v4i32.p0v4i32(<4 x i32> undef, <4 x i32>* undef, i32 1, <4 x i1> undef)
> >> >> > -; SKX-NEXT:  Cost Model: Found an estimated cost of 3 for instruction: call void @llvm.masked.store.v2i32.p0v2i32(<2 x i32> undef, <2 x i32>* undef, i32 1, <2 x i1> undef)
> >> >> > +; SKX-NEXT:  Cost Model: Found an estimated cost of 5 for instruction: call void @llvm.masked.store.v2i32.p0v2i32(<2 x i32> undef, <2 x i32>* undef, i32 1, <2 x i1> undef)
> >> >> >  ; SKX-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: call void @llvm.masked.store.v32i16.p0v32i16(<32 x i16> undef, <32 x i16>* undef, i32 1, <32 x i1> undef)
> >> >> >  ; SKX-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: call void @llvm.masked.store.v16i16.p0v16i16(<16 x i16> undef, <16 x i16>* undef, i32 1, <16 x i1> undef)
> >> >> >  ; SKX-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: call void @llvm.masked.store.v8i16.p0v8i16(<8 x i16> undef, <8 x i16>* undef, i32 1, <8 x i1> undef)
> >> >> > -; SKX-NEXT:  Cost Model: Found an estimated cost of 3 for instruction: call void @llvm.masked.store.v4i16.p0v4i16(<4 x i16> undef, <4 x i16>* undef, i32 1, <4 x i1> undef)
> >> >> > +; SKX-NEXT:  Cost Model: Found an estimated cost of 9 for instruction: call void @llvm.masked.store.v4i16.p0v4i16(<4 x i16> undef, <4 x i16>* undef, i32 1, <4 x i1> undef)
> >> >> >  ; SKX-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: call void @llvm.masked.store.v64i8.p0v64i8(<64 x i8> undef, <64 x i8>* undef, i32 1, <64 x i1> undef)
> >> >> >  ; SKX-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: call void @llvm.masked.store.v32i8.p0v32i8(<32 x i8> undef, <32 x i8>* undef, i32 1, <32 x i1> undef)
> >> >> >  ; SKX-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: call void @llvm.masked.store.v16i8.p0v16i8(<16 x i8> undef, <16 x i8>* undef, i32 1, <16 x i1> undef)
> >> >> > -; SKX-NEXT:  Cost Model: Found an estimated cost of 3 for instruction: call void @llvm.masked.store.v8i8.p0v8i8(<8 x i8> undef, <8 x i8>* undef, i32 1, <8 x i1> undef)
> >> >> > +; SKX-NEXT:  Cost Model: Found an estimated cost of 17 for instruction: call void @llvm.masked.store.v8i8.p0v8i8(<8 x i8> undef, <8 x i8>* undef, i32 1, <8 x i1> undef)
> >> >> >  ; SKX-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret i32 0
> >> >> >  ;
> >> >> >    call void @llvm.masked.store.v8f64.p0v8f64(<8 x double> undef, <8 x double>* undef, i32 1, <8 x i1> undef)
> >> >> > @@ -960,15 +960,10 @@ define <8 x float> @test4(<8 x i32> %tri
> >> >> >  }
> >> >> >
> >> >> >  define void @test5(<2 x i32> %trigger, <2 x float>* %addr, <2 x float> %val) {
> >> >> > -; SSE2-LABEL: 'test5'
> >> >> > -; SSE2-NEXT:  Cost Model: Found an estimated cost of 8 for instruction: %mask = icmp eq <2 x i32> %trigger, zeroinitializer
> >> >> > -; SSE2-NEXT:  Cost Model: Found an estimated cost of 7 for instruction: call void @llvm.masked.store.v2f32.p0v2f32(<2 x float> %val, <2 x float>* %addr, i32 4, <2 x i1> %mask)
> >> >> > -; SSE2-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret void
> >> >> > -;
> >> >> > -; SSE42-LABEL: 'test5'
> >> >> > -; SSE42-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %mask = icmp eq <2 x i32> %trigger, zeroinitializer
> >> >> > -; SSE42-NEXT:  Cost Model: Found an estimated cost of 7 for instruction: call void @llvm.masked.store.v2f32.p0v2f32(<2 x float> %val, <2 x float>* %addr, i32 4, <2 x i1> %mask)
> >> >> > -; SSE42-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret void
> >> >> > +; SSE-LABEL: 'test5'
> >> >> > +; SSE-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %mask = icmp eq <2 x i32> %trigger, zeroinitializer
> >> >> > +; SSE-NEXT:  Cost Model: Found an estimated cost of 7 for instruction: call void @llvm.masked.store.v2f32.p0v2f32(<2 x float> %val, <2 x float>* %addr, i32 4, <2 x i1> %mask)
> >> >> > +; SSE-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret void
> >> >> >  ;
> >> >> >  ; AVX-LABEL: 'test5'
> >> >> >  ; AVX-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %mask = icmp eq <2 x i32> %trigger, zeroinitializer
> >> >> > @@ -986,24 +981,19 @@ define void @test5(<2 x i32> %trigger, <
> >> >> >  }
> >> >> >
> >> >> >  define void @test6(<2 x i32> %trigger, <2 x i32>* %addr, <2 x i32> %val) {
> >> >> > -; SSE2-LABEL: 'test6'
> >> >> > -; SSE2-NEXT:  Cost Model: Found an estimated cost of 8 for instruction: %mask = icmp eq <2 x i32> %trigger, zeroinitializer
> >> >> > -; SSE2-NEXT:  Cost Model: Found an estimated cost of 8 for instruction: call void @llvm.masked.store.v2i32.p0v2i32(<2 x i32> %val, <2 x i32>* %addr, i32 4, <2 x i1> %mask)
> >> >> > -; SSE2-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret void
> >> >> > -;
> >> >> > -; SSE42-LABEL: 'test6'
> >> >> > -; SSE42-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %mask = icmp eq <2 x i32> %trigger, zeroinitializer
> >> >> > -; SSE42-NEXT:  Cost Model: Found an estimated cost of 8 for instruction: call void @llvm.masked.store.v2i32.p0v2i32(<2 x i32> %val, <2 x i32>* %addr, i32 4, <2 x i1> %mask)
> >> >> > -; SSE42-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret void
> >> >> > +; SSE-LABEL: 'test6'
> >> >> > +; SSE-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %mask = icmp eq <2 x i32> %trigger, zeroinitializer
> >> >> > +; SSE-NEXT:  Cost Model: Found an estimated cost of 8 for instruction: call void @llvm.masked.store.v2i32.p0v2i32(<2 x i32> %val, <2 x i32>* %addr, i32 4, <2 x i1> %mask)
> >> >> > +; SSE-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret void
> >> >> >  ;
> >> >> >  ; AVX-LABEL: 'test6'
> >> >> >  ; AVX-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %mask = icmp eq <2 x i32> %trigger, zeroinitializer
> >> >> > -; AVX-NEXT:  Cost Model: Found an estimated cost of 10 for instruction: call void @llvm.masked.store.v2i32.p0v2i32(<2 x i32> %val, <2 x i32>* %addr, i32 4, <2 x i1> %mask)
> >> >> > +; AVX-NEXT:  Cost Model: Found an estimated cost of 12 for instruction: call void @llvm.masked.store.v2i32.p0v2i32(<2 x i32> %val, <2 x i32>* %addr, i32 4, <2 x i1> %mask)
> >> >> >  ; AVX-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret void
> >> >> >  ;
> >> >> >  ; AVX512-LABEL: 'test6'
> >> >> >  ; AVX512-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %mask = icmp eq <2 x i32> %trigger, zeroinitializer
> >> >> > -; AVX512-NEXT:  Cost Model: Found an estimated cost of 3 for instruction: call void @llvm.masked.store.v2i32.p0v2i32(<2 x i32> %val, <2 x i32>* %addr, i32 4, <2 x i1> %mask)
> >> >> > +; AVX512-NEXT:  Cost Model: Found an estimated cost of 5 for instruction: call void @llvm.masked.store.v2i32.p0v2i32(<2 x i32> %val, <2 x i32>* %addr, i32 4, <2 x i1> %mask)
> >> >> >  ; AVX512-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret void
> >> >> >  ;
> >> >> >    %mask = icmp eq <2 x i32> %trigger, zeroinitializer
> >> >> > @@ -1012,15 +1002,10 @@ define void @test6(<2 x i32> %trigger, <
> >> >> >  }
> >> >> >
> >> >> >  define <2 x float> @test7(<2 x i32> %trigger, <2 x float>* %addr, <2 x float> %dst) {
> >> >> > -; SSE2-LABEL: 'test7'
> >> >> > -; SSE2-NEXT:  Cost Model: Found an estimated cost of 8 for instruction: %mask = icmp eq <2 x i32> %trigger, zeroinitializer
> >> >> > -; SSE2-NEXT:  Cost Model: Found an estimated cost of 7 for instruction: %res = call <2 x float> @llvm.masked.load.v2f32.p0v2f32(<2 x float>* %addr, i32 4, <2 x i1> %mask, <2 x float> %dst)
> >> >> > -; SSE2-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret <2 x float> %res
> >> >> > -;
> >> >> > -; SSE42-LABEL: 'test7'
> >> >> > -; SSE42-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %mask = icmp eq <2 x i32> %trigger, zeroinitializer
> >> >> > -; SSE42-NEXT:  Cost Model: Found an estimated cost of 7 for instruction: %res = call <2 x float> @llvm.masked.load.v2f32.p0v2f32(<2 x float>* %addr, i32 4, <2 x i1> %mask, <2 x float> %dst)
> >> >> > -; SSE42-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret <2 x float> %res
> >> >> > +; SSE-LABEL: 'test7'
> >> >> > +; SSE-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %mask = icmp eq <2 x i32> %trigger, zeroinitializer
> >> >> > +; SSE-NEXT:  Cost Model: Found an estimated cost of 7 for instruction: %res = call <2 x float> @llvm.masked.load.v2f32.p0v2f32(<2 x float>* %addr, i32 4, <2 x i1> %mask, <2 x float> %dst)
> >> >> > +; SSE-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret <2 x float> %res
> >> >> >  ;
> >> >> >  ; AVX-LABEL: 'test7'
> >> >> >  ; AVX-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %mask = icmp eq <2 x i32> %trigger, zeroinitializer
> >> >> > @@ -1038,24 +1023,19 @@ define <2 x float> @test7(<2 x i32> %tri
> >> >> >  }
> >> >> >
> >> >> >  define <2 x i32> @test8(<2 x i32> %trigger, <2 x i32>* %addr, <2 x i32> %dst) {
> >> >> > -; SSE2-LABEL: 'test8'
> >> >> > -; SSE2-NEXT:  Cost Model: Found an estimated cost of 8 for instruction: %mask = icmp eq <2 x i32> %trigger, zeroinitializer
> >> >> > -; SSE2-NEXT:  Cost Model: Found an estimated cost of 8 for instruction: %res = call <2 x i32> @llvm.masked.load.v2i32.p0v2i32(<2 x i32>* %addr, i32 4, <2 x i1> %mask, <2 x i32> %dst)
> >> >> > -; SSE2-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret <2 x i32> %res
> >> >> > -;
> >> >> > -; SSE42-LABEL: 'test8'
> >> >> > -; SSE42-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %mask = icmp eq <2 x i32> %trigger, zeroinitializer
> >> >> > -; SSE42-NEXT:  Cost Model: Found an estimated cost of 8 for instruction: %res = call <2 x i32> @llvm.masked.load.v2i32.p0v2i32(<2 x i32>* %addr, i32 4, <2 x i1> %mask, <2 x i32> %dst)
> >> >> > -; SSE42-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret <2 x i32> %res
> >> >> > +; SSE-LABEL: 'test8'
> >> >> > +; SSE-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %mask = icmp eq <2 x i32> %trigger, zeroinitializer
> >> >> > +; SSE-NEXT:  Cost Model: Found an estimated cost of 8 for instruction: %res = call <2 x i32> @llvm.masked.load.v2i32.p0v2i32(<2 x i32>* %addr, i32 4, <2 x i1> %mask, <2 x i32> %dst)
> >> >> > +; SSE-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret <2 x i32> %res
> >> >> >  ;
> >> >> >  ; AVX-LABEL: 'test8'
> >> >> >  ; AVX-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %mask = icmp eq <2 x i32> %trigger, zeroinitializer
> >> >> > -; AVX-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %res = call <2 x i32> @llvm.masked.load.v2i32.p0v2i32(<2 x i32>* %addr, i32 4, <2 x i1> %mask, <2 x i32> %dst)
> >> >> > +; AVX-NEXT:  Cost Model: Found an estimated cost of 6 for instruction: %res = call <2 x i32> @llvm.masked.load.v2i32.p0v2i32(<2 x i32>* %addr, i32 4, <2 x i1> %mask, <2 x i32> %dst)
> >> >> >  ; AVX-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret <2 x i32> %res
> >> >> >  ;
> >> >> >  ; AVX512-LABEL: 'test8'
> >> >> >  ; AVX512-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %mask = icmp eq <2 x i32> %trigger, zeroinitializer
> >> >> > -; AVX512-NEXT:  Cost Model: Found an estimated cost of 3 for instruction: %res = call <2 x i32> @llvm.masked.load.v2i32.p0v2i32(<2 x i32>* %addr, i32 4, <2 x i1> %mask, <2 x i32> %dst)
> >> >> > +; AVX512-NEXT:  Cost Model: Found an estimated cost of 5 for instruction: %res = call <2 x i32> @llvm.masked.load.v2i32.p0v2i32(<2 x i32>* %addr, i32 4, <2 x i1> %mask, <2 x i32> %dst)
> >> >> >  ; AVX512-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret <2 x i32> %res
> >> >> >  ;
> >> >> >    %mask = icmp eq <2 x i32> %trigger, zeroinitializer
> >> >> >
> >> >> > Removed: llvm/trunk/test/Analysis/CostModel/X86/reduce-add-widen.ll
> >> >> > URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Analysis/CostModel/X86/reduce-add-widen.ll?rev=368182&view=auto
> >> >> > ==============================================================================
> >> >> > --- llvm/trunk/test/Analysis/CostModel/X86/reduce-add-widen.ll (original)
> >> >> > +++ llvm/trunk/test/Analysis/CostModel/X86/reduce-add-widen.ll (removed)
> >> >> > @@ -1,307 +0,0 @@
> >> >> > -; NOTE: Assertions have been autogenerated by utils/update_analyze_test_checks.py
> >> >> > -; RUN: opt < %s -x86-experimental-vector-widening-legalization -cost-model -mtriple=x86_64-apple-darwin -analyze -mattr=+sse2 | FileCheck %s --check-prefixes=CHECK,SSE,SSE2
> >> >> > -; RUN: opt < %s -x86-experimental-vector-widening-legalization -cost-model -mtriple=x86_64-apple-darwin -analyze -mattr=+ssse3 | FileCheck %s --check-prefixes=CHECK,SSE,SSSE3
> >> >> > -; RUN: opt < %s -x86-experimental-vector-widening-legalization -cost-model -mtriple=x86_64-apple-darwin -analyze -mattr=+sse4.2 | FileCheck %s --check-prefixes=CHECK,SSE,SSE42
> >> >> > -; RUN: opt < %s -x86-experimental-vector-widening-legalization -cost-model -mtriple=x86_64-apple-darwin -analyze -mattr=+avx | FileCheck %s --check-prefixes=CHECK,AVX,AVX1
> >> >> > -; RUN: opt < %s -x86-experimental-vector-widening-legalization -cost-model -mtriple=x86_64-apple-darwin -analyze -mattr=+avx2 | FileCheck %s --check-prefixes=CHECK,AVX,AVX2
> >> >> > -; RUN: opt < %s -x86-experimental-vector-widening-legalization -cost-model -mtriple=x86_64-apple-darwin -analyze -mattr=+avx512f | FileCheck %s --check-prefixes=CHECK,AVX512,AVX512F
> >> >> > -; RUN: opt < %s -x86-experimental-vector-widening-legalization -cost-model -mtriple=x86_64-apple-darwin -analyze -mattr=+avx512f,+avx512bw | FileCheck %s --check-prefixes=CHECK,AVX512,AVX512BW
> >> >> > -; RUN: opt < %s -x86-experimental-vector-widening-legalization -cost-model -mtriple=x86_64-apple-darwin -analyze -mattr=+avx512f,+avx512dq | FileCheck %s --check-prefixes=CHECK,AVX512,AVX512DQ
> >> >> > -
> >> >> > -define i32 @reduce_i64(i32 %arg) {
> >> >> > -; SSE2-LABEL: 'reduce_i64'
> >> >> > -; SSE2-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %V1 = call i64 @llvm.experimental.vector.reduce.add.v1i64(<1 x i64> undef)
> >> >> > -; SSE2-NEXT:  Cost Model: Found an estimated cost of 3 for instruction: %V2 = call i64 @llvm.experimental.vector.reduce.add.v2i64(<2 x i64> undef)
> >> >> > -; SSE2-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %V4 = call i64 @llvm.experimental.vector.reduce.add.v4i64(<4 x i64> undef)
> >> >> > -; SSE2-NEXT:  Cost Model: Found an estimated cost of 6 for instruction: %V8 = call i64 @llvm.experimental.vector.reduce.add.v8i64(<8 x i64> undef)
> >> >> > -; SSE2-NEXT:  Cost Model: Found an estimated cost of 10 for instruction: %V16 = call i64 @llvm.experimental.vector.reduce.add.v16i64(<16 x i64> undef)
> >> >> > -; SSE2-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
> >> >> > -;
> >> >> > -; SSSE3-LABEL: 'reduce_i64'
> >> >> > -; SSSE3-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %V1 = call i64 @llvm.experimental.vector.reduce.add.v1i64(<1 x i64> undef)
> >> >> > -; SSSE3-NEXT:  Cost Model: Found an estimated cost of 3 for instruction: %V2 = call i64 @llvm.experimental.vector.reduce.add.v2i64(<2 x i64> undef)
> >> >> > -; SSSE3-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %V4 = call i64 @llvm.experimental.vector.reduce.add.v4i64(<4 x i64> undef)
> >> >> > -; SSSE3-NEXT:  Cost Model: Found an estimated cost of 6 for instruction: %V8 = call i64 @llvm.experimental.vector.reduce.add.v8i64(<8 x i64> undef)
> >> >> > -; SSSE3-NEXT:  Cost Model: Found an estimated cost of 10 for instruction: %V16 = call i64 @llvm.experimental.vector.reduce.add.v16i64(<16 x i64> undef)
> >> >> > -; SSSE3-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
> >> >> > -;
> >> >> > -; SSE42-LABEL: 'reduce_i64'
> >> >> > -; SSE42-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %V1 = call i64 @llvm.experimental.vector.reduce.add.v1i64(<1 x i64> undef)
> >> >> > -; SSE42-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %V2 = call i64 @llvm.experimental.vector.reduce.add.v2i64(<2 x i64> undef)
> >> >> > -; SSE42-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %V4 = call i64 @llvm.experimental.vector.reduce.add.v4i64(<4 x i64> undef)
> >> >> > -; SSE42-NEXT:  Cost Model: Found an estimated cost of 8 for instruction: %V8 = call i64 @llvm.experimental.vector.reduce.add.v8i64(<8 x i64> undef)
> >> >> > -; SSE42-NEXT:  Cost Model: Found an estimated cost of 16 for instruction: %V16 = call i64 @llvm.experimental.vector.reduce.add.v16i64(<16 x i64> undef)
> >> >> > -; SSE42-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
> >> >> > -;
> >> >> > -; AVX-LABEL: 'reduce_i64'
> >> >> > -; AVX-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %V1 = call i64 @llvm.experimental.vector.reduce.add.v1i64(<1 x i64> undef)
> >> >> > -; AVX-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V2 = call i64 @llvm.experimental.vector.reduce.add.v2i64(<2 x i64> undef)
> >> >> > -; AVX-NEXT:  Cost Model: Found an estimated cost of 3 for instruction: %V4 = call i64 @llvm.experimental.vector.reduce.add.v4i64(<4 x i64> undef)
> >> >> > -; AVX-NEXT:  Cost Model: Found an estimated cost of 6 for instruction: %V8 = call i64 @llvm.experimental.vector.reduce.add.v8i64(<8 x i64> undef)
> >> >> > -; AVX-NEXT:  Cost Model: Found an estimated cost of 12 for instruction: %V16 = call i64 @llvm.experimental.vector.reduce.add.v16i64(<16 x i64> undef)
> >> >> > -; AVX-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
> >> >> > -;
> >> >> > -; AVX512-LABEL: 'reduce_i64'
> >> >> > -; AVX512-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %V1 = call i64 @llvm.experimental.vector.reduce.add.v1i64(<1 x i64> undef)
> >> >> > -; AVX512-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V2 = call i64 @llvm.experimental.vector.reduce.add.v2i64(<2 x i64> undef)
> >> >> > -; AVX512-NEXT:  Cost Model: Found an estimated cost of 3 for instruction: %V4 = call i64 @llvm.experimental.vector.reduce.add.v4i64(<4 x i64> undef)
> >> >> > -; AVX512-NEXT:  Cost Model: Found an estimated cost of 7 for instruction: %V8 = call i64 @llvm.experimental.vector.reduce.add.v8i64(<8 x i64> undef)
> >> >> > -; AVX512-NEXT:  Cost Model: Found an estimated cost of 8 for instruction: %V16 = call i64 @llvm.experimental.vector.reduce.add.v16i64(<16 x i64> undef)
> >> >> > -; AVX512-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
> >> >> > -;
> >> >> > -  %V1  = call i64 @llvm.experimental.vector.reduce.add.v1i64(<1 x i64> undef)
> >> >> > -  %V2  = call i64 @llvm.experimental.vector.reduce.add.v2i64(<2 x i64> undef)
> >> >> > -  %V4  = call i64 @llvm.experimental.vector.reduce.add.v4i64(<4 x i64> undef)
> >> >> > -  %V8  = call i64 @llvm.experimental.vector.reduce.add.v8i64(<8 x i64> undef)
> >> >> > -  %V16 = call i64 @llvm.experimental.vector.reduce.add.v16i64(<16 x i64> undef)
> >> >> > -  ret i32 undef
> >> >> > -}
> >> >> > -
> >> >> > -define i32 @reduce_i32(i32 %arg) {
> >> >> > -; SSE2-LABEL: 'reduce_i32'
> >> >> > -; SSE2-NEXT:  Cost Model: Found an estimated cost of 3 for instruction: %V2 = call i32 @llvm.experimental.vector.reduce.add.v2i32(<2 x i32> undef)
> >> >> > -; SSE2-NEXT:  Cost Model: Found an estimated cost of 5 for instruction: %V4 = call i32 @llvm.experimental.vector.reduce.add.v4i32(<4 x i32> undef)
> >> >> > -; SSE2-NEXT:  Cost Model: Found an estimated cost of 6 for instruction: %V8 = call i32 @llvm.experimental.vector.reduce.add.v8i32(<8 x i32> undef)
> >> >> > -; SSE2-NEXT:  Cost Model: Found an estimated cost of 8 for instruction: %V16 = call i32 @llvm.experimental.vector.reduce.add.v16i32(<16 x i32> undef)
> >> >> > -; SSE2-NEXT:  Cost Model: Found an estimated cost of 12 for instruction: %V32 = call i32 @llvm.experimental.vector.reduce.add.v32i32(<32 x i32> undef)
> >> >> > -; SSE2-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
> >> >> > -;
> >> >> > -; SSSE3-LABEL: 'reduce_i32'
> >> >> > -; SSSE3-NEXT:  Cost Model: Found an estimated cost of 3 for instruction: %V2 = call i32 @llvm.experimental.vector.reduce.add.v2i32(<2 x i32> undef)
> >> >> > -; SSSE3-NEXT:  Cost Model: Found an estimated cost of 5 for instruction: %V4 = call i32 @llvm.experimental.vector.reduce.add.v4i32(<4 x i32> undef)
> >> >> > -; SSSE3-NEXT:  Cost Model: Found an estimated cost of 6 for instruction: %V8 = call i32 @llvm.experimental.vector.reduce.add.v8i32(<8 x i32> undef)
> >> >> > -; SSSE3-NEXT:  Cost Model: Found an estimated cost of 8 for instruction: %V16 = call i32 @llvm.experimental.vector.reduce.add.v16i32(<16 x i32> undef)
> >> >> > -; SSSE3-NEXT:  Cost Model: Found an estimated cost of 12 for instruction: %V32 = call i32 @llvm.experimental.vector.reduce.add.v32i32(<32 x i32> undef)
> >> >> > -; SSSE3-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
> >> >> > -;
> >> >> > -; SSE42-LABEL: 'reduce_i32'
> >> >> > -; SSE42-NEXT:  Cost Model: Found an estimated cost of 3 for instruction: %V2 = call i32 @llvm.experimental.vector.reduce.add.v2i32(<2 x i32> undef)
> >> >> > -; SSE42-NEXT:  Cost Model: Found an estimated cost of 3 for instruction: %V4 = call i32 @llvm.experimental.vector.reduce.add.v4i32(<4 x i32> undef)
> >> >> > -; SSE42-NEXT:  Cost Model: Found an estimated cost of 6 for instruction: %V8 = call i32 @llvm.experimental.vector.reduce.add.v8i32(<8 x i32> undef)
> >> >> > -; SSE42-NEXT:  Cost Model: Found an estimated cost of 12 for instruction: %V16 = call i32 @llvm.experimental.vector.reduce.add.v16i32(<16 x i32> undef)
> >> >> > -; SSE42-NEXT:  Cost Model: Found an estimated cost of 24 for instruction: %V32 = call i32 @llvm.experimental.vector.reduce.add.v32i32(<32 x i32> undef)
> >> >> > -; SSE42-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
> >> >> > -;
> >> >> > -; AVX-LABEL: 'reduce_i32'
> >> >> > -; AVX-NEXT:  Cost Model: Found an estimated cost of 3 for instruction: %V2 = call i32 @llvm.experimental.vector.reduce.add.v2i32(<2 x i32> undef)
> >> >> > -; AVX-NEXT:  Cost Model: Found an estimated cost of 3 for instruction: %V4 = call i32 @llvm.experimental.vector.reduce.add.v4i32(<4 x i32> undef)
> >> >> > -; AVX-NEXT:  Cost Model: Found an estimated cost of 5 for instruction: %V8 = call i32 @llvm.experimental.vector.reduce.add.v8i32(<8 x i32> undef)
> >> >> > -; AVX-NEXT:  Cost Model: Found an estimated cost of 10 for instruction: %V16 = call i32 @llvm.experimental.vector.reduce.add.v16i32(<16 x i32> undef)
> >> >> > -; AVX-NEXT:  Cost Model: Found an estimated cost of 20 for instruction: %V32 = call i32 @llvm.experimental.vector.reduce.add.v32i32(<32 x i32> undef)
> >> >> > -; AVX-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
> >> >> > -;
> >> >> > -; AVX512-LABEL: 'reduce_i32'
> >> >> > -; AVX512-NEXT:  Cost Model: Found an estimated cost of 3 for instruction: %V2 = call i32 @llvm.experimental.vector.reduce.add.v2i32(<2 x i32> undef)
> >> >> > -; AVX512-NEXT:  Cost Model: Found an estimated cost of 3 for i
_______________________________________________
llvm-commits mailing list
llvm-commits at lists.llvm.org<mailto:llvm-commits at lists.llvm.org>
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20190829/659b536e/attachment-0001.html>

