[llvm] r368183 - Recommit r367901 "[X86] Enable -x86-experimental-vector-widening-legalization by default."

Mon Aug 19 22:28:33 PDT 2019

I do apologize; we've only just gotten to performance testing. The
numbers are pretty exciting, but unfortunately there are many more
negatives than positives. We'll definitely work with you on testing and
performance analysis if that will help.

Thanks!

-eric

On Mon, Aug 19, 2019 at 10:23 PM Craig Topper <craig.topper at gmail.com> wrote:
>
> There have been quite a lot of follow-on patches to this. A lot of them would need to be reverted to get back to the old state. I can start trying to put that together.
>
> ~Craig
>
>
> On Mon, Aug 19, 2019 at 9:55 PM Eric Christopher via llvm-commits <llvm-commits at lists.llvm.org> wrote:
>>
>> Hi Craig,
>>
>> We're seeing rather a lot of performance regressions with this enabled
>> by default. Is it possible to get it turned on only under a command-line
>> flag for the near term while we work on getting you a pile of testcases?
>> (Some of it is Eigen, and those will at least be easier as you have
>> access to that source :)
>>
>> Thoughts?
>>
>> Thanks!
>>
>> -eric
>>
>> On Wed, Aug 7, 2019 at 9:23 AM Craig Topper via llvm-commits
>> <llvm-commits at lists.llvm.org> wrote:
>> >
>> > Author: ctopper
>> > Date: Wed Aug  7 09:24:26 2019
>> > New Revision: 368183
>> >
>> > URL: http://llvm.org/viewvc/llvm-project?rev=368183&view=rev
>> > Log:
>> > Recommit r367901 "[X86] Enable -x86-experimental-vector-widening-legalization by default."
>> >
>> > The assert that caused this to be reverted should be fixed now.
>> >
>> > Original commit message:
>> >
>> > This patch changes our default legalization behavior for 16, 32, and
>> > 64 bit vectors with i8/i16/i32/i64 scalar types from promotion to
>> > widening. For example, v8i8 will now be widened to v16i8 instead of
>> > promoted to v8i16. This keeps the element widths the same and pads
>> > with undef elements. We believe this is a better legalization strategy,
>> > but it carries some issues due to the fragmented vector ISA. For
>> > example, i8 shifts and multiplies get widened and then later have
>> > to be promoted/split into vXi16 vectors.
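
To make the v8i8 example above concrete, here is a minimal standalone sketch of the two strategies. It does not use any LLVM API: VecTy, promote, and widen are hypothetical names invented for illustration, and the 128-bit register width is an assumption chosen to match SSE.

```cpp
// Hypothetical stand-in for a vector type: NumElts elements of EltBits bits.
// This is an illustration only, not LLVM's MVT/EVT.
struct VecTy {
  unsigned NumElts;
  unsigned EltBits;
};

// Assumed width of a legal vector register (SSE-style, 128 bits).
constexpr unsigned RegBits = 128;

// Promotion: keep the element count and grow each element until the
// vector fills the register, e.g. v8i8 -> v8i16.
constexpr VecTy promote(VecTy T) { return {T.NumElts, RegBits / T.NumElts}; }

// Widening: keep the element width and pad with extra (undef) elements
// until the vector fills the register, e.g. v8i8 -> v16i8.
constexpr VecTy widen(VecTy T) { return {RegBits / T.EltBits, T.EltBits}; }

// Compile-time checks against the examples given in the commit message.
static_assert(promote(VecTy{8, 8}).EltBits == 16, "v8i8 promotes to v8i16");
static_assert(widen(VecTy{8, 8}).NumElts == 16, "v8i8 widens to v16i8");
```

Under the same assumption, a 64-bit v2i32 would promote to v2i64 but widen to v4i32, which is the comment change made in X86TargetTransformInfo.cpp.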
>> >
>> > This has the potential to cause regressions so we wanted to get
>> > it in early in the 10.0 cycle so we have plenty of time to
>> > address them.
>> >
>> > Next steps will be to merge tests that explicitly test the command
>> > line option. Then we can remove the option and its associated
>> > code.
>> >
>> > Removed:
>> >     llvm/trunk/test/Analysis/CostModel/X86/reduce-add-widen.ll
>> > Modified:
>> >     llvm/trunk/lib/Target/X86/X86ISelLowering.cpp
>> >     llvm/trunk/lib/Target/X86/X86TargetTransformInfo.cpp
>> >     llvm/trunk/test/Analysis/CostModel/X86/alternate-shuffle-cost.ll
>> >     llvm/trunk/test/Analysis/CostModel/X86/arith.ll
>> >     llvm/trunk/test/Analysis/CostModel/X86/cast.ll
>> >     llvm/trunk/test/Analysis/CostModel/X86/fptosi.ll
>> >     llvm/trunk/test/Analysis/CostModel/X86/fptoui.ll
>> >     llvm/trunk/test/Analysis/CostModel/X86/masked-intrinsic-cost.ll
>> >     llvm/trunk/test/Analysis/CostModel/X86/reduce-add.ll
>> >     llvm/trunk/test/Analysis/CostModel/X86/reduce-and.ll
>> >     llvm/trunk/test/Analysis/CostModel/X86/reduce-mul.ll
>> >     llvm/trunk/test/Analysis/CostModel/X86/reduce-or.ll
>> >     llvm/trunk/test/Analysis/CostModel/X86/reduce-smax.ll
>> >     llvm/trunk/test/Analysis/CostModel/X86/reduce-smin.ll
>> >     llvm/trunk/test/Analysis/CostModel/X86/reduce-umax.ll
>> >     llvm/trunk/test/Analysis/CostModel/X86/reduce-umin.ll
>> >     llvm/trunk/test/Analysis/CostModel/X86/reduce-xor.ll
>> >     llvm/trunk/test/Analysis/CostModel/X86/shuffle-transpose.ll
>> >     llvm/trunk/test/Analysis/CostModel/X86/sitofp.ll
>> >     llvm/trunk/test/Analysis/CostModel/X86/slm-arith-costs.ll
>> >     llvm/trunk/test/Analysis/CostModel/X86/testshiftashr.ll
>> >     llvm/trunk/test/Analysis/CostModel/X86/testshiftlshr.ll
>> >     llvm/trunk/test/Analysis/CostModel/X86/testshiftshl.ll
>> >     llvm/trunk/test/Analysis/CostModel/X86/uitofp.ll
>> >     llvm/trunk/test/CodeGen/X86/2008-09-05-sinttofp-2xi32.ll
>> >     llvm/trunk/test/CodeGen/X86/2009-06-05-VZextByteShort.ll
>> >     llvm/trunk/test/CodeGen/X86/2011-10-19-LegelizeLoad.ll
>> >     llvm/trunk/test/CodeGen/X86/2011-12-28-vselecti8.ll
>> >     llvm/trunk/test/CodeGen/X86/2011-12-8-bitcastintprom.ll
>> >     llvm/trunk/test/CodeGen/X86/2012-01-18-vbitcast.ll
>> >     llvm/trunk/test/CodeGen/X86/2012-03-15-build_vector_wl.ll
>> >     llvm/trunk/test/CodeGen/X86/2012-07-10-extload64.ll
>> >     llvm/trunk/test/CodeGen/X86/3dnow-intrinsics.ll
>> >     llvm/trunk/test/CodeGen/X86/4char-promote.ll
>> >     llvm/trunk/test/CodeGen/X86/and-load-fold.ll
>> >     llvm/trunk/test/CodeGen/X86/atomic-unordered.ll
>> >     llvm/trunk/test/CodeGen/X86/avg.ll
>> >     llvm/trunk/test/CodeGen/X86/avx-cvt-2.ll
>> >     llvm/trunk/test/CodeGen/X86/avx-fp2int.ll
>> >     llvm/trunk/test/CodeGen/X86/avx2-conversions.ll
>> >     llvm/trunk/test/CodeGen/X86/avx2-masked-gather.ll
>> >     llvm/trunk/test/CodeGen/X86/avx2-vbroadcast.ll
>> >     llvm/trunk/test/CodeGen/X86/avx512-any_extend_load.ll
>> >     llvm/trunk/test/CodeGen/X86/avx512-cvt.ll
>> >     llvm/trunk/test/CodeGen/X86/avx512-ext.ll
>> >     llvm/trunk/test/CodeGen/X86/avx512-intrinsics-upgrade.ll
>> >     llvm/trunk/test/CodeGen/X86/avx512-mask-op.ll
>> >     llvm/trunk/test/CodeGen/X86/avx512-trunc.ll
>> >     llvm/trunk/test/CodeGen/X86/avx512-vec-cmp.ll
>> >     llvm/trunk/test/CodeGen/X86/avx512-vec3-crash.ll
>> >     llvm/trunk/test/CodeGen/X86/avx512bwvl-intrinsics-upgrade.ll
>> >     llvm/trunk/test/CodeGen/X86/avx512vl-intrinsics-fast-isel.ll
>> >     llvm/trunk/test/CodeGen/X86/avx512vl-intrinsics-upgrade.ll
>> >     llvm/trunk/test/CodeGen/X86/bitcast-and-setcc-128.ll
>> >     llvm/trunk/test/CodeGen/X86/bitcast-setcc-128.ll
>> >     llvm/trunk/test/CodeGen/X86/bitcast-vector-bool.ll
>> >     llvm/trunk/test/CodeGen/X86/bitreverse.ll
>> >     llvm/trunk/test/CodeGen/X86/bswap-vector.ll
>> >     llvm/trunk/test/CodeGen/X86/buildvec-insertvec.ll
>> >     llvm/trunk/test/CodeGen/X86/combine-64bit-vec-binop.ll
>> >     llvm/trunk/test/CodeGen/X86/combine-or.ll
>> >     llvm/trunk/test/CodeGen/X86/complex-fastmath.ll
>> >     llvm/trunk/test/CodeGen/X86/cvtv2f32.ll
>> >     llvm/trunk/test/CodeGen/X86/extract-concat.ll
>> >     llvm/trunk/test/CodeGen/X86/extract-insert.ll
>> >     llvm/trunk/test/CodeGen/X86/f16c-intrinsics.ll
>> >     llvm/trunk/test/CodeGen/X86/fold-vector-sext-zext.ll
>> >     llvm/trunk/test/CodeGen/X86/insertelement-shuffle.ll
>> >     llvm/trunk/test/CodeGen/X86/known-bits.ll
>> >     llvm/trunk/test/CodeGen/X86/load-partial.ll
>> >     llvm/trunk/test/CodeGen/X86/lower-bitcast.ll
>> >     llvm/trunk/test/CodeGen/X86/madd.ll
>> >     llvm/trunk/test/CodeGen/X86/masked_compressstore.ll
>> >     llvm/trunk/test/CodeGen/X86/masked_expandload.ll
>> >     llvm/trunk/test/CodeGen/X86/masked_gather_scatter.ll
>> >     llvm/trunk/test/CodeGen/X86/masked_gather_scatter_widen.ll
>> >     llvm/trunk/test/CodeGen/X86/masked_load.ll
>> >     llvm/trunk/test/CodeGen/X86/masked_store.ll
>> >     llvm/trunk/test/CodeGen/X86/masked_store_trunc.ll
>> >     llvm/trunk/test/CodeGen/X86/masked_store_trunc_ssat.ll
>> >     llvm/trunk/test/CodeGen/X86/masked_store_trunc_usat.ll
>> >     llvm/trunk/test/CodeGen/X86/merge-consecutive-loads-256.ll
>> >     llvm/trunk/test/CodeGen/X86/mmx-arg-passing-x86-64.ll
>> >     llvm/trunk/test/CodeGen/X86/mmx-arith.ll
>> >     llvm/trunk/test/CodeGen/X86/mmx-cvt.ll
>> >     llvm/trunk/test/CodeGen/X86/mulvi32.ll
>> >     llvm/trunk/test/CodeGen/X86/oddshuffles.ll
>> >     llvm/trunk/test/CodeGen/X86/oddsubvector.ll
>> >     llvm/trunk/test/CodeGen/X86/pmaddubsw.ll
>> >     llvm/trunk/test/CodeGen/X86/pmulh.ll
>> >     llvm/trunk/test/CodeGen/X86/pointer-vector.ll
>> >     llvm/trunk/test/CodeGen/X86/pr14161.ll
>> >     llvm/trunk/test/CodeGen/X86/pr35918.ll
>> >     llvm/trunk/test/CodeGen/X86/pr40994.ll
>> >     llvm/trunk/test/CodeGen/X86/promote-vec3.ll
>> >     llvm/trunk/test/CodeGen/X86/promote.ll
>> >     llvm/trunk/test/CodeGen/X86/psubus.ll
>> >     llvm/trunk/test/CodeGen/X86/ret-mmx.ll
>> >     llvm/trunk/test/CodeGen/X86/sad.ll
>> >     llvm/trunk/test/CodeGen/X86/sadd_sat_vec.ll
>> >     llvm/trunk/test/CodeGen/X86/scalar_widen_div.ll
>> >     llvm/trunk/test/CodeGen/X86/select.ll
>> >     llvm/trunk/test/CodeGen/X86/shift-combine.ll
>> >     llvm/trunk/test/CodeGen/X86/shrink_vmul.ll
>> >     llvm/trunk/test/CodeGen/X86/shuffle-strided-with-offset-128.ll
>> >     llvm/trunk/test/CodeGen/X86/shuffle-strided-with-offset-256.ll
>> >     llvm/trunk/test/CodeGen/X86/shuffle-strided-with-offset-512.ll
>> >     llvm/trunk/test/CodeGen/X86/shuffle-vs-trunc-128.ll
>> >     llvm/trunk/test/CodeGen/X86/shuffle-vs-trunc-256.ll
>> >     llvm/trunk/test/CodeGen/X86/shuffle-vs-trunc-512.ll
>> >     llvm/trunk/test/CodeGen/X86/slow-pmulld.ll
>> >     llvm/trunk/test/CodeGen/X86/sse2-intrinsics-canonical.ll
>> >     llvm/trunk/test/CodeGen/X86/sse2-vector-shifts.ll
>> >     llvm/trunk/test/CodeGen/X86/ssub_sat_vec.ll
>> >     llvm/trunk/test/CodeGen/X86/test-shrink-bug.ll
>> >     llvm/trunk/test/CodeGen/X86/trunc-ext-ld-st.ll
>> >     llvm/trunk/test/CodeGen/X86/trunc-subvector.ll
>> >     llvm/trunk/test/CodeGen/X86/uadd_sat_vec.ll
>> >     llvm/trunk/test/CodeGen/X86/unfold-masked-merge-vector-variablemask.ll
>> >     llvm/trunk/test/CodeGen/X86/usub_sat_vec.ll
>> >     llvm/trunk/test/CodeGen/X86/vec_cast2.ll
>> >     llvm/trunk/test/CodeGen/X86/vec_cast3.ll
>> >     llvm/trunk/test/CodeGen/X86/vec_ctbits.ll
>> >     llvm/trunk/test/CodeGen/X86/vec_extract-mmx.ll
>> >     llvm/trunk/test/CodeGen/X86/vec_fp_to_int.ll
>> >     llvm/trunk/test/CodeGen/X86/vec_insert-5.ll
>> >     llvm/trunk/test/CodeGen/X86/vec_insert-7.ll
>> >     llvm/trunk/test/CodeGen/X86/vec_insert-mmx.ll
>> >     llvm/trunk/test/CodeGen/X86/vec_int_to_fp.ll
>> >     llvm/trunk/test/CodeGen/X86/vec_saddo.ll
>> >     llvm/trunk/test/CodeGen/X86/vec_smulo.ll
>> >     llvm/trunk/test/CodeGen/X86/vec_ssubo.ll
>> >     llvm/trunk/test/CodeGen/X86/vec_uaddo.ll
>> >     llvm/trunk/test/CodeGen/X86/vec_umulo.ll
>> >     llvm/trunk/test/CodeGen/X86/vec_usubo.ll
>> >     llvm/trunk/test/CodeGen/X86/vector-blend.ll
>> >     llvm/trunk/test/CodeGen/X86/vector-ext-logic.ll
>> >     llvm/trunk/test/CodeGen/X86/vector-gep.ll
>> >     llvm/trunk/test/CodeGen/X86/vector-half-conversions.ll
>> >     llvm/trunk/test/CodeGen/X86/vector-idiv-v2i32.ll
>> >     llvm/trunk/test/CodeGen/X86/vector-narrow-binop.ll
>> >     llvm/trunk/test/CodeGen/X86/vector-reduce-add.ll
>> >     llvm/trunk/test/CodeGen/X86/vector-reduce-and-bool.ll
>> >     llvm/trunk/test/CodeGen/X86/vector-reduce-and.ll
>> >     llvm/trunk/test/CodeGen/X86/vector-reduce-mul.ll
>> >     llvm/trunk/test/CodeGen/X86/vector-reduce-or-bool.ll
>> >     llvm/trunk/test/CodeGen/X86/vector-reduce-or.ll
>> >     llvm/trunk/test/CodeGen/X86/vector-reduce-smax.ll
>> >     llvm/trunk/test/CodeGen/X86/vector-reduce-smin.ll
>> >     llvm/trunk/test/CodeGen/X86/vector-reduce-umax.ll
>> >     llvm/trunk/test/CodeGen/X86/vector-reduce-umin.ll
>> >     llvm/trunk/test/CodeGen/X86/vector-reduce-xor-bool.ll
>> >     llvm/trunk/test/CodeGen/X86/vector-reduce-xor.ll
>> >     llvm/trunk/test/CodeGen/X86/vector-sext.ll
>> >     llvm/trunk/test/CodeGen/X86/vector-shift-ashr-sub128.ll
>> >     llvm/trunk/test/CodeGen/X86/vector-shift-by-select-loop.ll
>> >     llvm/trunk/test/CodeGen/X86/vector-shift-lshr-sub128.ll
>> >     llvm/trunk/test/CodeGen/X86/vector-shift-shl-sub128.ll
>> >     llvm/trunk/test/CodeGen/X86/vector-shuffle-128-v16.ll
>> >     llvm/trunk/test/CodeGen/X86/vector-shuffle-combining.ll
>> >     llvm/trunk/test/CodeGen/X86/vector-trunc-packus.ll
>> >     llvm/trunk/test/CodeGen/X86/vector-trunc-ssat.ll
>> >     llvm/trunk/test/CodeGen/X86/vector-trunc-usat.ll
>> >     llvm/trunk/test/CodeGen/X86/vector-trunc.ll
>> >     llvm/trunk/test/CodeGen/X86/vector-truncate-combine.ll
>> >     llvm/trunk/test/CodeGen/X86/vector-zext.ll
>> >     llvm/trunk/test/CodeGen/X86/vsel-cmp-load.ll
>> >     llvm/trunk/test/CodeGen/X86/vselect-avx.ll
>> >     llvm/trunk/test/CodeGen/X86/vselect.ll
>> >     llvm/trunk/test/CodeGen/X86/vshift-4.ll
>> >     llvm/trunk/test/CodeGen/X86/widen_arith-1.ll
>> >     llvm/trunk/test/CodeGen/X86/widen_arith-2.ll
>> >     llvm/trunk/test/CodeGen/X86/widen_arith-3.ll
>> >     llvm/trunk/test/CodeGen/X86/widen_bitops-0.ll
>> >     llvm/trunk/test/CodeGen/X86/widen_cast-1.ll
>> >     llvm/trunk/test/CodeGen/X86/widen_cast-2.ll
>> >     llvm/trunk/test/CodeGen/X86/widen_cast-3.ll
>> >     llvm/trunk/test/CodeGen/X86/widen_cast-4.ll
>> >     llvm/trunk/test/CodeGen/X86/widen_cast-5.ll
>> >     llvm/trunk/test/CodeGen/X86/widen_cast-6.ll
>> >     llvm/trunk/test/CodeGen/X86/widen_compare-1.ll
>> >     llvm/trunk/test/CodeGen/X86/widen_conv-1.ll
>> >     llvm/trunk/test/CodeGen/X86/widen_conv-2.ll
>> >     llvm/trunk/test/CodeGen/X86/widen_conv-3.ll
>> >     llvm/trunk/test/CodeGen/X86/widen_conv-4.ll
>> >     llvm/trunk/test/CodeGen/X86/widen_load-2.ll
>> >     llvm/trunk/test/CodeGen/X86/widen_shuffle-1.ll
>> >     llvm/trunk/test/CodeGen/X86/x86-interleaved-access.ll
>> >     llvm/trunk/test/CodeGen/X86/x86-shifts.ll
>> >     llvm/trunk/test/Transforms/SLPVectorizer/X86/blending-shuffle.ll
>> >     llvm/trunk/test/Transforms/SLPVectorizer/X86/fptosi.ll
>> >     llvm/trunk/test/Transforms/SLPVectorizer/X86/fptoui.ll
>> >     llvm/trunk/test/Transforms/SLPVectorizer/X86/insert-element-build-vector.ll
>> >     llvm/trunk/test/Transforms/SLPVectorizer/X86/sitofp.ll
>> >     llvm/trunk/test/Transforms/SLPVectorizer/X86/uitofp.ll
>> >
>> > Modified: llvm/trunk/lib/Target/X86/X86ISelLowering.cpp
>> > URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/X86/X86ISelLowering.cpp?rev=368183&r1=368182&r2=368183&view=diff
>> > ==============================================================================
>> > --- llvm/trunk/lib/Target/X86/X86ISelLowering.cpp (original)
>> > +++ llvm/trunk/lib/Target/X86/X86ISelLowering.cpp Wed Aug  7 09:24:26 2019
>> > @@ -66,7 +66,7 @@ using namespace llvm;
>> >  STATISTIC(NumTailCalls, "Number of tail calls");
>> >
>> >  static cl::opt<bool> ExperimentalVectorWideningLegalization(
>> > -    "x86-experimental-vector-widening-legalization", cl::init(false),
>> > +    "x86-experimental-vector-widening-legalization", cl::init(true),
>> >      cl::desc("Enable an experimental vector type legalization through widening "
>> >               "rather than promotion."),
>> >      cl::Hidden);
>> > @@ -40453,8 +40453,7 @@ static SDValue combineStore(SDNode *N, S
>> >    bool NoImplicitFloatOps = F.hasFnAttribute(Attribute::NoImplicitFloat);
>> >    bool F64IsLegal =
>> >        !Subtarget.useSoftFloat() && !NoImplicitFloatOps && Subtarget.hasSSE2();
>> > -  if (((VT.isVector() && !VT.isFloatingPoint()) ||
>> > -       (VT == MVT::i64 && F64IsLegal && !Subtarget.is64Bit())) &&
>> > +  if ((VT == MVT::i64 && F64IsLegal && !Subtarget.is64Bit()) &&
>> >        isa<LoadSDNode>(St->getValue()) &&
>> >        !cast<LoadSDNode>(St->getValue())->isVolatile() &&
>> >        St->getChain().hasOneUse() && !St->isVolatile()) {
>> >
>> > Modified: llvm/trunk/lib/Target/X86/X86TargetTransformInfo.cpp
>> > URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/X86/X86TargetTransformInfo.cpp?rev=368183&r1=368182&r2=368183&view=diff
>> > ==============================================================================
>> > --- llvm/trunk/lib/Target/X86/X86TargetTransformInfo.cpp (original)
>> > +++ llvm/trunk/lib/Target/X86/X86TargetTransformInfo.cpp Wed Aug  7 09:24:26 2019
>> > @@ -887,7 +887,7 @@ int X86TTIImpl::getArithmeticInstrCost(
>> >  int X86TTIImpl::getShuffleCost(TTI::ShuffleKind Kind, Type *Tp, int Index,
>> >                                 Type *SubTp) {
>> >    // 64-bit packed float vectors (v2f32) are widened to type v4f32.
>> > -  // 64-bit packed integer vectors (v2i32) are promoted to type v2i64.
>> > +  // 64-bit packed integer vectors (v2i32) are widened to type v4i32.
>> >    std::pair<int, MVT> LT = TLI->getTypeLegalizationCost(DL, Tp);
>> >
>> >    // Treat Transpose as 2-op shuffles - there's no difference in lowering.
>> > @@ -2425,14 +2425,6 @@ int X86TTIImpl::getAddressComputationCos
>> >
>> >  int X86TTIImpl::getArithmeticReductionCost(unsigned Opcode, Type *ValTy,
>> >                                             bool IsPairwise) {
>> > -
>> > -  std::pair<int, MVT> LT = TLI->getTypeLegalizationCost(DL, ValTy);
>> > -
>> > -  MVT MTy = LT.second;
>> > -
>> > -  int ISD = TLI->InstructionOpcodeToISD(Opcode);
>> > -  assert(ISD && "Invalid opcode");
>> > -
>> >    // We use the Intel Architecture Code Analyzer(IACA) to measure the throughput
>> >    // and make it as the cost.
>> >
>> > @@ -2440,7 +2432,10 @@ int X86TTIImpl::getArithmeticReductionCo
>> >      { ISD::FADD,  MVT::v2f64,   2 },
>> >      { ISD::FADD,  MVT::v4f32,   4 },
>> >      { ISD::ADD,   MVT::v2i64,   2 },      // The data reported by the IACA tool is "1.6".
>> > +    { ISD::ADD,   MVT::v2i32,   2 }, // FIXME: chosen to be less than v4i32.
>> >      { ISD::ADD,   MVT::v4i32,   3 },      // The data reported by the IACA tool is "3.5".
>> > +    { ISD::ADD,   MVT::v2i16,   3 }, // FIXME: chosen to be less than v4i16
>> > +    { ISD::ADD,   MVT::v4i16,   4 }, // FIXME: chosen to be less than v8i16
>> >      { ISD::ADD,   MVT::v8i16,   5 },
>> >    };
>> >
>> > @@ -2449,8 +2444,11 @@ int X86TTIImpl::getArithmeticReductionCo
>> >      { ISD::FADD,  MVT::v4f64,   5 },
>> >      { ISD::FADD,  MVT::v8f32,   7 },
>> >      { ISD::ADD,   MVT::v2i64,   1 },      // The data reported by the IACA tool is "1.5".
>> > +    { ISD::ADD,   MVT::v2i32,   2 }, // FIXME: chosen to be less than v4i32
>> >      { ISD::ADD,   MVT::v4i32,   3 },      // The data reported by the IACA tool is "3.5".
>> >      { ISD::ADD,   MVT::v4i64,   5 },      // The data reported by the IACA tool is "4.8".
>> > +    { ISD::ADD,   MVT::v2i16,   3 }, // FIXME: chosen to be less than v4i16
>> > +    { ISD::ADD,   MVT::v4i16,   4 }, // FIXME: chosen to be less than v8i16
>> >      { ISD::ADD,   MVT::v8i16,   5 },
>> >      { ISD::ADD,   MVT::v8i32,   5 },
>> >    };
>> > @@ -2459,7 +2457,10 @@ int X86TTIImpl::getArithmeticReductionCo
>> >      { ISD::FADD,  MVT::v2f64,   2 },
>> >      { ISD::FADD,  MVT::v4f32,   4 },
>> >      { ISD::ADD,   MVT::v2i64,   2 },      // The data reported by the IACA tool is "1.6".
>> > +    { ISD::ADD,   MVT::v2i32,   2 }, // FIXME: chosen to be less than v4i32
>> >      { ISD::ADD,   MVT::v4i32,   3 },      // The data reported by the IACA tool is "3.3".
>> > +    { ISD::ADD,   MVT::v2i16,   2 },      // The data reported by the IACA tool is "4.3".
>> > +    { ISD::ADD,   MVT::v4i16,   3 },      // The data reported by the IACA tool is "4.3".
>> >      { ISD::ADD,   MVT::v8i16,   4 },      // The data reported by the IACA tool is "4.3".
>> >    };
>> >
>> > @@ -2468,12 +2469,47 @@ int X86TTIImpl::getArithmeticReductionCo
>> >      { ISD::FADD,  MVT::v4f64,   3 },
>> >      { ISD::FADD,  MVT::v8f32,   4 },
>> >      { ISD::ADD,   MVT::v2i64,   1 },      // The data reported by the IACA tool is "1.5".
>> > +    { ISD::ADD,   MVT::v2i32,   2 }, // FIXME: chosen to be less than v4i32
>> >      { ISD::ADD,   MVT::v4i32,   3 },      // The data reported by the IACA tool is "2.8".
>> >      { ISD::ADD,   MVT::v4i64,   3 },
>> > +    { ISD::ADD,   MVT::v2i16,   2 },      // The data reported by the IACA tool is "4.3".
>> > +    { ISD::ADD,   MVT::v4i16,   3 },      // The data reported by the IACA tool is "4.3".
>> >      { ISD::ADD,   MVT::v8i16,   4 },
>> >      { ISD::ADD,   MVT::v8i32,   5 },
>> >    };
>> >
>> > +  int ISD = TLI->InstructionOpcodeToISD(Opcode);
>> > +  assert(ISD && "Invalid opcode");
>> > +
>> > +  // Before legalizing the type, give a chance to look up illegal narrow types
>> > +  // in the table.
>> > +  // FIXME: Is there a better way to do this?
>> > +  EVT VT = TLI->getValueType(DL, ValTy);
>> > +  if (VT.isSimple()) {
>> > +    MVT MTy = VT.getSimpleVT();
>> > +    if (IsPairwise) {
>> > +      if (ST->hasAVX())
>> > +        if (const auto *Entry = CostTableLookup(AVX1CostTblPairWise, ISD, MTy))
>> > +          return Entry->Cost;
>> > +
>> > +      if (ST->hasSSE42())
>> > +        if (const auto *Entry = CostTableLookup(SSE42CostTblPairWise, ISD, MTy))
>> > +          return Entry->Cost;
>> > +    } else {
>> > +      if (ST->hasAVX())
>> > +        if (const auto *Entry = CostTableLookup(AVX1CostTblNoPairWise, ISD, MTy))
>> > +          return Entry->Cost;
>> > +
>> > +      if (ST->hasSSE42())
>> > +        if (const auto *Entry = CostTableLookup(SSE42CostTblNoPairWise, ISD, MTy))
>> > +          return Entry->Cost;
>> > +    }
>> > +  }
>> > +
>> > +  std::pair<int, MVT> LT = TLI->getTypeLegalizationCost(DL, ValTy);
>> > +
>> > +  MVT MTy = LT.second;
>> > +
>> >    if (IsPairwise) {
>> >      if (ST->hasAVX())
>> >        if (const auto *Entry = CostTableLookup(AVX1CostTblPairWise, ISD, MTy))
>> >
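
The lookup order added to getArithmeticReductionCost above (try the original narrow type first, then fall back to the legalized type) can be modelled in isolation. This is only a rough sketch: Ty, CostEntry, AddReductionTbl, legalizeByWidening, and reductionCost are hypothetical names rather than LLVM APIs, and the costs are the ADD entries copied from one of the tables in the hunks above.

```cpp
// Hypothetical, simplified model of the two-stage cost lookup.
enum class Ty { v2i16, v4i16, v8i16, v2i32, v4i32 };

struct CostEntry {
  Ty Type;
  int Cost;
};

// ADD reduction costs copied from one of the tables in the patch.
static const CostEntry AddReductionTbl[] = {
    {Ty::v2i32, 2}, {Ty::v4i32, 3}, {Ty::v2i16, 2},
    {Ty::v4i16, 3}, {Ty::v8i16, 4},
};

// Assumed widening legalization: narrow vectors are padded to 128 bits
// while keeping the element width.
static Ty legalizeByWidening(Ty T) {
  switch (T) {
  case Ty::v2i16:
  case Ty::v4i16:
    return Ty::v8i16;
  case Ty::v2i32:
    return Ty::v4i32;
  default:
    return T;
  }
}

// Linear search of the table; returns nullptr when the type has no entry.
static const CostEntry *lookup(Ty T) {
  for (const CostEntry &E : AddReductionTbl)
    if (E.Type == T)
      return &E;
  return nullptr;
}

// Returns the modelled cost, or -1 if neither lookup finds an entry.
int reductionCost(Ty T, bool LookUpNarrowFirst) {
  if (LookUpNarrowFirst)
    if (const CostEntry *E = lookup(T))
      return E->Cost; // Entry for the illegal narrow type wins.
  const CostEntry *E = lookup(legalizeByWidening(T));
  return E ? E->Cost : -1;
}
```

In this model, reductionCost(Ty::v2i16, true) returns the narrow entry's cost of 2, while skipping the first lookup returns the v8i16 cost of 4, which is the difference the new narrow-type entries are meant to capture.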
>> > Modified: llvm/trunk/test/Analysis/CostModel/X86/alternate-shuffle-cost.ll
>> > URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Analysis/CostModel/X86/alternate-shuffle-cost.ll?rev=368183&r1=368182&r2=368183&view=diff
>> > ==============================================================================
>> > --- llvm/trunk/test/Analysis/CostModel/X86/alternate-shuffle-cost.ll (original)
>> > +++ llvm/trunk/test/Analysis/CostModel/X86/alternate-shuffle-cost.ll Wed Aug  7 09:24:26 2019
>> > @@ -18,9 +18,21 @@
>> >  ; 64-bit packed float vectors (v2f32) are widened to type v4f32.
>> >
>> >  define <2 x i32> @test_v2i32(<2 x i32> %a, <2 x i32> %b) {
>> > -; CHECK-LABEL: 'test_v2i32'
>> > -; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %1 = shufflevector <2 x i32> %a, <2 x i32> %b, <2 x i32> <i32 0, i32 3>
>> > -; CHECK-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret <2 x i32> %1
>> > +; SSE2-LABEL: 'test_v2i32'
>> > +; SSE2-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %1 = shufflevector <2 x i32> %a, <2 x i32> %b, <2 x i32> <i32 0, i32 3>
>> > +; SSE2-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret <2 x i32> %1
>> > +;
>> > +; SSSE3-LABEL: 'test_v2i32'
>> > +; SSSE3-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %1 = shufflevector <2 x i32> %a, <2 x i32> %b, <2 x i32> <i32 0, i32 3>
>> > +; SSSE3-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret <2 x i32> %1
>> > +;
>> > +; SSE42-LABEL: 'test_v2i32'
>> > +; SSE42-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %1 = shufflevector <2 x i32> %a, <2 x i32> %b, <2 x i32> <i32 0, i32 3>
>> > +; SSE42-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret <2 x i32> %1
>> > +;
>> > +; AVX-LABEL: 'test_v2i32'
>> > +; AVX-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %1 = shufflevector <2 x i32> %a, <2 x i32> %b, <2 x i32> <i32 0, i32 3>
>> > +; AVX-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret <2 x i32> %1
>> >  ;
>> >  ; BTVER2-LABEL: 'test_v2i32'
>> >  ; BTVER2-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %1 = shufflevector <2 x i32> %a, <2 x i32> %b, <2 x i32> <i32 0, i32 3>
>> > @@ -56,9 +68,21 @@ define <2 x float> @test_v2f32(<2 x floa
>> >  }
>> >
>> >  define <2 x i32> @test_v2i32_2(<2 x i32> %a, <2 x i32> %b) {
>> > -; CHECK-LABEL: 'test_v2i32_2'
>> > -; CHECK-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %1 = shufflevector <2 x i32> %a, <2 x i32> %b, <2 x i32> <i32 2, i32 1>
>> > -; CHECK-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret <2 x i32> %1
>> > +; SSE2-LABEL: 'test_v2i32_2'
>> > +; SSE2-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %1 = shufflevector <2 x i32> %a, <2 x i32> %b, <2 x i32> <i32 2, i32 1>
>> > +; SSE2-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret <2 x i32> %1
>> > +;
>> > +; SSSE3-LABEL: 'test_v2i32_2'
>> > +; SSSE3-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %1 = shufflevector <2 x i32> %a, <2 x i32> %b, <2 x i32> <i32 2, i32 1>
>> > +; SSSE3-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret <2 x i32> %1
>> > +;
>> > +; SSE42-LABEL: 'test_v2i32_2'
>> > +; SSE42-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %1 = shufflevector <2 x i32> %a, <2 x i32> %b, <2 x i32> <i32 2, i32 1>
>> > +; SSE42-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret <2 x i32> %1
>> > +;
>> > +; AVX-LABEL: 'test_v2i32_2'
>> > +; AVX-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %1 = shufflevector <2 x i32> %a, <2 x i32> %b, <2 x i32> <i32 2, i32 1>
>> > +; AVX-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret <2 x i32> %1
>> >  ;
>> >  ; BTVER2-LABEL: 'test_v2i32_2'
>> >  ; BTVER2-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %1 = shufflevector <2 x i32> %a, <2 x i32> %b, <2 x i32> <i32 2, i32 1>
>> >
>> > Modified: llvm/trunk/test/Analysis/CostModel/X86/arith.ll
>> > URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Analysis/CostModel/X86/arith.ll?rev=368183&r1=368182&r2=368183&view=diff
>> > ==============================================================================
>> > --- llvm/trunk/test/Analysis/CostModel/X86/arith.ll (original)
>> > +++ llvm/trunk/test/Analysis/CostModel/X86/arith.ll Wed Aug  7 09:24:26 2019
>> > @@ -1342,36 +1342,32 @@ define i32 @mul(i32 %arg) {
>> >  ; A <2 x i64> vector multiply is implemented using
>> >  ; 3 PMULUDQ and 2 PADDS and 4 shifts.
>> >  define void @mul_2i32() {
>> > -; SSE-LABEL: 'mul_2i32'
>> > -; SSE-NEXT:  Cost Model: Found an estimated cost of 8 for instruction: %A0 = mul <2 x i32> undef, undef
>> > -; SSE-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret void
>> > +; SSSE3-LABEL: 'mul_2i32'
>> > +; SSSE3-NEXT:  Cost Model: Found an estimated cost of 6 for instruction: %A0 = mul <2 x i32> undef, undef
>> > +; SSSE3-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret void
>> > +;
>> > +; SSE42-LABEL: 'mul_2i32'
>> > +; SSE42-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %A0 = mul <2 x i32> undef, undef
>> > +; SSE42-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret void
>> >  ;
>> >  ; AVX-LABEL: 'mul_2i32'
>> > -; AVX-NEXT:  Cost Model: Found an estimated cost of 8 for instruction: %A0 = mul <2 x i32> undef, undef
>> > +; AVX-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %A0 = mul <2 x i32> undef, undef
>> >  ; AVX-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret void
>> >  ;
>> > -; AVX512F-LABEL: 'mul_2i32'
>> > -; AVX512F-NEXT:  Cost Model: Found an estimated cost of 8 for instruction: %A0 = mul <2 x i32> undef, undef
>> > -; AVX512F-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret void
>> > -;
>> > -; AVX512BW-LABEL: 'mul_2i32'
>> > -; AVX512BW-NEXT:  Cost Model: Found an estimated cost of 8 for instruction: %A0 = mul <2 x i32> undef, undef
>> > -; AVX512BW-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret void
>> > -;
>> > -; AVX512DQ-LABEL: 'mul_2i32'
>> > -; AVX512DQ-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %A0 = mul <2 x i32> undef, undef
>> > -; AVX512DQ-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret void
>> > +; AVX512-LABEL: 'mul_2i32'
>> > +; AVX512-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %A0 = mul <2 x i32> undef, undef
>> > +; AVX512-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret void
>> >  ;
>> >  ; SLM-LABEL: 'mul_2i32'
>> > -; SLM-NEXT:  Cost Model: Found an estimated cost of 17 for instruction: %A0 = mul <2 x i32> undef, undef
>> > +; SLM-NEXT:  Cost Model: Found an estimated cost of 11 for instruction: %A0 = mul <2 x i32> undef, undef
>> >  ; SLM-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret void
>> >  ;
>> >  ; GLM-LABEL: 'mul_2i32'
>> > -; GLM-NEXT:  Cost Model: Found an estimated cost of 8 for instruction: %A0 = mul <2 x i32> undef, undef
>> > +; GLM-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %A0 = mul <2 x i32> undef, undef
>> >  ; GLM-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret void
>> >  ;
>> >  ; BTVER2-LABEL: 'mul_2i32'
>> > -; BTVER2-NEXT:  Cost Model: Found an estimated cost of 8 for instruction: %A0 = mul <2 x i32> undef, undef
>> > +; BTVER2-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %A0 = mul <2 x i32> undef, undef
>> >  ; BTVER2-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret void
>> >  ;
>> >    %A0 = mul <2 x i32> undef, undef
>> >
>> > Modified: llvm/trunk/test/Analysis/CostModel/X86/cast.ll
>> > URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Analysis/CostModel/X86/cast.ll?rev=368183&r1=368182&r2=368183&view=diff
>> > ==============================================================================
>> > --- llvm/trunk/test/Analysis/CostModel/X86/cast.ll (original)
>> > +++ llvm/trunk/test/Analysis/CostModel/X86/cast.ll Wed Aug  7 09:24:26 2019
>> > @@ -315,10 +315,10 @@ define void @sitofp4(<4 x i1> %a, <4 x i
>> >  ; SSE-LABEL: 'sitofp4'
>> >  ; SSE-NEXT:  Cost Model: Found an estimated cost of 5 for instruction: %A1 = sitofp <4 x i1> %a to <4 x float>
>> >  ; SSE-NEXT:  Cost Model: Found an estimated cost of 40 for instruction: %A2 = sitofp <4 x i1> %a to <4 x double>
>> > -; SSE-NEXT:  Cost Model: Found an estimated cost of 5 for instruction: %B1 = sitofp <4 x i8> %b to <4 x float>
>> > -; SSE-NEXT:  Cost Model: Found an estimated cost of 40 for instruction: %B2 = sitofp <4 x i8> %b to <4 x double>
>> > -; SSE-NEXT:  Cost Model: Found an estimated cost of 5 for instruction: %C1 = sitofp <4 x i16> %c to <4 x float>
>> > -; SSE-NEXT:  Cost Model: Found an estimated cost of 40 for instruction: %C2 = sitofp <4 x i16> %c to <4 x double>
>> > +; SSE-NEXT:  Cost Model: Found an estimated cost of 8 for instruction: %B1 = sitofp <4 x i8> %b to <4 x float>
>> > +; SSE-NEXT:  Cost Model: Found an estimated cost of 160 for instruction: %B2 = sitofp <4 x i8> %b to <4 x double>
>> > +; SSE-NEXT:  Cost Model: Found an estimated cost of 15 for instruction: %C1 = sitofp <4 x i16> %c to <4 x float>
>> > +; SSE-NEXT:  Cost Model: Found an estimated cost of 80 for instruction: %C2 = sitofp <4 x i16> %c to <4 x double>
>> >  ; SSE-NEXT:  Cost Model: Found an estimated cost of 5 for instruction: %D1 = sitofp <4 x i32> %d to <4 x float>
>> >  ; SSE-NEXT:  Cost Model: Found an estimated cost of 40 for instruction: %D2 = sitofp <4 x i32> %d to <4 x double>
>> >  ; SSE-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret void
>> > @@ -359,7 +359,7 @@ define void @sitofp4(<4 x i1> %a, <4 x i
>> >  define void @sitofp8(<8 x i1> %a, <8 x i8> %b, <8 x i16> %c, <8 x i32> %d) {
>> >  ; SSE-LABEL: 'sitofp8'
>> >  ; SSE-NEXT:  Cost Model: Found an estimated cost of 15 for instruction: %A1 = sitofp <8 x i1> %a to <8 x float>
>> > -; SSE-NEXT:  Cost Model: Found an estimated cost of 15 for instruction: %B1 = sitofp <8 x i8> %b to <8 x float>
>> > +; SSE-NEXT:  Cost Model: Found an estimated cost of 8 for instruction: %B1 = sitofp <8 x i8> %b to <8 x float>
>> >  ; SSE-NEXT:  Cost Model: Found an estimated cost of 15 for instruction: %C1 = sitofp <8 x i16> %c to <8 x float>
>> >  ; SSE-NEXT:  Cost Model: Found an estimated cost of 10 for instruction: %D1 = sitofp <8 x i32> %d to <8 x float>
>> >  ; SSE-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret void
>> > @@ -390,9 +390,9 @@ define void @uitofp4(<4 x i1> %a, <4 x i
>> >  ; SSE-NEXT:  Cost Model: Found an estimated cost of 8 for instruction: %A1 = uitofp <4 x i1> %a to <4 x float>
>> >  ; SSE-NEXT:  Cost Model: Found an estimated cost of 40 for instruction: %A2 = uitofp <4 x i1> %a to <4 x double>
>> >  ; SSE-NEXT:  Cost Model: Found an estimated cost of 8 for instruction: %B1 = uitofp <4 x i8> %b to <4 x float>
>> > -; SSE-NEXT:  Cost Model: Found an estimated cost of 40 for instruction: %B2 = uitofp <4 x i8> %b to <4 x double>
>> > -; SSE-NEXT:  Cost Model: Found an estimated cost of 8 for instruction: %C1 = uitofp <4 x i16> %c to <4 x float>
>> > -; SSE-NEXT:  Cost Model: Found an estimated cost of 40 for instruction: %C2 = uitofp <4 x i16> %c to <4 x double>
>> > +; SSE-NEXT:  Cost Model: Found an estimated cost of 160 for instruction: %B2 = uitofp <4 x i8> %b to <4 x double>
>> > +; SSE-NEXT:  Cost Model: Found an estimated cost of 15 for instruction: %C1 = uitofp <4 x i16> %c to <4 x float>
>> > +; SSE-NEXT:  Cost Model: Found an estimated cost of 80 for instruction: %C2 = uitofp <4 x i16> %c to <4 x double>
>> >  ; SSE-NEXT:  Cost Model: Found an estimated cost of 8 for instruction: %D1 = uitofp <4 x i32> %d to <4 x float>
>> >  ; SSE-NEXT:  Cost Model: Found an estimated cost of 40 for instruction: %D2 = uitofp <4 x i32> %d to <4 x double>
>> >  ; SSE-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret void
>> > @@ -433,7 +433,7 @@ define void @uitofp4(<4 x i1> %a, <4 x i
>> >  define void @uitofp8(<8 x i1> %a, <8 x i8> %b, <8 x i16> %c, <8 x i32> %d) {
>> >  ; SSE-LABEL: 'uitofp8'
>> >  ; SSE-NEXT:  Cost Model: Found an estimated cost of 15 for instruction: %A1 = uitofp <8 x i1> %a to <8 x float>
>> > -; SSE-NEXT:  Cost Model: Found an estimated cost of 15 for instruction: %B1 = uitofp <8 x i8> %b to <8 x float>
>> > +; SSE-NEXT:  Cost Model: Found an estimated cost of 8 for instruction: %B1 = uitofp <8 x i8> %b to <8 x float>
>> >  ; SSE-NEXT:  Cost Model: Found an estimated cost of 15 for instruction: %C1 = uitofp <8 x i16> %c to <8 x float>
>> >  ; SSE-NEXT:  Cost Model: Found an estimated cost of 16 for instruction: %D1 = uitofp <8 x i32> %d to <8 x float>
>> >  ; SSE-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret void
>> >
>> > Modified: llvm/trunk/test/Analysis/CostModel/X86/fptosi.ll
>> > URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Analysis/CostModel/X86/fptosi.ll?rev=368183&r1=368182&r2=368183&view=diff
>> > ==============================================================================
>> > --- llvm/trunk/test/Analysis/CostModel/X86/fptosi.ll (original)
>> > +++ llvm/trunk/test/Analysis/CostModel/X86/fptosi.ll Wed Aug  7 09:24:26 2019
>> > @@ -92,35 +92,28 @@ define i32 @fptosi_double_i32(i32 %arg)
>> >  define i32 @fptosi_double_i16(i32 %arg) {
>> >  ; SSE-LABEL: 'fptosi_double_i16'
>> >  ; SSE-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %I16 = fptosi double undef to i16
>> > -; SSE-NEXT:  Cost Model: Found an estimated cost of 6 for instruction: %V2I16 = fptosi <2 x double> undef to <2 x i16>
>> > -; SSE-NEXT:  Cost Model: Found an estimated cost of 13 for instruction: %V4I16 = fptosi <4 x double> undef to <4 x i16>
>> > -; SSE-NEXT:  Cost Model: Found an estimated cost of 27 for instruction: %V8I16 = fptosi <8 x double> undef to <8 x i16>
>> > +; SSE-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V2I16 = fptosi <2 x double> undef to <2 x i16>
>> > +; SSE-NEXT:  Cost Model: Found an estimated cost of 3 for instruction: %V4I16 = fptosi <4 x double> undef to <4 x i16>
>> > +; SSE-NEXT:  Cost Model: Found an estimated cost of 7 for instruction: %V8I16 = fptosi <8 x double> undef to <8 x i16>
>> >  ; SSE-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
>> >  ;
>> >  ; AVX-LABEL: 'fptosi_double_i16'
>> >  ; AVX-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %I16 = fptosi double undef to i16
>> > -; AVX-NEXT:  Cost Model: Found an estimated cost of 6 for instruction: %V2I16 = fptosi <2 x double> undef to <2 x i16>
>> > +; AVX-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V2I16 = fptosi <2 x double> undef to <2 x i16>
>> >  ; AVX-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V4I16 = fptosi <4 x double> undef to <4 x i16>
>> >  ; AVX-NEXT:  Cost Model: Found an estimated cost of 3 for instruction: %V8I16 = fptosi <8 x double> undef to <8 x i16>
>> >  ; AVX-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
>> >  ;
>> > -; AVX512F-LABEL: 'fptosi_double_i16'
>> > -; AVX512F-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %I16 = fptosi double undef to i16
>> > -; AVX512F-NEXT:  Cost Model: Found an estimated cost of 6 for instruction: %V2I16 = fptosi <2 x double> undef to <2 x i16>
>> > -; AVX512F-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V4I16 = fptosi <4 x double> undef to <4 x i16>
>> > -; AVX512F-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V8I16 = fptosi <8 x double> undef to <8 x i16>
>> > -; AVX512F-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
>> > -;
>> > -; AVX512DQ-LABEL: 'fptosi_double_i16'
>> > -; AVX512DQ-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %I16 = fptosi double undef to i16
>> > -; AVX512DQ-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V2I16 = fptosi <2 x double> undef to <2 x i16>
>> > -; AVX512DQ-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V4I16 = fptosi <4 x double> undef to <4 x i16>
>> > -; AVX512DQ-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V8I16 = fptosi <8 x double> undef to <8 x i16>
>> > -; AVX512DQ-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
>> > +; AVX512-LABEL: 'fptosi_double_i16'
>> > +; AVX512-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %I16 = fptosi double undef to i16
>> > +; AVX512-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V2I16 = fptosi <2 x double> undef to <2 x i16>
>> > +; AVX512-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V4I16 = fptosi <4 x double> undef to <4 x i16>
>> > +; AVX512-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V8I16 = fptosi <8 x double> undef to <8 x i16>
>> > +; AVX512-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
>> >  ;
>> >  ; BTVER2-LABEL: 'fptosi_double_i16'
>> >  ; BTVER2-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %I16 = fptosi double undef to i16
>> > -; BTVER2-NEXT:  Cost Model: Found an estimated cost of 6 for instruction: %V2I16 = fptosi <2 x double> undef to <2 x i16>
>> > +; BTVER2-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V2I16 = fptosi <2 x double> undef to <2 x i16>
>> >  ; BTVER2-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V4I16 = fptosi <4 x double> undef to <4 x i16>
>> >  ; BTVER2-NEXT:  Cost Model: Found an estimated cost of 3 for instruction: %V8I16 = fptosi <8 x double> undef to <8 x i16>
>> >  ; BTVER2-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
>> > @@ -143,29 +136,22 @@ define i32 @fptosi_double_i8(i32 %arg) {
>> >  ; AVX-LABEL: 'fptosi_double_i8'
>> >  ; AVX-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %I8 = fptosi double undef to i8
>> >  ; AVX-NEXT:  Cost Model: Found an estimated cost of 6 for instruction: %V2I8 = fptosi <2 x double> undef to <2 x i8>
>> > -; AVX-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V4I8 = fptosi <4 x double> undef to <4 x i8>
>> > -; AVX-NEXT:  Cost Model: Found an estimated cost of 3 for instruction: %V8I8 = fptosi <8 x double> undef to <8 x i8>
>> > +; AVX-NEXT:  Cost Model: Found an estimated cost of 12 for instruction: %V4I8 = fptosi <4 x double> undef to <4 x i8>
>> > +; AVX-NEXT:  Cost Model: Found an estimated cost of 25 for instruction: %V8I8 = fptosi <8 x double> undef to <8 x i8>
>> >  ; AVX-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
>> >  ;
>> > -; AVX512F-LABEL: 'fptosi_double_i8'
>> > -; AVX512F-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %I8 = fptosi double undef to i8
>> > -; AVX512F-NEXT:  Cost Model: Found an estimated cost of 6 for instruction: %V2I8 = fptosi <2 x double> undef to <2 x i8>
>> > -; AVX512F-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V4I8 = fptosi <4 x double> undef to <4 x i8>
>> > -; AVX512F-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V8I8 = fptosi <8 x double> undef to <8 x i8>
>> > -; AVX512F-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
>> > -;
>> > -; AVX512DQ-LABEL: 'fptosi_double_i8'
>> > -; AVX512DQ-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %I8 = fptosi double undef to i8
>> > -; AVX512DQ-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V2I8 = fptosi <2 x double> undef to <2 x i8>
>> > -; AVX512DQ-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V4I8 = fptosi <4 x double> undef to <4 x i8>
>> > -; AVX512DQ-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V8I8 = fptosi <8 x double> undef to <8 x i8>
>> > -; AVX512DQ-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
>> > +; AVX512-LABEL: 'fptosi_double_i8'
>> > +; AVX512-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %I8 = fptosi double undef to i8
>> > +; AVX512-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V2I8 = fptosi <2 x double> undef to <2 x i8>
>> > +; AVX512-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V4I8 = fptosi <4 x double> undef to <4 x i8>
>> > +; AVX512-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V8I8 = fptosi <8 x double> undef to <8 x i8>
>> > +; AVX512-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
>> >  ;
>> >  ; BTVER2-LABEL: 'fptosi_double_i8'
>> >  ; BTVER2-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %I8 = fptosi double undef to i8
>> >  ; BTVER2-NEXT:  Cost Model: Found an estimated cost of 6 for instruction: %V2I8 = fptosi <2 x double> undef to <2 x i8>
>> > -; BTVER2-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V4I8 = fptosi <4 x double> undef to <4 x i8>
>> > -; BTVER2-NEXT:  Cost Model: Found an estimated cost of 3 for instruction: %V8I8 = fptosi <8 x double> undef to <8 x i8>
>> > +; BTVER2-NEXT:  Cost Model: Found an estimated cost of 12 for instruction: %V4I8 = fptosi <4 x double> undef to <4 x i8>
>> > +; BTVER2-NEXT:  Cost Model: Found an estimated cost of 25 for instruction: %V8I8 = fptosi <8 x double> undef to <8 x i8>
>> >  ; BTVER2-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
>> >  ;
>> >    %I8 = fptosi double undef to i8
>> > @@ -285,9 +271,9 @@ define i32 @fptosi_float_i16(i32 %arg) {
>> >  define i32 @fptosi_float_i8(i32 %arg) {
>> >  ; SSE-LABEL: 'fptosi_float_i8'
>> >  ; SSE-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %I8 = fptosi float undef to i8
>> > -; SSE-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V4I8 = fptosi <4 x float> undef to <4 x i8>
>> > -; SSE-NEXT:  Cost Model: Found an estimated cost of 3 for instruction: %V8I8 = fptosi <8 x float> undef to <8 x i8>
>> > -; SSE-NEXT:  Cost Model: Found an estimated cost of 7 for instruction: %V16I8 = fptosi <16 x float> undef to <16 x i8>
>> > +; SSE-NEXT:  Cost Model: Found an estimated cost of 12 for instruction: %V4I8 = fptosi <4 x float> undef to <4 x i8>
>> > +; SSE-NEXT:  Cost Model: Found an estimated cost of 25 for instruction: %V8I8 = fptosi <8 x float> undef to <8 x i8>
>> > +; SSE-NEXT:  Cost Model: Found an estimated cost of 51 for instruction: %V16I8 = fptosi <16 x float> undef to <16 x i8>
>> >  ; SSE-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
>> >  ;
>> >  ; AVX-LABEL: 'fptosi_float_i8'
>> >
>> > Modified: llvm/trunk/test/Analysis/CostModel/X86/fptoui.ll
>> > URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Analysis/CostModel/X86/fptoui.ll?rev=368183&r1=368182&r2=368183&view=diff
>> > ==============================================================================
>> > --- llvm/trunk/test/Analysis/CostModel/X86/fptoui.ll (original)
>> > +++ llvm/trunk/test/Analysis/CostModel/X86/fptoui.ll Wed Aug  7 09:24:26 2019
>> > @@ -68,19 +68,12 @@ define i32 @fptoui_double_i32(i32 %arg)
>> >  ; AVX-NEXT:  Cost Model: Found an estimated cost of 33 for instruction: %V8I32 = fptoui <8 x double> undef to <8 x i32>
>> >  ; AVX-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
>> >  ;
>> > -; AVX512F-LABEL: 'fptoui_double_i32'
>> > -; AVX512F-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %I32 = fptoui double undef to i32
>> > -; AVX512F-NEXT:  Cost Model: Found an estimated cost of 6 for instruction: %V2I32 = fptoui <2 x double> undef to <2 x i32>
>> > -; AVX512F-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V4I32 = fptoui <4 x double> undef to <4 x i32>
>> > -; AVX512F-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V8I32 = fptoui <8 x double> undef to <8 x i32>
>> > -; AVX512F-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
>> > -;
>> > -; AVX512DQ-LABEL: 'fptoui_double_i32'
>> > -; AVX512DQ-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %I32 = fptoui double undef to i32
>> > -; AVX512DQ-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V2I32 = fptoui <2 x double> undef to <2 x i32>
>> > -; AVX512DQ-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V4I32 = fptoui <4 x double> undef to <4 x i32>
>> > -; AVX512DQ-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V8I32 = fptoui <8 x double> undef to <8 x i32>
>> > -; AVX512DQ-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
>> > +; AVX512-LABEL: 'fptoui_double_i32'
>> > +; AVX512-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %I32 = fptoui double undef to i32
>> > +; AVX512-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V2I32 = fptoui <2 x double> undef to <2 x i32>
>> > +; AVX512-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V4I32 = fptoui <4 x double> undef to <4 x i32>
>> > +; AVX512-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V8I32 = fptoui <8 x double> undef to <8 x i32>
>> > +; AVX512-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
>> >  ;
>> >  ; BTVER2-LABEL: 'fptoui_double_i32'
>> >  ; BTVER2-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %I32 = fptoui double undef to i32
>> > @@ -106,30 +99,23 @@ define i32 @fptoui_double_i16(i32 %arg)
>> >  ;
>> >  ; AVX-LABEL: 'fptoui_double_i16'
>> >  ; AVX-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %I16 = fptoui double undef to i16
>> > -; AVX-NEXT:  Cost Model: Found an estimated cost of 6 for instruction: %V2I16 = fptoui <2 x double> undef to <2 x i16>
>> > -; AVX-NEXT:  Cost Model: Found an estimated cost of 12 for instruction: %V4I16 = fptoui <4 x double> undef to <4 x i16>
>> > -; AVX-NEXT:  Cost Model: Found an estimated cost of 25 for instruction: %V8I16 = fptoui <8 x double> undef to <8 x i16>
>> > +; AVX-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V2I16 = fptoui <2 x double> undef to <2 x i16>
>> > +; AVX-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V4I16 = fptoui <4 x double> undef to <4 x i16>
>> > +; AVX-NEXT:  Cost Model: Found an estimated cost of 3 for instruction: %V8I16 = fptoui <8 x double> undef to <8 x i16>
>> >  ; AVX-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
>> >  ;
>> > -; AVX512F-LABEL: 'fptoui_double_i16'
>> > -; AVX512F-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %I16 = fptoui double undef to i16
>> > -; AVX512F-NEXT:  Cost Model: Found an estimated cost of 6 for instruction: %V2I16 = fptoui <2 x double> undef to <2 x i16>
>> > -; AVX512F-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V4I16 = fptoui <4 x double> undef to <4 x i16>
>> > -; AVX512F-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %V8I16 = fptoui <8 x double> undef to <8 x i16>
>> > -; AVX512F-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
>> > -;
>> > -; AVX512DQ-LABEL: 'fptoui_double_i16'
>> > -; AVX512DQ-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %I16 = fptoui double undef to i16
>> > -; AVX512DQ-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V2I16 = fptoui <2 x double> undef to <2 x i16>
>> > -; AVX512DQ-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V4I16 = fptoui <4 x double> undef to <4 x i16>
>> > -; AVX512DQ-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %V8I16 = fptoui <8 x double> undef to <8 x i16>
>> > -; AVX512DQ-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
>> > +; AVX512-LABEL: 'fptoui_double_i16'
>> > +; AVX512-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %I16 = fptoui double undef to i16
>> > +; AVX512-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V2I16 = fptoui <2 x double> undef to <2 x i16>
>> > +; AVX512-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V4I16 = fptoui <4 x double> undef to <4 x i16>
>> > +; AVX512-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %V8I16 = fptoui <8 x double> undef to <8 x i16>
>> > +; AVX512-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
>> >  ;
>> >  ; BTVER2-LABEL: 'fptoui_double_i16'
>> >  ; BTVER2-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %I16 = fptoui double undef to i16
>> > -; BTVER2-NEXT:  Cost Model: Found an estimated cost of 6 for instruction: %V2I16 = fptoui <2 x double> undef to <2 x i16>
>> > -; BTVER2-NEXT:  Cost Model: Found an estimated cost of 12 for instruction: %V4I16 = fptoui <4 x double> undef to <4 x i16>
>> > -; BTVER2-NEXT:  Cost Model: Found an estimated cost of 25 for instruction: %V8I16 = fptoui <8 x double> undef to <8 x i16>
>> > +; BTVER2-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V2I16 = fptoui <2 x double> undef to <2 x i16>
>> > +; BTVER2-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V4I16 = fptoui <4 x double> undef to <4 x i16>
>> > +; BTVER2-NEXT:  Cost Model: Found an estimated cost of 3 for instruction: %V8I16 = fptoui <8 x double> undef to <8 x i16>
>> >  ; BTVER2-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
>> >  ;
>> >    %I16 = fptoui double undef to i16
>> > @@ -154,19 +140,12 @@ define i32 @fptoui_double_i8(i32 %arg) {
>> >  ; AVX-NEXT:  Cost Model: Found an estimated cost of 25 for instruction: %V8I8 = fptoui <8 x double> undef to <8 x i8>
>> >  ; AVX-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
>> >  ;
>> > -; AVX512F-LABEL: 'fptoui_double_i8'
>> > -; AVX512F-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %I8 = fptoui double undef to i8
>> > -; AVX512F-NEXT:  Cost Model: Found an estimated cost of 6 for instruction: %V2I8 = fptoui <2 x double> undef to <2 x i8>
>> > -; AVX512F-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V4I8 = fptoui <4 x double> undef to <4 x i8>
>> > -; AVX512F-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %V8I8 = fptoui <8 x double> undef to <8 x i8>
>> > -; AVX512F-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
>> > -;
>> > -; AVX512DQ-LABEL: 'fptoui_double_i8'
>> > -; AVX512DQ-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %I8 = fptoui double undef to i8
>> > -; AVX512DQ-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V2I8 = fptoui <2 x double> undef to <2 x i8>
>> > -; AVX512DQ-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V4I8 = fptoui <4 x double> undef to <4 x i8>
>> > -; AVX512DQ-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %V8I8 = fptoui <8 x double> undef to <8 x i8>
>> > -; AVX512DQ-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
>> > +; AVX512-LABEL: 'fptoui_double_i8'
>> > +; AVX512-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %I8 = fptoui double undef to i8
>> > +; AVX512-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V2I8 = fptoui <2 x double> undef to <2 x i8>
>> > +; AVX512-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V4I8 = fptoui <4 x double> undef to <4 x i8>
>> > +; AVX512-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %V8I8 = fptoui <8 x double> undef to <8 x i8>
>> > +; AVX512-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
>> >  ;
>> >  ; BTVER2-LABEL: 'fptoui_double_i8'
>> >  ; BTVER2-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %I8 = fptoui double undef to i8
>> > @@ -277,7 +256,7 @@ define i32 @fptoui_float_i16(i32 %arg) {
>> >  ;
>> >  ; AVX-LABEL: 'fptoui_float_i16'
>> >  ; AVX-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %I16 = fptoui float undef to i16
>> > -; AVX-NEXT:  Cost Model: Found an estimated cost of 12 for instruction: %V4I16 = fptoui <4 x float> undef to <4 x i16>
>> > +; AVX-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V4I16 = fptoui <4 x float> undef to <4 x i16>
>> >  ; AVX-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V8I16 = fptoui <8 x float> undef to <8 x i16>
>> >  ; AVX-NEXT:  Cost Model: Found an estimated cost of 3 for instruction: %V16I16 = fptoui <16 x float> undef to <16 x i16>
>> >  ; AVX-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
>> > @@ -291,7 +270,7 @@ define i32 @fptoui_float_i16(i32 %arg) {
>> >  ;
>> >  ; BTVER2-LABEL: 'fptoui_float_i16'
>> >  ; BTVER2-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %I16 = fptoui float undef to i16
>> > -; BTVER2-NEXT:  Cost Model: Found an estimated cost of 12 for instruction: %V4I16 = fptoui <4 x float> undef to <4 x i16>
>> > +; BTVER2-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V4I16 = fptoui <4 x float> undef to <4 x i16>
>> >  ; BTVER2-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V8I16 = fptoui <8 x float> undef to <8 x i16>
>> >  ; BTVER2-NEXT:  Cost Model: Found an estimated cost of 3 for instruction: %V16I16 = fptoui <16 x float> undef to <16 x i16>
>> >  ; BTVER2-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
>> > @@ -314,8 +293,8 @@ define i32 @fptoui_float_i8(i32 %arg) {
>> >  ; AVX-LABEL: 'fptoui_float_i8'
>> >  ; AVX-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %I8 = fptoui float undef to i8
>> >  ; AVX-NEXT:  Cost Model: Found an estimated cost of 12 for instruction: %V4I8 = fptoui <4 x float> undef to <4 x i8>
>> > -; AVX-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V8I8 = fptoui <8 x float> undef to <8 x i8>
>> > -; AVX-NEXT:  Cost Model: Found an estimated cost of 3 for instruction: %V16I8 = fptoui <16 x float> undef to <16 x i8>
>> > +; AVX-NEXT:  Cost Model: Found an estimated cost of 24 for instruction: %V8I8 = fptoui <8 x float> undef to <8 x i8>
>> > +; AVX-NEXT:  Cost Model: Found an estimated cost of 49 for instruction: %V16I8 = fptoui <16 x float> undef to <16 x i8>
>> >  ; AVX-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
>> >  ;
>> >  ; AVX512-LABEL: 'fptoui_float_i8'
>> > @@ -328,8 +307,8 @@ define i32 @fptoui_float_i8(i32 %arg) {
>> >  ; BTVER2-LABEL: 'fptoui_float_i8'
>> >  ; BTVER2-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %I8 = fptoui float undef to i8
>> >  ; BTVER2-NEXT:  Cost Model: Found an estimated cost of 12 for instruction: %V4I8 = fptoui <4 x float> undef to <4 x i8>
>> > -; BTVER2-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V8I8 = fptoui <8 x float> undef to <8 x i8>
>> > -; BTVER2-NEXT:  Cost Model: Found an estimated cost of 3 for instruction: %V16I8 = fptoui <16 x float> undef to <16 x i8>
>> > +; BTVER2-NEXT:  Cost Model: Found an estimated cost of 24 for instruction: %V8I8 = fptoui <8 x float> undef to <8 x i8>
>> > +; BTVER2-NEXT:  Cost Model: Found an estimated cost of 49 for instruction: %V16I8 = fptoui <16 x float> undef to <16 x i8>
>> >  ; BTVER2-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
>> >  ;
>> >    %I8 = fptoui float undef to i8
>> >
>> > Modified: llvm/trunk/test/Analysis/CostModel/X86/masked-intrinsic-cost.ll
>> > URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Analysis/CostModel/X86/masked-intrinsic-cost.ll?rev=368183&r1=368182&r2=368183&view=diff
>> > ==============================================================================
>> > --- llvm/trunk/test/Analysis/CostModel/X86/masked-intrinsic-cost.ll (original)
>> > +++ llvm/trunk/test/Analysis/CostModel/X86/masked-intrinsic-cost.ll Wed Aug  7 09:24:26 2019
>> > @@ -52,7 +52,7 @@ define i32 @masked_load() {
>> >  ; AVX-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %V16I32 = call <16 x i32> @llvm.masked.load.v16i32.p0v16i32(<16 x i32>* undef, i32 1, <16 x i1> undef, <16 x i32> undef)
>> >  ; AVX-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %V8I32 = call <8 x i32> @llvm.masked.load.v8i32.p0v8i32(<8 x i32>* undef, i32 1, <8 x i1> undef, <8 x i32> undef)
>> >  ; AVX-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %V4I32 = call <4 x i32> @llvm.masked.load.v4i32.p0v4i32(<4 x i32>* undef, i32 1, <4 x i1> undef, <4 x i32> undef)
>> > -; AVX-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %V2I32 = call <2 x i32> @llvm.masked.load.v2i32.p0v2i32(<2 x i32>* undef, i32 1, <2 x i1> undef, <2 x i32> undef)
>> > +; AVX-NEXT:  Cost Model: Found an estimated cost of 6 for instruction: %V2I32 = call <2 x i32> @llvm.masked.load.v2i32.p0v2i32(<2 x i32>* undef, i32 1, <2 x i1> undef, <2 x i32> undef)
>> >  ; AVX-NEXT:  Cost Model: Found an estimated cost of 128 for instruction: %V32I16 = call <32 x i16> @llvm.masked.load.v32i16.p0v32i16(<32 x i16>* undef, i32 1, <32 x i1> undef, <32 x i16> undef)
>> >  ; AVX-NEXT:  Cost Model: Found an estimated cost of 64 for instruction: %V16I16 = call <16 x i16> @llvm.masked.load.v16i16.p0v16i16(<16 x i16>* undef, i32 1, <16 x i1> undef, <16 x i16> undef)
>> >  ; AVX-NEXT:  Cost Model: Found an estimated cost of 32 for instruction: %V8I16 = call <8 x i16> @llvm.masked.load.v8i16.p0v8i16(<8 x i16>* undef, i32 1, <8 x i1> undef, <8 x i16> undef)
>> > @@ -79,7 +79,7 @@ define i32 @masked_load() {
>> >  ; KNL-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V16I32 = call <16 x i32> @llvm.masked.load.v16i32.p0v16i32(<16 x i32>* undef, i32 1, <16 x i1> undef, <16 x i32> undef)
>> >  ; KNL-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V8I32 = call <8 x i32> @llvm.masked.load.v8i32.p0v8i32(<8 x i32>* undef, i32 1, <8 x i1> undef, <8 x i32> undef)
>> >  ; KNL-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V4I32 = call <4 x i32> @llvm.masked.load.v4i32.p0v4i32(<4 x i32>* undef, i32 1, <4 x i1> undef, <4 x i32> undef)
>> > -; KNL-NEXT:  Cost Model: Found an estimated cost of 3 for instruction: %V2I32 = call <2 x i32> @llvm.masked.load.v2i32.p0v2i32(<2 x i32>* undef, i32 1, <2 x i1> undef, <2 x i32> undef)
>> > +; KNL-NEXT:  Cost Model: Found an estimated cost of 5 for instruction: %V2I32 = call <2 x i32> @llvm.masked.load.v2i32.p0v2i32(<2 x i32>* undef, i32 1, <2 x i1> undef, <2 x i32> undef)
>> >  ; KNL-NEXT:  Cost Model: Found an estimated cost of 128 for instruction: %V32I16 = call <32 x i16> @llvm.masked.load.v32i16.p0v32i16(<32 x i16>* undef, i32 1, <32 x i1> undef, <32 x i16> undef)
>> >  ; KNL-NEXT:  Cost Model: Found an estimated cost of 64 for instruction: %V16I16 = call <16 x i16> @llvm.masked.load.v16i16.p0v16i16(<16 x i16>* undef, i32 1, <16 x i1> undef, <16 x i16> undef)
>> >  ; KNL-NEXT:  Cost Model: Found an estimated cost of 32 for instruction: %V8I16 = call <8 x i16> @llvm.masked.load.v8i16.p0v8i16(<8 x i16>* undef, i32 1, <8 x i1> undef, <8 x i16> undef)
>> > @@ -106,15 +106,15 @@ define i32 @masked_load() {
>> >  ; SKX-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V16I32 = call <16 x i32> @llvm.masked.load.v16i32.p0v16i32(<16 x i32>* undef, i32 1, <16 x i1> undef, <16 x i32> undef)
>> >  ; SKX-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V8I32 = call <8 x i32> @llvm.masked.load.v8i32.p0v8i32(<8 x i32>* undef, i32 1, <8 x i1> undef, <8 x i32> undef)
>> >  ; SKX-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V4I32 = call <4 x i32> @llvm.masked.load.v4i32.p0v4i32(<4 x i32>* undef, i32 1, <4 x i1> undef, <4 x i32> undef)
>> > -; SKX-NEXT:  Cost Model: Found an estimated cost of 3 for instruction: %V2I32 = call <2 x i32> @llvm.masked.load.v2i32.p0v2i32(<2 x i32>* undef, i32 1, <2 x i1> undef, <2 x i32> undef)
>> > +; SKX-NEXT:  Cost Model: Found an estimated cost of 5 for instruction: %V2I32 = call <2 x i32> @llvm.masked.load.v2i32.p0v2i32(<2 x i32>* undef, i32 1, <2 x i1> undef, <2 x i32> undef)
>> >  ; SKX-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V32I16 = call <32 x i16> @llvm.masked.load.v32i16.p0v32i16(<32 x i16>* undef, i32 1, <32 x i1> undef, <32 x i16> undef)
>> >  ; SKX-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V16I16 = call <16 x i16> @llvm.masked.load.v16i16.p0v16i16(<16 x i16>* undef, i32 1, <16 x i1> undef, <16 x i16> undef)
>> >  ; SKX-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V8I16 = call <8 x i16> @llvm.masked.load.v8i16.p0v8i16(<8 x i16>* undef, i32 1, <8 x i1> undef, <8 x i16> undef)
>> > -; SKX-NEXT:  Cost Model: Found an estimated cost of 3 for instruction: %V4I16 = call <4 x i16> @llvm.masked.load.v4i16.p0v4i16(<4 x i16>* undef, i32 1, <4 x i1> undef, <4 x i16> undef)
>> > +; SKX-NEXT:  Cost Model: Found an estimated cost of 9 for instruction: %V4I16 = call <4 x i16> @llvm.masked.load.v4i16.p0v4i16(<4 x i16>* undef, i32 1, <4 x i1> undef, <4 x i16> undef)
>> >  ; SKX-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V64I8 = call <64 x i8> @llvm.masked.load.v64i8.p0v64i8(<64 x i8>* undef, i32 1, <64 x i1> undef, <64 x i8> undef)
>> >  ; SKX-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V32I8 = call <32 x i8> @llvm.masked.load.v32i8.p0v32i8(<32 x i8>* undef, i32 1, <32 x i1> undef, <32 x i8> undef)
>> >  ; SKX-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V16I8 = call <16 x i8> @llvm.masked.load.v16i8.p0v16i8(<16 x i8>* undef, i32 1, <16 x i1> undef, <16 x i8> undef)
>> > -; SKX-NEXT:  Cost Model: Found an estimated cost of 3 for instruction: %V8I8 = call <8 x i8> @llvm.masked.load.v8i8.p0v8i8(<8 x i8>* undef, i32 1, <8 x i1> undef, <8 x i8> undef)
>> > +; SKX-NEXT:  Cost Model: Found an estimated cost of 17 for instruction: %V8I8 = call <8 x i8> @llvm.masked.load.v8i8.p0v8i8(<8 x i8>* undef, i32 1, <8 x i1> undef, <8 x i8> undef)
>> >  ; SKX-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret i32 0
>> >  ;
>> >    %V8F64 = call <8 x double> @llvm.masked.load.v8f64.p0v8f64(<8 x double>* undef, i32 1, <8 x i1> undef, <8 x double> undef)
>> > @@ -194,7 +194,7 @@ define i32 @masked_store() {
>> >  ; AVX-NEXT:  Cost Model: Found an estimated cost of 16 for instruction: call void @llvm.masked.store.v16i32.p0v16i32(<16 x i32> undef, <16 x i32>* undef, i32 1, <16 x i1> undef)
>> >  ; AVX-NEXT:  Cost Model: Found an estimated cost of 8 for instruction: call void @llvm.masked.store.v8i32.p0v8i32(<8 x i32> undef, <8 x i32>* undef, i32 1, <8 x i1> undef)
>> >  ; AVX-NEXT:  Cost Model: Found an estimated cost of 8 for instruction: call void @llvm.masked.store.v4i32.p0v4i32(<4 x i32> undef, <4 x i32>* undef, i32 1, <4 x i1> undef)
>> > -; AVX-NEXT:  Cost Model: Found an estimated cost of 10 for instruction: call void @llvm.masked.store.v2i32.p0v2i32(<2 x i32> undef, <2 x i32>* undef, i32 1, <2 x i1> undef)
>> > +; AVX-NEXT:  Cost Model: Found an estimated cost of 12 for instruction: call void @llvm.masked.store.v2i32.p0v2i32(<2 x i32> undef, <2 x i32>* undef, i32 1, <2 x i1> undef)
>> >  ; AVX-NEXT:  Cost Model: Found an estimated cost of 128 for instruction: call void @llvm.masked.store.v32i16.p0v32i16(<32 x i16> undef, <32 x i16>* undef, i32 1, <32 x i1> undef)
>> >  ; AVX-NEXT:  Cost Model: Found an estimated cost of 64 for instruction: call void @llvm.masked.store.v16i16.p0v16i16(<16 x i16> undef, <16 x i16>* undef, i32 1, <16 x i1> undef)
>> >  ; AVX-NEXT:  Cost Model: Found an estimated cost of 32 for instruction: call void @llvm.masked.store.v8i16.p0v8i16(<8 x i16> undef, <8 x i16>* undef, i32 1, <8 x i1> undef)
>> > @@ -221,7 +221,7 @@ define i32 @masked_store() {
>> >  ; KNL-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: call void @llvm.masked.store.v16i32.p0v16i32(<16 x i32> undef, <16 x i32>* undef, i32 1, <16 x i1> undef)
>> >  ; KNL-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: call void @llvm.masked.store.v8i32.p0v8i32(<8 x i32> undef, <8 x i32>* undef, i32 1, <8 x i1> undef)
>> >  ; KNL-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: call void @llvm.masked.store.v4i32.p0v4i32(<4 x i32> undef, <4 x i32>* undef, i32 1, <4 x i1> undef)
>> > -; KNL-NEXT:  Cost Model: Found an estimated cost of 3 for instruction: call void @llvm.masked.store.v2i32.p0v2i32(<2 x i32> undef, <2 x i32>* undef, i32 1, <2 x i1> undef)
>> > +; KNL-NEXT:  Cost Model: Found an estimated cost of 5 for instruction: call void @llvm.masked.store.v2i32.p0v2i32(<2 x i32> undef, <2 x i32>* undef, i32 1, <2 x i1> undef)
>> >  ; KNL-NEXT:  Cost Model: Found an estimated cost of 128 for instruction: call void @llvm.masked.store.v32i16.p0v32i16(<32 x i16> undef, <32 x i16>* undef, i32 1, <32 x i1> undef)
>> >  ; KNL-NEXT:  Cost Model: Found an estimated cost of 64 for instruction: call void @llvm.masked.store.v16i16.p0v16i16(<16 x i16> undef, <16 x i16>* undef, i32 1, <16 x i1> undef)
>> >  ; KNL-NEXT:  Cost Model: Found an estimated cost of 32 for instruction: call void @llvm.masked.store.v8i16.p0v8i16(<8 x i16> undef, <8 x i16>* undef, i32 1, <8 x i1> undef)
>> > @@ -248,15 +248,15 @@ define i32 @masked_store() {
>> >  ; SKX-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: call void @llvm.masked.store.v16i32.p0v16i32(<16 x i32> undef, <16 x i32>* undef, i32 1, <16 x i1> undef)
>> >  ; SKX-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: call void @llvm.masked.store.v8i32.p0v8i32(<8 x i32> undef, <8 x i32>* undef, i32 1, <8 x i1> undef)
>> >  ; SKX-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: call void @llvm.masked.store.v4i32.p0v4i32(<4 x i32> undef, <4 x i32>* undef, i32 1, <4 x i1> undef)
>> > -; SKX-NEXT:  Cost Model: Found an estimated cost of 3 for instruction: call void @llvm.masked.store.v2i32.p0v2i32(<2 x i32> undef, <2 x i32>* undef, i32 1, <2 x i1> undef)
>> > +; SKX-NEXT:  Cost Model: Found an estimated cost of 5 for instruction: call void @llvm.masked.store.v2i32.p0v2i32(<2 x i32> undef, <2 x i32>* undef, i32 1, <2 x i1> undef)
>> >  ; SKX-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: call void @llvm.masked.store.v32i16.p0v32i16(<32 x i16> undef, <32 x i16>* undef, i32 1, <32 x i1> undef)
>> >  ; SKX-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: call void @llvm.masked.store.v16i16.p0v16i16(<16 x i16> undef, <16 x i16>* undef, i32 1, <16 x i1> undef)
>> >  ; SKX-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: call void @llvm.masked.store.v8i16.p0v8i16(<8 x i16> undef, <8 x i16>* undef, i32 1, <8 x i1> undef)
>> > -; SKX-NEXT:  Cost Model: Found an estimated cost of 3 for instruction: call void @llvm.masked.store.v4i16.p0v4i16(<4 x i16> undef, <4 x i16>* undef, i32 1, <4 x i1> undef)
>> > +; SKX-NEXT:  Cost Model: Found an estimated cost of 9 for instruction: call void @llvm.masked.store.v4i16.p0v4i16(<4 x i16> undef, <4 x i16>* undef, i32 1, <4 x i1> undef)
>> >  ; SKX-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: call void @llvm.masked.store.v64i8.p0v64i8(<64 x i8> undef, <64 x i8>* undef, i32 1, <64 x i1> undef)
>> >  ; SKX-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: call void @llvm.masked.store.v32i8.p0v32i8(<32 x i8> undef, <32 x i8>* undef, i32 1, <32 x i1> undef)
>> >  ; SKX-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: call void @llvm.masked.store.v16i8.p0v16i8(<16 x i8> undef, <16 x i8>* undef, i32 1, <16 x i1> undef)
>> > -; SKX-NEXT:  Cost Model: Found an estimated cost of 3 for instruction: call void @llvm.masked.store.v8i8.p0v8i8(<8 x i8> undef, <8 x i8>* undef, i32 1, <8 x i1> undef)
>> > +; SKX-NEXT:  Cost Model: Found an estimated cost of 17 for instruction: call void @llvm.masked.store.v8i8.p0v8i8(<8 x i8> undef, <8 x i8>* undef, i32 1, <8 x i1> undef)
>> >  ; SKX-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret i32 0
>> >  ;
>> >    call void @llvm.masked.store.v8f64.p0v8f64(<8 x double> undef, <8 x double>* undef, i32 1, <8 x i1> undef)
>> > @@ -960,15 +960,10 @@ define <8 x float> @test4(<8 x i32> %tri
>> >  }
>> >
>> >  define void @test5(<2 x i32> %trigger, <2 x float>* %addr, <2 x float> %val) {
>> > -; SSE2-LABEL: 'test5'
>> > -; SSE2-NEXT:  Cost Model: Found an estimated cost of 8 for instruction: %mask = icmp eq <2 x i32> %trigger, zeroinitializer
>> > -; SSE2-NEXT:  Cost Model: Found an estimated cost of 7 for instruction: call void @llvm.masked.store.v2f32.p0v2f32(<2 x float> %val, <2 x float>* %addr, i32 4, <2 x i1> %mask)
>> > -; SSE2-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret void
>> > -;
>> > -; SSE42-LABEL: 'test5'
>> > -; SSE42-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %mask = icmp eq <2 x i32> %trigger, zeroinitializer
>> > -; SSE42-NEXT:  Cost Model: Found an estimated cost of 7 for instruction: call void @llvm.masked.store.v2f32.p0v2f32(<2 x float> %val, <2 x float>* %addr, i32 4, <2 x i1> %mask)
>> > -; SSE42-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret void
>> > +; SSE-LABEL: 'test5'
>> > +; SSE-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %mask = icmp eq <2 x i32> %trigger, zeroinitializer
>> > +; SSE-NEXT:  Cost Model: Found an estimated cost of 7 for instruction: call void @llvm.masked.store.v2f32.p0v2f32(<2 x float> %val, <2 x float>* %addr, i32 4, <2 x i1> %mask)
>> > +; SSE-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret void
>> >  ;
>> >  ; AVX-LABEL: 'test5'
>> >  ; AVX-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %mask = icmp eq <2 x i32> %trigger, zeroinitializer
>> > @@ -986,24 +981,19 @@ define void @test5(<2 x i32> %trigger, <
>> >  }
>> >
>> >  define void @test6(<2 x i32> %trigger, <2 x i32>* %addr, <2 x i32> %val) {
>> > -; SSE2-LABEL: 'test6'
>> > -; SSE2-NEXT:  Cost Model: Found an estimated cost of 8 for instruction: %mask = icmp eq <2 x i32> %trigger, zeroinitializer
>> > -; SSE2-NEXT:  Cost Model: Found an estimated cost of 8 for instruction: call void @llvm.masked.store.v2i32.p0v2i32(<2 x i32> %val, <2 x i32>* %addr, i32 4, <2 x i1> %mask)
>> > -; SSE2-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret void
>> > -;
>> > -; SSE42-LABEL: 'test6'
>> > -; SSE42-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %mask = icmp eq <2 x i32> %trigger, zeroinitializer
>> > -; SSE42-NEXT:  Cost Model: Found an estimated cost of 8 for instruction: call void @llvm.masked.store.v2i32.p0v2i32(<2 x i32> %val, <2 x i32>* %addr, i32 4, <2 x i1> %mask)
>> > -; SSE42-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret void
>> > +; SSE-LABEL: 'test6'
>> > +; SSE-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %mask = icmp eq <2 x i32> %trigger, zeroinitializer
>> > +; SSE-NEXT:  Cost Model: Found an estimated cost of 8 for instruction: call void @llvm.masked.store.v2i32.p0v2i32(<2 x i32> %val, <2 x i32>* %addr, i32 4, <2 x i1> %mask)
>> > +; SSE-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret void
>> >  ;
>> >  ; AVX-LABEL: 'test6'
>> >  ; AVX-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %mask = icmp eq <2 x i32> %trigger, zeroinitializer
>> > -; AVX-NEXT:  Cost Model: Found an estimated cost of 10 for instruction: call void @llvm.masked.store.v2i32.p0v2i32(<2 x i32> %val, <2 x i32>* %addr, i32 4, <2 x i1> %mask)
>> > +; AVX-NEXT:  Cost Model: Found an estimated cost of 12 for instruction: call void @llvm.masked.store.v2i32.p0v2i32(<2 x i32> %val, <2 x i32>* %addr, i32 4, <2 x i1> %mask)
>> >  ; AVX-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret void
>> >  ;
>> >  ; AVX512-LABEL: 'test6'
>> >  ; AVX512-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %mask = icmp eq <2 x i32> %trigger, zeroinitializer
>> > -; AVX512-NEXT:  Cost Model: Found an estimated cost of 3 for instruction: call void @llvm.masked.store.v2i32.p0v2i32(<2 x i32> %val, <2 x i32>* %addr, i32 4, <2 x i1> %mask)
>> > +; AVX512-NEXT:  Cost Model: Found an estimated cost of 5 for instruction: call void @llvm.masked.store.v2i32.p0v2i32(<2 x i32> %val, <2 x i32>* %addr, i32 4, <2 x i1> %mask)
>> >  ; AVX512-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret void
>> >  ;
>> >    %mask = icmp eq <2 x i32> %trigger, zeroinitializer
>> > @@ -1012,15 +1002,10 @@ define void @test6(<2 x i32> %trigger, <
>> >  }
>> >
>> >  define <2 x float> @test7(<2 x i32> %trigger, <2 x float>* %addr, <2 x float> %dst) {
>> > -; SSE2-LABEL: 'test7'
>> > -; SSE2-NEXT:  Cost Model: Found an estimated cost of 8 for instruction: %mask = icmp eq <2 x i32> %trigger, zeroinitializer
>> > -; SSE2-NEXT:  Cost Model: Found an estimated cost of 7 for instruction: %res = call <2 x float> @llvm.masked.load.v2f32.p0v2f32(<2 x float>* %addr, i32 4, <2 x i1> %mask, <2 x float> %dst)
>> > -; SSE2-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret <2 x float> %res
>> > -;
>> > -; SSE42-LABEL: 'test7'
>> > -; SSE42-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %mask = icmp eq <2 x i32> %trigger, zeroinitializer
>> > -; SSE42-NEXT:  Cost Model: Found an estimated cost of 7 for instruction: %res = call <2 x float> @llvm.masked.load.v2f32.p0v2f32(<2 x float>* %addr, i32 4, <2 x i1> %mask, <2 x float> %dst)
>> > -; SSE42-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret <2 x float> %res
>> > +; SSE-LABEL: 'test7'
>> > +; SSE-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %mask = icmp eq <2 x i32> %trigger, zeroinitializer
>> > +; SSE-NEXT:  Cost Model: Found an estimated cost of 7 for instruction: %res = call <2 x float> @llvm.masked.load.v2f32.p0v2f32(<2 x float>* %addr, i32 4, <2 x i1> %mask, <2 x float> %dst)
>> > +; SSE-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret <2 x float> %res
>> >  ;
>> >  ; AVX-LABEL: 'test7'
>> >  ; AVX-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %mask = icmp eq <2 x i32> %trigger, zeroinitializer
>> > @@ -1038,24 +1023,19 @@ define <2 x float> @test7(<2 x i32> %tri
>> >  }
>> >
>> >  define <2 x i32> @test8(<2 x i32> %trigger, <2 x i32>* %addr, <2 x i32> %dst) {
>> > -; SSE2-LABEL: 'test8'
>> > -; SSE2-NEXT:  Cost Model: Found an estimated cost of 8 for instruction: %mask = icmp eq <2 x i32> %trigger, zeroinitializer
>> > -; SSE2-NEXT:  Cost Model: Found an estimated cost of 8 for instruction: %res = call <2 x i32> @llvm.masked.load.v2i32.p0v2i32(<2 x i32>* %addr, i32 4, <2 x i1> %mask, <2 x i32> %dst)
>> > -; SSE2-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret <2 x i32> %res
>> > -;
>> > -; SSE42-LABEL: 'test8'
>> > -; SSE42-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %mask = icmp eq <2 x i32> %trigger, zeroinitializer
>> > -; SSE42-NEXT:  Cost Model: Found an estimated cost of 8 for instruction: %res = call <2 x i32> @llvm.masked.load.v2i32.p0v2i32(<2 x i32>* %addr, i32 4, <2 x i1> %mask, <2 x i32> %dst)
>> > -; SSE42-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret <2 x i32> %res
>> > +; SSE-LABEL: 'test8'
>> > +; SSE-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %mask = icmp eq <2 x i32> %trigger, zeroinitializer
>> > +; SSE-NEXT:  Cost Model: Found an estimated cost of 8 for instruction: %res = call <2 x i32> @llvm.masked.load.v2i32.p0v2i32(<2 x i32>* %addr, i32 4, <2 x i1> %mask, <2 x i32> %dst)
>> > +; SSE-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret <2 x i32> %res
>> >  ;
>> >  ; AVX-LABEL: 'test8'
>> >  ; AVX-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %mask = icmp eq <2 x i32> %trigger, zeroinitializer
>> > -; AVX-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %res = call <2 x i32> @llvm.masked.load.v2i32.p0v2i32(<2 x i32>* %addr, i32 4, <2 x i1> %mask, <2 x i32> %dst)
>> > +; AVX-NEXT:  Cost Model: Found an estimated cost of 6 for instruction: %res = call <2 x i32> @llvm.masked.load.v2i32.p0v2i32(<2 x i32>* %addr, i32 4, <2 x i1> %mask, <2 x i32> %dst)
>> >  ; AVX-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret <2 x i32> %res
>> >  ;
>> >  ; AVX512-LABEL: 'test8'
>> >  ; AVX512-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %mask = icmp eq <2 x i32> %trigger, zeroinitializer
>> > -; AVX512-NEXT:  Cost Model: Found an estimated cost of 3 for instruction: %res = call <2 x i32> @llvm.masked.load.v2i32.p0v2i32(<2 x i32>* %addr, i32 4, <2 x i1> %mask, <2 x i32> %dst)
>> > +; AVX512-NEXT:  Cost Model: Found an estimated cost of 5 for instruction: %res = call <2 x i32> @llvm.masked.load.v2i32.p0v2i32(<2 x i32>* %addr, i32 4, <2 x i1> %mask, <2 x i32> %dst)
>> >  ; AVX512-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret <2 x i32> %res
>> >  ;
>> >    %mask = icmp eq <2 x i32> %trigger, zeroinitializer
>> >
>> > Removed: llvm/trunk/test/Analysis/CostModel/X86/reduce-add-widen.ll
>> > URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Analysis/CostModel/X86/reduce-add-widen.ll?rev=368182&view=auto
>> > ==============================================================================
>> > --- llvm/trunk/test/Analysis/CostModel/X86/reduce-add-widen.ll (original)
>> > +++ llvm/trunk/test/Analysis/CostModel/X86/reduce-add-widen.ll (removed)
>> > @@ -1,307 +0,0 @@
>> > -; NOTE: Assertions have been autogenerated by utils/update_analyze_test_checks.py
>> > -; RUN: opt < %s -x86-experimental-vector-widening-legalization -cost-model -mtriple=x86_64-apple-darwin -analyze -mattr=+sse2 | FileCheck %s --check-prefixes=CHECK,SSE,SSE2
>> > -; RUN: opt < %s -x86-experimental-vector-widening-legalization -cost-model -mtriple=x86_64-apple-darwin -analyze -mattr=+ssse3 | FileCheck %s --check-prefixes=CHECK,SSE,SSSE3
>> > -; RUN: opt < %s -x86-experimental-vector-widening-legalization -cost-model -mtriple=x86_64-apple-darwin -analyze -mattr=+sse4.2 | FileCheck %s --check-prefixes=CHECK,SSE,SSE42
>> > -; RUN: opt < %s -x86-experimental-vector-widening-legalization -cost-model -mtriple=x86_64-apple-darwin -analyze -mattr=+avx | FileCheck %s --check-prefixes=CHECK,AVX,AVX1
>> > -; RUN: opt < %s -x86-experimental-vector-widening-legalization -cost-model -mtriple=x86_64-apple-darwin -analyze -mattr=+avx2 | FileCheck %s --check-prefixes=CHECK,AVX,AVX2
>> > -; RUN: opt < %s -x86-experimental-vector-widening-legalization -cost-model -mtriple=x86_64-apple-darwin -analyze -mattr=+avx512f | FileCheck %s --check-prefixes=CHECK,AVX512,AVX512F
>> > -; RUN: opt < %s -x86-experimental-vector-widening-legalization -cost-model -mtriple=x86_64-apple-darwin -analyze -mattr=+avx512f,+avx512bw | FileCheck %s --check-prefixes=CHECK,AVX512,AVX512BW
>> > -; RUN: opt < %s -x86-experimental-vector-widening-legalization -cost-model -mtriple=x86_64-apple-darwin -analyze -mattr=+avx512f,+avx512dq | FileCheck %s --check-prefixes=CHECK,AVX512,AVX512DQ
>> > -
>> > -define i32 @reduce_i64(i32 %arg) {
>> > -; SSE2-LABEL: 'reduce_i64'
>> > -; SSE2-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %V1 = call i64 @llvm.experimental.vector.reduce.add.v1i64(<1 x i64> undef)
>> > -; SSE2-NEXT:  Cost Model: Found an estimated cost of 3 for instruction: %V2 = call i64 @llvm.experimental.vector.reduce.add.v2i64(<2 x i64> undef)
>> > -; SSE2-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %V4 = call i64 @llvm.experimental.vector.reduce.add.v4i64(<4 x i64> undef)
>> > -; SSE2-NEXT:  Cost Model: Found an estimated cost of 6 for instruction: %V8 = call i64 @llvm.experimental.vector.reduce.add.v8i64(<8 x i64> undef)
>> > -; SSE2-NEXT:  Cost Model: Found an estimated cost of 10 for instruction: %V16 = call i64 @llvm.experimental.vector.reduce.add.v16i64(<16 x i64> undef)
>> > -; SSE2-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
>> > -;
>> > -; SSSE3-LABEL: 'reduce_i64'
>> > -; SSSE3-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %V1 = call i64 @llvm.experimental.vector.reduce.add.v1i64(<1 x i64> undef)
>> > -; SSSE3-NEXT:  Cost Model: Found an estimated cost of 3 for instruction: %V2 = call i64 @llvm.experimental.vector.reduce.add.v2i64(<2 x i64> undef)
>> > -; SSSE3-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %V4 = call i64 @llvm.experimental.vector.reduce.add.v4i64(<4 x i64> undef)
>> > -; SSSE3-NEXT:  Cost Model: Found an estimated cost of 6 for instruction: %V8 = call i64 @llvm.experimental.vector.reduce.add.v8i64(<8 x i64> undef)
>> > -; SSSE3-NEXT:  Cost Model: Found an estimated cost of 10 for instruction: %V16 = call i64 @llvm.experimental.vector.reduce.add.v16i64(<16 x i64> undef)
>> > -; SSSE3-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
>> > -;
>> > -; SSE42-LABEL: 'reduce_i64'
>> > -; SSE42-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %V1 = call i64 @llvm.experimental.vector.reduce.add.v1i64(<1 x i64> undef)
>> > -; SSE42-NEXT:  Cost Model: Found an estimated cost of 2 for instruction: %V2 = call i64 @llvm.experimental.vector.reduce.add.v2i64(<2 x i64> undef)
>> > -; SSE42-NEXT:  Cost Model: Found an estimated cost of 4 for instruction: %V4 = call i64 @llvm.experimental.vector.reduce.add.v4i64(<4 x i64> undef)
>> > -; SSE42-NEXT:  Cost Model: Found an estimated cost of 8 for instruction: %V8 = call i64 @llvm.experimental.vector.reduce.add.v8i64(<8 x i64> undef)
>> > -; SSE42-NEXT:  Cost Model: Found an estimated cost of 16 for instruction: %V16 = call i64 @llvm.experimental.vector.reduce.add.v16i64(<16 x i64> undef)
>> > -; SSE42-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
>> > -;
>> > -; AVX-LABEL: 'reduce_i64'
>> > -; AVX-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %V1 = call i64 @llvm.experimental.vector.reduce.add.v1i64(<1 x i64> undef)
>> > -; AVX-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V2 = call i64 @llvm.experimental.vector.reduce.add.v2i64(<2 x i64> undef)
>> > -; AVX-NEXT:  Cost Model: Found an estimated cost of 3 for instruction: %V4 = call i64 @llvm.experimental.vector.reduce.add.v4i64(<4 x i64> undef)
>> > -; AVX-NEXT:  Cost Model: Found an estimated cost of 6 for instruction: %V8 = call i64 @llvm.experimental.vector.reduce.add.v8i64(<8 x i64> undef)
>> > -; AVX-NEXT:  Cost Model: Found an estimated cost of 12 for instruction: %V16 = call i64 @llvm.experimental.vector.reduce.add.v16i64(<16 x i64> undef)
>> > -; AVX-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
>> > -;
>> > -; AVX512-LABEL: 'reduce_i64'
>> > -; AVX512-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: %V1 = call i64 @llvm.experimental.vector.reduce.add.v1i64(<1 x i64> undef)
>> > -; AVX512-NEXT:  Cost Model: Found an estimated cost of 1 for instruction: %V2 = call i64 @llvm.experimental.vector.reduce.add.v2i64(<2 x i64> undef)
>> > -; AVX512-NEXT:  Cost Model: Found an estimated cost of 3 for instruction: %V4 = call i64 @llvm.experimental.vector.reduce.add.v4i64(<4 x i64> undef)
>> > -; AVX512-NEXT:  Cost Model: Found an estimated cost of 7 for instruction: %V8 = call i64 @llvm.experimental.vector.reduce.add.v8i64(<8 x i64> undef)
>> > -; AVX512-NEXT:  Cost Model: Found an estimated cost of 8 for instruction: %V16 = call i64 @llvm.experimental.vector.reduce.add.v16i64(<16 x i64> undef)
>> > -; AVX512-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
>> > -;
>> > -  %V1  = call i64 @llvm.experimental.vector.reduce.add.v1i64(<1 x i64> undef)
>> > -  %V2  = call i64 @llvm.experimental.vector.reduce.add.v2i64(<2 x i64> undef)
>> > -  %V4  = call i64 @llvm.experimental.vector.reduce.add.v4i64(<4 x i64> undef)
>> > -  %V8  = call i64 @llvm.experimental.vector.reduce.add.v8i64(<8 x i64> undef)
>> > -  %V16 = call i64 @llvm.experimental.vector.reduce.add.v16i64(<16 x i64> undef)
>> > -  ret i32 undef
>> > -}
>> > -
>> > -define i32 @reduce_i32(i32 %arg) {
>> > -; SSE2-LABEL: 'reduce_i32'
>> > -; SSE2-NEXT:  Cost Model: Found an estimated cost of 3 for instruction: %V2 = call i32 @llvm.experimental.vector.reduce.add.v2i32(<2 x i32> undef)
>> > -; SSE2-NEXT:  Cost Model: Found an estimated cost of 5 for instruction: %V4 = call i32 @llvm.experimental.vector.reduce.add.v4i32(<4 x i32> undef)
>> > -; SSE2-NEXT:  Cost Model: Found an estimated cost of 6 for instruction: %V8 = call i32 @llvm.experimental.vector.reduce.add.v8i32(<8 x i32> undef)
>> > -; SSE2-NEXT:  Cost Model: Found an estimated cost of 8 for instruction: %V16 = call i32 @llvm.experimental.vector.reduce.add.v16i32(<16 x i32> undef)
>> > -; SSE2-NEXT:  Cost Model: Found an estimated cost of 12 for instruction: %V32 = call i32 @llvm.experimental.vector.reduce.add.v32i32(<32 x i32> undef)
>> > -; SSE2-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
>> > -;
>> > -; SSSE3-LABEL: 'reduce_i32'
>> > -; SSSE3-NEXT:  Cost Model: Found an estimated cost of 3 for instruction: %V2 = call i32 @llvm.experimental.vector.reduce.add.v2i32(<2 x i32> undef)
>> > -; SSSE3-NEXT:  Cost Model: Found an estimated cost of 5 for instruction: %V4 = call i32 @llvm.experimental.vector.reduce.add.v4i32(<4 x i32> undef)
>> > -; SSSE3-NEXT:  Cost Model: Found an estimated cost of 6 for instruction: %V8 = call i32 @llvm.experimental.vector.reduce.add.v8i32(<8 x i32> undef)
>> > -; SSSE3-NEXT:  Cost Model: Found an estimated cost of 8 for instruction: %V16 = call i32 @llvm.experimental.vector.reduce.add.v16i32(<16 x i32> undef)
>> > -; SSSE3-NEXT:  Cost Model: Found an estimated cost of 12 for instruction: %V32 = call i32 @llvm.experimental.vector.reduce.add.v32i32(<32 x i32> undef)
>> > -; SSSE3-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
>> > -;
>> > -; SSE42-LABEL: 'reduce_i32'
>> > -; SSE42-NEXT:  Cost Model: Found an estimated cost of 3 for instruction: %V2 = call i32 @llvm.experimental.vector.reduce.add.v2i32(<2 x i32> undef)
>> > -; SSE42-NEXT:  Cost Model: Found an estimated cost of 3 for instruction: %V4 = call i32 @llvm.experimental.vector.reduce.add.v4i32(<4 x i32> undef)
>> > -; SSE42-NEXT:  Cost Model: Found an estimated cost of 6 for instruction: %V8 = call i32 @llvm.experimental.vector.reduce.add.v8i32(<8 x i32> undef)
>> > -; SSE42-NEXT:  Cost Model: Found an estimated cost of 12 for instruction: %V16 = call i32 @llvm.experimental.vector.reduce.add.v16i32(<16 x i32> undef)
>> > -; SSE42-NEXT:  Cost Model: Found an estimated cost of 24 for instruction: %V32 = call i32 @llvm.experimental.vector.reduce.add.v32i32(<32 x i32> undef)
>> > -; SSE42-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
>> > -;
>> > -; AVX-LABEL: 'reduce_i32'
>> > -; AVX-NEXT:  Cost Model: Found an estimated cost of 3 for instruction: %V2 = call i32 @llvm.experimental.vector.reduce.add.v2i32(<2 x i32> undef)
>> > -; AVX-NEXT:  Cost Model: Found an estimated cost of 3 for instruction: %V4 = call i32 @llvm.experimental.vector.reduce.add.v4i32(<4 x i32> undef)
>> > -; AVX-NEXT:  Cost Model: Found an estimated cost of 5 for instruction: %V8 = call i32 @llvm.experimental.vector.reduce.add.v8i32(<8 x i32> undef)
>> > -; AVX-NEXT:  Cost Model: Found an estimated cost of 10 for instruction: %V16 = call i32 @llvm.experimental.vector.reduce.add.v16i32(<16 x i32> undef)
>> > -; AVX-NEXT:  Cost Model: Found an estimated cost of 20 for instruction: %V32 = call i32 @llvm.experimental.vector.reduce.add.v32i32(<32 x i32> undef)
>> > -; AVX-NEXT:  Cost Model: Found an estimated cost of 0 for instruction: ret i32 undef
>> > -;
>> > -; AVX512-LABEL: 'reduce_i32'
>> > -; AVX512-NEXT:  Cost Model: Found an estimated cost of 3 for instruction: %V2 = call i32 @llvm.experimental.vector.reduce.add.v2i32(<2 x i32> undef)
>> > -; AVX512-NEXT:  Cost Model: Found an estimated cost of 3 for i