[PATCH] D55251: [X86] Enable -x86-experimental-vector-widening-legalization by default.

Mon Dec 3 22:33:25 PST 2018

craig.topper created this revision.
craig.topper added reviewers: RKSimon, spatel, chandlerc.

This patch changes our default legalization behavior for narrow vectors with i8/i16/i32/i64 scalar types from promotion to widening. Widening keeps the element widths the same and pads the vector with undef elements. I believe this is a better legalization strategy, but it carries some issues due to the fragmented vector ISA. For example, i8 shifts and multiplies get widened and then later have to be promoted or split into vXi16 vectors.
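
To make the difference between the two strategies concrete, here is a minimal sketch of how each one picks the legal type for a narrow vector. It is only a model: `VecType`, `promote`, `widen`, and `paddingLanes` are hypothetical names invented for illustration and are not LLVM APIs, and the sketch assumes a 128-bit vector register and element counts and widths that divide 128 evenly.

```cpp
#include <cassert>

// Hypothetical model of a fixed-width vector type such as v4i8.
// This is NOT LLVM's MVT/EVT API; it only tracks lane count and lane width.
struct VecType {
  unsigned NumElts; // number of elements (lanes)
  unsigned EltBits; // width of each element in bits
};

// Promotion (the old default): keep the element count and grow every
// element until the vector fills the register, e.g. v4i8 -> v4i32.
// Assumes RegBits is a multiple of NumElts.
VecType promote(VecType VT, unsigned RegBits = 128) {
  assert(VT.NumElts != 0 && RegBits % VT.NumElts == 0);
  return VecType{VT.NumElts, RegBits / VT.NumElts};
}

// Widening (the new default in this patch): keep the element width and
// append undef lanes until the vector fills the register, e.g. v4i8 -> v16i8.
// Assumes RegBits is a multiple of EltBits.
VecType widen(VecType VT, unsigned RegBits = 128) {
  assert(VT.EltBits != 0 && RegBits % VT.EltBits == 0);
  return VecType{RegBits / VT.EltBits, VT.EltBits};
}

// Number of undef padding lanes that widening adds.
unsigned paddingLanes(VecType VT, unsigned RegBits = 128) {
  return widen(VT, RegBits).NumElts - VT.NumElts;
}
```

Under these assumptions, `promote(VecType{4, 8})` models v4i8 becoming v4i32, while `widen(VecType{4, 8})` models v4i8 becoming v16i8 with 12 undef lanes. The widened v16i8 case is where the i8 shift and multiply issue mentioned above shows up, since those operations then have to be promoted or split into vXi16 vectors.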

I'm sure there are still some issues in here, but I wanted to get this patch up so we could start spotting the remaining issues.

Repository:
  rL LLVM

https://reviews.llvm.org/D55251

Files:
  lib/Target/X86/X86ISelLowering.cpp
  lib/Target/X86/X86TargetTransformInfo.cpp
  test/Analysis/CostModel/X86/alternate-shuffle-cost.ll
  test/Analysis/CostModel/X86/arith.ll
  test/Analysis/CostModel/X86/cast.ll
  test/Analysis/CostModel/X86/fptosi.ll
  test/Analysis/CostModel/X86/fptoui.ll
  test/Analysis/CostModel/X86/masked-intrinsic-cost.ll
  test/Analysis/CostModel/X86/reduce-add.ll
  test/Analysis/CostModel/X86/reduce-and.ll
  test/Analysis/CostModel/X86/reduce-mul.ll
  test/Analysis/CostModel/X86/reduce-or.ll
  test/Analysis/CostModel/X86/reduce-smax.ll
  test/Analysis/CostModel/X86/reduce-smin.ll
  test/Analysis/CostModel/X86/reduce-umax.ll
  test/Analysis/CostModel/X86/reduce-umin.ll
  test/Analysis/CostModel/X86/reduce-xor.ll
  test/Analysis/CostModel/X86/shuffle-transpose.ll
  test/Analysis/CostModel/X86/sitofp.ll
  test/Analysis/CostModel/X86/slm-arith-costs.ll
  test/Analysis/CostModel/X86/testshiftashr.ll
  test/Analysis/CostModel/X86/testshiftlshr.ll
  test/Analysis/CostModel/X86/testshiftshl.ll
  test/Analysis/CostModel/X86/uitofp.ll
  test/CodeGen/X86/2008-09-05-sinttofp-2xi32.ll
  test/CodeGen/X86/2009-06-05-VZextByteShort.ll
  test/CodeGen/X86/2011-10-19-LegelizeLoad.ll
  test/CodeGen/X86/2011-12-8-bitcastintprom.ll
  test/CodeGen/X86/2012-01-18-vbitcast.ll
  test/CodeGen/X86/2012-03-15-build_vector_wl.ll
  test/CodeGen/X86/2012-07-10-extload64.ll
  test/CodeGen/X86/3dnow-intrinsics.ll
  test/CodeGen/X86/4char-promote.ll
  test/CodeGen/X86/avg.ll
  test/CodeGen/X86/avx-cvt-2.ll
  test/CodeGen/X86/avx-fp2int.ll
  test/CodeGen/X86/avx2-conversions.ll
  test/CodeGen/X86/avx2-masked-gather.ll
  test/CodeGen/X86/avx2-vbroadcast.ll
  test/CodeGen/X86/avx512-any_extend_load.ll
  test/CodeGen/X86/avx512-cvt.ll
  test/CodeGen/X86/avx512-ext.ll
  test/CodeGen/X86/avx512-intrinsics-upgrade.ll
  test/CodeGen/X86/avx512-mask-op.ll
  test/CodeGen/X86/avx512-schedule.ll
  test/CodeGen/X86/avx512-shuffles/broadcast-vector-int.ll
  test/CodeGen/X86/avx512-trunc.ll
  test/CodeGen/X86/avx512-vec-cmp.ll
  test/CodeGen/X86/avx512-vec3-crash.ll
  test/CodeGen/X86/avx512bwvl-intrinsics-upgrade.ll
  test/CodeGen/X86/avx512vl-intrinsics-fast-isel.ll
  test/CodeGen/X86/avx512vl-intrinsics-upgrade.ll
  test/CodeGen/X86/bitcast-and-setcc-128.ll
  test/CodeGen/X86/bitcast-setcc-128.ll
  test/CodeGen/X86/bitreverse.ll
  test/CodeGen/X86/bswap-vector.ll
  test/CodeGen/X86/buildvec-insertvec.ll
  test/CodeGen/X86/combine-64bit-vec-binop.ll
  test/CodeGen/X86/combine-or.ll
  test/CodeGen/X86/compress_expand.ll
  test/CodeGen/X86/cvtv2f32.ll
  test/CodeGen/X86/extract-concat.ll
  test/CodeGen/X86/extract-insert.ll
  test/CodeGen/X86/f16c-intrinsics.ll
  test/CodeGen/X86/fold-vector-sext-zext.ll
  test/CodeGen/X86/insertelement-shuffle.ll
  test/CodeGen/X86/known-bits.ll
  test/CodeGen/X86/known-signbits-vector.ll
  test/CodeGen/X86/lower-bitcast.ll
  test/CodeGen/X86/madd.ll
  test/CodeGen/X86/masked_gather_scatter.ll
  test/CodeGen/X86/masked_gather_scatter_widen.ll
  test/CodeGen/X86/masked_load.ll
  test/CodeGen/X86/masked_store.ll
  test/CodeGen/X86/mmx-arg-passing-x86-64.ll
  test/CodeGen/X86/mmx-arith.ll
  test/CodeGen/X86/mmx-cvt.ll
  test/CodeGen/X86/mulvi32.ll
  test/CodeGen/X86/oddshuffles.ll
  test/CodeGen/X86/paddus.ll
  test/CodeGen/X86/pmaddubsw.ll
  test/CodeGen/X86/pmovsx-inreg.ll
  test/CodeGen/X86/pmul.ll
  test/CodeGen/X86/pmulh.ll
  test/CodeGen/X86/pointer-vector.ll
  test/CodeGen/X86/pr14161.ll
  test/CodeGen/X86/pr35918.ll
  test/CodeGen/X86/promote-vec3.ll
  test/CodeGen/X86/promote.ll
  test/CodeGen/X86/psubus.ll
  test/CodeGen/X86/ret-mmx.ll
  test/CodeGen/X86/sad.ll
  test/CodeGen/X86/scalar_widen_div.ll
  test/CodeGen/X86/select.ll
  test/CodeGen/X86/shrink_vmul.ll
  test/CodeGen/X86/shuffle-strided-with-offset-128.ll
  test/CodeGen/X86/shuffle-strided-with-offset-256.ll
  test/CodeGen/X86/shuffle-strided-with-offset-512.ll
  test/CodeGen/X86/shuffle-vs-trunc-128.ll
  test/CodeGen/X86/shuffle-vs-trunc-256.ll
  test/CodeGen/X86/shuffle-vs-trunc-512.ll
  test/CodeGen/X86/slow-pmulld.ll
  test/CodeGen/X86/sse2-intrinsics-canonical.ll
  test/CodeGen/X86/sse2-vector-shifts.ll
  test/CodeGen/X86/test-shrink-bug.ll
  test/CodeGen/X86/trunc-ext-ld-st.ll
  test/CodeGen/X86/trunc-subvector.ll
  test/CodeGen/X86/unfold-masked-merge-vector-variablemask.ll
  test/CodeGen/X86/urem-seteq-vec-splat.ll
  test/CodeGen/X86/vec_cast.ll
  test/CodeGen/X86/vec_cast2.ll
  test/CodeGen/X86/vec_cast3.ll
  test/CodeGen/X86/vec_ctbits.ll
  test/CodeGen/X86/vec_extract-mmx.ll
  test/CodeGen/X86/vec_fp_to_int.ll
  test/CodeGen/X86/vec_insert-5.ll
  test/CodeGen/X86/vec_insert-7.ll
  test/CodeGen/X86/vec_insert-mmx.ll
  test/CodeGen/X86/vec_int_to_fp.ll
  test/CodeGen/X86/vec_zero_cse.ll
  test/CodeGen/X86/vector-blend.ll
  test/CodeGen/X86/vector-half-conversions.ll
  test/CodeGen/X86/vector-idiv-v2i32.ll
  test/CodeGen/X86/vector-sext.ll
  test/CodeGen/X86/vector-shift-ashr-sub128.ll
  test/CodeGen/X86/vector-shift-lshr-sub128.ll
  test/CodeGen/X86/vector-shift-shl-sub128.ll
  test/CodeGen/X86/vector-shuffle-128-v16.ll
  test/CodeGen/X86/vector-shuffle-combining.ll
  test/CodeGen/X86/vector-trunc-packus.ll
  test/CodeGen/X86/vector-trunc-ssat.ll
  test/CodeGen/X86/vector-trunc-usat.ll
  test/CodeGen/X86/vector-trunc.ll
  test/CodeGen/X86/vector-truncate-combine.ll
  test/CodeGen/X86/vector-zext.ll
  test/CodeGen/X86/vsel-cmp-load.ll
  test/CodeGen/X86/vselect-avx.ll
  test/CodeGen/X86/vselect.ll
  test/CodeGen/X86/vshift-4.ll
  test/CodeGen/X86/widen_arith-1.ll
  test/CodeGen/X86/widen_arith-2.ll
  test/CodeGen/X86/widen_arith-3.ll
  test/CodeGen/X86/widen_bitops-0.ll
  test/CodeGen/X86/widen_cast-1.ll
  test/CodeGen/X86/widen_cast-4.ll
  test/CodeGen/X86/widen_cast-5.ll
  test/CodeGen/X86/widen_cast-6.ll
  test/CodeGen/X86/widen_conv-1.ll
  test/CodeGen/X86/widen_conv-2.ll
  test/CodeGen/X86/widen_conv-3.ll
  test/CodeGen/X86/widen_conv-4.ll
  test/CodeGen/X86/widen_load-2.ll
  test/CodeGen/X86/widen_shuffle-1.ll
  test/CodeGen/X86/widened-broadcast.ll
  test/CodeGen/X86/x86-interleaved-access.ll
  test/CodeGen/X86/x86-shifts.ll
  test/Transforms/SLPVectorizer/X86/fptosi.ll
  test/Transforms/SLPVectorizer/X86/fptoui.ll
  test/Transforms/SLPVectorizer/X86/insert-element-build-vector.ll
  test/Transforms/SLPVectorizer/X86/sitofp.ll
  test/Transforms/SLPVectorizer/X86/uitofp.ll