[all-commits] [llvm/llvm-project] 0c005b: [X86][SSE] getV4X86ShuffleImm8 - canonicalize broa...

Wed Jul 29 03:33:40 PDT 2020

  Branch: refs/heads/master
  Home:   https://github.com/llvm/llvm-project
  Commit: 0c005be6eb6bc21464d7c1f7d5f44eb1e5369749
      https://github.com/llvm/llvm-project/commit/0c005be6eb6bc21464d7c1f7d5f44eb1e5369749
  Author: Simon Pilgrim <llvm-dev at redking.me.uk>
  Date:   2020-07-29 (Wed, 29 Jul 2020)

  Changed paths:
    M llvm/lib/Target/X86/X86ISelLowering.cpp
    M llvm/test/CodeGen/X86/2011-05-09-loaduse.ll
    M llvm/test/CodeGen/X86/atomic-fp.ll
    M llvm/test/CodeGen/X86/atomic-non-integer.ll
    M llvm/test/CodeGen/X86/avg.ll
    M llvm/test/CodeGen/X86/avx-intrinsics-fast-isel.ll
    M llvm/test/CodeGen/X86/avx-splat.ll
    M llvm/test/CodeGen/X86/avx-vbroadcast.ll
    M llvm/test/CodeGen/X86/avx-vinsertf128.ll
    M llvm/test/CodeGen/X86/avx-vperm2x128.ll
    M llvm/test/CodeGen/X86/avx512-any_extend_load.ll
    M llvm/test/CodeGen/X86/avx512-cvt.ll
    M llvm/test/CodeGen/X86/avx512-hadd-hsub.ll
    M llvm/test/CodeGen/X86/avx512-intrinsics-fast-isel.ll
    M llvm/test/CodeGen/X86/avx512-shuffles/partial_permute.ll
    M llvm/test/CodeGen/X86/bitcast-int-to-vector-bool-sext.ll
    M llvm/test/CodeGen/X86/bitcast-int-to-vector-bool-zext.ll
    M llvm/test/CodeGen/X86/bitcast-int-to-vector-bool.ll
    M llvm/test/CodeGen/X86/bitcast-int-to-vector.ll
    M llvm/test/CodeGen/X86/bitcast-vector-bool.ll
    M llvm/test/CodeGen/X86/buildvec-extract.ll
    M llvm/test/CodeGen/X86/buildvec-insertvec.ll
    M llvm/test/CodeGen/X86/cast-vsel.ll
    M llvm/test/CodeGen/X86/combine-fcopysign.ll
    M llvm/test/CodeGen/X86/div-rem-pair-recomposition-signed.ll
    M llvm/test/CodeGen/X86/div-rem-pair-recomposition-unsigned.ll
    M llvm/test/CodeGen/X86/extract-fp.ll
    M llvm/test/CodeGen/X86/extract-store.ll
    M llvm/test/CodeGen/X86/extractelement-index.ll
    M llvm/test/CodeGen/X86/extractelement-load.ll
    M llvm/test/CodeGen/X86/fma.ll
    M llvm/test/CodeGen/X86/fp-intrinsics-fma.ll
    M llvm/test/CodeGen/X86/fp-round.ll
    M llvm/test/CodeGen/X86/fp-roundeven.ll
    M llvm/test/CodeGen/X86/gather-addresses.ll
    M llvm/test/CodeGen/X86/haddsub-2.ll
    M llvm/test/CodeGen/X86/haddsub-3.ll
    M llvm/test/CodeGen/X86/haddsub-undef.ll
    M llvm/test/CodeGen/X86/haddsub.ll
    M llvm/test/CodeGen/X86/half.ll
    M llvm/test/CodeGen/X86/horizontal-reduce-add.ll
    M llvm/test/CodeGen/X86/horizontal-reduce-fadd.ll
    M llvm/test/CodeGen/X86/horizontal-reduce-smax.ll
    M llvm/test/CodeGen/X86/horizontal-reduce-smin.ll
    M llvm/test/CodeGen/X86/horizontal-reduce-umax.ll
    M llvm/test/CodeGen/X86/horizontal-reduce-umin.ll
    M llvm/test/CodeGen/X86/insert-into-constant-vector.ll
    M llvm/test/CodeGen/X86/insert-loaded-scalar.ll
    M llvm/test/CodeGen/X86/insertelement-var-index.ll
    M llvm/test/CodeGen/X86/known-signbits-vector.ll
    M llvm/test/CodeGen/X86/load-partial.ll
    M llvm/test/CodeGen/X86/madd.ll
    M llvm/test/CodeGen/X86/masked_compressstore.ll
    M llvm/test/CodeGen/X86/masked_expandload.ll
    M llvm/test/CodeGen/X86/masked_load.ll
    M llvm/test/CodeGen/X86/masked_store.ll
    M llvm/test/CodeGen/X86/masked_store_trunc.ll
    M llvm/test/CodeGen/X86/masked_store_trunc_ssat.ll
    M llvm/test/CodeGen/X86/masked_store_trunc_usat.ll
    M llvm/test/CodeGen/X86/memset-nonzero.ll
    M llvm/test/CodeGen/X86/merge-consecutive-stores-nt.ll
    M llvm/test/CodeGen/X86/min-legal-vector-width.ll
    M llvm/test/CodeGen/X86/mmx-arith.ll
    M llvm/test/CodeGen/X86/nontemporal-2.ll
    M llvm/test/CodeGen/X86/oddshuffles.ll
    M llvm/test/CodeGen/X86/phaddsub-extract.ll
    M llvm/test/CodeGen/X86/phaddsub-undef.ll
    M llvm/test/CodeGen/X86/phaddsub.ll
    M llvm/test/CodeGen/X86/pmaddubsw.ll
    M llvm/test/CodeGen/X86/pmul.ll
    M llvm/test/CodeGen/X86/pmulh.ll
    M llvm/test/CodeGen/X86/pow.ll
    M llvm/test/CodeGen/X86/pr14161.ll
    M llvm/test/CodeGen/X86/pr29112.ll
    M llvm/test/CodeGen/X86/pr44976.ll
    M llvm/test/CodeGen/X86/pr46455.ll
    M llvm/test/CodeGen/X86/pr46527.ll
    M llvm/test/CodeGen/X86/prefer-avx256-mask-shuffle.ll
    M llvm/test/CodeGen/X86/psubus.ll
    M llvm/test/CodeGen/X86/sad.ll
    M llvm/test/CodeGen/X86/scalarize-fp.ll
    M llvm/test/CodeGen/X86/setcc-combine.ll
    M llvm/test/CodeGen/X86/shrink_vmul.ll
    M llvm/test/CodeGen/X86/shuffle-strided-with-offset-128.ll
    M llvm/test/CodeGen/X86/slow-pmulld.ll
    M llvm/test/CodeGen/X86/smul_fix_sat.ll
    M llvm/test/CodeGen/X86/split-vector-bitcast.ll
    M llvm/test/CodeGen/X86/split-vector-rem.ll
    M llvm/test/CodeGen/X86/sse1.ll
    M llvm/test/CodeGen/X86/sse2-intrinsics-fast-isel.ll
    M llvm/test/CodeGen/X86/sse3-avx-addsub-2.ll
    M llvm/test/CodeGen/X86/sse3.ll
    M llvm/test/CodeGen/X86/sse41.ll
    M llvm/test/CodeGen/X86/umul_fix_sat.ll
    M llvm/test/CodeGen/X86/var-permute-128.ll
    M llvm/test/CodeGen/X86/vec-libcalls.ll
    M llvm/test/CodeGen/X86/vec-strict-128.ll
    M llvm/test/CodeGen/X86/vec-strict-cmp-128.ll
    M llvm/test/CodeGen/X86/vec-strict-cmp-sub128.ll
    M llvm/test/CodeGen/X86/vec-strict-fptoint-128.ll
    M llvm/test/CodeGen/X86/vec-strict-fptoint-256.ll
    M llvm/test/CodeGen/X86/vec-strict-fptoint-512.ll
    M llvm/test/CodeGen/X86/vec-strict-inttofp-128.ll
    M llvm/test/CodeGen/X86/vec-strict-inttofp-256.ll
    M llvm/test/CodeGen/X86/vec3.ll
    M llvm/test/CodeGen/X86/vec_cast2.ll
    M llvm/test/CodeGen/X86/vec_extract-mmx.ll
    M llvm/test/CodeGen/X86/vec_extract.ll
    M llvm/test/CodeGen/X86/vec_fp_to_int.ll
    M llvm/test/CodeGen/X86/vec_int_to_fp.ll
    M llvm/test/CodeGen/X86/vec_saddo.ll
    M llvm/test/CodeGen/X86/vec_set-H.ll
    M llvm/test/CodeGen/X86/vec_shift7.ll
    M llvm/test/CodeGen/X86/vec_smulo.ll
    M llvm/test/CodeGen/X86/vec_ssubo.ll
    M llvm/test/CodeGen/X86/vec_uaddo.ll
    M llvm/test/CodeGen/X86/vec_umulo.ll
    M llvm/test/CodeGen/X86/vec_usubo.ll
    M llvm/test/CodeGen/X86/vector-constrained-fp-intrinsics.ll
    M llvm/test/CodeGen/X86/vector-extend-inreg.ll
    M llvm/test/CodeGen/X86/vector-fshl-128.ll
    M llvm/test/CodeGen/X86/vector-fshl-256.ll
    M llvm/test/CodeGen/X86/vector-fshl-rot-128.ll
    M llvm/test/CodeGen/X86/vector-fshl-rot-256.ll
    M llvm/test/CodeGen/X86/vector-fshr-128.ll
    M llvm/test/CodeGen/X86/vector-fshr-256.ll
    M llvm/test/CodeGen/X86/vector-fshr-rot-128.ll
    M llvm/test/CodeGen/X86/vector-fshr-rot-256.ll
    M llvm/test/CodeGen/X86/vector-idiv-v2i32.ll
    M llvm/test/CodeGen/X86/vector-narrow-binop.ll
    M llvm/test/CodeGen/X86/vector-reduce-add.ll
    M llvm/test/CodeGen/X86/vector-reduce-and-cmp.ll
    M llvm/test/CodeGen/X86/vector-reduce-and.ll
    M llvm/test/CodeGen/X86/vector-reduce-fadd-fast.ll
    M llvm/test/CodeGen/X86/vector-reduce-fadd.ll
    M llvm/test/CodeGen/X86/vector-reduce-fmax-nnan.ll
    M llvm/test/CodeGen/X86/vector-reduce-fmax.ll
    M llvm/test/CodeGen/X86/vector-reduce-fmin-nnan.ll
    M llvm/test/CodeGen/X86/vector-reduce-fmin.ll
    M llvm/test/CodeGen/X86/vector-reduce-fmul-fast.ll
    M llvm/test/CodeGen/X86/vector-reduce-fmul.ll
    M llvm/test/CodeGen/X86/vector-reduce-mul.ll
    M llvm/test/CodeGen/X86/vector-reduce-or.ll
    M llvm/test/CodeGen/X86/vector-reduce-smax.ll
    M llvm/test/CodeGen/X86/vector-reduce-smin.ll
    M llvm/test/CodeGen/X86/vector-reduce-umax.ll
    M llvm/test/CodeGen/X86/vector-reduce-umin.ll
    M llvm/test/CodeGen/X86/vector-reduce-xor.ll
    M llvm/test/CodeGen/X86/vector-rem.ll
    M llvm/test/CodeGen/X86/vector-rotate-128.ll
    M llvm/test/CodeGen/X86/vector-rotate-256.ll
    M llvm/test/CodeGen/X86/vector-sext.ll
    M llvm/test/CodeGen/X86/vector-shift-ashr-128.ll
    M llvm/test/CodeGen/X86/vector-shift-ashr-sub128.ll
    M llvm/test/CodeGen/X86/vector-shift-by-select-loop.ll
    M llvm/test/CodeGen/X86/vector-shift-lshr-128.ll
    M llvm/test/CodeGen/X86/vector-shift-lshr-sub128.ll
    M llvm/test/CodeGen/X86/vector-shift-shl-128.ll
    M llvm/test/CodeGen/X86/vector-shift-shl-sub128.ll
    M llvm/test/CodeGen/X86/vector-shuffle-128-v16.ll
    M llvm/test/CodeGen/X86/vector-shuffle-128-v4.ll
    M llvm/test/CodeGen/X86/vector-shuffle-128-v8.ll
    M llvm/test/CodeGen/X86/vector-shuffle-256-v16.ll
    M llvm/test/CodeGen/X86/vector-shuffle-256-v32.ll
    M llvm/test/CodeGen/X86/vector-shuffle-256-v4.ll
    M llvm/test/CodeGen/X86/vector-shuffle-256-v8.ll
    M llvm/test/CodeGen/X86/vector-shuffle-512-v16.ll
    M llvm/test/CodeGen/X86/vector-shuffle-512-v64.ll
    M llvm/test/CodeGen/X86/vector-shuffle-combining-avx2.ll
    M llvm/test/CodeGen/X86/vector-shuffle-combining.ll
    M llvm/test/CodeGen/X86/vector-shuffle-mmx.ll
    M llvm/test/CodeGen/X86/vector-shuffle-sse1.ll
    M llvm/test/CodeGen/X86/vector-shuffle-v1.ll
    M llvm/test/CodeGen/X86/vector-trunc.ll
    M llvm/test/CodeGen/X86/vector-zext.ll
    M llvm/test/CodeGen/X86/vsel-cmp-load.ll
    M llvm/test/CodeGen/X86/vselect.ll
    M llvm/test/CodeGen/X86/vshift-4.ll
    M llvm/test/CodeGen/X86/widen_conv-3.ll
    M llvm/test/CodeGen/X86/widen_conv-4.ll
    M llvm/test/CodeGen/X86/widened-broadcast.ll

  Log Message:
  -----------
  [X86][SSE] getV4X86ShuffleImm8 - canonicalize broadcast masks

If the mask input to getV4X86ShuffleImm8 only refers to a single source element (+ undefs) then canonicalize to a full broadcast.

getV4X86ShuffleImm8 defaults to inline values for undefs, which can be useful for shuffle widening/narrowing but does leave SimplifyDemanded* calls thinking the shuffle depends on unnecessary elements.

I'm still investigating what we should do more generally to avoid these undemanded elements, but broadcast cases was a simpler win.