[PATCH] D76212: [X86] Make v32i16/v64i8 legal types without avx512bw. Use custom splitting instead.

Sun Mar 15 23:08:51 PDT 2020

craig.topper created this revision.
craig.topper added reviewers: RKSimon, spatel.
Herald added a subscriber: hiraditya.
Herald added a project: LLVM.
craig.topper marked an inline comment as done.
craig.topper added inline comments.

================
Comment at: llvm/lib/Target/X86/X86ISelDAGToDAG.cpp:849
     switch (N->getOpcode()) {
+    case X86ISD::VBROADCAST: {
+      MVT VT = N->getSimpleValueType(0);
----------------
This is a bit of a hack, but it was easier than trying to hunt down all the places that can create broadcasts in lowering. I couldn't do this with an isel pattern for the broadcast_load case, so I just handled both cases here.

This moves v32i16/v64i8 to a model more consistent with how we
treat integer types with avx1.
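
As a rough illustration of what "custom splitting" means, here is a
scalar C++ model, not code from the patch. The type aliases and the
helpers addV16, extractHalf, concatHalves and addV32BySplitting are
hypothetical names made up for this sketch. It assumes the operation
is only available at 256 bits, so the 512-bit operation is done on
the two halves and the results are concatenated.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical models of the vector types (not LLVM types).
using V16i16 = std::array<int16_t, 16>; // one 256-bit value
using V32i16 = std::array<int16_t, 32>; // one 512-bit value

// Stand-in for an operation that exists natively at 256 bits.
// The add wraps modulo 2^16, like a vector integer add.
V16i16 addV16(const V16i16 &a, const V16i16 &b) {
  V16i16 r{};
  for (std::size_t i = 0; i < r.size(); ++i)
    r[i] = static_cast<int16_t>(static_cast<uint16_t>(
        static_cast<uint16_t>(a[i]) + static_cast<uint16_t>(b[i])));
  return r;
}

// Extract the lower (idx == 0) or upper (idx == 1) 256-bit half.
V16i16 extractHalf(const V32i16 &v, std::size_t idx) {
  assert(idx < 2 && "a 512-bit value has only two 256-bit halves");
  V16i16 r{};
  for (std::size_t i = 0; i < r.size(); ++i)
    r[i] = v[idx * r.size() + i];
  return r;
}

// Concatenate two 256-bit halves back into one 512-bit value.
V32i16 concatHalves(const V16i16 &lo, const V16i16 &hi) {
  V32i16 r{};
  for (std::size_t i = 0; i < lo.size(); ++i) {
    r[i] = lo[i];
    r[lo.size() + i] = hi[i];
  }
  return r;
}

// The 512-bit operation expressed as split, operate, concatenate.
V32i16 addV32BySplitting(const V32i16 &a, const V32i16 &b) {
  return concatHalves(addV16(extractHalf(a, 0), extractHalf(b, 0)),
                      addV16(extractHalf(a, 1), extractHalf(b, 1)));
}
```

The point of the sketch is only that the 512-bit value stays a single
value in the type system, and the split happens per operation.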

This does change the ABI for vXi16/vXi8 vectors larger than 512
bits: they now pass in multiple zmms instead of multiple ymms. We'd
already hacked some code to make v64i8/v32i16 pass in zmm.
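
To make the register-count effect concrete, here is a small sketch of
the arithmetic only. registersNeeded is a hypothetical helper, not
something from the calling convention code, and it assumes the vector
is simply divided into full-width registers.

```cpp
#include <cassert>
#include <cstddef>

// Number of registers of regBits each needed to hold a vector of
// numElts elements of eltBits each (rounded up).
constexpr std::size_t registersNeeded(std::size_t numElts,
                                      std::size_t eltBits,
                                      std::size_t regBits) {
  return (numElts * eltBits + regBits - 1) / regBits;
}

// v64i16 is 1024 bits: four 256-bit ymms or two 512-bit zmms.
static_assert(registersNeeded(64, 16, 256) == 4, "v64i16 in ymms");
static_assert(registersNeeded(64, 16, 512) == 2, "v64i16 in zmms");
```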

Cost model is still a bit of a mess. In some places I tried to
match existing behavior. But really we need to account for
splitting and concatenation costs. The cost model for shuffles is
especially pessimistic. This has a big effect on reductions since
the generic lowering uses PermuteSingleSrc. Reductions use a very
specific pattern that can be handled by subvector extracts and
shifts, but the default handling doesn't know that.
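
The reduction pattern referred to above can be sketched with a scalar
C++ model. This is an illustration, not the lowering code, and
ReduceResult and reduceAddByHalving are hypothetical names. It assumes
that halving steps which still cross a 128-bit boundary correspond to
subvector extracts, and that the remaining steps inside 128 bits
correspond to shifts. That classification is my assumption for the
sketch.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

struct ReduceResult {
  int16_t value;         // the reduced sum (wraps modulo 2^16)
  unsigned extractSteps; // steps modeled as subvector extracts
  unsigned shiftSteps;   // steps modeled as in-register shifts
};

// Add-reduce 32 x i16 by repeatedly adding the upper half of the
// live elements onto the lower half.
ReduceResult reduceAddByHalving(std::array<int16_t, 32> v) {
  constexpr std::size_t kEltsPer128 = 8; // 128 bits / 16 bits
  ReduceResult r{0, 0, 0};
  for (std::size_t width = v.size() / 2; width >= 1; width /= 2) {
    for (std::size_t i = 0; i < width; ++i)
      v[i] = static_cast<int16_t>(static_cast<uint16_t>(
          static_cast<uint16_t>(v[i]) +
          static_cast<uint16_t>(v[i + width])));
    // Assumption: the upper half is a whole 128-bit (or wider)
    // subvector only while width >= 8 elements.
    if (width >= kEltsPer128)
      ++r.extractSteps;
    else
      ++r.shiftSteps;
  }
  r.value = v[0];
  return r;
}
```

Each step only ever needs "the upper half of what is left", which is
a much narrower requirement than an arbitrary single-source permute.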

Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D76212

Files:
  llvm/lib/Target/X86/X86ISelDAGToDAG.cpp
  llvm/lib/Target/X86/X86ISelLowering.cpp
  llvm/lib/Target/X86/X86TargetTransformInfo.cpp
  llvm/test/Analysis/CostModel/X86/arith-fix.ll
  llvm/test/Analysis/CostModel/X86/arith-overflow.ll
  llvm/test/Analysis/CostModel/X86/arith.ll
  llvm/test/Analysis/CostModel/X86/fshl.ll
  llvm/test/Analysis/CostModel/X86/fshr.ll
  llvm/test/Analysis/CostModel/X86/icmp.ll
  llvm/test/Analysis/CostModel/X86/masked-intrinsic-cost.ll
  llvm/test/Analysis/CostModel/X86/reduce-add.ll
  llvm/test/Analysis/CostModel/X86/reduce-and.ll
  llvm/test/Analysis/CostModel/X86/reduce-mul.ll
  llvm/test/Analysis/CostModel/X86/reduce-or.ll
  llvm/test/Analysis/CostModel/X86/reduce-smax.ll
  llvm/test/Analysis/CostModel/X86/reduce-smin.ll
  llvm/test/Analysis/CostModel/X86/reduce-umax.ll
  llvm/test/Analysis/CostModel/X86/reduce-umin.ll
  llvm/test/Analysis/CostModel/X86/reduce-xor.ll
  llvm/test/Analysis/CostModel/X86/rem.ll
  llvm/test/Analysis/CostModel/X86/shuffle-extract_subvector.ll
  llvm/test/Analysis/CostModel/X86/shuffle-reverse.ll
  llvm/test/Analysis/CostModel/X86/shuffle-two-src.ll
  llvm/test/Analysis/CostModel/X86/trunc.ll
  llvm/test/Analysis/CostModel/X86/vector-extract.ll
  llvm/test/Analysis/CostModel/X86/vector-insert.ll
  llvm/test/CodeGen/X86/avg-mask.ll
  llvm/test/CodeGen/X86/avg.ll
  llvm/test/CodeGen/X86/avx512-calling-conv.ll
  llvm/test/CodeGen/X86/avx512-ext.ll
  llvm/test/CodeGen/X86/avx512-insert-extract.ll
  llvm/test/CodeGen/X86/avx512-logic.ll
  llvm/test/CodeGen/X86/avx512-mask-op.ll
  llvm/test/CodeGen/X86/avx512-select.ll
  llvm/test/CodeGen/X86/avx512-trunc.ll
  llvm/test/CodeGen/X86/avx512-vbroadcasti128.ll
  llvm/test/CodeGen/X86/avx512-vbroadcasti256.ll
  llvm/test/CodeGen/X86/avx512-vec-cmp.ll
  llvm/test/CodeGen/X86/avx512-vselect.ll
  llvm/test/CodeGen/X86/avx512vl-vec-masked-cmp.ll
  llvm/test/CodeGen/X86/bitcast-and-setcc-512.ll
  llvm/test/CodeGen/X86/bitcast-int-to-vector-bool-zext.ll
  llvm/test/CodeGen/X86/bitcast-setcc-512.ll
  llvm/test/CodeGen/X86/fast-isel-nontemporal.ll
  llvm/test/CodeGen/X86/kshift.ll
  llvm/test/CodeGen/X86/madd.ll
  llvm/test/CodeGen/X86/masked_store_trunc.ll
  llvm/test/CodeGen/X86/masked_store_trunc_usat.ll
  llvm/test/CodeGen/X86/merge-consecutive-loads-512.ll
  llvm/test/CodeGen/X86/midpoint-int-vec-512.ll
  llvm/test/CodeGen/X86/movmsk-cmp.ll
  llvm/test/CodeGen/X86/nontemporal-loads-2.ll
  llvm/test/CodeGen/X86/nontemporal-loads.ll
  llvm/test/CodeGen/X86/pmaddubsw.ll
  llvm/test/CodeGen/X86/pmul.ll
  llvm/test/CodeGen/X86/pmulh.ll
  llvm/test/CodeGen/X86/var-permute-512.ll
  llvm/test/CodeGen/X86/vector-compare-results.ll
  llvm/test/CodeGen/X86/vector-fshl-512.ll
  llvm/test/CodeGen/X86/vector-fshl-rot-512.ll
  llvm/test/CodeGen/X86/vector-fshr-512.ll
  llvm/test/CodeGen/X86/vector-fshr-rot-512.ll
  llvm/test/CodeGen/X86/vector-idiv-sdiv-512.ll
  llvm/test/CodeGen/X86/vector-idiv-udiv-512.ll
  llvm/test/CodeGen/X86/vector-popcnt-512.ll
  llvm/test/CodeGen/X86/vector-reduce-and-bool.ll
  llvm/test/CodeGen/X86/vector-reduce-mul.ll
  llvm/test/CodeGen/X86/vector-reduce-or-bool.ll
  llvm/test/CodeGen/X86/vector-reduce-xor-bool.ll
  llvm/test/CodeGen/X86/vector-rotate-512.ll
  llvm/test/CodeGen/X86/vector-sext.ll
  llvm/test/CodeGen/X86/vector-shift-ashr-512.ll
  llvm/test/CodeGen/X86/vector-shift-lshr-512.ll
  llvm/test/CodeGen/X86/vector-shift-shl-512.ll
  llvm/test/CodeGen/X86/vector-shuffle-512-v32.ll
  llvm/test/CodeGen/X86/vector-shuffle-512-v64.ll
  llvm/test/CodeGen/X86/vector-shuffle-v1.ll
  llvm/test/CodeGen/X86/vector-tzcnt-512.ll
  llvm/test/CodeGen/X86/vector-zext.ll
  llvm/test/CodeGen/X86/viabs.ll