[all-commits] [llvm/llvm-project] bdbbed: [X86][CostModel] Update costs for vector truncate ...

Mon Apr 27 12:01:04 PDT 2020

  Branch: refs/heads/master
  Home:   https://github.com/llvm/llvm-project
  Commit: bdbbed115f87fd2700bf10249c6a63625f59a809
      https://github.com/llvm/llvm-project/commit/bdbbed115f87fd2700bf10249c6a63625f59a809
  Author: Craig Topper <craig.topper at intel.com>
  Date:   2020-04-27 (Mon, 27 Apr 2020)

  Changed paths:
    M llvm/lib/Target/X86/X86TargetTransformInfo.cpp
    M llvm/test/Analysis/CostModel/X86/arith-fix.ll
    M llvm/test/Analysis/CostModel/X86/arith-overflow.ll
    M llvm/test/Analysis/CostModel/X86/cast.ll
    M llvm/test/Analysis/CostModel/X86/min-legal-vector-width.ll
    M llvm/test/Analysis/CostModel/X86/trunc.ll

  Log Message:
  -----------
  [X86][CostModel] Update costs for vector truncate with avx512f/avx512bw.

All avx512 truncate instructions except vXi64->vXi32 are 2 uops
on port 5, so raise their costs to 2, except where an earlier,
faster sequence such as pshufb exists for 128-bit input vectors.

Add a lower cost of 3 for v16i16->v16i8 with avx512f, where we can
extend to v16i32 and then truncate, and a cost of 2 with avx512bw,
with or without avx512vl, where we can use vpmovwb with either a ymm
or zmm input. Both of these beat masking, splitting, and using
packuswb, which is our avx/avx2 codegen.
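
As a rough illustration of the v16i16->v16i8 costs described above, the
sketch below models a feature-gated cost table lookup. It is not the code
from X86TargetTransformInfo.cpp: the names Feature, VecTy, TruncCostEntry,
and lookupTruncCost are hypothetical, and the table holds only the two
costs stated in this log message.

```cpp
#include <cstddef>

// Hypothetical stand-ins for subtarget features and vector types.
enum class Feature { AVX512F, AVX512BW };
enum class VecTy { v16i16, v16i8, v8i64, v8i32 };

// One row of a simplified truncate cost table (illustrative only).
struct TruncCostEntry {
  Feature Required; // feature that must be present for this row to apply
  VecTy Src;        // source vector type
  VecTy Dst;        // destination vector type
  int Cost;         // modeled cost
};

// Rows are ordered so the most capable feature set is checked first.
// Both costs come from the log message above.
static const TruncCostEntry TruncCostTable[] = {
    {Feature::AVX512BW, VecTy::v16i16, VecTy::v16i8, 2}, // vpmovwb
    {Feature::AVX512F, VecTy::v16i16, VecTy::v16i8, 3},  // extend to v16i32, then truncate
};

// Returns the cost of the first applicable row, or -1 if the table has no
// entry (a real cost model would fall back to a more generic table).
// AVX512F is assumed to be available whenever this function is called.
inline int lookupTruncCost(bool HasBW, VecTy Src, VecTy Dst) {
  const std::size_t N = sizeof(TruncCostTable) / sizeof(TruncCostTable[0]);
  for (std::size_t I = 0; I < N; ++I) {
    const TruncCostEntry &E = TruncCostTable[I];
    if (E.Required == Feature::AVX512BW && !HasBW)
      continue; // row needs avx512bw, which this subtarget lacks
    if (E.Src == Src && E.Dst == Dst)
      return E.Cost;
  }
  return -1;
}
```

Under these assumptions, lookupTruncCost(true, VecTy::v16i16, VecTy::v16i8)
picks the avx512bw row, while passing false for HasBW falls through to the
avx512f row.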