[all-commits] [llvm/llvm-project] cff668: [X86] Lower the cost of v4i64->v4i32 and v8i64->v8...

Wed Apr 29 13:23:18 PDT 2020

  Branch: refs/heads/master
  Home:   https://github.com/llvm/llvm-project
  Commit: cff66865322e9e990808eb5a7ed7cdacefb699d7
      https://github.com/llvm/llvm-project/commit/cff66865322e9e990808eb5a7ed7cdacefb699d7
  Author: Craig Topper <craig.topper at intel.com>
  Date:   2020-04-29 (Wed, 29 Apr 2020)

  Changed paths:
    M llvm/lib/Target/X86/X86TargetTransformInfo.cpp
    M llvm/test/Analysis/CostModel/X86/arith-fix.ll
    M llvm/test/Analysis/CostModel/X86/arith-overflow.ll
    M llvm/test/Analysis/CostModel/X86/cast.ll
    M llvm/test/Analysis/CostModel/X86/min-legal-vector-width.ll
    M llvm/test/Analysis/CostModel/X86/trunc.ll

  Log Message:
  -----------
  [X86] Lower the cost of v4i64->v4i32 and v8i64->v8i32 truncate with AVX

We generate much better code these days than we used to. And we use the same sequence for AVX1 and AVX2 for these

For v4i64->v4i32 we generate:
vextractf128    xmm1, ymm0, 1
vshufps xmm0, xmm0, xmm1, 136   # xmm0 = xmm0[0,2],xmm1[0,2]

And for v8i64->v8i32 we generate:
vperm2f128      ymm2, ymm0, ymm1, 49 # ymm2 = ymm0[2,3],ymm1[2,3]
vinsertf128     ymm0, ymm0, xmm1, 1
vshufps ymm0, ymm0, ymm2, 136   # ymm0 = ymm0[0,2],ymm2[0,2],ymm0[4,6],ymm2[4,6]

Differential Revision: https://reviews.llvm.org/D79109