[all-commits] [llvm/llvm-project] 8cd782: [X86][LoopVectorize] "Fix" `X86TTIImpl::getAddress...

Mon Nov 29 23:48:19 PST 2021

  Branch: refs/heads/main
  Home:   https://github.com/llvm/llvm-project
  Commit: 8cd782487fe68082e57d24a576b77f529d77f96c
      https://github.com/llvm/llvm-project/commit/8cd782487fe68082e57d24a576b77f529d77f96c
  Author: Roman Lebedev <lebedev.ri at gmail.com>
  Date:   2021-11-30 (Tue, 30 Nov 2021)

  Changed paths:
    M llvm/lib/Target/X86/X86TargetTransformInfo.cpp
    M llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
    M llvm/test/Analysis/CostModel/X86/gather-i16-with-i8-index.ll
    M llvm/test/Analysis/CostModel/X86/gather-i32-with-i8-index.ll
    M llvm/test/Analysis/CostModel/X86/gather-i64-with-i8-index.ll
    M llvm/test/Analysis/CostModel/X86/gather-i8-with-i8-index.ll
    M llvm/test/Analysis/CostModel/X86/interleaved-load-i16-stride-5.ll
    M llvm/test/Analysis/CostModel/X86/masked-interleaved-load-i16.ll
    M llvm/test/Analysis/CostModel/X86/masked-interleaved-store-i16.ll
    M llvm/test/Analysis/CostModel/X86/masked-scatter-i32-with-i8-index.ll
    M llvm/test/Analysis/CostModel/X86/masked-scatter-i64-with-i8-index.ll
    M llvm/test/Analysis/CostModel/X86/scatter-i16-with-i8-index.ll
    M llvm/test/Analysis/CostModel/X86/scatter-i32-with-i8-index.ll
    M llvm/test/Analysis/CostModel/X86/scatter-i64-with-i8-index.ll
    M llvm/test/Analysis/CostModel/X86/scatter-i8-with-i8-index.ll
    M llvm/test/Transforms/LoopVectorize/X86/gather_scatter.ll
    M llvm/test/Transforms/LoopVectorize/X86/x86-interleaved-store-accesses-with-gaps.ll

  Log Message:
  -----------
  [X86][LoopVectorize] "Fix" `X86TTIImpl::getAddressComputationCost()`

We ask `TTI.getAddressComputationCost()` for the cost of computing a vector address,
and then multiply it by the vector width. This doesn't make any sense:
it implies that we'd do a vector GEP and then scalarize the vector of pointers,
but there is no such thing in the vectorized IR; we perform scalar GEPs.

This is *especially* bad on X86, and was effectively prohibiting any scalarized
vectorization of gathers/scatters, because `X86TTIImpl::getAddressComputationCost()`
says that the cost of a vector address computation is `10`, compared to `1` for scalar.
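
To make the arithmetic concrete, here is a minimal sketch of the address-computation
term only. It is not the code from this patch: the function and constant names are
hypothetical, the `10` and `1` are the values quoted above, and the "after" function
assumes the intended accounting is one scalar GEP per lane, as the message argues.

```cpp
// Hypothetical, simplified model of the address-cost term for a scalarized
// gather/scatter. None of these names exist in LLVM; they are illustration only.

// Costs quoted in the commit message for X86.
constexpr unsigned ScalarAddrCost = 1;   // scalar address computation
constexpr unsigned VectorAddrCost = 10;  // vector address computation

// Before: the *vector* address cost is queried and then multiplied by the
// vector width (VF), as if a vector GEP were emitted and then scalarized.
constexpr unsigned addrCostBefore(unsigned VF) { return VF * VectorAddrCost; }

// After (assumed): each lane performs one scalar GEP, so the *scalar*
// address cost is what gets multiplied by VF.
constexpr unsigned addrCostAfter(unsigned VF) { return VF * ScalarAddrCost; }
```

Under these assumptions, at a vector width of 8 the address term alone drops from 80
to 8, which is the kind of gap that could make scalarized gathers/scatters look
unprofitable before the change.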

The computed costs are similar to the ones with D111222+D111220,
but we end up without masked memory intrinsics that we'd then have to
expand later on, without much luck. (D111363)

Differential Revision: https://reviews.llvm.org/D111460