[all-commits] [llvm/llvm-project] 05a4e4: Reland [X86][CostModel] X86TTIImpl::getMemoryOpCos...

Sat May 22 01:52:47 PDT 2021

  Branch: refs/heads/main
  Home:   https://github.com/llvm/llvm-project
  Commit: 05a4e4a89c6b6dc6e3edfb5efb9ddc950ae47469
      https://github.com/llvm/llvm-project/commit/05a4e4a89c6b6dc6e3edfb5efb9ddc950ae47469
  Author: Roman Lebedev <lebedev.ri at gmail.com>
  Date:   2021-05-22 (Sat, 22 May 2021)

  Changed paths:
    M llvm/lib/Target/X86/X86TargetTransformInfo.cpp
    M llvm/test/Analysis/CostModel/X86/interleaved-load-i16-stride-3.ll
    M llvm/test/Analysis/CostModel/X86/interleaved-store-i16-stride-3.ll
    M llvm/test/Analysis/CostModel/X86/load_store.ll

  Log Message:
  -----------
  Reland [X86][CostModel] X86TTIImpl::getMemoryOpCost(): rewrite vector handling again

Instead of splitting the vector into power-of-two sized chunks,
try handling the large vector in a streaming mode,
decreasing the operation's vector size
once it no longer fits the elements left to process.
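
As a rough illustration of the streaming idea described above, the following
is a minimal sketch, not the code from this commit. The helper name
splitIntoOps and its byte-based parameters are hypothetical; the real
X86TTIImpl::getMemoryOpCost() works on legalized LLVM types and also accounts
for subvector insert/extract costs, which this sketch ignores.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper (not an LLVM API): walk over a memory access of
// TotalBytes, starting with operations MaxOpBytes wide and halving the
// operation width whenever it no longer fits the bytes left to process.
// Returns the width, in bytes, of each load/store that would be issued.
std::vector<unsigned> splitIntoOps(unsigned TotalBytes, unsigned MaxOpBytes) {
  // Assumption of this sketch: the widest operation is a power of two,
  // e.g. 16 for XMM or 32 for YMM.
  assert(MaxOpBytes != 0 && (MaxOpBytes & (MaxOpBytes - 1)) == 0 &&
         "MaxOpBytes must be a power of two");

  std::vector<unsigned> Ops;
  unsigned CurrOpBytes = MaxOpBytes;
  unsigned BytesLeft = TotalBytes;
  while (BytesLeft > 0) {
    if (CurrOpBytes <= BytesLeft) {
      // The current width still fits: emit one operation of that width.
      Ops.push_back(CurrOpBytes);
      BytesLeft -= CurrOpBytes;
    } else {
      // Too wide for what is left: shrink the operation and retry.
      CurrOpBytes /= 2;
    }
  }
  return Ops;
}
```

Under these assumptions, splitIntoOps(56, 32) yields the widths 32, 16, 8,
and splitIntoOps(96, 32) yields three 32-byte operations.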

Notably, this improves costs for overaligned loads, since loading padding is fine.
This also more directly tracks when we need to insert/extract the YMM/XMM subvector,
so some costs fluctuate because of that.

This was initially landed in c02476f3158f2908ef0a6f628210b5380bd33695,
but reverted in 5fddc3312bad7e62493f1605385fad5e589e6450,
because the code made some very optimistic assumptions about invariants
that didn't hold in practice.

Reviewed By: RKSimon, ABataev

Differential Revision: https://reviews.llvm.org/D100684

  Commit: 8ed0864fd76ded2646b33de8fc610519dd7f1eb5
      https://github.com/llvm/llvm-project/commit/8ed0864fd76ded2646b33de8fc610519dd7f1eb5
  Author: Roman Lebedev <lebedev.ri at gmail.com>
  Date:   2021-05-22 (Sat, 22 May 2021)

  Changed paths:
    M llvm/lib/Target/X86/X86TargetTransformInfo.cpp
    M llvm/test/Analysis/CostModel/X86/interleaved-load-i8.ll
    M llvm/test/Analysis/CostModel/X86/interleaved-store-i8.ll

  Log Message:
  -----------
  Reland [X86] X86TTIImpl::getInterleavedMemoryOpCostAVX2(): use getMemoryOpCost()

Now that getMemoryOpCost() correctly handles all the vector variants,
we should no longer hand-roll our own version of it, but use it directly.

The AVX512 variant probably needs a similar change,
but there it is less obvious.

This was initially landed in 69ed93a4355123a45c1d7216aea7cd53d07a361b,
but was reverted in 6b95fd199d96e3ba5c28a23b17b74203522bdaa8
because the patch it depends on was reverted.

Compare: https://github.com/llvm/llvm-project/compare/fd5cc418186a...8ed0864fd76d