[all-commits] [llvm/llvm-project] 4101c7: [X86][Costmodel] `getReplicationShuffleCost()`: im...

Wed Nov 10 11:54:11 PST 2021

  Branch: refs/heads/main
  Home:   https://github.com/llvm/llvm-project
  Commit: 4101c7bf197172ce48c574c683628a387c073fd8
      https://github.com/llvm/llvm-project/commit/4101c7bf197172ce48c574c683628a387c073fd8
  Author: Roman Lebedev <lebedev.ri at gmail.com>
  Date:   2021-11-10 (Wed, 10 Nov 2021)

  Changed paths:
    M llvm/lib/Target/X86/X86TargetTransformInfo.cpp
    M llvm/lib/Target/X86/X86TargetTransformInfo.h
    M llvm/test/Analysis/CostModel/X86/shuffle-replication-i32.ll
    M llvm/test/Analysis/CostModel/X86/shuffle-replication-i64.ll

  Log Message:
  -----------
  [X86][Costmodel] `getReplicationShuffleCost()`: implement cost model for 32/64 bit-wide elements with AVX512F

This models lowering to `vpermd`/`vpermq`/`vpermps`/`vpermpd`,
which take a single input vector and a single index vector
and can move elements across lanes. So far I haven't seen
evidence that replication ever demands more than a single
input vector per output vector.

This results in *shockingly* lower costs :)
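To illustrate the single-input observation, the Python sketch below builds a replication mask (each source lane repeated `replication_factor` times), splits it into 512-bit output registers, and checks which source register each output register reads. The function names and the "one permute per output register" cost are hypothetical simplifications for illustration, not the code in `X86TargetTransformInfo.cpp`; the model ignores index-vector materialization and legalization. With `N` elements per register, source element `m*N` (the first element of source register `m`) lands at mask position `m*N*RF`, which is a multiple of `N`. Source-register boundaries therefore always line up with output-register boundaries, which is consistent with the observation that one input vector per output vector suffices.

```python
def replication_mask(replication_factor, vf):
    """Repeat each of the vf source lanes replication_factor times,
    e.g. replication_factor=3, vf=2 -> [0, 0, 0, 1, 1, 1]."""
    return [src for src in range(vf) for _ in range(replication_factor)]


def source_regs_per_output_reg(mask, elts_per_reg):
    """For each register-sized chunk of the output, return the set of
    source registers (lane index // elts_per_reg) that the chunk reads."""
    return [
        {idx // elts_per_reg for idx in mask[begin:begin + elts_per_reg]}
        for begin in range(0, len(mask), elts_per_reg)
    ]


def estimate_replication_cost(replication_factor, vf, elt_bits, reg_bits=512):
    """Hypothetical simplified cost: one single-source cross-lane permute
    (vpermd/vpermq/vpermps/vpermpd) per output register."""
    elts_per_reg = reg_bits // elt_bits
    mask = replication_mask(replication_factor, vf)
    chunks = source_regs_per_output_reg(mask, elts_per_reg)
    # The single-input assumption: no output register needs two sources.
    assert all(len(srcs) == 1 for srcs in chunks)
    return len(chunks)


# i32 elements, replication factor 3, VF 32: 96 output lanes, 16 per zmm.
print(estimate_replication_cost(3, 32, 32))  # 6
```

Counting output registers rather than mask elements is what makes the estimate small: a whole 512-bit register of replicated lanes is produced by one permute, regardless of the replication factor.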

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D113350

  Commit: c6e894b9b26897586322bb42fc36eddb65d8d503
      https://github.com/llvm/llvm-project/commit/c6e894b9b26897586322bb42fc36eddb65d8d503
  Author: Roman Lebedev <lebedev.ri at gmail.com>
  Date:   2021-11-10 (Wed, 10 Nov 2021)

  Changed paths:
    M llvm/lib/Target/X86/X86TargetTransformInfo.cpp
    M llvm/test/Analysis/CostModel/X86/shuffle-replication-i16.ll

  Log Message:
  -----------
  [X86][Costmodel] `getReplicationShuffleCost()`: implement cost model for 16 bit-wide elements with AVX512BW

BWI introduced VPERMW, so cost-model i16 replication shuffles using it.
Note that we can still model i16 replication shuffles without BWI
by promoting to i32. That will be done in follow-ups.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D113478

  Commit: a70d74323e0423a9ec5df64468ea6afee3cbdcf3
      https://github.com/llvm/llvm-project/commit/a70d74323e0423a9ec5df64468ea6afee3cbdcf3
  Author: Roman Lebedev <lebedev.ri at gmail.com>
  Date:   2021-11-10 (Wed, 10 Nov 2021)

  Changed paths:
    M llvm/lib/Target/X86/X86TargetTransformInfo.cpp
    M llvm/test/Analysis/CostModel/X86/shuffle-replication-i8.ll

  Log Message:
  -----------
  [X86][Costmodel] `getReplicationShuffleCost()`: implement cost model for 8 bit-wide elements with AVX512VBMI

VBMI introduced VPERMB, so cost-model i8 replication shuffles using it.
Note that we can still model i8 replication shuffles without VBMI
by promoting to i16/i32. That will be done in follow-ups.
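Taken together, these commits pick a single-source permute by element width, gated on an AVX-512 subset: AVX512F for 32/64-bit elements, AVX512BW for `vpermw`, and AVX512VBMI for `vpermb`. The Python table below is a hypothetical summary of that mapping, not an LLVM API. It returns `None` where the required extension is missing, since the promotion-based fallbacks are described as follow-up work and are not modeled here.

```python
# Hypothetical summary (not LLVM's API): element width in bits ->
# (required AVX-512 subset, single-source cross-lane permute instruction).
PERMUTE_FOR_ELT_BITS = {
    64: ("AVX512F", "vpermq/vpermpd"),
    32: ("AVX512F", "vpermd/vpermps"),
    16: ("AVX512BW", "vpermw"),
    8: ("AVX512VBMI", "vpermb"),
}


def replication_permute(elt_bits, features):
    """Return the permute used to model a replication shuffle of
    elt_bits-wide elements, or None if the required extension is absent
    (promotion to a wider element type is not modeled in this sketch)."""
    required, insn = PERMUTE_FOR_ELT_BITS[elt_bits]
    return insn if required in features else None


print(replication_permute(16, {"AVX512F"}))                            # None
print(replication_permute(8, {"AVX512F", "AVX512BW", "AVX512VBMI"}))   # vpermb
```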

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D113479

Compare: https://github.com/llvm/llvm-project/compare/bef966eb376e...a70d74323e04