[all-commits] [llvm/llvm-project] 8a3c64: [X86][Costmodel] Load/store i8 Stride=3 VF=2 inter...

Roman Lebedev via All-commits all-commits at lists.llvm.org
Sat Oct 2 03:52:38 PDT 2021


  Branch: refs/heads/main
  Home:   https://github.com/llvm/llvm-project
  Commit: 8a3c64c3a2393af4058103e7555a20e22151ca5d
      https://github.com/llvm/llvm-project/commit/8a3c64c3a2393af4058103e7555a20e22151ca5d
  Author: Roman Lebedev <lebedev.ri at gmail.com>
  Date:   2021-10-02 (Sat, 02 Oct 2021)

  Changed paths:
    M llvm/lib/Target/X86/X86TargetTransformInfo.cpp
    M llvm/test/Analysis/CostModel/X86/interleaved-load-i8-stride-3.ll
    M llvm/test/Analysis/CostModel/X86/interleaved-store-i8-stride-3.ll

  Log Message:
  -----------
  [X86][Costmodel] Load/store i8 Stride=3 VF=2 interleaving costs

While we already model this tuple, the values are divergent from reality, so fix them.

The only sched models for CPUs that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/WYscYMcW4 - for intels `Block RThroughput: =3.0`; for ryzens, `Block RThroughput: <=1.5`
So pick cost of `3`.

For store we have:
https://godbolt.org/z/e9qvYdbbs - for intels `Block RThroughput: =4.0`; for ryzens, `Block RThroughput: <=2.0`
So pick cost of `4`.

I'm directly using the shuffling asm that llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D110956
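
A rough sketch of the selection rule this log message describes: read the `Block RThroughput` line from each llvm-mca report and take the worst (largest) value across the sched models. The helper names, the abbreviated sample reports, and the rounding up to an integer are assumptions made for illustration; none of them come from the patch itself.

```python
import math
import re
from typing import Mapping

# Matches the summary line printed by llvm-mca, e.g. "Block RThroughput: 3.0".
_RTHROUGHPUT = re.compile(
    r"^Block RThroughput:\s*([0-9]+(?:\.[0-9]+)?)\s*$", re.MULTILINE
)


def block_rthroughput(mca_output: str) -> float:
    """Extract the Block RThroughput value from one llvm-mca text report."""
    match = _RTHROUGHPUT.search(mca_output)
    if match is None:
        raise ValueError("no 'Block RThroughput' line found")
    return float(match.group(1))


def pick_cost(reports: Mapping[str, str]) -> int:
    """Pick the worst-case throughput across sched models as the cost.

    `reports` maps a sched model name (hypothetical keys here) to the
    llvm-mca output for that model. Rounding up is an assumption; every
    value quoted in these log messages is already a whole number on the
    Intel side, which is the side that ends up being chosen.
    """
    worst = max(block_rthroughput(text) for text in reports.values())
    return math.ceil(worst)


if __name__ == "__main__":
    # Abbreviated, hand-written stand-ins for real llvm-mca reports, using
    # the Stride=3 VF=2 load numbers quoted above (3.0 vs. at most 1.5).
    sample = {
        "haswell": "Iterations: 100\nBlock RThroughput: 3.0\n",
        "znver3": "Iterations: 100\nBlock RThroughput: 1.5\n",
    }
    print(pick_cost(sample))  # prints 3
```

The Intel models dominate in every case listed in this series, so the chosen cost always matches the Intel figure.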


  Commit: f1df2d8eaf188eec2971b12e57c821a0db5f3a36
      https://github.com/llvm/llvm-project/commit/f1df2d8eaf188eec2971b12e57c821a0db5f3a36
  Author: Roman Lebedev <lebedev.ri at gmail.com>
  Date:   2021-10-02 (Sat, 02 Oct 2021)

  Changed paths:
    M llvm/lib/Target/X86/X86TargetTransformInfo.cpp
    M llvm/test/Analysis/CostModel/X86/interleaved-load-i8-stride-3.ll
    M llvm/test/Analysis/CostModel/X86/interleaved-store-i8-stride-3.ll

  Log Message:
  -----------
  [X86][Costmodel] Load/store i8 Stride=3 VF=4 interleaving costs

While we already model this tuple, the values are divergent from reality, so fix them.

The only sched models for CPUs that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/obWz3PrfK - for intels `Block RThroughput: =3.0`; for ryzens, `Block RThroughput: <=1.5`
So pick cost of `3`.

For store we have:
https://godbolt.org/z/orjPshn3h - for intels `Block RThroughput: =4.0`; for ryzens, `Block RThroughput: <=2.0`
So pick cost of `4`.

I'm directly using the shuffling asm that llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D110958


  Commit: d1460c88a6d8739920f86383ff7d17be3dc517f6
      https://github.com/llvm/llvm-project/commit/d1460c88a6d8739920f86383ff7d17be3dc517f6
  Author: Roman Lebedev <lebedev.ri at gmail.com>
  Date:   2021-10-02 (Sat, 02 Oct 2021)

  Changed paths:
    M llvm/lib/Target/X86/X86TargetTransformInfo.cpp
    M llvm/test/Analysis/CostModel/X86/interleaved-load-i8-stride-3.ll
    M llvm/test/Analysis/CostModel/X86/interleaved-store-i8-stride-3.ll

  Log Message:
  -----------
  [X86][Costmodel] Load/store i8 Stride=3 VF=8 interleaving costs

While we already model this tuple, the values are divergent from reality, so fix them.

The only sched models for CPUs that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/1jeocxj55 - for intels `Block RThroughput: =6.0`; for ryzens, `Block RThroughput: <=3.0`
So pick cost of `6`.

For store we have:
https://godbolt.org/z/fr7xfa3K5 - for intels `Block RThroughput: =6.0`; for ryzens, `Block RThroughput: <=2.0`
So pick cost of `6`.

I'm directly using the shuffling asm that llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D110960


  Commit: 448c939839992188000841bfaa6fcc6990e0fa2b
      https://github.com/llvm/llvm-project/commit/448c939839992188000841bfaa6fcc6990e0fa2b
  Author: Roman Lebedev <lebedev.ri at gmail.com>
  Date:   2021-10-02 (Sat, 02 Oct 2021)

  Changed paths:
    M llvm/lib/Target/X86/X86TargetTransformInfo.cpp
    M llvm/test/Analysis/CostModel/X86/interleaved-load-i8-stride-3.ll

  Log Message:
  -----------
  [X86][Costmodel] Load/store i8 Stride=3 VF=32 interleaving costs

For VF=16, costs are correct.
For VF=32, load cost is divergent.

The only sched models for CPUs that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/qKjevqf4W - for intels `Block RThroughput: <=14.0`; for ryzens, `Block RThroughput: <=4.5`
So pick cost of `14`.

For store we have:
https://godbolt.org/z/xTssTq319 - for intels `Block RThroughput: =13.0`; for ryzens, `Block RThroughput: <=5.5`
So pick cost of `13`.

I'm directly using the shuffling asm that llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D110961


  Commit: 935b9693aea14343aee3eced905590056c6579dc
      https://github.com/llvm/llvm-project/commit/935b9693aea14343aee3eced905590056c6579dc
  Author: Roman Lebedev <lebedev.ri at gmail.com>
  Date:   2021-10-02 (Sat, 02 Oct 2021)

  Changed paths:
    M llvm/lib/Target/X86/X86TargetTransformInfo.cpp
    M llvm/test/Analysis/CostModel/X86/interleaved-load-i8-stride-4.ll
    M llvm/test/Analysis/CostModel/X86/interleaved-store-i8-stride-4.ll

  Log Message:
  -----------
  [X86][Costmodel] Load/store i8 Stride=4 VF=2 interleaving costs

While we already model this tuple, the values are divergent from reality, so fix them.

The only sched models for CPUs that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/KP6nn36zs - for intels `Block RThroughput: =4.0`; for ryzens, `Block RThroughput: <=2.0`
So pick cost of `4`.

For store we have:
https://godbolt.org/z/ov95zhrq6 - for intels `Block RThroughput: =4.0`; for ryzens, `Block RThroughput: <=2.0`
So pick cost of `4`.

I'm directly using the shuffling asm that llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D110966


  Commit: ae08362cb8e60864a0505af47189d6a996cfb5d9
      https://github.com/llvm/llvm-project/commit/ae08362cb8e60864a0505af47189d6a996cfb5d9
  Author: Roman Lebedev <lebedev.ri at gmail.com>
  Date:   2021-10-02 (Sat, 02 Oct 2021)

  Changed paths:
    M llvm/lib/Target/X86/X86TargetTransformInfo.cpp
    M llvm/test/Analysis/CostModel/X86/interleaved-store-i8-stride-4.ll

  Log Message:
  -----------
  [X86][Costmodel] Load/store i8 Stride=4 VF=4 interleaving costs

While we already model this tuple, the store cost is divergent from reality, so fix it.

The only sched models for CPUs that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/1n4bPh7Tn - for intels `Block RThroughput: =4.0`; for ryzens, `Block RThroughput: <=2.0`
So pick cost of `4`.

For store we have:
https://godbolt.org/z/r8K9sveqo - for intels `Block RThroughput: =4.0`; for ryzens, `Block RThroughput: <=2.0`
So pick cost of `4`.

I'm directly using the shuffling asm that llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D110968


  Commit: 74e4a0e327579bfc3b00f6af0c9fd408c5843e8b
      https://github.com/llvm/llvm-project/commit/74e4a0e327579bfc3b00f6af0c9fd408c5843e8b
  Author: Roman Lebedev <lebedev.ri at gmail.com>
  Date:   2021-10-02 (Sat, 02 Oct 2021)

  Changed paths:
    M llvm/lib/Target/X86/X86TargetTransformInfo.cpp
    M llvm/test/Analysis/CostModel/X86/interleaved-load-i8-stride-4.ll
    M llvm/test/Analysis/CostModel/X86/interleaved-store-i8-stride-4.ll

  Log Message:
  -----------
  [X86][Costmodel] Load/store i8 Stride=4 VF=8 interleaving costs

While we already model this tuple, the values are divergent from reality, so fix them.

The only sched models for CPUs that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/v7746Wcf7 - for intels `Block RThroughput: =12.0`; for ryzens, `Block RThroughput: <=6.0`
So pick cost of `12`.

For store we have:
https://godbolt.org/z/aEeEohEbP - for intels `Block RThroughput: =4.0`; for ryzens, `Block RThroughput: <=2.0`
So pick cost of `4`.

I'm directly using the shuffling asm that llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D110969


  Commit: 0e71ae6da8f3142f453267d4f1668b0d6d77bec5
      https://github.com/llvm/llvm-project/commit/0e71ae6da8f3142f453267d4f1668b0d6d77bec5
  Author: Roman Lebedev <lebedev.ri at gmail.com>
  Date:   2021-10-02 (Sat, 02 Oct 2021)

  Changed paths:
    M llvm/lib/Target/X86/X86TargetTransformInfo.cpp
    M llvm/test/Analysis/CostModel/X86/interleaved-load-i8-stride-4.ll
    M llvm/test/Analysis/CostModel/X86/interleaved-store-i8-stride-4.ll

  Log Message:
  -----------
  [X86][Costmodel] Load/store i8 Stride=4 VF=16 interleaving costs

While we already model this tuple, the values are divergent from reality, so fix them.

The only sched models for CPUs that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/TrGW7cKsE - for intels `Block RThroughput: =24.0`; for ryzens, `Block RThroughput: <=12.0`
So pick cost of `24`.

For store we have:
https://godbolt.org/z/Mh7qaqEfe - for intels `Block RThroughput: =8.0`; for ryzens, `Block RThroughput: <=4.0`
So pick cost of `8`.

I'm directly using the shuffling asm that llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D110970


  Commit: acb459574afc344bcb676737496f3fa35b1f04c1
      https://github.com/llvm/llvm-project/commit/acb459574afc344bcb676737496f3fa35b1f04c1
  Author: Roman Lebedev <lebedev.ri at gmail.com>
  Date:   2021-10-02 (Sat, 02 Oct 2021)

  Changed paths:
    M llvm/lib/Target/X86/X86TargetTransformInfo.cpp
    M llvm/test/Analysis/CostModel/X86/interleaved-load-i8-stride-4.ll

  Log Message:
  -----------
  [X86][Costmodel] Load/store i8 Stride=4 VF=32 interleaving costs

While we already model this tuple, the load cost is divergent from reality, so fix it.

The only sched models for CPUs that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/zWMhhnPYa - for intels `Block RThroughput: =56.0`; for ryzens, `Block RThroughput: <=24.0`
So pick cost of `56`.

For store we have:
https://godbolt.org/z/vnqqjWx51 - for intels `Block RThroughput: =12.0`; for ryzens, `Block RThroughput: <=4.0`
So pick cost of `12`.

I'm directly using the shuffling asm that llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D110971
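
For reference, the costs picked across the nine log messages in this push can be collected into one lookup keyed by (stride, VF). This is only a transcription of the numbers stated in the messages; the dictionary layout and function name are hypothetical and do not mirror the actual tables in X86TargetTransformInfo.cpp.

```python
from typing import Optional, Tuple

# (stride, VF) -> (load cost, store cost) for i8 elements, as stated in the
# log messages of this push. Stride=3 VF=16 is absent because its message
# only says the existing costs were correct, without quoting them.
CHOSEN_COSTS = {
    (3, 2): (3, 4),
    (3, 4): (3, 4),
    (3, 8): (6, 6),
    (3, 32): (14, 13),
    (4, 2): (4, 4),
    (4, 4): (4, 4),
    (4, 8): (12, 4),
    (4, 16): (24, 8),
    (4, 32): (56, 12),
}


def chosen_costs(stride: int, vf: int) -> Optional[Tuple[int, int]]:
    """Return (load, store) cost for the tuple, or None if not listed."""
    return CHOSEN_COSTS.get((stride, vf))


if __name__ == "__main__":
    for (stride, vf), (load, store) in sorted(CHOSEN_COSTS.items()):
        print(f"stride={stride} VF={vf}: load={load} store={store}")
```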


Compare: https://github.com/llvm/llvm-project/compare/ac7031b2b2fa...acb459574afc

