[all-commits] [llvm/llvm-project] 3e93fc: [X86][Costmodel] Load/store i32/f32 Stride=3 VF=2 ...

Roman Lebedev via All-commits all-commits at lists.llvm.org
Mon Oct 4 04:41:16 PDT 2021


  Branch: refs/heads/main
  Home:   https://github.com/llvm/llvm-project
  Commit: 3e93fcdfc893b1cd365126876b81b32b54446d9f
      https://github.com/llvm/llvm-project/commit/3e93fcdfc893b1cd365126876b81b32b54446d9f
  Author: Roman Lebedev <lebedev.ri at gmail.com>
  Date:   2021-10-04 (Mon, 04 Oct 2021)

  Changed paths:
    M llvm/lib/Target/X86/X86TargetTransformInfo.cpp
    M llvm/test/Analysis/CostModel/X86/interleaved-load-f32-stride-3.ll
    M llvm/test/Analysis/CostModel/X86/interleaved-load-i32-stride-3.ll
    M llvm/test/Analysis/CostModel/X86/interleaved-store-f32-stride-3.ll
    M llvm/test/Analysis/CostModel/X86/interleaved-store-i32-stride-3.ll

  Log Message:
  -----------
  [X86][Costmodel] Load/store i32/f32 Stride=3 VF=2 interleaving costs

The only sched models for CPUs that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/z8qa14bs3 - for intels `Block RThroughput: =3.0`; for ryzens, `Block RThroughput: =1.5`
So pick cost of `3`.

For store we have:
https://godbolt.org/z/GYGajoc4K - for intels `Block RThroughput: <=4.0`; for ryzens, `Block RThroughput: <=2.0`
So pick cost of `4`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D111019
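
The selection rule in the message above (take the worst `Block RThroughput` reported across the sched models considered) can be sketched as follows. This is only an illustration: `ModelRThroughput` and `pickInterleaveCost` are hypothetical names that do not exist in LLVM, and rounding a fractional reading up to a whole cost unit is an assumption.

```cpp
#include <algorithm>
#include <cmath>
#include <string>
#include <vector>

// Hypothetical record: one llvm-mca "Block RThroughput" reading per sched model.
struct ModelRThroughput {
  std::string Model;
  double RThroughput;
};

// Pick the worst (largest) reading and round it up to a whole cost unit.
// Returns 0 when no readings are given.
unsigned pickInterleaveCost(const std::vector<ModelRThroughput> &Readings) {
  double Worst = 0.0;
  for (const ModelRThroughput &R : Readings)
    Worst = std::max(Worst, R.RThroughput);
  return static_cast<unsigned>(std::ceil(Worst));
}
```

With the load readings quoted above (3.0 for the Intel models, 1.5 for the Ryzen models) this yields 3, and with the store upper bounds (4.0 and 2.0) it yields 4, matching the costs picked in the message.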


  Commit: a93411c3afc76a4dd4436829d07eb65f61c2188e
      https://github.com/llvm/llvm-project/commit/a93411c3afc76a4dd4436829d07eb65f61c2188e
  Author: Roman Lebedev <lebedev.ri at gmail.com>
  Date:   2021-10-04 (Mon, 04 Oct 2021)

  Changed paths:
    M llvm/lib/Target/X86/X86TargetTransformInfo.cpp
    M llvm/test/Analysis/CostModel/X86/interleaved-load-f32-stride-3.ll
    M llvm/test/Analysis/CostModel/X86/interleaved-load-i32-stride-3.ll
    M llvm/test/Analysis/CostModel/X86/interleaved-store-f32-stride-3.ll
    M llvm/test/Analysis/CostModel/X86/interleaved-store-i32-stride-3.ll

  Log Message:
  -----------
  [X86][Costmodel] Load/store i32/f32 Stride=3 VF=4 interleaving costs

The only sched models for CPUs that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/d8PdhEszo - for intels `Block RThroughput: =3.0`; for ryzens, `Block RThroughput: <=3.0`
So pick cost of `3`.

For store we have:
https://godbolt.org/z/WojonfG5n - for intels `Block RThroughput: =5.0`; for ryzens, `Block RThroughput: <=3.0`
So pick cost of `5`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D111020
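
For readers unfamiliar with the access pattern being costed, here is a scalar reference for a Stride=3, VF=4 interleaved load and store of f32 elements. It is a minimal sketch of the memory layout only: the function names are hypothetical, and it says nothing about the shuffle sequence llc actually emits.

```cpp
#include <array>
#include <cstddef>

constexpr std::size_t Stride = 3; // three interleaved members: a, b, c
constexpr std::size_t VF = 4;     // four lanes per member

using Lanes = std::array<float, VF>;

// Interleaved load: memory holds a0 b0 c0 a1 b1 c1 ..., and each of the
// three members is gathered into its own group of VF lanes.
std::array<Lanes, Stride> deinterleave3(const std::array<float, Stride * VF> &Mem) {
  std::array<Lanes, Stride> Out{};
  for (std::size_t Lane = 0; Lane < VF; ++Lane)
    for (std::size_t Member = 0; Member < Stride; ++Member)
      Out[Member][Lane] = Mem[Lane * Stride + Member];
  return Out;
}

// Interleaved store: the inverse, writing the three groups back in a b c order.
std::array<float, Stride * VF> interleave3(const std::array<Lanes, Stride> &In) {
  std::array<float, Stride * VF> Mem{};
  for (std::size_t Lane = 0; Lane < VF; ++Lane)
    for (std::size_t Member = 0; Member < Stride; ++Member)
      Mem[Lane * Stride + Member] = In[Member][Lane];
  return Mem;
}
```

The costs in these commits model the shuffles needed to do this rearrangement on whole vector registers rather than element by element.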


  Commit: 198aa84973e6d5f9cdc7b241c4dc9880d63a5b5c
      https://github.com/llvm/llvm-project/commit/198aa84973e6d5f9cdc7b241c4dc9880d63a5b5c
  Author: Roman Lebedev <lebedev.ri at gmail.com>
  Date:   2021-10-04 (Mon, 04 Oct 2021)

  Changed paths:
    M llvm/lib/Target/X86/X86TargetTransformInfo.cpp
    M llvm/test/Analysis/CostModel/X86/interleaved-load-f32-stride-3.ll
    M llvm/test/Analysis/CostModel/X86/interleaved-load-float.ll
    M llvm/test/Analysis/CostModel/X86/interleaved-load-i32-stride-3.ll
    M llvm/test/Analysis/CostModel/X86/interleaved-store-f32-stride-3.ll
    M llvm/test/Analysis/CostModel/X86/interleaved-store-i32-stride-3.ll

  Log Message:
  -----------
  [X86][Costmodel] Load/store i32/f32 Stride=3 VF=8 interleaving costs

The only sched models for CPUs that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/zdz5Ga6fs - for intels `Block RThroughput: =7.0`; for ryzens, `Block RThroughput: <=6.0`
So pick cost of `7`.

For store we have:
https://godbolt.org/z/qn71513ac - for intels `Block RThroughput: =11.0`; for ryzens, `Block RThroughput: <=8.0`
So pick cost of `11`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D111021


  Commit: 4ca5bc07af0685fbbe04e8731b4ab37354368c84
      https://github.com/llvm/llvm-project/commit/4ca5bc07af0685fbbe04e8731b4ab37354368c84
  Author: Roman Lebedev <lebedev.ri at gmail.com>
  Date:   2021-10-04 (Mon, 04 Oct 2021)

  Changed paths:
    M llvm/lib/Target/X86/X86TargetTransformInfo.cpp
    M llvm/test/Analysis/CostModel/X86/interleaved-load-f32-stride-3.ll
    M llvm/test/Analysis/CostModel/X86/interleaved-load-i32-stride-3.ll
    M llvm/test/Analysis/CostModel/X86/interleaved-store-f32-stride-3.ll
    M llvm/test/Analysis/CostModel/X86/interleaved-store-i32-stride-3.ll

  Log Message:
  -----------
  [X86][Costmodel] Load/store i32/f32 Stride=3 VF=16 interleaving costs

The only sched models for CPUs that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/5fqrh4qqo - for intels `Block RThroughput: =14.0`; for ryzens, `Block RThroughput: <=12.0`
So pick cost of `14`.

For store we have:
https://godbolt.org/z/5fqrh4qqo - for intels `Block RThroughput: =22.0`; for ryzens, `Block RThroughput: <=16.0`
So pick cost of `22`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D111022


  Commit: d3bbe781ea8e6e968ad4be2eb3aa5eedb168a4a8
      https://github.com/llvm/llvm-project/commit/d3bbe781ea8e6e968ad4be2eb3aa5eedb168a4a8
  Author: Roman Lebedev <lebedev.ri at gmail.com>
  Date:   2021-10-04 (Mon, 04 Oct 2021)

  Changed paths:
    M llvm/lib/Target/X86/X86TargetTransformInfo.cpp
    M llvm/test/Analysis/CostModel/X86/interleaved-load-f64-stride-3.ll
    M llvm/test/Analysis/CostModel/X86/interleaved-load-i64-stride-3.ll
    M llvm/test/Analysis/CostModel/X86/interleaved-store-f64-stride-3.ll
    M llvm/test/Analysis/CostModel/X86/interleaved-store-i64-stride-3.ll

  Log Message:
  -----------
  [X86][Costmodel] Load/store i64/f64 Stride=3 VF=2 interleaving costs

The only sched models for CPUs that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/sz5qdKnr4 - for intels `Block RThroughput: =1.0`; for ryzens, `Block RThroughput: <=1.0`
So pick cost of `1`.

For store we have:
https://godbolt.org/z/Kzdjff63v - for intels `Block RThroughput: =4.0`; for ryzens, `Block RThroughput: <=3.0`
So pick cost of `4`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D111025


  Commit: eb9a694c1744f6a1608faf7daa79244bd1e45248
      https://github.com/llvm/llvm-project/commit/eb9a694c1744f6a1608faf7daa79244bd1e45248
  Author: Roman Lebedev <lebedev.ri at gmail.com>
  Date:   2021-10-04 (Mon, 04 Oct 2021)

  Changed paths:
    M llvm/lib/Target/X86/X86TargetTransformInfo.cpp
    M llvm/test/Analysis/CostModel/X86/interleaved-load-f64-stride-3.ll
    M llvm/test/Analysis/CostModel/X86/interleaved-load-i64-stride-3.ll
    M llvm/test/Analysis/CostModel/X86/interleaved-store-f64-stride-3.ll
    M llvm/test/Analysis/CostModel/X86/interleaved-store-i64-stride-3.ll

  Log Message:
  -----------
  [X86][Costmodel] Load/store i64/f64 Stride=3 VF=4 interleaving costs

This one required quite a bit of assembly surgery.

The only sched models for CPUs that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/Tce3osvcz - for intels `Block RThroughput: =5.0`; for ryzens, `Block RThroughput: <=4.0`
So pick cost of `5`.

For store we have:
https://godbolt.org/z/oc3arEcnE - for intels `Block RThroughput: =6.0`; for ryzens, `Block RThroughput: <=4.0`
So pick cost of `6`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D111026


  Commit: ede0611e792c90acbe528ca7895377195a1bbadf
      https://github.com/llvm/llvm-project/commit/ede0611e792c90acbe528ca7895377195a1bbadf
  Author: Roman Lebedev <lebedev.ri at gmail.com>
  Date:   2021-10-04 (Mon, 04 Oct 2021)

  Changed paths:
    M llvm/lib/Target/X86/X86TargetTransformInfo.cpp
    M llvm/test/Analysis/CostModel/X86/interleaved-load-f64-stride-3.ll
    M llvm/test/Analysis/CostModel/X86/interleaved-load-i64-stride-3.ll
    M llvm/test/Analysis/CostModel/X86/interleaved-store-f64-stride-3.ll
    M llvm/test/Analysis/CostModel/X86/interleaved-store-i64-stride-3.ll

  Log Message:
  -----------
  [X86][Costmodel] Load/store i64/f64 Stride=3 VF=8 interleaving costs

This one required quite a bit of assembly surgery.

The only sched models for CPUs that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/oYWv4cTnK - for intels `Block RThroughput: =10.0`; for ryzens, `Block RThroughput: <=8.0`
So pick cost of `10`.

For store we have:
https://godbolt.org/z/33GMhrsG9 - for intels `Block RThroughput: =12.0`; for ryzens, `Block RThroughput: <=8.0`
So pick cost of `12`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D111027


  Commit: cef0a693b6373764dc5483ef3b4523e68a812972
      https://github.com/llvm/llvm-project/commit/cef0a693b6373764dc5483ef3b4523e68a812972
  Author: Roman Lebedev <lebedev.ri at gmail.com>
  Date:   2021-10-04 (Mon, 04 Oct 2021)

  Changed paths:
    M llvm/lib/Target/X86/X86TargetTransformInfo.cpp
    M llvm/test/Analysis/CostModel/X86/interleaved-load-f64-stride-3.ll
    M llvm/test/Analysis/CostModel/X86/interleaved-load-i64-stride-3.ll
    M llvm/test/Analysis/CostModel/X86/interleaved-store-f64-stride-3.ll
    M llvm/test/Analysis/CostModel/X86/interleaved-store-i64-stride-3.ll

  Log Message:
  -----------
  [X86][Costmodel] Load/store i64/f64 Stride=3 VF=16 interleaving costs

This required a huge amount of assembly surgery, but I think this is about right.

The only sched models for CPUs that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/z11crMEcj - for intels `Block RThroughput: =20.0`; for ryzens, `Block RThroughput: <=18.0`
So we could pick a cost of `25`.

For store we have:
https://godbolt.org/z/eqT4ze3j4 - for intels `Block RThroughput: =24.0`; for ryzens, `Block RThroughput: <=16.0`
So we could pick a cost of `24`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D111031


Compare: https://github.com/llvm/llvm-project/compare/4fc2f4979cf5...cef0a693b637
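
The costs stated across the eight log messages above can be collected into one lookup table. The sketch below is a summary aid only: `Stride3Cost`, `Stride3Costs` and `lookupStride3Cost` are hypothetical names and do not mirror the table layout used in X86TargetTransformInfo.cpp. The values are copied from the messages as written, including the `25` for the i64/f64 VF=16 load, where the quoted measurement was 20.0.

```cpp
// Hypothetical summary of the Stride=3 interleaving costs quoted in the
// log messages; not the actual LLVM data structure.
struct Stride3Cost {
  unsigned ElemBits;  // 32 for i32/f32, 64 for i64/f64
  unsigned VF;        // vectorization factor
  unsigned LoadCost;  // cost picked for the interleaved load
  unsigned StoreCost; // cost picked for the interleaved store
};

constexpr Stride3Cost Stride3Costs[] = {
    {32, 2, 3, 4},
    {32, 4, 3, 5},
    {32, 8, 7, 11},
    {32, 16, 14, 22},
    {64, 2, 1, 4},
    {64, 4, 5, 6},
    {64, 8, 10, 12},
    {64, 16, 25, 24}, // load cost as written in the message (measured: 20.0)
};

// Returns the matching entry, or nullptr if the combination is not listed.
const Stride3Cost *lookupStride3Cost(unsigned ElemBits, unsigned VF) {
  for (const Stride3Cost &Entry : Stride3Costs)
    if (Entry.ElemBits == ElemBits && Entry.VF == VF)
      return &Entry;
  return nullptr;
}
```

For example, `lookupStride3Cost(32, 8)` returns the entry with load cost 7 and store cost 11, and an unlisted combination such as 16-bit elements returns `nullptr`.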

