[all-commits] [llvm/llvm-project] 45caac: [X86][Costmodel] Load/store i16 Stride=4 VF=2 inte...

Roman Lebedev via All-commits all-commits at lists.llvm.org
Mon Sep 27 12:20:30 PDT 2021


  Branch: refs/heads/main
  Home:   https://github.com/llvm/llvm-project
  Commit: 45caac91c4e0caf64ec933f35c4a2d86a3fa31e3
      https://github.com/llvm/llvm-project/commit/45caac91c4e0caf64ec933f35c4a2d86a3fa31e3
  Author: Roman Lebedev <lebedev.ri at gmail.com>
  Date:   2021-09-27 (Mon, 27 Sep 2021)

  Changed paths:
    M llvm/lib/Target/X86/X86TargetTransformInfo.cpp
    M llvm/test/Analysis/CostModel/X86/interleaved-load-i16-stride-4.ll
    M llvm/test/Analysis/CostModel/X86/interleaved-store-i16-stride-4.ll

  Log Message:
  -----------
  [X86][Costmodel] Load/store i16 Stride=4 VF=2 interleaving costs

The only sched models that for cpu's that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/5EYc6r9nh - for intels `Block RThroughput: =6.0`; for ryzens, `Block RThroughput: <=3.0`
So pick cost of `6`.

For store we have:
https://godbolt.org/z/z61e5d6GE - for intels `Block RThroughput: =2.0`; for ryzens, `Block RThroughput: <=1.0`
So pick cost of `2`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D110536


  Commit: df2b42d12e4b4ff18bec8460c3d6ede6b411c048
      https://github.com/llvm/llvm-project/commit/df2b42d12e4b4ff18bec8460c3d6ede6b411c048
  Author: Roman Lebedev <lebedev.ri at gmail.com>
  Date:   2021-09-27 (Mon, 27 Sep 2021)

  Changed paths:
    M llvm/lib/Target/X86/X86TargetTransformInfo.cpp
    M llvm/test/Analysis/CostModel/X86/interleaved-load-i16-stride-4.ll
    M llvm/test/Analysis/CostModel/X86/interleaved-store-i16-stride-4.ll

  Log Message:
  -----------
  [X86][Costmodel] Load/store i16 Stride=4 VF=4 interleaving costs

The only sched models that for cpu's that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/rnsf639Wh - for intels `Block RThroughput: =17.0`; for ryzens, `Block RThroughput: <=7.5`
So pick cost of `17`.

For store we have:
https://godbolt.org/z/565KKrcY6 - for intels `Block RThroughput: =6.0`; for ryzens, `Block RThroughput: =2.0`
So pick cost of `6`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D110537


  Commit: 5615d6a6dd3f904cc9e1a219bfaf7df8183ee765
      https://github.com/llvm/llvm-project/commit/5615d6a6dd3f904cc9e1a219bfaf7df8183ee765
  Author: Roman Lebedev <lebedev.ri at gmail.com>
  Date:   2021-09-27 (Mon, 27 Sep 2021)

  Changed paths:
    M llvm/lib/Target/X86/X86TargetTransformInfo.cpp
    M llvm/test/Analysis/CostModel/X86/interleaved-load-i16-stride-4.ll
    M llvm/test/Analysis/CostModel/X86/interleaved-store-i16-stride-4.ll

  Log Message:
  -----------
  [X86][Costmodel] Load/store i16 Stride=4 VF=8 interleaving costs

The only sched models that for cpu's that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/dd8T5P471 - for intels `Block RThroughput: =33.0`; for ryzens, `Block RThroughput: <=14.5`
So pick cost of `33`.

For store we have:
https://godbolt.org/z/zPxcKWhn4 - for intels `Block RThroughput: =10.0`; for ryzens, `Block RThroughput: <=6.0`
So pick cost of `10`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D110541


  Commit: ee5a050e2e548991f0369fa7ee29fb3e7aade071
      https://github.com/llvm/llvm-project/commit/ee5a050e2e548991f0369fa7ee29fb3e7aade071
  Author: Roman Lebedev <lebedev.ri at gmail.com>
  Date:   2021-09-27 (Mon, 27 Sep 2021)

  Changed paths:
    M llvm/lib/Target/X86/X86TargetTransformInfo.cpp
    M llvm/test/Analysis/CostModel/X86/interleaved-load-i16-stride-4.ll
    M llvm/test/Analysis/CostModel/X86/interleaved-store-i16-stride-4.ll

  Log Message:
  -----------
  [X86][Costmodel] Load/store i16 Stride=4 VF=16 interleaving costs

The only sched models that for cpu's that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/Wd9cKab83 - for intels `Block RThroughput: =75.0`; for ryzens, `Block RThroughput: <=29.5`
So pick cost of `75`. (note that `# 32-byte Reload` does not affect throughput there.)

For store we have:
https://godbolt.org/z/Wd9cKab83 - for intels `Block RThroughput: =32.0`; for ryzens, `Block RThroughput: <=12.0`
So pick cost of `32`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D110543


  Commit: 2a7a768dad3a77571fae8506d84078fe4ce3d105
      https://github.com/llvm/llvm-project/commit/2a7a768dad3a77571fae8506d84078fe4ce3d105
  Author: Roman Lebedev <lebedev.ri at gmail.com>
  Date:   2021-09-27 (Mon, 27 Sep 2021)

  Changed paths:
    M llvm/lib/Target/X86/X86TargetTransformInfo.cpp
    M llvm/test/Analysis/CostModel/X86/interleaved-load-i16-stride-4.ll
    M llvm/test/Analysis/CostModel/X86/interleaved-store-i16-stride-4.ll

  Log Message:
  -----------
  [X86][Costmodel] Load/store i16 Stride=4 VF=32 interleaving costs

The only sched models that for cpu's that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For this tuple, measuring becomes problematic since there's a lot of spilling going on,
but apparently all these memory ops do not affect worst-case estimate at all here.

For load we have:
https://godbolt.org/z/zP4hd8MT6 - for intels `Block RThroughput: =150.0`; for ryzens, `Block RThroughput: <=59`
So pick cost of `150`.

For store we have:
https://godbolt.org/z/vKb8zTK8E - for intels `Block RThroughput: =32.0`; for ryzens, `Block RThroughput: <=24.0`
So pick cost of `64`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D110548


Compare: https://github.com/llvm/llvm-project/compare/18cf5b220d3f...2a7a768dad3a


More information about the All-commits mailing list