[all-commits] [llvm/llvm-project] 396b95: [X86][Costmodel] Load/store i8 Stride=6 VF=2 inter...

Roman Lebedev via All-commits all-commits at lists.llvm.org
Sun Oct 3 13:43:01 PDT 2021


  Branch: refs/heads/main
  Home:   https://github.com/llvm/llvm-project
  Commit: 396b95e5c9ede161b3634f7c8046188b7da8f387
      https://github.com/llvm/llvm-project/commit/396b95e5c9ede161b3634f7c8046188b7da8f387
  Author: Roman Lebedev <lebedev.ri at gmail.com>
  Date:   2021-10-03 (Sun, 03 Oct 2021)

  Changed paths:
    M llvm/lib/Target/X86/X86TargetTransformInfo.cpp
    M llvm/test/Analysis/CostModel/X86/interleaved-load-i8-stride-6.ll
    M llvm/test/Analysis/CostModel/X86/interleaved-store-i8-stride-6.ll

  Log Message:
  -----------
  [X86][Costmodel] Load/store i8 Stride=6 VF=2 interleaving costs

The only sched models for CPUs that support AVX2
but not AVX512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/jvj6jzns5 - for intels `Block RThroughput: =6.0`; for ryzens, `Block RThroughput: <=3.0`
So pick cost of `6`.

For store we have:
https://godbolt.org/z/ros7eebMP - for intels `Block RThroughput: =7.0`; for ryzens, `Block RThroughput: <=3.0`
So pick cost of `7`.
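
The picks above appear consistent with taking the worst-case `Block RThroughput` bound across the CPU groups; the message does not state that rule outright. A minimal sketch under that assumption, using only the numbers quoted above (the group labels and `pick_cost` are illustrative names, not LLVM identifiers):

```python
import math

# Reciprocal-throughput bounds quoted in this commit message
# (i8, Stride=6, VF=2). "<=" bounds are treated as their upper limit.
RTHROUGHPUT = {
    "load":  {"intel": 6.0, "ryzen": 3.0},
    "store": {"intel": 7.0, "ryzen": 3.0},
}

def pick_cost(bounds):
    """Assumed rule: worst-case bound across CPU groups, rounded up to an integer."""
    return math.ceil(max(bounds.values()))

costs = {op: pick_cost(bounds) for op, bounds in RTHROUGHPUT.items()}
print(costs)  # {'load': 6, 'store': 7}
```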

I'm directly using the shuffling asm that llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D111008


  Commit: 6fe4cce55816863bbb2ca9628d103dfa2d431616
      https://github.com/llvm/llvm-project/commit/6fe4cce55816863bbb2ca9628d103dfa2d431616
  Author: Roman Lebedev <lebedev.ri at gmail.com>
  Date:   2021-10-03 (Sun, 03 Oct 2021)

  Changed paths:
    M llvm/lib/Target/X86/X86TargetTransformInfo.cpp
    M llvm/test/Analysis/CostModel/X86/interleaved-load-i8-stride-6.ll
    M llvm/test/Analysis/CostModel/X86/interleaved-store-i8-stride-6.ll

  Log Message:
  -----------
  [X86][Costmodel] Load/store i8 Stride=6 VF=4 interleaving costs

The only sched models for CPUs that support AVX2
but not AVX512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/4sWhs396o - for intels `Block RThroughput: =14.0`; for ryzens, `Block RThroughput: <=7.0`
So pick cost of `14`.

For store we have:
https://godbolt.org/z/4sWhs396o - for intels `Block RThroughput: =9.0`; for ryzens, `Block RThroughput: <=3.0`
So pick cost of `9`.

I'm directly using the shuffling asm that llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D111010


  Commit: 0b27f9c0886fcd052b4b0194c6d41376787213d4
      https://github.com/llvm/llvm-project/commit/0b27f9c0886fcd052b4b0194c6d41376787213d4
  Author: Roman Lebedev <lebedev.ri at gmail.com>
  Date:   2021-10-03 (Sun, 03 Oct 2021)

  Changed paths:
    M llvm/lib/Target/X86/X86TargetTransformInfo.cpp
    M llvm/test/Analysis/CostModel/X86/interleaved-load-i8-stride-6.ll
    M llvm/test/Analysis/CostModel/X86/interleaved-store-i8-stride-6.ll

  Log Message:
  -----------
  [X86][Costmodel] Load/store i8 Stride=6 VF=8 interleaving costs

The only sched models for CPUs that support AVX2
but not AVX512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/v98qPTTf6 - for intels `Block RThroughput: =18.0`; for ryzens, `Block RThroughput: =6.0`
So pick cost of `18`.

For store we have:
https://godbolt.org/z/rn5T9E8q6 - for intels `Block RThroughput: <=16.0`; for ryzens, `Block RThroughput: <=4.5`
So pick cost of `16`.

I'm directly using the shuffling asm that llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D111011


  Commit: bd5ba437fd8fb42d068876c5d070c7a72ca17643
      https://github.com/llvm/llvm-project/commit/bd5ba437fd8fb42d068876c5d070c7a72ca17643
  Author: Roman Lebedev <lebedev.ri at gmail.com>
  Date:   2021-10-03 (Sun, 03 Oct 2021)

  Changed paths:
    M llvm/lib/Target/X86/X86TargetTransformInfo.cpp
    M llvm/test/Analysis/CostModel/X86/interleaved-load-i8-stride-6.ll
    M llvm/test/Analysis/CostModel/X86/interleaved-store-i8-stride-6.ll

  Log Message:
  -----------
  [X86][Costmodel] Load/store i8 Stride=6 VF=16 interleaving costs

The only sched models for CPUs that support AVX2
but not AVX512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/Gz8hhqfTM - for intels `Block RThroughput: <=43.0`; for ryzens, `Block RThroughput: <=14.0`
So pick cost of `43`.

For store we have:
https://godbolt.org/z/9vrdssYa8 - for intels `Block RThroughput: <=27.0`; for ryzens, `Block RThroughput: <=12.0`
So pick cost of `27`.

I'm directly using the shuffling asm that llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D111012


  Commit: a5e5883ef515abe6fc5e8565f11b1c49bb33c2e3
      https://github.com/llvm/llvm-project/commit/a5e5883ef515abe6fc5e8565f11b1c49bb33c2e3
  Author: Roman Lebedev <lebedev.ri at gmail.com>
  Date:   2021-10-03 (Sun, 03 Oct 2021)

  Changed paths:
    M llvm/lib/Target/X86/X86TargetTransformInfo.cpp
    M llvm/test/Analysis/CostModel/X86/interleaved-load-i8-stride-6.ll
    M llvm/test/Analysis/CostModel/X86/interleaved-store-i8-stride-6.ll

  Log Message:
  -----------
  [X86][Costmodel] Load/store i8 Stride=6 VF=32 interleaving costs

The only sched models for CPUs that support AVX2
but not AVX512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/c1jjKqP7b - for intels `Block RThroughput: <=82.0`; for ryzens, `Block RThroughput: <=26.0`
So pick cost of `82`.

For store we have:
https://godbolt.org/z/YM4ErY8x7 - for intels `Block RThroughput: <=90.0`; for ryzens, `Block RThroughput: <=25.5`
So pick cost of `90`.

I'm directly using the shuffling asm that llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D111013


  Commit: 8e8fb77aa40c287067306df7ff2416122b31e33b
      https://github.com/llvm/llvm-project/commit/8e8fb77aa40c287067306df7ff2416122b31e33b
  Author: Roman Lebedev <lebedev.ri at gmail.com>
  Date:   2021-10-03 (Sun, 03 Oct 2021)

  Changed paths:
    M llvm/lib/Target/X86/X86TargetTransformInfo.cpp
    M llvm/test/Analysis/CostModel/X86/interleaved-load-i16-stride-3.ll
    M llvm/test/Analysis/CostModel/X86/interleaved-store-i16-stride-3.ll

  Log Message:
  -----------
  [X86][Costmodel] Load/store i16 Stride=3 VF=2 interleaving costs

The only sched models for CPUs that support AVX2
but not AVX512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/xnE988aej - for intels `Block RThroughput: =5.0`; for ryzens, `Block RThroughput: <=2.5`
So pick cost of `5`.

For store we have:
https://godbolt.org/z/rMGT31Tnh - for intels `Block RThroughput: =4.0`; for ryzens, `Block RThroughput: <=2.0`
So pick cost of `4`.

I'm directly using the shuffling asm that llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D111014


  Commit: 04f1469cb4caeedaabc3ab0f9ae00a8576f774eb
      https://github.com/llvm/llvm-project/commit/04f1469cb4caeedaabc3ab0f9ae00a8576f774eb
  Author: Roman Lebedev <lebedev.ri at gmail.com>
  Date:   2021-10-03 (Sun, 03 Oct 2021)

  Changed paths:
    M llvm/lib/Target/X86/X86TargetTransformInfo.cpp
    M llvm/test/Analysis/CostModel/X86/interleaved-load-i16-stride-3.ll
    M llvm/test/Analysis/CostModel/X86/interleaved-store-i16-stride-3.ll

  Log Message:
  -----------
  [X86][Costmodel] Load/store i16 Stride=3 VF=4 interleaving costs

The only sched models for CPUs that support AVX2
but not AVX512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/sP4j1173f - for intels `Block RThroughput: =7.0`; for ryzens, `Block RThroughput: <=3.0`
So pick cost of `7`.

For store we have:
https://godbolt.org/z/sP4j1173f - for intels `Block RThroughput: =6.0`; for ryzens, `Block RThroughput: <=2.0`
So pick cost of `6`.

I'm directly using the shuffling asm that llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D111015


  Commit: 72f8a9244a64387d83a313607f94509cd2fd5fd2
      https://github.com/llvm/llvm-project/commit/72f8a9244a64387d83a313607f94509cd2fd5fd2
  Author: Roman Lebedev <lebedev.ri at gmail.com>
  Date:   2021-10-03 (Sun, 03 Oct 2021)

  Changed paths:
    M llvm/lib/Target/X86/X86TargetTransformInfo.cpp
    M llvm/test/Analysis/CostModel/X86/interleaved-load-i16-stride-3.ll
    M llvm/test/Analysis/CostModel/X86/interleaved-store-i16-stride-3.ll

  Log Message:
  -----------
  [X86][Costmodel] Load/store i16 Stride=3 VF=8 interleaving costs

The only sched models for CPUs that support AVX2
but not AVX512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/Mh9MnnT8W - for intels `Block RThroughput: =9.0`; for ryzens, `Block RThroughput: <=2.3`
So pick cost of `9`.

For store we have:
https://godbolt.org/z/Mh9MnnT8W - for intels `Block RThroughput: <=12.0`; for ryzens, `Block RThroughput: <=3.3`
So pick cost of `12`.

I'm directly using the shuffling asm that llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D111016


  Commit: 3cbc0a07f92b4a630a1c03a6587d52f206ec8248
      https://github.com/llvm/llvm-project/commit/3cbc0a07f92b4a630a1c03a6587d52f206ec8248
  Author: Roman Lebedev <lebedev.ri at gmail.com>
  Date:   2021-10-03 (Sun, 03 Oct 2021)

  Changed paths:
    M llvm/lib/Target/X86/X86TargetTransformInfo.cpp
    M llvm/test/Analysis/CostModel/X86/interleaved-load-i16-stride-3.ll
    M llvm/test/Analysis/CostModel/X86/interleaved-store-i16-stride-3.ll

  Log Message:
  -----------
  [X86][Costmodel] Load/store i16 Stride=3 VF=16 interleaving costs

The only sched models for CPUs that support AVX2
but not AVX512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/1T6MMzeh3 - for intels `Block RThroughput: =28.0`; for ryzens, `Block RThroughput: <=8.5`
So pick cost of `28`.

For store we have:
https://godbolt.org/z/1T6MMzeh3 - for intels `Block RThroughput: <=27.0`; for ryzens, `Block RThroughput: <=7.0`
So pick cost of `27`.

I'm directly using the shuffling asm that llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D111017


  Commit: 67f1ee2e38e83af34b58e3873bd4ba6dec7f5c50
      https://github.com/llvm/llvm-project/commit/67f1ee2e38e83af34b58e3873bd4ba6dec7f5c50
  Author: Roman Lebedev <lebedev.ri at gmail.com>
  Date:   2021-10-03 (Sun, 03 Oct 2021)

  Changed paths:
    M llvm/lib/Target/X86/X86TargetTransformInfo.cpp
    M llvm/test/Analysis/CostModel/X86/interleaved-load-i16-stride-3.ll
    M llvm/test/Analysis/CostModel/X86/interleaved-store-i16-stride-3.ll

  Log Message:
  -----------
  [X86][Costmodel] Load/store i16 Stride=3 VF=32 interleaving costs

The only sched models for CPUs that support AVX2
but not AVX512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/rMaYr67hz - for intels `Block RThroughput: =56.0`; for ryzens, `Block RThroughput: <=17.8`
So pick cost of `56`.

For store we have:
https://godbolt.org/z/eMsbKqnvv - for intels `Block RThroughput: <=54.0`; for ryzens, `Block RThroughput: <=15.0`
So pick cost of `54`.

I'm directly using the shuffling asm that llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D111018
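
For reference, the costs picked across the ten commits in this push can be collected into one lookup keyed by element type, stride, and VF. This is only a plain-Python sketch of the numbers quoted in the log messages; the actual entries live in X86TargetTransformInfo.cpp, whose table layout is not shown here, and `COSTS` and `interleaving_cost` are hypothetical names.

```python
# (element type, stride) -> {VF: (load cost, store cost)},
# copied from the "So pick cost of ..." lines in the log messages.
COSTS = {
    ("i8", 6):  {2: (6, 7), 4: (14, 9), 8: (18, 16), 16: (43, 27), 32: (82, 90)},
    ("i16", 3): {2: (5, 4), 4: (7, 6), 8: (9, 12), 16: (28, 27), 32: (56, 54)},
}

def interleaving_cost(elt, stride, vf, op):
    """Look up the picked cost for an interleaved 'load' or 'store'."""
    if op not in ("load", "store"):
        raise ValueError(f"unknown op: {op}")
    load, store = COSTS[(elt, stride)][vf]
    return load if op == "load" else store

print(interleaving_cost("i8", 6, 32, "store"))  # 90
print(interleaving_cost("i16", 3, 8, "load"))   # 9
```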


Compare: https://github.com/llvm/llvm-project/compare/a944f801cacd...67f1ee2e38e8

