[all-commits] [llvm/llvm-project] b6234c: [X86][Costmodel] Load/store i32/f32 Stride=4 VF=2 ...

Roman Lebedev via All-commits all-commits at lists.llvm.org
Tue Oct 5 06:59:57 PDT 2021


  Branch: refs/heads/main
  Home:   https://github.com/llvm/llvm-project
  Commit: b6234c1edffc8286815c61887eb02fd6ddab0090
      https://github.com/llvm/llvm-project/commit/b6234c1edffc8286815c61887eb02fd6ddab0090
  Author: Roman Lebedev <lebedev.ri at gmail.com>
  Date:   2021-10-05 (Tue, 05 Oct 2021)

  Changed paths:
    M llvm/lib/Target/X86/X86TargetTransformInfo.cpp
    M llvm/test/Analysis/CostModel/X86/interleaved-load-f32-stride-4.ll
    M llvm/test/Analysis/CostModel/X86/interleaved-load-i32-stride-4.ll
    M llvm/test/Analysis/CostModel/X86/interleaved-store-f32-stride-4.ll
    M llvm/test/Analysis/CostModel/X86/interleaved-store-i32-stride-4.ll

  Log Message:
  -----------
  [X86][Costmodel] Load/store i32/f32 Stride=4 VF=2 interleaving costs

Finally, we are getting to the heavy-hitter stuff!

The only sched models for CPUs that support AVX2
but not AVX512 are: haswell, broadwell, skylake, zen1-3.

For load we have:
https://godbolt.org/z/7crGWoar6 - for intels `Block RThroughput: =4.0`; for ryzens, `Block RThroughput: <=2.0`
So we could pick cost of `4`.

For store we have:
https://godbolt.org/z/T8aq3MszM - for intels `Block RThroughput: =5.0`; for ryzens, `Block RThroughput: <=2.0`
So we could pick cost of `5`.
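
The reasoning above can be sketched in a few lines of C++. This is only an illustration, under the assumption that the chosen cost is the worst-case `Block RThroughput` across the scheduling models considered, rounded up; the log messages do not state the rule, but in every case in this series the proposed cost equals the Intel figure, which is also the largest. `pickInterleaveCost` is a hypothetical helper and not part of LLVM. Figures reported as `<=` are treated as upper bounds.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <initializer_list>

// Hypothetical helper (not an LLVM API): given the reciprocal-throughput
// figures reported for each scheduling model, return the worst case,
// rounded up to a whole number.
inline unsigned pickInterleaveCost(std::initializer_list<double> RThroughputs) {
  assert(RThroughputs.size() > 0 && "need at least one model");
  double Worst = 0.0;
  for (double R : RThroughputs)
    Worst = std::max(Worst, R);
  return static_cast<unsigned>(std::ceil(Worst));
}
```

With the numbers quoted in this message, `pickInterleaveCost({4.0, 2.0})` gives the load cost and `pickInterleaveCost({5.0, 2.0})` gives the store cost.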

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D111060


  Commit: 3c2e22b795485df28ca898bd3a58b6478c1e903d
      https://github.com/llvm/llvm-project/commit/3c2e22b795485df28ca898bd3a58b6478c1e903d
  Author: Roman Lebedev <lebedev.ri at gmail.com>
  Date:   2021-10-05 (Tue, 05 Oct 2021)

  Changed paths:
    M llvm/lib/Target/X86/X86TargetTransformInfo.cpp
    M llvm/test/Analysis/CostModel/X86/interleaved-load-f32-stride-4.ll
    M llvm/test/Analysis/CostModel/X86/interleaved-load-i32-stride-4.ll
    M llvm/test/Analysis/CostModel/X86/interleaved-store-f32-stride-4.ll
    M llvm/test/Analysis/CostModel/X86/interleaved-store-i32-stride-4.ll

  Log Message:
  -----------
  [X86][Costmodel] Load/store i32/f32 Stride=4 VF=4 interleaving costs

The only sched models for CPUs that support AVX2
but not AVX512 are: haswell, broadwell, skylake, zen1-3.

For load we have:
https://godbolt.org/z/avq1oz98W - for intels `Block RThroughput: =8.0`; for ryzens, `Block RThroughput: =4.0`
So we could pick cost of `8`.

For store we have:
https://godbolt.org/z/89PGMc1qs - for intels `Block RThroughput: =6.0`; for ryzens, `Block RThroughput: <=6.0`
So we could pick cost of `6`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D111061


  Commit: 4aee1e5b93e79ffd350485b866d4c6c982aab15f
      https://github.com/llvm/llvm-project/commit/4aee1e5b93e79ffd350485b866d4c6c982aab15f
  Author: Roman Lebedev <lebedev.ri at gmail.com>
  Date:   2021-10-05 (Tue, 05 Oct 2021)

  Changed paths:
    M llvm/lib/Target/X86/X86TargetTransformInfo.cpp
    M llvm/test/Analysis/CostModel/X86/interleaved-load-f32-stride-4.ll
    M llvm/test/Analysis/CostModel/X86/interleaved-load-i32-stride-4.ll
    M llvm/test/Analysis/CostModel/X86/interleaved-store-f32-stride-4.ll
    M llvm/test/Analysis/CostModel/X86/interleaved-store-i32-stride-4.ll

  Log Message:
  -----------
  [X86][Costmodel] Load/store i32/f32 Stride=4 VF=8 interleaving costs

The only sched models for CPUs that support AVX2
but not AVX512 are: haswell, broadwell, skylake, zen1-3.

For load we have:
https://godbolt.org/z/a6rxMG6ec - for intels `Block RThroughput: =16.0`; for ryzens, `Block RThroughput: <=12.0`
So we could pick cost of `16`.

For store we have:
https://godbolt.org/z/ced1bdqc9 - for intels `Block RThroughput: =16.0`; for ryzens, `Block RThroughput: <=8.0`
So we could pick cost of `16`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D111063


  Commit: 7d91037fd2f71f1253bd8751a887eb4b6ed7d2ec
      https://github.com/llvm/llvm-project/commit/7d91037fd2f71f1253bd8751a887eb4b6ed7d2ec
  Author: Roman Lebedev <lebedev.ri at gmail.com>
  Date:   2021-10-05 (Tue, 05 Oct 2021)

  Changed paths:
    M llvm/lib/Target/X86/X86TargetTransformInfo.cpp
    M llvm/test/Analysis/CostModel/X86/interleaved-load-f32-stride-4.ll
    M llvm/test/Analysis/CostModel/X86/interleaved-load-i32-stride-4.ll
    M llvm/test/Analysis/CostModel/X86/interleaved-store-f32-stride-4.ll
    M llvm/test/Analysis/CostModel/X86/interleaved-store-i32-stride-4.ll

  Log Message:
  -----------
  [X86][Costmodel] Load/store i32/f32 Stride=4 VF=16 interleaving costs

This one required quite a bit of assembly surgery, but the trend continues, so I think this is right.

The only sched models for CPUs that support AVX2
but not AVX512 are: haswell, broadwell, skylake, zen1-3.

For load we have:
https://godbolt.org/z/EKWdj8cKT - for intels `Block RThroughput: <=32.0`; for ryzens, `Block RThroughput: <=24.0`
So we could pick cost of `32`.

For store we have:
https://godbolt.org/z/zj4bb9P75 - for intels `Block RThroughput: =32.0`; for ryzens, `Block RThroughput: <=16.0`
So we could pick cost of `32`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D111064


  Commit: dcc2b0d9336c6d377cab4e2bcc7278a44123263d
      https://github.com/llvm/llvm-project/commit/dcc2b0d9336c6d377cab4e2bcc7278a44123263d
  Author: Roman Lebedev <lebedev.ri at gmail.com>
  Date:   2021-10-05 (Tue, 05 Oct 2021)

  Changed paths:
    M llvm/lib/Target/X86/X86TargetTransformInfo.cpp
    M llvm/test/Analysis/CostModel/X86/interleaved-load-f64-stride-4.ll
    M llvm/test/Analysis/CostModel/X86/interleaved-load-i64-stride-4.ll
    M llvm/test/Analysis/CostModel/X86/interleaved-store-f64-stride-4.ll
    M llvm/test/Analysis/CostModel/X86/interleaved-store-i64-stride-4.ll

  Log Message:
  -----------
  [X86][Costmodel] Load/store i64/f64 Stride=4 VF=2 interleaving costs

The only sched models for CPUs that support AVX2
but not AVX512 are: haswell, broadwell, skylake, zen1-3.

For load we have:
https://godbolt.org/z/z197317d1 - for intels `Block RThroughput: =6.0`; for ryzens, `Block RThroughput: =2.0`
So we could pick cost of `6`.

For store we have:
https://godbolt.org/z/8dzszjf9q - for intels `Block RThroughput: =6.0`; for ryzens, `Block RThroughput: <=4.0`
So we could pick cost of `6`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D111073


  Commit: 000ce0bfd52bbfe48732f378f5a67f307424552b
      https://github.com/llvm/llvm-project/commit/000ce0bfd52bbfe48732f378f5a67f307424552b
  Author: Roman Lebedev <lebedev.ri at gmail.com>
  Date:   2021-10-05 (Tue, 05 Oct 2021)

  Changed paths:
    M llvm/lib/Target/X86/X86TargetTransformInfo.cpp
    M llvm/test/Analysis/CostModel/X86/interleaved-load-f64-stride-4.ll
    M llvm/test/Analysis/CostModel/X86/interleaved-load-i64-stride-4.ll
    M llvm/test/Analysis/CostModel/X86/interleaved-store-f64-stride-4.ll
    M llvm/test/Analysis/CostModel/X86/interleaved-store-i64-stride-4.ll

  Log Message:
  -----------
  [X86][Costmodel] Load/store i64/f64 Stride=4 VF=4 interleaving costs

The only sched models for CPUs that support AVX2
but not AVX512 are: haswell, broadwell, skylake, zen1-3.

For load we have:
https://godbolt.org/z/MTKdzjvnr - for intels `Block RThroughput: =8.0`; for ryzens, `Block RThroughput: <=4.0`
So we could pick cost of `8`.

For store we have:
https://godbolt.org/z/cMYEvqoah - for intels `Block RThroughput: =8.0`; for ryzens, `Block RThroughput: <=4.0`
So we could pick cost of `8`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D111075


  Commit: c800119c46fb266b7fc75409fd9cbbb1a6d8f72a
      https://github.com/llvm/llvm-project/commit/c800119c46fb266b7fc75409fd9cbbb1a6d8f72a
  Author: Roman Lebedev <lebedev.ri at gmail.com>
  Date:   2021-10-05 (Tue, 05 Oct 2021)

  Changed paths:
    M llvm/lib/Target/X86/X86TargetTransformInfo.cpp
    M llvm/test/Analysis/CostModel/X86/interleaved-load-f64-stride-4.ll
    M llvm/test/Analysis/CostModel/X86/interleaved-load-i64-stride-4.ll
    M llvm/test/Analysis/CostModel/X86/interleaved-store-f64-stride-4.ll
    M llvm/test/Analysis/CostModel/X86/interleaved-store-i64-stride-4.ll

  Log Message:
  -----------
  [X86][Costmodel] Load/store i64/f64 Stride=4 VF=8 interleaving costs

The only sched models for CPUs that support AVX2
but not AVX512 are: haswell, broadwell, skylake, zen1-3.

For load we have:
https://godbolt.org/z/3M3hbq7n8 - for intels `Block RThroughput: =20.0`; for ryzens, `Block RThroughput: =8.0`
So we could pick cost of `20`.

For store we have:
https://godbolt.org/z/zvnPYWTx7 - for intels `Block RThroughput: =20.0`; for ryzens, `Block RThroughput: =8.0`
So we could pick cost of `20`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D111076


  Commit: 764fd5f463e4a2d13e77751e0da1c623d2781d4b
      https://github.com/llvm/llvm-project/commit/764fd5f463e4a2d13e77751e0da1c623d2781d4b
  Author: Roman Lebedev <lebedev.ri at gmail.com>
  Date:   2021-10-05 (Tue, 05 Oct 2021)

  Changed paths:
    M llvm/lib/Target/X86/X86TargetTransformInfo.cpp
    M llvm/test/Analysis/CostModel/X86/interleaved-load-f32-stride-6.ll
    M llvm/test/Analysis/CostModel/X86/interleaved-load-i32-stride-6.ll
    M llvm/test/Analysis/CostModel/X86/interleaved-store-f32-stride-6.ll
    M llvm/test/Analysis/CostModel/X86/interleaved-store-i32-stride-6.ll

  Log Message:
  -----------
  [X86][Costmodel] Load/store i32/f32 Stride=6 VF=2 interleaving costs

The only sched models for CPUs that support AVX2
but not AVX512 are: haswell, broadwell, skylake, zen1-3.

For load we have:
https://godbolt.org/z/aec96Thee - for intels `Block RThroughput: =6.0`; for ryzens, `Block RThroughput: <=3.3`
So we could pick cost of `6`.

For store we have:
https://godbolt.org/z/aec96Thee - for intels `Block RThroughput: =9.0`; for ryzens, `Block RThroughput: <=3.0`
So we could pick cost of `9`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D111083


  Commit: d51532d8aad529fcefeedd686f0f1d2d967661f5
      https://github.com/llvm/llvm-project/commit/d51532d8aad529fcefeedd686f0f1d2d967661f5
  Author: Roman Lebedev <lebedev.ri at gmail.com>
  Date:   2021-10-05 (Tue, 05 Oct 2021)

  Changed paths:
    M llvm/lib/Target/X86/X86TargetTransformInfo.cpp
    M llvm/test/Analysis/CostModel/X86/interleaved-load-f32-stride-6.ll
    M llvm/test/Analysis/CostModel/X86/interleaved-load-i32-stride-6.ll
    M llvm/test/Analysis/CostModel/X86/interleaved-store-f32-stride-6.ll
    M llvm/test/Analysis/CostModel/X86/interleaved-store-i32-stride-6.ll

  Log Message:
  -----------
  [X86][Costmodel] Load/store i32/f32 Stride=6 VF=4 interleaving costs

The only sched models for CPUs that support AVX2
but not AVX512 are: haswell, broadwell, skylake, zen1-3.

For load we have:
https://godbolt.org/z/szEj1ceee - for intels `Block RThroughput: =15.0`; for ryzens, `Block RThroughput: <=8.8`
So we could pick cost of `15`.

For store we have:
https://godbolt.org/z/81bq4fTo1 - for intels `Block RThroughput: =12.0`; for ryzens, `Block RThroughput: <=10.0`
So we could pick cost of `12`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D111087


  Commit: 2996a2b50fe39784b4c98748ba2a5b9595dc40f4
      https://github.com/llvm/llvm-project/commit/2996a2b50fe39784b4c98748ba2a5b9595dc40f4
  Author: Roman Lebedev <lebedev.ri at gmail.com>
  Date:   2021-10-05 (Tue, 05 Oct 2021)

  Changed paths:
    M llvm/lib/Target/X86/X86TargetTransformInfo.cpp
    M llvm/test/Analysis/CostModel/X86/interleaved-load-f32-stride-6.ll
    M llvm/test/Analysis/CostModel/X86/interleaved-load-i32-stride-6.ll
    M llvm/test/Analysis/CostModel/X86/interleaved-store-f32-stride-6.ll
    M llvm/test/Analysis/CostModel/X86/interleaved-store-i32-stride-6.ll

  Log Message:
  -----------
  [X86][Costmodel] Load/store i32/f32 Stride=6 VF=8 interleaving costs

The only sched models for CPUs that support AVX2
but not AVX512 are: haswell, broadwell, skylake, zen1-3.

For load we have:
https://godbolt.org/z/jK85GWKaK - for intels `Block RThroughput: =31.0`; for ryzens, `Block RThroughput: <=17.0`
So we could pick cost of `31`.

For store we have:
https://godbolt.org/z/hPWWhEEf9 - for intels `Block RThroughput: =33.0`; for ryzens, `Block RThroughput: <=13.8`
So we could pick cost of `33`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D111089


  Commit: 79d6d12d9585dd584f259fa7395ad9465bef9aeb
      https://github.com/llvm/llvm-project/commit/79d6d12d9585dd584f259fa7395ad9465bef9aeb
  Author: Roman Lebedev <lebedev.ri at gmail.com>
  Date:   2021-10-05 (Tue, 05 Oct 2021)

  Changed paths:
    M llvm/lib/Target/X86/X86TargetTransformInfo.cpp
    M llvm/test/Analysis/CostModel/X86/interleaved-load-f32-stride-6.ll
    M llvm/test/Analysis/CostModel/X86/interleaved-load-i32-stride-6.ll
    M llvm/test/Analysis/CostModel/X86/interleaved-store-f32-stride-6.ll
    M llvm/test/Analysis/CostModel/X86/interleaved-store-i32-stride-6.ll

  Log Message:
  -----------
  [X86][Costmodel] Load/store i32/f32 Stride=6 VF=16 interleaving costs

This one required quite a bit of assembly surgery, but I think it's in the right ballpark.

The only sched models for CPUs that support AVX2
but not AVX512 are: haswell, broadwell, skylake, zen1-3.

For load we have:
https://godbolt.org/z/na97Kb96o - for intels `Block RThroughput: <=64.0`; for ryzens, `Block RThroughput: <=32.0`
So we could pick cost of `64`.

For store we have:
https://godbolt.org/z/GG1WeoKar - for intels `Block RThroughput: =66.0`; for ryzens, `Block RThroughput: <=27.5`
So we could pick cost of `66`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D111091


  Commit: 3960693048a067e295d25c252b5f3a985c637bf2
      https://github.com/llvm/llvm-project/commit/3960693048a067e295d25c252b5f3a985c637bf2
  Author: Roman Lebedev <lebedev.ri at gmail.com>
  Date:   2021-10-05 (Tue, 05 Oct 2021)

  Changed paths:
    M llvm/lib/Target/X86/X86TargetTransformInfo.cpp
    M llvm/test/Analysis/CostModel/X86/interleaved-load-f64-stride-6.ll
    M llvm/test/Analysis/CostModel/X86/interleaved-load-i64-stride-6.ll
    M llvm/test/Analysis/CostModel/X86/interleaved-store-f64-stride-6.ll
    M llvm/test/Analysis/CostModel/X86/interleaved-store-i64-stride-6.ll

  Log Message:
  -----------
  [X86][Costmodel] Load/store i64/f64 Stride=6 VF=2 interleaving costs

The only sched models for CPUs that support AVX2
but not AVX512 are: haswell, broadwell, skylake, zen1-3.

For load we have:
https://godbolt.org/z/onese7rec - for intels `Block RThroughput: =6.0`; for ryzens, `Block RThroughput: =3.0`
So we could pick cost of `6`.

For store we have:
https://godbolt.org/z/bMd7dddnT - for intels `Block RThroughput: =8.0`; for ryzens, `Block RThroughput: <=6.0`
So we could pick cost of `8`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D111092


  Commit: e2784c5d8cf6b2fe29d4b72addebadc619044c44
      https://github.com/llvm/llvm-project/commit/e2784c5d8cf6b2fe29d4b72addebadc619044c44
  Author: Roman Lebedev <lebedev.ri at gmail.com>
  Date:   2021-10-05 (Tue, 05 Oct 2021)

  Changed paths:
    M llvm/lib/Target/X86/X86TargetTransformInfo.cpp
    M llvm/test/Analysis/CostModel/X86/interleaved-load-f64-stride-6.ll
    M llvm/test/Analysis/CostModel/X86/interleaved-load-i64-stride-6.ll
    M llvm/test/Analysis/CostModel/X86/interleaved-store-f64-stride-6.ll
    M llvm/test/Analysis/CostModel/X86/interleaved-store-i64-stride-6.ll

  Log Message:
  -----------
  [X86][Costmodel] Load/store i64/f64 Stride=6 VF=4 interleaving costs

The only sched models for CPUs that support AVX2
but not AVX512 are: haswell, broadwell, skylake, zen1-3.

For load we have:
https://godbolt.org/z/rc8jYxW6M - for intels `Block RThroughput: =18.0`; for ryzens, `Block RThroughput: =6.0`
So we could pick cost of `18`.

For store we have:
https://godbolt.org/z/9PhPEr65G - for intels `Block RThroughput: =15.0`; for ryzens, `Block RThroughput: =6.0`
So we could pick cost of `15`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D111093


  Commit: 3f9b235482a0e75946e3fc76dff93e0e29f104ab
      https://github.com/llvm/llvm-project/commit/3f9b235482a0e75946e3fc76dff93e0e29f104ab
  Author: Roman Lebedev <lebedev.ri at gmail.com>
  Date:   2021-10-05 (Tue, 05 Oct 2021)

  Changed paths:
    M llvm/lib/Target/X86/X86TargetTransformInfo.cpp
    M llvm/test/Analysis/CostModel/X86/interleaved-load-f64-stride-6.ll
    M llvm/test/Analysis/CostModel/X86/interleaved-load-i64-stride-6.ll
    M llvm/test/Analysis/CostModel/X86/interleaved-store-f64-stride-6.ll
    M llvm/test/Analysis/CostModel/X86/interleaved-store-i64-stride-6.ll

  Log Message:
  -----------
  [X86][Costmodel] Load/store i64/f64 Stride=6 VF=8 interleaving costs

The only sched models for CPUs that support AVX2
but not AVX512 are: haswell, broadwell, skylake, zen1-3.

For load we have:
https://godbolt.org/z/1jfGddcre - for intels `Block RThroughput: =36.0`; for ryzens, `Block RThroughput: =12.0`
So we could pick cost of `36`.

For store we have:
https://godbolt.org/z/ao9srMT8r - for intels `Block RThroughput: =30.0`; for ryzens, `Block RThroughput: =12.0`
So we could pick cost of `30`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D111094


Compare: https://github.com/llvm/llvm-project/compare/095c48fdf3d2...3f9b235482a0
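
As a compact summary of the series, the costs proposed in the log messages above can be collected into one lookup table. This is a self-contained sketch, not the code in X86TargetTransformInfo.cpp: the `InterleaveCostEntry` struct, the `ProposedCosts` array and the `lookupProposedCost` function are hypothetical names, and the numbers are the ones the messages say "we could pick", which may differ from what was finally committed.

```cpp
#include <cstddef>

// Hypothetical table entry: one row per (stride, element width, VF).
struct InterleaveCostEntry {
  unsigned Stride;    // interleave factor (4 or 6 in this series)
  unsigned EltBits;   // 32 for i32/f32, 64 for i64/f64
  unsigned VF;        // vectorization factor
  int LoadCost;       // proposed cost for the interleaved load
  int StoreCost;      // proposed cost for the interleaved store
};

// Values copied from the log messages in this digest.
constexpr InterleaveCostEntry ProposedCosts[] = {
    {4, 32, 2, 4, 5},    {4, 32, 4, 8, 6},
    {4, 32, 8, 16, 16},  {4, 32, 16, 32, 32},
    {4, 64, 2, 6, 6},    {4, 64, 4, 8, 8},
    {4, 64, 8, 20, 20},
    {6, 32, 2, 6, 9},    {6, 32, 4, 15, 12},
    {6, 32, 8, 31, 33},  {6, 32, 16, 64, 66},
    {6, 64, 2, 6, 8},    {6, 64, 4, 18, 15},
    {6, 64, 8, 36, 30},
};

// Returns the proposed cost, or -1 if the combination is not in this series.
inline int lookupProposedCost(unsigned Stride, unsigned EltBits, unsigned VF,
                              bool IsLoad) {
  for (const InterleaveCostEntry &E : ProposedCosts)
    if (E.Stride == Stride && E.EltBits == EltBits && E.VF == VF)
      return IsLoad ? E.LoadCost : E.StoreCost;
  return -1;
}
```

For example, `lookupProposedCost(6, 32, 8, /*IsLoad=*/false)` returns the store cost from the Stride=6 VF=8 i32/f32 commit, and a combination not covered here returns -1.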
