[all-commits] [llvm/llvm-project] 887acf: [X86][Costmodel] Load/store i16 Stride=6 VF=32 int...

Roman Lebedev via All-commits all-commits at lists.llvm.org
Sun Oct 17 07:40:00 PDT 2021


  Branch: refs/heads/main
  Home:   https://github.com/llvm/llvm-project
  Commit: 887acf6842cb48e7c51728ed8d81fc5ab0425403
      https://github.com/llvm/llvm-project/commit/887acf6842cb48e7c51728ed8d81fc5ab0425403
  Author: Roman Lebedev <lebedev.ri at gmail.com>
  Date:   2021-10-17 (Sun, 17 Oct 2021)

  Changed paths:
    M llvm/lib/Target/X86/X86TargetTransformInfo.cpp
    M llvm/test/Analysis/CostModel/X86/interleaved-load-i16-stride-6.ll
    M llvm/test/Analysis/CostModel/X86/interleaved-store-i16-stride-6.ll

  Log Message:
  -----------
  [X86][Costmodel] Load/store i16 Stride=6 VF=32 interleaving costs

A few more tuples are being queried after D111546. Might be good to model them,
They all require a lot of manual assembly surgery.

The only sched models that for cpu's that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/YTeT9M7fW - for intels `Block RThroughput: <=212.0`; for ryzens, `Block RThroughput: <=64.0`
So could pick cost of `212`

For store we have:
https://godbolt.org/z/vc954KEGP - for intels `Block RThroughput: <=90.0`; for ryzens, `Block RThroughput: <=24.0`
So we could pick cost of `90`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D111940


  Commit: 4b76a74b4283362f69748c4d0a5bc22b1237ced0
      https://github.com/llvm/llvm-project/commit/4b76a74b4283362f69748c4d0a5bc22b1237ced0
  Author: Roman Lebedev <lebedev.ri at gmail.com>
  Date:   2021-10-17 (Sun, 17 Oct 2021)

  Changed paths:
    M llvm/lib/Target/X86/X86TargetTransformInfo.cpp
    M llvm/test/Analysis/CostModel/X86/interleaved-load-f32-stride-3.ll
    M llvm/test/Analysis/CostModel/X86/interleaved-load-i32-stride-3-indices-01u.ll
    M llvm/test/Analysis/CostModel/X86/interleaved-load-i32-stride-3-indices-0uu.ll
    M llvm/test/Analysis/CostModel/X86/interleaved-load-i32-stride-3.ll
    M llvm/test/Analysis/CostModel/X86/interleaved-store-f32-stride-3.ll
    M llvm/test/Analysis/CostModel/X86/interleaved-store-i32-stride-3.ll

  Log Message:
  -----------
  [X86][Costmodel] Load/store i32 Stride=3 VF=32 interleaving costs

A few more tuples are being queried after D111546. Might be good to model them,
They all require a lot of manual assembly surgery.

The only sched models that for cpu's that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/s5b6E6jsP - for intels `Block RThroughput: <=32.0`; for ryzens, `Block RThroughput: <=24.0`
So could pick cost of `32`

For store we have:
https://godbolt.org/z/efh99d93b - for intels `Block RThroughput: <=48.0`; for ryzens, `Block RThroughput: <=32.0`
So we could pick cost of `48`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D111942


  Commit: 3a6a9f74d3a59beb359a9968ac27dcf97d072b3a
      https://github.com/llvm/llvm-project/commit/3a6a9f74d3a59beb359a9968ac27dcf97d072b3a
  Author: Roman Lebedev <lebedev.ri at gmail.com>
  Date:   2021-10-17 (Sun, 17 Oct 2021)

  Changed paths:
    M llvm/lib/Target/X86/X86TargetTransformInfo.cpp
    M llvm/test/Analysis/CostModel/X86/interleaved-load-f32-stride-4.ll
    M llvm/test/Analysis/CostModel/X86/interleaved-load-i32-stride-4-indices-012u.ll
    M llvm/test/Analysis/CostModel/X86/interleaved-load-i32-stride-4-indices-01uu.ll
    M llvm/test/Analysis/CostModel/X86/interleaved-load-i32-stride-4-indices-0uuu.ll
    M llvm/test/Analysis/CostModel/X86/interleaved-load-i32-stride-4.ll
    M llvm/test/Analysis/CostModel/X86/interleaved-store-f32-stride-4.ll
    M llvm/test/Analysis/CostModel/X86/interleaved-store-i32-stride-4.ll

  Log Message:
  -----------
  [X86][Costmodel] Load/store i32 Stride=4 VF=32 interleaving costs

A few more tuples are being queried after D111546. Might be good to model them,
They all require a lot of manual assembly surgery.

The only sched models that for cpu's that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/11rcvdreP - for intels `Block RThroughput: <=68.0`; for ryzens, `Block RThroughput: <=48.0`
So could pick cost of `68`

For store we have:
https://godbolt.org/z/6aM11fWcP - for intels `Block RThroughput: <=64.0`; for ryzens, `Block RThroughput: <=32.0`
So we could pick cost of `64`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D111943


  Commit: 3274ce3a287dcd4d02b4d2c7a2bf60e942836e06
      https://github.com/llvm/llvm-project/commit/3274ce3a287dcd4d02b4d2c7a2bf60e942836e06
  Author: Roman Lebedev <lebedev.ri at gmail.com>
  Date:   2021-10-17 (Sun, 17 Oct 2021)

  Changed paths:
    M llvm/lib/Target/X86/X86TargetTransformInfo.cpp
    M llvm/test/Analysis/CostModel/X86/interleaved-load-f64-stride-2.ll
    M llvm/test/Analysis/CostModel/X86/interleaved-load-i64-stride-2.ll
    M llvm/test/Analysis/CostModel/X86/interleaved-store-f64-stride-2.ll
    M llvm/test/Analysis/CostModel/X86/interleaved-store-i64-stride-2.ll

  Log Message:
  -----------
  [X86][Costmodel] Load/store i64 Stride=2 VF=32 interleaving costs

A few more tuples are being queried after D111546. Might be good to model them,
They all require a lot of manual assembly surgery.

The only sched models that for cpu's that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/MTaKboejM - for intels `Block RThroughput: =32.0`; for ryzens, `Block RThroughput: <=16.0`
So could pick cost of `32`

For store we have:
https://godbolt.org/z/v7xPj3Wd4 - for intels `Block RThroughput: =32.0`; for ryzens, `Block RThroughput: <=32.0`
So we could pick cost of `32`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D111944


  Commit: 91373bf12ec66591addf56b9f447ec9befd6ddae
      https://github.com/llvm/llvm-project/commit/91373bf12ec66591addf56b9f447ec9befd6ddae
  Author: Roman Lebedev <lebedev.ri at gmail.com>
  Date:   2021-10-17 (Sun, 17 Oct 2021)

  Changed paths:
    M llvm/lib/Target/X86/X86TargetTransformInfo.cpp
    M llvm/test/Analysis/CostModel/X86/interleaved-load-f64-stride-4.ll
    M llvm/test/Analysis/CostModel/X86/interleaved-load-i64-stride-4.ll
    M llvm/test/Analysis/CostModel/X86/interleaved-store-f64-stride-4.ll
    M llvm/test/Analysis/CostModel/X86/interleaved-store-i64-stride-4.ll

  Log Message:
  -----------
  [X86][Costmodel] Load/store i64 Stride=4 VF=16 interleaving costs

A few more tuples are being queried after D111546. Might be good to model them,
They all require a lot of manual assembly surgery.

The only sched models that for cpu's that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/9bnKrefcG - for intels `Block RThroughput: =40.0`; for ryzens, `Block RThroughput: =16.0`
So could pick cost of `40`

For store we have:
https://godbolt.org/z/5s3s14dEY - for intels `Block RThroughput: =40.0`; for ryzens, `Block RThroughput: =16.0`
So we could pick cost of `40`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D111945


Compare: https://github.com/llvm/llvm-project/compare/dd8c8d4b7cee...91373bf12ec6


More information about the All-commits mailing list