[all-commits] [llvm/llvm-project] f44d90: [X86][Costmodel] Load/store i32/f32 Stride=2 VF=2 ...

Roman Lebedev via All-commits all-commits at lists.llvm.org
Fri Oct 1 07:49:16 PDT 2021


  Branch: refs/heads/main
  Home:   https://github.com/llvm/llvm-project
  Commit: f44d9009c25827dd9fad5bfa240f6e59335d07b8
      https://github.com/llvm/llvm-project/commit/f44d9009c25827dd9fad5bfa240f6e59335d07b8
  Author: Roman Lebedev <lebedev.ri at gmail.com>
  Date:   2021-10-01 (Fri, 01 Oct 2021)

  Changed paths:
    M llvm/lib/Target/X86/X86TargetTransformInfo.cpp
    M llvm/test/Analysis/CostModel/X86/interleaved-load-f32-stride-2.ll
    M llvm/test/Analysis/CostModel/X86/interleaved-load-i32-stride-2.ll
    M llvm/test/Analysis/CostModel/X86/interleaved-store-f32-stride-2.ll
    M llvm/test/Analysis/CostModel/X86/interleaved-store-i32-stride-2.ll
    M llvm/test/Transforms/LoopVectorize/X86/interleaving.ll

  Log Message:
  -----------
  [X86][Costmodel] Load/store i32/f32 Stride=2 VF=2 interleaving costs

The only sched models for CPUs that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/4rY96hnGT - for intels `Block RThroughput: =2.0`; for ryzens, `Block RThroughput: =1.0`
So pick cost of `2`.

For store we have:
https://godbolt.org/z/vbo37Y3r9 - for intels `Block RThroughput: =1.0`; for ryzens, `Block RThroughput: =0.5`
So pick cost of `1`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D110753
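
The log message above does not spell out how the final number is chosen from
the per-CPU measurements. A minimal sketch of one rule that is consistent with
the figures quoted (take the worst `Block RThroughput` across the measured
CPUs and round up to an integer) might look like the following. The function
name `pick_cost` and the dictionary layout are hypothetical and are not part
of LLVM.

```python
import math


def pick_cost(rthroughput_by_cpu):
    """Return the worst-case reciprocal throughput, rounded up to an integer.

    Assumption: the cost is the maximum Block RThroughput over all measured
    CPU families. This matches the numbers in the log message but is not
    stated there as the rule.
    """
    return math.ceil(max(rthroughput_by_cpu.values()))


# Figures quoted for i32/f32, Stride=2, VF=2.
load_vf2 = {"intel": 2.0, "ryzen": 1.0}
store_vf2 = {"intel": 1.0, "ryzen": 0.5}

print(pick_cost(load_vf2))   # 2
print(pick_cost(store_vf2))  # 1
```

Where the message reports an upper bound such as `<=2.0`, passing the bound
itself gives the same conservative result.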


  Commit: b12aeaec9aca28cbd23587dda6a3126ab0aaf1c0
      https://github.com/llvm/llvm-project/commit/b12aeaec9aca28cbd23587dda6a3126ab0aaf1c0
  Author: Roman Lebedev <lebedev.ri at gmail.com>
  Date:   2021-10-01 (Fri, 01 Oct 2021)

  Changed paths:
    M llvm/lib/Target/X86/X86TargetTransformInfo.cpp
    M llvm/test/Analysis/CostModel/X86/interleaved-load-f32-stride-2.ll
    M llvm/test/Analysis/CostModel/X86/interleaved-load-i32-stride-2.ll
    M llvm/test/Analysis/CostModel/X86/interleaved-store-f32-stride-2.ll
    M llvm/test/Analysis/CostModel/X86/interleaved-store-i32-stride-2.ll
    M llvm/test/Transforms/LoopVectorize/X86/interleaving.ll

  Log Message:
  -----------
  [X86][Costmodel] Load/store i32/f32 Stride=2 VF=4 interleaving costs

The only sched models for CPUs that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/EM5Ean7bd - for intels `Block RThroughput: =2.0`; for ryzens, `Block RThroughput: =1.0`
So pick cost of `2`.

For store we have:
https://godbolt.org/z/EM5Ean7bd - for intels `Block RThroughput: =2.0`; for ryzens, `Block RThroughput: <=2.0`
So pick cost of `2`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D110754
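
For readers unfamiliar with the pattern being costed, here is a rough
illustration of what a Stride=2, VF=4 interleaved access does at the element
level, assuming "Stride=2" means an interleave factor of two: a load reads 8
contiguous elements and splits them into two 4-element vectors, and a store
does the reverse. The helper names are hypothetical, and plain Python lists
stand in for vector registers.

```python
def deinterleave_stride2(wide):
    """Split one wide contiguous load into its even and odd members."""
    return wide[0::2], wide[1::2]


def interleave_stride2(even, odd):
    """Merge two member vectors back into one wide contiguous store."""
    wide = []
    for e, o in zip(even, odd):
        wide.extend([e, o])
    return wide


vf = 4
memory = list(range(2 * vf))          # 8 contiguous elements
even, odd = deinterleave_stride2(memory)
print(even, odd)                      # [0, 2, 4, 6] [1, 3, 5, 7]
print(interleave_stride2(even, odd))  # [0, 1, 2, 3, 4, 5, 6, 7]
```

The costs in these commits cover the shuffles needed to perform this split or
merge on AVX2 hardware, which is why they grow with VF.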


  Commit: 3a0643e9c2252290a9f29c2b3ceb696033af4903
      https://github.com/llvm/llvm-project/commit/3a0643e9c2252290a9f29c2b3ceb696033af4903
  Author: Roman Lebedev <lebedev.ri at gmail.com>
  Date:   2021-10-01 (Fri, 01 Oct 2021)

  Changed paths:
    M llvm/lib/Target/X86/X86TargetTransformInfo.cpp
    M llvm/test/Analysis/CostModel/X86/interleaved-load-f32-stride-2.ll
    M llvm/test/Analysis/CostModel/X86/interleaved-load-i32-stride-2.ll
    M llvm/test/Analysis/CostModel/X86/interleaved-store-f32-stride-2.ll
    M llvm/test/Analysis/CostModel/X86/interleaved-store-i32-stride-2.ll
    M llvm/test/Transforms/LoopVectorize/X86/interleaving.ll

  Log Message:
  -----------
  [X86][Costmodel] Load/store i32/f32 Stride=2 VF=8 interleaving costs

The only sched models for CPUs that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/n8aMKeo4E - for intels `Block RThroughput: =4.0`; for ryzens, `Block RThroughput: <=2.0`
So pick cost of `4`.

For store we have:
https://godbolt.org/z/n8aMKeo4E - for intels `Block RThroughput: =4.0`; for ryzens, `Block RThroughput: =2.0`
So pick cost of `4`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D110755


  Commit: 80cd8da78d027f59b54586887af4bb9c3b36a6ba
      https://github.com/llvm/llvm-project/commit/80cd8da78d027f59b54586887af4bb9c3b36a6ba
  Author: Roman Lebedev <lebedev.ri at gmail.com>
  Date:   2021-10-01 (Fri, 01 Oct 2021)

  Changed paths:
    M llvm/lib/Target/X86/X86TargetTransformInfo.cpp
    M llvm/test/Analysis/CostModel/X86/interleaved-load-f32-stride-2.ll
    M llvm/test/Analysis/CostModel/X86/interleaved-load-i32-stride-2.ll
    M llvm/test/Analysis/CostModel/X86/interleaved-store-f32-stride-2.ll
    M llvm/test/Analysis/CostModel/X86/interleaved-store-i32-stride-2.ll

  Log Message:
  -----------
  [X86][Costmodel] Load/store i32/f32 Stride=2 VF=16 interleaving costs

The only sched models for CPUs that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/M9eev3xe8 - for intels `Block RThroughput: =8.0`; for ryzens, `Block RThroughput: <=4.0`
So pick cost of `8`.

For store we have:
https://godbolt.org/z/M9eev3xe8 - for intels `Block RThroughput: =8.0`; for ryzens, `Block RThroughput: =4.0`
So pick cost of `8`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D110756


  Commit: ea76cb87ee4022d8663a7c25943478fe3f64e21a
      https://github.com/llvm/llvm-project/commit/ea76cb87ee4022d8663a7c25943478fe3f64e21a
  Author: Roman Lebedev <lebedev.ri at gmail.com>
  Date:   2021-10-01 (Fri, 01 Oct 2021)

  Changed paths:
    M llvm/lib/Target/X86/X86TargetTransformInfo.cpp
    M llvm/test/Analysis/CostModel/X86/interleaved-load-f32-stride-2.ll
    M llvm/test/Analysis/CostModel/X86/interleaved-load-i32-stride-2.ll
    M llvm/test/Analysis/CostModel/X86/interleaved-store-f32-stride-2.ll
    M llvm/test/Analysis/CostModel/X86/interleaved-store-i32-stride-2.ll

  Log Message:
  -----------
  [X86][Costmodel] Load/store i32/f32 Stride=2 VF=32 interleaving costs

The only sched models for CPUs that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

Here, for the `store` pattern, we are starting to see spilling,
so accurate modelling may be problematic,
although if I drop the spilling, the measurements don't change.

For load we have:
https://godbolt.org/z/1oTTnncbx - for intels `Block RThroughput: =16.0`; for ryzens, `Block RThroughput: <=8.0`
So pick cost of `16`.

For store we have:
https://godbolt.org/z/1oTTnncbx - for intels `Block RThroughput: =16.0`; for ryzens, `Block RThroughput: =8.0`
So pick cost of `16`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D110761


  Commit: 612e5b05a281b867383f52e457781d1b5ba76c2d
      https://github.com/llvm/llvm-project/commit/612e5b05a281b867383f52e457781d1b5ba76c2d
  Author: Roman Lebedev <lebedev.ri at gmail.com>
  Date:   2021-10-01 (Fri, 01 Oct 2021)

  Changed paths:
    M llvm/lib/Target/X86/X86TargetTransformInfo.cpp
    M llvm/test/Analysis/CostModel/X86/interleaved-load-f64-stride-2.ll
    M llvm/test/Analysis/CostModel/X86/interleaved-load-i64-stride-2.ll
    M llvm/test/Analysis/CostModel/X86/interleaved-store-f64-stride-2.ll
    M llvm/test/Analysis/CostModel/X86/interleaved-store-i64-stride-2.ll

  Log Message:
  -----------
  [X86][Costmodel] Load/store i64/f64 Stride=2 VF=2 interleaving costs

The only sched models for CPUs that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/8a1cfGeMn - for intels `Block RThroughput: =2.0`; for ryzens, `Block RThroughput: =1.0`
So pick cost of `2`.

For store we have:
https://godbolt.org/z/jMdcM47bx - for intels `Block RThroughput: =2.0`; for ryzens, `Block RThroughput: <=2.0`
So pick cost of `2`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D110835


  Commit: 71bc31b907193c294f718046ed8ef569e3d4b9fa
      https://github.com/llvm/llvm-project/commit/71bc31b907193c294f718046ed8ef569e3d4b9fa
  Author: Roman Lebedev <lebedev.ri at gmail.com>
  Date:   2021-10-01 (Fri, 01 Oct 2021)

  Changed paths:
    M llvm/lib/Target/X86/X86TargetTransformInfo.cpp
    M llvm/test/Analysis/CostModel/X86/interleaved-load-f64-stride-2.ll
    M llvm/test/Analysis/CostModel/X86/interleaved-load-i64-stride-2.ll
    R llvm/test/Analysis/CostModel/X86/interleaved-load-store-double.ll
    R llvm/test/Analysis/CostModel/X86/interleaved-load-store-i64.ll
    M llvm/test/Analysis/CostModel/X86/interleaved-store-f64-stride-2.ll
    M llvm/test/Analysis/CostModel/X86/interleaved-store-i64-stride-2.ll

  Log Message:
  -----------
  [X86][Costmodel] Load/store i64/f64 Stride=2 VF=4 interleaving costs

The only sched models for CPUs that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/j5co1qWEW - for intels `Block RThroughput: =4.0`; for ryzens, `Block RThroughput: <=2.0`
So pick cost of `4`.

For store we have:
https://godbolt.org/z/j5co1qWEW - for intels `Block RThroughput: =4.0`; for ryzens, `Block RThroughput: <=4.0`
So pick cost of `4`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D110837


  Commit: abd37de63ee97330f9397c4468802498b6101360
      https://github.com/llvm/llvm-project/commit/abd37de63ee97330f9397c4468802498b6101360
  Author: Roman Lebedev <lebedev.ri at gmail.com>
  Date:   2021-10-01 (Fri, 01 Oct 2021)

  Changed paths:
    M llvm/lib/Target/X86/X86TargetTransformInfo.cpp
    M llvm/test/Analysis/CostModel/X86/interleaved-load-f64-stride-2.ll
    M llvm/test/Analysis/CostModel/X86/interleaved-load-i64-stride-2.ll
    M llvm/test/Analysis/CostModel/X86/interleaved-store-f64-stride-2.ll
    M llvm/test/Analysis/CostModel/X86/interleaved-store-i64-stride-2.ll

  Log Message:
  -----------
  [X86][Costmodel] Load/store i64/f64 Stride=2 VF=8 interleaving costs

The only sched models for CPUs that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/PGYbYKPq8 - for intels `Block RThroughput: =8.0`; for ryzens, `Block RThroughput: <=4.0`
So pick cost of `8`.

For store we have:
https://godbolt.org/z/PGYbYKPq8 - for intels `Block RThroughput: =8.0`; for ryzens, `Block RThroughput: <=8.0`
So pick cost of `8`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D110838


  Commit: 3e260efdfc6064481396a0c3ade703a739023c77
      https://github.com/llvm/llvm-project/commit/3e260efdfc6064481396a0c3ade703a739023c77
  Author: Roman Lebedev <lebedev.ri at gmail.com>
  Date:   2021-10-01 (Fri, 01 Oct 2021)

  Changed paths:
    M llvm/lib/Target/X86/X86TargetTransformInfo.cpp
    M llvm/test/Analysis/CostModel/X86/interleaved-load-f64-stride-2.ll
    M llvm/test/Analysis/CostModel/X86/interleaved-load-i64-stride-2.ll
    M llvm/test/Analysis/CostModel/X86/interleaved-store-f64-stride-2.ll
    M llvm/test/Analysis/CostModel/X86/interleaved-store-i64-stride-2.ll

  Log Message:
  -----------
  [X86][Costmodel] Load/store i64/f64 Stride=2 VF=16 interleaving costs

The only sched models for CPUs that support avx2
but not avx512 are: haswell, broadwell, skylake, zen1-3

For load we have:
https://godbolt.org/z/1WMTojvfW - for intels `Block RThroughput: =16.0`; for ryzens, `Block RThroughput: <=8.0`
So pick cost of `16`.

For store we have:
https://godbolt.org/z/1WMTojvfW - for intels `Block RThroughput: =16.0`; for ryzens, `Block RThroughput: <=16.0`
So pick cost of `16`.

I'm directly using the shuffling asm the llc produced,
without any manual fixups that may be needed
to ensure sequential execution.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D110840
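
As a summary of the nine commits in this push, the chosen costs can be
collected into a single lookup table. The sketch below only restates the
numbers from the log messages. The table name, key layout, and helper are
hypothetical and do not reflect how `X86TargetTransformInfo.cpp` stores them.

```python
# (element_bits, VF) -> (load_cost, store_cost), copied from the log messages.
STRIDE2_COSTS = {
    (32, 2): (2, 1),
    (32, 4): (2, 2),
    (32, 8): (4, 4),
    (32, 16): (8, 8),
    (32, 32): (16, 16),
    (64, 2): (2, 2),
    (64, 4): (4, 4),
    (64, 8): (8, 8),
    (64, 16): (16, 16),
}


def stride2_cost(element_bits, vf, is_store):
    """Look up the Stride=2 interleaving cost chosen for this element width and VF."""
    load_cost, store_cost = STRIDE2_COSTS[(element_bits, vf)]
    return store_cost if is_store else load_cost


print(stride2_cost(32, 8, is_store=False))  # 4
print(stride2_cost(64, 16, is_store=True))  # 16
```

Apart from the 32-bit VF=2 entry, load and store costs are equal, and each
doubling of VF doubles the cost.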


Compare: https://github.com/llvm/llvm-project/compare/4f0a39b9b4ba...3e260efdfc60

