[all-commits] [llvm/llvm-project] 887acf: [X86][Costmodel] Load/store i16 Stride=6 VF=32 int...
Roman Lebedev via All-commits
all-commits at lists.llvm.org
Sun Oct 17 07:40:00 PDT 2021
Branch: refs/heads/main
Home: https://github.com/llvm/llvm-project
Commit: 887acf6842cb48e7c51728ed8d81fc5ab0425403
https://github.com/llvm/llvm-project/commit/887acf6842cb48e7c51728ed8d81fc5ab0425403
Author: Roman Lebedev <lebedev.ri at gmail.com>
Date: 2021-10-17 (Sun, 17 Oct 2021)
Changed paths:
M llvm/lib/Target/X86/X86TargetTransformInfo.cpp
M llvm/test/Analysis/CostModel/X86/interleaved-load-i16-stride-6.ll
M llvm/test/Analysis/CostModel/X86/interleaved-store-i16-stride-6.ll
Log Message:
-----------
[X86][Costmodel] Load/store i16 Stride=6 VF=32 interleaving costs
A few more tuples are being queried after D111546. It might be good to model them.
They all require a lot of manual assembly surgery.
The only sched models for CPUs that support AVX2
but not AVX512 are: haswell, broadwell, skylake, zen1-3.
For load we have:
https://godbolt.org/z/YTeT9M7fW - for Intel CPUs, `Block RThroughput: <=212.0`; for Ryzen CPUs, `Block RThroughput: <=64.0`
So we could pick a cost of `212`.
For store we have:
https://godbolt.org/z/vc954KEGP - for Intel CPUs, `Block RThroughput: <=90.0`; for Ryzen CPUs, `Block RThroughput: <=24.0`
So we could pick a cost of `90`.
I'm directly using the shuffling asm that llc produced,
without any manual fixups that may be needed
to ensure sequential execution.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D111940
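In each of these commits the picked cost coincides with the larger of the two quoted `Block RThroughput` bounds. A minimal sketch of that selection, assuming the rule is "take the worst case across the CPU families considered" (the helper `pick_cost` and the dictionary layout are hypothetical and not part of LLVM):

```python
# Hypothetical helper, not LLVM code: choose a conservative cost as the
# largest reciprocal-throughput bound across the CPU families considered.
def pick_cost(rthroughput_bounds):
    """Return the worst-case bound as an integer cost."""
    return int(max(rthroughput_bounds.values()))

# Bounds quoted above for i16, Stride=6, VF=32.
load_bounds = {"intel": 212.0, "ryzen": 64.0}
store_bounds = {"intel": 90.0, "ryzen": 24.0}

load_cost = pick_cost(load_bounds)
store_cost = pick_cost(store_bounds)
print(load_cost, store_cost)
```

Taking the maximum keeps the estimate pessimistic for every modelled CPU, at the price of overestimating the cost on the faster ones.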
Commit: 4b76a74b4283362f69748c4d0a5bc22b1237ced0
https://github.com/llvm/llvm-project/commit/4b76a74b4283362f69748c4d0a5bc22b1237ced0
Author: Roman Lebedev <lebedev.ri at gmail.com>
Date: 2021-10-17 (Sun, 17 Oct 2021)
Changed paths:
M llvm/lib/Target/X86/X86TargetTransformInfo.cpp
M llvm/test/Analysis/CostModel/X86/interleaved-load-f32-stride-3.ll
M llvm/test/Analysis/CostModel/X86/interleaved-load-i32-stride-3-indices-01u.ll
M llvm/test/Analysis/CostModel/X86/interleaved-load-i32-stride-3-indices-0uu.ll
M llvm/test/Analysis/CostModel/X86/interleaved-load-i32-stride-3.ll
M llvm/test/Analysis/CostModel/X86/interleaved-store-f32-stride-3.ll
M llvm/test/Analysis/CostModel/X86/interleaved-store-i32-stride-3.ll
Log Message:
-----------
[X86][Costmodel] Load/store i32 Stride=3 VF=32 interleaving costs
A few more tuples are being queried after D111546. It might be good to model them.
They all require a lot of manual assembly surgery.
The only sched models for CPUs that support AVX2
but not AVX512 are: haswell, broadwell, skylake, zen1-3.
For load we have:
https://godbolt.org/z/s5b6E6jsP - for Intel CPUs, `Block RThroughput: <=32.0`; for Ryzen CPUs, `Block RThroughput: <=24.0`
So we could pick a cost of `32`.
For store we have:
https://godbolt.org/z/efh99d93b - for Intel CPUs, `Block RThroughput: <=48.0`; for Ryzen CPUs, `Block RThroughput: <=32.0`
So we could pick a cost of `48`.
I'm directly using the shuffling asm that llc produced,
without any manual fixups that may be needed
to ensure sequential execution.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D111942
Commit: 3a6a9f74d3a59beb359a9968ac27dcf97d072b3a
https://github.com/llvm/llvm-project/commit/3a6a9f74d3a59beb359a9968ac27dcf97d072b3a
Author: Roman Lebedev <lebedev.ri at gmail.com>
Date: 2021-10-17 (Sun, 17 Oct 2021)
Changed paths:
M llvm/lib/Target/X86/X86TargetTransformInfo.cpp
M llvm/test/Analysis/CostModel/X86/interleaved-load-f32-stride-4.ll
M llvm/test/Analysis/CostModel/X86/interleaved-load-i32-stride-4-indices-012u.ll
M llvm/test/Analysis/CostModel/X86/interleaved-load-i32-stride-4-indices-01uu.ll
M llvm/test/Analysis/CostModel/X86/interleaved-load-i32-stride-4-indices-0uuu.ll
M llvm/test/Analysis/CostModel/X86/interleaved-load-i32-stride-4.ll
M llvm/test/Analysis/CostModel/X86/interleaved-store-f32-stride-4.ll
M llvm/test/Analysis/CostModel/X86/interleaved-store-i32-stride-4.ll
Log Message:
-----------
[X86][Costmodel] Load/store i32 Stride=4 VF=32 interleaving costs
A few more tuples are being queried after D111546. It might be good to model them.
They all require a lot of manual assembly surgery.
The only sched models for CPUs that support AVX2
but not AVX512 are: haswell, broadwell, skylake, zen1-3.
For load we have:
https://godbolt.org/z/11rcvdreP - for Intel CPUs, `Block RThroughput: <=68.0`; for Ryzen CPUs, `Block RThroughput: <=48.0`
So we could pick a cost of `68`.
For store we have:
https://godbolt.org/z/6aM11fWcP - for Intel CPUs, `Block RThroughput: <=64.0`; for Ryzen CPUs, `Block RThroughput: <=32.0`
So we could pick a cost of `64`.
I'm directly using the shuffling asm that llc produced,
without any manual fixups that may be needed
to ensure sequential execution.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D111943
Commit: 3274ce3a287dcd4d02b4d2c7a2bf60e942836e06
https://github.com/llvm/llvm-project/commit/3274ce3a287dcd4d02b4d2c7a2bf60e942836e06
Author: Roman Lebedev <lebedev.ri at gmail.com>
Date: 2021-10-17 (Sun, 17 Oct 2021)
Changed paths:
M llvm/lib/Target/X86/X86TargetTransformInfo.cpp
M llvm/test/Analysis/CostModel/X86/interleaved-load-f64-stride-2.ll
M llvm/test/Analysis/CostModel/X86/interleaved-load-i64-stride-2.ll
M llvm/test/Analysis/CostModel/X86/interleaved-store-f64-stride-2.ll
M llvm/test/Analysis/CostModel/X86/interleaved-store-i64-stride-2.ll
Log Message:
-----------
[X86][Costmodel] Load/store i64 Stride=2 VF=32 interleaving costs
A few more tuples are being queried after D111546. It might be good to model them.
They all require a lot of manual assembly surgery.
The only sched models for CPUs that support AVX2
but not AVX512 are: haswell, broadwell, skylake, zen1-3.
For load we have:
https://godbolt.org/z/MTaKboejM - for Intel CPUs, `Block RThroughput: =32.0`; for Ryzen CPUs, `Block RThroughput: <=16.0`
So we could pick a cost of `32`.
For store we have:
https://godbolt.org/z/v7xPj3Wd4 - for Intel CPUs, `Block RThroughput: =32.0`; for Ryzen CPUs, `Block RThroughput: <=32.0`
So we could pick a cost of `32`.
I'm directly using the shuffling asm that llc produced,
without any manual fixups that may be needed
to ensure sequential execution.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D111944
Commit: 91373bf12ec66591addf56b9f447ec9befd6ddae
https://github.com/llvm/llvm-project/commit/91373bf12ec66591addf56b9f447ec9befd6ddae
Author: Roman Lebedev <lebedev.ri at gmail.com>
Date: 2021-10-17 (Sun, 17 Oct 2021)
Changed paths:
M llvm/lib/Target/X86/X86TargetTransformInfo.cpp
M llvm/test/Analysis/CostModel/X86/interleaved-load-f64-stride-4.ll
M llvm/test/Analysis/CostModel/X86/interleaved-load-i64-stride-4.ll
M llvm/test/Analysis/CostModel/X86/interleaved-store-f64-stride-4.ll
M llvm/test/Analysis/CostModel/X86/interleaved-store-i64-stride-4.ll
Log Message:
-----------
[X86][Costmodel] Load/store i64 Stride=4 VF=16 interleaving costs
A few more tuples are being queried after D111546. It might be good to model them.
They all require a lot of manual assembly surgery.
The only sched models for CPUs that support AVX2
but not AVX512 are: haswell, broadwell, skylake, zen1-3.
For load we have:
https://godbolt.org/z/9bnKrefcG - for Intel CPUs, `Block RThroughput: =40.0`; for Ryzen CPUs, `Block RThroughput: =16.0`
So we could pick a cost of `40`.
For store we have:
https://godbolt.org/z/5s3s14dEY - for Intel CPUs, `Block RThroughput: =40.0`; for Ryzen CPUs, `Block RThroughput: =16.0`
So we could pick a cost of `40`.
I'm directly using the shuffling asm that llc produced,
without any manual fixups that may be needed
to ensure sequential execution.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D111945
Compare: https://github.com/llvm/llvm-project/compare/dd8c8d4b7cee...91373bf12ec6
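For reference, the costs picked across the five commits in this push can be collected into one lookup keyed by operation, element type, stride, and VF. This is only an illustrative sketch: the key layout and the names `PICKED_COSTS` and `lookup_cost` are assumptions, and the actual tables in X86TargetTransformInfo.cpp may be organized differently.

```python
# Costs quoted in the commit messages above, keyed by
# (operation, element type, stride, VF). Illustrative layout only.
PICKED_COSTS = {
    ("load", "i16", 6, 32): 212,
    ("store", "i16", 6, 32): 90,
    ("load", "i32", 3, 32): 32,
    ("store", "i32", 3, 32): 48,
    ("load", "i32", 4, 32): 68,
    ("store", "i32", 4, 32): 64,
    ("load", "i64", 2, 32): 32,
    ("store", "i64", 2, 32): 32,
    ("load", "i64", 4, 16): 40,
    ("store", "i64", 4, 16): 40,
}

def lookup_cost(op, elem_ty, stride, vf):
    """Return the picked cost for a tuple, or None if it is not modelled here."""
    return PICKED_COSTS.get((op, elem_ty, stride, vf))

print(lookup_cost("load", "i16", 6, 32))
print(lookup_cost("store", "i64", 4, 16))
```

A tuple that is absent from the table returns `None`, which mirrors the situation the commits address: a queried tuple with no modelled cost.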