[PATCH] D40008: [X86][TTI] update costs of interleaved load\store of i64\double

Mon Apr 26 04:49:15 PDT 2021

lebedev.ri added a comment.

In D40008#2716332 <https://reviews.llvm.org/D40008#2716332>, @RKSimon wrote:

> In D40008#2715994 <https://reviews.llvm.org/D40008#2715994>, @lebedev.ri wrote:
>
>> @RKSimon @magabari I'd like to add some more tuples, but I have a question: how are the costs actually derived?
>> For example, take the assembly for an interleaved load of i16 with stride 2: https://godbolt.org/z/hjb3d5x6E
>> What is its cost? I'm guessing it's not just `10`, i.e. the instruction count excluding the loads/stores?
>> Is it 5, from `Block RThroughput: 4.8` in llvm-mca: https://godbolt.org/z/fxYcEj3Wx ?
>> Which CPU should be used for these numbers?
>
> I believe they were taken from IACA probably with a Haswell CPU - a reciprocal throughput from llvm-mca should be similar.
>
> Usually with cost tables we tend to compare numbers from CPUs with similar specs (AVX2: Haswell/Ryzen) and choose the worst.
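As a rough illustration of where these numbers come from, here is a sketch that pulls the `Block RThroughput` value out of llvm-mca's summary view. It assumes the summary has already been captured as text (for example, copied from the Compiler Explorer links in this thread); `block_rthroughput` is a hypothetical helper, not part of any LLVM tool.

```python
import re

# Matches the summary-view line llvm-mca prints, e.g. "Block RThroughput: 4.8".
_RTHROUGHPUT_RE = re.compile(r"Block RThroughput:\s*(\d+(?:\.\d+)?)")


def block_rthroughput(mca_summary: str) -> float:
    """Return the Block RThroughput value from llvm-mca summary text.

    Hypothetical helper: it reads only the first matching line and
    raises ValueError if no such line is present.
    """
    match = _RTHROUGHPUT_RE.search(mca_summary)
    if match is None:
        raise ValueError("no 'Block RThroughput' line in llvm-mca output")
    return float(match.group(1))


if __name__ == "__main__":
    # Illustrative excerpt only; a real summary view contains more fields.
    excerpt = "Iterations:        100\nBlock RThroughput: 4.8\n"
    print(block_rthroughput(excerpt))  # 4.8
```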

I see. So in this case we have:

- znver1/znver2: 4.8 (likely unreliable, awaiting zen3): https://godbolt.org/z/W9x6GWdnh https://godbolt.org/z/dx7718YT9
- haswell/broadwell/skylake: 9: https://godbolt.org/z/bzG17drjn https://godbolt.org/z/frnEfeY6K https://godbolt.org/z/o7jK9M9hK

Therefore, for that tuple we choose `9`, correct? I don't see any other scheduling models for CPUs that support AVX2 but not AVX-512.
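The "choose the worst" rule applied to the numbers listed above can be sketched as follows. Rounding the fractional throughput up to an integer is an assumption (the thread does not say how a value such as 4.8 maps to a table entry), and `worst_case_cost` is a hypothetical helper.

```python
import math
from typing import Mapping

# Block RThroughput of the i16, stride-2 interleaved load per CPU model,
# as listed above (the znver numbers are flagged as likely unreliable).
RTHROUGHPUT = {
    "znver1": 4.8,
    "znver2": 4.8,
    "haswell": 9.0,
    "broadwell": 9.0,
    "skylake": 9.0,
}


def worst_case_cost(per_cpu: Mapping[str, float]) -> int:
    """Take the slowest (largest) reciprocal throughput and round it up.

    Rounding up is an assumption, made so the result fits an integer
    cost-table entry.
    """
    if not per_cpu:
        raise ValueError("need at least one CPU measurement")
    return math.ceil(max(per_cpu.values()))


print(worst_case_cost(RTHROUGHPUT))  # 9
```

Taking the maximum is the conservative choice: the resulting entry does not understate the cost on any of the listed models.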

And another question: now that we've established the rules, should I be submitting these changes through review,
or committing them directly? I fear that going through review would either result in bulky patches that are hard to review,
or saturate the review queue.

Repository:
  rL LLVM


https://reviews.llvm.org/D40008