[PATCH] D101924: [X86] Improve costmodel for scalar byte swaps

Thu May 6 14:02:03 PDT 2021

lebedev.ri marked an inline comment as done.
lebedev.ri added a subscriber: magabari.
lebedev.ri added a comment.

In D101924#2741844 <https://reviews.llvm.org/D101924#2741844>, @RKSimon wrote:

> Random comment: This has me wishing that we'd made further progress with D46276 <https://reviews.llvm.org/D46276> - I'd much prefer being able to peer into the scheduler models for costs than relying on yet more feature flags :(

I know, right? E.g., even when looking at the simplest snippet from

In D40008#2716353 <https://reviews.llvm.org/D40008#2716353>, @lebedev.ri wrote:

> In D40008#2716332 <https://reviews.llvm.org/D40008#2716332>, @RKSimon wrote:
>
>> In D40008#2715994 <https://reviews.llvm.org/D40008#2715994>, @lebedev.ri wrote:
>>
>>> @RKSimon @magabari I'd like to add some more tuples, but I have a question: how are the costs actually derived?
>>> For example, the assembly for interleaved load of i16 w/ stride 2: https://godbolt.org/z/hjb3d5x6E
>>> What's it cost? I'm guessing it's not just `10`, aka the instruction count excluding the loads/stores?
>>> Is it 5 from `Block RThroughput: 4.8` from MCA: https://godbolt.org/z/fxYcEj3Wx ?
>>> Which CPU should be used for these numbers?
>>
>> I believe they were taken from IACA probably with a Haswell CPU - a reciprocal throughput from llvm-mca should be similar.
>>
>> Usually with cost tables we tend to compare numbers from similar spec CPUs (AVX2 - Haswell/Ryzen) and choose the worst.....
>
> I see. So in this case we have:
>
> - znver1/2 4.8 https://godbolt.org/z/W9x6GWdnh https://godbolt.org/z/dx7718YT9 (likely unreliable, awaiting zen3)
> - haswell/broadwell/skylake 9 https://godbolt.org/z/bzG17drjn https://godbolt.org/z/frnEfeY6K https://godbolt.org/z/o7jK9M9hK
>
> therefore for that tuple we choose `9`, correct? I'm not seeing any other sched models for CPUs that have AVX2 but not AVX512.
>
> And another question: now that we've established the rules, should I be submitting these changes through review,
> or committing these directly? I fear the former would either result in bulky patches that are hard to review,
> or saturate the review queue.

... the difference between haswell and zen3 is 3x...
I don't really want to add yet another costmodel subset, but that difference is *NOT* great...
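
For reference, the rule from the quoted exchange (take the llvm-mca reciprocal throughput for each similar-spec CPU and choose the worst) can be sketched roughly as below. This is only an illustration: `pick_cost` is a hypothetical helper, not anything in LLVM, and rounding up to an integer is an assumption based on the "is it 5 from 4.8" guess above. The numbers are the ones quoted for the stride-2 i16 interleaved load, not for the byte swaps in this patch.

```python
import math


def pick_cost(rthroughput_by_cpu):
    """Hypothetical helper: round each reciprocal throughput up (an assumption),
    then take the worst (largest) value across the given CPUs."""
    return max(math.ceil(value) for value in rthroughput_by_cpu.values())


# Block RThroughput values as quoted above for the stride-2 i16 interleaved load.
quoted = {
    "znver1": 4.8,
    "znver2": 4.8,
    "haswell": 9.0,
    "broadwell": 9.0,
    "skylake": 9.0,
}

print(pick_cost(quoted))  # 9
```

With a single table entry chosen this way, a CPU that is several times faster than the worst case still gets the worst-case cost, which is the mismatch being complained about here.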

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D101924/new/

https://reviews.llvm.org/D101924