[PATCH] D103144: [X86][Costmodel] Load/store v2i16 VF=2 interleaving costs

Wed May 26 15:07:29 PDT 2021

lebedev.ri added a comment.

@RKSimon to be more specific: what i want here is to agree on the algorithm, namely:

1. pick one IR sequence (from the codegen tests i've already added in `llvm/test/CodeGen/X86/vector-interleaved-{load,store}-i16-stride-[2-6].ll`)
2. annotate the [de]interleaving shuffle block with MCA macros
3. for each CPU (that has AVX2 but not AVX512) for which we have a sched model:
  1. codegen the IR for the given CPU
  2. do NO manual changes to the produced assembly whatsoever
  3. record the `Block RThroughput` that MCA reports for that CPU/sched model
4. pick the largest recorded `Block RThroughput` (rounding 0.999->1, 1.5->1, 1.50001->2) as the cost for the tuple
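
To make steps 3 and 4 concrete, here is a rough Python sketch of how the recording and the final pick could be scripted. It is only an illustration of the procedure above, not something from the patch: the helper names (`round_half_down`, `parse_block_rthroughput`, `measure`, `cost_for_tuple`) are made up, the `-mtriple` value is an assumption, and the rounding is read as "round to nearest, ties go down", which matches the three examples in step 4. `measure` assumes `llc` and `llvm-mca` are on `PATH` and reads only the first `Block RThroughput` line MCA prints.

```python
import math
import re
import subprocess
from typing import Mapping


def round_half_down(rthroughput: float) -> int:
    """Round to the nearest integer, with exact .5 ties going down."""
    return math.ceil(rthroughput - 0.5)


def parse_block_rthroughput(mca_output: str) -> float:
    """Extract the first 'Block RThroughput' value from llvm-mca's summary."""
    match = re.search(r"Block RThroughput:\s*([0-9]+(?:\.[0-9]+)?)", mca_output)
    if match is None:
        raise ValueError("no 'Block RThroughput' line in llvm-mca output")
    return float(match.group(1))


def measure(ir_path: str, cpu: str) -> float:
    """Codegen the IR for `cpu` and feed the untouched assembly to llvm-mca."""
    triple = "-mtriple=x86_64-unknown-unknown"  # assumed triple
    asm = subprocess.run(
        ["llc", triple, f"-mcpu={cpu}", "-o", "-", ir_path],
        check=True, capture_output=True, text=True,
    ).stdout
    mca = subprocess.run(
        ["llvm-mca", triple, f"-mcpu={cpu}"],
        input=asm, check=True, capture_output=True, text=True,
    ).stdout
    return parse_block_rthroughput(mca)


def cost_for_tuple(recorded: Mapping[str, float]) -> int:
    """Step 4: take the worst (largest) recorded value and round it."""
    return round_half_down(max(recorded.values()))
```

With the per-CPU values collected into a dict (e.g. `{cpu: measure(path, cpu) for cpu in cpus}`), `cost_for_tuple` gives the number that would go into the cost table. Since the rounding is monotonic, rounding before or after taking the maximum gives the same result.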

... so i can proceed to add the missing costs without nagging you to double-check the findings.
At least that is what i would like to happen. If that is infeasible, i can submit them as reviews.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D103144/new/

https://reviews.llvm.org/D103144