[PATCH] D100684: [X86][CostModel] X86TTIImpl::getMemoryOpCost(): rewrite vector handling again

Wed May 12 05:13:40 PDT 2021

lebedev.ri added a comment.

@RKSimon I'm in need of a bit of guidance.

I'd like to deal with `getInterleavedMemoryOpCostAVX2()` next, but I'm not sure what the best way forward is.
After thinking about it, I'm iffy about just adding more hardcoded entries to the cost table there.
We have element type {i8, i16, i32, i64} * stride {2..6} * VF {8..64}. That's 64 entries already, by naive estimates.
This ignores partial strided loads (with `Indices.size()` != stride), and other vector widths.
Once those are included, the table grows roughly exponentially.
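
To make the growth concrete, here is a rough counting sketch. It is not LLVM code: the helper names are hypothetical, and it assumes power-of-two VFs, every stride in the given range, and that a partial access can use any non-empty subset of the stride's member indices. Under those assumptions the fully interleaved case gives 80 entries, the same ballpark as the naive estimate above, while allowing partial accesses gives 1904.

```cpp
#include <cstddef>

// Hypothetical helpers for illustration only; none of these exist in LLVM.

// Number of power-of-two vectorization factors in [MinVF, MaxVF].
// Assumes MinVF is a non-zero power of two.
constexpr std::size_t countVFs(unsigned MinVF, unsigned MaxVF) {
  std::size_t N = 0;
  for (unsigned VF = MinVF; VF <= MaxVF; VF *= 2)
    ++N;
  return N;
}

// Table rows needed when every member of the interleave group is accessed
// (Indices.size() == stride): one row per (element type, stride, VF).
constexpr std::size_t countFullEntries(std::size_t NumElementTypes,
                                       unsigned MinStride, unsigned MaxStride,
                                       unsigned MinVF, unsigned MaxVF) {
  return NumElementTypes * (MaxStride - MinStride + 1) *
         countVFs(MinVF, MaxVF);
}

// Table rows needed when any non-empty subset of members may be accessed.
// A stride-S group has 2^S - 1 such subsets, so this grows exponentially in S.
constexpr std::size_t countWithPartialEntries(std::size_t NumElementTypes,
                                              unsigned MinStride,
                                              unsigned MaxStride,
                                              unsigned MinVF, unsigned MaxVF) {
  std::size_t Subsets = 0;
  for (unsigned S = MinStride; S <= MaxStride; ++S)
    Subsets += (std::size_t{1} << S) - 1;
  return NumElementTypes * Subsets * countVFs(MinVF, MaxVF);
}
```

Calling `countFullEntries(4, 2, 6, 8, 64)` and `countWithPartialEntries(4, 2, 6, 8, 64)` covers the {i8, i16, i32, i64} * stride {2..6} * VF {8..64} space described above; other vector widths would multiply both totals again.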

Do we really want to proceed on that path?
I'm seeing two alternatives:

1. Perhaps we should try to come up with an algorithmic approach, like we have here?
2. Perhaps we should simply automate this? Run the strided load pattern through codegen, run that through exegesis, and automatically record its performance?
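
As a sketch of what the first step of option 2 could look like, the helper below emits the LLVM IR for one strided load: a wide load followed by a `shufflevector` that picks every `Stride`-th element starting at `Index`. The function name and parameters are hypothetical, not an existing LLVM API. The output assumes opaque pointers (`ptr`) and `poison`; older releases would need typed pointers and `undef` instead.

```cpp
#include <cassert>
#include <string>

// Hypothetical generator, for illustration only.
// Emits IR that loads VF * Stride elements and extracts member `Index`
// of the interleave group, i.e. elements Index, Index + Stride, ...
std::string makeStridedLoadIR(unsigned ElemBits, unsigned Stride, unsigned VF,
                              unsigned Index) {
  assert(Index < Stride && "member index must be within the group");

  const std::string Elt = "i" + std::to_string(ElemBits);
  const std::string WideTy =
      "<" + std::to_string(VF * Stride) + " x " + Elt + ">";
  const std::string NarrowTy = "<" + std::to_string(VF) + " x " + Elt + ">";
  // Shuffle masks are always vectors of i32, whatever the element type is.
  const std::string MaskTy = "<" + std::to_string(VF) + " x i32>";

  std::string Mask;
  for (unsigned I = 0; I != VF; ++I) {
    if (I != 0)
      Mask += ", ";
    Mask += "i32 " + std::to_string(Index + I * Stride);
  }

  std::string IR;
  IR += "define " + NarrowTy + " @strided_load(ptr %p) {\n";
  IR += "  %wide = load " + WideTy + ", ptr %p, align " +
        std::to_string(ElemBits / 8) + "\n";
  IR += "  %v = shufflevector " + WideTy + " %wide, " + WideTy + " poison, " +
        MaskTy + " <" + Mask + ">\n";
  IR += "  ret " + NarrowTy + " %v\n";
  IR += "}\n";
  return IR;
}
```

The text returned by, say, `makeStridedLoadIR(32, 2, 8, 0)` could be written to a file and compiled with `llc -mtriple=x86_64-- -mattr=+avx2`. How best to turn the resulting assembly into something exegesis can measure is left open here.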

Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D100684