[llvm] [TTI][CostModel] Add cost modeling for expandload and compressstore intrinsics (PR #122882)

Thu Jan 23 04:39:00 PST 2025

skachkov-sc wrote:

> Can you take a step back and explain why you're doing this? At the moment, the only thing I can find in tree which creates these is the sanitizers?

The overall goal is to support compressstore/expandload patterns in VPlan (e.g. to vectorize loops like these: https://godbolt.org/z/Gdrsa6jr6). There have been some attempts to implement this in the past: https://github.com/llvm/llvm-project/pull/83467, and an AMD presentation: https://www.youtube.com/watch?v=HVQ5mG3lExs (that code seems never to have been published as open source). For now, I'm trying to address some missing pieces required for this implementation; cost estimation for the expandload/compressstore intrinsics will be needed in LoopVectorizer.

(Also, I've experimented with #83467 and discovered some issues, e.g. LAA can't estimate array bounds for such idioms, so it doesn't work without an explicit restrict qualifier on the pointers. I have patches to improve this, but for now they are in a draft state.)
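
For readers who don't have the godbolt link at hand, the loop shapes in question look roughly like the sketch below. This is only an illustration and not necessarily the code behind the link: the function names `compress_loop` and `expand_loop` are made up here, and `__restrict` is the GCC/Clang spelling of the qualifier mentioned above.

```cpp
#include <cstddef>

// Compress pattern: selected elements are stored to a densely packed
// output. The store address advances only when the condition holds,
// which is the shape llvm.masked.compressstore expresses for a vector.
// (Hypothetical example function, not taken from the PR.)
std::size_t compress_loop(const int *__restrict src, int *__restrict dst,
                          std::size_t n) {
  std::size_t j = 0;
  for (std::size_t i = 0; i < n; ++i)
    if (src[i] > 0)
      dst[j++] = src[i];
  return j; // number of elements written
}

// Expand pattern: elements are read from a densely packed input and
// scattered to the positions where the condition holds, which is the
// shape llvm.masked.expandload expresses for a vector.
// (Hypothetical example function, not taken from the PR.)
std::size_t expand_loop(const int *__restrict packed,
                        const int *__restrict cond, int *__restrict dst,
                        std::size_t n) {
  std::size_t j = 0;
  for (std::size_t i = 0; i < n; ++i)
    if (cond[i] != 0)
      dst[i] = packed[j++];
  return j; // number of elements read
}
```

In both loops the index `j` depends on how many earlier iterations took the branch, which is why the accessed range is hard to bound up front.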

> From just a quick look at the x86 codegen tests, I don't think the cost proposed here actually matches the lowering.

X86 lowering of compressstore/expandload goes through the scalarize-masked-mem-intrin pass (https://godbolt.org/z/734GvnG15), and it actually produces a large amount of code (the cost of this code is estimated by getCommonMaskedMemoryOpCost). Before this patch, the cost was estimated by getIntrinsicInstrCost + getTypeBasedIntrinsicInstrCost using the generic approach of scalarizing the intrinsic (https://github.com/llvm/llvm-project/blob/6f684816e25d8b4e5fb2cbc7d0560d608a8bd938/llvm/include/llvm/CodeGen/BasicTTIImpl.h#L2025-L2065), which is a worse approximation of what actually happens, IMO. Could you please explain your concerns about the new cost of these intrinsics? (TBH, I think it will not be beneficial to generate these intrinsics anyway without some "native" form of support for them, like in RISC-V RVV.)
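
To make the "large amount of code" point concrete, here is a scalar C++ model of what the two intrinsics compute, following their LangRef semantics. It is a sketch for illustration, not the pass's actual output: the names `compressstore_scalarized` and `expandload_scalarized` are invented here, and the real scalarized IR is a chain of per-lane mask tests and conditional blocks rather than a loop. The point is that each lane pays for a mask test, an element extract or insert, a memory access, and a pointer bump.

```cpp
#include <array>
#include <cstddef>

// Scalar model of llvm.masked.compressstore(value, ptr, mask):
// active lanes are written to consecutive memory locations.
// Returns the pointer just past the last element written.
template <typename T, std::size_t N>
T *compressstore_scalarized(const std::array<T, N> &value, T *ptr,
                            const std::array<bool, N> &mask) {
  for (std::size_t lane = 0; lane < N; ++lane) {
    if (mask[lane]) {     // per-lane mask test (a branch when scalarized)
      *ptr = value[lane]; // extract element + scalar store
      ++ptr;              // advance only for active lanes
    }
  }
  return ptr;
}

// Scalar model of llvm.masked.expandload(ptr, mask, passthru):
// consecutive memory elements are placed into the active lanes,
// inactive lanes keep the passthru value.
template <typename T, std::size_t N>
std::array<T, N> expandload_scalarized(const T *ptr,
                                       const std::array<bool, N> &mask,
                                       std::array<T, N> passthru) {
  for (std::size_t lane = 0; lane < N; ++lane) {
    if (mask[lane]) {         // per-lane mask test
      passthru[lane] = *ptr;  // scalar load + insert element
      ++ptr;                  // advance only for active lanes
    }
  }
  return passthru;
}
```

With N lanes this is N tests, up to N memory accesses, and N extracts or inserts, so a cost that grows with the lane count seems like the right shape for targets without native support.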

https://github.com/llvm/llvm-project/pull/122882