[PATCH] D111174: [X86][Costmodel] Improve cost modelling for not-fully-interleaved load

Roman Lebedev via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Tue Oct 5 11:55:41 PDT 2021


lebedev.ri created this revision.
lebedev.ri added a reviewer: RKSimon.
lebedev.ri added a project: LLVM.
Herald added subscribers: pengfei, hiraditya.
lebedev.ri requested review of this revision.

While i've modelled most of the relevant tuples for AVX2,
that only covered fully-interleaved groups.

By definition, interleaving load of stride N means:
load N*VF elements, and shuffle them into N VF-sized vectors,
with 0'th vector containing elements `[0, VF)*stride + 0`,
and 1'th vector containing elements `[0, VF)*stride + 1`.
Example: https://godbolt.org/z/df561Me5E

Now, not fully interleaved load, is when not all of these vectors is demanded.
So at worst, we could just pretend that everything is demanded,
and discard the non-demanded vectors. What this means is that the cost
for not-fully-interleaved group should be not greater than the cost
for the same fully-interleaved group, but perhaps somewhat less.
Example: https://godbolt.org/z/a78dK5Geq

As we have established over the course of last ~70 patches, (wow)
`BaseT::getInterleavedMemoryOpCos()` is absolutely bogus,
it is usually almost an order of magnitude overestimation,
so i would claim that we should at least use the hardcoded costs
of fully interleaved load groups.

We could go further and adjust them e.g. by the number of demanded indices,
but then i'm somewhat fearful of underestimating the cost.

Thoughts?


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D111174

Files:
  llvm/lib/Target/X86/X86TargetTransformInfo.cpp
  llvm/test/Analysis/CostModel/X86/interleaved-load-i32-stride-2-indices-0u.ll
  llvm/test/Analysis/CostModel/X86/interleaved-load-i32-stride-3-indices-01u.ll
  llvm/test/Analysis/CostModel/X86/interleaved-load-i32-stride-3-indices-0uu.ll
  llvm/test/Analysis/CostModel/X86/interleaved-load-i32-stride-4-indices-012u.ll
  llvm/test/Analysis/CostModel/X86/interleaved-load-i32-stride-4-indices-01uu.ll
  llvm/test/Analysis/CostModel/X86/interleaved-load-i32-stride-4-indices-0uuu.ll
  llvm/test/Transforms/LoopVectorize/X86/pr48340.ll

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D111174.377320.patch
Type: text/x-patch
Size: 23196 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20211005/d721153a/attachment.bin>


More information about the llvm-commits mailing list