[PATCH] D112307: [X86] `X86TTIImpl::getInterleavedMemoryOpCost()`: scale interleaving cost by the fraction of live members

Roman Lebedev via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Fri Oct 22 04:33:08 PDT 2021


lebedev.ri created this revision.
lebedev.ri added a reviewer: RKSimon.
lebedev.ri added a project: LLVM.
Herald added subscribers: pengfei, hiraditya.
lebedev.ri requested review of this revision.

By definition, interleaving load of stride N means:
load N*VF elements, and shuffle them into N VF-sized vectors,
with 0'th vector containing elements `[0, VF)*stride + 0`,
and 1'th vector containing elements `[0, VF)*stride + 1`.
Example: https://godbolt.org/z/df561Me5E (i64 stride 4 vf 2 => cost 6)

Now, not fully interleaved load, is when not all of these vectors is demanded.
So at worst, we could just pretend that everything is demanded,
and discard the non-demanded vectors. What this means is that the cost
for not-fully-interleaved group should be not greater than the cost
for the same fully-interleaved group, but perhaps somewhat less.
Examples:
https://godbolt.org/z/a78dK5Geq (i64 stride 4 (indices 012u) vf 2 => cost 4)
https://godbolt.org/z/G91ceo8dM (i64 stride 4 (indices 01uu) vf 2 => cost 2)
https://godbolt.org/z/5joYob9rx (i64 stride 4 (indices 0uuu) vf 2 => cost 1)

Right now, for such not-fully-interleaved loads we just use the costs
for fully-interleaved loads. But at least **in general**,
that is obviously overly pessimistic, because **in general**,
not all the shuffles needed to perform the full interleaving
will end up being live.

So what i propose, is to naively scale the interleaving cost
by the fraction of the live members. I believe this should still result
in the right ballpark cost estimate.


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D112307

Files:
  llvm/lib/Target/X86/X86TargetTransformInfo.cpp
  llvm/test/Analysis/CostModel/X86/interleaved-load-i32-stride-2-indices-0u.ll
  llvm/test/Analysis/CostModel/X86/interleaved-load-i32-stride-3-indices-01u.ll
  llvm/test/Analysis/CostModel/X86/interleaved-load-i32-stride-3-indices-0uu.ll
  llvm/test/Analysis/CostModel/X86/interleaved-load-i32-stride-4-indices-012u.ll
  llvm/test/Analysis/CostModel/X86/interleaved-load-i32-stride-4-indices-01uu.ll
  llvm/test/Analysis/CostModel/X86/interleaved-load-i32-stride-4-indices-0uuu.ll
  llvm/test/Transforms/LoopVectorize/X86/pr48340.ll

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D112307.381506.patch
Type: text/x-patch
Size: 24764 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20211022/b5214a4c/attachment.bin>


More information about the llvm-commits mailing list