[PATCH] D111174: [X86][Costmodel] Improve cost modelling for not-fully-interleaved load
Roman Lebedev via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Tue Oct 5 11:55:41 PDT 2021
lebedev.ri created this revision.
lebedev.ri added a reviewer: RKSimon.
lebedev.ri added a project: LLVM.
Herald added subscribers: pengfei, hiraditya.
lebedev.ri requested review of this revision.
While I've modelled most of the relevant tuples for AVX2,
that only covered fully-interleaved groups.
By definition, an interleaved load of stride N means:
load N*VF elements and shuffle them into N VF-sized vectors,
with the 0th vector containing elements `[0, VF)*stride + 0`,
the 1st vector containing elements `[0, VF)*stride + 1`, and so on.
Example: https://godbolt.org/z/df561Me5E
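To make the index pattern concrete, here is a minimal Python sketch of the
de-interleaving semantics only (it is not LLVM code, and the function name
`deinterleave` is hypothetical). It assumes the loaded memory is a flat list
of exactly stride*VF elements and builds vector k from elements
`i*stride + k` for i in `[0, VF)`.

```python
def deinterleave(elements, stride, vf):
    """Split stride*vf consecutive elements into `stride` vectors of length vf.

    Vector k holds elements[i*stride + k] for i in [0, vf), i.e. the
    `[0, VF)*stride + k` pattern. Hypothetical helper, semantics only.
    """
    if len(elements) != stride * vf:
        raise ValueError("expected exactly stride*vf elements")
    return [[elements[i * stride + k] for i in range(vf)] for k in range(stride)]


# Stride-3 load with VF=4: load 12 elements, produce 3 vectors of 4.
vecs = deinterleave(list(range(12)), stride=3, vf=4)
for k, v in enumerate(vecs):
    print(k, v)
# 0 [0, 3, 6, 9]
# 1 [1, 4, 7, 10]
# 2 [2, 5, 8, 11]
```

The actual lowering is a sequence of target shuffles; the sketch only shows
which source element ends up in which result vector.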
Now, a not-fully-interleaved load is one where not all of these vectors
are demanded. So, at worst, we could just pretend that everything is demanded
and discard the non-demanded vectors. This means that the cost
of a not-fully-interleaved group should be no greater than the cost
of the same fully-interleaved group, and perhaps somewhat less.
Example: https://godbolt.org/z/a78dK5Geq
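The "pretend everything is demanded, then discard" argument can be sketched
in Python as follows. This is a semantic model only, with hypothetical helper
names; the `demanded` set plays the role that the `indices-01uu`-style suffixes
of the test file names appear to encode (used indices listed, unused ones as `u`).

```python
def deinterleave(elements, stride, vf):
    # Full de-interleave: vector k holds elements[i*stride + k], i in [0, vf).
    return [[elements[i * stride + k] for i in range(vf)] for k in range(stride)]


def partial_deinterleave(elements, stride, vf, demanded):
    """Model a not-fully-interleaved load as the fully-interleaved one
    followed by discarding the vectors whose index is not demanded.

    Returns {index: vector} for the demanded indices only.
    """
    if not demanded or any(not 0 <= k < stride for k in demanded):
        raise ValueError("demanded must be a non-empty subset of [0, stride)")
    full = deinterleave(elements, stride, vf)  # do all the work...
    return {k: full[k] for k in sorted(demanded)}  # ...then drop the rest


# Stride 4, VF 2, indices "01uu": only vectors 0 and 1 are used.
print(partial_deinterleave(list(range(8)), stride=4, vf=2, demanded={0, 1}))
# {0: [0, 4], 1: [1, 5]}
```

Because the partial result is obtained by doing the full work and then
throwing some of it away, the full group's cost is an upper bound for it.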
As we have established over the course of the last ~70 patches (wow),
`BaseT::getInterleavedMemoryOpCost()` is absolutely bogus:
it usually overestimates the cost by almost an order of magnitude,
so I would claim that we should at least use the hardcoded costs
of fully-interleaved load groups.
We could go further and adjust them, e.g. by the number of demanded indices,
but then I'm somewhat fearful of underestimating the cost.
Thoughts?
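The two options above can be contrasted in a small Python sketch. Both
functions are hypothetical illustrations, not the X86 TTI code: the first
reuses the fully-interleaved cost unchanged (the proposal), the second scales
it by the fraction of demanded indices (the refinement being hesitated over).
The `full_cost` value in the usage is a placeholder, not a real table entry.

```python
import math


def cost_fully_interleaved_fallback(full_cost, stride, demanded):
    """Charge a not-fully-interleaved group the cost of the fully-interleaved
    group with the same stride and VF. Since the full shuffle can always be
    done and the unused vectors dropped, this is an upper bound."""
    if not 0 < len(demanded) <= stride:
        raise ValueError("need between 1 and stride demanded indices")
    return full_cost


def cost_scaled_by_demand(full_cost, stride, demanded):
    """Hypothetical refinement: scale by the demanded fraction, rounding up.
    Assumes shuffle work shrinks linearly with the number of demanded
    vectors, which need not hold, so it may underestimate."""
    if not 0 < len(demanded) <= stride:
        raise ValueError("need between 1 and stride demanded indices")
    return math.ceil(full_cost * len(demanded) / stride)


# Placeholder full cost of 10 for a stride-4 group, only index 0 demanded.
print(cost_fully_interleaved_fallback(10, 4, {0}))  # 10
print(cost_scaled_by_demand(10, 4, {0}))            # 3
```

The gap between the two results is exactly the risk noted above: if the
shuffles needed to extract one vector cost nearly as much as extracting all
of them, the scaled estimate would be too optimistic.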
Repository:
rG LLVM Github Monorepo
https://reviews.llvm.org/D111174
Files:
llvm/lib/Target/X86/X86TargetTransformInfo.cpp
llvm/test/Analysis/CostModel/X86/interleaved-load-i32-stride-2-indices-0u.ll
llvm/test/Analysis/CostModel/X86/interleaved-load-i32-stride-3-indices-01u.ll
llvm/test/Analysis/CostModel/X86/interleaved-load-i32-stride-3-indices-0uu.ll
llvm/test/Analysis/CostModel/X86/interleaved-load-i32-stride-4-indices-012u.ll
llvm/test/Analysis/CostModel/X86/interleaved-load-i32-stride-4-indices-01uu.ll
llvm/test/Analysis/CostModel/X86/interleaved-load-i32-stride-4-indices-0uuu.ll
llvm/test/Transforms/LoopVectorize/X86/pr48340.ll
-------------- next part --------------
A non-text attachment was scrubbed...
Name: D111174.377320.patch
Type: text/x-patch
Size: 23196 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20211005/d721153a/attachment.bin>