[PATCH] D111460: [X86][LoopVectorize] "Fix" `X86TTIImpl::getAddressComputationCost()`
Pengfei Wang via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Mon Oct 18 06:14:36 PDT 2021
pengfei added a comment.
Thanks for the explanation! Apologize for my silly questions, I'm still learning in this realm :)
> It's mostly tangential, the longer this patch drags out, the more tests may need to be updated, which isn't fun.
I don't get the point, why a tangential patch results in test updated?
> Naive interleaving sequence is: wide load + extracts + inserts,
> while scalarized gather sequence is: narrow loads + inserts,
>
> Right now "gather" sequence is modelled as having artificially-high cost
> (what this patch changes), effectively prohibiting using it for vectorization.
> But afterwards, naturally, the cost of scalarized gather sequence lowers,
> becoming more attractive for vectorization, and sometimes even more than
> for the naive interleaving sequence, as you can see in `interleaved-load-i16-stride-5.ll`.
I think the high cost might have practical consideration, e.g., impact some benchmarks etc.
What's the value to give a cost smaller than naive interleaving sequence? IIUC, you are saying we prefer naive interleaving to scalarized gather, right?
> But in general, we can assume that sequence of shuffles may be better than
> a sequence of extracts+inserts, so we may be better off with interleaving
> rather than gather. But we can't really improve the cost modelling for the
> naive interleaving sequence (because we can't really properly cost model shuffles),
> so what we have to do is hardcode the costs for the interestting interleaving tuples
> in the cost tables, like i have done in those 90+ patches.
Do you mean this patch will enable the use of hardcode the costs? Can we make sure we always generate the interleaving sequence, e.g., when not have a constant stride?
================
Comment at: llvm/lib/Target/X86/X86TargetTransformInfo.cpp:3847
// ADD instruction.
- if (Ty->isVectorTy() && SE) {
+ if (Ty->isVectorTy() && SE && !ST->hasAVX2()) {
+ // FIXME: this is a hack to artificially favor interleaved loads over gather
----------------
Does this still fall into @dmgreen 's previous comments?
> The base getAddressComputationCost never returns anything other than 0 and doesn't distinguish between scalar and vector types.
================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:7061
Type *PtrTy = ToVectorTy(Ptr->getType(), VF);
+ // FIXME: PtrTy should not be a vector, it's a hack.
----------------
lebedev.ri wrote:
> pengfei wrote:
> > It would be more productive if it includes the reason why we can’t change it right away.
> > Lack of such explanation led to the original version of the patch that repeats D93129.
> I'm open to suggestions as to how this should be amended.
Sorry, I don't have a concrete suggestion. I have concerns because I saw disagreement from Dave and didn't see consensus have been made.
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D111460/new/
https://reviews.llvm.org/D111460
More information about the llvm-commits
mailing list