[PATCH] D111460: [X86][LoopVectorize] "Fix" `X86TTIImpl::getAddressComputationCost()`

Mon Oct 18 06:14:36 PDT 2021

pengfei added a comment.

Thanks for the explanation! Apologize for my silly questions, I'm still learning in this realm :)

> It's mostly tangential, the longer this patch drags out, the more tests may need to be updated, which isn't fun.

I don't get the point, why a tangential patch results in test updated?

> Naive interleaving sequence is: wide load + extracts + inserts,
> while scalarized gather sequence is: narrow loads + inserts,
>
> Right now "gather" sequence is modelled as having artificially-high cost
> (what this patch changes), effectively prohibiting using it for vectorization. 
> But afterwards, naturally, the cost of scalarized gather sequence lowers,
> becoming more attractive for vectorization, and sometimes even more than 
> for the naive interleaving sequence, as you can see in `interleaved-load-i16-stride-5.ll`.

I think the high cost might have practical consideration, e.g., impact some benchmarks etc.
What's the value to give a cost smaller than naive interleaving sequence? IIUC, you are saying we prefer naive interleaving to scalarized gather, right?

> But in general, we can assume that sequence of shuffles may be better than 
> a sequence of extracts+inserts, so we may be better off with interleaving
> rather than gather. But we can't really improve the cost modelling for the
> naive interleaving sequence (because we can't really properly cost model shuffles),
> so what we have to do is hardcode the costs for the interestting interleaving tuples
> in the cost tables, like i have done in those 90+ patches.

Do you mean this patch will enable the use of hardcode the costs? Can we make sure we always generate the interleaving sequence, e.g., when not have a constant stride?

================
Comment at: llvm/lib/Target/X86/X86TargetTransformInfo.cpp:3847
   // ADD instruction.
-  if (Ty->isVectorTy() && SE) {
+  if (Ty->isVectorTy() && SE && !ST->hasAVX2()) {
+    // FIXME: this is a hack to artificially favor interleaved loads over gather
----------------
Does this still fall into @dmgreen 's previous comments?

> The base getAddressComputationCost never returns anything other than 0 and doesn't distinguish between scalar and vector types.

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:7061
   Type *PtrTy = ToVectorTy(Ptr->getType(), VF);
+  // FIXME: PtrTy should not be a vector, it's a hack.

----------------
lebedev.ri wrote:
> pengfei wrote:
> > It would be more productive if it includes the reason why we can’t change it right away.
> > Lack of such explanation led to the original version of the patch that repeats D93129.
> I'm open to suggestions as to how this should be amended.
Sorry, I don't have a concrete suggestion. I have concerns because I saw disagreement from Dave and didn't see consensus have been made.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D111460/new/

https://reviews.llvm.org/D111460