[PATCH] D140789: [SLP] Unify GEP cost modeling for load, store and GEP nodes.

Wed Jan 4 10:35:33 PST 2023

ABataev added inline comments.

================
Comment at: llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:6906-6907
+      // count them as savings.
+      if (!Ptr->hasOneUse() && isa<LoadInst, StoreInst>(VL0))
+        continue;
+
----------------
vdmitrie wrote:
> ABataev wrote:
> > vdmitrie wrote:
> > > ABataev wrote:
> > > > Shall we drop this check? We still vectorize GEPs with multiple uses and then emit extractelement for them. The cost of the extractelement is calculated separately. So, when we calculate the cost for GEPs with multiple uses, we exclude them from saving cost and then we add an extra cost for extractelement. If we're still going to emit extractelement, need to remove this check (the original пуз will be vectorized and removed and then the extractelement is generated). If it is better to keep scalar copy, need to remove the cost of the extractelement calculation and keep this check.
> > > For plain vector loads and stores we do not vectorize GEPs and hence do not emit extract element instructions. Instead as scalar loads are removed and GEPs for which these loads (or stores) were single users are also removed. All the rest GEPs stay in the code.  When we build vec tree we do not dive into loads or stores pointer arguments, these loads/or store nodes are terminal nodes. This is why I added check for stores/loads which will only return true for vector loads or stores.
> > > define ptr @foo(ptr nocapture readonly %src, ptr nocapture %dst) local_unnamed_addr {
> > > entry:
> > >   %arrayidxA0 = getelementptr inbounds double, ptr %src, i64 0
> > >   %A0 = load double, ptr %arrayidxA0, align 1
> > >   %arrayidxA1 = getelementptr inbounds double, ptr %src, i64 1
> > >   %A1 = load double, ptr %arrayidxA1, align 1
> > > 
> > >   %arrayidx0 = getelementptr inbounds double, ptr %dst, i64 0
> > >   store double %A0, ptr %arrayidx0, align 16
> > >   %arrayidx1 = getelementptr inbounds double, ptr %dst, i64 1
> > >   store double %A1, ptr %arrayidx1, align 16
> > > 
> > >   ret ptr %arrayidxA1
> > > }
> > > 
> > > We generate:
> > >   %arrayidxA0 = getelementptr inbounds double, ptr %src, i64 0
> > >   %arrayidxA1 = getelementptr inbounds double, ptr %src, i64 1
> > >   %arrayidx0 = getelementptr inbounds double, ptr %dst, i64 0
> > >   %0 = load <2 x double>, ptr %arrayidxA0, align 1
> > >   store <2 x double> %0, ptr %arrayidx0, align 16
> > >   ret ptr %arrayidxA1
> > > 
> > > We do not do the same for gather loads (aka ScatterVectorize) as we indeed vectorize GEPs and then do extracts for external uses.
> > > 
> > Yes, for vector loads/store it is so. But what about masked gather? We avoid the cost compensation here and then add the extractelement cost.
> For gather loads this routine is not called when we process tree node with set of loads.
> It is called when tree node with GEPs is processed. For GEPs the VL0 will not be a load.
> It was a bug in cost modeling that we evaluated GEPs twice for gather loads which this patch particularly fixes. 
I mean not gather but masked gather, ScatterVectorize.
Another one thing. Shall we add a cost of vector GEP? Currently we just subtract the cost of scalar GEPs.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D140789/new/

https://reviews.llvm.org/D140789