[PATCH] D140789: [SLP] Unify GEP cost modeling for load, store and GEP nodes.

Valeriy Dmitriev via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Wed Jan 4 10:50:44 PST 2023


vdmitrie added inline comments.


================
Comment at: llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:6906-6907
+      // count them as savings.
+      if (!Ptr->hasOneUse() && isa<LoadInst, StoreInst>(VL0))
+        continue;
+
----------------
ABataev wrote:
> vdmitrie wrote:
> > ABataev wrote:
> > > vdmitrie wrote:
> > > > ABataev wrote:
> > > > > Shall we drop this check? We still vectorize GEPs with multiple uses and then emit extractelement for them. The cost of the extractelement is calculated separately. So, when we calculate the cost for GEPs with multiple uses, we exclude them from saving cost and then we add an extra cost for extractelement. If we're still going to emit extractelement, need to remove this check (the original пуз will be vectorized and removed and then the extractelement is generated). If it is better to keep scalar copy, need to remove the cost of the extractelement calculation and keep this check.
> > > > For plain vector loads and stores we do not vectorize GEPs and hence do not emit extract element instructions. Instead as scalar loads are removed and GEPs for which these loads (or stores) were single users are also removed. All the rest GEPs stay in the code.  When we build vec tree we do not dive into loads or stores pointer arguments, these loads/or store nodes are terminal nodes. This is why I added check for stores/loads which will only return true for vector loads or stores.
> > > > define ptr @foo(ptr nocapture readonly %src, ptr nocapture %dst) local_unnamed_addr {
> > > > entry:
> > > >   %arrayidxA0 = getelementptr inbounds double, ptr %src, i64 0
> > > >   %A0 = load double, ptr %arrayidxA0, align 1
> > > >   %arrayidxA1 = getelementptr inbounds double, ptr %src, i64 1
> > > >   %A1 = load double, ptr %arrayidxA1, align 1
> > > > 
> > > >   %arrayidx0 = getelementptr inbounds double, ptr %dst, i64 0
> > > >   store double %A0, ptr %arrayidx0, align 16
> > > >   %arrayidx1 = getelementptr inbounds double, ptr %dst, i64 1
> > > >   store double %A1, ptr %arrayidx1, align 16
> > > > 
> > > >   ret ptr %arrayidxA1
> > > > }
> > > > 
> > > > We generate:
> > > >   %arrayidxA0 = getelementptr inbounds double, ptr %src, i64 0
> > > >   %arrayidxA1 = getelementptr inbounds double, ptr %src, i64 1
> > > >   %arrayidx0 = getelementptr inbounds double, ptr %dst, i64 0
> > > >   %0 = load <2 x double>, ptr %arrayidxA0, align 1
> > > >   store <2 x double> %0, ptr %arrayidx0, align 16
> > > >   ret ptr %arrayidxA1
> > > > 
> > > > We do not do the same for gather loads (aka ScatterVectorize) as we indeed vectorize GEPs and then do extracts for external uses.
> > > > 
> > > Yes, for vector loads/store it is so. But what about masked gather? We avoid the cost compensation here and then add the extractelement cost.
> > For gather loads this routine is not called when we process tree node with set of loads.
> > It is called when tree node with GEPs is processed. For GEPs the VL0 will not be a load.
> > It was a bug in cost modeling that we evaluated GEPs twice for gather loads which this patch particularly fixes. 
> I mean not gather but masked gather, ScatterVectorize.
> Another one thing. Shall we add a cost of vector GEP? Currently we just subtract the cost of scalar GEPs.
> I mean not gather but masked gather, ScatterVectorize.

Yes, I understand you. By "gather loads" I meant exactly masked gather load (aka ScatterVectorize). So what I said still holds.

> Another one thing. Shall we add a cost of vector GEP? Currently we just subtract the cost of scalar GEPs.

This can be considered for a follow up changes. To be more accurate we do not count base address now (see the first check point in the loop) assuming that cost of vector GEP will be the same. So we don't add  its cost for scalar calculation and hence do not subtract it then (it's kind a shortcut). For plain vector loads and stores that is correct estimation (where for regular pointers we only generate a pointer cast which I believe is free). For  GEP nodes this estimation may not be quite correct. 





Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D140789/new/

https://reviews.llvm.org/D140789



More information about the llvm-commits mailing list