[PATCH] D140789: [SLP] Unify GEP cost modeling for load, store and GEP nodes.

Wed Jan 4 10:15:30 PST 2023

ABataev added inline comments.

================
Comment at: llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:6906-6907
+      // count them as savings.
+      if (!Ptr->hasOneUse() && isa<LoadInst, StoreInst>(VL0))
+        continue;
+
----------------
vdmitrie wrote:
> ABataev wrote:
> > Shall we drop this check? We still vectorize GEPs with multiple uses and then emit an extractelement for them, and the cost of that extractelement is calculated separately. So, when we calculate the cost for GEPs with multiple uses, we exclude them from the savings and then also add the extra extractelement cost. If we're still going to emit the extractelement, we need to remove this check (the original GEP will be vectorized and removed, and then the extractelement is generated). If it is better to keep the scalar copy, we need to remove the extractelement cost calculation and keep this check.
> For plain vector loads and stores we do not vectorize GEPs and hence do not emit extractelement instructions. Instead, the scalar loads are removed, and the GEPs for which these loads (or stores) were the only users are removed as well. All the remaining GEPs stay in the code. When we build the vectorizable tree we do not dive into the pointer operands of loads or stores; these load/store nodes are terminal nodes. This is why I added the check for loads/stores, which will only return true for vector loads or stores.
> define ptr @foo(ptr nocapture readonly %src, ptr nocapture %dst) local_unnamed_addr {
> entry:
>   %arrayidxA0 = getelementptr inbounds double, ptr %src, i64 0
>   %A0 = load double, ptr %arrayidxA0, align 1
>   %arrayidxA1 = getelementptr inbounds double, ptr %src, i64 1
>   %A1 = load double, ptr %arrayidxA1, align 1
> 
>   %arrayidx0 = getelementptr inbounds double, ptr %dst, i64 0
>   store double %A0, ptr %arrayidx0, align 16
>   %arrayidx1 = getelementptr inbounds double, ptr %dst, i64 1
>   store double %A1, ptr %arrayidx1, align 16
> 
>   ret ptr %arrayidxA1
> }
> 
> We generate:
>   %arrayidxA0 = getelementptr inbounds double, ptr %src, i64 0
>   %arrayidxA1 = getelementptr inbounds double, ptr %src, i64 1
>   %arrayidx0 = getelementptr inbounds double, ptr %dst, i64 0
>   %0 = load <2 x double>, ptr %arrayidxA0, align 1
>   store <2 x double> %0, ptr %arrayidx0, align 16
>   ret ptr %arrayidxA1
> 
> We do not do the same for gather loads (aka ScatterVectorize), because there we do vectorize the GEPs and then emit extracts for external uses.
> 
Yes, for vector loads/stores that is so. But what about masked gathers? We skip the cost compensation here and then still add the extractelement cost.
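
To make the question concrete, here is a toy model of the accounting being discussed. It is only a sketch and not the SLPVectorizer code: `NodeKind`, `PtrInfo`, `CostDelta`, `modelGEPCost` and the unit costs are all hypothetical names and numbers, the first pointer is assumed to survive as the address of the vector load/store, and the cost of the vector GEP built for a gather is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model only: hypothetical names and unit costs, not the LLVM API.
enum class NodeKind { VectorLoadStore, MaskedGather };

struct PtrInfo {
  bool IsGEP;     // pointer operand is a getelementptr
  bool HasOneUse; // its only user is the scalar load/store being vectorized
};

struct CostDelta {
  int ScalarSavings; // cost of scalar GEPs that disappear after vectorization
  int ExtractCost;   // cost of extractelement for GEPs with external users
  int net() const { return ExtractCost - ScalarSavings; }
};

CostDelta modelGEPCost(NodeKind Kind, const std::vector<PtrInfo> &Ptrs,
                       int GEPCost = 1, int ExtractEltCost = 1) {
  assert(GEPCost >= 0 && ExtractEltCost >= 0);
  CostDelta D{0, 0};
  for (std::size_t I = 0; I < Ptrs.size(); ++I) {
    const PtrInfo &P = Ptrs[I];
    if (!P.IsGEP)
      continue;
    if (Kind == NodeKind::VectorLoadStore) {
      // Assumption: Ptrs[0] stays as the address of the vector access.
      // A multi-use GEP also stays scalar: no saving and no extract.
      if (I != 0 && P.HasOneUse)
        D.ScalarSavings += GEPCost;
      continue;
    }
    // Masked gather: the GEPs themselves are vectorized, so each scalar GEP
    // goes away and is counted as a saving...
    D.ScalarSavings += GEPCost;
    // ...while one with other users needs an extractelement for them.
    if (!P.HasOneUse)
      D.ExtractCost += ExtractEltCost;
  }
  return D;
}
```

In this model the load node from the @foo example, `{{true, true}, {true, false}}`, yields no savings and no extract cost, matching the generated IR where both %src GEPs remain. The same pointer list as a masked gather yields savings for both GEPs plus one extractelement, which is the "remove the check" variant. Keeping the check for gathers while still adding the extractelement cost would charge the extract without the matching saving.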

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D140789/new/

https://reviews.llvm.org/D140789