[PATCH] D147720: [LV] Use the known trip count when costing non-tail folded VFs

Dave Green via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Mon Apr 24 07:38:50 PDT 2023


dmgreen added inline comments.


================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:5410-5414
+    auto RTCostB =
+        foldTailByMasking()
+            ? (CostB * divideCeil(MaxTripCount, B.Width.getFixedValue()))
+            : (CostB * (MaxTripCount / B.Width.getFixedValue()) +
+               B.ScalarCost * (MaxTripCount % B.Width.getFixedValue()));
----------------
sdesmalen wrote:
> nit: Is it worth using a lambda for this, e.g.
> 
>   auto GetCostForTC = [MaxTripCount, this](unsigned VF, InstructionCost VectorCost,
>                                            InstructionCost ScalarCost) {
>     return foldTailByMasking() ?
>       VectorCost * divideCeil(MaxTripCount, VF) :
>       VectorCost * (MaxTripCount / VF) + ScalarCost * (MaxTripCount % VF);
>   };
> 
>   auto RTCostA = GetCostForTC(A.Width.getFixedValue(), CostA, A.ScalarCost);
>   auto RTCostB = GetCostForTC(B.Width.getFixedValue(), CostB, B.ScalarCost);
Sounds good, I'll do that now.
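
For reference, the calculation being factored out can be sketched outside of LLVM as below. This is only a minimal illustration, not the code in the patch: `costForTripCount` and `divideCeilSketch` are hypothetical names, plain `uint64_t` stands in for `InstructionCost`, and the tail-folding decision is passed in as a flag rather than queried from the cost model.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for llvm::divideCeil: integer division rounding up.
static uint64_t divideCeilSketch(uint64_t Numerator, uint64_t Denominator) {
  assert(Denominator != 0 && "denominator must be non-zero");
  return (Numerator + Denominator - 1) / Denominator;
}

// Estimated cost of running TripCount iterations at vectorization factor VF.
// With tail folding, every iteration (including the partial last one) runs
// in the vector loop. Without it, the remainder runs in the scalar loop.
static uint64_t costForTripCount(uint64_t TripCount, uint64_t VF,
                                 uint64_t VectorCost, uint64_t ScalarCost,
                                 bool FoldTailByMasking) {
  assert(VF != 0 && "VF must be non-zero");
  if (FoldTailByMasking)
    return VectorCost * divideCeilSketch(TripCount, VF);
  return VectorCost * (TripCount / VF) + ScalarCost * (TripCount % VF);
}
```

With made-up costs of 10 per vector iteration and 4 per scalar iteration, a trip count of 57 at VF 16 gives 3 * 10 + 9 * 4 = 66 without tail folding and 4 * 10 = 40 with it.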


================
Comment at: llvm/test/Transforms/LoopVectorize/AArch64/smallest-and-widest-types.ll:98
   %inc = add nuw nsw i8 %i.08, 1
-  %exitcond.not = icmp eq i8 %inc, 12345
+  %exitcond.not = icmp eq i8 %inc, 241
   br i1 %exitcond.not, label %for.end, label %for.body
----------------
sdesmalen wrote:
> I'm curious why this test needed changing. What VF does it pick with 12345?
12345 as an i8 is really 57 (12345 mod 256), which left too many scalar iterations for v16 to be picked over v8: it is `3*vf16 + 9*vf1` vs `7*vf8 + 1*vf1`.

I've changed it to 241, which is a valid i8 value, so that the test keeps testing the same thing.
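
The arithmetic behind those numbers can be checked with a small standalone sketch. It assumes, as the reply above does, that the truncated constant can be treated as the trip count; `truncateToI8`, `splitIterations` and `IterationSplit` are hypothetical helpers written for this illustration and are not part of LLVM.

```cpp
#include <cassert>
#include <cstdint>

// Number of iterations handled by the vector loop and by the scalar
// remainder loop when the tail is not folded.
struct IterationSplit {
  uint64_t VectorIters;
  uint64_t ScalarIters;
};

// Truncate a constant to 8 bits, as happens to an i8 immediate in IR.
static uint8_t truncateToI8(uint64_t Value) {
  return static_cast<uint8_t>(Value);
}

// Split TripCount into full vector iterations of width VF plus a
// scalar remainder.
static IterationSplit splitIterations(uint64_t TripCount, uint64_t VF) {
  assert(VF != 0 && "VF must be non-zero");
  return {TripCount / VF, TripCount % VF};
}
```

Under these assumptions, `truncateToI8(12345)` is 57, which splits into 3 vector plus 9 scalar iterations at VF 16 and 7 vector plus 1 scalar iteration at VF 8. A trip count of 241 leaves a single scalar iteration at both widths (15 * 16 + 1 and 30 * 8 + 1).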


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D147720/new/

https://reviews.llvm.org/D147720


