[PATCH] D147720: [LV] Use the known trip count when costing non-tail folded VFs

Sander de Smalen via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Mon Apr 24 07:35:13 PDT 2023

sdesmalen added inline comments.

Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:5410-5414
+    auto RTCostB =
+        foldTailByMasking()
+            ? (CostB * divideCeil(MaxTripCount, B.Width.getFixedValue()))
+            : (CostB * (MaxTripCount / B.Width.getFixedValue()) +
+               B.ScalarCost * (MaxTripCount % B.Width.getFixedValue()));
nit: Is it worth using a lambda for this? E.g.

  auto GetCostForTC = [MaxTripCount, this](unsigned VF, InstructionCost VectorCost,
                                           InstructionCost ScalarCost) {
    return foldTailByMasking()
               ? VectorCost * divideCeil(MaxTripCount, VF)
               : VectorCost * (MaxTripCount / VF) + ScalarCost * (MaxTripCount % VF);
  };

  auto RTCostA = GetCostForTC(A.Width.getFixedValue(), CostA, A.ScalarCost);
  auto RTCostB = GetCostForTC(B.Width.getFixedValue(), CostB, B.ScalarCost);
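
To see how the two branches differ, here is a minimal standalone sketch of the same arithmetic outside LLVM. The names `divideCeilSketch` and `costForTripCount` are hypothetical, and costs are plain integers rather than `InstructionCost`. It illustrates the formula only and is not the code in the patch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for llvm::divideCeil: integer division rounded up.
constexpr uint64_t divideCeilSketch(uint64_t Num, uint64_t Den) {
  return (Num + Den - 1) / Den;
}

// Hypothetical helper mirroring the suggested lambda. With tail folding,
// every iteration (including the partial last one) runs as a vector
// iteration. Without it, the remainder runs in the scalar epilogue.
uint64_t costForTripCount(uint64_t TripCount, uint64_t VF, uint64_t VectorCost,
                          uint64_t ScalarCost, bool FoldTail) {
  assert(VF != 0 && "vectorization factor must be non-zero");
  if (FoldTail)
    return VectorCost * divideCeilSketch(TripCount, VF);
  return VectorCost * (TripCount / VF) + ScalarCost * (TripCount % VF);
}
```

For example, with a trip count of 10, VF = 4, a vector cost of 8 and a scalar cost of 3, the tail-folded estimate is 8 * 3 = 24, while the non-folded estimate is 8 * 2 + 3 * 2 = 22.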

Comment at: llvm/test/Transforms/LoopVectorize/AArch64/smallest-and-widest-types.ll:98
   %inc = add nuw nsw i8 %i.08, 1
-  %exitcond.not = icmp eq i8 %inc, 12345
+  %exitcond.not = icmp eq i8 %inc, 241
   br i1 %exitcond.not, label %for.end, label %for.body
I'm curious why this test needed changing. What VF does it pick with 12345?


