[PATCH] D147720: [LV] Use the known trip count when costing non-tail folded VFs
Dave Green via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Mon Apr 24 07:38:50 PDT 2023
dmgreen added inline comments.
================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:5410-5414
+ auto RTCostB =
+ foldTailByMasking()
+ ? (CostB * divideCeil(MaxTripCount, B.Width.getFixedValue()))
+ : (CostB * (MaxTripCount / B.Width.getFixedValue()) +
+ B.ScalarCost * (MaxTripCount % B.Width.getFixedValue()));
----------------
sdesmalen wrote:
> nit: Is it worth using a lambda for this, e.g.
>
> auto GetCostForTC = [MaxTripCount, this](unsigned VF, InstructionCost VectorCost,
> InstructionCost ScalarCost) {
> return foldTailByMasking() ?
> VectorCost * divideCeil(MaxTripCount, VF) :
> VectorCost * (MaxTripCount / VF) + ScalarCost * (MaxTripCount % VF);
> };
>
> auto RTCostA = GetCostForTC(A.Width.getFixedValue(), CostA, A.ScalarCost);
> auto RTCostB = GetCostForTC(B.Width.getFixedValue(), CostB, B.ScalarCost);
Sounds good, I'll do that now.
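
For reference, a standalone sketch of the cost formula being discussed, pulled out of LoopVectorize.cpp so it can be compiled on its own. The names `divideCeilSketch` and `costForTripCount` are hypothetical, plain `uint64_t` stands in for `InstructionCost`, and the `FoldTail` flag stands in for `foldTailByMasking()`; this is not the code that landed in the patch.

```cpp
#include <cstdint>

// Hypothetical stand-in for llvm::divideCeil: integer division rounded up.
constexpr uint64_t divideCeilSketch(uint64_t Num, uint64_t Den) {
  return (Num + Den - 1) / Den;
}

// Estimated cost of executing TripCount iterations at vectorization factor VF.
// With tail folding, every iteration (including the partial last one) runs in
// the masked vector body. Without it, the remainder runs in the scalar loop.
constexpr uint64_t costForTripCount(uint64_t TripCount, uint64_t VF,
                                    uint64_t VectorCost, uint64_t ScalarCost,
                                    bool FoldTail) {
  return FoldTail
             ? VectorCost * divideCeilSketch(TripCount, VF)
             : VectorCost * (TripCount / VF) + ScalarCost * (TripCount % VF);
}
```

With made-up costs (vector body 10, scalar iteration 4), a trip count of 57 at VF 16 comes out as 10*3 + 4*9 without tail folding, and as 10*4 with it.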
================
Comment at: llvm/test/Transforms/LoopVectorize/AArch64/smallest-and-widest-types.ll:98
%inc = add nuw nsw i8 %i.08, 1
- %exitcond.not = icmp eq i8 %inc, 12345
+ %exitcond.not = icmp eq i8 %inc, 241
br i1 %exitcond.not, label %for.end, label %for.body
----------------
sdesmalen wrote:
> I'm curious why this test needed changing. What VF does it pick with 12345?
12345 as an i8 is really 57 (12345 mod 256), which left too many scalar iterations to pick v16 over v8. It is `3*vf16 + 9*vf1` vs `7*vf8 + 1*vf1`.
I've changed it to 241, which is a valid i8 value, so that the test keeps testing the same thing.
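
The arithmetic above can be checked with a small sketch. The helper names `wrapToI8` and `splitIterations` are hypothetical, and it assumes the loop's trip count equals the compare constant after truncation to 8 bits.

```cpp
#include <cstdint>

// Number of vector-body iterations and leftover scalar iterations.
struct IterationSplit {
  uint64_t VectorIters;
  uint64_t ScalarIters;
};

// Truncate a constant to 8 bits, as happens when it is used as an i8.
constexpr uint64_t wrapToI8(uint64_t Value) { return Value & 0xFF; }

// Split a trip count into full vector iterations and a scalar remainder
// (the non-tail-folded case).
constexpr IterationSplit splitIterations(uint64_t TripCount, uint64_t VF) {
  return {TripCount / VF, TripCount % VF};
}
```

Under that assumption, `wrapToI8(12345)` is 57, which splits into 3 vector plus 9 scalar iterations at VF 16 and 7 plus 1 at VF 8. With 241 the remainder is a single scalar iteration at both widths (15 plus 1 at VF 16, 30 plus 1 at VF 8).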
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D147720/new/
https://reviews.llvm.org/D147720