[llvm] [LV] Change loops' interleave count computation (PR #73766)
Nilanjana Basu via llvm-commits
llvm-commits at lists.llvm.org
Wed Dec 6 19:03:37 PST 2023
================
@@ -5737,10 +5741,15 @@ LoopVectorizationCostModel::selectInterleaveCount(ElementCount VF,
// the InterleaveCount as if vscale is '1', although if some information about
// the vector is known (e.g. min vector size), we can make a better decision.
if (BestKnownTC) {
- MaxInterleaveCount =
- std::min(*BestKnownTC / VF.getKnownMinValue(), MaxInterleaveCount);
- // Make sure MaxInterleaveCount is greater than 0.
- MaxInterleaveCount = std::max(1u, MaxInterleaveCount);
+ if (InterleaveSmallLoopScalarReduction ||
+ (*BestKnownTC % VF.getKnownMinValue() == 0))
+ MaxInterleaveCount =
+ std::min(*BestKnownTC / VF.getKnownMinValue(), MaxInterleaveCount);
+ else
+ MaxInterleaveCount = std::min(*BestKnownTC / (VF.getKnownMinValue() * 2),
+ MaxInterleaveCount);
+ // Make sure MaxInterleaveCount is greater than 0 & a power of 2.
+ MaxInterleaveCount = llvm::bit_floor(std::max(1u, MaxInterleaveCount));
----------------
nilanjana87 wrote:
Thank you for the suggestion; point taken. I changed the computation as you suggested: when the two candidate ICs leave the same remainder (tail) TC, I select the greater one. This handles the `VF: 16, TC range: 33 to 47` case, where IC 2 seems more efficient than IC 1 for the same residual loop TC. For the `VF: 16, TC range: 48 to 63` case, IC 1 is chosen because it minimizes the residual loop TC. However, these changes apply only when the exact TC is known; for estimated TCs we still use the conservative IC of `TC / (VF * 2)`.
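
To make the two ranges above concrete, here is a minimal standalone sketch of the selection rule as described in this comment. It is an illustration under stated assumptions, not the code in the patch: `pickInterleaveCount`, `floorPowerOf2`, and the `TCIsExact` flag are hypothetical names, `floorPowerOf2` stands in for `llvm::bit_floor`, and `MaxIC` stands in for the target-derived upper bound on the interleave count.

```cpp
#include <algorithm>
#include <cassert>

// Largest power of two <= X, for X >= 1.
// Hypothetical stand-in for llvm::bit_floor so the sketch is self-contained.
static unsigned floorPowerOf2(unsigned X) {
  unsigned P = 1;
  while (P <= X / 2)
    P *= 2;
  return P;
}

// Hypothetical helper illustrating the rule described in the comment.
// TC: trip count, VF: known minimum vector width,
// MaxIC: upper bound on the interleave count (assumed given),
// TCIsExact: true if TC is known exactly, false if it is only an estimate.
unsigned pickInterleaveCount(unsigned TC, unsigned VF, unsigned MaxIC,
                             bool TCIsExact) {
  assert(VF > 0 && MaxIC > 0 && "VF and MaxIC must be positive");

  // Conservative candidate: TC / (VF * 2), clamped to [1, MaxIC]
  // and rounded down to a power of 2.
  unsigned ICLow =
      floorPowerOf2(std::max(1u, std::min(TC / (VF * 2), MaxIC)));

  // For estimated trip counts, keep the conservative choice.
  if (!TCIsExact)
    return ICLow;

  // Aggressive candidate: TC / VF, clamped and rounded the same way.
  unsigned ICHigh = floorPowerOf2(std::max(1u, std::min(TC / VF, MaxIC)));

  // Iterations left for the residual loop under each candidate.
  unsigned TailHigh = TC % (VF * ICHigh);
  unsigned TailLow = TC % (VF * ICLow);

  // Same tail: take the greater IC. Otherwise keep the lower IC,
  // which leaves the smaller residual loop in the cases discussed above.
  return TailHigh == TailLow ? ICHigh : ICLow;
}
```

With `VF = 16` and `MaxIC = 8`, a TC of 40 leaves a tail of 8 for both candidates, so the sketch returns IC 2; a TC of 48 leaves a tail of 16 with IC 2 but 0 with IC 1, so it returns IC 1.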
https://github.com/llvm/llvm-project/pull/73766