[llvm] [LV] Change loops' interleave count computation (PR #73766)

Wed Dec 6 19:03:37 PST 2023

================
@@ -5737,10 +5741,15 @@ LoopVectorizationCostModel::selectInterleaveCount(ElementCount VF,
   // the InterleaveCount as if vscale is '1', although if some information about
   // the vector is known (e.g. min vector size), we can make a better decision.
   if (BestKnownTC) {
-    MaxInterleaveCount =
-        std::min(*BestKnownTC / VF.getKnownMinValue(), MaxInterleaveCount);
-    // Make sure MaxInterleaveCount is greater than 0.
-    MaxInterleaveCount = std::max(1u, MaxInterleaveCount);
+    if (InterleaveSmallLoopScalarReduction ||
+        (*BestKnownTC % VF.getKnownMinValue() == 0))
+      MaxInterleaveCount =
+          std::min(*BestKnownTC / VF.getKnownMinValue(), MaxInterleaveCount);
+    else
+      MaxInterleaveCount = std::min(*BestKnownTC / (VF.getKnownMinValue() * 2),
+                                    MaxInterleaveCount);
+    // Make sure MaxInterleaveCount is greater than 0 & a power of 2.
+    MaxInterleaveCount = llvm::bit_floor(std::max(1u, MaxInterleaveCount));
----------------
nilanjana87 wrote:

Thank you for the suggestion; point taken. I changed the computation as you suggested: when two candidate ICs leave the same residual (tail) loop TC, I select the greater IC. This fixes the `VF: 16, TC range: 33 to 47` case, where IC 2 seems more efficient than IC 1 for the same residual loop TC. For the `VF: 16, TC range: 48 to 63` case, IC 1 is chosen instead, since it minimizes the residual loop TC. However, these changes apply only when the exact TC is known; for estimated TCs we still use the conservative IC of `TC / (VF * 2)`.
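
To make the two cases above concrete, here is a minimal standalone sketch of the selection rule as described in this comment. It is not the code in the patch: `pickInterleaveCount` and its parameters are hypothetical names, it considers only power-of-two ICs, and it assumes the residual loop TC is simply `TC % (VF * IC)`.

```cpp
// Hypothetical helper (not the patch's code): among power-of-two interleave
// counts up to MaxIC, pick the one that leaves the smallest residual loop
// trip count, TC % (VF * IC). On a tie, prefer the larger IC.
// Assumes VF >= 1 and MaxIC >= 1, and that the exact TC is known.
unsigned pickInterleaveCount(unsigned TC, unsigned VF, unsigned MaxIC) {
  unsigned BestIC = 1;
  unsigned BestTail = TC % VF;
  // Only consider ICs for which the interleaved vector body runs at least once.
  for (unsigned IC = 2; IC <= MaxIC && VF * IC <= TC; IC *= 2) {
    unsigned Tail = TC % (VF * IC);
    // "<=" implements the tie-break toward the greater IC.
    if (Tail <= BestTail) {
      BestIC = IC;
      BestTail = Tail;
    }
  }
  return BestIC;
}
```

With `VF = 16`, a TC of 33 to 47 leaves the same tail for IC 1 and IC 2, so the sketch returns 2; a TC of 48 to 63 leaves a smaller tail with IC 1, so it returns 1.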

https://github.com/llvm/llvm-project/pull/73766