[llvm] [LV] Incorporate trip counts into selection of scalable VFs (PR #80926)

Wed Feb 21 08:36:12 PST 2024

================
@@ -4933,11 +4933,17 @@ bool LoopVectorizationPlanner::isMoreProfitable(
       EstimatedWidthB *= *VScale;
   }
 
+  if (MaxTripCount > 0) {
+    EstimatedWidthA = std::min(EstimatedWidthA, MaxTripCount);
----------------
preames wrote:

Thought about this a bit, and I don't think your proposal is the right way to go about this. Note that the code you pointed to already doesn't kick in (even for fixed-length vectors) if we can tail-fold the loop. That seems like the right behavior for the hypothetical case where we have only one (scalable) register class, it is known to exceed the trip count, and we have support for masking.

My change is specifically a costing change. For the hypothetical case above, my change still allows the loop to be vectorized; it just doesn't overcount the profit of doing so. Your proposed change would not allow it, since the scalable VF corresponding to our single register class would not be considered legal.

A worthwhile question is whether, after this change lands, we can remove the existing TripCount reduction logic from getMaximizedVFForTarget. I'm not sure about this, but I think we can, at least for the power-of-two logic (i.e. keep the legality check only for the case where we *can't* legally use the sole register class in my example).

I do think the code you pointed to should be updated to ask TTI for the minimum vscale, but that's essentially orthogonal.


https://github.com/llvm/llvm-project/pull/80926