[llvm] [LV] optimize VF for low TC, when tail-folding (PR #91253)
Ramkumar Ramachandra via llvm-commits
llvm-commits at lists.llvm.org
Wed May 22 10:35:54 PDT 2024
artagnon wrote:
> Hi. One quick question - If there is a target that can vectorize naturally at, say v16i8, but has more trouble vectorizing at v8i8 or v4i8, then why should the vectorizer not be considering larger vector factors? So long as it's a single iteration, could the higher factor not be lower cost?
>
> I doubt it should matter a lot, but MVE has 128bit vector lengths and smaller factors can be less natural for it. Does the cost models not already handle picking the best factor?
VPlan already picks the best VF based on the cost model. As a rule of thumb, a larger VF is more expensive than a smaller one, but there are cases where the larger VF is the cheaper one: it depends on the target and the instruction itself. Here's an example from the tree:
```llvm
; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %v4i64 = call <4 x i64> @llvm.bswap.v4i64(<4 x i64> undef)
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %v3i32 = call <3 x i32> @llvm.bswap.v3i32(<3 x i32> undef)
```
Now, the patch itself modifies `getMaximizedVFForTarget`, which is called by `computeFeasibleMaxVF`, which in turn gets called in two cases:
1. We need to generate a scalar epilogue.
2. We need to fold the tail by masking.
The purpose of `getMaximizedVFForTarget` is to pick a VF that is no larger than necessary to fit the elements. At the end of the function, all cost-modeling decisions are invalidated. When the trip count is known, and is less than or equal to the number of minimum-width elements that fit in the widest register, returning a VF that ignores the trip count is wasteful, bordering on a logical error.
In the scalar-epilogue case, we correctly take the `bit_floor` of the TC. However, as a historical accident of different people fixing different things in this code at different times, we don't take the TC into account when folding the tail by masking. This patch fixes that.
https://github.com/llvm/llvm-project/pull/91253