[llvm] [LV] optimize VF for low TC, when tail-folding (PR #91253)

Fri May 24 06:29:58 PDT 2024

artagnon wrote:

> I'll try and add a test case. There are some examples in https://godbolt.org/z/3aa81Yaqj of things that I thought might be more expensive for smaller vector lengths, even if the costs for X86 don't always show it (to be fair, I think that might be more about the loads/stores than the instructions in between in that case). Those are unpredicated, and some of the codegen could definitely be better, but if you imagine predicated loads/stores as well, vectors that are too small could be difficult to codegen. MVE certainly has some peculiarities with it being low-power, but I don't think it is especially different, other than that it makes heavy use of (tail-)predicated vectorization. Short vectors in general do not always get looked at as much as longer ones, as they usually come up less.

Okay, I'll wait for your test case then.

> My understanding is that the code in
> 
> https://github.com/llvm/llvm-project/blob/a38f0157f2a9efcae13b691c63723426e8adc0ee/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp#L4885
> 
> handles the costs, and will account for the trip count so long as it is known (and the vectors are fixed-width). Perhaps that could be extended to scalable vectors in some way too?

Thanks for the suggestion! See #93300.
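
For anyone following along, here is a rough sketch of what "accounting for the trip count" could look like when comparing fixed-width VFs. It is not the code at the link above: `Candidate`, `costForTripCount`, and `pickVF` are hypothetical names, and the per-iteration costs are made-up inputs rather than anything a target cost model returned. The assumption is that a tail-folded loop executes ceil(TC / VF) vector iterations, while a loop with a scalar epilogue executes TC / VF vector iterations plus TC % VF scalar ones.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical candidate: a fixed-width VF and the assumed cost of one
// vector iteration at that VF. Not an LLVM data structure.
struct Candidate {
  unsigned VF;
  uint64_t VectorCost;
};

// Total estimated cost of running a loop with known trip count TC at VF.
// With tail folding, every iteration is a (masked) vector iteration, so the
// count is rounded up. Otherwise the remainder runs in a scalar epilogue.
uint64_t costForTripCount(uint64_t TC, unsigned VF, uint64_t VectorCost,
                          uint64_t ScalarCost, bool FoldTail) {
  assert(VF > 0 && "VF must be non-zero");
  if (FoldTail)
    return VectorCost * ((TC + VF - 1) / VF);
  return VectorCost * (TC / VF) + ScalarCost * (TC % VF);
}

// Return the VF with the lowest total cost; ties keep the earlier candidate.
unsigned pickVF(uint64_t TC, const std::vector<Candidate> &Cands,
                uint64_t ScalarCost, bool FoldTail) {
  assert(!Cands.empty() && "need at least one candidate");
  unsigned BestVF = Cands.front().VF;
  uint64_t BestCost = costForTripCount(TC, BestVF, Cands.front().VectorCost,
                                       ScalarCost, FoldTail);
  for (const Candidate &C : Cands) {
    uint64_t Cost =
        costForTripCount(TC, C.VF, C.VectorCost, ScalarCost, FoldTail);
    if (Cost < BestCost) {
      BestCost = Cost;
      BestVF = C.VF;
    }
  }
  return BestVF;
}
```

With a low trip count, the rounding in the tail-folded case is what decides the comparison: a VF that covers the whole trip count in one masked iteration can beat a smaller VF that needs two. A scalable-vector version would presumably need an estimate of the runtime vector length in place of the fixed VF.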

https://github.com/llvm/llvm-project/pull/91253