[llvm] [LoopVectorize] Don't discount instructions scalarized due to tail folding (PR #109289)
David Sherwood via llvm-commits
llvm-commits at lists.llvm.org
Mon Dec 9 08:48:14 PST 2024
david-arm wrote:
Hi @john-brawn-arm, thanks for taking the time to look into the
example I mentioned in my previous comment. However, this still
isn't my preferred approach to solving issue https://github.com/llvm/llvm-project/issues/66652
because it uses a performance-based cost model (which can be
unreliable) to try to solve what looks to me like a functional
defect. Here are two examples of problems with this:
1. As you mentioned yourself, there are still loops where we
vectorise with predicated blocks at -Os, so this patch is
only a partial fix. A more complete solution would involve
dealing with the root cause - @fhahn gave a suggestion about
how to do this. Did you get a chance to look at that and see
whether it's viable or easy to do?
2. The change in this PR is effectively dealing with two
aspects of tail-folding simultaneously: a) low trip count loops
when building with -O2 or above, and b) normal loops when
building with -Os. The point I was trying to make in my
previous message (and I realise now I may not have made this
very clear) is that the two could potentially be in
conflict with each other. I don't think it's impossible to
come up with a loop where the vectorised version with
predicated blocks is faster than the scalar one, whilst
simultaneously being bad for code size. There is a chance
that someone later reverts this PR for exactly this
reason.
Ideally we'd tackle the code size issue presented in #66652
and the low trip count performance issues with separate PRs,
using solutions that aren't tied to each other. I'd be
interested to know what @fhahn thinks as well here.
https://github.com/llvm/llvm-project/pull/109289