lukel97 wrote: Do we still want to vectorize small trip counts with EVL tail folding? I can also queue up a performance run for this on the BPI-F3 (without tail folding). https://github.com/llvm/llvm-project/pull/132176