[PATCH] D30247: Epilog loop vectorization

Tue Jan 29 09:08:09 PST 2019

rengolin added a comment.

In D30247#1375520 <https://reviews.llvm.org/D30247#1375520>, @Ayal wrote:

> BTW, targets that have efficient masked operations may also find it useful for cycles and size to vectorize (including epilogs) under -Os, and following r345705 possibly also with enable-masked-interleaved-mem-accesses turned on.

I agree, but I think this approach is a tad too heavy for masked operations.

In SVE, the masking will happen naturally and the last loop will just have the remaining elements as a consequence of the ISA design.

In non-scalable vector extensions, that's not true, so it would need some emulation. But I wouldn't treat it as such a special case, nor guard it with so many checks.

I imagine that, if vectorisation at the full vector length is legal, then it is also legal at a smaller length. The smaller length may not be profitable on its own, but the fact that it continues the existing patterns (before moving results into scalar registers) will probably make it cheaper than a scalar tail.
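To make the "smaller length" idea concrete, here is a minimal sketch of how a trip count could be divided between a full-width loop, a narrower vector epilog and a scalar remainder. The names `TripSplit`, `splitTripCount`, `mainVF` and `epilogVF` are hypothetical and are not taken from the patch; the sketch only does the arithmetic and says nothing about legality or cost.

```cpp
#include <cstddef>

// Hypothetical helper: how many iterations each loop would execute.
struct TripSplit {
  std::size_t mainIters;   // iterations of the full-width vector loop
  std::size_t epilogIters; // iterations of the narrower vector epilog
  std::size_t scalarIters; // leftover iterations handled one element at a time
};

// Assumes mainVF and epilogVF are non-zero and epilogVF <= mainVF.
TripSplit splitTripCount(std::size_t n, std::size_t mainVF,
                         std::size_t epilogVF) {
  TripSplit s{};
  s.mainIters = n / mainVF;               // full-width steps
  const std::size_t rem = n % mainVF;     // elements left after the main loop
  s.epilogIters = rem / epilogVF;         // narrower vector steps
  s.scalarIters = rem % epilogVF;         // what remains for the scalar tail
  return s;
}
```

For n = 23, mainVF = 8 and epilogVF = 4 this gives two main iterations, one epilog iteration and three scalar iterations, compared with seven scalar iterations if there were no vector epilog.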

This could either be some arithmetic on the mask computation (which would occur on every iteration of the loop + 1) or a tail loop with the same mask computation duplicated from the one just vectorised. I imagine -O3 would have different trade-offs than -Os, so we could potentially have both solutions.
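The two options can be sketched with a scalar emulation of masked vector operations. This is only an illustration under assumptions: the fixed width `VF`, the `Mask` type and the functions `makeMask`, `maskedAdd`, `addFolded` and `addWithMaskedTail` are all hypothetical names, and the return value simply counts how many times a mask is computed so the trade-off is visible.

```cpp
#include <array>
#include <cstddef>

constexpr std::size_t VF = 4; // hypothetical fixed vector width, in lanes

using Mask = std::array<bool, VF>;

// Lane i is active when base + i < n.
Mask makeMask(std::size_t base, std::size_t n) {
  Mask m{};
  for (std::size_t lane = 0; lane < VF; ++lane)
    m[lane] = base + lane < n;
  return m;
}

// One emulated masked vector step: out = a + b on the active lanes only.
void maskedAdd(const int *a, const int *b, int *out, std::size_t base,
               const Mask &m) {
  for (std::size_t lane = 0; lane < VF; ++lane)
    if (m[lane])
      out[base + lane] = a[base + lane] + b[base + lane];
}

// Option 1: the mask is recomputed on every iteration; no separate tail.
// Returns the number of mask computations performed.
std::size_t addFolded(const int *a, const int *b, int *out, std::size_t n) {
  std::size_t maskComputations = 0;
  for (std::size_t i = 0; i < n; i += VF) {
    const Mask m = makeMask(i, n);
    ++maskComputations;
    maskedAdd(a, b, out, i, m);
  }
  return maskComputations;
}

// Option 2: an unmasked full-width main loop, then one masked tail step.
// Returns the number of mask computations performed (0 or 1).
std::size_t addWithMaskedTail(const int *a, const int *b, int *out,
                              std::size_t n) {
  Mask allTrue{};
  allTrue.fill(true); // stands in for an unpredicated vector operation
  std::size_t i = 0;
  for (; i + VF <= n; i += VF)
    maskedAdd(a, b, out, i, allTrue);
  std::size_t maskComputations = 0;
  if (i < n) {
    const Mask m = makeMask(i, n);
    ++maskComputations;
    maskedAdd(a, b, out, i, m);
  }
  return maskComputations;
}
```

Both functions produce the same results. With n = 10 and VF = 4, the first computes a mask three times and emits a single loop, while the second computes it once but needs a second loop body, which is roughly the cycles-versus-size choice between -O3 and -Os.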

Repository:
  rL LLVM

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D30247/new/

https://reviews.llvm.org/D30247