[PATCH] D128342: [LoopVectorize] Disable tail-folding when masked interleaved accesses are unavailable

Tue Jun 28 08:10:44 PDT 2022

dmgreen added a comment.

For MVE: a VLD2/VLD4 cannot be predicated, so in that regard we do not support "MaskedInterleavedAccesses". There is code in canTailPredicateLoop that attempts to get that right. Any other interleave group width will be emulated with a gather/scatter, though, which can happily be masked.
https://godbolt.org/z/KzvEqz439
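
For concreteness, here is a minimal sketch of the kind of loop under discussion. It is a hypothetical example (sumPairs and Sums are made-up names, and this is not necessarily the code behind the link above): the two loads per iteration form an interleave group of width 2, which is the shape a VLD2 would cover. Whether a given compiler actually vectorizes it that way depends on the target and flags.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical illustration only: a loop whose loads form a width-2
// interleave group (data[2*i] and data[2*i+1] are adjacent in memory).
struct Sums {
  int32_t even;
  int32_t odd;
};

Sums sumPairs(const int32_t *data, std::size_t n) {
  Sums s{0, 0};
  for (std::size_t i = 0; i < n; ++i) {
    s.even += data[2 * i];     // member 0 of the interleave group
    s.odd += data[2 * i + 1];  // member 1 of the interleave group
  }
  return s;
}
```

If n is not a multiple of the vector width, folding the tail means the last vector iteration has to mask off the extra lanes of both loads, which is where the predication question comes in.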

For SVE, my understanding is that LD2/LD3/LD4 can be predicated, and other widths will use gather/scatter, which can be masked. (Current codegen also uses gather/scatter, as interleaving is not yet supported.) In the long run, SVE may have MaskedInterleavedAccesses returning true.
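
To summarise the two cases, here is a rough model of my understanding as stated above. It is a sketch only: Target, Lowering, lowerInterleaveGroup and canMask are hypothetical names, not LLVM APIs, and the SVE row describes what the instructions allow rather than what current codegen emits.

```cpp
#include <cassert>

// Hypothetical model of the discussion above; none of these names exist in LLVM.
enum class Target { MVE, SVE };

enum class Lowering {
  StructuredUnmasked,  // e.g. MVE VLD2/VLD4: cannot be predicated
  StructuredMasked,    // e.g. SVE LD2/LD3/LD4: can be predicated
  GatherScatterMasked  // emulated with gather/scatter, which can be masked
};

// Assumption: widths not covered by a structured load fall back to
// gather/scatter. For SVE this models the instruction set, not today's
// codegen, which does not yet support interleaving.
inline Lowering lowerInterleaveGroup(Target t, unsigned width) {
  switch (t) {
  case Target::MVE:
    return (width == 2 || width == 4) ? Lowering::StructuredUnmasked
                                      : Lowering::GatherScatterMasked;
  case Target::SVE:
    return (width >= 2 && width <= 4) ? Lowering::StructuredMasked
                                      : Lowering::GatherScatterMasked;
  }
  return Lowering::GatherScatterMasked;
}

// An interleave group is compatible with tail-folding only if its
// lowering can be masked.
inline bool canMask(Target t, unsigned width) {
  return lowerInterleaveGroup(t, width) != Lowering::StructuredUnmasked;
}
```

Under this model, canMask(Target::MVE, 2) and canMask(Target::MVE, 4) are false, while every other combination is true.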

Is the problem that we are trying to use one variable for both NEON and SVE vectorization, where SVE prefers folding the tail and NEON needs to avoid it?

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D128342/new/

https://reviews.llvm.org/D128342