[PATCH] D128342: [LoopVectorize] Disable tail-folding when masked interleaved accesses are unavailable

Tue Jun 28 05:14:05 PDT 2022

david-arm added inline comments.

================
Comment at: llvm/test/Transforms/LoopVectorize/AArch64/sve-tail-folding-interleave.ll:1
+; RUN: opt -loop-vectorize -S -prefer-predicate-over-epilogue=predicate-else-scalar-epilogue \
+; RUN:   -debug < %s 2>%t | FileCheck %s
----------------
fhahn wrote:
> sdesmalen wrote:
> > Why is this loop still vectorized with a VF of 8?
> This needs `REQUIRES: asserts` I think, as it uses `-debug`; same for the other test. Could you precommit the tests, so that only the impact of the patch is shown in this diff?
Hi @sdesmalen, this is actually the behaviour we want: the loop has interleaved memory accesses, which NEON can handle very efficiently. The VF is actually 4, but the loop contains <8 x float> and <12 x float> types because of the deinterleaving loads and interleaving stores, respectively. What this patch does is instruct the vectoriser to disable tail folding, because performance is likely to be terrible, and instead fall back on normal unpredicated vector loops that don't require masked interleaved accesses.
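To illustrate where the wider types come from, here is a minimal C sketch of a loop shape with a factor-2 load group and a factor-3 store group. It is an assumption for illustration only, not the loop in the actual test, and the helper name `wide_lanes` is hypothetical. When each interleave group is vectorized as one wide memory access, every member of the group contributes VF lanes, so with VF = 4 the load group becomes <8 x float> and the store group becomes <12 x float>.

```c
#include <assert.h>

/* Hypothetical loop shape (NOT the loop from the actual test): each iteration
   loads two interleaved floats (a factor-2 load group) and stores three
   interleaved floats (a factor-3 store group). */
void interleave_example(float *restrict dst, const float *restrict src, int n) {
  for (int i = 0; i < n; i++) {
    float a = src[2 * i];     /* load group, member 0 */
    float b = src[2 * i + 1]; /* load group, member 1 */
    dst[3 * i] = a;           /* store group, member 0 */
    dst[3 * i + 1] = b;       /* store group, member 1 */
    dst[3 * i + 2] = a + b;   /* store group, member 2 */
  }
}

/* Hypothetical helper: lanes in the single wide vector used for one
   interleave group, since each of the `factor` members contributes `vf` lanes. */
unsigned wide_lanes(unsigned vf, unsigned factor) {
  assert(vf > 0 && factor >= 2); /* an interleave group has at least 2 members */
  return vf * factor;
}
```

With VF = 4, `wide_lanes(4, 2)` is 8 and `wide_lanes(4, 3)` is 12, matching the <8 x float> and <12 x float> types mentioned above. Under tail folding, each of these wide accesses would need a mask, which is exactly the masked interleaved access this patch avoids relying on when it is unavailable.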

Repository:
  rG LLVM Github Monorepo


https://reviews.llvm.org/D128342