[PATCH] D128342: [LoopVectorize] Disable tail-folding when masked interleaved accesses are unavailable
David Sherwood via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Tue Jun 28 05:14:05 PDT 2022
david-arm added inline comments.
================
Comment at: llvm/test/Transforms/LoopVectorize/AArch64/sve-tail-folding-interleave.ll:1
+; RUN: opt -loop-vectorize -S -prefer-predicate-over-epilogue=predicate-else-scalar-epilogue \
+; RUN: -debug < %s 2>%t | FileCheck %s
----------------
fhahn wrote:
> sdesmalen wrote:
> > Why is this loop still vectorized with a VF of 8?
> This needs `REQUIRES: asserts` I think, as it uses `-debug`; the same goes for the other test. Could you precommit the tests, so that only the impact of the patch is shown in this diff?
Hi @sdesmalen, this is actually what we want to happen, because this is a loop with interleaved memory accesses, which NEON can handle very efficiently. It's actually a VF of 4, but the loop contains <8 x float> and <12 x float> types because of the deinterleaving loads and interleaving stores, respectively. What this patch does is instruct the vectoriser to disable tail folding, because without masked interleaved accesses the performance of a tail-folded loop is likely to be terrible. Instead, it falls back on normal unpredicated vector loops that don't require masked interleaved accesses.
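To make the VF 4 vs. <8 x float>/<12 x float> point concrete, here is a minimal C sketch. It assumes a hypothetical loop (not the actual test's IR) that reads a pair of floats per iteration (interleave factor 2) and writes a triple (interleave factor 3). At VF 4, one vector iteration loads 2 * 4 = 8 floats and deinterleaves them, then interleaves three 4-lane results into a 12-float store. Plain arrays stand in for vector registers. The leftover iterations run in a scalar epilogue, which is the kind of unpredicated fallback described above, so no masked interleaved access is needed.

```c
#include <stddef.h>

#define VF 4 /* assumed vectorization factor, matching the VF of 4 above */

/* Hypothetical scalar loop: stride-2 load group, stride-3 store group. */
void scalar_ref(const float *in, float *out, size_t n) {
    for (size_t i = 0; i < n; i++) {
        float a = in[2 * i], b = in[2 * i + 1];
        out[3 * i]     = a + b;
        out[3 * i + 1] = a - b;
        out[3 * i + 2] = a * b;
    }
}

/* Sketch of an unpredicated vector loop plus scalar epilogue. */
void vectorized_sketch(const float *in, float *out, size_t n) {
    size_t i = 0;
    for (; i + VF <= n; i += VF) {
        float wide_load[2 * VF];          /* models one <8 x float> load */
        float a[VF], b[VF];               /* deinterleaved <4 x float> members */
        float sum[VF], diff[VF], prod[VF];
        float wide_store[3 * VF];         /* models one <12 x float> store */

        for (size_t k = 0; k < 2 * VF; k++)
            wide_load[k] = in[2 * i + k];
        for (size_t l = 0; l < VF; l++) { /* deinterleave: even/odd lanes */
            a[l] = wide_load[2 * l];
            b[l] = wide_load[2 * l + 1];
        }
        for (size_t l = 0; l < VF; l++) { /* lane-wise arithmetic */
            sum[l]  = a[l] + b[l];
            diff[l] = a[l] - b[l];
            prod[l] = a[l] * b[l];
        }
        for (size_t l = 0; l < VF; l++) { /* interleave three members */
            wide_store[3 * l]     = sum[l];
            wide_store[3 * l + 1] = diff[l];
            wide_store[3 * l + 2] = prod[l];
        }
        for (size_t k = 0; k < 3 * VF; k++)
            out[3 * i + k] = wide_store[k];
    }
    /* Scalar epilogue for the remaining n % VF iterations. Tail folding
       would instead run one more vector iteration with masked versions of
       the wide load and store above. */
    for (; i < n; i++) {
        float a = in[2 * i], b = in[2 * i + 1];
        out[3 * i]     = a + b;
        out[3 * i + 1] = a - b;
        out[3 * i + 2] = a * b;
    }
}
```

With n = 7, one vector iteration covers i = 0..3 and the epilogue covers i = 4..6. The results match `scalar_ref` exactly, because each element is computed with the same operations in the same order.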
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D128342/new/
https://reviews.llvm.org/D128342