[PATCH] D121899: [LoopVectorize] Optimise away the icmp when tail-folding for some low trip counts

David Sherwood via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Thu Mar 17 03:38:03 PDT 2022


david-arm created this revision.
david-arm added reviewers: sdesmalen, kmclaughlin, frasercrmck, dmgreen.
Herald added subscribers: pengfei, rogfer01, hiraditya.
Herald added a project: All.
david-arm requested review of this revision.
Herald added subscribers: llvm-commits, vkmr.
Herald added a project: LLVM.

For low trip counts the vectoriser will attempt to create a single
predicated loop that folds the scalar tail into the vector body. For
some combinations of trip count and VF it is possible to determine at
compile time that the vector loop will execute only a single
iteration. In that case we can avoid creating the comparison at the
end of the loop and branch unconditionally to the loop exit. This
improves code quality for small loops with low trip counts, because
the compare + branch adds a relatively high cost to the loop.

This optimisation may also apply to unpredicated vector loops with
low trip counts, hence the change in test X86/pr42674.ll.
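
As a rough illustration of the single-iteration condition described
above, here is a minimal sketch and not the code from the patch. It
assumes a known constant trip count TC, a fixed-width VF and an
interleave count UF, and that a tail-folded loop processes VF * UF
elements per iteration. The helper names vectorIterationsTailFolded
and hasSingleVectorIteration are hypothetical and do not exist in
LoopVectorize.cpp.

```cpp
#include <cstdint>

// Hypothetical helper (not from the patch): number of iterations of a
// tail-folded vector loop for a known constant trip count TC. Each
// iteration covers VF * UF elements and the final one may be partially
// masked, so the division rounds up. Assumes VF and UF are non-zero.
constexpr std::uint64_t vectorIterationsTailFolded(std::uint64_t TC,
                                                   std::uint64_t VF,
                                                   std::uint64_t UF) {
  return (TC + VF * UF - 1) / (VF * UF);
}

// Hypothetical helper: true when the vector loop is known at compile
// time to run exactly once, which is the case where the latch compare
// is redundant and the branch can go straight to the exit.
constexpr bool hasSingleVectorIteration(std::uint64_t TC, std::uint64_t VF,
                                        std::uint64_t UF) {
  return TC != 0 && VF != 0 && UF != 0 &&
         vectorIterationsTailFolded(TC, VF, UF) == 1;
}

// A trip count of 3 with VF = 4 fits in one predicated iteration.
static_assert(hasSingleVectorIteration(3, 4, 1), "one masked iteration");
// A trip count of 5 with VF = 4 needs a second iteration.
static_assert(!hasSingleVectorIteration(5, 4, 1), "two iterations needed");
```

For scalable vectors the same reasoning would presumably use the known
minimum VF (vscale = 1): if TC fits in one iteration at the smallest
vector length, it also fits at any larger one.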


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D121899

Files:
  llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
  llvm/lib/Transforms/Vectorize/VPlan.cpp
  llvm/test/Transforms/LoopVectorize/AArch64/sve-low-trip-count.ll
  llvm/test/Transforms/LoopVectorize/X86/constant-fold.ll
  llvm/test/Transforms/LoopVectorize/X86/outer_loop_test1_no_explicit_vect_width.ll
  llvm/test/Transforms/LoopVectorize/X86/pr34438.ll
  llvm/test/Transforms/LoopVectorize/X86/pr42674.ll

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D121899.416124.patch
Type: text/x-patch
Size: 8601 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20220317/fa3b5b72/attachment-0001.bin>

