[PATCH] D142109: [LoopVectorize] Remove runtime check and scalar tail loop when tail-folding.

Wed Jan 25 01:45:39 PST 2023

david-arm added a comment.

This is a nice patch @sdesmalen! It's great to remove the overflow checks and scalar loop - a double win for performance and code size. :) I just had a comment about ways we can avoid overflow checks completely for low trip counts.

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:8747
+
+    auto *TCMinusVF =
+        new VPInstruction(VPInstruction::CalculateTripCountMinusVF, {TC}, DL);
----------------
I wonder if it's possible to avoid this extra work in the preheader if we know that we can never overflow? For example, I'm thinking of a loop like this:

  void foo(int* __restrict__ dst, int* __restrict__ src) {
    for (long i = 0; i < 7; i++) {
      dst[i] = 1000 / src[i];
    }
  }

where we will vectorise with tail-folding (assuming the unroller hasn't already destroyed the vectorisation opportunity!). For vscale=2 the whole loop is actually just a single vector iteration, so the additional work in the preheader becomes proportionally more significant.
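
To make the arithmetic concrete, here is a rough standalone sketch of what I have in mind. It is not code from the patch: the helper names are hypothetical, and it assumes that CalculateTripCountMinusVF computes TC - Step clamped at zero, that the step is VF * UF with VF = vscale x 4 and UF = 1 for the int loop above, and that the overflow check being removed is roughly "UINT_MAX - TC < Step".

```cpp
#include <cstdint>
#include <limits>

// Hypothetical model of CalculateTripCountMinusVF (assumption):
// TC - Step, clamped at zero so it never wraps.
constexpr uint64_t tripCountMinusStep(uint64_t TC, uint64_t Step) {
  return TC > Step ? TC - Step : 0;
}

// Hypothetical model of the overflow check (assumption): true if advancing
// the induction variable past TC in increments of Step could wrap.
constexpr bool inductionMayWrap(uint64_t TC, uint64_t Step) {
  return std::numeric_limits<uint64_t>::max() - TC < Step;
}

// Number of vector iterations with tail-folding: ceil(TC / Step).
// Step is assumed to be non-zero.
constexpr uint64_t vectorIterations(uint64_t TC, uint64_t Step) {
  return TC / Step + (TC % Step != 0 ? 1 : 0);
}
```

For the loop above, TC = 7 and Step = 4 * 2 = 8 when vscale=2, so under these assumptions the loop cannot wrap, the clamped value is always zero and only one vector iteration runs. With a constant trip count both facts are known at compile time, so in principle nothing would need to be computed in the preheader.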

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D142109/new/

https://reviews.llvm.org/D142109