[all-commits] [llvm/llvm-project] bdaf16: [LoopVectorize] Refine runtime memory check costs ...

Mon Jan 29 15:05:57 PST 2024

  Branch: refs/heads/release/18.x
  Home:   https://github.com/llvm/llvm-project
  Commit: bdaf16d59f4a64529371cbe056245f6cc035d7cf
      https://github.com/llvm/llvm-project/commit/bdaf16d59f4a64529371cbe056245f6cc035d7cf
  Author: David Sherwood <57997763+david-arm at users.noreply.github.com>
  Date:   2024-01-29 (Mon, 29 Jan 2024)

  Changed paths:
    M llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
    A llvm/test/Transforms/LoopVectorize/AArch64/low_trip_memcheck_cost.ll

  Log Message:
  -----------
  [LoopVectorize] Refine runtime memory check costs when there is an outer loop (#76034)

When we generate runtime memory checks for an inner loop, the
checks may be invariant in the outer loop and so get hoisted out
of it. In such cases, the effective cost of the checks should be
reduced to reflect the outer loop trip count.
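
As a rough illustration of the amortisation described above, the
following sketch divides the cost of a hoisted check by the outer
loop trip count. The function name, its parameters, the fallback
trip count of 2 and the minimum cost of 1 are all assumptions made
for this example; this is not the code in LoopVectorize.cpp.

```cpp
#include <algorithm>
#include <cstdint>
#include <optional>

// Hypothetical helper (not an LLVM API): estimate the per-inner-loop cost
// of runtime memory checks that may be hoisted out of an outer loop.
std::uint64_t amortizedCheckCost(std::uint64_t checkCost,
                                 std::optional<std::uint64_t> outerTripCount,
                                 bool invariantInOuterLoop) {
  // Checks that vary with the outer loop run on every outer iteration,
  // so their full cost applies each time the inner loop is entered.
  if (!invariantInOuterLoop || checkCost == 0)
    return checkCost;

  // Assumption for this sketch: with no trip count information, treat the
  // outer loop as executing twice.
  std::uint64_t tripCount = outerTripCount.value_or(2);
  if (tripCount == 0)
    tripCount = 1;

  // Spread the one-off cost over the outer iterations, but never report a
  // non-zero check as free.
  return std::max<std::uint64_t>(checkCost / tripCount, 1);
}
```

With these assumptions, a check costing 40 inside an outer loop that
runs 8 times is treated as costing 5 per inner loop execution, which
makes vectorising a low trip count inner loop look more profitable.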

This fixes a 25% performance regression in the SPEC2017 x264
benchmark built with PGO, introduced by commit
49b0e6dcc296792b577ae8f0f674e61a0929b99d, where we decided the
inner loop trip count wasn't high enough to warrant the
(incorrectly) high cost of the runtime checks. Also, when runtime
memory checks consist entirely of diff checks, they are likely to
be outer loop invariant.
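
To make the term concrete, a diff check can be sketched as an
unsigned comparison of the distance between two start addresses
against the number of bytes touched by one vector iteration. The
helper name and parameters below are assumptions for illustration,
not LLVM's implementation. The result depends only on the distance
between the two pointers, so if both advance by the same amount on
each outer iteration the check gives the same answer every time.

```cpp
#include <cstdint>

// Hypothetical helper (not an LLVM API): returns true when the sink access
// starts fewer than vf * uf * accessSize bytes after the source access, in
// which case one vector iteration could overlap and the scalar loop is used.
bool diffCheckConflicts(std::uint64_t sinkStart, std::uint64_t srcStart,
                        std::uint64_t vf, std::uint64_t uf,
                        std::uint64_t accessSize) {
  // Unsigned subtraction wraps to a very large value when the sink starts
  // before the source, so that case compares as "no conflict".
  std::uint64_t distance = sinkStart - srcStart;
  return distance < vf * uf * accessSize;
}
```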

(cherry picked from commit 962fbafecf4730ba84a3b9fd7a662a5c30bb2c7c)