[all-commits] [llvm/llvm-project] 90d09e: [LoopPeel] Allow peeling with multiple unreachable...

Wed Aug 25 05:27:58 PDT 2021

  Branch: refs/heads/main
  Home:   https://github.com/llvm/llvm-project
  Commit: 90d09eb300dbd68099715a5f70e804225adfa471
      https://github.com/llvm/llvm-project/commit/90d09eb300dbd68099715a5f70e804225adfa471
  Author: Florian Hahn <flo at fhahn.com>
  Date:   2021-08-25 (Wed, 25 Aug 2021)

  Changed paths:
    M llvm/lib/Transforms/Utils/LoopPeel.cpp
    M llvm/test/Transforms/LoopUnroll/peel-loop-pgo-deopt-idom-2.ll
    M llvm/test/Transforms/LoopUnroll/peel-loop-pgo-deopt-idom.ll
    M llvm/test/Transforms/LoopUnroll/peel-loop-pgo-deopt.ll
    M llvm/test/Transforms/LoopUnroll/peel-multiple-unreachable-exits.ll

  Log Message:
  -----------
  [LoopPeel] Allow peeling with multiple unreachable-terminated exit blocks.

Support for peeling with multiple exit blocks was added in D63921/77bb3a486fa6.

So far it has only been enabled for loops where all non-latch exits are
'de-optimizing' exits (D63923). But peeling of multi-exit loops can be
highly beneficial in other cases too, like if all non-latch exiting
blocks are unreachable.

The motivating case are loops with runtime checks, like the C++ example
below. The main issue preventing vectorization is that the invariant
accesses to load the bounds of B is conditionally executed in the loop
and cannot be hoisted out. If we peel off the first iteration, they
become dereferenceable in the loop, because they must execute before the
loop is executed, as all non-latch exits are terminated with
unreachable. This subsequently allows hoisting the loads and runtime
checks out of the loop, allowing vectorization of the loop.

     int sum(std::vector<int> *A, std::vector<int> *B, int N) {
       int cost = 0;
       for (int i = 0; i < N; ++i)
         cost += A->at(i) + B->at(i);
       return cost;
     }

This gives a ~20-30% increase of score for Geekbench5/HDR on AArch64.

Note that this requires a follow-up improvement to the peeling cost
model to actually peel iterations off loops as above. I will share that
shortly.

Also, peeling of multi-exits might be beneficial for exit blocks with
other terminators, but I would like to keep the scope limited to known
high-reward cases for now.

I removed the option to disable peeling for multi-deopt exits because
the code is more general now. Alternatively, the option could also be
generalized, but I am not sure if there's much value in the option?

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D108108