[llvm] 90d09eb - [LoopPeel] Allow peeling with multiple unreachable-terminated exit blocks.

Florian Hahn via llvm-commits llvm-commits at lists.llvm.org
Wed Aug 25 05:27:49 PDT 2021


Author: Florian Hahn
Date: 2021-08-25T13:26:40+01:00
New Revision: 90d09eb300dbd68099715a5f70e804225adfa471

URL: https://github.com/llvm/llvm-project/commit/90d09eb300dbd68099715a5f70e804225adfa471
DIFF: https://github.com/llvm/llvm-project/commit/90d09eb300dbd68099715a5f70e804225adfa471.diff

LOG: [LoopPeel] Allow peeling with multiple unreachable-terminated exit blocks.

Support for peeling with multiple exit blocks was added in D63921/77bb3a486fa6.

So far it has only been enabled for loops where all non-latch exits are
'de-optimizing' exits (D63923). But peeling of multi-exit loops can be
highly beneficial in other cases too, for example when all non-latch
exit blocks are terminated by unreachable.

The motivating case is loops with runtime checks, like the C++ example
below. The main issue preventing vectorization is that the invariant
accesses that load the bounds of B are conditionally executed in the
loop and cannot be hoisted out. If we peel off the first iteration, they
become dereferenceable in the loop, because they must execute before the
loop is executed, as all non-latch exits are terminated with
unreachable. This subsequently allows hoisting the loads and runtime
checks out of the loop, allowing vectorization of the loop.

     int sum(std::vector<int> *A, std::vector<int> *B, int N) {
       int cost = 0;
       for (int i = 0; i < N; ++i)
         cost += A->at(i) + B->at(i);
       return cost;
     }
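
As a rough source-level illustration of what peeling the first iteration
means for this example, the sketch below repeats the function and adds a
hand-peeled variant. The name sum_peeled is hypothetical, and the real
transform works on LLVM IR rather than on C++ source, so this only
approximates the shape of the result: once iteration 0 has run, the
bounds of both vectors have already been loaded before the remaining
loop is entered.

```cpp
#include <vector>

// Original loop from the commit message: each at() call carries a bounds
// check whose failure path never returns to the loop.
int sum(std::vector<int> *A, std::vector<int> *B, int N) {
  int cost = 0;
  for (int i = 0; i < N; ++i)
    cost += A->at(i) + B->at(i);
  return cost;
}

// Hand-peeled sketch (hypothetical name, written by hand for illustration):
// iteration 0 is executed ahead of the loop, so both vectors have been
// accessed unconditionally before the remaining iterations start.
int sum_peeled(std::vector<int> *A, std::vector<int> *B, int N) {
  int cost = 0;
  if (N <= 0)
    return cost;
  // Peeled first iteration.
  cost += A->at(0) + B->at(0);
  // Remaining iterations start at 1.
  for (int i = 1; i < N; ++i)
    cost += A->at(i) + B->at(i);
  return cost;
}
```

Both functions are meant to compute the same result for any N, including
throwing on an out-of-range index.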

This gives a ~20-30% increase of score for Geekbench5/HDR on AArch64.

Note that this requires a follow-up improvement to the peeling cost
model to actually peel iterations off loops as above. I will share that
shortly.

Also, peeling of multi-exits might be beneficial for exit blocks with
other terminators, but I would like to keep the scope limited to known
high-reward cases for now.
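
The eligibility rule described in this log can be modelled with a small
standalone sketch. Everything here (Terminator, ToyLoop, toyCanPeel) is a
hypothetical stand-in and not LLVM's API; in particular, a deoptimize
exit is modelled as its own terminator kind for simplicity, and the
loop-simplify-form requirement is left out.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical stand-ins for IR concepts; none of these are LLVM types.
enum class Terminator { Return, Branch, Deoptimize, Unreachable };

struct ToyLoop {
  bool latchIsExiting;     // the latch block is one of the exiting blocks
  bool latchEndsInBranch;  // the latch terminator is a branch
  std::vector<Terminator> nonLatchExits;  // terminators of non-latch exit blocks
};

// Toy version of the rule: the latch must exit via a branch, and every
// non-latch exit block must end in a deopt or unreachable terminator.
// With no non-latch exits, all_of is trivially true (single-exit loop).
bool toyCanPeel(const ToyLoop &L) {
  if (!L.latchIsExiting || !L.latchEndsInBranch)
    return false;
  return std::all_of(L.nonLatchExits.begin(), L.nonLatchExits.end(),
                     [](Terminator T) {
                       return T == Terminator::Deoptimize ||
                              T == Terminator::Unreachable;
                     });
}
```

A loop whose only extra exits end in unreachable passes this check, while
one with an extra exit that returns normally does not.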

I removed the option to disable peeling for multi-deopt exits because
the code is more general now. Alternatively, the option could also be
generalized, but I am not sure there is much value in the option.

Reviewed By: reames

Differential Revision: https://reviews.llvm.org/D108108

Added: 
    

Modified: 
    llvm/lib/Transforms/Utils/LoopPeel.cpp
    llvm/test/Transforms/LoopUnroll/peel-loop-pgo-deopt-idom-2.ll
    llvm/test/Transforms/LoopUnroll/peel-loop-pgo-deopt-idom.ll
    llvm/test/Transforms/LoopUnroll/peel-loop-pgo-deopt.ll
    llvm/test/Transforms/LoopUnroll/peel-multiple-unreachable-exits.ll

Removed: 
    


################################################################################
diff --git a/llvm/lib/Transforms/Utils/LoopPeel.cpp b/llvm/lib/Transforms/Utils/LoopPeel.cpp
index cd1f6f0c78a51..a6cdce1f4b8f0 100644
--- a/llvm/lib/Transforms/Utils/LoopPeel.cpp
+++ b/llvm/lib/Transforms/Utils/LoopPeel.cpp
@@ -73,10 +73,6 @@ static cl::opt<unsigned> UnrollForcePeelCount(
     "unroll-force-peel-count", cl::init(0), cl::Hidden,
     cl::desc("Force a peel count regardless of profiling information."));
 
-static cl::opt<bool> UnrollPeelMultiDeoptExit(
-    "unroll-peel-multi-deopt-exit", cl::init(true), cl::Hidden,
-    cl::desc("Allow peeling of loops with multiple deopt exits."));
-
 static const char *PeeledCountMetaData = "llvm.loop.peeled.count";
 
 // Designates that a Phi is estimated to become invariant after an "infinite"
@@ -91,39 +87,31 @@ bool llvm::canPeel(Loop *L) {
   if (!L->isLoopSimplifyForm())
     return false;
 
-  if (UnrollPeelMultiDeoptExit) {
-    SmallVector<BasicBlock *, 4> Exits;
-    L->getUniqueNonLatchExitBlocks(Exits);
-
-    if (!Exits.empty()) {
-      // Latch's terminator is a conditional branch, Latch is exiting and
-      // all non Latch exits ends up with deoptimize.
-      const BasicBlock *Latch = L->getLoopLatch();
-      const BranchInst *T = dyn_cast<BranchInst>(Latch->getTerminator());
-      return T && T->isConditional() && L->isLoopExiting(Latch) &&
-             all_of(Exits, [](const BasicBlock *BB) {
-               return BB->getTerminatingDeoptimizeCall();
-             });
-    }
-  }
-
-  // Only peel loops that contain a single exit
-  if (!L->getExitingBlock() || !L->getUniqueExitBlock())
-    return false;
-
   // Don't try to peel loops where the latch is not the exiting block.
   // This can be an indication of two different things:
   // 1) The loop is not rotated.
   // 2) The loop contains irreducible control flow that involves the latch.
   const BasicBlock *Latch = L->getLoopLatch();
-  if (Latch != L->getExitingBlock())
+  if (!L->isLoopExiting(Latch))
     return false;
 
   // Peeling is only supported if the latch is a branch.
   if (!isa<BranchInst>(Latch->getTerminator()))
     return false;
 
-  return true;
+  SmallVector<BasicBlock *, 4> Exits;
+  L->getUniqueNonLatchExitBlocks(Exits);
+  // The latch must either be the only exiting block or all non-latch exit
+  // blocks have either a deopt or unreachable terminator. Both deopt and
+  // unreachable terminators are a strong indication they are not taken. Note
+  // that this is a profitability check, not a legality check. Also note that
+  // LoopPeeling currently can only update the branch weights of latch blocks
+  // and branch weights to blocks with deopt or unreachable do not need
+  // updating.
+  return all_of(Exits, [](const BasicBlock *BB) {
+    return BB->getTerminatingDeoptimizeCall() ||
+           isa<UnreachableInst>(BB->getTerminator());
+  });
 }
 
 // This function calculates the number of iterations after which the given Phi

diff --git a/llvm/test/Transforms/LoopUnroll/peel-loop-pgo-deopt-idom-2.ll b/llvm/test/Transforms/LoopUnroll/peel-loop-pgo-deopt-idom-2.ll
index 29134c0311aad..2d3636d553350 100644
--- a/llvm/test/Transforms/LoopUnroll/peel-loop-pgo-deopt-idom-2.ll
+++ b/llvm/test/Transforms/LoopUnroll/peel-loop-pgo-deopt-idom-2.ll
@@ -1,6 +1,6 @@
 ; REQUIRES: asserts
-; RUN: opt < %s -S -debug-only=loop-unroll -loop-unroll -unroll-runtime -unroll-peel-multi-deopt-exit 2>&1 | FileCheck %s
-; RUN: opt < %s -S -debug-only=loop-unroll -unroll-peel-multi-deopt-exit -passes='require<profile-summary>,function(require<opt-remark-emit>,loop-unroll)' 2>&1 | FileCheck %s
+; RUN: opt < %s -S -debug-only=loop-unroll -loop-unroll -unroll-runtime 2>&1 | FileCheck %s
+; RUN: opt < %s -S -debug-only=loop-unroll -passes='require<profile-summary>,function(require<opt-remark-emit>,loop-unroll)' 2>&1 | FileCheck %s
 
 ; Regression test for setting the correct idom for exit blocks.
 

diff --git a/llvm/test/Transforms/LoopUnroll/peel-loop-pgo-deopt-idom.ll b/llvm/test/Transforms/LoopUnroll/peel-loop-pgo-deopt-idom.ll
index 184a07151dfa5..f24da112adb0b 100644
--- a/llvm/test/Transforms/LoopUnroll/peel-loop-pgo-deopt-idom.ll
+++ b/llvm/test/Transforms/LoopUnroll/peel-loop-pgo-deopt-idom.ll
@@ -1,6 +1,6 @@
 ; REQUIRES: asserts
-; RUN: opt < %s -S -debug-only=loop-unroll -loop-unroll -unroll-runtime -unroll-peel-multi-deopt-exit 2>&1 | FileCheck %s
-; RUN: opt < %s -S -debug-only=loop-unroll -unroll-peel-multi-deopt-exit -passes='require<profile-summary>,function(require<opt-remark-emit>,loop-unroll)' 2>&1 | FileCheck %s
+; RUN: opt < %s -S -debug-only=loop-unroll -loop-unroll -unroll-runtime 2>&1 | FileCheck %s
+; RUN: opt < %s -S -debug-only=loop-unroll -passes='require<profile-summary>,function(require<opt-remark-emit>,loop-unroll)' 2>&1 | FileCheck %s
 
 ; Regression test for setting the correct idom for exit blocks.
 

diff --git a/llvm/test/Transforms/LoopUnroll/peel-loop-pgo-deopt.ll b/llvm/test/Transforms/LoopUnroll/peel-loop-pgo-deopt.ll
index 205a2832bb7ab..dd3eda2f643fb 100644
--- a/llvm/test/Transforms/LoopUnroll/peel-loop-pgo-deopt.ll
+++ b/llvm/test/Transforms/LoopUnroll/peel-loop-pgo-deopt.ll
@@ -1,7 +1,7 @@
 ; REQUIRES: asserts
-; RUN: opt < %s -S -debug-only=loop-unroll -loop-unroll -unroll-runtime -unroll-peel-multi-deopt-exit 2>&1 | FileCheck %s
-; RUN: opt < %s -S -debug-only=loop-unroll -unroll-peel-multi-deopt-exit -passes='require<profile-summary>,function(require<opt-remark-emit>,loop-unroll)' 2>&1 | FileCheck %s
-; RUN: opt < %s -S -debug-only=loop-unroll -unroll-peel-multi-deopt-exit -passes='require<profile-summary>,function(require<opt-remark-emit>,loop-unroll<no-profile-peeling>)' 2>&1 | FileCheck %s --check-prefixes=CHECK-NO-PEEL
+; RUN: opt < %s -S -debug-only=loop-unroll -loop-unroll -unroll-runtime 2>&1 | FileCheck %s
+; RUN: opt < %s -S -debug-only=loop-unroll -passes='require<profile-summary>,function(require<opt-remark-emit>,loop-unroll)' 2>&1 | FileCheck %s
+; RUN: opt < %s -S -debug-only=loop-unroll -passes='require<profile-summary>,function(require<opt-remark-emit>,loop-unroll<no-profile-peeling>)' 2>&1 | FileCheck %s --check-prefixes=CHECK-NO-PEEL
 
 ; Make sure we use the profile information correctly to peel-off 3 iterations
 ; from the loop, and update the branch weights for the peeled loop properly.

diff --git a/llvm/test/Transforms/LoopUnroll/peel-multiple-unreachable-exits.ll b/llvm/test/Transforms/LoopUnroll/peel-multiple-unreachable-exits.ll
index 435a8015010de..8e4054885beba 100644
--- a/llvm/test/Transforms/LoopUnroll/peel-multiple-unreachable-exits.ll
+++ b/llvm/test/Transforms/LoopUnroll/peel-multiple-unreachable-exits.ll
@@ -6,25 +6,51 @@ declare void @foo()
 define void @peel_unreachable_exit_and_latch_exit(i32* %ptr, i32 %N, i32 %x) {
 ; CHECK-LABEL: @peel_unreachable_exit_and_latch_exit(
 ; CHECK-NEXT:  entry:
+; CHECK-NEXT:    br label [[LOOP_HEADER_PEEL_BEGIN:%.*]]
+; CHECK:       loop.header.peel.begin:
+; CHECK-NEXT:    br label [[LOOP_HEADER_PEEL:%.*]]
+; CHECK:       loop.header.peel:
+; CHECK-NEXT:    [[C_PEEL:%.*]] = icmp ult i32 1, 2
+; CHECK-NEXT:    br i1 [[C_PEEL]], label [[THEN_PEEL:%.*]], label [[ELSE_PEEL:%.*]]
+; CHECK:       else.peel:
+; CHECK-NEXT:    [[C_2_PEEL:%.*]] = icmp eq i32 1, [[X:%.*]]
+; CHECK-NEXT:    br i1 [[C_2_PEEL]], label [[UNREACHABLE_EXIT:%.*]], label [[LOOP_LATCH_PEEL:%.*]]
+; CHECK:       then.peel:
+; CHECK-NEXT:    br label [[LOOP_LATCH_PEEL]]
+; CHECK:       loop.latch.peel:
+; CHECK-NEXT:    [[M_PEEL:%.*]] = phi i32 [ 0, [[THEN_PEEL]] ], [ [[X]], [[ELSE_PEEL]] ]
+; CHECK-NEXT:    [[GEP_PEEL:%.*]] = getelementptr i32, i32* [[PTR:%.*]], i32 1
+; CHECK-NEXT:    store i32 [[M_PEEL]], i32* [[GEP_PEEL]], align 4
+; CHECK-NEXT:    [[IV_NEXT_PEEL:%.*]] = add nuw nsw i32 1, 1
+; CHECK-NEXT:    [[C_3_PEEL:%.*]] = icmp ult i32 1, 1000
+; CHECK-NEXT:    br i1 [[C_3_PEEL]], label [[LOOP_HEADER_PEEL_NEXT:%.*]], label [[EXIT:%.*]]
+; CHECK:       loop.header.peel.next:
+; CHECK-NEXT:    br label [[LOOP_HEADER_PEEL_NEXT1:%.*]]
+; CHECK:       loop.header.peel.next1:
+; CHECK-NEXT:    br label [[ENTRY_PEEL_NEWPH:%.*]]
+; CHECK:       entry.peel.newph:
 ; CHECK-NEXT:    br label [[LOOP_HEADER:%.*]]
 ; CHECK:       loop.header:
-; CHECK-NEXT:    [[IV:%.*]] = phi i32 [ 1, [[ENTRY:%.*]] ], [ [[IV_NEXT:%.*]], [[LOOP_LATCH:%.*]] ]
-; CHECK-NEXT:    [[C:%.*]] = icmp ult i32 [[IV]], 2
-; CHECK-NEXT:    br i1 [[C]], label [[THEN:%.*]], label [[ELSE:%.*]]
+; CHECK-NEXT:    [[IV:%.*]] = phi i32 [ [[IV_NEXT_PEEL]], [[ENTRY_PEEL_NEWPH]] ], [ [[IV_NEXT:%.*]], [[LOOP_LATCH:%.*]] ]
+; CHECK-NEXT:    br i1 false, label [[THEN:%.*]], label [[ELSE:%.*]]
 ; CHECK:       then:
 ; CHECK-NEXT:    br label [[LOOP_LATCH]]
 ; CHECK:       else:
-; CHECK-NEXT:    [[C_2:%.*]] = icmp eq i32 [[IV]], [[X:%.*]]
-; CHECK-NEXT:    br i1 [[C_2]], label [[UNREACHABLE_EXIT:%.*]], label [[LOOP_LATCH]]
+; CHECK-NEXT:    [[C_2:%.*]] = icmp eq i32 [[IV]], [[X]]
+; CHECK-NEXT:    br i1 [[C_2]], label [[UNREACHABLE_EXIT_LOOPEXIT:%.*]], label [[LOOP_LATCH]]
 ; CHECK:       loop.latch:
 ; CHECK-NEXT:    [[M:%.*]] = phi i32 [ 0, [[THEN]] ], [ [[X]], [[ELSE]] ]
-; CHECK-NEXT:    [[GEP:%.*]] = getelementptr i32, i32* [[PTR:%.*]], i32 [[IV]]
+; CHECK-NEXT:    [[GEP:%.*]] = getelementptr i32, i32* [[PTR]], i32 [[IV]]
 ; CHECK-NEXT:    store i32 [[M]], i32* [[GEP]], align 4
 ; CHECK-NEXT:    [[IV_NEXT]] = add nuw nsw i32 [[IV]], 1
 ; CHECK-NEXT:    [[C_3:%.*]] = icmp ult i32 [[IV]], 1000
-; CHECK-NEXT:    br i1 [[C_3]], label [[LOOP_HEADER]], label [[EXIT:%.*]]
+; CHECK-NEXT:    br i1 [[C_3]], label [[LOOP_HEADER]], label [[EXIT_LOOPEXIT:%.*]], !llvm.loop [[LOOP0:![0-9]+]]
+; CHECK:       exit.loopexit:
+; CHECK-NEXT:    br label [[EXIT]]
 ; CHECK:       exit:
 ; CHECK-NEXT:    ret void
+; CHECK:       unreachable.exit.loopexit:
+; CHECK-NEXT:    br label [[UNREACHABLE_EXIT]]
 ; CHECK:       unreachable.exit:
 ; CHECK-NEXT:    call void @foo()
 ; CHECK-NEXT:    unreachable


        

