[llvm] [LoopUnroll] Penalize interior control flow (PR #67137)

via llvm-commits llvm-commits at lists.llvm.org
Fri Sep 22 06:43:03 PDT 2023


llvmbot wrote:



@llvm/pr-subscribers-llvm-transforms

<details>
<summary>Changes</summary>

This patch adds a small cost penalty for each (non-trivial, non-exiting, non-latch) branch in the loop.

At a high level, I believe that fully unrolling loops with interior control flow tends to be less profitable: the result is not straight-line (or extended basic block) code, so it is less amenable to further optimization. Replicating the interior branches also dilutes branch predictor state.

The specific motivation for this is twofold:
 * This avoids an undesirable unroll in x264 that led to the revert of https://reviews.llvm.org/D156532. While the loop vectorizer can handle interior control flow, the SLP vectorizer can't (and likely won't be able to in the foreseeable future).
 * While looking into https://github.com/llvm/llvm-project/issues/57865, I found a case where we produce a quadratic unroll result: we first unroll an outer loop and then, after simplification, unroll (N copies of) the inner loop as well, resulting in huge IR output. The penalty applied here does not prevent this in general, but does disincentivize it. (We don't want to forbid unrolling outer loops entirely, because it is necessary for vectorization in some cases.)
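
To make the quadratic growth concrete, here is a minimal sketch (not code from the patch). The helper `innerBodyCopies` is hypothetical, and the trip counts 22 and 47 are what the loop bounds in the `quadratic-unroll.ll` test added by this patch appear to work out to.

```cpp
#include <cstdint>

// Hypothetical helper, for illustration only: how many copies of the inner
// loop body exist after the chosen loops have been fully unrolled.
constexpr std::uint64_t innerBodyCopies(std::uint64_t OuterTrips,
                                        std::uint64_t InnerTrips,
                                        bool UnrollOuter, bool UnrollInner) {
  return (UnrollOuter ? OuterTrips : 1) * (UnrollInner ? InnerTrips : 1);
}

// Unrolling only the inner loop grows the code linearly in its trip count.
static_assert(innerBodyCopies(22, 47, false, true) == 47,
              "inner-only unroll");
// Unrolling the outer loop first, then each copy of the inner loop,
// multiplies the two trip counts.
static_assert(innerBodyCopies(22, 47, true, true) == 1034,
              "outer-then-inner unroll");
```

With these assumed trip counts, unrolling both loops leaves roughly 22 times as many copies of the inner body as unrolling the inner loop alone.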

The penalty of 2 here is chosen somewhat arbitrarily as the lowest value that fixes both issues. I think the outer loop case may want an even larger penalty, but this seems like a conservative starting point.
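
As a rough illustration of how the penalty feeds into the size estimate, the following self-contained sketch models it outside of LLVM. `BlockInfo`, `isInteriorBranch` and `approximateSize` are hypothetical names; only the default penalty of 2 and the shape of the check (no single successor, not exiting, not a latch) are taken from the patch.

```cpp
#include <cstdint>
#include <initializer_list>

// Simplified stand-in for a basic block inside a loop. These fields are
// assumptions for illustration, not LLVM's BasicBlock or Loop API.
struct BlockInfo {
  unsigned NumSuccessors; // number of successor edges of the terminator
  bool IsExiting;         // has an edge that leaves the loop
  bool IsLatch;           // has a backedge to the loop header
};

// Same shape as the condition in the patch: penalize a block that has no
// single successor and is neither an exiting block nor a latch.
constexpr bool isInteriorBranch(const BlockInfo &B) {
  return B.NumSuccessors != 1 && !B.IsExiting && !B.IsLatch;
}

// Start from the instruction count and add the penalty once per interior
// branch. The default of 2 matches the cl::init value in the patch.
constexpr std::uint64_t approximateSize(std::uint64_t NumInsts,
                                        std::initializer_list<BlockInfo> Blocks,
                                        std::uint64_t Penalty = 2) {
  std::uint64_t Size = NumInsts;
  for (const BlockInfo &B : Blocks)
    if (isInteriorBranch(B))
      Size += Penalty;
  return Size;
}

// A loop with an if/else diamond: the header branches inside the loop, the
// two arms fall through, and the latch is also the only exiting block.
static_assert(approximateSize(10, {{2, false, false},
                                   {1, false, false},
                                   {1, false, false},
                                   {2, true, true}}) == 12,
              "only the header's branch is penalized");
```

Because the patch registers the penalty as a hidden `cl::opt`, it should be adjustable through `-unroll-interior-control-flow-penalty=<n>` in tools that parse LLVM options; in this model that corresponds to the `Penalty` argument.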

---

Patch is 51.16 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/67137.diff


4 Files Affected:

- (modified) llvm/lib/Transforms/Scalar/LoopUnrollPass.cpp (+14) 
- (added) llvm/test/Transforms/LoopUnroll/quadratic-unroll.ll (+340) 
- (modified) llvm/test/Transforms/LoopUnroll/runtime-loop-multiexit-dom-verify.ll (+1-1) 
- (modified) llvm/test/Transforms/PhaseOrdering/AArch64/quant_4x4.ll (+16-428) 


``````````diff
diff --git a/llvm/lib/Transforms/Scalar/LoopUnrollPass.cpp b/llvm/lib/Transforms/Scalar/LoopUnrollPass.cpp
index 335b489d3cb25ba..41c82783a7a4d79 100644
--- a/llvm/lib/Transforms/Scalar/LoopUnrollPass.cpp
+++ b/llvm/lib/Transforms/Scalar/LoopUnrollPass.cpp
@@ -173,6 +173,10 @@ static cl::opt<unsigned>
                            cl::desc("Default threshold (max size of unrolled "
                                     "loop), used in all but O3 optimizations"));
 
+static cl::opt<unsigned> UnrollInteriorControlFlowPenalty(
+    "unroll-interior-control-flow-penalty", cl::init(2), cl::Hidden,
+    cl::desc("Penalty for non-exiting branches in the loop"));
+
 /// A magic value for use with the Threshold parameter to indicate
 /// that the loop unroll should be performed regardless of how much
 /// code expansion would result.
@@ -676,6 +680,16 @@ InstructionCost llvm::ApproximateLoopSize(
 
   InstructionCost LoopSize = Metrics.NumInsts;
 
+  // Add a penalty for interior control flow (excluding exits and latches).
+  // Unrolling such loops is less profitable, as it does not result in
+  // straight-line code (or extended basic blocks if multiple exits). This
+  // also disincentivizes unrolling outer loops, which may result in large
+  // size increases if the inner loop is also unrolled later.
+  for (BasicBlock *BB : L->blocks())
+    if (!BB->getSingleSuccessor() && !L->isLoopExiting(BB) &&
+        !L->isLoopLatch(BB))
+      LoopSize += UnrollInteriorControlFlowPenalty;
+
   // Don't allow an estimate of size zero.  This would allows unrolling of loops
   // with huge iteration counts, which is a compile time problem even if it's
   // not a problem for code quality. Also, the code using this size may assume
diff --git a/llvm/test/Transforms/LoopUnroll/quadratic-unroll.ll b/llvm/test/Transforms/LoopUnroll/quadratic-unroll.ll
new file mode 100644
index 000000000000000..bfd88a12c9ca034
--- /dev/null
+++ b/llvm/test/Transforms/LoopUnroll/quadratic-unroll.ll
@@ -0,0 +1,340 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 3
+; RUN: opt -S -passes='loop(loop-unroll-full),simplifycfg,loop(loop-unroll-full)' -unroll-threshold=300 < %s | FileCheck %s
+
+; Check that we don't produce quadratic output by unrolling both the inner
+; and outer loop.
+
+@a = global i32 0, align 4
+@d = global ptr null, align 8
+@e = external global ptr
+
+define i8 @f(ptr %p) {
+; CHECK-LABEL: define i8 @f(
+; CHECK-SAME: ptr [[P:%.*]]) {
+; CHECK-NEXT:  entry:
+; CHECK-NEXT:    br label [[FOR_COND1_PREHEADER:%.*]]
+; CHECK:       for.cond1.preheader:
+; CHECK-NEXT:    [[H_017:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ [[INC12:%.*]], [[FOR_INC11:%.*]] ]
+; CHECK-NEXT:    br label [[FOR_BODY4:%.*]]
+; CHECK:       for.body4:
+; CHECK-NEXT:    [[V:%.*]] = load i32, ptr [[P]], align 4
+; CHECK-NEXT:    [[TOBOOL_NOT:%.*]] = icmp eq i32 [[V]], 0
+; CHECK-NEXT:    br i1 [[TOBOOL_NOT]], label [[FOR_END:%.*]], label [[FOR_INC11]]
+; CHECK:       for.end:
+; CHECK-NEXT:    store i32 0, ptr @a, align 4
+; CHECK-NEXT:    store ptr @e, ptr @d, align 8
+; CHECK-NEXT:    [[V_1:%.*]] = load i32, ptr [[P]], align 4
+; CHECK-NEXT:    [[TOBOOL_NOT_1:%.*]] = icmp eq i32 [[V_1]], 0
+; CHECK-NEXT:    br i1 [[TOBOOL_NOT_1]], label [[FOR_END_1:%.*]], label [[FOR_INC11]]
+; CHECK:       for.end.1:
+; CHECK-NEXT:    store i32 0, ptr @a, align 4
+; CHECK-NEXT:    store ptr @e, ptr @d, align 8
+; CHECK-NEXT:    [[V_2:%.*]] = load i32, ptr [[P]], align 4
+; CHECK-NEXT:    [[TOBOOL_NOT_2:%.*]] = icmp eq i32 [[V_2]], 0
+; CHECK-NEXT:    br i1 [[TOBOOL_NOT_2]], label [[FOR_END_2:%.*]], label [[FOR_INC11]]
+; CHECK:       for.end.2:
+; CHECK-NEXT:    store i32 0, ptr @a, align 4
+; CHECK-NEXT:    store ptr @e, ptr @d, align 8
+; CHECK-NEXT:    [[V_3:%.*]] = load i32, ptr [[P]], align 4
+; CHECK-NEXT:    [[TOBOOL_NOT_3:%.*]] = icmp eq i32 [[V_3]], 0
+; CHECK-NEXT:    br i1 [[TOBOOL_NOT_3]], label [[FOR_END_3:%.*]], label [[FOR_INC11]]
+; CHECK:       for.end.3:
+; CHECK-NEXT:    store i32 0, ptr @a, align 4
+; CHECK-NEXT:    store ptr @e, ptr @d, align 8
+; CHECK-NEXT:    [[V_4:%.*]] = load i32, ptr [[P]], align 4
+; CHECK-NEXT:    [[TOBOOL_NOT_4:%.*]] = icmp eq i32 [[V_4]], 0
+; CHECK-NEXT:    br i1 [[TOBOOL_NOT_4]], label [[FOR_END_4:%.*]], label [[FOR_INC11]]
+; CHECK:       for.end.4:
+; CHECK-NEXT:    store i32 0, ptr @a, align 4
+; CHECK-NEXT:    store ptr @e, ptr @d, align 8
+; CHECK-NEXT:    [[V_5:%.*]] = load i32, ptr [[P]], align 4
+; CHECK-NEXT:    [[TOBOOL_NOT_5:%.*]] = icmp eq i32 [[V_5]], 0
+; CHECK-NEXT:    br i1 [[TOBOOL_NOT_5]], label [[FOR_END_5:%.*]], label [[FOR_INC11]]
+; CHECK:       for.end.5:
+; CHECK-NEXT:    store i32 0, ptr @a, align 4
+; CHECK-NEXT:    store ptr @e, ptr @d, align 8
+; CHECK-NEXT:    [[V_6:%.*]] = load i32, ptr [[P]], align 4
+; CHECK-NEXT:    [[TOBOOL_NOT_6:%.*]] = icmp eq i32 [[V_6]], 0
+; CHECK-NEXT:    br i1 [[TOBOOL_NOT_6]], label [[FOR_END_6:%.*]], label [[FOR_INC11]]
+; CHECK:       for.end.6:
+; CHECK-NEXT:    store i32 0, ptr @a, align 4
+; CHECK-NEXT:    store ptr @e, ptr @d, align 8
+; CHECK-NEXT:    [[V_7:%.*]] = load i32, ptr [[P]], align 4
+; CHECK-NEXT:    [[TOBOOL_NOT_7:%.*]] = icmp eq i32 [[V_7]], 0
+; CHECK-NEXT:    br i1 [[TOBOOL_NOT_7]], label [[FOR_END_7:%.*]], label [[FOR_INC11]]
+; CHECK:       for.end.7:
+; CHECK-NEXT:    store i32 0, ptr @a, align 4
+; CHECK-NEXT:    store ptr @e, ptr @d, align 8
+; CHECK-NEXT:    [[V_8:%.*]] = load i32, ptr [[P]], align 4
+; CHECK-NEXT:    [[TOBOOL_NOT_8:%.*]] = icmp eq i32 [[V_8]], 0
+; CHECK-NEXT:    br i1 [[TOBOOL_NOT_8]], label [[FOR_END_8:%.*]], label [[FOR_INC11]]
+; CHECK:       for.end.8:
+; CHECK-NEXT:    store i32 0, ptr @a, align 4
+; CHECK-NEXT:    store ptr @e, ptr @d, align 8
+; CHECK-NEXT:    [[V_9:%.*]] = load i32, ptr [[P]], align 4
+; CHECK-NEXT:    [[TOBOOL_NOT_9:%.*]] = icmp eq i32 [[V_9]], 0
+; CHECK-NEXT:    br i1 [[TOBOOL_NOT_9]], label [[FOR_END_9:%.*]], label [[FOR_INC11]]
+; CHECK:       for.end.9:
+; CHECK-NEXT:    store i32 0, ptr @a, align 4
+; CHECK-NEXT:    store ptr @e, ptr @d, align 8
+; CHECK-NEXT:    [[V_10:%.*]] = load i32, ptr [[P]], align 4
+; CHECK-NEXT:    [[TOBOOL_NOT_10:%.*]] = icmp eq i32 [[V_10]], 0
+; CHECK-NEXT:    br i1 [[TOBOOL_NOT_10]], label [[FOR_END_10:%.*]], label [[FOR_INC11]]
+; CHECK:       for.end.10:
+; CHECK-NEXT:    store i32 0, ptr @a, align 4
+; CHECK-NEXT:    store ptr @e, ptr @d, align 8
+; CHECK-NEXT:    [[V_11:%.*]] = load i32, ptr [[P]], align 4
+; CHECK-NEXT:    [[TOBOOL_NOT_11:%.*]] = icmp eq i32 [[V_11]], 0
+; CHECK-NEXT:    br i1 [[TOBOOL_NOT_11]], label [[FOR_END_11:%.*]], label [[FOR_INC11]]
+; CHECK:       for.end.11:
+; CHECK-NEXT:    store i32 0, ptr @a, align 4
+; CHECK-NEXT:    store ptr @e, ptr @d, align 8
+; CHECK-NEXT:    [[V_12:%.*]] = load i32, ptr [[P]], align 4
+; CHECK-NEXT:    [[TOBOOL_NOT_12:%.*]] = icmp eq i32 [[V_12]], 0
+; CHECK-NEXT:    br i1 [[TOBOOL_NOT_12]], label [[FOR_END_12:%.*]], label [[FOR_INC11]]
+; CHECK:       for.end.12:
+; CHECK-NEXT:    store i32 0, ptr @a, align 4
+; CHECK-NEXT:    store ptr @e, ptr @d, align 8
+; CHECK-NEXT:    [[V_13:%.*]] = load i32, ptr [[P]], align 4
+; CHECK-NEXT:    [[TOBOOL_NOT_13:%.*]] = icmp eq i32 [[V_13]], 0
+; CHECK-NEXT:    br i1 [[TOBOOL_NOT_13]], label [[FOR_END_13:%.*]], label [[FOR_INC11]]
+; CHECK:       for.end.13:
+; CHECK-NEXT:    store i32 0, ptr @a, align 4
+; CHECK-NEXT:    store ptr @e, ptr @d, align 8
+; CHECK-NEXT:    [[V_14:%.*]] = load i32, ptr [[P]], align 4
+; CHECK-NEXT:    [[TOBOOL_NOT_14:%.*]] = icmp eq i32 [[V_14]], 0
+; CHECK-NEXT:    br i1 [[TOBOOL_NOT_14]], label [[FOR_END_14:%.*]], label [[FOR_INC11]]
+; CHECK:       for.end.14:
+; CHECK-NEXT:    store i32 0, ptr @a, align 4
+; CHECK-NEXT:    store ptr @e, ptr @d, align 8
+; CHECK-NEXT:    [[V_15:%.*]] = load i32, ptr [[P]], align 4
+; CHECK-NEXT:    [[TOBOOL_NOT_15:%.*]] = icmp eq i32 [[V_15]], 0
+; CHECK-NEXT:    br i1 [[TOBOOL_NOT_15]], label [[FOR_END_15:%.*]], label [[FOR_INC11]]
+; CHECK:       for.end.15:
+; CHECK-NEXT:    store i32 0, ptr @a, align 4
+; CHECK-NEXT:    store ptr @e, ptr @d, align 8
+; CHECK-NEXT:    [[V_16:%.*]] = load i32, ptr [[P]], align 4
+; CHECK-NEXT:    [[TOBOOL_NOT_16:%.*]] = icmp eq i32 [[V_16]], 0
+; CHECK-NEXT:    br i1 [[TOBOOL_NOT_16]], label [[FOR_END_16:%.*]], label [[FOR_INC11]]
+; CHECK:       for.end.16:
+; CHECK-NEXT:    store i32 0, ptr @a, align 4
+; CHECK-NEXT:    store ptr @e, ptr @d, align 8
+; CHECK-NEXT:    [[V_17:%.*]] = load i32, ptr [[P]], align 4
+; CHECK-NEXT:    [[TOBOOL_NOT_17:%.*]] = icmp eq i32 [[V_17]], 0
+; CHECK-NEXT:    br i1 [[TOBOOL_NOT_17]], label [[FOR_END_17:%.*]], label [[FOR_INC11]]
+; CHECK:       for.end.17:
+; CHECK-NEXT:    store i32 0, ptr @a, align 4
+; CHECK-NEXT:    store ptr @e, ptr @d, align 8
+; CHECK-NEXT:    [[V_18:%.*]] = load i32, ptr [[P]], align 4
+; CHECK-NEXT:    [[TOBOOL_NOT_18:%.*]] = icmp eq i32 [[V_18]], 0
+; CHECK-NEXT:    br i1 [[TOBOOL_NOT_18]], label [[FOR_END_18:%.*]], label [[FOR_INC11]]
+; CHECK:       for.end.18:
+; CHECK-NEXT:    store i32 0, ptr @a, align 4
+; CHECK-NEXT:    store ptr @e, ptr @d, align 8
+; CHECK-NEXT:    [[V_19:%.*]] = load i32, ptr [[P]], align 4
+; CHECK-NEXT:    [[TOBOOL_NOT_19:%.*]] = icmp eq i32 [[V_19]], 0
+; CHECK-NEXT:    br i1 [[TOBOOL_NOT_19]], label [[FOR_END_19:%.*]], label [[FOR_INC11]]
+; CHECK:       for.end.19:
+; CHECK-NEXT:    store i32 0, ptr @a, align 4
+; CHECK-NEXT:    store ptr @e, ptr @d, align 8
+; CHECK-NEXT:    [[V_20:%.*]] = load i32, ptr [[P]], align 4
+; CHECK-NEXT:    [[TOBOOL_NOT_20:%.*]] = icmp eq i32 [[V_20]], 0
+; CHECK-NEXT:    br i1 [[TOBOOL_NOT_20]], label [[FOR_END_20:%.*]], label [[FOR_INC11]]
+; CHECK:       for.end.20:
+; CHECK-NEXT:    store i32 0, ptr @a, align 4
+; CHECK-NEXT:    store ptr @e, ptr @d, align 8
+; CHECK-NEXT:    [[V_21:%.*]] = load i32, ptr [[P]], align 4
+; CHECK-NEXT:    [[TOBOOL_NOT_21:%.*]] = icmp eq i32 [[V_21]], 0
+; CHECK-NEXT:    br i1 [[TOBOOL_NOT_21]], label [[FOR_END_21:%.*]], label [[FOR_INC11]]
+; CHECK:       for.end.21:
+; CHECK-NEXT:    store i32 0, ptr @a, align 4
+; CHECK-NEXT:    store ptr @e, ptr @d, align 8
+; CHECK-NEXT:    [[V_22:%.*]] = load i32, ptr [[P]], align 4
+; CHECK-NEXT:    [[TOBOOL_NOT_22:%.*]] = icmp eq i32 [[V_22]], 0
+; CHECK-NEXT:    br i1 [[TOBOOL_NOT_22]], label [[FOR_END_22:%.*]], label [[FOR_INC11]]
+; CHECK:       for.end.22:
+; CHECK-NEXT:    store i32 0, ptr @a, align 4
+; CHECK-NEXT:    store ptr @e, ptr @d, align 8
+; CHECK-NEXT:    [[V_23:%.*]] = load i32, ptr [[P]], align 4
+; CHECK-NEXT:    [[TOBOOL_NOT_23:%.*]] = icmp eq i32 [[V_23]], 0
+; CHECK-NEXT:    br i1 [[TOBOOL_NOT_23]], label [[FOR_END_23:%.*]], label [[FOR_INC11]]
+; CHECK:       for.end.23:
+; CHECK-NEXT:    store i32 0, ptr @a, align 4
+; CHECK-NEXT:    store ptr @e, ptr @d, align 8
+; CHECK-NEXT:    [[V_24:%.*]] = load i32, ptr [[P]], align 4
+; CHECK-NEXT:    [[TOBOOL_NOT_24:%.*]] = icmp eq i32 [[V_24]], 0
+; CHECK-NEXT:    br i1 [[TOBOOL_NOT_24]], label [[FOR_END_24:%.*]], label [[FOR_INC11]]
+; CHECK:       for.end.24:
+; CHECK-NEXT:    store i32 0, ptr @a, align 4
+; CHECK-NEXT:    store ptr @e, ptr @d, align 8
+; CHECK-NEXT:    [[V_25:%.*]] = load i32, ptr [[P]], align 4
+; CHECK-NEXT:    [[TOBOOL_NOT_25:%.*]] = icmp eq i32 [[V_25]], 0
+; CHECK-NEXT:    br i1 [[TOBOOL_NOT_25]], label [[FOR_END_25:%.*]], label [[FOR_INC11]]
+; CHECK:       for.end.25:
+; CHECK-NEXT:    store i32 0, ptr @a, align 4
+; CHECK-NEXT:    store ptr @e, ptr @d, align 8
+; CHECK-NEXT:    [[V_26:%.*]] = load i32, ptr [[P]], align 4
+; CHECK-NEXT:    [[TOBOOL_NOT_26:%.*]] = icmp eq i32 [[V_26]], 0
+; CHECK-NEXT:    br i1 [[TOBOOL_NOT_26]], label [[FOR_END_26:%.*]], label [[FOR_INC11]]
+; CHECK:       for.end.26:
+; CHECK-NEXT:    store i32 0, ptr @a, align 4
+; CHECK-NEXT:    store ptr @e, ptr @d, align 8
+; CHECK-NEXT:    [[V_27:%.*]] = load i32, ptr [[P]], align 4
+; CHECK-NEXT:    [[TOBOOL_NOT_27:%.*]] = icmp eq i32 [[V_27]], 0
+; CHECK-NEXT:    br i1 [[TOBOOL_NOT_27]], label [[FOR_END_27:%.*]], label [[FOR_INC11]]
+; CHECK:       for.end.27:
+; CHECK-NEXT:    store i32 0, ptr @a, align 4
+; CHECK-NEXT:    store ptr @e, ptr @d, align 8
+; CHECK-NEXT:    [[V_28:%.*]] = load i32, ptr [[P]], align 4
+; CHECK-NEXT:    [[TOBOOL_NOT_28:%.*]] = icmp eq i32 [[V_28]], 0
+; CHECK-NEXT:    br i1 [[TOBOOL_NOT_28]], label [[FOR_END_28:%.*]], label [[FOR_INC11]]
+; CHECK:       for.end.28:
+; CHECK-NEXT:    store i32 0, ptr @a, align 4
+; CHECK-NEXT:    store ptr @e, ptr @d, align 8
+; CHECK-NEXT:    [[V_29:%.*]] = load i32, ptr [[P]], align 4
+; CHECK-NEXT:    [[TOBOOL_NOT_29:%.*]] = icmp eq i32 [[V_29]], 0
+; CHECK-NEXT:    br i1 [[TOBOOL_NOT_29]], label [[FOR_END_29:%.*]], label [[FOR_INC11]]
+; CHECK:       for.end.29:
+; CHECK-NEXT:    store i32 0, ptr @a, align 4
+; CHECK-NEXT:    store ptr @e, ptr @d, align 8
+; CHECK-NEXT:    [[V_30:%.*]] = load i32, ptr [[P]], align 4
+; CHECK-NEXT:    [[TOBOOL_NOT_30:%.*]] = icmp eq i32 [[V_30]], 0
+; CHECK-NEXT:    br i1 [[TOBOOL_NOT_30]], label [[FOR_END_30:%.*]], label [[FOR_INC11]]
+; CHECK:       for.end.30:
+; CHECK-NEXT:    store i32 0, ptr @a, align 4
+; CHECK-NEXT:    store ptr @e, ptr @d, align 8
+; CHECK-NEXT:    [[V_31:%.*]] = load i32, ptr [[P]], align 4
+; CHECK-NEXT:    [[TOBOOL_NOT_31:%.*]] = icmp eq i32 [[V_31]], 0
+; CHECK-NEXT:    br i1 [[TOBOOL_NOT_31]], label [[FOR_END_31:%.*]], label [[FOR_INC11]]
+; CHECK:       for.end.31:
+; CHECK-NEXT:    store i32 0, ptr @a, align 4
+; CHECK-NEXT:    store ptr @e, ptr @d, align 8
+; CHECK-NEXT:    [[V_32:%.*]] = load i32, ptr [[P]], align 4
+; CHECK-NEXT:    [[TOBOOL_NOT_32:%.*]] = icmp eq i32 [[V_32]], 0
+; CHECK-NEXT:    br i1 [[TOBOOL_NOT_32]], label [[FOR_END_32:%.*]], label [[FOR_INC11]]
+; CHECK:       for.end.32:
+; CHECK-NEXT:    store i32 0, ptr @a, align 4
+; CHECK-NEXT:    store ptr @e, ptr @d, align 8
+; CHECK-NEXT:    [[V_33:%.*]] = load i32, ptr [[P]], align 4
+; CHECK-NEXT:    [[TOBOOL_NOT_33:%.*]] = icmp eq i32 [[V_33]], 0
+; CHECK-NEXT:    br i1 [[TOBOOL_NOT_33]], label [[FOR_END_33:%.*]], label [[FOR_INC11]]
+; CHECK:       for.end.33:
+; CHECK-NEXT:    store i32 0, ptr @a, align 4
+; CHECK-NEXT:    store ptr @e, ptr @d, align 8
+; CHECK-NEXT:    [[V_34:%.*]] = load i32, ptr [[P]], align 4
+; CHECK-NEXT:    [[TOBOOL_NOT_34:%.*]] = icmp eq i32 [[V_34]], 0
+; CHECK-NEXT:    br i1 [[TOBOOL_NOT_34]], label [[FOR_END_34:%.*]], label [[FOR_INC11]]
+; CHECK:       for.end.34:
+; CHECK-NEXT:    store i32 0, ptr @a, align 4
+; CHECK-NEXT:    store ptr @e, ptr @d, align 8
+; CHECK-NEXT:    [[V_35:%.*]] = load i32, ptr [[P]], align 4
+; CHECK-NEXT:    [[TOBOOL_NOT_35:%.*]] = icmp eq i32 [[V_35]], 0
+; CHECK-NEXT:    br i1 [[TOBOOL_NOT_35]], label [[FOR_END_35:%.*]], label [[FOR_INC11]]
+; CHECK:       for.end.35:
+; CHECK-NEXT:    store i32 0, ptr @a, align 4
+; CHECK-NEXT:    store ptr @e, ptr @d, align 8
+; CHECK-NEXT:    [[V_36:%.*]] = load i32, ptr [[P]], align 4
+; CHECK-NEXT:    [[TOBOOL_NOT_36:%.*]] = icmp eq i32 [[V_36]], 0
+; CHECK-NEXT:    br i1 [[TOBOOL_NOT_36]], label [[FOR_END_36:%.*]], label [[FOR_INC11]]
+; CHECK:       for.end.36:
+; CHECK-NEXT:    store i32 0, ptr @a, align 4
+; CHECK-NEXT:    store ptr @e, ptr @d, align 8
+; CHECK-NEXT:    [[V_37:%.*]] = load i32, ptr [[P]], align 4
+; CHECK-NEXT:    [[TOBOOL_NOT_37:%.*]] = icmp eq i32 [[V_37]], 0
+; CHECK-NEXT:    br i1 [[TOBOOL_NOT_37]], label [[FOR_END_37:%.*]], label [[FOR_INC11]]
+; CHECK:       for.end.37:
+; CHECK-NEXT:    store i32 0, ptr @a, align 4
+; CHECK-NEXT:    store ptr @e, ptr @d, align 8
+; CHECK-NEXT:    [[V_38:%.*]] = load i32, ptr [[P]], align 4
+; CHECK-NEXT:    [[TOBOOL_NOT_38:%.*]] = icmp eq i32 [[V_38]], 0
+; CHECK-NEXT:    br i1 [[TOBOOL_NOT_38]], label [[FOR_END_38:%.*]], label [[FOR_INC11]]
+; CHECK:       for.end.38:
+; CHECK-NEXT:    store i32 0, ptr @a, align 4
+; CHECK-NEXT:    store ptr @e, ptr @d, align 8
+; CHECK-NEXT:    [[V_39:%.*]] = load i32, ptr [[P]], align 4
+; CHECK-NEXT:    [[TOBOOL_NOT_39:%.*]] = icmp eq i32 [[V_39]], 0
+; CHECK-NEXT:    br i1 [[TOBOOL_NOT_39]], label [[FOR_END_39:%.*]], label [[FOR_INC11]]
+; CHECK:       for.end.39:
+; CHECK-NEXT:    store i32 0, ptr @a, align 4
+; CHECK-NEXT:    store ptr @e, ptr @d, align 8
+; CHECK-NEXT:    [[V_40:%.*]] = load i32, ptr [[P]], align 4
+; CHECK-NEXT:    [[TOBOOL_NOT_40:%.*]] = icmp eq i32 [[V_40]], 0
+; CHECK-NEXT:    br i1 [[TOBOOL_NOT_40]], label [[FOR_END_40:%.*]], label [[FOR_INC11]]
+; CHECK:       for.end.40:
+; CHECK-NEXT:    store i32 0, ptr @a, align 4
+; CHECK-NEXT:    store ptr @e, ptr @d, align 8
+; CHECK-NEXT:    [[V_41:%.*]] = load i32, ptr [[P]], align 4
+; CHECK-NEXT:    [[TOBOOL_NOT_41:%.*]] = icmp eq i32 [[V_41]], 0
+; CHECK-NEXT:    br i1 [[TOBOOL_NOT_41]], label [[FOR_END_41:%.*]], label [[FOR_INC11]]
+; CHECK:       for.end.41:
+; CHECK-NEXT:    store i32 0, ptr @a, align 4
+; CHECK-NEXT:    store ptr @e, ptr @d, align 8
+; CHECK-NEXT:    [[V_42:%.*]] = load i32, ptr [[P]], align 4
+; CHECK-NEXT:    [[TOBOOL_NOT_42:%.*]] = icmp eq i32 [[V_42]], 0
+; CHECK-NEXT:    br i1 [[TOBOOL_NOT_42]], label [[FOR_END_42:%.*]], label [[FOR_INC11]]
+; CHECK:       for.end.42:
+; CHECK-NEXT:    store i32 0, ptr @a, align 4
+; CHECK-NEXT:    store ptr @e, ptr @d, align 8
+; CHECK-NEXT:    [[V_43:%.*]] = load i32, ptr [[P]], align 4
+; CHECK-NEXT:    [[TOBOOL_NOT_43:%.*]] = icmp eq i32 [[V_43]], 0
+; CHECK-NEXT:    br i1 [[TOBOOL_NOT_43]], label [[FOR_END_43:%.*]], label [[FOR_INC11]]
+; CHECK:       for.end.43:
+; CHECK-NEXT:    store i32 0, ptr @a, align 4
+; CHECK-NEXT:    store ptr @e, ptr @d, align 8
+; CHECK-NEXT:    [[V_44:%.*]] = load i32, ptr [[P]], align 4
+; CHECK-NEXT:    [[TOBOOL_NOT_44:%.*]] = icmp eq i32 [[V_44]], 0
+; CHECK-NEXT:    br i1 [[TOBOOL_NOT_44]], label [[FOR_END_44:%.*]], label [[FOR_INC11]]
+; CHECK:       for.end.44:
+; CHECK-NEXT:    store i32 0, ptr @a, align 4
+; CHECK-NEXT:    store ptr @e, ptr @d, align 8
+; CHECK-NEXT:    [[V_45:%.*]] = load i32, ptr [[P]], align 4
+; CHECK-NEXT:    [[TOBOOL_NOT_45:%.*]] = icmp eq i32 [[V_45]], 0
+; CHECK-NEXT:    br i1 [[TOBOOL_NOT_45]], label [[FOR_END_45:%.*]], label [[FOR_INC11]]
+; CHECK:       for.end.45:
+; CHECK-NEXT:    store i32 0, ptr @a, align 4
+; CHECK-NEXT:    store ptr @e, ptr @d, align 8
+; CHECK-NEXT:    [[V_46:%.*]] = load i32, ptr [[P]], align 4
+; CHECK-NEXT:    [[TOBOOL_NOT_46:%.*]] = icmp eq i32 [[V_46]], 0
+; CHECK-NEXT:    br i1 [[TOBOOL_NOT_46]], label [[FOR_END_46:%.*]], label [[FOR_INC11]]
+; CHECK:       for.end.46:
+; CHECK-NEXT:    store i32 0, ptr @a, align 4
+; CHECK-NEXT:    store ptr @e, ptr @d, align 8
+; CHECK-NEXT:    br label [[FOR_INC11]]
+; CHECK:       for.inc11:
+; CHECK-NEXT:    [[INC12]] = add nuw nsw i32 [[H_017]], 1
+; CHECK-NEXT:    [[CMP:%.*]] = icmp ult i32 [[H_017]], 21
+; CHECK-NEXT:    br i1 [[CMP]], label [[FOR_COND1_PREHEADER]], label [[FOR_END13:%.*]]
+; CHECK:       for.end13:
+; CHECK-NEXT:    ret i8 undef
+;
+entry:
+  br label %for.cond1.preheader
+
+for.cond1.preheader:                              ; preds = %entry, %for.inc11
+  %h.017 = phi i32 [ 0, %entry ], [ %inc12, %for.inc11 ]
+  br label %for.body4
+
+for.body4:                                        ; preds = %for.cond1.preheader, %for.end
+  %g.016 = phi i16 [ -22, %for.cond1.preheader ], [ %inc9, %for.end ]
+  %v = load i32, ptr %p, align 4
+  %tobool.not = icmp eq i32 %v, 0
+  br i1 %tobool.not, label %for.cond5.preheader, label %for.inc11
+
+for.cond5.preheader:                              ; preds = %for.body4
+  br label %for.end
+
+for.end:                                          ; preds = %for.cond5.preheader
+  store i32 0, ptr @a, align 4
+  store ptr @e, ptr @d, align 8
+  %inc9 = add i16 %g.016, 1
+  %cmp2 = icmp slt i16 %inc9, 25
+  br i1 %cmp2, label %for.body4, label %for.inc11
+
+for.inc11:                                        ; preds = %for.end, %for.body4
+  %inc12 = add nuw nsw i32 %h.017, 1
+  %cmp = icmp ult i32 %h.017, 21
+  br i1 %cmp, label %for.cond1.preheader, label %for.end13
+
+for.end13:                                        ; preds = %for.inc11
+  ret i8 undef
+}
diff --git a/llvm/test/Transforms/LoopUnroll/runtime-loop-multiexit-dom-verify.ll b/llvm/test/Transforms/LoopUnroll/runtime-loop-multiexit-dom-verify.ll
index de548523134568f..7027cac9c9ea355 100644
--- a/llvm/test/Transforms/LoopUnroll/runtime-loop-multiexit-dom-verify.ll...
[truncated]

``````````

</details>


https://github.com/llvm/llvm-project/pull/67137

