<div dir="ltr">It's an amazingly severe regression, but it's also all due to branch mispredicts (about 3x without this). The code layout looks OK, so there's probably something else to deal with. I'm not sure there's anything we can reasonably do, so we'll just have to take the hit for now and wait for another code reorganization to make the branch predictor a bit happier :)<div><br></div><div>Thanks for giving us some time to investigate, and feel free to recommit whenever you'd like.</div><div><br></div><div>-eric</div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Fri, Aug 21, 2020 at 5:33 PM Roman Lebedev via llvm-commits <<a href="mailto:llvm-commits@lists.llvm.org" target="_blank">llvm-commits@lists.llvm.org</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><br>
Author: Roman Lebedev<br>
Date: 2020-08-22T00:33:22+03:00<br>
New Revision: 503deec2183d466dad64b763bab4e15fd8804239<br>
<br>
URL: <a href="https://github.com/llvm/llvm-project/commit/503deec2183d466dad64b763bab4e15fd8804239" rel="noreferrer" target="_blank">https://github.com/llvm/llvm-project/commit/503deec2183d466dad64b763bab4e15fd8804239</a><br>
DIFF: <a href="https://github.com/llvm/llvm-project/commit/503deec2183d466dad64b763bab4e15fd8804239.diff" rel="noreferrer" target="_blank">https://github.com/llvm/llvm-project/commit/503deec2183d466dad64b763bab4e15fd8804239.diff</a><br>
<br>
LOG: Temporarily revert "[SimplifyCFG][LoopRotate] SimplifyCFG: disable common instruction hoisting by default, enable late in pipeline"<br>
<br>
As discussed in post-commit review starting with<br>
        <a href="https://reviews.llvm.org/D84108#2227365" rel="noreferrer" target="_blank">https://reviews.llvm.org/D84108#2227365</a><br>
while this appears to be mostly a win overall, especially code-size-wise,<br>
this appears to shake //certain// code patterns in a way that is extremely<br>
unfavorable for performance (+30% runtime regression)<br>
on certain CPUs (I personally can't reproduce).<br>
<br>
So until the behaviour is better understood, and a path forward is mapped,<br>
let's back this out for now.<br>
<br>
This reverts commit 1d51dc38d89bd33fb8874e242ab87b265b4dec1c.<br>
<br>
Added: <br>
<br>
<br>
Modified: <br>
    llvm/include/llvm/Transforms/Utils/SimplifyCFGOptions.h<br>
    llvm/lib/Passes/PassBuilder.cpp<br>
    llvm/lib/Target/AArch64/AArch64TargetMachine.cpp<br>
    llvm/lib/Target/ARM/ARMTargetMachine.cpp<br>
    llvm/lib/Target/Hexagon/HexagonTargetMachine.cpp<br>
    llvm/lib/Transforms/IPO/PassManagerBuilder.cpp<br>
    llvm/lib/Transforms/Scalar/SimplifyCFGPass.cpp<br>
    llvm/test/Transforms/PGOProfile/chr.ll<br>
    llvm/test/Transforms/PhaseOrdering/loop-rotation-vs-common-code-hoisting.ll<br>
    llvm/test/Transforms/SimplifyCFG/common-code-hoisting.ll<br>
<br>
Removed: <br>
<br>
<br>
<br>
################################################################################<br>
diff  --git a/llvm/include/llvm/Transforms/Utils/SimplifyCFGOptions.h b/llvm/include/llvm/Transforms/Utils/SimplifyCFGOptions.h<br>
index fb3a7490346f..46f6ca0462f8 100644<br>
--- a/llvm/include/llvm/Transforms/Utils/SimplifyCFGOptions.h<br>
+++ b/llvm/include/llvm/Transforms/Utils/SimplifyCFGOptions.h<br>
@@ -25,7 +25,7 @@ struct SimplifyCFGOptions {<br>
   bool ForwardSwitchCondToPhi = false;<br>
   bool ConvertSwitchToLookupTable = false;<br>
   bool NeedCanonicalLoop = true;<br>
-  bool HoistCommonInsts = false;<br>
+  bool HoistCommonInsts = true;<br>
   bool SinkCommonInsts = false;<br>
   bool SimplifyCondBranch = true;<br>
   bool FoldTwoEntryPHINode = true;<br>
<br>
diff  --git a/llvm/lib/Passes/PassBuilder.cpp b/llvm/lib/Passes/PassBuilder.cpp<br>
index 010567f1c7cd..23288bb1ac07 100644<br>
--- a/llvm/lib/Passes/PassBuilder.cpp<br>
+++ b/llvm/lib/Passes/PassBuilder.cpp<br>
@@ -1147,14 +1147,11 @@ ModulePassManager PassBuilder::buildModuleOptimizationPipeline(<br>
   // convert to more optimized IR using more aggressive simplify CFG options.<br>
   // The extra sinking transform can create larger basic blocks, so do this<br>
   // before SLP vectorization.<br>
-  // FIXME: study whether hoisting and/or sinking of common instructions should<br>
-  //        be delayed until after SLP vectorizer.<br>
-  OptimizePM.addPass(SimplifyCFGPass(SimplifyCFGOptions()<br>
-                                         .forwardSwitchCondToPhi(true)<br>
-                                         .convertSwitchToLookupTable(true)<br>
-                                         .needCanonicalLoops(false)<br>
-                                         .hoistCommonInsts(true)<br>
-                                         .sinkCommonInsts(true)));<br>
+  OptimizePM.addPass(SimplifyCFGPass(SimplifyCFGOptions().<br>
+                                     forwardSwitchCondToPhi(true).<br>
+                                     convertSwitchToLookupTable(true).<br>
+                                     needCanonicalLoops(false).<br>
+                                     sinkCommonInsts(true)));<br>
<br>
   // Optimize parallel scalar instruction chains into SIMD instructions.<br>
   if (PTO.SLPVectorization)<br>
<br>
diff  --git a/llvm/lib/Target/AArch64/AArch64TargetMachine.cpp b/llvm/lib/Target/AArch64/AArch64TargetMachine.cpp<br>
index 819f7b614106..40d71def6d09 100644<br>
--- a/llvm/lib/Target/AArch64/AArch64TargetMachine.cpp<br>
+++ b/llvm/lib/Target/AArch64/AArch64TargetMachine.cpp<br>
@@ -457,7 +457,6 @@ void AArch64PassConfig::addIRPasses() {<br>
                                             .forwardSwitchCondToPhi(true)<br>
                                             .convertSwitchToLookupTable(true)<br>
                                             .needCanonicalLoops(false)<br>
-                                            .hoistCommonInsts(true)<br>
                                             .sinkCommonInsts(true)));<br>
<br>
   // Run LoopDataPrefetch<br>
<br>
diff  --git a/llvm/lib/Target/ARM/ARMTargetMachine.cpp b/llvm/lib/Target/ARM/ARMTargetMachine.cpp<br>
index 87a106474c5e..b316b1041f2c 100644<br>
--- a/llvm/lib/Target/ARM/ARMTargetMachine.cpp<br>
+++ b/llvm/lib/Target/ARM/ARMTargetMachine.cpp<br>
@@ -409,8 +409,7 @@ void ARMPassConfig::addIRPasses() {<br>
   // ldrex/strex loops to simplify this, but it needs tidying up.<br>
   if (TM->getOptLevel() != CodeGenOpt::None && EnableAtomicTidy)<br>
     addPass(createCFGSimplificationPass(<br>
-        SimplifyCFGOptions().hoistCommonInsts(true).sinkCommonInsts(true),<br>
-        [this](const Function &F) {<br>
+        SimplifyCFGOptions().sinkCommonInsts(true), [this](const Function &F) {<br>
           const auto &ST = this->TM->getSubtarget<ARMSubtarget>(F);<br>
           return ST.hasAnyDataBarrier() && !ST.isThumb1Only();<br>
         }));<br>
<br>
diff  --git a/llvm/lib/Target/Hexagon/HexagonTargetMachine.cpp b/llvm/lib/Target/Hexagon/HexagonTargetMachine.cpp<br>
index 9bcdc89f2956..48b817d1341d 100644<br>
--- a/llvm/lib/Target/Hexagon/HexagonTargetMachine.cpp<br>
+++ b/llvm/lib/Target/Hexagon/HexagonTargetMachine.cpp<br>
@@ -329,7 +329,6 @@ void HexagonPassConfig::addIRPasses() {<br>
                                               .forwardSwitchCondToPhi(true)<br>
                                               .convertSwitchToLookupTable(true)<br>
                                               .needCanonicalLoops(false)<br>
-                                              .hoistCommonInsts(true)<br>
                                               .sinkCommonInsts(true)));<br>
     if (EnableLoopPrefetch)<br>
       addPass(createLoopDataPrefetchPass());<br>
<br>
diff  --git a/llvm/lib/Transforms/IPO/PassManagerBuilder.cpp b/llvm/lib/Transforms/IPO/PassManagerBuilder.cpp<br>
index 22daa8e812f6..c045c277706b 100644<br>
--- a/llvm/lib/Transforms/IPO/PassManagerBuilder.cpp<br>
+++ b/llvm/lib/Transforms/IPO/PassManagerBuilder.cpp<br>
@@ -784,13 +784,10 @@ void PassManagerBuilder::populateModulePassManager(<br>
   // convert to more optimized IR using more aggressive simplify CFG options.<br>
   // The extra sinking transform can create larger basic blocks, so do this<br>
   // before SLP vectorization.<br>
-  // FIXME: study whether hoisting and/or sinking of common instructions should<br>
-  //        be delayed until after SLP vectorizer.<br>
   MPM.add(createCFGSimplificationPass(SimplifyCFGOptions()<br>
                                           .forwardSwitchCondToPhi(true)<br>
                                           .convertSwitchToLookupTable(true)<br>
                                           .needCanonicalLoops(false)<br>
-                                          .hoistCommonInsts(true)<br>
                                           .sinkCommonInsts(true)));<br>
<br>
   if (SLPVectorize) {<br>
<br>
diff  --git a/llvm/lib/Transforms/Scalar/SimplifyCFGPass.cpp b/llvm/lib/Transforms/Scalar/SimplifyCFGPass.cpp<br>
index b0435bf6e4ea..db5211df397a 100644<br>
--- a/llvm/lib/Transforms/Scalar/SimplifyCFGPass.cpp<br>
+++ b/llvm/lib/Transforms/Scalar/SimplifyCFGPass.cpp<br>
@@ -63,8 +63,8 @@ static cl::opt<bool> UserForwardSwitchCond(<br>
     cl::desc("Forward switch condition to phi ops (default = false)"));<br>
<br>
 static cl::opt<bool> UserHoistCommonInsts(<br>
-    "hoist-common-insts", cl::Hidden, cl::init(false),<br>
-    cl::desc("hoist common instructions (default = false)"));<br>
+    "hoist-common-insts", cl::Hidden, cl::init(true),<br>
+    cl::desc("hoist common instructions (default = true)"));<br>
<br>
 static cl::opt<bool> UserSinkCommonInsts(<br>
     "sink-common-insts", cl::Hidden, cl::init(false),<br>
<br>
diff  --git a/llvm/test/Transforms/PGOProfile/chr.ll b/llvm/test/Transforms/PGOProfile/chr.ll<br>
index 776501402b19..e2161d617022 100644<br>
--- a/llvm/test/Transforms/PGOProfile/chr.ll<br>
+++ b/llvm/test/Transforms/PGOProfile/chr.ll<br>
@@ -2008,16 +2008,9 @@ define i64 @test_chr_22(i1 %i, i64* %j, i64 %v0) !prof !14 {<br>
 ; CHECK-NEXT:  bb0:<br>
 ; CHECK-NEXT:    [[REASS_ADD:%.*]] = shl i64 [[V0:%.*]], 1<br>
 ; CHECK-NEXT:    [[V2:%.*]] = add i64 [[REASS_ADD]], 3<br>
-; CHECK-NEXT:    [[C1:%.*]] = icmp slt i64 [[V2]], 100<br>
-; CHECK-NEXT:    br i1 [[C1]], label [[BB0_SPLIT:%.*]], label [[BB0_SPLIT_NONCHR:%.*]], !prof !15<br>
-; CHECK:       bb0.split:<br>
 ; CHECK-NEXT:    [[V299:%.*]] = mul i64 [[V2]], 7860086430977039991<br>
 ; CHECK-NEXT:    store i64 [[V299]], i64* [[J:%.*]], align 4<br>
 ; CHECK-NEXT:    ret i64 99<br>
-; CHECK:       bb0.split.nonchr:<br>
-; CHECK-NEXT:    [[V299_NONCHR:%.*]] = mul i64 [[V2]], 7860086430977039991<br>
-; CHECK-NEXT:    store i64 [[V299_NONCHR]], i64* [[J]], align 4<br>
-; CHECK-NEXT:    ret i64 99<br>
 ;<br>
 bb0:<br>
   %v1 = add i64 %v0, 3<br>
<br>
diff  --git a/llvm/test/Transforms/PhaseOrdering/loop-rotation-vs-common-code-hoisting.ll b/llvm/test/Transforms/PhaseOrdering/loop-rotation-vs-common-code-hoisting.ll<br>
index 314af1c14145..1d8cce6879e9 100644<br>
--- a/llvm/test/Transforms/PhaseOrdering/loop-rotation-vs-common-code-hoisting.ll<br>
+++ b/llvm/test/Transforms/PhaseOrdering/loop-rotation-vs-common-code-hoisting.ll<br>
@@ -5,11 +5,14 @@<br>
 ; RUN: opt -O3 -rotation-max-header-size=1 -S < %s                    | FileCheck %s --check-prefixes=HOIST,THR1,FALLBACK2<br>
 ; RUN: opt -passes='default<O3>' -rotation-max-header-size=1 -S < %s  | FileCheck %s --check-prefixes=HOIST,THR1,FALLBACK3<br>
<br>
-; RUN: opt -O3 -rotation-max-header-size=2 -S < %s                    | FileCheck %s --check-prefixes=ROTATED_LATER,ROTATED_LATER_OLDPM,FALLBACK4<br>
-; RUN: opt -passes='default<O3>' -rotation-max-header-size=2 -S < %s  | FileCheck %s --check-prefixes=ROTATED_LATER,ROTATED_LATER_NEWPM,FALLBACK5<br>
+; RUN: opt -O3 -rotation-max-header-size=2 -S < %s                    | FileCheck %s --check-prefixes=HOIST,THR2,FALLBACK4<br>
+; RUN: opt -passes='default<O3>' -rotation-max-header-size=2 -S < %s  | FileCheck %s --check-prefixes=HOIST,THR2,FALLBACK5<br>
<br>
-; RUN: opt -O3 -rotation-max-header-size=3 -S < %s                    | FileCheck %s --check-prefixes=ROTATE,ROTATE_OLDPM,FALLBACK6<br>
-; RUN: opt -passes='default<O3>' -rotation-max-header-size=3 -S < %s  | FileCheck %s --check-prefixes=ROTATE,ROTATE_NEWPM,FALLBACK7<br>
+; RUN: opt -O3 -rotation-max-header-size=3 -S < %s                    | FileCheck %s --check-prefixes=ROTATED_LATER,ROTATED_LATER_OLDPM,FALLBACK6<br>
+; RUN: opt -passes='default<O3>' -rotation-max-header-size=3 -S < %s  | FileCheck %s --check-prefixes=ROTATED_LATER,ROTATED_LATER_NEWPM,FALLBACK7<br>
+<br>
+; RUN: opt -O3 -rotation-max-header-size=4 -S < %s                    | FileCheck %s --check-prefixes=ROTATE,ROTATE_OLDPM,FALLBACK8<br>
+; RUN: opt -passes='default<O3>' -rotation-max-header-size=4 -S < %s  | FileCheck %s --check-prefixes=ROTATE,ROTATE_NEWPM,FALLBACK9<br>
<br>
 ; This example is produced from a very basic C code:<br>
 ;<br>
@@ -58,8 +61,8 @@ define void @_Z4loopi(i32 %width) {<br>
 ; HOIST-NEXT:    br label [[FOR_COND:%.*]]<br>
 ; HOIST:       for.cond:<br>
 ; HOIST-NEXT:    [[I_0:%.*]] = phi i32 [ [[INC:%.*]], [[FOR_BODY:%.*]] ], [ 0, [[FOR_COND_PREHEADER]] ]<br>
-; HOIST-NEXT:    [[EXITCOND_NOT:%.*]] = icmp eq i32 [[I_0]], [[TMP0]]<br>
 ; HOIST-NEXT:    tail call void @f0()<br>
+; HOIST-NEXT:    [[EXITCOND_NOT:%.*]] = icmp eq i32 [[I_0]], [[TMP0]]<br>
 ; HOIST-NEXT:    br i1 [[EXITCOND_NOT]], label [[FOR_COND_CLEANUP:%.*]], label [[FOR_BODY]]<br>
 ; HOIST:       for.cond.cleanup:<br>
 ; HOIST-NEXT:    tail call void @f2()<br>
@@ -77,17 +80,17 @@ define void @_Z4loopi(i32 %width) {<br>
 ; ROTATED_LATER_OLDPM-NEXT:    br i1 [[CMP]], label [[RETURN:%.*]], label [[FOR_COND_PREHEADER:%.*]]<br>
 ; ROTATED_LATER_OLDPM:       for.cond.preheader:<br>
 ; ROTATED_LATER_OLDPM-NEXT:    [[TMP0:%.*]] = add nsw i32 [[WIDTH]], -1<br>
+; ROTATED_LATER_OLDPM-NEXT:    tail call void @f0()<br>
 ; ROTATED_LATER_OLDPM-NEXT:    [[EXITCOND_NOT3:%.*]] = icmp eq i32 [[TMP0]], 0<br>
 ; ROTATED_LATER_OLDPM-NEXT:    br i1 [[EXITCOND_NOT3]], label [[FOR_COND_CLEANUP:%.*]], label [[FOR_BODY:%.*]]<br>
 ; ROTATED_LATER_OLDPM:       for.cond.cleanup:<br>
-; ROTATED_LATER_OLDPM-NEXT:    tail call void @f0()<br>
 ; ROTATED_LATER_OLDPM-NEXT:    tail call void @f2()<br>
 ; ROTATED_LATER_OLDPM-NEXT:    br label [[RETURN]]<br>
 ; ROTATED_LATER_OLDPM:       for.body:<br>
 ; ROTATED_LATER_OLDPM-NEXT:    [[I_04:%.*]] = phi i32 [ [[INC:%.*]], [[FOR_BODY]] ], [ 0, [[FOR_COND_PREHEADER]] ]<br>
-; ROTATED_LATER_OLDPM-NEXT:    tail call void @f0()<br>
 ; ROTATED_LATER_OLDPM-NEXT:    tail call void @f1()<br>
 ; ROTATED_LATER_OLDPM-NEXT:    [[INC]] = add nuw i32 [[I_04]], 1<br>
+; ROTATED_LATER_OLDPM-NEXT:    tail call void @f0()<br>
 ; ROTATED_LATER_OLDPM-NEXT:    [[EXITCOND_NOT:%.*]] = icmp eq i32 [[INC]], [[TMP0]]<br>
 ; ROTATED_LATER_OLDPM-NEXT:    br i1 [[EXITCOND_NOT]], label [[FOR_COND_CLEANUP]], label [[FOR_BODY]]<br>
 ; ROTATED_LATER_OLDPM:       return:<br>
@@ -99,19 +102,19 @@ define void @_Z4loopi(i32 %width) {<br>
 ; ROTATED_LATER_NEWPM-NEXT:    br i1 [[CMP]], label [[RETURN:%.*]], label [[FOR_COND_PREHEADER:%.*]]<br>
 ; ROTATED_LATER_NEWPM:       for.cond.preheader:<br>
 ; ROTATED_LATER_NEWPM-NEXT:    [[TMP0:%.*]] = add nsw i32 [[WIDTH]], -1<br>
+; ROTATED_LATER_NEWPM-NEXT:    tail call void @f0()<br>
 ; ROTATED_LATER_NEWPM-NEXT:    [[EXITCOND_NOT3:%.*]] = icmp eq i32 [[TMP0]], 0<br>
 ; ROTATED_LATER_NEWPM-NEXT:    br i1 [[EXITCOND_NOT3]], label [[FOR_COND_CLEANUP:%.*]], label [[FOR_COND_PREHEADER_FOR_BODY_CRIT_EDGE:%.*]]<br>
 ; ROTATED_LATER_NEWPM:       for.cond.preheader.for.body_crit_edge:<br>
 ; ROTATED_LATER_NEWPM-NEXT:    [[INC_1:%.*]] = add nuw i32 0, 1<br>
 ; ROTATED_LATER_NEWPM-NEXT:    br label [[FOR_BODY:%.*]]<br>
 ; ROTATED_LATER_NEWPM:       for.cond.cleanup:<br>
-; ROTATED_LATER_NEWPM-NEXT:    tail call void @f0()<br>
 ; ROTATED_LATER_NEWPM-NEXT:    tail call void @f2()<br>
 ; ROTATED_LATER_NEWPM-NEXT:    br label [[RETURN]]<br>
 ; ROTATED_LATER_NEWPM:       for.body:<br>
 ; ROTATED_LATER_NEWPM-NEXT:    [[INC_PHI:%.*]] = phi i32 [ [[INC_0:%.*]], [[FOR_BODY_FOR_BODY_CRIT_EDGE:%.*]] ], [ [[INC_1]], [[FOR_COND_PREHEADER_FOR_BODY_CRIT_EDGE]] ]<br>
-; ROTATED_LATER_NEWPM-NEXT:    tail call void @f0()<br>
 ; ROTATED_LATER_NEWPM-NEXT:    tail call void @f1()<br>
+; ROTATED_LATER_NEWPM-NEXT:    tail call void @f0()<br>
 ; ROTATED_LATER_NEWPM-NEXT:    [[EXITCOND_NOT:%.*]] = icmp eq i32 [[INC_PHI]], [[TMP0]]<br>
 ; ROTATED_LATER_NEWPM-NEXT:    br i1 [[EXITCOND_NOT]], label [[FOR_COND_CLEANUP]], label [[FOR_BODY_FOR_BODY_CRIT_EDGE]]<br>
 ; ROTATED_LATER_NEWPM:       for.body.for.body_crit_edge:<br>
@@ -126,19 +129,19 @@ define void @_Z4loopi(i32 %width) {<br>
 ; ROTATE_OLDPM-NEXT:    br i1 [[CMP]], label [[RETURN:%.*]], label [[FOR_COND_PREHEADER:%.*]]<br>
 ; ROTATE_OLDPM:       for.cond.preheader:<br>
 ; ROTATE_OLDPM-NEXT:    [[CMP13_NOT:%.*]] = icmp eq i32 [[WIDTH]], 1<br>
+; ROTATE_OLDPM-NEXT:    tail call void @f0()<br>
 ; ROTATE_OLDPM-NEXT:    br i1 [[CMP13_NOT]], label [[FOR_COND_CLEANUP:%.*]], label [[FOR_BODY_PREHEADER:%.*]]<br>
 ; ROTATE_OLDPM:       for.body.preheader:<br>
 ; ROTATE_OLDPM-NEXT:    [[TMP0:%.*]] = add nsw i32 [[WIDTH]], -1<br>
 ; ROTATE_OLDPM-NEXT:    br label [[FOR_BODY:%.*]]<br>
 ; ROTATE_OLDPM:       for.cond.cleanup:<br>
-; ROTATE_OLDPM-NEXT:    tail call void @f0()<br>
 ; ROTATE_OLDPM-NEXT:    tail call void @f2()<br>
 ; ROTATE_OLDPM-NEXT:    br label [[RETURN]]<br>
 ; ROTATE_OLDPM:       for.body:<br>
 ; ROTATE_OLDPM-NEXT:    [[I_04:%.*]] = phi i32 [ [[INC:%.*]], [[FOR_BODY]] ], [ 0, [[FOR_BODY_PREHEADER]] ]<br>
-; ROTATE_OLDPM-NEXT:    tail call void @f0()<br>
 ; ROTATE_OLDPM-NEXT:    tail call void @f1()<br>
 ; ROTATE_OLDPM-NEXT:    [[INC]] = add nuw nsw i32 [[I_04]], 1<br>
+; ROTATE_OLDPM-NEXT:    tail call void @f0()<br>
 ; ROTATE_OLDPM-NEXT:    [[EXITCOND_NOT:%.*]] = icmp eq i32 [[INC]], [[TMP0]]<br>
 ; ROTATE_OLDPM-NEXT:    br i1 [[EXITCOND_NOT]], label [[FOR_COND_CLEANUP]], label [[FOR_BODY]]<br>
 ; ROTATE_OLDPM:       return:<br>
@@ -150,19 +153,19 @@ define void @_Z4loopi(i32 %width) {<br>
 ; ROTATE_NEWPM-NEXT:    br i1 [[CMP]], label [[RETURN:%.*]], label [[FOR_COND_PREHEADER:%.*]]<br>
 ; ROTATE_NEWPM:       for.cond.preheader:<br>
 ; ROTATE_NEWPM-NEXT:    [[CMP13_NOT:%.*]] = icmp eq i32 [[WIDTH]], 1<br>
+; ROTATE_NEWPM-NEXT:    tail call void @f0()<br>
 ; ROTATE_NEWPM-NEXT:    br i1 [[CMP13_NOT]], label [[FOR_COND_CLEANUP:%.*]], label [[FOR_BODY_PREHEADER:%.*]]<br>
 ; ROTATE_NEWPM:       for.body.preheader:<br>
 ; ROTATE_NEWPM-NEXT:    [[TMP0:%.*]] = add nsw i32 [[WIDTH]], -1<br>
 ; ROTATE_NEWPM-NEXT:    [[INC_1:%.*]] = add nuw nsw i32 0, 1<br>
 ; ROTATE_NEWPM-NEXT:    br label [[FOR_BODY:%.*]]<br>
 ; ROTATE_NEWPM:       for.cond.cleanup:<br>
-; ROTATE_NEWPM-NEXT:    tail call void @f0()<br>
 ; ROTATE_NEWPM-NEXT:    tail call void @f2()<br>
 ; ROTATE_NEWPM-NEXT:    br label [[RETURN]]<br>
 ; ROTATE_NEWPM:       for.body:<br>
 ; ROTATE_NEWPM-NEXT:    [[INC_PHI:%.*]] = phi i32 [ [[INC_0:%.*]], [[FOR_BODY_FOR_BODY_CRIT_EDGE:%.*]] ], [ [[INC_1]], [[FOR_BODY_PREHEADER]] ]<br>
-; ROTATE_NEWPM-NEXT:    tail call void @f0()<br>
 ; ROTATE_NEWPM-NEXT:    tail call void @f1()<br>
+; ROTATE_NEWPM-NEXT:    tail call void @f0()<br>
 ; ROTATE_NEWPM-NEXT:    [[EXITCOND_NOT:%.*]] = icmp eq i32 [[INC_PHI]], [[TMP0]]<br>
 ; ROTATE_NEWPM-NEXT:    br i1 [[EXITCOND_NOT]], label [[FOR_COND_CLEANUP]], label [[FOR_BODY_FOR_BODY_CRIT_EDGE]]<br>
 ; ROTATE_NEWPM:       for.body.for.body_crit_edge:<br>
<br>
diff  --git a/llvm/test/Transforms/SimplifyCFG/common-code-hoisting.ll b/llvm/test/Transforms/SimplifyCFG/common-code-hoisting.ll<br>
index 37cbc4640e41..b58017ba7ef0 100644<br>
--- a/llvm/test/Transforms/SimplifyCFG/common-code-hoisting.ll<br>
+++ b/llvm/test/Transforms/SimplifyCFG/common-code-hoisting.ll<br>
@@ -1,7 +1,7 @@<br>
 ; NOTE: Assertions have been autogenerated by utils/update_test_checks.py<br>
 ; RUN: opt -simplifycfg -hoist-common-insts=1 -S < %s                    | FileCheck %s --check-prefixes=HOIST<br>
 ; RUN: opt -simplifycfg -hoist-common-insts=0 -S < %s                    | FileCheck %s --check-prefixes=NOHOIST<br>
-; RUN: opt -simplifycfg                       -S < %s                    | FileCheck %s --check-prefixes=NOHOIST,DEFAULT<br>
+; RUN: opt -simplifycfg                       -S < %s                    | FileCheck %s --check-prefixes=HOIST,DEFAULT<br>
<br>
 ; This example is produced from a very basic C code:<br>
 ;<br>
<br>
<br>
<br>
</blockquote></div>
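<div><br></div><div>For readers unfamiliar with the transform behind the HoistCommonInsts option in the quoted commit: it lets SimplifyCFG move an instruction that begins both successors of a branch up above that branch. Below is a minimal source-level sketch of that idea, not LLVM code. The functions f0/f1/f2 are hypothetical stand-ins named after the calls in the test, and the trace string exists only so the two shapes can be compared.</div>

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-ins: each call just records its name in a trace,
// so the "before" and "after" shapes can be compared for equivalence.
static void f0(std::string &trace) { trace += "f0;"; }
static void f1(std::string &trace) { trace += "f1;"; }
static void f2(std::string &trace) { trace += "f2;"; }

// Before hoisting: both successors of the branch start with the same call.
std::string beforeHoist(bool cond) {
  std::string trace;
  if (cond) {
    f0(trace);
    f1(trace);
  } else {
    f0(trace);
    f2(trace);
  }
  return trace;
}

// After hoisting: the common leading call now sits above the branch,
// so it is emitted once instead of once per successor.
std::string afterHoist(bool cond) {
  std::string trace;
  f0(trace);
  if (cond)
    f1(trace);
  else
    f2(trace);
  return trace;
}

// The two shapes must produce the same sequence of calls on either path.
bool hoistPreservesTrace() {
  return beforeHoist(true) == afterHoist(true) &&
         beforeHoist(false) == afterHoist(false);
}

// Convenience check that aborts if the two shapes ever diverge.
void selfCheck() { assert(hoistPreservesTrace()); }
```

<div>Both versions make the same calls in the same order. What changes is where the code sits relative to the branch, which is consistent with the code-size win and the layout sensitivity mentioned in the thread. The sketch does not model the branch-prediction effect itself.</div>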