<div dir="ltr">This lead to segfaults when compiling for PPC. I have reverted this in r296295 and will forward reproduction instructions to you.</div><div class="gmail_extra"><br><div class="gmail_quote">On Fri, Feb 24, 2017 at 7:41 PM, Michael Kuperstein via llvm-commits <span dir="ltr"><<a href="mailto:llvm-commits@lists.llvm.org" target="_blank">llvm-commits@lists.llvm.org</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Author: mkuper<br>
Date: Fri Feb 24 12:41:32 2017<br>
New Revision: 296149<br>
<br>
URL: <a href="http://llvm.org/viewvc/llvm-project?rev=296149&view=rev" rel="noreferrer" target="_blank">http://llvm.org/viewvc/llvm-<wbr>project?rev=296149&view=rev</a><br>
Log:<br>
[CGP] Split some critical edges coming out of indirect branches<br>
<br>
Splitting critical edges when one of the source edges is an indirectbr<br>
is hard in general (because it requires changing the memory the indirectbr<br>
reads). But if a block only has a single indirectbr predecessor (which is<br>
the common case), we can simulate splitting that edge by splitting<br>
the destination block, and retargeting the *direct* branches.<br>
<br>
This is motivated by the use of computed gotos in python 2.7: PyEval_EvalFrame()<br>
ends up using an indirect branch with ~100 successors, and passing a constant to<br>
each of those. Since MachineSink can't break indirect critical edges on demand<br>
(and doing this in MIR doesn't look feasible), this causes us to emit about ~100<br>
defs of registers containing constants, which we in the predecessor block, where<br>
only one of those constants is used in each successor. So, at each computed goto,<br>
we needlessly spill about a 100 constants to stack. The end result is that a<br>
clang-compiled python interpreter can be about ~2.5x slower on a simple python<br>
reduction loop than a gcc-compiled interpreter.<br>
<br>
Differential Revision: <a href="https://reviews.llvm.org/D29916" rel="noreferrer" target="_blank">https://reviews.llvm.org/<wbr>D29916</a><br>
<br>
Added:<br>
    llvm/trunk/test/Transforms/<wbr>CodeGenPrepare/X86/<wbr>computedgoto.ll<br>
      - copied unchanged from r296063, llvm/trunk/test/Transforms/<wbr>CodeGenPrepare/computedgoto.ll<br>
Modified:<br>
    llvm/trunk/lib/CodeGen/<wbr>CodeGenPrepare.cpp<br>
    llvm/trunk/test/CodeGen/ARM/<wbr>indirectbr.ll<br>
    llvm/trunk/test/CodeGen/<wbr>MSP430/indirectbr2.ll<br>
    llvm/trunk/test/CodeGen/<wbr>PowerPC/indirectbr.ll<br>
<br>
Modified: llvm/trunk/lib/CodeGen/<wbr>CodeGenPrepare.cpp<br>
URL: <a href="http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/CodeGen/CodeGenPrepare.cpp?rev=296149&r1=296148&r2=296149&view=diff" rel="noreferrer" target="_blank">http://llvm.org/viewvc/llvm-<wbr>project/llvm/trunk/lib/<wbr>CodeGen/CodeGenPrepare.cpp?<wbr>rev=296149&r1=296148&r2=<wbr>296149&view=diff</a><br>
==============================<wbr>==============================<wbr>==================<br>
--- llvm/trunk/lib/CodeGen/<wbr>CodeGenPrepare.cpp (original)<br>
+++ llvm/trunk/lib/CodeGen/<wbr>CodeGenPrepare.cpp Fri Feb 24 12:41:32 2017<br>
@@ -15,10 +15,12 @@<br>
<br>
 #include "llvm/CodeGen/Passes.h"<br>
 #include "llvm/ADT/DenseMap.h"<br>
+#include "llvm/ADT/SetVector.h"<br>
 #include "llvm/ADT/SmallSet.h"<br>
 #include "llvm/ADT/Statistic.h"<br>
 #include "llvm/Analysis/<wbr>BlockFrequencyInfo.h"<br>
 #include "llvm/Analysis/<wbr>BranchProbabilityInfo.h"<br>
+#include "llvm/Analysis/CFG.h"<br>
 #include "llvm/Analysis/<wbr>InstructionSimplify.h"<br>
 #include "llvm/Analysis/LoopInfo.h"<br>
 #include "llvm/Analysis/<wbr>ProfileSummaryInfo.h"<br>
@@ -53,8 +55,10 @@<br>
 #include "llvm/Transforms/Utils/<wbr>BasicBlockUtils.h"<br>
 #include "llvm/Transforms/Utils/<wbr>BuildLibCalls.h"<br>
 #include "llvm/Transforms/Utils/<wbr>BypassSlowDivision.h"<br>
+#include "llvm/Transforms/Utils/<wbr>Cloning.h"<br>
 #include "llvm/Transforms/Utils/Local.<wbr>h"<br>
 #include "llvm/Transforms/Utils/<wbr>SimplifyLibCalls.h"<br>
+#include "llvm/Transforms/Utils/<wbr>ValueMapper.h"<br>
 using namespace llvm;<br>
 using namespace llvm::PatternMatch;<br>
<br>
@@ -222,6 +226,7 @@ class TypePromotionTransaction;<br>
                         unsigned CreatedInstCost);<br>
     bool splitBranchCondition(Function &F);<br>
     bool simplifyOffsetableRelocate(<wbr>Instruction &I);<br>
+    bool splitIndirectCriticalEdges(<wbr>Function &F);<br>
   };<br>
 }<br>
<br>
@@ -296,6 +301,10 @@ bool CodeGenPrepare::runOnFunction(<wbr>Funct<br>
   if (!DisableBranchOpts)<br>
     EverMadeChange |= splitBranchCondition(F);<br>
<br>
+  // Split some critical edges where one of the sources is an indirect branch,<br>
+  // to help generate sane code for PHIs involving such edges.<br>
+  EverMadeChange |= splitIndirectCriticalEdges(F);<br>
+<br>
   bool MadeChange = true;<br>
   while (MadeChange) {<br>
     MadeChange = false;<br>
@@ -429,6 +438,152 @@ BasicBlock *CodeGenPrepare::findDestBloc<br>
   return DestBB;<br>
 }<br>
<br>
+// Return the unique indirectbr predecessor of a block. This may return null<br>
+// even if such a predecessor exists, if it's not useful for splitting.<br>
+// If a predecessor is found, OtherPreds will contain all other (non-indirectbr)<br>
+// predecessors of BB.<br>
+static BasicBlock *<br>
+findIBRPredecessor(BasicBlock *BB, SmallVectorImpl<BasicBlock *> &OtherPreds) {<br>
+  // If the block doesn't have any PHIs, we don't care about it, since there's<br>
+  // no point in splitting it.<br>
+  PHINode *PN = dyn_cast<PHINode>(BB->begin())<wbr>;<br>
+  if (!PN)<br>
+    return nullptr;<br>
+<br>
+  // Verify we have exactly one IBR predecessor.<br>
+  // Conservatively bail out if one of the other predecessors is not a "regular"<br>
+  // terminator (that is, not a switch or a br).<br>
+  BasicBlock *IBB = nullptr;<br>
+  for (unsigned Pred = 0, E = PN->getNumIncomingValues(); Pred != E; ++Pred) {<br>
+    BasicBlock *PredBB = PN->getIncomingBlock(Pred);<br>
+    TerminatorInst *PredTerm = PredBB->getTerminator();<br>
+    switch (PredTerm->getOpcode()) {<br>
+    case Instruction::IndirectBr:<br>
+      if (IBB)<br>
+        return nullptr;<br>
+      IBB = PredBB;<br>
+      break;<br>
+    case Instruction::Br:<br>
+    case Instruction::Switch:<br>
+      OtherPreds.push_back(PredBB);<br>
+      continue;<br>
+    default:<br>
+      return nullptr;<br>
+    }<br>
+  }<br>
+<br>
+  return IBB;<br>
+}<br>
+<br>
+// Split critical edges where the source of the edge is an indirectbr<br>
+// instruction. This isn't always possible, but we can handle some easy cases.<br>
+// This is useful because MI is unable to split such critical edges,<br>
+// which means it will not be able to sink instructions along those edges.<br>
+// This is especially painful for indirect branches with many successors, where<br>
+// we end up having to prepare all outgoing values in the origin block.<br>
+//<br>
+// Our normal algorithm for splitting critical edges requires us to update<br>
+// the outgoing edges of the edge origin block, but for an indirectbr this<br>
+// is hard, since it would require finding and updating the block addresses<br>
+// the indirect branch uses. But if a block only has a single indirectbr<br>
+// predecessor, with the others being regular branches, we can do it in a<br>
+// different way.<br>
+// Say we have A -> D, B -> D, I -> D where only I -> D is an indirectbr.<br>
+// We can split D into D0 and D1, where D0 contains only the PHIs from D,<br>
+// and D1 is the D block body. We can then duplicate D0 as D0A and D0B, and<br>
+// create the following structure:<br>
+// A -> D0A, B -> D0A, I -> D0B, D0A -> D1, D0B -> D1<br>
+bool CodeGenPrepare::<wbr>splitIndirectCriticalEdges(<wbr>Function &F) {<br>
+  // Check whether the function has any indirectbrs, and collect which blocks<br>
+  // they may jump to. Since most functions don't have indirect branches,<br>
+  // this lowers the common case's overhead to O(Blocks) instead of O(Edges).<br>
+  SmallSetVector<BasicBlock *, 16> Targets;<br>
+  for (auto &BB : F) {<br>
+    auto *IBI = dyn_cast<IndirectBrInst>(BB.<wbr>getTerminator());<br>
+    if (!IBI)<br>
+      continue;<br>
+<br>
+    for (unsigned Succ = 0, E = IBI->getNumSuccessors(); Succ != E; ++Succ)<br>
+      Targets.insert(IBI-><wbr>getSuccessor(Succ));<br>
+  }<br>
+<br>
+  if (Targets.empty())<br>
+    return false;<br>
+<br>
+  bool Changed = false;<br>
+  for (BasicBlock *Target : Targets) {<br>
+    SmallVector<BasicBlock *, 16> OtherPreds;<br>
+    BasicBlock *IBRPred = findIBRPredecessor(Target, OtherPreds);<br>
+    if (!IBRPred)<br>
+      continue;<br>
+<br>
+    // Don't even think about ehpads/landingpads.<br>
+    Instruction *FirstNonPHI = Target->getFirstNonPHI();<br>
+    if (FirstNonPHI->isEHPad() || Target->isLandingPad())<br>
+      continue;<br>
+<br>
+    BasicBlock *BodyBlock = Target->splitBasicBlock(<wbr>FirstNonPHI, ".split");<br>
+    // It's possible Target was its own successor through an indirectbr.<br>
+    // In this case, the indirectbr now comes from BodyBlock.<br>
+    if (IBRPred == Target)<br>
+      IBRPred = BodyBlock;<br>
+<br>
+    // At this point Target only has PHIs, and BodyBlock has the rest of the<br>
+    // block's body. Create a copy of Target that will be used by the "direct"<br>
+    // preds.<br>
+    ValueToValueMapTy VMap;<br>
+    BasicBlock *DirectSucc = CloneBasicBlock(Target, VMap, ".clone", &F);<br>
+<br>
+    for (BasicBlock *Pred : OtherPreds)<br>
+      Pred->getTerminator()-><wbr>replaceUsesOfWith(Target, DirectSucc);<br>
+<br>
+    // Ok, now fix up the PHIs. We know the two blocks only have PHIs, and that<br>
+    // they are clones, so the number of PHIs are the same.<br>
+    // (a) Remove the edge coming from IBRPred from the "Direct" PHI<br>
+    // (b) Leave that as the only edge in the "Indirect" PHI.<br>
+    // (c) Merge the two in the body block.<br>
+    BasicBlock::iterator Indirect = Target->begin(),<br>
+                         End = Target->getFirstNonPHI()-><wbr>getIterator();<br>
+    BasicBlock::iterator Direct = DirectSucc->begin();<br>
+    BasicBlock::iterator MergeInsert = BodyBlock-><wbr>getFirstInsertionPt();<br>
+<br>
+    assert(&*End == Target->getTerminator() &&<br>
+           "Block was expected to only contain PHIs");<br>
+<br>
+    while (Indirect != End) {<br>
+      PHINode *DirPHI = cast<PHINode>(Direct);<br>
+      PHINode *IndPHI = cast<PHINode>(Indirect);<br>
+<br>
+      // Now, clean up - the direct block shouldn't get the indirect value,<br>
+      // and vice versa.<br>
+      DirPHI->removeIncomingValue(<wbr>IBRPred);<br>
+      Direct++;<br>
+<br>
+      // Advance the pointer here, to avoid invalidation issues when the old<br>
+      // PHI is erased.<br>
+      Indirect++;<br>
+<br>
+      PHINode *NewIndPHI = PHINode::Create(IndPHI-><wbr>getType(), 1, "ind", IndPHI);<br>
+      NewIndPHI->addIncoming(IndPHI-<wbr>>getIncomingValueForBlock(<wbr>IBRPred),<br>
+                             IBRPred);<br>
+<br>
+      // Create a PHI in the body block, to merge the direct and indirect<br>
+      // predecessors.<br>
+      PHINode *MergePHI =<br>
+          PHINode::Create(IndPHI-><wbr>getType(), 2, "merge", &*MergeInsert);<br>
+      MergePHI->addIncoming(<wbr>NewIndPHI, Target);<br>
+      MergePHI->addIncoming(DirPHI, DirectSucc);<br>
+<br>
+      IndPHI->replaceAllUsesWith(<wbr>MergePHI);<br>
+      IndPHI->eraseFromParent();<br>
+    }<br>
+<br>
+    Changed = true;<br>
+  }<br>
+<br>
+  return Changed;<br>
+}<br>
+<br>
 /// Eliminate blocks that contain only PHI nodes, debug info directives, and an<br>
 /// unconditional branch. Passes before isel (e.g. LSR/loopsimplify) often split<br>
 /// edges in ways that are non-optimal for isel. Start by eliminating these<br>
<br>
Modified: llvm/trunk/test/CodeGen/ARM/<wbr>indirectbr.ll<br>
URL: <a href="http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/ARM/indirectbr.ll?rev=296149&r1=296148&r2=296149&view=diff" rel="noreferrer" target="_blank">http://llvm.org/viewvc/llvm-<wbr>project/llvm/trunk/test/<wbr>CodeGen/ARM/indirectbr.ll?rev=<wbr>296149&r1=296148&r2=296149&<wbr>view=diff</a><br>
==============================<wbr>==============================<wbr>==================<br>
--- llvm/trunk/test/CodeGen/ARM/<wbr>indirectbr.ll (original)<br>
+++ llvm/trunk/test/CodeGen/ARM/<wbr>indirectbr.ll Fri Feb 24 12:41:32 2017<br>
@@ -47,6 +47,7 @@ L3:<br>
   br label %L2<br>
<br>
 L2:                                               ; preds = %L3, %bb2<br>
+; THUMB-LABEL: %L1.clone<br>
 ; THUMB: muls<br>
   %res.2 = phi i32 [ %res.1, %L3 ], [ 1, %bb2 ]   ; <i32> [#uses=1]<br>
   %phitmp = mul i32 %res.2, 6                     ; <i32> [#uses=1]<br>
<br>
Modified: llvm/trunk/test/CodeGen/<wbr>MSP430/indirectbr2.ll<br>
URL: <a href="http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/MSP430/indirectbr2.ll?rev=296149&r1=296148&r2=296149&view=diff" rel="noreferrer" target="_blank">http://llvm.org/viewvc/llvm-<wbr>project/llvm/trunk/test/<wbr>CodeGen/MSP430/indirectbr2.ll?<wbr>rev=296149&r1=296148&r2=<wbr>296149&view=diff</a><br>
==============================<wbr>==============================<wbr>==================<br>
--- llvm/trunk/test/CodeGen/<wbr>MSP430/indirectbr2.ll (original)<br>
+++ llvm/trunk/test/CodeGen/<wbr>MSP430/indirectbr2.ll Fri Feb 24 12:41:32 2017<br>
@@ -5,7 +5,7 @@ define internal i16 @foo(i16 %i) nounwin<br>
 entry:<br>
   %tmp1 = getelementptr inbounds [5 x i8*], [5 x i8*]* @C.0.2070, i16 0, i16 %i ; <i8**> [#uses=1]<br>
   %gotovar.4.0 = load i8*, i8** %tmp1, align 4        ; <i8*> [#uses=1]<br>
-; CHECK: br .LC.0.2070(r12)<br>
+; CHECK: br .LC.0.2070(r15)<br>
   indirectbr i8* %gotovar.4.0, [label %L5, label %L4, label %L3, label %L2, label %L1]<br>
<br>
 L5:                                               ; preds = %bb2<br>
<br>
Modified: llvm/trunk/test/CodeGen/<wbr>PowerPC/indirectbr.ll<br>
URL: <a href="http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/PowerPC/indirectbr.ll?rev=296149&r1=296148&r2=296149&view=diff" rel="noreferrer" target="_blank">http://llvm.org/viewvc/llvm-<wbr>project/llvm/trunk/test/<wbr>CodeGen/PowerPC/indirectbr.ll?<wbr>rev=296149&r1=296148&r2=<wbr>296149&view=diff</a><br>
==============================<wbr>==============================<wbr>==================<br>
--- llvm/trunk/test/CodeGen/<wbr>PowerPC/indirectbr.ll (original)<br>
+++ llvm/trunk/test/CodeGen/<wbr>PowerPC/indirectbr.ll Fri Feb 24 12:41:32 2017<br>
@@ -17,23 +17,35 @@ entry:<br>
 bb2:                                              ; preds = %entry, %bb3<br>
   %gotovar.4.0 = phi i8* [ %gotovar.4.0.pre, %bb3 ], [ %0, %entry ] ; <i8*> [#uses=1]<br>
 ; PIC: mtctr<br>
-; PIC-NEXT: li<br>
-; PIC-NEXT: li<br>
-; PIC-NEXT: li<br>
-; PIC-NEXT: li<br>
 ; PIC-NEXT: bctr<br>
+; PIC: li<br>
+; PIC: b LBB<br>
+; PIC: li<br>
+; PIC: b LBB<br>
+; PIC: li<br>
+; PIC: b LBB<br>
+; PIC: li<br>
+; PIC: b LBB<br>
 ; STATIC: mtctr<br>
-; STATIC-NEXT: li<br>
-; STATIC-NEXT: li<br>
-; STATIC-NEXT: li<br>
-; STATIC-NEXT: li<br>
 ; STATIC-NEXT: bctr<br>
+; STATIC: li<br>
+; STATIC: b LBB<br>
+; STATIC: li<br>
+; STATIC: b LBB<br>
+; STATIC: li<br>
+; STATIC: b LBB<br>
+; STATIC: li<br>
+; STATIC: b LBB<br>
 ; PPC64: mtctr<br>
-; PPC64-NEXT: li<br>
-; PPC64-NEXT: li<br>
-; PPC64-NEXT: li<br>
-; PPC64-NEXT: li<br>
 ; PPC64-NEXT: bctr<br>
+; PPC64: li<br>
+; PPC64: b LBB<br>
+; PPC64: li<br>
+; PPC64: b LBB<br>
+; PPC64: li<br>
+; PPC64: b LBB<br>
+; PPC64: li<br>
+; PPC64: b LBB<br>
   indirectbr i8* %gotovar.4.0, [label %L5, label %L4, label %L3, label %L2, label %L1]<br>
<br>
 bb3:                                              ; preds = %entry<br>
<br>
<br>
______________________________<wbr>_________________<br>
llvm-commits mailing list<br>
<a href="mailto:llvm-commits@lists.llvm.org">llvm-commits@lists.llvm.org</a><br>
<a href="http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits" rel="noreferrer" target="_blank">http://lists.llvm.org/cgi-bin/<wbr>mailman/listinfo/llvm-commits</a><br>
</blockquote></div><br></div>