[llvm] r296149 - [CGP] Split some critical edges coming out of indirect branches

Michael Kuperstein via llvm-commits llvm-commits at lists.llvm.org
Mon Feb 27 16:23:52 PST 2017


Recommitted as r296416, with one less stupid bug.

Michael

On Sun, Feb 26, 2017 at 3:21 AM, Daniel Jasper <djasper at google.com> wrote:

> This lead to segfaults when compiling for PPC. I have reverted this in
> r296295 and will forward reproduction instructions to you.
>
> On Fri, Feb 24, 2017 at 7:41 PM, Michael Kuperstein via llvm-commits <
> llvm-commits at lists.llvm.org> wrote:
>
>> Author: mkuper
>> Date: Fri Feb 24 12:41:32 2017
>> New Revision: 296149
>>
>> URL: http://llvm.org/viewvc/llvm-project?rev=296149&view=rev
>> Log:
>> [CGP] Split some critical edges coming out of indirect branches
>>
>> Splitting critical edges when one of the source edges is an indirectbr
>> is hard in general (because it requires changing the memory the indirectbr
>> reads). But if a block only has a single indirectbr predecessor (which is
>> the common case), we can simulate splitting that edge by splitting
>> the destination block, and retargeting the *direct* branches.
>>
>> This is motivated by the use of computed gotos in python 2.7:
>> PyEval_EvalFrame()
>> ends up using an indirect branch with ~100 successors, and passing a
>> constant to
>> each of those. Since MachineSink can't break indirect critical edges on
>> demand
>> (and doing this in MIR doesn't look feasible), this causes us to emit
>> about ~100
>> defs of registers containing constants, which we in the predecessor
>> block, where
>> only one of those constants is used in each successor. So, at each
>> computed goto,
>> we needlessly spill about a 100 constants to stack. The end result is
>> that a
>> clang-compiled python interpreter can be about ~2.5x slower on a simple
>> python
>> reduction loop than a gcc-compiled interpreter.
>>
>> Differential Revision: https://reviews.llvm.org/D29916
>>
>> Added:
>>     llvm/trunk/test/Transforms/CodeGenPrepare/X86/computedgoto.ll
>>       - copied unchanged from r296063, llvm/trunk/test/Transforms/Cod
>> eGenPrepare/computedgoto.ll
>> Modified:
>>     llvm/trunk/lib/CodeGen/CodeGenPrepare.cpp
>>     llvm/trunk/test/CodeGen/ARM/indirectbr.ll
>>     llvm/trunk/test/CodeGen/MSP430/indirectbr2.ll
>>     llvm/trunk/test/CodeGen/PowerPC/indirectbr.ll
>>
>> Modified: llvm/trunk/lib/CodeGen/CodeGenPrepare.cpp
>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/CodeGen/
>> CodeGenPrepare.cpp?rev=296149&r1=296148&r2=296149&view=diff
>> ============================================================
>> ==================
>> --- llvm/trunk/lib/CodeGen/CodeGenPrepare.cpp (original)
>> +++ llvm/trunk/lib/CodeGen/CodeGenPrepare.cpp Fri Feb 24 12:41:32 2017
>> @@ -15,10 +15,12 @@
>>
>>  #include "llvm/CodeGen/Passes.h"
>>  #include "llvm/ADT/DenseMap.h"
>> +#include "llvm/ADT/SetVector.h"
>>  #include "llvm/ADT/SmallSet.h"
>>  #include "llvm/ADT/Statistic.h"
>>  #include "llvm/Analysis/BlockFrequencyInfo.h"
>>  #include "llvm/Analysis/BranchProbabilityInfo.h"
>> +#include "llvm/Analysis/CFG.h"
>>  #include "llvm/Analysis/InstructionSimplify.h"
>>  #include "llvm/Analysis/LoopInfo.h"
>>  #include "llvm/Analysis/ProfileSummaryInfo.h"
>> @@ -53,8 +55,10 @@
>>  #include "llvm/Transforms/Utils/BasicBlockUtils.h"
>>  #include "llvm/Transforms/Utils/BuildLibCalls.h"
>>  #include "llvm/Transforms/Utils/BypassSlowDivision.h"
>> +#include "llvm/Transforms/Utils/Cloning.h"
>>  #include "llvm/Transforms/Utils/Local.h"
>>  #include "llvm/Transforms/Utils/SimplifyLibCalls.h"
>> +#include "llvm/Transforms/Utils/ValueMapper.h"
>>  using namespace llvm;
>>  using namespace llvm::PatternMatch;
>>
>> @@ -222,6 +226,7 @@ class TypePromotionTransaction;
>>                          unsigned CreatedInstCost);
>>      bool splitBranchCondition(Function &F);
>>      bool simplifyOffsetableRelocate(Instruction &I);
>> +    bool splitIndirectCriticalEdges(Function &F);
>>    };
>>  }
>>
>> @@ -296,6 +301,10 @@ bool CodeGenPrepare::runOnFunction(Funct
>>    if (!DisableBranchOpts)
>>      EverMadeChange |= splitBranchCondition(F);
>>
>> +  // Split some critical edges where one of the sources is an indirect
>> branch,
>> +  // to help generate sane code for PHIs involving such edges.
>> +  EverMadeChange |= splitIndirectCriticalEdges(F);
>> +
>>    bool MadeChange = true;
>>    while (MadeChange) {
>>      MadeChange = false;
>> @@ -429,6 +438,152 @@ BasicBlock *CodeGenPrepare::findDestBloc
>>    return DestBB;
>>  }
>>
>> +// Return the unique indirectbr predecessor of a block. This may return
>> null
>> +// even if such a predecessor exists, if it's not useful for splitting.
>> +// If a predecessor is found, OtherPreds will contain all other
>> (non-indirectbr)
>> +// predecessors of BB.
>> +static BasicBlock *
>> +findIBRPredecessor(BasicBlock *BB, SmallVectorImpl<BasicBlock *>
>> &OtherPreds) {
>> +  // If the block doesn't have any PHIs, we don't care about it, since
>> there's
>> +  // no point in splitting it.
>> +  PHINode *PN = dyn_cast<PHINode>(BB->begin());
>> +  if (!PN)
>> +    return nullptr;
>> +
>> +  // Verify we have exactly one IBR predecessor.
>> +  // Conservatively bail out if one of the other predecessors is not a
>> "regular"
>> +  // terminator (that is, not a switch or a br).
>> +  BasicBlock *IBB = nullptr;
>> +  for (unsigned Pred = 0, E = PN->getNumIncomingValues(); Pred != E;
>> ++Pred) {
>> +    BasicBlock *PredBB = PN->getIncomingBlock(Pred);
>> +    TerminatorInst *PredTerm = PredBB->getTerminator();
>> +    switch (PredTerm->getOpcode()) {
>> +    case Instruction::IndirectBr:
>> +      if (IBB)
>> +        return nullptr;
>> +      IBB = PredBB;
>> +      break;
>> +    case Instruction::Br:
>> +    case Instruction::Switch:
>> +      OtherPreds.push_back(PredBB);
>> +      continue;
>> +    default:
>> +      return nullptr;
>> +    }
>> +  }
>> +
>> +  return IBB;
>> +}
>> +
>> +// Split critical edges where the source of the edge is an indirectbr
>> +// instruction. This isn't always possible, but we can handle some easy
>> cases.
>> +// This is useful because MI is unable to split such critical edges,
>> +// which means it will not be able to sink instructions along those
>> edges.
>> +// This is especially painful for indirect branches with many
>> successors, where
>> +// we end up having to prepare all outgoing values in the origin block.
>> +//
>> +// Our normal algorithm for splitting critical edges requires us to
>> update
>> +// the outgoing edges of the edge origin block, but for an indirectbr
>> this
>> +// is hard, since it would require finding and updating the block
>> addresses
>> +// the indirect branch uses. But if a block only has a single indirectbr
>> +// predecessor, with the others being regular branches, we can do it in a
>> +// different way.
>> +// Say we have A -> D, B -> D, I -> D where only I -> D is an indirectbr.
>> +// We can split D into D0 and D1, where D0 contains only the PHIs from D,
>> +// and D1 is the D block body. We can then duplicate D0 as D0A and D0B,
>> and
>> +// create the following structure:
>> +// A -> D0A, B -> D0A, I -> D0B, D0A -> D1, D0B -> D1
>> +bool CodeGenPrepare::splitIndirectCriticalEdges(Function &F) {
>> +  // Check whether the function has any indirectbrs, and collect which
>> blocks
>> +  // they may jump to. Since most functions don't have indirect branches,
>> +  // this lowers the common case's overhead to O(Blocks) instead of
>> O(Edges).
>> +  SmallSetVector<BasicBlock *, 16> Targets;
>> +  for (auto &BB : F) {
>> +    auto *IBI = dyn_cast<IndirectBrInst>(BB.getTerminator());
>> +    if (!IBI)
>> +      continue;
>> +
>> +    for (unsigned Succ = 0, E = IBI->getNumSuccessors(); Succ != E;
>> ++Succ)
>> +      Targets.insert(IBI->getSuccessor(Succ));
>> +  }
>> +
>> +  if (Targets.empty())
>> +    return false;
>> +
>> +  bool Changed = false;
>> +  for (BasicBlock *Target : Targets) {
>> +    SmallVector<BasicBlock *, 16> OtherPreds;
>> +    BasicBlock *IBRPred = findIBRPredecessor(Target, OtherPreds);
>> +    if (!IBRPred)
>> +      continue;
>> +
>> +    // Don't even think about ehpads/landingpads.
>> +    Instruction *FirstNonPHI = Target->getFirstNonPHI();
>> +    if (FirstNonPHI->isEHPad() || Target->isLandingPad())
>> +      continue;
>> +
>> +    BasicBlock *BodyBlock = Target->splitBasicBlock(FirstNonPHI,
>> ".split");
>> +    // It's possible Target was its own successor through an indirectbr.
>> +    // In this case, the indirectbr now comes from BodyBlock.
>> +    if (IBRPred == Target)
>> +      IBRPred = BodyBlock;
>> +
>> +    // At this point Target only has PHIs, and BodyBlock has the rest of
>> the
>> +    // block's body. Create a copy of Target that will be used by the
>> "direct"
>> +    // preds.
>> +    ValueToValueMapTy VMap;
>> +    BasicBlock *DirectSucc = CloneBasicBlock(Target, VMap, ".clone", &F);
>> +
>> +    for (BasicBlock *Pred : OtherPreds)
>> +      Pred->getTerminator()->replaceUsesOfWith(Target, DirectSucc);
>> +
>> +    // Ok, now fix up the PHIs. We know the two blocks only have PHIs,
>> and that
>> +    // they are clones, so the number of PHIs are the same.
>> +    // (a) Remove the edge coming from IBRPred from the "Direct" PHI
>> +    // (b) Leave that as the only edge in the "Indirect" PHI.
>> +    // (c) Merge the two in the body block.
>> +    BasicBlock::iterator Indirect = Target->begin(),
>> +                         End = Target->getFirstNonPHI()->getIterator();
>> +    BasicBlock::iterator Direct = DirectSucc->begin();
>> +    BasicBlock::iterator MergeInsert = BodyBlock->getFirstInsertionPt();
>> +
>> +    assert(&*End == Target->getTerminator() &&
>> +           "Block was expected to only contain PHIs");
>> +
>> +    while (Indirect != End) {
>> +      PHINode *DirPHI = cast<PHINode>(Direct);
>> +      PHINode *IndPHI = cast<PHINode>(Indirect);
>> +
>> +      // Now, clean up - the direct block shouldn't get the indirect
>> value,
>> +      // and vice versa.
>> +      DirPHI->removeIncomingValue(IBRPred);
>> +      Direct++;
>> +
>> +      // Advance the pointer here, to avoid invalidation issues when the
>> old
>> +      // PHI is erased.
>> +      Indirect++;
>> +
>> +      PHINode *NewIndPHI = PHINode::Create(IndPHI->getType(), 1, "ind",
>> IndPHI);
>> +      NewIndPHI->addIncoming(IndPHI->getIncomingValueForBlock(IBRPred),
>> +                             IBRPred);
>> +
>> +      // Create a PHI in the body block, to merge the direct and indirect
>> +      // predecessors.
>> +      PHINode *MergePHI =
>> +          PHINode::Create(IndPHI->getType(), 2, "merge", &*MergeInsert);
>> +      MergePHI->addIncoming(NewIndPHI, Target);
>> +      MergePHI->addIncoming(DirPHI, DirectSucc);
>> +
>> +      IndPHI->replaceAllUsesWith(MergePHI);
>> +      IndPHI->eraseFromParent();
>> +    }
>> +
>> +    Changed = true;
>> +  }
>> +
>> +  return Changed;
>> +}
>> +
>>  /// Eliminate blocks that contain only PHI nodes, debug info directives,
>> and an
>>  /// unconditional branch. Passes before isel (e.g. LSR/loopsimplify)
>> often split
>>  /// edges in ways that are non-optimal for isel. Start by eliminating
>> these
>>
>> Modified: llvm/trunk/test/CodeGen/ARM/indirectbr.ll
>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/
>> ARM/indirectbr.ll?rev=296149&r1=296148&r2=296149&view=diff
>> ============================================================
>> ==================
>> --- llvm/trunk/test/CodeGen/ARM/indirectbr.ll (original)
>> +++ llvm/trunk/test/CodeGen/ARM/indirectbr.ll Fri Feb 24 12:41:32 2017
>> @@ -47,6 +47,7 @@ L3:
>>    br label %L2
>>
>>  L2:                                               ; preds = %L3, %bb2
>> +; THUMB-LABEL: %L1.clone
>>  ; THUMB: muls
>>    %res.2 = phi i32 [ %res.1, %L3 ], [ 1, %bb2 ]   ; <i32> [#uses=1]
>>    %phitmp = mul i32 %res.2, 6                     ; <i32> [#uses=1]
>>
>> Modified: llvm/trunk/test/CodeGen/MSP430/indirectbr2.ll
>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/
>> MSP430/indirectbr2.ll?rev=296149&r1=296148&r2=296149&view=diff
>> ============================================================
>> ==================
>> --- llvm/trunk/test/CodeGen/MSP430/indirectbr2.ll (original)
>> +++ llvm/trunk/test/CodeGen/MSP430/indirectbr2.ll Fri Feb 24 12:41:32
>> 2017
>> @@ -5,7 +5,7 @@ define internal i16 @foo(i16 %i) nounwin
>>  entry:
>>    %tmp1 = getelementptr inbounds [5 x i8*], [5 x i8*]* @C.0.2070, i16 0,
>> i16 %i ; <i8**> [#uses=1]
>>    %gotovar.4.0 = load i8*, i8** %tmp1, align 4        ; <i8*> [#uses=1]
>> -; CHECK: br .LC.0.2070(r12)
>> +; CHECK: br .LC.0.2070(r15)
>>    indirectbr i8* %gotovar.4.0, [label %L5, label %L4, label %L3, label
>> %L2, label %L1]
>>
>>  L5:                                               ; preds = %bb2
>>
>> Modified: llvm/trunk/test/CodeGen/PowerPC/indirectbr.ll
>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/
>> PowerPC/indirectbr.ll?rev=296149&r1=296148&r2=296149&view=diff
>> ============================================================
>> ==================
>> --- llvm/trunk/test/CodeGen/PowerPC/indirectbr.ll (original)
>> +++ llvm/trunk/test/CodeGen/PowerPC/indirectbr.ll Fri Feb 24 12:41:32
>> 2017
>> @@ -17,23 +17,35 @@ entry:
>>  bb2:                                              ; preds = %entry, %bb3
>>    %gotovar.4.0 = phi i8* [ %gotovar.4.0.pre, %bb3 ], [ %0, %entry ] ;
>> <i8*> [#uses=1]
>>  ; PIC: mtctr
>> -; PIC-NEXT: li
>> -; PIC-NEXT: li
>> -; PIC-NEXT: li
>> -; PIC-NEXT: li
>>  ; PIC-NEXT: bctr
>> +; PIC: li
>> +; PIC: b LBB
>> +; PIC: li
>> +; PIC: b LBB
>> +; PIC: li
>> +; PIC: b LBB
>> +; PIC: li
>> +; PIC: b LBB
>>  ; STATIC: mtctr
>> -; STATIC-NEXT: li
>> -; STATIC-NEXT: li
>> -; STATIC-NEXT: li
>> -; STATIC-NEXT: li
>>  ; STATIC-NEXT: bctr
>> +; STATIC: li
>> +; STATIC: b LBB
>> +; STATIC: li
>> +; STATIC: b LBB
>> +; STATIC: li
>> +; STATIC: b LBB
>> +; STATIC: li
>> +; STATIC: b LBB
>>  ; PPC64: mtctr
>> -; PPC64-NEXT: li
>> -; PPC64-NEXT: li
>> -; PPC64-NEXT: li
>> -; PPC64-NEXT: li
>>  ; PPC64-NEXT: bctr
>> +; PPC64: li
>> +; PPC64: b LBB
>> +; PPC64: li
>> +; PPC64: b LBB
>> +; PPC64: li
>> +; PPC64: b LBB
>> +; PPC64: li
>> +; PPC64: b LBB
>>    indirectbr i8* %gotovar.4.0, [label %L5, label %L4, label %L3, label
>> %L2, label %L1]
>>
>>  bb3:                                              ; preds = %entry
>>
>>
>> _______________________________________________
>> llvm-commits mailing list
>> llvm-commits at lists.llvm.org
>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits
>>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20170227/9efe6a94/attachment.html>


More information about the llvm-commits mailing list