[patch] simplifyCFG for review

Ye, Mei Mei.Ye at amd.com
Mon Jul 8 15:10:26 PDT 2013


Thanks Tom.   Attached is a new patch that adds comments as requested. I also made a new diff against most-recent trunk.  But it looks like many files have locks, so some diffs are based on an earlier revision.

With regard to your question:
"I'm a little confused about why we need to add a BasicTargetTransformInfo and
also an AMDGPUTargetTransformInfo.  What is the reason for this?"

My answer is that same thing happens to X86, ARM, and other targets.  I didn't track whether BasicTargetTransformInfo is always needed, but I think you can build compiler for more than one targets, and for targets that you do not supply a target-spefici Tranform Info, you can always default back to use the basic one.

-Mei


-----Original Message-----
From: Tom Stellard [mailto:tom at stellard.net] 
Sent: Monday, July 08, 2013 11:14 AM
To: Ye, Mei
Cc: Evan Cheng; Renato Golin; Sean Silva; llvm-commits at cs.uiuc.edu
Subject: Re: [patch] simplifyCFG for review

On Fri, Jun 28, 2013 at 09:46:13PM +0000, Ye, Mei wrote:
> Hi all
> 
> Thank you for your comments.   Attached is an updated patch.
> - add TargetTransformInfo to R600 .
> - add hasBranchDivergence to query whether a target has branch divergence.
> - Invoke branch reduction opts only when the underlying target has branch divergence.  So currently it only gets triggered for R600.
> - fixed a bug in opt.cpp
> - correct clang-format issues.
> 
> I have tested the correctness on x86 (by adding branch divergence to X86TargetTransformInfo in my local workspace) using existing testing infrastructure and CPU2006, CPU2000.  Tom Stellard agrees to run R600 testings (thanks a lot, Tom).
> 
> -Mei

> Index: test/Transforms/SimplifyCFG/R600/lit.local.cfg
> ===================================================================
> --- test/Transforms/SimplifyCFG/R600/lit.local.cfg	(revision 0)
> +++ test/Transforms/SimplifyCFG/R600/lit.local.cfg	(revision 0)
> @@ -0,0 +1,6 @@
> +config.suffixes = ['.ll', '.c', '.cpp']
> +
> +targets = set(config.root.targets_to_build.split())
> +if not 'R600' in targets:
> +    config.unsupported = True
> +
> Index: test/Transforms/SimplifyCFG/R600/parallelorifcollapse.ll
> ===================================================================
> --- test/Transforms/SimplifyCFG/R600/parallelorifcollapse.ll	(revision 0)
> +++ test/Transforms/SimplifyCFG/R600/parallelorifcollapse.ll	(revision 0)
> @@ -0,0 +1,51 @@
> +; Function Attrs: nounwind
> +; RUN: opt < %s -mtriple=r600-unknown-linux-gnu -simplifycfg -basicaa -S | FileCheck %s
> +; CHECK: or i1
> +; CHECK-NEXT: br
> +; CHECK: br
> +; CHECK: ret

It's not clear to me what the expected output of this test is supposed
to be.  Could you add a comment explaining what the simplifycfg pass is
supposed to be doing here.

> +define void @_Z9chk1D_512v() #0 {
> +entry:
> +  %a0 = alloca i32, align 4
> +  %b0 = alloca i32, align 4
> +  %c0 = alloca i32, align 4
> +  %d0 = alloca i32, align 4
> +  %a1 = alloca i32, align 4
> +  %b1 = alloca i32, align 4
> +  %c1 = alloca i32, align 4
> +  %d1 = alloca i32, align 4
> +  %data = alloca i32, align 4
> +  %0 = load i32* %a0, align 4
> +  %1 = load i32* %b0, align 4
> +  %cmp = icmp ne i32 %0, %1
> +  br i1 %cmp, label %land.lhs.true, label %if.end
> +
> +land.lhs.true:                                    ; preds = %entry
> +  %2 = load i32* %c0, align 4
> +  %3 = load i32* %d0, align 4
> +  %cmp1 = icmp ne i32 %2, %3
> +  br i1 %cmp1, label %if.then, label %if.end
> +
> +if.then:                                          ; preds = %land.lhs.true
> +  store i32 1, i32* %data, align 4
> +  br label %if.end
> +
> +if.end:                                           ; preds = %if.then, %land.lhs.true, %entry
> +  %4 = load i32* %a1, align 4
> +  %5 = load i32* %b1, align 4
> +  %cmp2 = icmp ne i32 %4, %5
> +  br i1 %cmp2, label %land.lhs.true3, label %if.end6
> +
> +land.lhs.true3:                                   ; preds = %if.end
> +  %6 = load i32* %c1, align 4
> +  %7 = load i32* %d1, align 4
> +  %cmp4 = icmp ne i32 %6, %7
> +  br i1 %cmp4, label %if.then5, label %if.end6
> +
> +if.then5:                                         ; preds = %land.lhs.true3
> +  store i32 1, i32* %data, align 4
> +  br label %if.end6
> +
> +if.end6:                                          ; preds = %if.then5, %land.lhs.true3, %if.end
> +  ret void
> +}
> Index: test/Transforms/SimplifyCFG/R600/parallelandifcollapse.ll
> ===================================================================
> --- test/Transforms/SimplifyCFG/R600/parallelandifcollapse.ll	(revision 0)
> +++ test/Transforms/SimplifyCFG/R600/parallelandifcollapse.ll	(revision 0)
> @@ -0,0 +1,58 @@
> +; Function Attrs: nounwind
> +; RUN: opt < %s -mtriple=r600-unknown-linux-gnu -simplifycfg -basicaa -S | FileCheck %s
> +; CHECK: or i1
> +; CHECK-NEXT: br
> +; CHECK: br
> +; CHECK: ret

Same thing here, a comment explaining what the expected output should be
would be helpful.

> +define void @_Z9chk1D_512v() #0 {
> +entry:
> +  %a0 = alloca i32, align 4
> +  %b0 = alloca i32, align 4
> +  %c0 = alloca i32, align 4
> +  %d0 = alloca i32, align 4
> +  %a1 = alloca i32, align 4
> +  %b1 = alloca i32, align 4
> +  %c1 = alloca i32, align 4
> +  %d1 = alloca i32, align 4
> +  %data = alloca i32, align 4
> +  %0 = load i32* %a0, align 4
> +  %1 = load i32* %b0, align 4
> +  %cmp = icmp ne i32 %0, %1
> +  br i1 %cmp, label %land.lhs.true, label %if.else
> +
> +land.lhs.true:                                    ; preds = %entry
> +  %2 = load i32* %c0, align 4
> +  %3 = load i32* %d0, align 4
> +  %cmp1 = icmp ne i32 %2, %3
> +  br i1 %cmp1, label %if.then, label %if.else
> +
> +if.then:                                          ; preds = %land.lhs.true
> +  br label %if.end
> +
> +if.else:                                          ; preds = %land.lhs.true, %entry
> +  store i32 1, i32* %data, align 4
> +  br label %if.end
> +
> +if.end:                                           ; preds = %if.else, %if.then
> +  %4 = load i32* %a1, align 4
> +  %5 = load i32* %b1, align 4
> +  %cmp2 = icmp ne i32 %4, %5
> +  br i1 %cmp2, label %land.lhs.true3, label %if.else6
> +
> +land.lhs.true3:                                   ; preds = %if.end
> +  %6 = load i32* %c1, align 4
> +  %7 = load i32* %d1, align 4
> +  %cmp4 = icmp ne i32 %6, %7
> +  br i1 %cmp4, label %if.then5, label %if.else6
> +
> +if.then5:                                         ; preds = %land.lhs.true3
> +  br label %if.end7
> +
> +if.else6:                                         ; preds = %land.lhs.true3, %if.end
> +  store i32 1, i32* %data, align 4
> +  br label %if.end7
> +
> +if.end7:                                          ; preds = %if.else6, %if.then5
> +  ret void
> +}
> +
> Index: include/llvm/Analysis/TargetTransformInfo.h
> ===================================================================
> --- include/llvm/Analysis/TargetTransformInfo.h	(revision 183763)
> +++ include/llvm/Analysis/TargetTransformInfo.h	(working copy)
> @@ -171,6 +171,9 @@
>    /// comments for a detailed explanation of the cost values.
>    virtual unsigned getUserCost(const User *U) const;
>  
> +  /// \brief hasBranchDivergence - Return true if branch divergence exists.
> +  virtual bool hasBranchDivergence() const;
> +

What exactly is branch divergence?  Could you explain this more in
the comment and give an example.

>    /// \brief Test whether calls to a function lower to actual program function
>    /// calls.
>    ///
> Index: include/llvm/Transforms/Utils/Local.h
> ===================================================================
> --- include/llvm/Transforms/Utils/Local.h	(revision 184607)
> +++ include/llvm/Transforms/Utils/Local.h	(working copy)
> @@ -39,6 +39,7 @@
>  class TargetLibraryInfo;
>  class TargetTransformInfo;
>  class DIBuilder;
> +class AliasAnalysis;
>  
>  template<typename T> class SmallVectorImpl;
>    
> @@ -136,7 +137,7 @@
>  /// the basic block that was pointed to.
>  ///
>  bool SimplifyCFG(BasicBlock *BB, const TargetTransformInfo &TTI,
> -                 const DataLayout *TD = 0);
> +                 const DataLayout *TD = 0, AliasAnalysis * AA = 0);
>  
>  /// FoldBranchToCommonDest - If this basic block is ONLY a setcc and a branch,
>  /// and if a predecessor branches to us and one of our successors, fold the
> Index: tools/opt/opt.cpp
> ===================================================================
> --- tools/opt/opt.cpp	(revision 183763)
> +++ tools/opt/opt.cpp	(working copy)
> @@ -668,6 +668,9 @@
>      FPasses.reset(new FunctionPassManager(M.get()));
>      if (TD)
>        FPasses->add(new DataLayout(*TD));
> +    if (TM.get())
> +      TM->addAnalysisPasses(*FPasses);
> +
>    }
>  
>    if (PrintBreakpoints) {
> Index: lib/Analysis/TargetTransformInfo.cpp
> ===================================================================
> --- lib/Analysis/TargetTransformInfo.cpp	(revision 183763)
> +++ lib/Analysis/TargetTransformInfo.cpp	(working copy)
> @@ -88,6 +88,8 @@
>    return PrevTTI->getUserCost(U);
>  }
>  
> +bool TargetTransformInfo::hasBranchDivergence() const { return false; }
> +
>  bool TargetTransformInfo::isLoweredToCall(const Function *F) const {
>    return PrevTTI->isLoweredToCall(F);
>  }
> Index: lib/Target/R600/AMDGPUTargetMachine.cpp
> ===================================================================
> --- lib/Target/R600/AMDGPUTargetMachine.cpp	(revision 183763)
> +++ lib/Target/R600/AMDGPUTargetMachine.cpp	(working copy)
> @@ -105,6 +105,18 @@
>    return new AMDGPUPassConfig(this, PM);
>  }
>  
> +//===----------------------------------------------------------------------===//
> +// AMDGPU Analysis Pass Setup
> +//===----------------------------------------------------------------------===//
> +
> +void AMDGPUTargetMachine::addAnalysisPasses(PassManagerBase &PM) {
> +  // Add first the target-independent BasicTTI pass, then our AMDGPU pass. This
> +  // allows the AMDGPU pass to delegate to the target independent layer when
> +  // appropriate.
> +  PM.add(createBasicTargetTransformInfoPass(getTargetLowering()));
> +  PM.add(createAMDGPUTargetTransformInfoPass(this));

I'm a little confused about why we need to add a BasicTargetTransformInfo and
also an AMDGPUTargetTransformInfo.  What is the reason for this?

> +}
> +
>  bool
>  AMDGPUPassConfig::addPreISel() {
>    const AMDGPUSubtarget &ST = TM->getSubtarget<AMDGPUSubtarget>();
> Index: lib/Target/R600/AMDGPUTargetTransformInfo.cpp
> ===================================================================
> --- lib/Target/R600/AMDGPUTargetTransformInfo.cpp	(revision 0)
> +++ lib/Target/R600/AMDGPUTargetTransformInfo.cpp	(revision 0)

It looks like you forgot to add this new file to CMakeLists.txt.

> @@ -0,0 +1,90 @@
> +//===-- AMDGPUTargetTransformInfo.cpp - AMDGPU specific TTI pass
> +//----------------===//
> +//
> +//                     The LLVM Compiler Infrastructure
> +//
> +// This file is distributed under the University of Illinois Open Source
> +// License. See LICENSE.TXT for details.
> +//
> +//===----------------------------------------------------------------------===//
> +/// \file
> +/// This file implements a TargetTransformInfo analysis pass specific to the
> +/// AMDGPU target machine. It uses the target's detailed information to provide
> +/// more precise answers to certain TTI queries, while letting the target
> +/// independent and default TTI implementations handle the rest.
> +///
> +//===----------------------------------------------------------------------===//
> +
> +#define DEBUG_TYPE "AMDGPUtti"
> +#include "AMDGPU.h"
> +#include "AMDGPUTargetMachine.h"
> +#include "llvm/Analysis/TargetTransformInfo.h"
> +#include "llvm/Support/Debug.h"
> +#include "llvm/Target/TargetLowering.h"
> +#include "llvm/Target/CostTable.h"
> +using namespace llvm;
> +
> +// Declare the pass initialization routine locally as target-specific passes
> +// don't have a target-wide initialization entry point, and so we rely on the
> +// pass constructor initialization.
> +namespace llvm {
> +void initializeAMDGPUTTIPass(PassRegistry &);
> +}
> +
> +namespace {
> +
> +class AMDGPUTTI : public ImmutablePass, public TargetTransformInfo {
> +  const AMDGPUTargetMachine *TM;
> +  const AMDGPUSubtarget *ST;
> +  const AMDGPUTargetLowering *TLI;
> +
> +  /// Estimate the overhead of scalarizing an instruction. Insert and Extract
> +  /// are set if the result needs to be inserted and/or extracted from vectors.
> +  unsigned getScalarizationOverhead(Type *Ty, bool Insert, bool Extract) const;
> +
> +public:
> +  AMDGPUTTI() : ImmutablePass(ID), TM(0), ST(0), TLI(0) {
> +    llvm_unreachable("This pass cannot be directly constructed");
> +  }
> +
> +  AMDGPUTTI(const AMDGPUTargetMachine *TM)
> +      : ImmutablePass(ID), TM(TM), ST(TM->getSubtargetImpl()),
> +        TLI(TM->getTargetLowering()) {
> +    initializeAMDGPUTTIPass(*PassRegistry::getPassRegistry());
> +  }
> +
> +  virtual void initializePass() { pushTTIStack(this); }
> +
> +  virtual void finalizePass() { popTTIStack(); }
> +
> +  virtual void getAnalysisUsage(AnalysisUsage &AU) const {
> +    TargetTransformInfo::getAnalysisUsage(AU);
> +  }
> +
> +  /// Pass identification.
> +  static char ID;
> +
> +  /// Provide necessary pointer adjustments for the two base classes.
> +  virtual void *getAdjustedAnalysisPointer(const void *ID) {
> +    if (ID == &TargetTransformInfo::ID)
> +      return (TargetTransformInfo *)this;
> +    return this;
> +  }
> +
> +  virtual bool hasBranchDivergence() const;
> +
> +  /// @}
> +};
> +
> +} // end anonymous namespace
> +
> +INITIALIZE_AG_PASS(AMDGPUTTI, TargetTransformInfo, "AMDGPUtti",
> +                   "AMDGPU Target Transform Info", true, true, false)
> +char AMDGPUTTI::ID = 0;
> +
> +ImmutablePass *
> +llvm::createAMDGPUTargetTransformInfoPass(const AMDGPUTargetMachine *TM) {
> +  return new AMDGPUTTI(TM);
> +}
> +
> +bool AMDGPUTTI::hasBranchDivergence() const { return true; }
> Index: lib/Target/R600/AMDGPUTargetMachine.h
> ===================================================================
> --- lib/Target/R600/AMDGPUTargetMachine.h	(revision 183763)
> +++ lib/Target/R600/AMDGPUTargetMachine.h	(working copy)
> @@ -63,6 +63,9 @@
>    }
>    virtual const DataLayout *getDataLayout() const { return &Layout; }
>    virtual TargetPassConfig *createPassConfig(PassManagerBase &PM);
> +
> +  /// \brief Register R600 analysis passes with a pass manager.
> +  virtual void addAnalysisPasses(PassManagerBase &PM);
>  };
>  
>  } // End namespace llvm
> Index: lib/Target/R600/AMDGPU.h
> ===================================================================
> --- lib/Target/R600/AMDGPU.h	(revision 183763)
> +++ lib/Target/R600/AMDGPU.h	(working copy)
> @@ -46,6 +46,10 @@
>  FunctionPass *createAMDGPUIndirectAddressingPass(TargetMachine &tm);
>  FunctionPass *createAMDGPUISelDag(TargetMachine &tm);
>  
> +/// \brief Creates an AMDGPU-specific Target Transformation Info pass.
> +ImmutablePass *
> +    createAMDGPUTargetTransformInfoPass(const AMDGPUTargetMachine *TM);
> +
>  extern Target TheAMDGPUTarget;
>  
>  } // End namespace llvm
> Index: lib/Transforms/Utils/SimplifyCFG.cpp
> ===================================================================
> --- lib/Transforms/Utils/SimplifyCFG.cpp	(revision 184607)
> +++ lib/Transforms/Utils/SimplifyCFG.cpp	(working copy)
> @@ -17,8 +17,10 @@
>  #include "llvm/ADT/STLExtras.h"
>  #include "llvm/ADT/SetVector.h"
>  #include "llvm/ADT/SmallPtrSet.h"
> +#include "llvm/ADT/SmallSet.h"
>  #include "llvm/ADT/SmallVector.h"
>  #include "llvm/ADT/Statistic.h"
> +#include "llvm/Analysis/AliasAnalysis.h"
>  #include "llvm/Analysis/InstructionSimplify.h"
>  #include "llvm/Analysis/TargetTransformInfo.h"
>  #include "llvm/Analysis/ValueTracking.h"
> @@ -63,6 +65,10 @@
>  HoistCondStores("simplifycfg-hoist-cond-stores", cl::Hidden, cl::init(true),
>         cl::desc("Hoist conditional stores if an unconditional store preceeds"));
>  
> +static cl::opt<bool>
> +    ParallelAndOr("simplifycfg-parallel-and-or", cl::Hidden, cl::init(true),
> +                  cl::desc("Use parallel-and-or mode for branch conditions"));
> +
>  STATISTIC(NumBitMaps, "Number of switch instructions turned into bitmaps");
>  STATISTIC(NumLookupTables, "Number of switch instructions turned into lookup tables");
>  STATISTIC(NumSinkCommons, "Number of common instructions sunk down to the end block");
> @@ -88,6 +94,7 @@
>  class SimplifyCFGOpt {
>    const TargetTransformInfo &TTI;
>    const DataLayout *const TD;
> +  AliasAnalysis *AA;
>  
>    Value *isValueEqualityComparison(TerminatorInst *TI);
>    BasicBlock *GetValueEqualityComparisonCases(TerminatorInst *TI,
> @@ -105,10 +112,25 @@
>    bool SimplifyIndirectBr(IndirectBrInst *IBI);
>    bool SimplifyUncondBranch(BranchInst *BI, IRBuilder <> &Builder);
>    bool SimplifyCondBranch(BranchInst *BI, IRBuilder <>&Builder);
> +  /// \brief Use parallel-and or parallel-or to generate conditions for
> +  /// conditional branches.
> +  bool SimplifyParallelAndOr(BasicBlock *BB, IRBuilder<> &Builder, Pass *P = 0);
> +  /// \brief If \param BB is the merge block of an if-region, attempt to merge
> +  /// the if-region with an adjacent if-region upstream if two if-regions
> +  /// contain identical instructions.
> +  bool MergeIfRegion(BasicBlock *BB, IRBuilder<> &Builder, Pass *P = 0);
> +  /// \brief Compare a pair of blocks: \param Block1 and \param Block2, which
> +  /// are from two if-regions whose head blocks are \param Head1 and \param
> +  /// Head2.  \returns true if \param Block1 and \param Block2 contain identical
> +  /// instructions, and none of the instructions alias with \param Head2.
> +  /// This is used as a legality check for merging if-regions.
> +  bool CompareBlock(BasicBlock *Head1, BasicBlock *Head2, BasicBlock *Block1,
> +                    BasicBlock *Block2);
>  
>  public:
> -  SimplifyCFGOpt(const TargetTransformInfo &TTI, const DataLayout *TD)
> -      : TTI(TTI), TD(TD) {}
> +  SimplifyCFGOpt(const TargetTransformInfo &TTI, const DataLayout *TD,
> +                 AliasAnalysis *aa)
> +      : TTI(TTI), TD(TD), AA(aa) {}
>    bool run(BasicBlock *BB);
>  };
>  }
> @@ -195,8 +217,8 @@
>  }
>  
>  
> -/// GetIfCondition - Given a basic block (BB) with two predecessors (and at
> -/// least one PHI node in it), check to see if the merge at this block is due
> +/// GetIfCondition - Given a basic block (BB) with two predecessors,
> +/// check to see if the merge at this block is due
>  /// to an "if condition".  If so, return the boolean condition that determines
>  /// which entry into BB will be taken.  Also, return by references the block
>  /// that will be entered from if the condition is true, and the block that will
> @@ -206,12 +228,31 @@
>  /// instructions in them.
>  static Value *GetIfCondition(BasicBlock *BB, BasicBlock *&IfTrue,
>                               BasicBlock *&IfFalse) {
> -  PHINode *SomePHI = cast<PHINode>(BB->begin());
> -  assert(SomePHI->getNumIncomingValues() == 2 &&
> -         "Function can only handle blocks with 2 predecessors!");
> -  BasicBlock *Pred1 = SomePHI->getIncomingBlock(0);
> -  BasicBlock *Pred2 = SomePHI->getIncomingBlock(1);
> +  PHINode *SomePHI = dyn_cast<PHINode>(BB->begin());
> +  BasicBlock *Pred1 = NULL;
> +  BasicBlock *Pred2 = NULL;
>  
> +  if (SomePHI) {
> +    if (SomePHI->getNumIncomingValues() != 2)
> +       return NULL;
> +    Pred1 = SomePHI->getIncomingBlock(0);
> +    Pred2 = SomePHI->getIncomingBlock(1);
> +  } else {
> +    SmallSetVector<BasicBlock *, 16> Preds(pred_begin(BB), pred_end(BB));
> +    for (SmallSetVector<BasicBlock *, 16>::iterator PI = Preds.begin(),
> +                                                    PE = Preds.end();
> +         PI != PE; ++PI) {
> +      if (Pred1 == NULL)
> +        Pred1 = *PI;
> +      else if (Pred2 == NULL)
> +        Pred2 = *PI;
> +      else
> +        return NULL;
> +    }
> +    if (!Pred1 || !Pred2)
> +       return NULL;
> +  }
> +
>    // We can only handle branches.  Other control flow will be lowered to
>    // branches if possible anyway.
>    BranchInst *Pred1Br = dyn_cast<BranchInst>(Pred1->getTerminator());
> @@ -4039,6 +4080,402 @@
>    return false;
>  }
>  
> +/// If \param [in] BB has more than one predecessor that is a conditional
> +/// branch, attempt to use parallel and/or for the branch condition. \returns
> +/// true on success.
> +///
> +/// Before:
> +///   ......
> +///   %cmp10 = fcmp une float %tmp1, %tmp2
> +///   br i1 %cmp1, label %if.then, label %lor.rhs
> +///
> +/// lor.rhs:
> +///   ......
> +///   %cmp11 = fcmp une float %tmp3, %tmp4
> +///   br i1 %cmp11, label %if.then, label %ifend
> +///
> +/// if.end:  // the merge block
> +///   ......
> +///
> +/// if.then: // has two predecessors, both of them contains conditional branch.
> +///   ......
> +///   br label %if.end;
> +///
> +/// After:
> +///  ......
> +///  %cmp10 = fcmp une float %tmp1, %tmp2
> +///  ......
> +///  %cmp11 = fcmp une float %tmp3, %tmp4
> +///  %cmp12 = or i1 %cmp10, %cmp11    // parallel-or mode.
> +///  br i1 %cmp12, label %if.then, label %ifend
> +///
> +///  if.end:
> +///    ......
> +/// 
> +///  if.then:
> +///    ......
> +///    br label %if.end;
> +///
> +///  Current implementation handles two cases.
> +///  Case 1: \param BB is on the else-path.
> +///
> +///          BB1
> +///        /     |
> +///       BB2    |
> +///      /   \   |
> +///     BB3   \  |     where, BB1, BB2 contain conditional branches.
> +///      \    |  /     BB3 contains unconditional branch.
> +///       \   | /      BB4 corresponds to \param BB which is also the merge.
> +///  BB => BB4
> +///
> +///
> +///  Corresponding source code:
> +///  
> +///  if (a == b && c == d)
> +///    statement; // BB3
> +///
> +///  Case 2: \param BB BB is on the then-path.
> +///
> +///             BB1
> +///          /      |
> +///         |      BB2
> +///         \    /    |  where BB1, BB2 contain conditional branches.
> +///  BB =>   BB3      |  BB3 contains unconditiona branch and corresponds
> +///           \     /    to \param BB.  BB4 is the merge.
> +///             BB4
> +///
> +///  Corresponding source code:
> +///
> +///  if (a == b || c == d)
> +///    statement;  // BB3
> +///
> +///  In both cases,  \param BB is the common successor of conditional branches.
> +///  In Case 1, \param BB (BB4) has an unconditional branch (BB3) as
> +///  its predecessor.  In Case 2, \param BB (BB3) only has conditional branches
> +///  as its predecessors.
> +///
> +bool SimplifyCFGOpt::SimplifyParallelAndOr(BasicBlock *BB, IRBuilder<> &Builder,
> +                                           Pass *P) {
> +  PHINode *PHI = dyn_cast<PHINode>(&BB->front());
> +  if (PHI)
> +    return false; // For simplicity, avoid cases containing PHI nodes.
> +  
> +  BasicBlock *LCond = NULL;
> +  BasicBlock *FCond = NULL;
> +  BasicBlock *UCond = NULL;
> +  int Idx = -1;
> +
> +  SmallSetVector<BasicBlock *, 16> Preds(pred_begin(BB), pred_end(BB));
> +  // Check predecessors of \param BB.
> +  for (SmallSetVector<BasicBlock *, 16>::iterator PI = Preds.begin(),
> +                                                  PE = Preds.end();
> +       PI != PE; ++PI) {
> +    BasicBlock *Pred = *PI;
> +    TerminatorInst *PTI = Pred->getTerminator();
> +    BranchInst *PBI = dyn_cast<BranchInst>(PTI);
> +
> +    // All predecessors should terminate with a branch.
> +    if (!PBI)
> +      return false;
> +        
> +    BasicBlock *PP = Pred->getSinglePredecessor();
> +       
> +    if (PBI->isUnconditional()) {
> +      // Case 1: Pred (BB3) is an unconditional block, it should
> +      // have a single successor and a single predecessor (BB2) that
> +      // is also a predecessor of \param BB (BB4) and should not have
> +      // address-taken.  There should exist only one such unconditional
> +      // branch among the predecessors.
> +      if (UCond || !PP || (Preds.count(PP) == 0) ||
> +          (PTI->getNumSuccessors() != 1) || Pred->hasAddressTaken())
> +        return false;
> + 
> +      UCond = Pred;
> +      continue;
> +    }
> +
> +    // Only conditional branches are allowed beyond this point.
> +    if (!PBI->isConditional())
> +      return false;
> +
> +    // Condition's unique use should be the branch instruction.
> +    Value *PC = PBI->getCondition();
> +    if (!PC || (PC->getNumUses() != 1))
> +      return false;
> +
> +    if (PP && (Preds.count(PP) != 0)) {
> +      // These are internal condition blocks to be merged from, e.g.,
> +      // BB2 in both cases.
> +      // Should not be address-taken.
> +      if (Pred->hasAddressTaken())
> +        return false;
> +
> +      // Instructions in the internal condition blocks should be safe
> +      // to hoist up.
> +      for (BasicBlock::iterator BI = Pred->begin(), BE = PBI; BI != BE;) {
> +        Instruction *CI = BI++;
> +        if (isa<PHINode>(CI) || CI->mayHaveSideEffects() ||
> +            !isSafeToSpeculativelyExecute(CI))
> +          return false;
> +      }
> +    } else {
> +      // This is the condition block to be merged into, e.g. BB1 in
> +      // both cases.
> +      if (FCond)
> +        return false;
> +      FCond = Pred;
> +    }
> +
> +    // The terminator must have exactly two successors.
> +    if (PTI->getNumSuccessors() != 2)
> +      return false;
> +
> +    // Find whether BB is uniformly on the true (or false) path
> +    // for all of its predecessors.
> +    BasicBlock *PS1 = PTI->getSuccessor(0);
> +    BasicBlock *PS2 = PTI->getSuccessor(1);
> +    BasicBlock *PS = (PS1 == BB) ? PS2 : PS1;
> +    int CIdx = (PS1 == BB) ? 0 : 1;
> +
> +    if (Idx == -1)
> +      Idx = CIdx;
> +    else if (CIdx != Idx)
> +      return false;
> +
> +    // PS is the successor which is not BB. Check successors to identify
> +    // the last conditional branch.
> +    if (Preds.count(PS) == 0) {
> +      // Case 2.
> +      // BB must have an unique successor.
> +      TerminatorInst *TBB = BB->getTerminator();
> +      if (TBB->getNumSuccessors() != 1)
> +        return false;
> +
> +      BasicBlock *SBB = TBB->getSuccessor(0);
> +      PHI = dyn_cast<PHINode>(&SBB->front());
> +      if (PHI)
> +        return false;
> +
> +      // PS (BB4) should be BB's successor.
> +      if (SBB != PS)
> +        return false;
> +      LCond = Pred;
> +    } else {
> +      TerminatorInst *TPS = PS->getTerminator();
> +      BranchInst *BPS = dyn_cast<BranchInst>(TPS);
> +      if (BPS && BPS->isUnconditional()) {
> +        // Case 1: PS(BB3) should be an unconditional branch.
> +        LCond = Pred;
> +      }
> +    }
> +  }
> +
> +  if (FCond && LCond && (FCond != LCond)) {
> +    // Do the transformation.
> +    BasicBlock *CB;
> +    bool ITER = true;
> +    BasicBlock::iterator ItOld = Builder.GetInsertPoint();
> +    TerminatorInst *PTI = FCond->getTerminator();
> +    BranchInst *PBI = dyn_cast<BranchInst>(PTI);
> +    Value *PC = PBI->getCondition();
> +    do {
> +      CB = PBI->getSuccessor(1 - Idx);
> +      // Delete the conditional branch.
> +      FCond->getInstList().pop_back();
> +      FCond->getInstList().splice(FCond->end(), CB->getInstList());
> +      PTI = FCond->getTerminator();
> +      PBI = dyn_cast<BranchInst>(PTI);
> +      Value *CC = PBI->getCondition();
> +      // Merge conditions.
> +      Builder.SetInsertPoint(PTI);
> +      Value *NC;
> +      if (Idx == 0)
> +        // Case 2, use parallel or.
> +        NC = Builder.CreateOr(PC, CC);
> +      else
> +        // Case 1, use parallel and.
> +        NC = Builder.CreateAnd(PC, CC);
> +    
> +      PBI->replaceUsesOfWith(CC, NC);
> +      PC = NC;
> +      if (CB == LCond)
> +        ITER = false;
> +      // Remove internal conditional branches.
> +      CB->dropAllReferences();
> +      // make CB unreachable and let downstream to delete the block.
> +      new UnreachableInst(CB->getContext(), CB);
> +    } while (ITER);
> +    
> +    Builder.SetInsertPoint(ItOld);
> +    DEBUG(dbgs() << "Use parallel and/or in:\n" << *FCond);
> +    return true;
> +  }
> +  
> +  return false;
> +}
> +
> +/// Compare blocks from two if-regions, where \param Head1 is the head of the
> +/// 1st if-region. \param Head2 is the head of the 2nd if-region. \param
> +/// Block1 is a block in the 1st if-region to compare. \param Block2 is a block
> +//  in the 2nd if-region to compare.  \returns true if the blocks have identical
> +/// instructions that do not alias with \param Head2 and it is legal to merge
> +/// the two blocks so that only one instance of each instruction is kept.
> +///
> +bool SimplifyCFGOpt::CompareBlock(BasicBlock *Head1, BasicBlock *Head2,
> +                                  BasicBlock *Block1, BasicBlock *Block2) {
> +  TerminatorInst *PTI2 = Head2->getTerminator();
> +  Instruction *PBI2 = Head2->begin();
> +
> +  if (Block1 == Head1) {
> +    if (Block2 != Head2)
> +      return false;
> +  } else if (Block2 == Head2)
> +    return false;
> +  else {
> +    // Check whether instructions in Block1 and Block2 are identical
> +    // and do not alias with instructions in Head2.
> +    BasicBlock::iterator iter1 = Block1->begin();
> +    BasicBlock::iterator end1 = Block1->getTerminator();
> +    BasicBlock::iterator iter2 = Block2->begin();
> +    BasicBlock::iterator end2 = Block2->getTerminator();
> +
> +    while (1) {
> +      if (iter1 == end1) {
> +        if (iter2 != end2)
> +          return false;
> +        break;
> +      }
> +
> +      if (!iter1->isIdenticalTo(iter2))
> +        return false;
> +     
> +      // Illegal to remove instructions with side effects except
> +      // non-volatile stores.
> +      if (iter1->mayHaveSideEffects()) {
> +        Instruction *CurI = &*iter1;
> +        StoreInst *SI = dyn_cast<StoreInst>(CurI);
> +        if (!SI || SI->isVolatile())
> +          return false;
> +      }
> +
> +      // For simplicity and speed, data dependency check can be
> +      // avoided if read from memory doesn't exist.
> +      if (iter1->mayReadFromMemory())
> +        return false;
> +      
> +      if (iter1->mayWriteToMemory()) {
> +        for (BasicBlock::iterator BI = PBI2, BE = PTI2; BI != BE; ++BI) {
> +          if (BI->mayReadFromMemory() || BI->mayWriteToMemory()) {
> +            // Check alias with Head2.
> +            if (!AA || AA->alias(iter1, BI))
> +              return false;
> +          }
> +        }
> +      }
> +      ++iter1;
> +      ++iter2;
> +    }
> +    ;
> +  }
> +
> +  return true;
> +}
> +
> +/// Check whether \param BB is the merge block of a if-region.  If yes, check
> +/// whether there exists an adjacent if-region upstream, the two if-regions
> +/// contain identical instuctions and can be legally merged.  \returns true if
> +/// the two if-regions are merged.
> +///
> +/// From:
> +/// if (a)
> +///   statement;
> +/// if (b)
> +///   statement;
> +///
> +/// To:
> +/// if (a || b)
> +///   statement;
> +///
> +bool SimplifyCFGOpt::MergeIfRegion(BasicBlock *BB, IRBuilder<> &Builder,
> +                                   Pass *P) {
> +  BasicBlock *IfTrue2, *IfFalse2;
> +  Value *IfCond2 = GetIfCondition(BB, IfTrue2, IfFalse2);
> +  if (!IfCond2)
> +    return false;
> +
> +  Instruction *CInst2 = dyn_cast<Instruction>(IfCond2);
> +  if (!CInst2)
> +    return false;
> +
> +  BasicBlock *Head2 = CInst2->getParent();
> +  if (Head2->hasAddressTaken())
> +    return false;
> +
> +  BasicBlock *IfTrue1, *IfFalse1;
> +  Value *IfCond1 = GetIfCondition(Head2, IfTrue1, IfFalse1);
> +  if (!IfCond1)
> +    return false;
> +
> +  Instruction *CInst1 = dyn_cast<Instruction>(IfCond1);
> +  if (!CInst1)
> +    return false;
> +
> +  BasicBlock *Head1 = CInst1->getParent();
> +
> +  // Either then-path or else-path should be empty.
> +  if ((IfTrue1 != Head1) && (IfFalse1 != Head1))
> +    return false;
> +  if ((IfTrue2 != Head2) && (IfFalse2 != Head2))
> +    return false;
> +
> +  TerminatorInst *PTI2 = Head2->getTerminator();
> +  Instruction *PBI2 = Head2->begin();
> +
> +  if (!CompareBlock(Head1, Head2, IfTrue1, IfTrue2))
> +    return false;
> +
> +  if (!CompareBlock(Head1, Head2, IfFalse1, IfFalse2))
> +    return false;
> +
> +  // Check whether \param Head2 has side-effect and is safe to speculate.
> +  for (BasicBlock::iterator BI = PBI2, BE = PTI2; BI != BE; ++BI) {
> +    Instruction *CI = BI;
> +    if (isa<PHINode>(CI) || CI->mayHaveSideEffects() ||
> +        !isSafeToSpeculativelyExecute(CI))
> +      return false;
> +  }
> +
> +  // Merge \param Head2 into \param Head1.
> +  Head1->getInstList().pop_back();
> +  Head1->getInstList().splice(Head1->end(), Head2->getInstList());
> +  TerminatorInst *PTI = Head1->getTerminator();
> +  BranchInst *PBI = dyn_cast<BranchInst>(PTI);
> +  Value *CC = PBI->getCondition();
> +  BasicBlock::iterator ItOld = Builder.GetInsertPoint();
> +  Builder.SetInsertPoint(PTI);
> +  Value *NC = Builder.CreateOr(CInst1, CC);
> +  PBI->replaceUsesOfWith(CC, NC);
> +  Builder.SetInsertPoint(ItOld);
> +
> +  // Remove IfTrue1
> +  if (IfTrue1 != Head1) {
> +    IfTrue1->dropAllReferences();
> +    IfTrue1->eraseFromParent();
> +  }
> +
> +  // Remove IfFalse1
> +  if (IfFalse1 != Head1) {
> +    IfFalse1->dropAllReferences();
> +    IfFalse1->eraseFromParent();
> +  }
> +  
> +  // Remove \param Head2
> +  Head2->dropAllReferences();
> +  Head2->eraseFromParent();
> +  DEBUG(dbgs() << "If conditions merged into:\n" << *Head1);
> +  return true;
> +}
> +
>  /// Check if passing a value to an instruction will cause undefined behavior.
>  static bool passingValueIsAlwaysUndefined(Value *V, Instruction *I) {
>    Constant *C = dyn_cast<Constant>(V);
> @@ -4142,6 +4579,10 @@
>  
>    IRBuilder<> Builder(BB);
>  
> +  if (ParallelAndOr && TTI.hasBranchDivergence() &&
> +      SimplifyParallelAndOr(BB, Builder))
> +    return true;
> +
>    // If there is a trivial two-entry PHI node in this basic block, and we can
>    // eliminate it, do so now.
>    if (PHINode *PN = dyn_cast<PHINode>(BB->begin()))
> @@ -4169,6 +4610,9 @@
>      if (SimplifyIndirectBr(IBI)) return true;
>    }
>  
> +  if (ParallelAndOr && TTI.hasBranchDivergence() && MergeIfRegion(BB, Builder))
> +    return true;
> +
>    return Changed;
>  }
>  
> @@ -4178,6 +4622,6 @@
>  /// of the CFG.  It returns true if a modification was made.
>  ///
>  bool llvm::SimplifyCFG(BasicBlock *BB, const TargetTransformInfo &TTI,
> -                       const DataLayout *TD) {
> -  return SimplifyCFGOpt(TTI, TD).run(BB);
> +                       const DataLayout *TD, AliasAnalysis *AA) {
> +  return SimplifyCFGOpt(TTI, TD, AA).run(BB);
>  }
> Index: lib/Transforms/Scalar/SimplifyCFGPass.cpp
> ===================================================================
> --- lib/Transforms/Scalar/SimplifyCFGPass.cpp	(revision 184607)
> +++ lib/Transforms/Scalar/SimplifyCFGPass.cpp	(working copy)
> @@ -27,6 +27,7 @@
>  #include "llvm/ADT/SmallVector.h"
>  #include "llvm/ADT/Statistic.h"
>  #include "llvm/Analysis/TargetTransformInfo.h"
> +#include "llvm/Analysis/AliasAnalysis.h"
>  #include "llvm/IR/Attributes.h"
>  #include "llvm/IR/Constants.h"
>  #include "llvm/IR/DataLayout.h"
> @@ -49,10 +50,14 @@
>  
>      virtual bool runOnFunction(Function &F);
>  
> -    virtual void getAnalysisUsage(AnalysisUsage &AU) const {
> -      AU.addRequired<TargetTransformInfo>();
> -    }
> -  };
> +  virtual void getAnalysisUsage(AnalysisUsage &AU) const {
> +    AU.addRequired<TargetTransformInfo>();
> +    AU.addRequired<AliasAnalysis>();
> +    AU.addRequired<AliasAnalysis>();
> +  }
> +private:
> +  AliasAnalysis *AA;
> +};
>  }
>  
>  char CFGSimplifyPass::ID = 0;
> @@ -301,7 +306,7 @@
>  /// iterativelySimplifyCFG - Call SimplifyCFG on all the blocks in the function,
>  /// iterating until no more changes are made.
>  static bool iterativelySimplifyCFG(Function &F, const TargetTransformInfo &TTI,
> -                                   const DataLayout *TD) {
> +                                   const DataLayout *TD, AliasAnalysis * AA) {
>    bool Changed = false;
>    bool LocalChange = true;
>    while (LocalChange) {
> @@ -310,7 +315,7 @@
>      // Loop over all of the basic blocks and remove them if they are unneeded...
>      //
>      for (Function::iterator BBIt = F.begin(); BBIt != F.end(); ) {
> -      if (SimplifyCFG(BBIt++, TTI, TD)) {
> +      if (SimplifyCFG(BBIt++, TTI, TD, AA)) {
>          LocalChange = true;
>          ++NumSimpl;
>        }
> @@ -324,11 +329,12 @@
>  // simplify the CFG.
>  //
>  bool CFGSimplifyPass::runOnFunction(Function &F) {
> +  AA = &getAnalysis<AliasAnalysis>();
>    const TargetTransformInfo &TTI = getAnalysis<TargetTransformInfo>();
>    const DataLayout *TD = getAnalysisIfAvailable<DataLayout>();
>    bool EverChanged = removeUnreachableBlocksFromFn(F);
>    EverChanged |= mergeEmptyReturnBlocks(F);
> -  EverChanged |= iterativelySimplifyCFG(F, TTI, TD);
> +  EverChanged |= iterativelySimplifyCFG(F, TTI, TD, AA);
>  
>    // If neither pass changed anything, we're done.
>    if (!EverChanged) return false;
> @@ -342,7 +348,7 @@
>      return true;
>  
>    do {
> -    EverChanged = iterativelySimplifyCFG(F, TTI, TD);
> +    EverChanged = iterativelySimplifyCFG(F, TTI, TD, AA);
>      EverChanged |= removeUnreachableBlocksFromFn(F);
>    } while (EverChanged);
>  
> Index: lib/CodeGen/BasicTargetTransformInfo.cpp
> ===================================================================
> --- lib/CodeGen/BasicTargetTransformInfo.cpp	(revision 183763)
> +++ lib/CodeGen/BasicTargetTransformInfo.cpp	(working copy)
> @@ -63,6 +63,8 @@
>      return this;
>    }
>  
> +  virtual bool hasBranchDivergence() const;
> +
>    /// \name Scalar TTI Implementations
>    /// @{
>  
> @@ -122,6 +124,7 @@
>    return new BasicTTI(TLI);
>  }
>  
> +bool BasicTTI::hasBranchDivergence() const { return false; }
>  
>  bool BasicTTI::isLegalAddImmediate(int64_t imm) const {
>    return TLI->isLegalAddImmediate(imm);


-------------- next part --------------
A non-text attachment was scrubbed...
Name: odc_v2
Type: application/octet-stream
Size: 34960 bytes
Desc: odc_v2
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20130708/af6d4b7b/attachment.obj>


More information about the llvm-commits mailing list