[patch] simplifyCFG for review

Mon Jul 15 14:10:35 PDT 2013

I am ok with adding this transformation into simplifycfg and have it being controlled by a target hook. I think it's a reasonable compromise for now. In the future, once we start to accumulate more GPU specific transformations, I'd prefer they get factored out into a separate passes (or even libraries).

As for the specific patch, I haven't had a chance to read all of it. I'm somewhat concerned that now that simplifycfg requires AA. That adds overhead for all targets. Is it really necessary?

Evan

On Jul 12, 2013, at 3:39 PM, "Ye, Mei" <Mei.Ye at amd.com> wrote:

> The fundamental issue is whether target-specific works are allowed inside "machine-independent" passes.  In fact, almost all compiler optimizations are target-dependent.  Having an good infrastructure to enable target-specific tunings or optimizations will promote code-sharing and code-quality.  The common practice that I have seen is that compiler vendors keep their work "if-defed" inside their own branches and do not contribute back to the trunk.  There are IP and business strategy reasons.  But in the long run, code base diverges and merging gets more difficult if not impossible.  Not only the vendors will suffer, but also the larger community if less and less contributions flow back into the trunk.
> 
> If we have a good repository of optimizations, each target can selectively enable/disable them to avoid unexpected performance impact.  To safeguard the correctness, the community can create a super target (just clone X86) that tests all optimizations.  
> 
> From the aspect of implementation, this patch fits into SimplifyCFG quite naturally.  It is often advantageous to change the "machine-independent" codes since you already traverse the whole function and reach the point where certain patterns are recognized in a similar pass.   Adding a new optimization/transformation at that point is free, while creating a separated pass will have to repeat works and increases compilation time.  JIT and on-line compilation models are very stingy in compilation time.  I unfortunately live in a world of compiling-from-source model, adding a new pass is prohibited in my organization.   Besides, this patch is not GPU specific. It is good for CPU in general unless you happen to have patterns of very expensive floating point comparisons.  Even in that case, it might still win due to reduction in branches.  
> 
> And, yes, GPU needs to do CFG transformations.  But code gen can introduce new control flows.  Vendors will need to recognize irreducible CFG in their jitters anyway as a verification.
> 
> -Mei 
> 
> 
> -----Original Message-----
> From: Nick Lewycky [mailto:nicholas at mxc.ca] 
> Sent: Thursday, July 11, 2013 10:40 PM
> To: Ye, Mei
> Cc: Tom Stellard; llvm-commits at cs.uiuc.edu
> Subject: Re: [patch] simplifyCFG for review
> 
> The fundamental issue with this patch is that you're saying that 
> simplifycfg should be responsible for structuring the CFG differently 
> based on whether the target supports branches that go different ways in 
> different threads (a common GPU restriction).
> 
> We try very, very hard to avoid doing target-specific things inside 
> passes like this. Doing so requires an analysis of why it couldn't be 
> done differently. (Don't GPUs have problems with irreducable CFG anyhow? 
> Don't you already need to rewrite the CFG for that anyways? Why not have 
> a GPU CFG simplification pass right before CodeGenPrep?)
> 
> I have some comments on the details of the patch, but the above issue is 
> big and needs wider discussion and consensus first.
> 
> Nick
> 
> Ye, Mei wrote:
>> Thanks Tom.   Attached is a new patch that adds comments as requested. I also made a new diff against most-recent trunk.  But it looks like many files have locks, so some diffs are based on an earlier revision.
>> 
>> With regard to your question:
>> "I'm a little confused about why we need to add a BasicTargetTransformInfo and
>> also an AMDGPUTargetTransformInfo.  What is the reason for this?"
>> 
>> My answer is that same thing happens to X86, ARM, and other targets.  I didn't track whether BasicTargetTransformInfo is always needed, but I think you can build compiler for more than one targets, and for targets that you do not supply a target-spefici Tranform Info, you can always default back to use the basic one.
>> 
>> -Mei
>> 
>> 
>> -----Original Message-----
>> From: Tom Stellard [mailto:tom at stellard.net]
>> Sent: Monday, July 08, 2013 11:14 AM
>> To: Ye, Mei
>> Cc: Evan Cheng; Renato Golin; Sean Silva; llvm-commits at cs.uiuc.edu
>> Subject: Re: [patch] simplifyCFG for review
>> 
>> On Fri, Jun 28, 2013 at 09:46:13PM +0000, Ye, Mei wrote:
>>> Hi all
>>> 
>>> Thank you for your comments.   Attached is an updated patch.
>>> - add TargetTransformInfo to R600 .
>>> - add hasBranchDivergence to query whether a target has branch divergence.
>>> - Invoke branch reduction opts only when the underlying target has branch divergence.  So currently it only gets triggered for R600.
>>> - fixed a bug in opt.cpp
>>> - correct clang-format issues.
>>> 
>>> I have tested the correctness on x86 (by adding branch divergence to X86TargetTransformInfo in my local workspace) using existing testing infrastructure and CPU2006, CPU2000.  Tom Stellard agrees to run R600 testings (thanks a lot, Tom).
>>> 
>>> -Mei
>> 
>>> Index: test/Transforms/SimplifyCFG/R600/lit.local.cfg
>>> ===================================================================
>>> --- test/Transforms/SimplifyCFG/R600/lit.local.cfg	(revision 0)
>>> +++ test/Transforms/SimplifyCFG/R600/lit.local.cfg	(revision 0)
>>> @@ -0,0 +1,6 @@
>>> +config.suffixes = ['.ll', '.c', '.cpp']
>>> +
>>> +targets = set(config.root.targets_to_build.split())
>>> +if not 'R600' in targets:
>>> +    config.unsupported = True
>>> +
>>> Index: test/Transforms/SimplifyCFG/R600/parallelorifcollapse.ll
>>> ===================================================================
>>> --- test/Transforms/SimplifyCFG/R600/parallelorifcollapse.ll	(revision 0)
>>> +++ test/Transforms/SimplifyCFG/R600/parallelorifcollapse.ll	(revision 0)
>>> @@ -0,0 +1,51 @@
>>> +; Function Attrs: nounwind
>>> +; RUN: opt<  %s -mtriple=r600-unknown-linux-gnu -simplifycfg -basicaa -S | FileCheck %s
>>> +; CHECK: or i1
>>> +; CHECK-NEXT: br
>>> +; CHECK: br
>>> +; CHECK: ret
>> 
>> It's not clear to me what the expected output of this test is supposed
>> to be.  Could you add a comment explaining what the simplifycfg pass is
>> supposed to be doing here.
>> 
>>> +define void @_Z9chk1D_512v() #0 {
>>> +entry:
>>> +  %a0 = alloca i32, align 4
>>> +  %b0 = alloca i32, align 4
>>> +  %c0 = alloca i32, align 4
>>> +  %d0 = alloca i32, align 4
>>> +  %a1 = alloca i32, align 4
>>> +  %b1 = alloca i32, align 4
>>> +  %c1 = alloca i32, align 4
>>> +  %d1 = alloca i32, align 4
>>> +  %data = alloca i32, align 4
>>> +  %0 = load i32* %a0, align 4
>>> +  %1 = load i32* %b0, align 4
>>> +  %cmp = icmp ne i32 %0, %1
>>> +  br i1 %cmp, label %land.lhs.true, label %if.end
>>> +
>>> +land.lhs.true:                                    ; preds = %entry
>>> +  %2 = load i32* %c0, align 4
>>> +  %3 = load i32* %d0, align 4
>>> +  %cmp1 = icmp ne i32 %2, %3
>>> +  br i1 %cmp1, label %if.then, label %if.end
>>> +
>>> +if.then:                                          ; preds = %land.lhs.true
>>> +  store i32 1, i32* %data, align 4
>>> +  br label %if.end
>>> +
>>> +if.end:                                           ; preds = %if.then, %land.lhs.true, %entry
>>> +  %4 = load i32* %a1, align 4
>>> +  %5 = load i32* %b1, align 4
>>> +  %cmp2 = icmp ne i32 %4, %5
>>> +  br i1 %cmp2, label %land.lhs.true3, label %if.end6
>>> +
>>> +land.lhs.true3:                                   ; preds = %if.end
>>> +  %6 = load i32* %c1, align 4
>>> +  %7 = load i32* %d1, align 4
>>> +  %cmp4 = icmp ne i32 %6, %7
>>> +  br i1 %cmp4, label %if.then5, label %if.end6
>>> +
>>> +if.then5:                                         ; preds = %land.lhs.true3
>>> +  store i32 1, i32* %data, align 4
>>> +  br label %if.end6
>>> +
>>> +if.end6:                                          ; preds = %if.then5, %land.lhs.true3, %if.end
>>> +  ret void
>>> +}
>>> Index: test/Transforms/SimplifyCFG/R600/parallelandifcollapse.ll
>>> ===================================================================
>>> --- test/Transforms/SimplifyCFG/R600/parallelandifcollapse.ll	(revision 0)
>>> +++ test/Transforms/SimplifyCFG/R600/parallelandifcollapse.ll	(revision 0)
>>> @@ -0,0 +1,58 @@
>>> +; Function Attrs: nounwind
>>> +; RUN: opt<  %s -mtriple=r600-unknown-linux-gnu -simplifycfg -basicaa -S | FileCheck %s
>>> +; CHECK: or i1
>>> +; CHECK-NEXT: br
>>> +; CHECK: br
>>> +; CHECK: ret
>> 
>> Same thing here, a comment explaining what the expected output should be
>> would be helpful.
>> 
>>> +define void @_Z9chk1D_512v() #0 {
>>> +entry:
>>> +  %a0 = alloca i32, align 4
>>> +  %b0 = alloca i32, align 4
>>> +  %c0 = alloca i32, align 4
>>> +  %d0 = alloca i32, align 4
>>> +  %a1 = alloca i32, align 4
>>> +  %b1 = alloca i32, align 4
>>> +  %c1 = alloca i32, align 4
>>> +  %d1 = alloca i32, align 4
>>> +  %data = alloca i32, align 4
>>> +  %0 = load i32* %a0, align 4
>>> +  %1 = load i32* %b0, align 4
>>> +  %cmp = icmp ne i32 %0, %1
>>> +  br i1 %cmp, label %land.lhs.true, label %if.else
>>> +
>>> +land.lhs.true:                                    ; preds = %entry
>>> +  %2 = load i32* %c0, align 4
>>> +  %3 = load i32* %d0, align 4
>>> +  %cmp1 = icmp ne i32 %2, %3
>>> +  br i1 %cmp1, label %if.then, label %if.else
>>> +
>>> +if.then:                                          ; preds = %land.lhs.true
>>> +  br label %if.end
>>> +
>>> +if.else:                                          ; preds = %land.lhs.true, %entry
>>> +  store i32 1, i32* %data, align 4
>>> +  br label %if.end
>>> +
>>> +if.end:                                           ; preds = %if.else, %if.then
>>> +  %4 = load i32* %a1, align 4
>>> +  %5 = load i32* %b1, align 4
>>> +  %cmp2 = icmp ne i32 %4, %5
>>> +  br i1 %cmp2, label %land.lhs.true3, label %if.else6
>>> +
>>> +land.lhs.true3:                                   ; preds = %if.end
>>> +  %6 = load i32* %c1, align 4
>>> +  %7 = load i32* %d1, align 4
>>> +  %cmp4 = icmp ne i32 %6, %7
>>> +  br i1 %cmp4, label %if.then5, label %if.else6
>>> +
>>> +if.then5:                                         ; preds = %land.lhs.true3
>>> +  br label %if.end7
>>> +
>>> +if.else6:                                         ; preds = %land.lhs.true3, %if.end
>>> +  store i32 1, i32* %data, align 4
>>> +  br label %if.end7
>>> +
>>> +if.end7:                                          ; preds = %if.else6, %if.then5
>>> +  ret void
>>> +}
>>> +
>>> Index: include/llvm/Analysis/TargetTransformInfo.h
>>> ===================================================================
>>> --- include/llvm/Analysis/TargetTransformInfo.h	(revision 183763)
>>> +++ include/llvm/Analysis/TargetTransformInfo.h	(working copy)
>>> @@ -171,6 +171,9 @@
>>>    /// comments for a detailed explanation of the cost values.
>>>    virtual unsigned getUserCost(const User *U) const;
>>> 
>>> +  /// \brief hasBranchDivergence - Return true if branch divergence exists.
>>> +  virtual bool hasBranchDivergence() const;
>>> +
>> 
>> What exactly is branch divergence?  Could you explain this more in
>> the comment and give an example.
>> 
>>>    /// \brief Test whether calls to a function lower to actual program function
>>>    /// calls.
>>>    ///
>>> Index: include/llvm/Transforms/Utils/Local.h
>>> ===================================================================
>>> --- include/llvm/Transforms/Utils/Local.h	(revision 184607)
>>> +++ include/llvm/Transforms/Utils/Local.h	(working copy)
>>> @@ -39,6 +39,7 @@
>>>  class TargetLibraryInfo;
>>>  class TargetTransformInfo;
>>>  class DIBuilder;
>>> +class AliasAnalysis;
>>> 
>>>  template<typename T>  class SmallVectorImpl;
>>> 
>>> @@ -136,7 +137,7 @@
>>>  /// the basic block that was pointed to.
>>>  ///
>>>  bool SimplifyCFG(BasicBlock *BB, const TargetTransformInfo&TTI,
>>> -                 const DataLayout *TD = 0);
>>> +                 const DataLayout *TD = 0, AliasAnalysis * AA = 0);
>>> 
>>>  /// FoldBranchToCommonDest - If this basic block is ONLY a setcc and a branch,
>>>  /// and if a predecessor branches to us and one of our successors, fold the
>>> Index: tools/opt/opt.cpp
>>> ===================================================================
>>> --- tools/opt/opt.cpp	(revision 183763)
>>> +++ tools/opt/opt.cpp	(working copy)
>>> @@ -668,6 +668,9 @@
>>>      FPasses.reset(new FunctionPassManager(M.get()));
>>>      if (TD)
>>>        FPasses->add(new DataLayout(*TD));
>>> +    if (TM.get())
>>> +      TM->addAnalysisPasses(*FPasses);
>>> +
>>>    }
>>> 
>>>    if (PrintBreakpoints) {
>>> Index: lib/Analysis/TargetTransformInfo.cpp
>>> ===================================================================
>>> --- lib/Analysis/TargetTransformInfo.cpp	(revision 183763)
>>> +++ lib/Analysis/TargetTransformInfo.cpp	(working copy)
>>> @@ -88,6 +88,8 @@
>>>    return PrevTTI->getUserCost(U);
>>>  }
>>> 
>>> +bool TargetTransformInfo::hasBranchDivergence() const { return false; }
>>> +
>>>  bool TargetTransformInfo::isLoweredToCall(const Function *F) const {
>>>    return PrevTTI->isLoweredToCall(F);
>>>  }
>>> Index: lib/Target/R600/AMDGPUTargetMachine.cpp
>>> ===================================================================
>>> --- lib/Target/R600/AMDGPUTargetMachine.cpp	(revision 183763)
>>> +++ lib/Target/R600/AMDGPUTargetMachine.cpp	(working copy)
>>> @@ -105,6 +105,18 @@
>>>    return new AMDGPUPassConfig(this, PM);
>>>  }
>>> 
>>> +//===----------------------------------------------------------------------===//
>>> +// AMDGPU Analysis Pass Setup
>>> +//===----------------------------------------------------------------------===//
>>> +
>>> +void AMDGPUTargetMachine::addAnalysisPasses(PassManagerBase&PM) {
>>> +  // Add first the target-independent BasicTTI pass, then our AMDGPU pass. This
>>> +  // allows the AMDGPU pass to delegate to the target independent layer when
>>> +  // appropriate.
>>> +  PM.add(createBasicTargetTransformInfoPass(getTargetLowering()));
>>> +  PM.add(createAMDGPUTargetTransformInfoPass(this));
>> 
>> I'm a little confused about why we need to add a BasicTargetTransformInfo and
>> also an AMDGPUTargetTransformInfo.  What is the reason for this?
>> 
>>> +}
>>> +
>>>  bool
>>>  AMDGPUPassConfig::addPreISel() {
>>>    const AMDGPUSubtarget&ST = TM->getSubtarget<AMDGPUSubtarget>();
>>> Index: lib/Target/R600/AMDGPUTargetTransformInfo.cpp
>>> ===================================================================
>>> --- lib/Target/R600/AMDGPUTargetTransformInfo.cpp	(revision 0)
>>> +++ lib/Target/R600/AMDGPUTargetTransformInfo.cpp	(revision 0)
>> 
>> It looks like you forgot to add this new file to CMakeLists.txt.
>> 
>>> @@ -0,0 +1,90 @@
>>> +//===-- AMDGPUTargetTransformInfo.cpp - AMDGPU specific TTI pass
>>> +//----------------===//
>>> +//
>>> +//                     The LLVM Compiler Infrastructure
>>> +//
>>> +// This file is distributed under the University of Illinois Open Source
>>> +// License. See LICENSE.TXT for details.
>>> +//
>>> +//===----------------------------------------------------------------------===//
>>> +/// \file
>>> +/// This file implements a TargetTransformInfo analysis pass specific to the
>>> +/// AMDGPU target machine. It uses the target's detailed information to provide
>>> +/// more precise answers to certain TTI queries, while letting the target
>>> +/// independent and default TTI implementations handle the rest.
>>> +///
>>> +//===----------------------------------------------------------------------===//
>>> +
>>> +#define DEBUG_TYPE "AMDGPUtti"
>>> +#include "AMDGPU.h"
>>> +#include "AMDGPUTargetMachine.h"
>>> +#include "llvm/Analysis/TargetTransformInfo.h"
>>> +#include "llvm/Support/Debug.h"
>>> +#include "llvm/Target/TargetLowering.h"
>>> +#include "llvm/Target/CostTable.h"
>>> +using namespace llvm;
>>> +
>>> +// Declare the pass initialization routine locally as target-specific passes
>>> +// don't have a target-wide initialization entry point, and so we rely on the
>>> +// pass constructor initialization.
>>> +namespace llvm {
>>> +void initializeAMDGPUTTIPass(PassRegistry&);
>>> +}
>>> +
>>> +namespace {
>>> +
>>> +class AMDGPUTTI : public ImmutablePass, public TargetTransformInfo {
>>> +  const AMDGPUTargetMachine *TM;
>>> +  const AMDGPUSubtarget *ST;
>>> +  const AMDGPUTargetLowering *TLI;
>>> +
>>> +  /// Estimate the overhead of scalarizing an instruction. Insert and Extract
>>> +  /// are set if the result needs to be inserted and/or extracted from vectors.
>>> +  unsigned getScalarizationOverhead(Type *Ty, bool Insert, bool Extract) const;
>>> +
>>> +public:
>>> +  AMDGPUTTI() : ImmutablePass(ID), TM(0), ST(0), TLI(0) {
>>> +    llvm_unreachable("This pass cannot be directly constructed");
>>> +  }
>>> +
>>> +  AMDGPUTTI(const AMDGPUTargetMachine *TM)
>>> +      : ImmutablePass(ID), TM(TM), ST(TM->getSubtargetImpl()),
>>> +        TLI(TM->getTargetLowering()) {
>>> +    initializeAMDGPUTTIPass(*PassRegistry::getPassRegistry());
>>> +  }
>>> +
>>> +  virtual void initializePass() { pushTTIStack(this); }
>>> +
>>> +  virtual void finalizePass() { popTTIStack(); }
>>> +
>>> +  virtual void getAnalysisUsage(AnalysisUsage&AU) const {
>>> +    TargetTransformInfo::getAnalysisUsage(AU);
>>> +  }
>>> +
>>> +  /// Pass identification.
>>> +  static char ID;
>>> +
>>> +  /// Provide necessary pointer adjustments for the two base classes.
>>> +  virtual void *getAdjustedAnalysisPointer(const void *ID) {
>>> +    if (ID ==&TargetTransformInfo::ID)
>>> +      return (TargetTransformInfo *)this;
>>> +    return this;
>>> +  }
>>> +
>>> +  virtual bool hasBranchDivergence() const;
>>> +
>>> +  /// @}
>>> +};
>>> +
>>> +} // end anonymous namespace
>>> +
>>> +INITIALIZE_AG_PASS(AMDGPUTTI, TargetTransformInfo, "AMDGPUtti",
>>> +                   "AMDGPU Target Transform Info", true, true, false)
>>> +char AMDGPUTTI::ID = 0;
>>> +
>>> +ImmutablePass *
>>> +llvm::createAMDGPUTargetTransformInfoPass(const AMDGPUTargetMachine *TM) {
>>> +  return new AMDGPUTTI(TM);
>>> +}
>>> +
>>> +bool AMDGPUTTI::hasBranchDivergence() const { return true; }
>>> Index: lib/Target/R600/AMDGPUTargetMachine.h
>>> ===================================================================
>>> --- lib/Target/R600/AMDGPUTargetMachine.h	(revision 183763)
>>> +++ lib/Target/R600/AMDGPUTargetMachine.h	(working copy)
>>> @@ -63,6 +63,9 @@
>>>    }
>>>    virtual const DataLayout *getDataLayout() const { return&Layout; }
>>>    virtual TargetPassConfig *createPassConfig(PassManagerBase&PM);
>>> +
>>> +  /// \brief Register R600 analysis passes with a pass manager.
>>> +  virtual void addAnalysisPasses(PassManagerBase&PM);
>>>  };
>>> 
>>>  } // End namespace llvm
>>> Index: lib/Target/R600/AMDGPU.h
>>> ===================================================================
>>> --- lib/Target/R600/AMDGPU.h	(revision 183763)
>>> +++ lib/Target/R600/AMDGPU.h	(working copy)
>>> @@ -46,6 +46,10 @@
>>>  FunctionPass *createAMDGPUIndirectAddressingPass(TargetMachine&tm);
>>>  FunctionPass *createAMDGPUISelDag(TargetMachine&tm);
>>> 
>>> +/// \brief Creates an AMDGPU-specific Target Transformation Info pass.
>>> +ImmutablePass *
>>> +    createAMDGPUTargetTransformInfoPass(const AMDGPUTargetMachine *TM);
>>> +
>>>  extern Target TheAMDGPUTarget;
>>> 
>>>  } // End namespace llvm
>>> Index: lib/Transforms/Utils/SimplifyCFG.cpp
>>> ===================================================================
>>> --- lib/Transforms/Utils/SimplifyCFG.cpp	(revision 184607)
>>> +++ lib/Transforms/Utils/SimplifyCFG.cpp	(working copy)
>>> @@ -17,8 +17,10 @@
>>>  #include "llvm/ADT/STLExtras.h"
>>>  #include "llvm/ADT/SetVector.h"
>>>  #include "llvm/ADT/SmallPtrSet.h"
>>> +#include "llvm/ADT/SmallSet.h"
>>>  #include "llvm/ADT/SmallVector.h"
>>>  #include "llvm/ADT/Statistic.h"
>>> +#include "llvm/Analysis/AliasAnalysis.h"
>>>  #include "llvm/Analysis/InstructionSimplify.h"
>>>  #include "llvm/Analysis/TargetTransformInfo.h"
>>>  #include "llvm/Analysis/ValueTracking.h"
>>> @@ -63,6 +65,10 @@
>>>  HoistCondStores("simplifycfg-hoist-cond-stores", cl::Hidden, cl::init(true),
>>>         cl::desc("Hoist conditional stores if an unconditional store preceeds"));
>>> 
>>> +static cl::opt<bool>
>>> +    ParallelAndOr("simplifycfg-parallel-and-or", cl::Hidden, cl::init(true),
>>> +                  cl::desc("Use parallel-and-or mode for branch conditions"));
>>> +
>>>  STATISTIC(NumBitMaps, "Number of switch instructions turned into bitmaps");
>>>  STATISTIC(NumLookupTables, "Number of switch instructions turned into lookup tables");
>>>  STATISTIC(NumSinkCommons, "Number of common instructions sunk down to the end block");
>>> @@ -88,6 +94,7 @@
>>>  class SimplifyCFGOpt {
>>>    const TargetTransformInfo&TTI;
>>>    const DataLayout *const TD;
>>> +  AliasAnalysis *AA;
>>> 
>>>    Value *isValueEqualityComparison(TerminatorInst *TI);
>>>    BasicBlock *GetValueEqualityComparisonCases(TerminatorInst *TI,
>>> @@ -105,10 +112,25 @@
>>>    bool SimplifyIndirectBr(IndirectBrInst *IBI);
>>>    bool SimplifyUncondBranch(BranchInst *BI, IRBuilder<>  &Builder);
>>>    bool SimplifyCondBranch(BranchInst *BI, IRBuilder<>&Builder);
>>> +  /// \brief Use parallel-and or parallel-or to generate conditions for
>>> +  /// conditional branches.
>>> +  bool SimplifyParallelAndOr(BasicBlock *BB, IRBuilder<>  &Builder, Pass *P = 0);
>>> +  /// \brief If \param BB is the merge block of an if-region, attempt to merge
>>> +  /// the if-region with an adjacent if-region upstream if two if-regions
>>> +  /// contain identical instructions.
>>> +  bool MergeIfRegion(BasicBlock *BB, IRBuilder<>  &Builder, Pass *P = 0);
>>> +  /// \brief Compare a pair of blocks: \param Block1 and \param Block2, which
>>> +  /// are from two if-regions whose head blocks are \param Head1 and \param
>>> +  /// Head2.  \returns true if \param Block1 and \param Block2 contain identical
>>> +  /// instructions, and none of the instructions alias with \param Head2.
>>> +  /// This is used as a legality check for merging if-regions.
>>> +  bool CompareBlock(BasicBlock *Head1, BasicBlock *Head2, BasicBlock *Block1,
>>> +                    BasicBlock *Block2);
>>> 
>>>  public:
>>> -  SimplifyCFGOpt(const TargetTransformInfo&TTI, const DataLayout *TD)
>>> -      : TTI(TTI), TD(TD) {}
>>> +  SimplifyCFGOpt(const TargetTransformInfo&TTI, const DataLayout *TD,
>>> +                 AliasAnalysis *aa)
>>> +      : TTI(TTI), TD(TD), AA(aa) {}
>>>    bool run(BasicBlock *BB);
>>>  };
>>>  }
>>> @@ -195,8 +217,8 @@
>>>  }
>>> 
>>> 
>>> -/// GetIfCondition - Given a basic block (BB) with two predecessors (and at
>>> -/// least one PHI node in it), check to see if the merge at this block is due
>>> +/// GetIfCondition - Given a basic block (BB) with two predecessors,
>>> +/// check to see if the merge at this block is due
>>>  /// to an "if condition".  If so, return the boolean condition that determines
>>>  /// which entry into BB will be taken.  Also, return by references the block
>>>  /// that will be entered from if the condition is true, and the block that will
>>> @@ -206,12 +228,31 @@
>>>  /// instructions in them.
>>>  static Value *GetIfCondition(BasicBlock *BB, BasicBlock *&IfTrue,
>>>                               BasicBlock *&IfFalse) {
>>> -  PHINode *SomePHI = cast<PHINode>(BB->begin());
>>> -  assert(SomePHI->getNumIncomingValues() == 2&&
>>> -         "Function can only handle blocks with 2 predecessors!");
>>> -  BasicBlock *Pred1 = SomePHI->getIncomingBlock(0);
>>> -  BasicBlock *Pred2 = SomePHI->getIncomingBlock(1);
>>> +  PHINode *SomePHI = dyn_cast<PHINode>(BB->begin());
>>> +  BasicBlock *Pred1 = NULL;
>>> +  BasicBlock *Pred2 = NULL;
>>> 
>>> +  if (SomePHI) {
>>> +    if (SomePHI->getNumIncomingValues() != 2)
>>> +       return NULL;
>>> +    Pred1 = SomePHI->getIncomingBlock(0);
>>> +    Pred2 = SomePHI->getIncomingBlock(1);
>>> +  } else {
>>> +    SmallSetVector<BasicBlock *, 16>  Preds(pred_begin(BB), pred_end(BB));
>>> +    for (SmallSetVector<BasicBlock *, 16>::iterator PI = Preds.begin(),
>>> +                                                    PE = Preds.end();
>>> +         PI != PE; ++PI) {
>>> +      if (Pred1 == NULL)
>>> +        Pred1 = *PI;
>>> +      else if (Pred2 == NULL)
>>> +        Pred2 = *PI;
>>> +      else
>>> +        return NULL;
>>> +    }
>>> +    if (!Pred1 || !Pred2)
>>> +       return NULL;
>>> +  }
>>> +
>>>    // We can only handle branches.  Other control flow will be lowered to
>>>    // branches if possible anyway.
>>>    BranchInst *Pred1Br = dyn_cast<BranchInst>(Pred1->getTerminator());
>>> @@ -4039,6 +4080,402 @@
>>>    return false;
>>>  }
>>> 
>>> +/// If \param [in] BB has more than one predecessor that is a conditional
>>> +/// branch, attempt to use parallel and/or for the branch condition. \returns
>>> +/// true on success.
>>> +///
>>> +/// Before:
>>> +///   ......
>>> +///   %cmp10 = fcmp une float %tmp1, %tmp2
>>> +///   br i1 %cmp1, label %if.then, label %lor.rhs
>>> +///
>>> +/// lor.rhs:
>>> +///   ......
>>> +///   %cmp11 = fcmp une float %tmp3, %tmp4
>>> +///   br i1 %cmp11, label %if.then, label %ifend
>>> +///
>>> +/// if.end:  // the merge block
>>> +///   ......
>>> +///
>>> +/// if.then: // has two predecessors, both of them contains conditional branch.
>>> +///   ......
>>> +///   br label %if.end;
>>> +///
>>> +/// After:
>>> +///  ......
>>> +///  %cmp10 = fcmp une float %tmp1, %tmp2
>>> +///  ......
>>> +///  %cmp11 = fcmp une float %tmp3, %tmp4
>>> +///  %cmp12 = or i1 %cmp10, %cmp11    // parallel-or mode.
>>> +///  br i1 %cmp12, label %if.then, label %ifend
>>> +///
>>> +///  if.end:
>>> +///    ......
>>> +///
>>> +///  if.then:
>>> +///    ......
>>> +///    br label %if.end;
>>> +///
>>> +///  Current implementation handles two cases.
>>> +///  Case 1: \param BB is on the else-path.
>>> +///
>>> +///          BB1
>>> +///        /     |
>>> +///       BB2    |
>>> +///      /   \   |
>>> +///     BB3   \  |     where, BB1, BB2 contain conditional branches.
>>> +///      \    |  /     BB3 contains unconditional branch.
>>> +///       \   | /      BB4 corresponds to \param BB which is also the merge.
>>> +///  BB =>  BB4
>>> +///
>>> +///
>>> +///  Corresponding source code:
>>> +///
>>> +///  if (a == b&&  c == d)
>>> +///    statement; // BB3
>>> +///
>>> +///  Case 2: \param BB BB is on the then-path.
>>> +///
>>> +///             BB1
>>> +///          /      |
>>> +///         |      BB2
>>> +///         \    /    |  where BB1, BB2 contain conditional branches.
>>> +///  BB =>    BB3      |  BB3 contains unconditiona branch and corresponds
>>> +///           \     /    to \param BB.  BB4 is the merge.
>>> +///             BB4
>>> +///
>>> +///  Corresponding source code:
>>> +///
>>> +///  if (a == b || c == d)
>>> +///    statement;  // BB3
>>> +///
>>> +///  In both cases,  \param BB is the common successor of conditional branches.
>>> +///  In Case 1, \param BB (BB4) has an unconditional branch (BB3) as
>>> +///  its predecessor.  In Case 2, \param BB (BB3) only has conditional branches
>>> +///  as its predecessors.
>>> +///
>>> +bool SimplifyCFGOpt::SimplifyParallelAndOr(BasicBlock *BB, IRBuilder<>  &Builder,
>>> +                                           Pass *P) {
>>> +  PHINode *PHI = dyn_cast<PHINode>(&BB->front());
>>> +  if (PHI)
>>> +    return false; // For simplicity, avoid cases containing PHI nodes.
>>> +
>>> +  BasicBlock *LCond = NULL;
>>> +  BasicBlock *FCond = NULL;
>>> +  BasicBlock *UCond = NULL;
>>> +  int Idx = -1;
>>> +
>>> +  SmallSetVector<BasicBlock *, 16>  Preds(pred_begin(BB), pred_end(BB));
>>> +  // Check predecessors of \param BB.
>>> +  for (SmallSetVector<BasicBlock *, 16>::iterator PI = Preds.begin(),
>>> +                                                  PE = Preds.end();
>>> +       PI != PE; ++PI) {
>>> +    BasicBlock *Pred = *PI;
>>> +    TerminatorInst *PTI = Pred->getTerminator();
>>> +    BranchInst *PBI = dyn_cast<BranchInst>(PTI);
>>> +
>>> +    // All predecessors should terminate with a branch.
>>> +    if (!PBI)
>>> +      return false;
>>> +
>>> +    BasicBlock *PP = Pred->getSinglePredecessor();
>>> +
>>> +    if (PBI->isUnconditional()) {
>>> +      // Case 1: Pred (BB3) is an unconditional block, it should
>>> +      // have a single successor and a single predecessor (BB2) that
>>> +      // is also a predecessor of \param BB (BB4) and should not have
>>> +      // address-taken.  There should exist only one such unconditional
>>> +      // branch among the predecessors.
>>> +      if (UCond || !PP || (Preds.count(PP) == 0) ||
>>> +          (PTI->getNumSuccessors() != 1) || Pred->hasAddressTaken())
>>> +        return false;
>>> +
>>> +      UCond = Pred;
>>> +      continue;
>>> +    }
>>> +
>>> +    // Only conditional branches are allowed beyond this point.
>>> +    if (!PBI->isConditional())
>>> +      return false;
>>> +
>>> +    // Condition's unique use should be the branch instruction.
>>> +    Value *PC = PBI->getCondition();
>>> +    if (!PC || (PC->getNumUses() != 1))
>>> +      return false;
>>> +
>>> +    if (PP&&  (Preds.count(PP) != 0)) {
>>> +      // These are internal condition blocks to be merged from, e.g.,
>>> +      // BB2 in both cases.
>>> +      // Should not be address-taken.
>>> +      if (Pred->hasAddressTaken())
>>> +        return false;
>>> +
>>> +      // Instructions in the internal condition blocks should be safe
>>> +      // to hoist up.
>>> +      for (BasicBlock::iterator BI = Pred->begin(), BE = PBI; BI != BE;) {
>>> +        Instruction *CI = BI++;
>>> +        if (isa<PHINode>(CI) || CI->mayHaveSideEffects() ||
>>> +            !isSafeToSpeculativelyExecute(CI))
>>> +          return false;
>>> +      }
>>> +    } else {
>>> +      // This is the condition block to be merged into, e.g. BB1 in
>>> +      // both cases.
>>> +      if (FCond)
>>> +        return false;
>>> +      FCond = Pred;
>>> +    }
>>> +
>>> +    // The terminator must have exactly two successors.
>>> +    if (PTI->getNumSuccessors() != 2)
>>> +      return false;
>>> +
>>> +    // Find whether BB is uniformly on the true (or false) path
>>> +    // for all of its predecessors.
>>> +    BasicBlock *PS1 = PTI->getSuccessor(0);
>>> +    BasicBlock *PS2 = PTI->getSuccessor(1);
>>> +    BasicBlock *PS = (PS1 == BB) ? PS2 : PS1;
>>> +    int CIdx = (PS1 == BB) ? 0 : 1;
>>> +
>>> +    if (Idx == -1)
>>> +      Idx = CIdx;
>>> +    else if (CIdx != Idx)
>>> +      return false;
>>> +
>>> +    // PS is the successor which is not BB. Check successors to identify
>>> +    // the last conditional branch.
>>> +    if (Preds.count(PS) == 0) {
>>> +      // Case 2.
>>> +      // BB must have an unique successor.
>>> +      TerminatorInst *TBB = BB->getTerminator();
>>> +      if (TBB->getNumSuccessors() != 1)
>>> +        return false;
>>> +
>>> +      BasicBlock *SBB = TBB->getSuccessor(0);
>>> +      PHI = dyn_cast<PHINode>(&SBB->front());
>>> +      if (PHI)
>>> +        return false;
>>> +
>>> +      // PS (BB4) should be BB's successor.
>>> +      if (SBB != PS)
>>> +        return false;
>>> +      LCond = Pred;
>>> +    } else {
>>> +      TerminatorInst *TPS = PS->getTerminator();
>>> +      BranchInst *BPS = dyn_cast<BranchInst>(TPS);
>>> +      if (BPS&&  BPS->isUnconditional()) {
>>> +        // Case 1: PS(BB3) should be an unconditional branch.
>>> +        LCond = Pred;
>>> +      }
>>> +    }
>>> +  }
>>> +
>>> +  if (FCond&&  LCond&&  (FCond != LCond)) {
>>> +    // Do the transformation.
>>> +    BasicBlock *CB;
>>> +    bool ITER = true;
>>> +    BasicBlock::iterator ItOld = Builder.GetInsertPoint();
>>> +    TerminatorInst *PTI = FCond->getTerminator();
>>> +    BranchInst *PBI = dyn_cast<BranchInst>(PTI);
>>> +    Value *PC = PBI->getCondition();
>>> +    do {
>>> +      CB = PBI->getSuccessor(1 - Idx);
>>> +      // Delete the conditional branch.
>>> +      FCond->getInstList().pop_back();
>>> +      FCond->getInstList().splice(FCond->end(), CB->getInstList());
>>> +      PTI = FCond->getTerminator();
>>> +      PBI = dyn_cast<BranchInst>(PTI);
>>> +      Value *CC = PBI->getCondition();
>>> +      // Merge conditions.
>>> +      Builder.SetInsertPoint(PTI);
>>> +      Value *NC;
>>> +      if (Idx == 0)
>>> +        // Case 2, use parallel or.
>>> +        NC = Builder.CreateOr(PC, CC);
>>> +      else
>>> +        // Case 1, use parallel and.
>>> +        NC = Builder.CreateAnd(PC, CC);
>>> +
>>> +      PBI->replaceUsesOfWith(CC, NC);
>>> +      PC = NC;
>>> +      if (CB == LCond)
>>> +        ITER = false;
>>> +      // Remove internal conditional branches.
>>> +      CB->dropAllReferences();
>>> +      // make CB unreachable and let downstream to delete the block.
>>> +      new UnreachableInst(CB->getContext(), CB);
>>> +    } while (ITER);
>>> +
>>> +    Builder.SetInsertPoint(ItOld);
>>> +    DEBUG(dbgs()<<  "Use parallel and/or in:\n"<<  *FCond);
>>> +    return true;
>>> +  }
>>> +
>>> +  return false;
>>> +}
>>> +
>>> +/// Compare blocks from two if-regions, where \param Head1 is the head of the
>>> +/// 1st if-region. \param Head2 is the head of the 2nd if-region. \param
>>> +/// Block1 is a block in the 1st if-region to compare. \param Block2 is a block
>>> +//  in the 2nd if-region to compare.  \returns true if the blocks have identical
>>> +/// instructions that do not alias with \param Head2 and it is legal to merge
>>> +/// the two blocks so that only one instance of each instruction is kept.
>>> +///
>>> +bool SimplifyCFGOpt::CompareBlock(BasicBlock *Head1, BasicBlock *Head2,
>>> +                                  BasicBlock *Block1, BasicBlock *Block2) {
>>> +  TerminatorInst *PTI2 = Head2->getTerminator();
>>> +  Instruction *PBI2 = Head2->begin();
>>> +
>>> +  if (Block1 == Head1) {
>>> +    if (Block2 != Head2)
>>> +      return false;
>>> +  } else if (Block2 == Head2)
>>> +    return false;
>>> +  else {
>>> +    // Check whether instructions in Block1 and Block2 are identical
>>> +    // and do not alias with instructions in Head2.
>>> +    BasicBlock::iterator iter1 = Block1->begin();
>>> +    BasicBlock::iterator end1 = Block1->getTerminator();
>>> +    BasicBlock::iterator iter2 = Block2->begin();
>>> +    BasicBlock::iterator end2 = Block2->getTerminator();
>>> +
>>> +    while (1) {
>>> +      if (iter1 == end1) {
>>> +        if (iter2 != end2)
>>> +          return false;
>>> +        break;
>>> +      }
>>> +
>>> +      if (!iter1->isIdenticalTo(iter2))
>>> +        return false;
>>> +
>>> +      // Illegal to remove instructions with side effects except
>>> +      // non-volatile stores.
>>> +      if (iter1->mayHaveSideEffects()) {
>>> +        Instruction *CurI =&*iter1;
>>> +        StoreInst *SI = dyn_cast<StoreInst>(CurI);
>>> +        if (!SI || SI->isVolatile())
>>> +          return false;
>>> +      }
>>> +
>>> +      // For simplicity and speed, data dependency check can be
>>> +      // avoided if read from memory doesn't exist.
>>> +      if (iter1->mayReadFromMemory())
>>> +        return false;
>>> +
>>> +      if (iter1->mayWriteToMemory()) {
>>> +        for (BasicBlock::iterator BI = PBI2, BE = PTI2; BI != BE; ++BI) {
>>> +          if (BI->mayReadFromMemory() || BI->mayWriteToMemory()) {
>>> +            // Check alias with Head2.
>>> +            if (!AA || AA->alias(iter1, BI))
>>> +              return false;
>>> +          }
>>> +        }
>>> +      }
>>> +      ++iter1;
>>> +      ++iter2;
>>> +    }
>>> +    ;
>>> +  }
>>> +
>>> +  return true;
>>> +}
>>> +
>>> +/// Check whether \param BB is the merge block of a if-region.  If yes, check
>>> +/// whether there exists an adjacent if-region upstream, the two if-regions
>>> +/// contain identical instuctions and can be legally merged.  \returns true if
>>> +/// the two if-regions are merged.
>>> +///
>>> +/// From:
>>> +/// if (a)
>>> +///   statement;
>>> +/// if (b)
>>> +///   statement;
>>> +///
>>> +/// To:
>>> +/// if (a || b)
>>> +///   statement;
>>> +///
>>> +bool SimplifyCFGOpt::MergeIfRegion(BasicBlock *BB, IRBuilder<>  &Builder,
>>> +                                   Pass *P) {
>>> +  BasicBlock *IfTrue2, *IfFalse2;
>>> +  Value *IfCond2 = GetIfCondition(BB, IfTrue2, IfFalse2);
>>> +  if (!IfCond2)
>>> +    return false;
>>> +
>>> +  Instruction *CInst2 = dyn_cast<Instruction>(IfCond2);
>>> +  if (!CInst2)
>>> +    return false;
>>> +
>>> +  BasicBlock *Head2 = CInst2->getParent();
>>> +  if (Head2->hasAddressTaken())
>>> +    return false;
>>> +
>>> +  BasicBlock *IfTrue1, *IfFalse1;
>>> +  Value *IfCond1 = GetIfCondition(Head2, IfTrue1, IfFalse1);
>>> +  if (!IfCond1)
>>> +    return false;
>>> +
>>> +  Instruction *CInst1 = dyn_cast<Instruction>(IfCond1);
>>> +  if (!CInst1)
>>> +    return false;
>>> +
>>> +  BasicBlock *Head1 = CInst1->getParent();
>>> +
>>> +  // Either then-path or else-path should be empty.
>>> +  if ((IfTrue1 != Head1)&&  (IfFalse1 != Head1))
>>> +    return false;
>>> +  if ((IfTrue2 != Head2)&&  (IfFalse2 != Head2))
>>> +    return false;
>>> +
>>> +  TerminatorInst *PTI2 = Head2->getTerminator();
>>> +  Instruction *PBI2 = Head2->begin();
>>> +
>>> +  if (!CompareBlock(Head1, Head2, IfTrue1, IfTrue2))
>>> +    return false;
>>> +
>>> +  if (!CompareBlock(Head1, Head2, IfFalse1, IfFalse2))
>>> +    return false;
>>> +
>>> +  // Check whether \param Head2 has side-effect and is safe to speculate.
>>> +  for (BasicBlock::iterator BI = PBI2, BE = PTI2; BI != BE; ++BI) {
>>> +    Instruction *CI = BI;
>>> +    if (isa<PHINode>(CI) || CI->mayHaveSideEffects() ||
>>> +        !isSafeToSpeculativelyExecute(CI))
>>> +      return false;
>>> +  }
>>> +
>>> +  // Merge \param Head2 into \param Head1.
>>> +  Head1->getInstList().pop_back();
>>> +  Head1->getInstList().splice(Head1->end(), Head2->getInstList());
>>> +  TerminatorInst *PTI = Head1->getTerminator();
>>> +  BranchInst *PBI = dyn_cast<BranchInst>(PTI);
>>> +  Value *CC = PBI->getCondition();
>>> +  BasicBlock::iterator ItOld = Builder.GetInsertPoint();
>>> +  Builder.SetInsertPoint(PTI);
>>> +  Value *NC = Builder.CreateOr(CInst1, CC);
>>> +  PBI->replaceUsesOfWith(CC, NC);
>>> +  Builder.SetInsertPoint(ItOld);
>>> +
>>> +  // Remove IfTrue1
>>> +  if (IfTrue1 != Head1) {
>>> +    IfTrue1->dropAllReferences();
>>> +    IfTrue1->eraseFromParent();
>>> +  }
>>> +
>>> +  // Remove IfFalse1
>>> +  if (IfFalse1 != Head1) {
>>> +    IfFalse1->dropAllReferences();
>>> +    IfFalse1->eraseFromParent();
>>> +  }
>>> +
>>> +  // Remove \param Head2
>>> +  Head2->dropAllReferences();
>>> +  Head2->eraseFromParent();
>>> +  DEBUG(dbgs()<<  "If conditions merged into:\n"<<  *Head1);
>>> +  return true;
>>> +}
>>> +
>>>  /// Check if passing a value to an instruction will cause undefined behavior.
>>>  static bool passingValueIsAlwaysUndefined(Value *V, Instruction *I) {
>>>    Constant *C = dyn_cast<Constant>(V);
>>> @@ -4142,6 +4579,10 @@
>>> 
>>>    IRBuilder<>  Builder(BB);
>>> 
>>> +  if (ParallelAndOr&&  TTI.hasBranchDivergence()&&
>>> +      SimplifyParallelAndOr(BB, Builder))
>>> +    return true;
>>> +
>>>    // If there is a trivial two-entry PHI node in this basic block, and we can
>>>    // eliminate it, do so now.
>>>    if (PHINode *PN = dyn_cast<PHINode>(BB->begin()))
>>> @@ -4169,6 +4610,9 @@
>>>      if (SimplifyIndirectBr(IBI)) return true;
>>>    }
>>> 
>>> +  if (ParallelAndOr&&  TTI.hasBranchDivergence()&&  MergeIfRegion(BB, Builder))
>>> +    return true;
>>> +
>>>    return Changed;
>>>  }
>>> 
>>> @@ -4178,6 +4622,6 @@
>>>  /// of the CFG.  It returns true if a modification was made.
>>>  ///
>>>  bool llvm::SimplifyCFG(BasicBlock *BB, const TargetTransformInfo&TTI,
>>> -                       const DataLayout *TD) {
>>> -  return SimplifyCFGOpt(TTI, TD).run(BB);
>>> +                       const DataLayout *TD, AliasAnalysis *AA) {
>>> +  return SimplifyCFGOpt(TTI, TD, AA).run(BB);
>>>  }
>>> Index: lib/Transforms/Scalar/SimplifyCFGPass.cpp
>>> ===================================================================
>>> --- lib/Transforms/Scalar/SimplifyCFGPass.cpp	(revision 184607)
>>> +++ lib/Transforms/Scalar/SimplifyCFGPass.cpp	(working copy)
>>> @@ -27,6 +27,7 @@
>>>  #include "llvm/ADT/SmallVector.h"
>>>  #include "llvm/ADT/Statistic.h"
>>>  #include "llvm/Analysis/TargetTransformInfo.h"
>>> +#include "llvm/Analysis/AliasAnalysis.h"
>>>  #include "llvm/IR/Attributes.h"
>>>  #include "llvm/IR/Constants.h"
>>>  #include "llvm/IR/DataLayout.h"
>>> @@ -49,10 +50,14 @@
>>> 
>>>      virtual bool runOnFunction(Function&F);
>>> 
>>> -    virtual void getAnalysisUsage(AnalysisUsage&AU) const {
>>> -      AU.addRequired<TargetTransformInfo>();
>>> -    }
>>> -  };
>>> +  virtual void getAnalysisUsage(AnalysisUsage&AU) const {
>>> +    AU.addRequired<TargetTransformInfo>();
>>> +    AU.addRequired<AliasAnalysis>();
>>> +    AU.addRequired<AliasAnalysis>();
>>> +  }
>>> +private:
>>> +  AliasAnalysis *AA;
>>> +};
>>>  }
>>> 
>>>  char CFGSimplifyPass::ID = 0;
>>> @@ -301,7 +306,7 @@
>>>  /// iterativelySimplifyCFG - Call SimplifyCFG on all the blocks in the function,
>>>  /// iterating until no more changes are made.
>>>  static bool iterativelySimplifyCFG(Function&F, const TargetTransformInfo&TTI,
>>> -                                   const DataLayout *TD) {
>>> +                                   const DataLayout *TD, AliasAnalysis * AA) {
>>>    bool Changed = false;
>>>    bool LocalChange = true;
>>>    while (LocalChange) {
>>> @@ -310,7 +315,7 @@
>>>      // Loop over all of the basic blocks and remove them if they are unneeded...
>>>      //
>>>      for (Function::iterator BBIt = F.begin(); BBIt != F.end(); ) {
>>> -      if (SimplifyCFG(BBIt++, TTI, TD)) {
>>> +      if (SimplifyCFG(BBIt++, TTI, TD, AA)) {
>>>          LocalChange = true;
>>>          ++NumSimpl;
>>>        }
>>> @@ -324,11 +329,12 @@
>>>  // simplify the CFG.
>>>  //
>>>  bool CFGSimplifyPass::runOnFunction(Function&F) {
>>> +  AA =&getAnalysis<AliasAnalysis>();
>>>    const TargetTransformInfo&TTI = getAnalysis<TargetTransformInfo>();
>>>    const DataLayout *TD = getAnalysisIfAvailable<DataLayout>();
>>>    bool EverChanged = removeUnreachableBlocksFromFn(F);
>>>    EverChanged |= mergeEmptyReturnBlocks(F);
>>> -  EverChanged |= iterativelySimplifyCFG(F, TTI, TD);
>>> +  EverChanged |= iterativelySimplifyCFG(F, TTI, TD, AA);
>>> 
>>>    // If neither pass changed anything, we're done.
>>>    if (!EverChanged) return false;
>>> @@ -342,7 +348,7 @@
>>>      return true;
>>> 
>>>    do {
>>> -    EverChanged = iterativelySimplifyCFG(F, TTI, TD);
>>> +    EverChanged = iterativelySimplifyCFG(F, TTI, TD, AA);
>>>      EverChanged |= removeUnreachableBlocksFromFn(F);
>>>    } while (EverChanged);
>>> 
>>> Index: lib/CodeGen/BasicTargetTransformInfo.cpp
>>> ===================================================================
>>> --- lib/CodeGen/BasicTargetTransformInfo.cpp	(revision 183763)
>>> +++ lib/CodeGen/BasicTargetTransformInfo.cpp	(working copy)
>>> @@ -63,6 +63,8 @@
>>>      return this;
>>>    }
>>> 
>>> +  virtual bool hasBranchDivergence() const;
>>> +
>>>    /// \name Scalar TTI Implementations
>>>    /// @{
>>> 
>>> @@ -122,6 +124,7 @@
>>>    return new BasicTTI(TLI);
>>>  }
>>> 
>>> +bool BasicTTI::hasBranchDivergence() const { return false; }
>>> 
>>>  bool BasicTTI::isLegalAddImmediate(int64_t imm) const {
>>>    return TLI->isLegalAddImmediate(imm);
>> 
>> 
>> 
>> 
>> _______________________________________________
>> llvm-commits mailing list
>> llvm-commits at cs.uiuc.edu
>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
> 
> 
> 
> 
> _______________________________________________
> llvm-commits mailing list
> llvm-commits at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits