[patch] simplifyCFG for review

Andrew Trick atrick at apple.com
Thu Jul 25 17:24:30 PDT 2013


On Jul 25, 2013, at 5:12 PM, Ye, Mei <Mei.Ye at amd.com> wrote:

> Thanks Nick.  Attached patch addressed all of your new comments.  Andy and Tom gave a OK in the past.
> I assume Tom can go ahead to test and submit the patch now?

Yes, Tom can commit this patch.

My response to Nick’s review is below, but you don't need another round of review.

> W.R.T to your question, I desire a combined set of !mayHaveSideEffects and isSafeToSpeculativelyExecute.  Now I take a closer look at the implementations.  It looks to me that isSafeToSpeculativelyExecute is a superset of !mayHaveSideEffects.   So I only keep isSafeToSpeculativelyExecute in this patch.
> 
> -Mei
> 
> 
> -----Original Message-----
> From: Nick Lewycky [mailto:nicholas at mxc.ca] 
> Sent: Thursday, July 25, 2013 12:38 PM
> To: Ye, Mei
> Cc: Andrew Trick; Tom Stellard; CVS Commit Messages for LLVM repository
> Subject: Re: [patch] simplifyCFG for review
> 
> Ye, Mei wrote:
>> Ping?
> 
> This looks much better, the logic was much easier for me to reason about 
> this time around. Thanks!
> 
> I don't feel like I need to see another version of the patch with these 
> changes applied. I trust you.
> 
> I'm going to give half of an ok-to-commit, because while the code looks 
> fine I haven't taken the time to properly understand what this 
> transformation really is or why it's important, whether it really 
> belongs here in the IR layer, and those other higher-level decisions. 
> For that I'm going to defer to Andy and Tom, please get an OK from one 
> of them.

This does make sense to do in the IR layer, but only after the first round of target-independent optimizations. We'd like to make this distinction more clear and have a target-configurable passmanager for transforms like this. Until then, Mei's workaround is acceptable, as Evan mentioned before.

I also don't really like adding so much special purpose code to SimplifyCFG.cpp, but that can split out later after we make a better target dependent/independent distinction.

I totally agree with all of Nick's feedback on the last patch iteration. Thanks.

-Andy

> +  bool IsOpt; // Is optimization pass
> 
> Er, people are going to be confused by this. I realize what you mean 
> (Andy's canonicalization vs. optimization distinction) but people are 
> going to think of simplifycfg as an optimization pass unless they happen 
> to have recently been talking with Andy. Strangely, I don't think people 
> will be confused by the CFGSimplifyPass vs. CFGOptimizePass distinction.
> 
> How about "IsTargetAware"?
> 
> +/// GetIfCondition - Given a basic block (BB) with two predecessors,
> +/// check to see if the merge at this block is due
>  /// to an "if condition".  If so, return the boolean condition that 
> determines
> 
> Please reflow the rest of the comment.
> 
> +    if (PBI->isUnconditional()) {
> [...]
> +      if (UnCondBlock || !PP || (Preds.count(PP) == 0) ||
> +          (PBI->getNumSuccessors() != 1) || Pred->hasAddressTaken())
> 
> PBI->isUnconditional() implies that PBI->getNumSuccessors() equals one. 
> You can remove that part of the check.
> 
> +        if (isa<PHINode>(CI) || CI->mayHaveSideEffects() ||
> +            !isSafeToSpeculativelyExecute(CI))
> 
> I have a silly question. What if the call *is* safe to speculatively 
> execute yet still has side-effects? It seems to me that this code is 
> doing speculation, so you should only be checking 
> isSafeToSpeculativelyExecute?
> 
> +    // The terminator must have exactly two successors.
> +    if (PBI->getNumSuccessors() != 2)
> +      return false;
> 
> This is redundant. I'd say "remove it or assert it", but you already 
> assert it above ("assert(PBI->isConditional())"). Just remove it.
> 
> +      // BB must have an unique successor.
> +      TerminatorInst *TBB = BB->getTerminator();
> +      if (TBB->getNumSuccessors() != 1)
> 
> No code change, just a fun fact. The code doesn't quite match the 
> comment. It's very close though! :) A branch "br i1 %cond, label %bb, 
> label %bb" has two successors, but one unique successor (since both 
> successors are the same block). No, I don't think you need to care here 
> since simplifycfg will have folded this conditional branch into an 
> unconditional branch elsewhere.
> 
> +    PBI = dyn_cast<BranchInst>(FirstCondBlock->getTerminator());
> +    Value *CC = PBI->getCondition();
> 
> What if the dyn_cast returns NULL? If that's impossible here, use cast<> 
> instead of dyn_cast<>.
> 
> You have the same code again inside MergeIfRegion (with 'Head1' instead 
> of 'FirstCondBlock').
> 
> Nick
> 
>> -----Original Message-----
>> From: Ye, Mei
>> Sent: Monday, July 22, 2013 1:55 PM
>> To: 'Andrew Trick'
>> Cc: Nick Lewycky; llvm commits; Tom Stellard
>> Subject: RE: [patch] simplifyCFG for review
>> 
>> Hi Andy and Nick
>> 
>> Attached patch cleans out the remaining issues from both of you.
>> W.R.T Andy's question on why "Head2->replaceAllUsesWith(Head1);".  This was added to address Nick's concern to update PHI nodes if  "Head2" is an incoming block of a PHI node.   But after giving it more thoughts, my work is not triggered if "Head2" may have side effects which include stores, so "Head2" will not be fed into a PHI node.   Therefore I removed it in this patch.
>> 
>> -Mei
>> 
>> -----Original Message-----
>> From: Andrew Trick [mailto:atrick at apple.com]
>> Sent: Sunday, July 21, 2013 5:14 AM
>> To: Ye, Mei
>> Cc: Nick Lewycky; llvm commits; Tom Stellard
>> Subject: Re: [patch] simplifyCFG for review
>> 
>> On Jul 19, 2013, at 2:23 PM, "Ye, Mei"<Mei.Ye at amd.com>  wrote:
>>> Thanks for taking the time to review this. Attached is a version that fixes all of your issues except:
>>> 
>>> 1. A couple of the following patterns are not feld together since "PTI" either has more than 1 uses or its storage can be re-used.
>>> 
>>> --------------------------------------------------------------
>>> +    TerminatorInst *PTI = Pred->getTerminator();
>>> +    BranchInst *PBI = dyn_cast<BranchInst>(PTI);
>>> Fold them together:
>>>   BranchInst *PBI = dyn_cast<BranchInst>(Pred->getTerminator());
>>> -----------------------------------------------------------------
>>> 
>>> 2.  In the following pattern, "Preds" is predecessors of "BB".  "PP" is the predecessor of an element in "Preds".  "Preds.count(PP)" queries whether "PP" appear in "Preds".  This is used to identify internal nodes.  So it is not redundant.
>>> 
>>> -----------------------------------------------------------------
>>> +    if (PP&&  (Preds.count(PP) != 0)) {
>>> This is redundant. Preds can't contain any NULL pointer entries.
>>> ------------------------------------------------------------------
>>> 
>>> 3. From what I can see, "!PBI->isUnconditional" is not equivalent to "PBI->isConditional".  The implementation just checks operand counts. Could there be multi-way branches?
>>> 
>>> --------------------------------------------------------------------
>>> +    // Only conditional branches are allowed beyond this point.
>>> +    if (!PBI->isConditional())
>>> +      return false;
>>> 
>>> Remove it or assert it. This condition never happens. Also, you wrote it
>>> as "PBI->isUnconditional" above, but "!PBI->isConditional" here?
>>> -----------------------------------------------------------------------
>>> 
>>> 4. I use clang-format, which unindents the struct.  Clang-format reports many errors in existing codes.  I have to take the pain to isolate my changes.  Life will be easier if someone can run a clang-format for all files, and make clang-format run for every check-in.
>>> 
>>> ---------------------------------------------------------------------------
>>>  namespace {
>>> -  struct CFGSimplifyPass : public FunctionPass {
>>> -    static char ID; // Pass identification, replacement for typeid
>>> -    CFGSimplifyPass() : FunctionPass(ID) {
>>> -      initializeCFGSimplifyPassPass(*PassRegistry::getPassRegistry());
>>> -    }
>>> +struct CFGSimplifyPass : public FunctionPass {
>>> +  CFGSimplifyPass(char&ID, bool isOpt) : FunctionPass(ID), IsOpt(isOpt) {}
>>> +  virtual bool runOnFunction(Function&F);
>>> 
>>> Why did you unindent the struct? Please put it back.
>>> -------------------------------------------------------------------------------
>> 
>> Thanks to Nick for reviewing the implementation. I had only reviewed the change related to my initial concern. I looked through the most recent patch more carefully and mostly have nits:
>> 
>> +bool TargetTransformInfo::hasBranchDivergence() const { return false; }
>> +
>> 
>> Use PrevTTI->hasBranchDivergence(), just like all the other TargetTransformInfo implementations.
>> 
>> +  virtual bool hasBranchDivergence() const;
>> +
>> 
>> This declaration belongs inside the comment below.
>> 
>>    /// \name Scalar TTI Implementations
>>    /// @{
>> 
>> --- SimplifyParallelAndOr
>> 
>> + ///  Case 1: \param BB is on the else-path.
>> 
>> BB is on all paths. Nothing is on the else-path.
>> 
>> ///  BB =>    BB3      |  BB3 contains unconditiona branch and corresponds
>> 
>> "unconditional"
>> 
>> +  BasicBlock *LCond = NULL;
>> +  BasicBlock *FCond = NULL;
>> 
>> I'm not sure what these variable names mean. First and last? But they're not really conditions, so I'm confused. Please comment and maybe rename.
>> 
>> +        // Case 1: PS(BB3) should be an unconditional branch.
>> +        LCond = Pred;
>> ...
>> +  if (FCond&&  LCond&&  (FCond != LCond)) {
>> 
>> Not really important, but you could invert this condition and immediately return false.
>> 
>> Would the following assertion be appropriate?
>> 
>> assert((!UCond || UCond == LCond)&&  "...");
>> 
>> +    bool ITER = true;
>> 
>> ITER is not an acronym. (I realize PHI isn't either, but there's some strange precedent for capsing PHI). Maybe S
>> 
>> --- CompareBlock
>> 
>> + bool SimplifyCFGOpt::CompareBlock(BasicBlock *Head1, BasicBlock *Head2,
>> 
>> This name is too generic for a helper that's so specific to MergeIfRegion. Maybe "CompareIfRegion".
>> 
>> -- MergeIfRegion
>> 
>> +  Head2->replaceAllUsesWith(Head1);
>> 
>> Can you comment on why the replaceAllUsesWith is necessary? It's a strange thing to do.
>> 
>> -Andy
>> 
>> 
>>> -----Original Message-----
>>> From: Nick Lewycky [mailto:nicholas at mxc.ca]
>>> Sent: Thursday, July 18, 2013 1:25 AM
>>> To: Ye, Mei
>>> Cc: Andrew Trick; llvm commits; Tom Stellard
>>> Subject: Re: [patch] simplifyCFG for review
>>> 
>>> Andrew Trick wrote:
>>>> 
>>>> On Jul 17, 2013, at 11:07 AM, Ye, Mei<Mei.Ye at amd.com
>>>> <mailto:Mei.Ye at amd.com>>  wrote:
>>>> 
>>>>> Hi Andrew
>>>>> Attached is a patch that follows what you suggested. Thanks for your time.
>>>> 
>>>> I'm ok with this patch now. Someone who can test it should commit. Maybe
>>>> Tom?
>>> 
>>> --- lib/Target/R600/AMDGPUTargetTransformInfo.cpp	(revision 0)
>>> +++ lib/Target/R600/AMDGPUTargetTransformInfo.cpp	(revision 0)
>>> @@ -0,0 +1,90 @@
>>> +//===-- AMDGPUTargetTransformInfo.cpp - AMDGPU specific TTI pass
>>> +//----------------===//
>>> 
>>> Why is this two lines?
>>> 
>>> +//===----------------------------------------------------------------------===//
>>> +/// \file
>>> +/// This file implements a TargetTransformInfo analysis pass specific
>>> to the
>>> 
>>> Two slashes for these comments, and a blank //-only line after the ruler.
>>> 
>>> +  /// \brief Compare a pair of blocks: \param Block1 and \param Block2,
>>> which
>>> +  /// are from two if-regions whose head blocks are \param Head1 and \param
>>> +  /// Head2.  \returns true if \param Block1 and \param Block2 contain
>>> identical
>>> +  /// instructions, and none of the instructions alias with \param Head2.
>>> +  /// This is used as a legality check for merging if-regions.
>>> 
>>> Despite reading this, I still don't understand it. It compares Block1
>>> with Block2. What does "equal" mean in this context? What is a "head
>>> block" and why does it matter? You say instructions alias, you mean
>>> 
>>> +  SimplifyCFGOpt(const TargetTransformInfo&TTI, const DataLayout *TD,
>>> +                 AliasAnalysis *aa)
>>> +      : TTI(TTI), TD(TD), AA(aa) {}
>>> 
>>> Call it 'AA' not 'aa'. Follow the existing pattern.
>>> 
>>> +    SmallSetVector<BasicBlock *, 16>  Preds(pred_begin(BB), pred_end(BB));
>>> +    for (SmallSetVector<BasicBlock *, 16>::iterator PI = Preds.begin(),
>>> +                                                    PE = Preds.end();
>>> +         PI != PE; ++PI) {
>>> +      if (Pred1 == NULL)
>>> +        Pred1 = *PI;
>>> +      else if (Pred2 == NULL)
>>> +        Pred2 = *PI;
>>> +      else
>>> +        return NULL;
>>> +    }
>>> 
>>> This is inefficient. Remove 'Preds', make the loop iterate from
>>> pred_begin(BB) to pred_end(BB). Add "if (*PI == Pred1 || *PI == Pred2)
>>> continue;" to the top of the loop.
>>> 
>>> +    if (!Pred1 || !Pred2)
>>> +       return NULL;
>>> 
>>> Fix indent before 'return'.
>>> 
>>> +  PHINode *PHI = dyn_cast<PHINode>(&BB->front());
>>> 
>>> You can use "BB->begin()" instead of "&BB->front()".
>>> 
>>> +  SmallSetVector<BasicBlock *, 16>  Preds(pred_begin(BB), pred_end(BB));
>>> +  // Check predecessors of \param BB.
>>> +  for (SmallSetVector<BasicBlock *, 16>::iterator PI = Preds.begin(),
>>> +                                                  PE = Preds.end();
>>> +       PI != PE; ++PI) {
>>> 
>>> This is inefficient. You can use SmallPtrSet for this. You don't need to
>>> preserve the iteration order of pred_begin to pred_end, and SmallPtrSet
>>> gives you iterators (unlike SmallSet).
>>> 
>>> +    TerminatorInst *PTI = Pred->getTerminator();
>>> +    BranchInst *PBI = dyn_cast<BranchInst>(PTI);
>>> 
>>> Fold them together:
>>>   BranchInst *PBI = dyn_cast<BranchInst>(Pred->getTerminator());
>>> 
>>> +    // Only conditional branches are allowed beyond this point.
>>> +    if (!PBI->isConditional())
>>> +      return false;
>>> 
>>> Remove it or assert it. This condition never happens. Also, you wrote it
>>> as "PBI->isUnconditional" above, but "!PBI->isConditional" here?
>>> 
>>> +    if (!PC || (PC->getNumUses() != 1))
>>> 
>>> This is inefficient. Use !PC-hasOneUse() to avoid walking the linked
>>> list of uses after we know there is more than one entry.
>>> 
>>> +    if (PP&&  (Preds.count(PP) != 0)) {
>>> 
>>> This is redundant. Preds can't contain any NULL pointer entries.
>>> 
>>> +      PHI = dyn_cast<PHINode>(&SBB->front());
>>> 
>>> Again, SBB->begin().
>>> 
>>> +      TerminatorInst *TPS = PS->getTerminator();
>>> +      BranchInst *BPS = dyn_cast<BranchInst>(TPS);
>>> +      if (BPS&&  BPS->isUnconditional()) {
>>> 
>>> At least merge TPS into the dyn_cast<...>(TPS). You could also write
>>>   if (BranchInst *BPS = dyn_cast<BranchInst>(PS->getTerminator())
>>>     if (BPS->isUnconditional())
>>>       ...
>>> 
>>> +  if (Block1 == Head1) {
>>> +    if (Block2 != Head2)
>>> +      return false;
>>> +  } else if (Block2 == Head2)
>>> +    return false;
>>> +  else {
>>> 
>>> So if Block1 == Head1 and Block2 == Head2 this function does nothing?
>>> 
>>> Can you reword the above without the elses following returns? See
>>> http://llvm.org/docs/CodingStandards.html#don-t-use-else-after-a-return .
>>> 
>>> +    }
>>> +    ;
>>> +  }
>>> 
>>> Spurious semi-colon.
>>> 
>>> +  Value *IfCond2 = GetIfCondition(BB, IfTrue2, IfFalse2);
>>> +  if (!IfCond2)
>>> +    return false;
>>> +
>>> +  Instruction *CInst2 = dyn_cast<Instruction>(IfCond2);
>>> +  if (!CInst2)
>>> +    return false;
>>> 
>>> You can simplify this using dyn_cast_or_null:
>>> 
>>>   Value *IfCond2 = GetIfCondition(BB, IfTrue2, IfFalse2);
>>>   Instruction *CInst2 = dyn_cast_or_null<Instruction>(IfCond2);
>>>   if (!CInt2)
>>>     return false;
>>> 
>>> +  // Check whether \param Head2 has side-effect and is safe to speculate.
>>> 
>>> Ensure that ... has no side-effects?
>>> 
>>> +  Head1->getInstList().pop_back();
>>> +  Head1->getInstList().splice(Head1->end(), Head2->getInstList());
>>> 
>>> You don't update PHI nodes after doing this. Can you show me a
>>> proof-sketch that Head2 can't be used in any PHI nodes in the function
>>> if we reach here?
>>> 
>>> +  bool OptCB = (ParallelAndOr&&  AA&&  TTI.hasBranchDivergence()) ?
>>> true : false;
>>> 
>>> I think the ?: is redundant here.
>>> 
>>> -  initializeCFGSimplifyPassPass(Registry);
>>> +  initializeCFGCANONPass(Registry);
>>> +  initializeCFGOPTPass(Registry);
>>> 
>>> OPT and CANON aren't acronyms.
>>> 
>>>  namespace {
>>> -  struct CFGSimplifyPass : public FunctionPass {
>>> -    static char ID; // Pass identification, replacement for typeid
>>> -    CFGSimplifyPass() : FunctionPass(ID) {
>>> -      initializeCFGSimplifyPassPass(*PassRegistry::getPassRegistry());
>>> -    }
>>> +struct CFGSimplifyPass : public FunctionPass {
>>> +  CFGSimplifyPass(char&ID, bool isOpt) : FunctionPass(ID), IsOpt(isOpt) {}
>>> +  virtual bool runOnFunction(Function&F);
>>> 
>>> Why did you unindent the struct? Please put it back. See
>>> http://llvm.org/docs/CodingStandards.html#namespace-indentation .
>>> 
>>> +                                   const DataLayout *TD, AliasAnalysis
>>> * AA) {
>>> 
>>> Extra space before 'AA'.
>>> 
>>> +                 const DataLayout *TD = 0, AliasAnalysis * AA = 0);
>>> 
>>> Extra space again.
>>> 
>>> Nick
>>> 
>>>> 
>>>> -Andy
>>>> 
>>>>> On Jul 12, 2013, at 3:39 PM, Ye, Mei<Mei.Ye at amd.com
>>>>> <mailto:Mei.Ye at amd.com>>  wrote:
>>>>> 
>>>>> 
>>>>> The fundamental issue is whether target-specific works are allowed
>>>>> inside "machine-independent" passes. In fact, almost all compiler
>>>>> optimizations are target-dependent. Having an good infrastructure to
>>>>> enable target-specific tunings or optimizations will promote
>>>>> code-sharing and code-quality. The common practice that I have seen is
>>>>> that compiler vendors keep their work "if-defed" inside their own
>>>>> branches and do not contribute back to the trunk. There are IP and
>>>>> business strategy reasons. But in the long run, code base diverges and
>>>>> merging gets more difficult if not impossible. Not only the vendors
>>>>> will suffer, but also the larger community if less and less
>>>>> contributions flow back into the trunk.
>>>>> Yes, please contribute your target-dependent optimizations! Usually
>>>>> the best way to introduce them is in target passes, which are only
>>>>> built when building your target. You can easily do this by overriding
>>>>> TargetPassConfig::addISelPrepare.
>>>>> For this patch, you could make the argument that your transformations
>>>>> are sufficiently generic that they might as well be made available in
>>>>> SimplifyCFG.cpp. I agree.
>>>>> However, (correct me if I'm wrong) your transformations are not
>>>>> required to expose downstream machine-independent optimizations. They
>>>>> should not run early in the pass pipeline. Doing so will lead to
>>>>> divergence across targets and lack of code sharing and code quality
>>>>> that you are afraid of. There is also no reason to run them evey time
>>>>> we invoke the SimplifyCFG pass.
>>>>> We want to have a core SimplifyCFG pass that can be run repeatedly to
>>>>> canonicalize the IR. It's one of the first things we run. Introducing
>>>>> target heuristics this early would be a mistake.
>>>>> I realize this design goal is not obvious from the current codebase
>>>>> and documentation. Tomorrow, I will send out a design proposal to llvm
>>>>> to make sure contributors are working toward a common goal.
>>>>> You could help move toward this goal though by modifying your patch to
>>>>> run only once in the final round of SimplifyCFG (after jump-threading
>>>>> and DSE).
>>>>> The most straightforward way to handle this is to add a new
>>>>> OptimizeCFG pass for all the "lowering" stuff and simply schedule it
>>>>> after the last SimplifyCFG.
>>>>> If you want to combine it with a round of SimplifyCFG, that's also ok,
>>>>> but we should still have a distinct pass name. In this case you need to:
>>>>> - Define a subclass of CFGSimplifyPass that passes a flag to the
>>>>> CFGSimplifyPass ctor to enable lowering optimizations.
>>>>> - Add INITIALIZE_PASS_... declarations for the new pass name
>>>>> "optimizecfg".
>>>>> - Add the same flag to createCFGSimplificationPass() and make the
>>>>> trivial changes to populateModulePassManager() and
>>>>> populateLTOPassManager().
>>>>> See ScalerReplAggregates.cpp for an example of all this.
>>>>> 
>>>>> 
>>>>> If we have a good repository of optimizations, each target can
>>>>> selectively enable/disable them to avoid unexpected performance
>>>>> impact. To safeguard the correctness, the community can create a super
>>>>> target (just clone X86) that tests all optimizations.
>>>>> 
>>>>> From the aspect of implementation, this patch fits into SimplifyCFG
>>>>> quite naturally. It is often advantageous to change the
>>>>> "machine-independent" codes since you already traverse the whole
>>>>> function and reach the point where certain patterns are recognized in
>>>>> a similar pass. Adding a new optimization/transformation at that point
>>>>> is free, while creating a separated pass will have to repeat works and
>>>>> increases compilation time. JIT and on-line compilation models are
>>>>> very stingy in compilation time. I unfortunately live in a world of
>>>>> compiling-from-source model, adding a new pass is prohibited in my
>>>>> organization. Besides, this patch is not GPU specific. It is good for
>>>>> CPU in general unless you happen to have patterns of very expensive
>>>>> floating point comparisons. Even in that case, it might still win due
>>>>> to reduction in branches.
>>>>> It's a moot point if you piggy back on the last SimplifyCFG as
>>>>> explained above. But I don't buy this argument at all. What work do
>>>>> you need to "redo" if you run this as a separate pass? Iterating over
>>>>> blocks? How does that compare to the cost of analyzing compound
>>>>> if-statements every time we run SimplifyCFG. Please quantify the
>>>>> impact of running this in a separate "LowerCFG" pass before claiming
>>>>> that you're optimizing the pass pipeline.
>>>>> -Andy
>>>>> 
>>>>> 
>>>>> 
>>>>> And, yes, GPU needs to do CFG transformations. But code gen can
>>>>> introduce new control flows. Vendors will need to recognize
>>>>> irreducible CFG in their jitters anyway as a verification.
>>>>> 
>>>>> -Mei
>>>>> 
>>>>> 
>>>>> -----Original Message-----
>>>>> From: Nick Lewycky [mailto:nicholas at mxc.ca]
>>>>> Sent: Thursday, July 11, 2013 10:40 PM
>>>>> To: Ye, Mei
>>>>> Cc: Tom Stellard;llvm-commits at cs.uiuc.edu
>>>>> <mailto:llvm-commits at cs.uiuc.edu>
>>>>> Subject: Re: [patch] simplifyCFG for review
>>>>> 
>>>>> The fundamental issue with this patch is that you're saying that
>>>>> simplifycfg should be responsible for structuring the CFG differently
>>>>> based on whether the target supports branches that go different ways in
>>>>> different threads (a common GPU restriction).
>>>>> 
>>>>> We try very, very hard to avoid doing target-specific things inside
>>>>> passes like this. Doing so requires an analysis of why it couldn't be
>>>>> done differently. (Don't GPUs have problems with irreducable CFG anyhow?
>>>>> Don't you already need to rewrite the CFG for that anyways? Why not have
>>>>> a GPU CFG simplification pass right before CodeGenPrep?)
>>>>> 
>>>>> I have some comments on the details of the patch, but the above issue is
>>>>> big and needs wider discussion and consensus first.
>>>>> 
>>>>> Nick
>>>>> 
>>>>> Ye, Mei wrote:
>>>>> 
>>>>> Thanks Tom. Attached is a new patch that adds comments as requested. I
>>>>> also made a new diff against most-recent trunk. But it looks like many
>>>>> files have locks, so some diffs are based on an earlier revision.
>>>>> 
>>>>> With regard to your question:
>>>>> "I'm a little confused about why we need to add a
>>>>> BasicTargetTransformInfo and
>>>>> also an AMDGPUTargetTransformInfo. What is the reason for this?"
>>>>> 
>>>>> My answer is that same thing happens to X86, ARM, and other targets. I
>>>>> didn't track whether BasicTargetTransformInfo is always needed, but I
>>>>> think you can build compiler for more than one targets, and for
>>>>> targets that you do not supply a target-spefici Tranform Info, you can
>>>>> always default back to use the basic one.
>>>>> 
>>>>> -Mei
>>>>> 
>>>>> 
>>>>> -----Original Message-----
>>>>> From: Tom Stellard [mailto:tom at stellard.net]
>>>>> Sent: Monday, July 08, 2013 11:14 AM
>>>>> To: Ye, Mei
>>>>> Cc: Evan Cheng; Renato Golin; Sean Silva;llvm-commits at cs.uiuc.edu
>>>>> <mailto:llvm-commits at cs.uiuc.edu>
>>>>> Subject: Re: [patch] simplifyCFG for review
>>>>> 
>>>>> On Fri, Jun 28, 2013 at 09:46:13PM +0000, Ye, Mei wrote:
>>>>> 
>>>>> Hi all
>>>>> 
>>>>> Thank you for your comments. Attached is an updated patch.
>>>>> - add TargetTransformInfo to R600 .
>>>>> - add hasBranchDivergence to query whether a target has branch divergence.
>>>>> - Invoke branch reduction opts only when the underlying target has
>>>>> branch divergence. So currently it only gets triggered for R600.
>>>>> - fixed a bug in opt.cpp
>>>>> - correct clang-format issues.
>>>>> 
>>>>> I have tested the correctness on x86 (by adding branch divergence to
>>>>> X86TargetTransformInfo in my local workspace) using existing testing
>>>>> infrastructure and CPU2006, CPU2000. Tom Stellard agrees to run R600
>>>>> testings (thanks a lot, Tom).
>>>>> 
>>>>> -Mei
>>>>> 
>>>>> 
>>>>> Index: test/Transforms/SimplifyCFG/R600/lit.local.cfg
>>>>> ===================================================================
>>>>> --- test/Transforms/SimplifyCFG/R600/lit.local.cfg(revision 0)
>>>>> +++ test/Transforms/SimplifyCFG/R600/lit.local.cfg(revision 0)
>>>>> @@ -0,0 +1,6 @@
>>>>> +config.suffixes = ['.ll', '.c', '.cpp']
>>>>> +
>>>>> +targets = set(config.root.targets_to_build.split())
>>>>> +if not 'R600' in targets:
>>>>> + config.unsupported = True
>>>>> +
>>>>> Index: test/Transforms/SimplifyCFG/R600/parallelorifcollapse.ll
>>>>> ===================================================================
>>>>> --- test/Transforms/SimplifyCFG/R600/parallelorifcollapse.ll(revision 0)
>>>>> +++ test/Transforms/SimplifyCFG/R600/parallelorifcollapse.ll(revision 0)
>>>>> @@ -0,0 +1,51 @@
>>>>> +; Function Attrs: nounwind
>>>>> +; RUN: opt<  %s -mtriple=r600-unknown-linux-gnu -simplifycfg -basicaa
>>>>> -S | FileCheck %s
>>>>> +; CHECK: or i1
>>>>> +; CHECK-NEXT: br
>>>>> +; CHECK: br
>>>>> +; CHECK: ret
>>>>> 
>>>>> It's not clear to me what the expected output of this test is supposed
>>>>> to be. Could you add a comment explaining what the simplifycfg pass is
>>>>> supposed to be doing here.
>>>>> 
>>>>> 
>>>>> +define void @_Z9chk1D_512v() #0 {
>>>>> +entry:
>>>>> + %a0 = alloca i32, align 4
>>>>> + %b0 = alloca i32, align 4
>>>>> + %c0 = alloca i32, align 4
>>>>> + %d0 = alloca i32, align 4
>>>>> + %a1 = alloca i32, align 4
>>>>> + %b1 = alloca i32, align 4
>>>>> + %c1 = alloca i32, align 4
>>>>> + %d1 = alloca i32, align 4
>>>>> + %data = alloca i32, align 4
>>>>> + %0 = load i32* %a0, align 4
>>>>> + %1 = load i32* %b0, align 4
>>>>> + %cmp = icmp ne i32 %0, %1
>>>>> + br i1 %cmp, label %land.lhs.true, label %if.end
>>>>> +
>>>>> +land.lhs.true: ; preds = %entry
>>>>> + %2 = load i32* %c0, align 4
>>>>> + %3 = load i32* %d0, align 4
>>>>> + %cmp1 = icmp ne i32 %2, %3
>>>>> + br i1 %cmp1, label %if.then, label %if.end
>>>>> +
>>>>> +if.then: ; preds = %land.lhs.true
>>>>> + store i32 1, i32* %data, align 4
>>>>> + br label %if.end
>>>>> +
>>>>> +if.end: ; preds = %if.then, %land.lhs.true, %entry
>>>>> + %4 = load i32* %a1, align 4
>>>>> + %5 = load i32* %b1, align 4
>>>>> + %cmp2 = icmp ne i32 %4, %5
>>>>> + br i1 %cmp2, label %land.lhs.true3, label %if.end6
>>>>> +
>>>>> +land.lhs.true3: ; preds = %if.end
>>>>> + %6 = load i32* %c1, align 4
>>>>> + %7 = load i32* %d1, align 4
>>>>> + %cmp4 = icmp ne i32 %6, %7
>>>>> + br i1 %cmp4, label %if.then5, label %if.end6
>>>>> +
>>>>> +if.then5: ; preds = %land.lhs.true3
>>>>> + store i32 1, i32* %data, align 4
>>>>> + br label %if.end6
>>>>> +
>>>>> +if.end6: ; preds = %if.then5, %land.lhs.true3, %if.end
>>>>> + ret void
>>>>> +}
>>>>> Index: test/Transforms/SimplifyCFG/R600/parallelandifcollapse.ll
>>>>> ===================================================================
>>>>> --- test/Transforms/SimplifyCFG/R600/parallelandifcollapse.ll(revision 0)
>>>>> +++ test/Transforms/SimplifyCFG/R600/parallelandifcollapse.ll(revision 0)
>>>>> @@ -0,0 +1,58 @@
>>>>> +; Function Attrs: nounwind
>>>>> +; RUN: opt<  %s -mtriple=r600-unknown-linux-gnu -simplifycfg -basicaa
>>>>> -S | FileCheck %s
>>>>> +; CHECK: or i1
>>>>> +; CHECK-NEXT: br
>>>>> +; CHECK: br
>>>>> +; CHECK: ret
>>>>> 
>>>>> Same thing here, a comment explaining what the expected output should be
>>>>> would be helpful.
>>>>> 
>>>>> 
>>>>> +define void @_Z9chk1D_512v() #0 {
>>>>> +entry:
>>>>> + %a0 = alloca i32, align 4
>>>>> + %b0 = alloca i32, align 4
>>>>> + %c0 = alloca i32, align 4
>>>>> + %d0 = alloca i32, align 4
>>>>> + %a1 = alloca i32, align 4
>>>>> + %b1 = alloca i32, align 4
>>>>> + %c1 = alloca i32, align 4
>>>>> + %d1 = alloca i32, align 4
>>>>> + %data = alloca i32, align 4
>>>>> + %0 = load i32* %a0, align 4
>>>>> + %1 = load i32* %b0, align 4
>>>>> + %cmp = icmp ne i32 %0, %1
>>>>> + br i1 %cmp, label %land.lhs.true, label %if.else
>>>>> +
>>>>> +land.lhs.true: ; preds = %entry
>>>>> + %2 = load i32* %c0, align 4
>>>>> + %3 = load i32* %d0, align 4
>>>>> + %cmp1 = icmp ne i32 %2, %3
>>>>> + br i1 %cmp1, label %if.then, label %if.else
>>>>> +
>>>>> +if.then: ; preds = %land.lhs.true
>>>>> + br label %if.end
>>>>> +
>>>>> +if.else: ; preds = %land.lhs.true, %entry
>>>>> + store i32 1, i32* %data, align 4
>>>>> + br label %if.end
>>>>> +
>>>>> +if.end: ; preds = %if.else, %if.then
>>>>> + %4 = load i32* %a1, align 4
>>>>> + %5 = load i32* %b1, align 4
>>>>> + %cmp2 = icmp ne i32 %4, %5
>>>>> + br i1 %cmp2, label %land.lhs.true3, label %if.else6
>>>>> +
>>>>> +land.lhs.true3: ; preds = %if.end
>>>>> + %6 = load i32* %c1, align 4
>>>>> + %7 = load i32* %d1, align 4
>>>>> + %cmp4 = icmp ne i32 %6, %7
>>>>> + br i1 %cmp4, label %if.then5, label %if.else6
>>>>> +
>>>>> +if.then5: ; preds = %land.lhs.true3
>>>>> + br label %if.end7
>>>>> +
>>>>> +if.else6: ; preds = %land.lhs.true3, %if.end
>>>>> + store i32 1, i32* %data, align 4
>>>>> + br label %if.end7
>>>>> +
>>>>> +if.end7: ; preds = %if.else6, %if.then5
>>>>> + ret void
>>>>> +}
>>>>> +
>>>>> Index: include/llvm/Analysis/TargetTransformInfo.h
>>>>> ===================================================================
>>>>> --- include/llvm/Analysis/TargetTransformInfo.h(revision 183763)
>>>>> +++ include/llvm/Analysis/TargetTransformInfo.h(working copy)
>>>>> @@ -171,6 +171,9 @@
>>>>> /// comments for a detailed explanation of the cost values.
>>>>> virtual unsigned getUserCost(const User *U) const;
>>>>> 
>>>>> + /// \brief hasBranchDivergence - Return true if branch divergence
>>>>> exists.
>>>>> + virtual bool hasBranchDivergence() const;
>>>>> +
>>>>> 
>>>>> What exactly is branch divergence? Could you explain this more in
>>>>> the comment and give an example.
>>>>> 
>>>>> 
>>>>> /// \brief Test whether calls to a function lower to actual program
>>>>> function
>>>>> /// calls.
>>>>> ///
>>>>> Index: include/llvm/Transforms/Utils/Local.h
>>>>> ===================================================================
>>>>> --- include/llvm/Transforms/Utils/Local.h(revision 184607)
>>>>> +++ include/llvm/Transforms/Utils/Local.h(working copy)
>>>>> @@ -39,6 +39,7 @@
>>>>> class TargetLibraryInfo;
>>>>> class TargetTransformInfo;
>>>>> class DIBuilder;
>>>>> +class AliasAnalysis;
>>>>> 
>>>>> template<typename T>  class SmallVectorImpl;
>>>>> 
>>>>> @@ -136,7 +137,7 @@
>>>>> /// the basic block that was pointed to.
>>>>> ///
>>>>> bool SimplifyCFG(BasicBlock *BB, const TargetTransformInfo&TTI,
>>>>> - const DataLayout *TD = 0);
>>>>> + const DataLayout *TD = 0, AliasAnalysis * AA = 0);
>>>>> 
>>>>> /// FoldBranchToCommonDest - If this basic block is ONLY a setcc and a
>>>>> branch,
>>>>> /// and if a predecessor branches to us and one of our successors,
>>>>> fold the
>>>>> Index: tools/opt/opt.cpp
>>>>> ===================================================================
>>>>> --- tools/opt/opt.cpp(revision 183763)
>>>>> +++ tools/opt/opt.cpp(working copy)
>>>>> @@ -668,6 +668,9 @@
>>>>> FPasses.reset(new FunctionPassManager(M.get()));
>>>>> if (TD)
>>>>> FPasses->add(new DataLayout(*TD));
>>>>> + if (TM.get())
>>>>> + TM->addAnalysisPasses(*FPasses);
>>>>> +
>>>>> }
>>>>> 
>>>>> if (PrintBreakpoints) {
>>>>> Index: lib/Analysis/TargetTransformInfo.cpp
>>>>> ===================================================================
>>>>> --- lib/Analysis/TargetTransformInfo.cpp(revision 183763)
>>>>> +++ lib/Analysis/TargetTransformInfo.cpp(working copy)
>>>>> @@ -88,6 +88,8 @@
>>>>> return PrevTTI->getUserCost(U);
>>>>> }
>>>>> 
>>>>> +bool TargetTransformInfo::hasBranchDivergence() const { return false; }
>>>>> +
>>>>> bool TargetTransformInfo::isLoweredToCall(const Function *F) const {
>>>>> return PrevTTI->isLoweredToCall(F);
>>>>> }
>>>>> Index: lib/Target/R600/AMDGPUTargetMachine.cpp
>>>>> ===================================================================
>>>>> --- lib/Target/R600/AMDGPUTargetMachine.cpp(revision 183763)
>>>>> +++ lib/Target/R600/AMDGPUTargetMachine.cpp(working copy)
>>>>> @@ -105,6 +105,18 @@
>>>>> return new AMDGPUPassConfig(this, PM);
>>>>> }
>>>>> 
>>>>> +//===----------------------------------------------------------------------===//
>>>>> +// AMDGPU Analysis Pass Setup
>>>>> +//===----------------------------------------------------------------------===//
>>>>> +
>>>>> +void AMDGPUTargetMachine::addAnalysisPasses(PassManagerBase&PM) {
>>>>> + // Add first the target-independent BasicTTI pass, then our AMDGPU
>>>>> pass. This
>>>>> + // allows the AMDGPU pass to delegate to the target independent
>>>>> layer when
>>>>> + // appropriate.
>>>>> + PM.add(createBasicTargetTransformInfoPass(getTargetLowering()));
>>>>> + PM.add(createAMDGPUTargetTransformInfoPass(this));
>>>>> 
>>>>> I'm a little confused about why we need to add a
>>>>> BasicTargetTransformInfo and
>>>>> also an AMDGPUTargetTransformInfo. What is the reason for this?
>>>>> 
>>>>> 
>>>>> +}
>>>>> +
>>>>> bool
>>>>> AMDGPUPassConfig::addPreISel() {
>>>>> const AMDGPUSubtarget&ST = TM->getSubtarget<AMDGPUSubtarget>();
>>>>> Index: lib/Target/R600/AMDGPUTargetTransformInfo.cpp
>>>>> ===================================================================
>>>>> --- lib/Target/R600/AMDGPUTargetTransformInfo.cpp(revision 0)
>>>>> +++ lib/Target/R600/AMDGPUTargetTransformInfo.cpp(revision 0)
>>>>> 
>>>>> It looks like you forgot to add this new file to CMakeLists.txt.
>>>>> 
>>>>> 
>>>>> @@ -0,0 +1,90 @@
>>>>> +//===-- AMDGPUTargetTransformInfo.cpp - AMDGPU specific TTI pass
>>>>> +//----------------===//
>>>>> +//
>>>>> +// The LLVM Compiler Infrastructure
>>>>> +//
>>>>> +// This file is distributed under the University of Illinois Open Source
>>>>> +// License. See LICENSE.TXT for details.
>>>>> +//
>>>>> +//===----------------------------------------------------------------------===//
>>>>> +/// \file
>>>>> +/// This file implements a TargetTransformInfo analysis pass specific
>>>>> to the
>>>>> +/// AMDGPU target machine. It uses the target's detailed information
>>>>> to provide
>>>>> +/// more precise answers to certain TTI queries, while letting the target
>>>>> +/// independent and default TTI implementations handle the rest.
>>>>> +///
>>>>> +//===----------------------------------------------------------------------===//
>>>>> +
>>>>> +#define DEBUG_TYPE "AMDGPUtti"
>>>>> +#include "AMDGPU.h"
>>>>> +#include "AMDGPUTargetMachine.h"
>>>>> +#include "llvm/Analysis/TargetTransformInfo.h"
>>>>> +#include "llvm/Support/Debug.h"
>>>>> +#include "llvm/Target/TargetLowering.h"
>>>>> +#include "llvm/Target/CostTable.h"
>>>>> +using namespace llvm;
>>>>> +
>>>>> +// Declare the pass initialization routine locally as target-specific
>>>>> passes
>>>>> +// don't have a target-wide initialization entry point, and so we
>>>>> rely on the
>>>>> +// pass constructor initialization.
>>>>> +namespace llvm {
>>>>> +void initializeAMDGPUTTIPass(PassRegistry&);
>>>>> +}
>>>>> +
>>>>> +namespace {
>>>>> +
>>>>> +class AMDGPUTTI : public ImmutablePass, public TargetTransformInfo {
>>>>> + const AMDGPUTargetMachine *TM;
>>>>> + const AMDGPUSubtarget *ST;
>>>>> + const AMDGPUTargetLowering *TLI;
>>>>> +
>>>>> + /// Estimate the overhead of scalarizing an instruction. Insert and
>>>>> Extract
>>>>> + /// are set if the result needs to be inserted and/or extracted from
>>>>> vectors.
>>>>> + unsigned getScalarizationOverhead(Type *Ty, bool Insert, bool
>>>>> Extract) const;
>>>>> +
>>>>> +public:
>>>>> + AMDGPUTTI() : ImmutablePass(ID), TM(0), ST(0), TLI(0) {
>>>>> + llvm_unreachable("This pass cannot be directly constructed");
>>>>> + }
>>>>> +
>>>>> + AMDGPUTTI(const AMDGPUTargetMachine *TM)
>>>>> + : ImmutablePass(ID), TM(TM), ST(TM->getSubtargetImpl()),
>>>>> + TLI(TM->getTargetLowering()) {
>>>>> + initializeAMDGPUTTIPass(*PassRegistry::getPassRegistry());
>>>>> + }
>>>>> +
>>>>> + virtual void initializePass() { pushTTIStack(this); }
>>>>> +
>>>>> + virtual void finalizePass() { popTTIStack(); }
>>>>> +
>>>>> + virtual void getAnalysisUsage(AnalysisUsage&AU) const {
>>>>> + TargetTransformInfo::getAnalysisUsage(AU);
>>>>> + }
>>>>> +
>>>>> + /// Pass identification.
>>>>> + static char ID;
>>>>> +
>>>>> + /// Provide necessary pointer adjustments for the two base classes.
>>>>> + virtual void *getAdjustedAnalysisPointer(const void *ID) {
>>>>> + if (ID ==&TargetTransformInfo::ID)
>>>>> + return (TargetTransformInfo *)this;
>>>>> + return this;
>>>>> + }
>>>>> +
>>>>> + virtual bool hasBranchDivergence() const;
>>>>> +
>>>>> + /// @}
>>>>> +};
>>>>> +
>>>>> +} // end anonymous namespace
>>>>> +
>>>>> +INITIALIZE_AG_PASS(AMDGPUTTI, TargetTransformInfo, "AMDGPUtti",
>>>>> + "AMDGPU Target Transform Info", true, true, false)
>>>>> +char AMDGPUTTI::ID = 0;
>>>>> +
>>>>> +ImmutablePass *
>>>>> +llvm::createAMDGPUTargetTransformInfoPass(const AMDGPUTargetMachine
>>>>> *TM) {
>>>>> + return new AMDGPUTTI(TM);
>>>>> +}
>>>>> +
>>>>> +bool AMDGPUTTI::hasBranchDivergence() const { return true; }
>>>>> Index: lib/Target/R600/AMDGPUTargetMachine.h
>>>>> ===================================================================
>>>>> --- lib/Target/R600/AMDGPUTargetMachine.h(revision 183763)
>>>>> +++ lib/Target/R600/AMDGPUTargetMachine.h(working copy)
>>>>> @@ -63,6 +63,9 @@
>>>>> }
>>>>> virtual const DataLayout *getDataLayout() const { return&Layout; }
>>>>> virtual TargetPassConfig *createPassConfig(PassManagerBase&PM);
>>>>> +
>>>>> + /// \brief Register R600 analysis passes with a pass manager.
>>>>> + virtual void addAnalysisPasses(PassManagerBase&PM);
>>>>> };
>>>>> 
>>>>> } // End namespace llvm
>>>>> Index: lib/Target/R600/AMDGPU.h
>>>>> ===================================================================
>>>>> --- lib/Target/R600/AMDGPU.h(revision 183763)
>>>>> +++ lib/Target/R600/AMDGPU.h(working copy)
>>>>> @@ -46,6 +46,10 @@
>>>>> FunctionPass *createAMDGPUIndirectAddressingPass(TargetMachine&tm);
>>>>> FunctionPass *createAMDGPUISelDag(TargetMachine&tm);
>>>>> 
>>>>> +/// \brief Creates an AMDGPU-specific Target Transformation Info pass.
>>>>> +ImmutablePass *
>>>>> + createAMDGPUTargetTransformInfoPass(const AMDGPUTargetMachine *TM);
>>>>> +
>>>>> extern Target TheAMDGPUTarget;
>>>>> 
>>>>> } // End namespace llvm
>>>>> Index: lib/Transforms/Utils/SimplifyCFG.cpp
>>>>> ===================================================================
>>>>> --- lib/Transforms/Utils/SimplifyCFG.cpp(revision 184607)
>>>>> +++ lib/Transforms/Utils/SimplifyCFG.cpp(working copy)
>>>>> @@ -17,8 +17,10 @@
>>>>> #include "llvm/ADT/STLExtras.h"
>>>>> #include "llvm/ADT/SetVector.h"
>>>>> #include "llvm/ADT/SmallPtrSet.h"
>>>>> +#include "llvm/ADT/SmallSet.h"
>>>>> #include "llvm/ADT/SmallVector.h"
>>>>> #include "llvm/ADT/Statistic.h"
>>>>> +#include "llvm/Analysis/AliasAnalysis.h"
>>>>> #include "llvm/Analysis/InstructionSimplify.h"
>>>>> #include "llvm/Analysis/TargetTransformInfo.h"
>>>>> #include "llvm/Analysis/ValueTracking.h"
>>>>> @@ -63,6 +65,10 @@
>>>>> HoistCondStores("simplifycfg-hoist-cond-stores", cl::Hidden,
>>>>> cl::init(true),
>>>>> cl::desc("Hoist conditional stores if an unconditional store preceeds"));
>>>>> 
>>>>> +static cl::opt<bool>
>>>>> + ParallelAndOr("simplifycfg-parallel-and-or", cl::Hidden, cl::init(true),
>>>>> + cl::desc("Use parallel-and-or mode for branch conditions"));
>>>>> +
>>>>> STATISTIC(NumBitMaps, "Number of switch instructions turned into
>>>>> bitmaps");
>>>>> STATISTIC(NumLookupTables, "Number of switch instructions turned into
>>>>> lookup tables");
>>>>> STATISTIC(NumSinkCommons, "Number of common instructions sunk down to
>>>>> the end block");
>>>>> @@ -88,6 +94,7 @@
>>>>> class SimplifyCFGOpt {
>>>>> const TargetTransformInfo&TTI;
>>>>> const DataLayout *const TD;
>>>>> + AliasAnalysis *AA;
>>>>> 
>>>>> Value *isValueEqualityComparison(TerminatorInst *TI);
>>>>> BasicBlock *GetValueEqualityComparisonCases(TerminatorInst *TI,
>>>>> @@ -105,10 +112,25 @@
>>>>> bool SimplifyIndirectBr(IndirectBrInst *IBI);
>>>>> bool SimplifyUncondBranch(BranchInst *BI, IRBuilder<>  &Builder);
>>>>> bool SimplifyCondBranch(BranchInst *BI, IRBuilder<>&Builder);
>>>>> + /// \brief Use parallel-and or parallel-or to generate conditions for
>>>>> + /// conditional branches.
>>>>> + bool SimplifyParallelAndOr(BasicBlock *BB, IRBuilder<>  &Builder,
>>>>> Pass *P = 0);
>>>>> + /// \brief If \param BB is the merge block of an if-region, attempt
>>>>> to merge
>>>>> + /// the if-region with an adjacent if-region upstream if two if-regions
>>>>> + /// contain identical instructions.
>>>>> + bool MergeIfRegion(BasicBlock *BB, IRBuilder<>  &Builder, Pass *P = 0);
>>>>> + /// \brief Compare a pair of blocks: \param Block1 and \param
>>>>> Block2, which
>>>>> + /// are from two if-regions whose head blocks are \param Head1 and
>>>>> \param
>>>>> + /// Head2. \returns true if \param Block1 and \param Block2 contain
>>>>> identical
>>>>> + /// instructions, and none of the instructions alias with \param Head2.
>>>>> + /// This is used as a legality check for merging if-regions.
>>>>> + bool CompareBlock(BasicBlock *Head1, BasicBlock *Head2, BasicBlock
>>>>> *Block1,
>>>>> + BasicBlock *Block2);
>>>>> 
>>>>> public:
>>>>> - SimplifyCFGOpt(const TargetTransformInfo&TTI, const DataLayout *TD)
>>>>> - : TTI(TTI), TD(TD) {}
>>>>> + SimplifyCFGOpt(const TargetTransformInfo&TTI, const DataLayout *TD,
>>>>> + AliasAnalysis *aa)
>>>>> + : TTI(TTI), TD(TD), AA(aa) {}
>>>>> bool run(BasicBlock *BB);
>>>>> };
>>>>> }
>>>>> @@ -195,8 +217,8 @@
>>>>> }
>>>>> 
>>>>> 
>>>>> -/// GetIfCondition - Given a basic block (BB) with two predecessors
>>>>> (and at
>>>>> -/// least one PHI node in it), check to see if the merge at this
>>>>> block is due
>>>>> +/// GetIfCondition - Given a basic block (BB) with two predecessors,
>>>>> +/// check to see if the merge at this block is due
>>>>> /// to an "if condition". If so, return the boolean condition that
>>>>> determines
>>>>> /// which entry into BB will be taken. Also, return by references the
>>>>> block
>>>>> /// that will be entered from if the condition is true, and the block
>>>>> that will
>>>>> @@ -206,12 +228,31 @@
>>>>> /// instructions in them.
>>>>> static Value *GetIfCondition(BasicBlock *BB, BasicBlock *&IfTrue,
>>>>> BasicBlock *&IfFalse) {
>>>>> - PHINode *SomePHI = cast<PHINode>(BB->begin());
>>>>> - assert(SomePHI->getNumIncomingValues() == 2&&
>>>>> - "Function can only handle blocks with 2 predecessors!");
>>>>> - BasicBlock *Pred1 = SomePHI->getIncomingBlock(0);
>>>>> - BasicBlock *Pred2 = SomePHI->getIncomingBlock(1);
>>>>> + PHINode *SomePHI = dyn_cast<PHINode>(BB->begin());
>>>>> + BasicBlock *Pred1 = NULL;
>>>>> + BasicBlock *Pred2 = NULL;
>>>>> 
>>>>> + if (SomePHI) {
>>>>> + if (SomePHI->getNumIncomingValues() != 2)
>>>>> + return NULL;
>>>>> + Pred1 = SomePHI->getIncomingBlock(0);
>>>>> + Pred2 = SomePHI->getIncomingBlock(1);
>>>>> + } else {
>>>>> + SmallSetVector<BasicBlock *, 16>  Preds(pred_begin(BB), pred_end(BB));
>>>>> + for (SmallSetVector<BasicBlock *, 16>::iterator PI = Preds.begin(),
>>>>> + PE = Preds.end();
>>>>> + PI != PE; ++PI) {
>>>>> + if (Pred1 == NULL)
>>>>> + Pred1 = *PI;
>>>>> + else if (Pred2 == NULL)
>>>>> + Pred2 = *PI;
>>>>> + else
>>>>> + return NULL;
>>>>> + }
>>>>> + if (!Pred1 || !Pred2)
>>>>> + return NULL;
>>>>> + }
>>>>> +
>>>>> // We can only handle branches. Other control flow will be lowered to
>>>>> // branches if possible anyway.
>>>>> BranchInst *Pred1Br = dyn_cast<BranchInst>(Pred1->getTerminator());
>>>>> @@ -4039,6 +4080,402 @@
>>>>> return false;
>>>>> }
>>>>> 
>>>>> +/// If \param [in] BB has more than one predecessor that is a conditional
>>>>> +/// branch, attempt to use parallel and/or for the branch condition.
>>>>> \returns
>>>>> +/// true on success.
>>>>> +///
>>>>> +/// Before:
>>>>> +/// ......
>>>>> +/// %cmp10 = fcmp une float %tmp1, %tmp2
>>>>> +/// br i1 %cmp1, label %if.then, label %lor.rhs
>>>>> +///
>>>>> +/// lor.rhs:
>>>>> +/// ......
>>>>> +/// %cmp11 = fcmp une float %tmp3, %tmp4
>>>>> +/// br i1 %cmp11, label %if.then, label %ifend
>>>>> +///
>>>>> +/// if.end: // the merge block
>>>>> +/// ......
>>>>> +///
>>>>> +/// if.then: // has two predecessors, both of them contains
>>>>> conditional branch.
>>>>> +/// ......
>>>>> +/// br label %if.end;
>>>>> +///
>>>>> +/// After:
>>>>> +/// ......
>>>>> +/// %cmp10 = fcmp une float %tmp1, %tmp2
>>>>> +/// ......
>>>>> +/// %cmp11 = fcmp une float %tmp3, %tmp4
>>>>> +/// %cmp12 = or i1 %cmp10, %cmp11 // parallel-or mode.
>>>>> +/// br i1 %cmp12, label %if.then, label %ifend
>>>>> +///
>>>>> +/// if.end:
>>>>> +/// ......
>>>>> +///
>>>>> +/// if.then:
>>>>> +/// ......
>>>>> +/// br label %if.end;
>>>>> +///
>>>>> +/// Current implementation handles two cases.
>>>>> +/// Case 1: \param BB is on the else-path.
>>>>> +///
>>>>> +/// BB1
>>>>> +/// / |
>>>>> +/// BB2 |
>>>>> +/// / \ |
>>>>> +/// BB3 \ | where, BB1, BB2 contain conditional branches.
>>>>> +/// \ | / BB3 contains unconditional branch.
>>>>> +/// \ | / BB4 corresponds to \param BB which is also the merge.
>>>>> +/// BB =>  BB4
>>>>> +///
>>>>> +///
>>>>> +/// Corresponding source code:
>>>>> +///
>>>>> +/// if (a == b&&  c == d)
>>>>> +/// statement; // BB3
>>>>> +///
>>>>> +/// Case 2: \param BB BB is on the then-path.
>>>>> +///
>>>>> +/// BB1
>>>>> +/// / |
>>>>> +/// | BB2
>>>>> +/// \ / | where BB1, BB2 contain conditional branches.
>>>>> +/// BB =>  BB3 | BB3 contains unconditiona branch and corresponds
>>>>> +/// \ / to \param BB. BB4 is the merge.
>>>>> +/// BB4
>>>>> +///
>>>>> +/// Corresponding source code:
>>>>> +///
>>>>> +/// if (a == b || c == d)
>>>>> +/// statement; // BB3
>>>>> +///
>>>>> +/// In both cases, \param BB is the common successor of conditional
>>>>> branches.
>>>>> +/// In Case 1, \param BB (BB4) has an unconditional branch (BB3) as
>>>>> +/// its predecessor. In Case 2, \param BB (BB3) only has conditional
>>>>> branches
>>>>> +/// as its predecessors.
>>>>> +///
>>>>> +bool SimplifyCFGOpt::SimplifyParallelAndOr(BasicBlock *BB,
>>>>> IRBuilder<>  &Builder,
>>>>> + Pass *P) {
>>>>> + PHINode *PHI = dyn_cast<PHINode>(&BB->front());
>>>>> + if (PHI)
>>>>> + return false; // For simplicity, avoid cases containing PHI nodes.
>>>>> +
>>>>> + BasicBlock *LCond = NULL;
>>>>> + BasicBlock *FCond = NULL;
>>>>> + BasicBlock *UCond = NULL;
>>>>> + int Idx = -1;
>>>>> +
>>>>> + SmallSetVector<BasicBlock *, 16>  Preds(pred_begin(BB), pred_end(BB));
>>>>> + // Check predecessors of \param BB.
>>>>> + for (SmallSetVector<BasicBlock *, 16>::iterator PI = Preds.begin(),
>>>>> + PE = Preds.end();
>>>>> + PI != PE; ++PI) {
>>>>> + BasicBlock *Pred = *PI;
>>>>> + TerminatorInst *PTI = Pred->getTerminator();
>>>>> + BranchInst *PBI = dyn_cast<BranchInst>(PTI);
>>>>> +
>>>>> + // All predecessors should terminate with a branch.
>>>>> + if (!PBI)
>>>>> + return false;
>>>>> +
>>>>> + BasicBlock *PP = Pred->getSinglePredecessor();
>>>>> +
>>>>> + if (PBI->isUnconditional()) {
>>>>> + // Case 1: Pred (BB3) is an unconditional block, it should
>>>>> + // have a single successor and a single predecessor (BB2) that
>>>>> + // is also a predecessor of \param BB (BB4) and should not have
>>>>> + // address-taken. There should exist only one such unconditional
>>>>> + // branch among the predecessors.
>>>>> + if (UCond || !PP || (Preds.count(PP) == 0) ||
>>>>> + (PTI->getNumSuccessors() != 1) || Pred->hasAddressTaken())
>>>>> + return false;
>>>>> +
>>>>> + UCond = Pred;
>>>>> + continue;
>>>>> + }
>>>>> +
>>>>> + // Only conditional branches are allowed beyond this point.
>>>>> + if (!PBI->isConditional())
>>>>> + return false;
>>>>> +
>>>>> + // Condition's unique use should be the branch instruction.
>>>>> + Value *PC = PBI->getCondition();
>>>>> + if (!PC || (PC->getNumUses() != 1))
>>>>> + return false;
>>>>> +
>>>>> + if (PP&&  (Preds.count(PP) != 0)) {
>>>>> + // These are internal condition blocks to be merged from, e.g.,
>>>>> + // BB2 in both cases.
>>>>> + // Should not be address-taken.
>>>>> + if (Pred->hasAddressTaken())
>>>>> + return false;
>>>>> +
>>>>> + // Instructions in the internal condition blocks should be safe
>>>>> + // to hoist up.
>>>>> + for (BasicBlock::iterator BI = Pred->begin(), BE = PBI; BI != BE;) {
>>>>> + Instruction *CI = BI++;
>>>>> + if (isa<PHINode>(CI) || CI->mayHaveSideEffects() ||
>>>>> + !isSafeToSpeculativelyExecute(CI))
>>>>> + return false;
>>>>> + }
>>>>> + } else {
>>>>> + // This is the condition block to be merged into, e.g. BB1 in
>>>>> + // both cases.
>>>>> + if (FCond)
>>>>> + return false;
>>>>> + FCond = Pred;
>>>>> + }
>>>>> +
>>>>> + // The terminator must have exactly two successors.
>>>>> + if (PTI->getNumSuccessors() != 2)
>>>>> + return false;
>>>>> +
>>>>> + // Find whether BB is uniformly on the true (or false) path
>>>>> + // for all of its predecessors.
>>>>> + BasicBlock *PS1 = PTI->getSuccessor(0);
>>>>> + BasicBlock *PS2 = PTI->getSuccessor(1);
>>>>> + BasicBlock *PS = (PS1 == BB) ? PS2 : PS1;
>>>>> + int CIdx = (PS1 == BB) ? 0 : 1;
>>>>> +
>>>>> + if (Idx == -1)
>>>>> + Idx = CIdx;
>>>>> + else if (CIdx != Idx)
>>>>> + return false;
>>>>> +
>>>>> + // PS is the successor which is not BB. Check successors to identify
>>>>> + // the last conditional branch.
>>>>> + if (Preds.count(PS) == 0) {
>>>>> + // Case 2.
>>>>> + // BB must have an unique successor.
>>>>> + TerminatorInst *TBB = BB->getTerminator();
>>>>> + if (TBB->getNumSuccessors() != 1)
>>>>> + return false;
>>>>> +
>>>>> + BasicBlock *SBB = TBB->getSuccessor(0);
>>>>> + PHI = dyn_cast<PHINode>(&SBB->front());
>>>>> + if (PHI)
>>>>> + return false;
>>>>> +
>>>>> + // PS (BB4) should be BB's successor.
>>>>> + if (SBB != PS)
>>>>> + return false;
>>>>> + LCond = Pred;
>>>>> + } else {
>>>>> + TerminatorInst *TPS = PS->getTerminator();
>>>>> + BranchInst *BPS = dyn_cast<BranchInst>(TPS);
>>>>> + if (BPS&&  BPS->isUnconditional()) {
>>>>> + // Case 1: PS(BB3) should be an unconditional branch.
>>>>> + LCond = Pred;
>>>>> + }
>>>>> + }
>>>>> + }
>>>>> +
>>>>> + if (FCond&&  LCond&&  (FCond != LCond)) {
>>>>> + // Do the transformation.
>>>>> + BasicBlock *CB;
>>>>> + bool ITER = true;
>>>>> + BasicBlock::iterator ItOld = Builder.GetInsertPoint();
>>>>> + TerminatorInst *PTI = FCond->getTerminator();
>>>>> + BranchInst *PBI = dyn_cast<BranchInst>(PTI);
>>>>> + Value *PC = PBI->getCondition();
>>>>> + do {
>>>>> + CB = PBI->getSuccessor(1 - Idx);
>>>>> + // Delete the conditional branch.
>>>>> + FCond->getInstList().pop_back();
>>>>> + FCond->getInstList().splice(FCond->end(), CB->getInstList());
>>>>> + PTI = FCond->getTerminator();
>>>>> + PBI = dyn_cast<BranchInst>(PTI);
>>>>> + Value *CC = PBI->getCondition();
>>>>> + // Merge conditions.
>>>>> + Builder.SetInsertPoint(PTI);
>>>>> + Value *NC;
>>>>> + if (Idx == 0)
>>>>> + // Case 2, use parallel or.
>>>>> + NC = Builder.CreateOr(PC, CC);
>>>>> + else
>>>>> + // Case 1, use parallel and.
>>>>> + NC = Builder.CreateAnd(PC, CC);
>>>>> +
>>>>> + PBI->replaceUsesOfWith(CC, NC);
>>>>> + PC = NC;
>>>>> + if (CB == LCond)
>>>>> + ITER = false;
>>>>> + // Remove internal conditional branches.
>>>>> + CB->dropAllReferences();
>>>>> + // make CB unreachable and let downstream to delete the block.
>>>>> + new UnreachableInst(CB->getContext(), CB);
>>>>> + } while (ITER);
>>>>> +
>>>>> + Builder.SetInsertPoint(ItOld);
>>>>> + DEBUG(dbgs()<<  "Use parallel and/or in:\n"<<  *FCond);
>>>>> + return true;
>>>>> + }
>>>>> +
>>>>> + return false;
>>>>> +}
>>>>> +
>>>>> +/// Compare blocks from two if-regions, where \param Head1 is the
>>>>> head of the
>>>>> +/// 1st if-region. \param Head2 is the head of the 2nd if-region. \param
>>>>> +/// Block1 is a block in the 1st if-region to compare. \param Block2
>>>>> is a block
>>>>> +// in the 2nd if-region to compare. \returns true if the blocks have
>>>>> identical
>>>>> +/// instructions that do not alias with \param Head2 and it is legal
>>>>> to merge
>>>>> +/// the two blocks so that only one instance of each instruction is kept.
>>>>> +///
>>>>> +bool SimplifyCFGOpt::CompareBlock(BasicBlock *Head1, BasicBlock *Head2,
>>>>> + BasicBlock *Block1, BasicBlock *Block2) {
>>>>> + TerminatorInst *PTI2 = Head2->getTerminator();
>>>>> + Instruction *PBI2 = Head2->begin();
>>>>> +
>>>>> + if (Block1 == Head1) {
>>>>> + if (Block2 != Head2)
>>>>> + return false;
>>>>> + } else if (Block2 == Head2)
>>>>> + return false;
>>>>> + else {
>>>>> + // Check whether instructions in Block1 and Block2 are identical
>>>>> + // and do not alias with instructions in Head2.
>>>>> + BasicBlock::iterator iter1 = Block1->begin();
>>>>> + BasicBlock::iterator end1 = Block1->getTerminator();
>>>>> + BasicBlock::iterator iter2 = Block2->begin();
>>>>> + BasicBlock::iterator end2 = Block2->getTerminator();
>>>>> +
>>>>> + while (1) {
>>>>> + if (iter1 == end1) {
>>>>> + if (iter2 != end2)
>>>>> + return false;
>>>>> + break;
>>>>> + }
>>>>> +
>>>>> + if (!iter1->isIdenticalTo(iter2))
>>>>> + return false;
>>>>> +
>>>>> + // Illegal to remove instructions with side effects except
>>>>> + // non-volatile stores.
>>>>> + if (iter1->mayHaveSideEffects()) {
>>>>> + Instruction *CurI =&*iter1;
>>>>> + StoreInst *SI = dyn_cast<StoreInst>(CurI);
>>>>> + if (!SI || SI->isVolatile())
>>>>> + return false;
>>>>> + }
>>>>> +
>>>>> + // For simplicity and speed, data dependency check can be
>>>>> + // avoided if read from memory doesn't exist.
>>>>> + if (iter1->mayReadFromMemory())
>>>>> + return false;
>>>>> +
>>>>> + if (iter1->mayWriteToMemory()) {
>>>>> + for (BasicBlock::iterator BI = PBI2, BE = PTI2; BI != BE; ++BI) {
>>>>> + if (BI->mayReadFromMemory() || BI->mayWriteToMemory()) {
>>>>> + // Check alias with Head2.
>>>>> + if (!AA || AA->alias(iter1, BI))
>>>>> + return false;
>>>>> + }
>>>>> + }
>>>>> + }
>>>>> + ++iter1;
>>>>> + ++iter2;
>>>>> + }
>>>>> + ;
>>>>> + }
>>>>> +
>>>>> + return true;
>>>>> +}
>>>>> +
>>>>> +/// Check whether \param BB is the merge block of a if-region. If
>>>>> yes, check
>>>>> +/// whether there exists an adjacent if-region upstream, the two
>>>>> if-regions
>>>>> +/// contain identical instuctions and can be legally merged. \returns
>>>>> true if
>>>>> +/// the two if-regions are merged.
>>>>> +///
>>>>> +/// From:
>>>>> +/// if (a)
>>>>> +/// statement;
>>>>> +/// if (b)
>>>>> +/// statement;
>>>>> +///
>>>>> +/// To:
>>>>> +/// if (a || b)
>>>>> +/// statement;
>>>>> +///
>>>>> +bool SimplifyCFGOpt::MergeIfRegion(BasicBlock *BB, IRBuilder<>  &Builder,
>>>>> + Pass *P) {
>>>>> + BasicBlock *IfTrue2, *IfFalse2;
>>>>> + Value *IfCond2 = GetIfCondition(BB, IfTrue2, IfFalse2);
>>>>> + if (!IfCond2)
>>>>> + return false;
>>>>> +
>>>>> + Instruction *CInst2 = dyn_cast<Instruction>(IfCond2);
>>>>> + if (!CInst2)
>>>>> + return false;
>>>>> +
>>>>> + BasicBlock *Head2 = CInst2->getParent();
>>>>> + if (Head2->hasAddressTaken())
>>>>> + return false;
>>>>> +
>>>>> + BasicBlock *IfTrue1, *IfFalse1;
>>>>> + Value *IfCond1 = GetIfCondition(Head2, IfTrue1, IfFalse1);
>>>>> + if (!IfCond1)
>>>>> + return false;
>>>>> +
>>>>> + Instruction *CInst1 = dyn_cast<Instruction>(IfCond1);
>>>>> + if (!CInst1)
>>>>> + return false;
>>>>> +
>>>>> + BasicBlock *Head1 = CInst1->getParent();
>>>>> +
>>>>> + // Either then-path or else-path should be empty.
>>>>> + if ((IfTrue1 != Head1)&&  (IfFalse1 != Head1))
>>>>> + return false;
>>>>> + if ((IfTrue2 != Head2)&&  (IfFalse2 != Head2))
>>>>> + return false;
>>>>> +
>>>>> + TerminatorInst *PTI2 = Head2->getTerminator();
>>>>> + Instruction *PBI2 = Head2->begin();
>>>>> +
>>>>> + if (!CompareBlock(Head1, Head2, IfTrue1, IfTrue2))
>>>>> + return false;
>>>>> +
>>>>> + if (!CompareBlock(Head1, Head2, IfFalse1, IfFalse2))
>>>>> + return false;
>>>>> +
>>>>> + // Check whether \param Head2 has side-effect and is safe to speculate.
>>>>> + for (BasicBlock::iterator BI = PBI2, BE = PTI2; BI != BE; ++BI) {
>>>>> + Instruction *CI = BI;
>>>>> + if (isa<PHINode>(CI) || CI->mayHaveSideEffects() ||
>>>>> + !isSafeToSpeculativelyExecute(CI))
>>>>> + return false;
>>>>> + }
>>>>> +
>>>>> + // Merge \param Head2 into \param Head1.
>>>>> + Head1->getInstList().pop_back();
>>>>> + Head1->getInstList().splice(Head1->end(), Head2->getInstList());
>>>>> + TerminatorInst *PTI = Head1->getTerminator();
>>>>> + BranchInst *PBI = dyn_cast<BranchInst>(PTI);
>>>>> + Value *CC = PBI->getCondition();
>>>>> + BasicBlock::iterator ItOld = Builder.GetInsertPoint();
>>>>> + Builder.SetInsertPoint(PTI);
>>>>> + Value *NC = Builder.CreateOr(CInst1, CC);
>>>>> + PBI->replaceUsesOfWith(CC, NC);
>>>>> + Builder.SetInsertPoint(ItOld);
>>>>> +
>>>>> + // Remove IfTrue1
>>>>> + if (IfTrue1 != Head1) {
>>>>> + IfTrue1->dropAllReferences();
>>>>> + IfTrue1->eraseFromParent();
>>>>> + }
>>>>> +
>>>>> + // Remove IfFalse1
>>>>> + if (IfFalse1 != Head1) {
>>>>> + IfFalse1->dropAllReferences();
>>>>> + IfFalse1->eraseFromParent();
>>>>> + }
>>>>> +
>>>>> + // Remove \param Head2
>>>>> + Head2->dropAllReferences();
>>>>> + Head2->eraseFromParent();
>>>>> + DEBUG(dbgs()<<  "If conditions merged into:\n"<<  *Head1);
>>>>> + return true;
>>>>> +}
>>>>> +
>>>>> /// Check if passing a value to an instruction will cause undefined
>>>>> behavior.
>>>>> static bool passingValueIsAlwaysUndefined(Value *V, Instruction *I) {
>>>>> Constant *C = dyn_cast<Constant>(V);
>>>>> @@ -4142,6 +4579,10 @@
>>>>> 
>>>>> IRBuilder<>  Builder(BB);
>>>>> 
>>>>> + if (ParallelAndOr&&  TTI.hasBranchDivergence()&&
>>>>> + SimplifyParallelAndOr(BB, Builder))
>>>>> + return true;
>>>>> +
>>>>> // If there is a trivial two-entry PHI node in this basic block, and
>>>>> we can
>>>>> // eliminate it, do so now.
>>>>> if (PHINode *PN = dyn_cast<PHINode>(BB->begin()))
>>>>> @@ -4169,6 +4610,9 @@
>>>>> if (SimplifyIndirectBr(IBI)) return true;
>>>>> }
>>>>> 
>>>>> + if (ParallelAndOr&&  TTI.hasBranchDivergence()&&  MergeIfRegion(BB,
>>>>> Builder))
>>>>> + return true;
>>>>> +
>>>>> return Changed;
>>>>> }
>>>>> 
>>>>> @@ -4178,6 +4622,6 @@
>>>>> /// of the CFG. It returns true if a modification was made.
>>>>> ///
>>>>> bool llvm::SimplifyCFG(BasicBlock *BB, const TargetTransformInfo&TTI,
>>>>> - const DataLayout *TD) {
>>>>> - return SimplifyCFGOpt(TTI, TD).run(BB);
>>>>> + const DataLayout *TD, AliasAnalysis *AA) {
>>>>> + return SimplifyCFGOpt(TTI, TD, AA).run(BB);
>>>>> }
>>>>> Index: lib/Transforms/Scalar/SimplifyCFGPass.cpp
>>>>> ===================================================================
>>>>> --- lib/Transforms/Scalar/SimplifyCFGPass.cpp(revision 184607)
>>>>> +++ lib/Transforms/Scalar/SimplifyCFGPass.cpp(working copy)
>>>>> @@ -27,6 +27,7 @@
>>>>> #include "llvm/ADT/SmallVector.h"
>>>>> #include "llvm/ADT/Statistic.h"
>>>>> #include "llvm/Analysis/TargetTransformInfo.h"
>>>>> +#include "llvm/Analysis/AliasAnalysis.h"
>>>>> #include "llvm/IR/Attributes.h"
>>>>> #include "llvm/IR/Constants.h"
>>>>> #include "llvm/IR/DataLayout.h"
>>>>> @@ -49,10 +50,14 @@
>>>>> 
>>>>> virtual bool runOnFunction(Function&F);
>>>>> 
>>>>> - virtual void getAnalysisUsage(AnalysisUsage&AU) const {
>>>>> - AU.addRequired<TargetTransformInfo>();
>>>>> - }
>>>>> - };
>>>>> + virtual void getAnalysisUsage(AnalysisUsage&AU) const {
>>>>> + AU.addRequired<TargetTransformInfo>();
>>>>> + AU.addRequired<AliasAnalysis>();
>>>>> + AU.addRequired<AliasAnalysis>();
>>>>> + }
>>>>> +private:
>>>>> + AliasAnalysis *AA;
>>>>> +};
>>>>> }
>>>>> 
>>>>> char CFGSimplifyPass::ID = 0;
>>>>> @@ -301,7 +306,7 @@
>>>>> /// iterativelySimplifyCFG - Call SimplifyCFG on all the blocks in the
>>>>> function,
>>>>> /// iterating until no more changes are made.
>>>>> static bool iterativelySimplifyCFG(Function&F, const
>>>>> TargetTransformInfo&TTI,
>>>>> - const DataLayout *TD) {
>>>>> + const DataLayout *TD, AliasAnalysis * AA) {
>>>>> bool Changed = false;
>>>>> bool LocalChange = true;
>>>>> while (LocalChange) {
>>>>> @@ -310,7 +315,7 @@
>>>>> // Loop over all of the basic blocks and remove them if they are
>>>>> unneeded...
>>>>> //
>>>>> for (Function::iterator BBIt = F.begin(); BBIt != F.end(); ) {
>>>>> - if (SimplifyCFG(BBIt++, TTI, TD)) {
>>>>> + if (SimplifyCFG(BBIt++, TTI, TD, AA)) {
>>>>> LocalChange = true;
>>>>> ++NumSimpl;
>>>>> }
>>>>> @@ -324,11 +329,12 @@
>>>>> // simplify the CFG.
>>>>> //
>>>>> bool CFGSimplifyPass::runOnFunction(Function&F) {
>>>>> + AA =&getAnalysis<AliasAnalysis>();
>>>>> const TargetTransformInfo&TTI = getAnalysis<TargetTransformInfo>();
>>>>> const DataLayout *TD = getAnalysisIfAvailable<DataLayout>();
>>>>> bool EverChanged = removeUnreachableBlocksFromFn(F);
>>>>> EverChanged |= mergeEmptyReturnBlocks(F);
>>>>> - EverChanged |= iterativelySimplifyCFG(F, TTI, TD);
>>>>> + EverChanged |= iterativelySimplifyCFG(F, TTI, TD, AA);
>>>>> 
>>>>> // If neither pass changed anything, we're done.
>>>>> if (!EverChanged) return false;
>>>>> @@ -342,7 +348,7 @@
>>>>> return true;
>>>>> 
>>>>> do {
>>>>> - EverChanged = iterativelySimplifyCFG(F, TTI, TD);
>>>>> + EverChanged = iterativelySimplifyCFG(F, TTI, TD, AA);
>>>>> EverChanged |= removeUnreachableBlocksFromFn(F);
>>>>> } while (EverChanged);
>>>>> 
>>>>> Index: lib/CodeGen/BasicTargetTransformInfo.cpp
>>>>> ===================================================================
>>>>> --- lib/CodeGen/BasicTargetTransformInfo.cpp(revision 183763)
>>>>> +++ lib/CodeGen/BasicTargetTransformInfo.cpp(working copy)
>>>>> @@ -63,6 +63,8 @@
>>>>> return this;
>>>>> }
>>>>> 
>>>>> + virtual bool hasBranchDivergence() const;
>>>>> +
>>>>> /// \name Scalar TTI Implementations
>>>>> /// @{
>>>>> 
>>>>> @@ -122,6 +124,7 @@
>>>>> return new BasicTTI(TLI);
>>>>> }
>>>>> 
>>>>> +bool BasicTTI::hasBranchDivergence() const { return false; }
>>>>> 
>>>>> bool BasicTTI::isLegalAddImmediate(int64_t imm) const {
>>>>> return TLI->isLegalAddImmediate(imm);
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> _______________________________________________
>>>>> llvm-commits mailing list
>>>>> llvm-commits at cs.uiuc.edu<mailto:llvm-commits at cs.uiuc.edu>
>>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> _______________________________________________
>>>>> llvm-commits mailing list
>>>>> llvm-commits at cs.uiuc.edu<mailto:llvm-commits at cs.uiuc.edu>
>>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>>>>> <odc_v5>
>>>> 
>>> 
>>> 
>>> <odc_v6>
>> 
>> 
>> 
>> 
> 
> 
> <odc_v8>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20130725/69f6e5f1/attachment.html>


More information about the llvm-commits mailing list