[patch] simplifyCFG for review

Fri Jul 19 14:23:15 PDT 2013

Hi Nick,

Thanks for taking the time to review this. Attached is a version that fixes all of your issues except:

1. A couple of the following patterns are not feld together since "PTI" either has more than 1 uses or its storage can be re-used.

--------------------------------------------------------------
+    TerminatorInst *PTI = Pred->getTerminator();
+    BranchInst *PBI = dyn_cast<BranchInst>(PTI);
Fold them together:
   BranchInst *PBI = dyn_cast<BranchInst>(Pred->getTerminator());
-----------------------------------------------------------------

2.  In the following pattern, "Preds" is predecessors of "BB".  "PP" is the predecessor of an element in "Preds".  "Preds.count(PP)" queries whether "PP" appear in "Preds".  This is used to identify internal nodes.  So it is not redundant.  

-----------------------------------------------------------------
+    if (PP && (Preds.count(PP) != 0)) {
This is redundant. Preds can't contain any NULL pointer entries.
------------------------------------------------------------------

3. From what I can see, "!PBI->isUnconditional" is not equivalent to "PBI->isConditional".  The implementation just checks operand counts. Could there be multi-way branches?

--------------------------------------------------------------------
+    // Only conditional branches are allowed beyond this point.
+    if (!PBI->isConditional())
+      return false;

Remove it or assert it. This condition never happens. Also, you wrote it 
as "PBI->isUnconditional" above, but "!PBI->isConditional" here?
-----------------------------------------------------------------------

4. I use clang-format, which unindents the struct.  Clang-format reports many errors in existing codes.  I have to take the pain to isolate my changes.  Life will be easier if someone can run a clang-format for all files, and make clang-format run for every check-in.

---------------------------------------------------------------------------
  namespace {
-  struct CFGSimplifyPass : public FunctionPass {
-    static char ID; // Pass identification, replacement for typeid
-    CFGSimplifyPass() : FunctionPass(ID) {
-      initializeCFGSimplifyPassPass(*PassRegistry::getPassRegistry());
-    }
+struct CFGSimplifyPass : public FunctionPass {
+  CFGSimplifyPass(char &ID, bool isOpt) : FunctionPass(ID), IsOpt(isOpt) {}
+  virtual bool runOnFunction(Function &F);

Why did you unindent the struct? Please put it back. 
-------------------------------------------------------------------------------
  
-Mei

-----Original Message-----
From: Nick Lewycky [mailto:nicholas at mxc.ca] 
Sent: Thursday, July 18, 2013 1:25 AM
To: Ye, Mei
Cc: Andrew Trick; llvm commits; Tom Stellard
Subject: Re: [patch] simplifyCFG for review

Andrew Trick wrote:
>
> On Jul 17, 2013, at 11:07 AM, Ye, Mei <Mei.Ye at amd.com
> <mailto:Mei.Ye at amd.com>> wrote:
>
>> Hi Andrew
>> Attached is a patch that follows what you suggested. Thanks for your time.
>
> I'm ok with this patch now. Someone who can test it should commit. Maybe
> Tom?

--- lib/Target/R600/AMDGPUTargetTransformInfo.cpp	(revision 0)
+++ lib/Target/R600/AMDGPUTargetTransformInfo.cpp	(revision 0)
@@ -0,0 +1,90 @@
+//===-- AMDGPUTargetTransformInfo.cpp - AMDGPU specific TTI pass
+//----------------===//

Why is this two lines?

+//===----------------------------------------------------------------------===//
+/// \file
+/// This file implements a TargetTransformInfo analysis pass specific 
to the

Two slashes for these comments, and a blank //-only line after the ruler.

+  /// \brief Compare a pair of blocks: \param Block1 and \param Block2, 
which
+  /// are from two if-regions whose head blocks are \param Head1 and \param
+  /// Head2.  \returns true if \param Block1 and \param Block2 contain 
identical
+  /// instructions, and none of the instructions alias with \param Head2.
+  /// This is used as a legality check for merging if-regions.

Despite reading this, I still don't understand it. It compares Block1 
with Block2. What does "equal" mean in this context? What is a "head 
block" and why does it matter? You say instructions alias, you mean

+  SimplifyCFGOpt(const TargetTransformInfo &TTI, const DataLayout *TD,
+                 AliasAnalysis *aa)
+      : TTI(TTI), TD(TD), AA(aa) {}

Call it 'AA' not 'aa'. Follow the existing pattern.

+    SmallSetVector<BasicBlock *, 16> Preds(pred_begin(BB), pred_end(BB));
+    for (SmallSetVector<BasicBlock *, 16>::iterator PI = Preds.begin(),
+                                                    PE = Preds.end();
+         PI != PE; ++PI) {
+      if (Pred1 == NULL)
+        Pred1 = *PI;
+      else if (Pred2 == NULL)
+        Pred2 = *PI;
+      else
+        return NULL;
+    }

This is inefficient. Remove 'Preds', make the loop iterate from 
pred_begin(BB) to pred_end(BB). Add "if (*PI == Pred1 || *PI == Pred2) 
continue;" to the top of the loop.

+    if (!Pred1 || !Pred2)
+       return NULL;

Fix indent before 'return'.

+  PHINode *PHI = dyn_cast<PHINode>(&BB->front());

You can use "BB->begin()" instead of "&BB->front()".

+  SmallSetVector<BasicBlock *, 16> Preds(pred_begin(BB), pred_end(BB));
+  // Check predecessors of \param BB.
+  for (SmallSetVector<BasicBlock *, 16>::iterator PI = Preds.begin(),
+                                                  PE = Preds.end();
+       PI != PE; ++PI) {

This is inefficient. You can use SmallPtrSet for this. You don't need to 
preserve the iteration order of pred_begin to pred_end, and SmallPtrSet 
gives you iterators (unlike SmallSet).

+    TerminatorInst *PTI = Pred->getTerminator();
+    BranchInst *PBI = dyn_cast<BranchInst>(PTI);

Fold them together:
   BranchInst *PBI = dyn_cast<BranchInst>(Pred->getTerminator());

+    // Only conditional branches are allowed beyond this point.
+    if (!PBI->isConditional())
+      return false;

Remove it or assert it. This condition never happens. Also, you wrote it 
as "PBI->isUnconditional" above, but "!PBI->isConditional" here?

+    if (!PC || (PC->getNumUses() != 1))

This is inefficient. Use !PC-hasOneUse() to avoid walking the linked 
list of uses after we know there is more than one entry.

+    if (PP && (Preds.count(PP) != 0)) {

This is redundant. Preds can't contain any NULL pointer entries.

+      PHI = dyn_cast<PHINode>(&SBB->front());

Again, SBB->begin().

+      TerminatorInst *TPS = PS->getTerminator();
+      BranchInst *BPS = dyn_cast<BranchInst>(TPS);
+      if (BPS && BPS->isUnconditional()) {

At least merge TPS into the dyn_cast<...>(TPS). You could also write
   if (BranchInst *BPS = dyn_cast<BranchInst>(PS->getTerminator())
     if (BPS->isUnconditional())
       ...

+  if (Block1 == Head1) {
+    if (Block2 != Head2)
+      return false;
+  } else if (Block2 == Head2)
+    return false;
+  else {

So if Block1 == Head1 and Block2 == Head2 this function does nothing?

Can you reword the above without the elses following returns? See 
http://llvm.org/docs/CodingStandards.html#don-t-use-else-after-a-return .

+    }
+    ;
+  }

Spurious semi-colon.

+  Value *IfCond2 = GetIfCondition(BB, IfTrue2, IfFalse2);
+  if (!IfCond2)
+    return false;
+
+  Instruction *CInst2 = dyn_cast<Instruction>(IfCond2);
+  if (!CInst2)
+    return false;

You can simplify this using dyn_cast_or_null:

   Value *IfCond2 = GetIfCondition(BB, IfTrue2, IfFalse2);
   Instruction *CInst2 = dyn_cast_or_null<Instruction>(IfCond2);
   if (!CInt2)
     return false;

+  // Check whether \param Head2 has side-effect and is safe to speculate.

Ensure that ... has no side-effects?

+  Head1->getInstList().pop_back();
+  Head1->getInstList().splice(Head1->end(), Head2->getInstList());

You don't update PHI nodes after doing this. Can you show me a 
proof-sketch that Head2 can't be used in any PHI nodes in the function 
if we reach here?

+  bool OptCB = (ParallelAndOr && AA && TTI.hasBranchDivergence()) ? 
true : false;

I think the ?: is redundant here.

-  initializeCFGSimplifyPassPass(Registry);
+  initializeCFGCANONPass(Registry);
+  initializeCFGOPTPass(Registry);

OPT and CANON aren't acronyms.

  namespace {
-  struct CFGSimplifyPass : public FunctionPass {
-    static char ID; // Pass identification, replacement for typeid
-    CFGSimplifyPass() : FunctionPass(ID) {
-      initializeCFGSimplifyPassPass(*PassRegistry::getPassRegistry());
-    }
+struct CFGSimplifyPass : public FunctionPass {
+  CFGSimplifyPass(char &ID, bool isOpt) : FunctionPass(ID), IsOpt(isOpt) {}
+  virtual bool runOnFunction(Function &F);

Why did you unindent the struct? Please put it back. See 
http://llvm.org/docs/CodingStandards.html#namespace-indentation .

+                                   const DataLayout *TD, AliasAnalysis 
* AA) {

Extra space before 'AA'.

+                 const DataLayout *TD = 0, AliasAnalysis * AA = 0);

Extra space again.

Nick

>
> -Andy
>
>> On Jul 12, 2013, at 3:39 PM, Ye, Mei <Mei.Ye at amd.com
>> <mailto:Mei.Ye at amd.com>> wrote:
>>
>>
>> The fundamental issue is whether target-specific works are allowed
>> inside "machine-independent" passes. In fact, almost all compiler
>> optimizations are target-dependent. Having an good infrastructure to
>> enable target-specific tunings or optimizations will promote
>> code-sharing and code-quality. The common practice that I have seen is
>> that compiler vendors keep their work "if-defed" inside their own
>> branches and do not contribute back to the trunk. There are IP and
>> business strategy reasons. But in the long run, code base diverges and
>> merging gets more difficult if not impossible. Not only the vendors
>> will suffer, but also the larger community if less and less
>> contributions flow back into the trunk.
>> Yes, please contribute your target-dependent optimizations! Usually
>> the best way to introduce them is in target passes, which are only
>> built when building your target. You can easily do this by overriding
>> TargetPassConfig::addISelPrepare.
>> For this patch, you could make the argument that your transformations
>> are sufficiently generic that they might as well be made available in
>> SimplifyCFG.cpp. I agree.
>> However, (correct me if I'm wrong) your transformations are not
>> required to expose downstream machine-independent optimizations. They
>> should not run early in the pass pipeline. Doing so will lead to
>> divergence across targets and lack of code sharing and code quality
>> that you are afraid of. There is also no reason to run them evey time
>> we invoke the SimplifyCFG pass.
>> We want to have a core SimplifyCFG pass that can be run repeatedly to
>> canonicalize the IR. It's one of the first things we run. Introducing
>> target heuristics this early would be a mistake.
>> I realize this design goal is not obvious from the current codebase
>> and documentation. Tomorrow, I will send out a design proposal to llvm
>> to make sure contributors are working toward a common goal.
>> You could help move toward this goal though by modifying your patch to
>> run only once in the final round of SimplifyCFG (after jump-threading
>> and DSE).
>> The most straightforward way to handle this is to add a new
>> OptimizeCFG pass for all the "lowering" stuff and simply schedule it
>> after the last SimplifyCFG.
>> If you want to combine it with a round of SimplifyCFG, that's also ok,
>> but we should still have a distinct pass name. In this case you need to:
>> - Define a subclass of CFGSimplifyPass that passes a flag to the
>> CFGSimplifyPass ctor to enable lowering optimizations.
>> - Add INITIALIZE_PASS_... declarations for the new pass name
>> "optimizecfg".
>> - Add the same flag to createCFGSimplificationPass() and make the
>> trivial changes to populateModulePassManager() and
>> populateLTOPassManager().
>> See ScalerReplAggregates.cpp for an example of all this.
>>
>>
>> If we have a good repository of optimizations, each target can
>> selectively enable/disable them to avoid unexpected performance
>> impact. To safeguard the correctness, the community can create a super
>> target (just clone X86) that tests all optimizations.
>>
>> From the aspect of implementation, this patch fits into SimplifyCFG
>> quite naturally. It is often advantageous to change the
>> "machine-independent" codes since you already traverse the whole
>> function and reach the point where certain patterns are recognized in
>> a similar pass. Adding a new optimization/transformation at that point
>> is free, while creating a separated pass will have to repeat works and
>> increases compilation time. JIT and on-line compilation models are
>> very stingy in compilation time. I unfortunately live in a world of
>> compiling-from-source model, adding a new pass is prohibited in my
>> organization. Besides, this patch is not GPU specific. It is good for
>> CPU in general unless you happen to have patterns of very expensive
>> floating point comparisons. Even in that case, it might still win due
>> to reduction in branches.
>> It's a moot point if you piggy back on the last SimplifyCFG as
>> explained above. But I don't buy this argument at all. What work do
>> you need to "redo" if you run this as a separate pass? Iterating over
>> blocks? How does that compare to the cost of analyzing compound
>> if-statements every time we run SimplifyCFG. Please quantify the
>> impact of running this in a separate "LowerCFG" pass before claiming
>> that you're optimizing the pass pipeline.
>> -Andy
>>
>>
>>
>> And, yes, GPU needs to do CFG transformations. But code gen can
>> introduce new control flows. Vendors will need to recognize
>> irreducible CFG in their jitters anyway as a verification.
>>
>> -Mei
>>
>>
>> -----Original Message-----
>> From: Nick Lewycky [mailto:nicholas at mxc.ca]
>> Sent: Thursday, July 11, 2013 10:40 PM
>> To: Ye, Mei
>> Cc: Tom Stellard;llvm-commits at cs.uiuc.edu
>> <mailto:llvm-commits at cs.uiuc.edu>
>> Subject: Re: [patch] simplifyCFG for review
>>
>> The fundamental issue with this patch is that you're saying that
>> simplifycfg should be responsible for structuring the CFG differently
>> based on whether the target supports branches that go different ways in
>> different threads (a common GPU restriction).
>>
>> We try very, very hard to avoid doing target-specific things inside
>> passes like this. Doing so requires an analysis of why it couldn't be
>> done differently. (Don't GPUs have problems with irreducable CFG anyhow?
>> Don't you already need to rewrite the CFG for that anyways? Why not have
>> a GPU CFG simplification pass right before CodeGenPrep?)
>>
>> I have some comments on the details of the patch, but the above issue is
>> big and needs wider discussion and consensus first.
>>
>> Nick
>>
>> Ye, Mei wrote:
>>
>> Thanks Tom. Attached is a new patch that adds comments as requested. I
>> also made a new diff against most-recent trunk. But it looks like many
>> files have locks, so some diffs are based on an earlier revision.
>>
>> With regard to your question:
>> "I'm a little confused about why we need to add a
>> BasicTargetTransformInfo and
>> also an AMDGPUTargetTransformInfo. What is the reason for this?"
>>
>> My answer is that same thing happens to X86, ARM, and other targets. I
>> didn't track whether BasicTargetTransformInfo is always needed, but I
>> think you can build compiler for more than one targets, and for
>> targets that you do not supply a target-spefici Tranform Info, you can
>> always default back to use the basic one.
>>
>> -Mei
>>
>>
>> -----Original Message-----
>> From: Tom Stellard [mailto:tom at stellard.net]
>> Sent: Monday, July 08, 2013 11:14 AM
>> To: Ye, Mei
>> Cc: Evan Cheng; Renato Golin; Sean Silva;llvm-commits at cs.uiuc.edu
>> <mailto:llvm-commits at cs.uiuc.edu>
>> Subject: Re: [patch] simplifyCFG for review
>>
>> On Fri, Jun 28, 2013 at 09:46:13PM +0000, Ye, Mei wrote:
>>
>> Hi all
>>
>> Thank you for your comments. Attached is an updated patch.
>> - add TargetTransformInfo to R600 .
>> - add hasBranchDivergence to query whether a target has branch divergence.
>> - Invoke branch reduction opts only when the underlying target has
>> branch divergence. So currently it only gets triggered for R600.
>> - fixed a bug in opt.cpp
>> - correct clang-format issues.
>>
>> I have tested the correctness on x86 (by adding branch divergence to
>> X86TargetTransformInfo in my local workspace) using existing testing
>> infrastructure and CPU2006, CPU2000. Tom Stellard agrees to run R600
>> testings (thanks a lot, Tom).
>>
>> -Mei
>>
>>
>> Index: test/Transforms/SimplifyCFG/R600/lit.local.cfg
>> ===================================================================
>> --- test/Transforms/SimplifyCFG/R600/lit.local.cfg(revision 0)
>> +++ test/Transforms/SimplifyCFG/R600/lit.local.cfg(revision 0)
>> @@ -0,0 +1,6 @@
>> +config.suffixes = ['.ll', '.c', '.cpp']
>> +
>> +targets = set(config.root.targets_to_build.split())
>> +if not 'R600' in targets:
>> + config.unsupported = True
>> +
>> Index: test/Transforms/SimplifyCFG/R600/parallelorifcollapse.ll
>> ===================================================================
>> --- test/Transforms/SimplifyCFG/R600/parallelorifcollapse.ll(revision 0)
>> +++ test/Transforms/SimplifyCFG/R600/parallelorifcollapse.ll(revision 0)
>> @@ -0,0 +1,51 @@
>> +; Function Attrs: nounwind
>> +; RUN: opt< %s -mtriple=r600-unknown-linux-gnu -simplifycfg -basicaa
>> -S | FileCheck %s
>> +; CHECK: or i1
>> +; CHECK-NEXT: br
>> +; CHECK: br
>> +; CHECK: ret
>>
>> It's not clear to me what the expected output of this test is supposed
>> to be. Could you add a comment explaining what the simplifycfg pass is
>> supposed to be doing here.
>>
>>
>> +define void @_Z9chk1D_512v() #0 {
>> +entry:
>> + %a0 = alloca i32, align 4
>> + %b0 = alloca i32, align 4
>> + %c0 = alloca i32, align 4
>> + %d0 = alloca i32, align 4
>> + %a1 = alloca i32, align 4
>> + %b1 = alloca i32, align 4
>> + %c1 = alloca i32, align 4
>> + %d1 = alloca i32, align 4
>> + %data = alloca i32, align 4
>> + %0 = load i32* %a0, align 4
>> + %1 = load i32* %b0, align 4
>> + %cmp = icmp ne i32 %0, %1
>> + br i1 %cmp, label %land.lhs.true, label %if.end
>> +
>> +land.lhs.true: ; preds = %entry
>> + %2 = load i32* %c0, align 4
>> + %3 = load i32* %d0, align 4
>> + %cmp1 = icmp ne i32 %2, %3
>> + br i1 %cmp1, label %if.then, label %if.end
>> +
>> +if.then: ; preds = %land.lhs.true
>> + store i32 1, i32* %data, align 4
>> + br label %if.end
>> +
>> +if.end: ; preds = %if.then, %land.lhs.true, %entry
>> + %4 = load i32* %a1, align 4
>> + %5 = load i32* %b1, align 4
>> + %cmp2 = icmp ne i32 %4, %5
>> + br i1 %cmp2, label %land.lhs.true3, label %if.end6
>> +
>> +land.lhs.true3: ; preds = %if.end
>> + %6 = load i32* %c1, align 4
>> + %7 = load i32* %d1, align 4
>> + %cmp4 = icmp ne i32 %6, %7
>> + br i1 %cmp4, label %if.then5, label %if.end6
>> +
>> +if.then5: ; preds = %land.lhs.true3
>> + store i32 1, i32* %data, align 4
>> + br label %if.end6
>> +
>> +if.end6: ; preds = %if.then5, %land.lhs.true3, %if.end
>> + ret void
>> +}
>> Index: test/Transforms/SimplifyCFG/R600/parallelandifcollapse.ll
>> ===================================================================
>> --- test/Transforms/SimplifyCFG/R600/parallelandifcollapse.ll(revision 0)
>> +++ test/Transforms/SimplifyCFG/R600/parallelandifcollapse.ll(revision 0)
>> @@ -0,0 +1,58 @@
>> +; Function Attrs: nounwind
>> +; RUN: opt< %s -mtriple=r600-unknown-linux-gnu -simplifycfg -basicaa
>> -S | FileCheck %s
>> +; CHECK: or i1
>> +; CHECK-NEXT: br
>> +; CHECK: br
>> +; CHECK: ret
>>
>> Same thing here, a comment explaining what the expected output should be
>> would be helpful.
>>
>>
>> +define void @_Z9chk1D_512v() #0 {
>> +entry:
>> + %a0 = alloca i32, align 4
>> + %b0 = alloca i32, align 4
>> + %c0 = alloca i32, align 4
>> + %d0 = alloca i32, align 4
>> + %a1 = alloca i32, align 4
>> + %b1 = alloca i32, align 4
>> + %c1 = alloca i32, align 4
>> + %d1 = alloca i32, align 4
>> + %data = alloca i32, align 4
>> + %0 = load i32* %a0, align 4
>> + %1 = load i32* %b0, align 4
>> + %cmp = icmp ne i32 %0, %1
>> + br i1 %cmp, label %land.lhs.true, label %if.else
>> +
>> +land.lhs.true: ; preds = %entry
>> + %2 = load i32* %c0, align 4
>> + %3 = load i32* %d0, align 4
>> + %cmp1 = icmp ne i32 %2, %3
>> + br i1 %cmp1, label %if.then, label %if.else
>> +
>> +if.then: ; preds = %land.lhs.true
>> + br label %if.end
>> +
>> +if.else: ; preds = %land.lhs.true, %entry
>> + store i32 1, i32* %data, align 4
>> + br label %if.end
>> +
>> +if.end: ; preds = %if.else, %if.then
>> + %4 = load i32* %a1, align 4
>> + %5 = load i32* %b1, align 4
>> + %cmp2 = icmp ne i32 %4, %5
>> + br i1 %cmp2, label %land.lhs.true3, label %if.else6
>> +
>> +land.lhs.true3: ; preds = %if.end
>> + %6 = load i32* %c1, align 4
>> + %7 = load i32* %d1, align 4
>> + %cmp4 = icmp ne i32 %6, %7
>> + br i1 %cmp4, label %if.then5, label %if.else6
>> +
>> +if.then5: ; preds = %land.lhs.true3
>> + br label %if.end7
>> +
>> +if.else6: ; preds = %land.lhs.true3, %if.end
>> + store i32 1, i32* %data, align 4
>> + br label %if.end7
>> +
>> +if.end7: ; preds = %if.else6, %if.then5
>> + ret void
>> +}
>> +
>> Index: include/llvm/Analysis/TargetTransformInfo.h
>> ===================================================================
>> --- include/llvm/Analysis/TargetTransformInfo.h(revision 183763)
>> +++ include/llvm/Analysis/TargetTransformInfo.h(working copy)
>> @@ -171,6 +171,9 @@
>> /// comments for a detailed explanation of the cost values.
>> virtual unsigned getUserCost(const User *U) const;
>>
>> + /// \brief hasBranchDivergence - Return true if branch divergence
>> exists.
>> + virtual bool hasBranchDivergence() const;
>> +
>>
>> What exactly is branch divergence? Could you explain this more in
>> the comment and give an example.
>>
>>
>> /// \brief Test whether calls to a function lower to actual program
>> function
>> /// calls.
>> ///
>> Index: include/llvm/Transforms/Utils/Local.h
>> ===================================================================
>> --- include/llvm/Transforms/Utils/Local.h(revision 184607)
>> +++ include/llvm/Transforms/Utils/Local.h(working copy)
>> @@ -39,6 +39,7 @@
>> class TargetLibraryInfo;
>> class TargetTransformInfo;
>> class DIBuilder;
>> +class AliasAnalysis;
>>
>> template<typename T> class SmallVectorImpl;
>>
>> @@ -136,7 +137,7 @@
>> /// the basic block that was pointed to.
>> ///
>> bool SimplifyCFG(BasicBlock *BB, const TargetTransformInfo&TTI,
>> - const DataLayout *TD = 0);
>> + const DataLayout *TD = 0, AliasAnalysis * AA = 0);
>>
>> /// FoldBranchToCommonDest - If this basic block is ONLY a setcc and a
>> branch,
>> /// and if a predecessor branches to us and one of our successors,
>> fold the
>> Index: tools/opt/opt.cpp
>> ===================================================================
>> --- tools/opt/opt.cpp(revision 183763)
>> +++ tools/opt/opt.cpp(working copy)
>> @@ -668,6 +668,9 @@
>> FPasses.reset(new FunctionPassManager(M.get()));
>> if (TD)
>> FPasses->add(new DataLayout(*TD));
>> + if (TM.get())
>> + TM->addAnalysisPasses(*FPasses);
>> +
>> }
>>
>> if (PrintBreakpoints) {
>> Index: lib/Analysis/TargetTransformInfo.cpp
>> ===================================================================
>> --- lib/Analysis/TargetTransformInfo.cpp(revision 183763)
>> +++ lib/Analysis/TargetTransformInfo.cpp(working copy)
>> @@ -88,6 +88,8 @@
>> return PrevTTI->getUserCost(U);
>> }
>>
>> +bool TargetTransformInfo::hasBranchDivergence() const { return false; }
>> +
>> bool TargetTransformInfo::isLoweredToCall(const Function *F) const {
>> return PrevTTI->isLoweredToCall(F);
>> }
>> Index: lib/Target/R600/AMDGPUTargetMachine.cpp
>> ===================================================================
>> --- lib/Target/R600/AMDGPUTargetMachine.cpp(revision 183763)
>> +++ lib/Target/R600/AMDGPUTargetMachine.cpp(working copy)
>> @@ -105,6 +105,18 @@
>> return new AMDGPUPassConfig(this, PM);
>> }
>>
>> +//===----------------------------------------------------------------------===//
>> +// AMDGPU Analysis Pass Setup
>> +//===----------------------------------------------------------------------===//
>> +
>> +void AMDGPUTargetMachine::addAnalysisPasses(PassManagerBase&PM) {
>> + // Add first the target-independent BasicTTI pass, then our AMDGPU
>> pass. This
>> + // allows the AMDGPU pass to delegate to the target independent
>> layer when
>> + // appropriate.
>> + PM.add(createBasicTargetTransformInfoPass(getTargetLowering()));
>> + PM.add(createAMDGPUTargetTransformInfoPass(this));
>>
>> I'm a little confused about why we need to add a
>> BasicTargetTransformInfo and
>> also an AMDGPUTargetTransformInfo. What is the reason for this?
>>
>>
>> +}
>> +
>> bool
>> AMDGPUPassConfig::addPreISel() {
>> const AMDGPUSubtarget&ST = TM->getSubtarget<AMDGPUSubtarget>();
>> Index: lib/Target/R600/AMDGPUTargetTransformInfo.cpp
>> ===================================================================
>> --- lib/Target/R600/AMDGPUTargetTransformInfo.cpp(revision 0)
>> +++ lib/Target/R600/AMDGPUTargetTransformInfo.cpp(revision 0)
>>
>> It looks like you forgot to add this new file to CMakeLists.txt.
>>
>>
>> @@ -0,0 +1,90 @@
>> +//===-- AMDGPUTargetTransformInfo.cpp - AMDGPU specific TTI pass
>> +//----------------===//
>> +//
>> +// The LLVM Compiler Infrastructure
>> +//
>> +// This file is distributed under the University of Illinois Open Source
>> +// License. See LICENSE.TXT for details.
>> +//
>> +//===----------------------------------------------------------------------===//
>> +/// \file
>> +/// This file implements a TargetTransformInfo analysis pass specific
>> to the
>> +/// AMDGPU target machine. It uses the target's detailed information
>> to provide
>> +/// more precise answers to certain TTI queries, while letting the target
>> +/// independent and default TTI implementations handle the rest.
>> +///
>> +//===----------------------------------------------------------------------===//
>> +
>> +#define DEBUG_TYPE "AMDGPUtti"
>> +#include "AMDGPU.h"
>> +#include "AMDGPUTargetMachine.h"
>> +#include "llvm/Analysis/TargetTransformInfo.h"
>> +#include "llvm/Support/Debug.h"
>> +#include "llvm/Target/TargetLowering.h"
>> +#include "llvm/Target/CostTable.h"
>> +using namespace llvm;
>> +
>> +// Declare the pass initialization routine locally as target-specific
>> passes
>> +// don't have a target-wide initialization entry point, and so we
>> rely on the
>> +// pass constructor initialization.
>> +namespace llvm {
>> +void initializeAMDGPUTTIPass(PassRegistry&);
>> +}
>> +
>> +namespace {
>> +
>> +class AMDGPUTTI : public ImmutablePass, public TargetTransformInfo {
>> + const AMDGPUTargetMachine *TM;
>> + const AMDGPUSubtarget *ST;
>> + const AMDGPUTargetLowering *TLI;
>> +
>> + /// Estimate the overhead of scalarizing an instruction. Insert and
>> Extract
>> + /// are set if the result needs to be inserted and/or extracted from
>> vectors.
>> + unsigned getScalarizationOverhead(Type *Ty, bool Insert, bool
>> Extract) const;
>> +
>> +public:
>> + AMDGPUTTI() : ImmutablePass(ID), TM(0), ST(0), TLI(0) {
>> + llvm_unreachable("This pass cannot be directly constructed");
>> + }
>> +
>> + AMDGPUTTI(const AMDGPUTargetMachine *TM)
>> + : ImmutablePass(ID), TM(TM), ST(TM->getSubtargetImpl()),
>> + TLI(TM->getTargetLowering()) {
>> + initializeAMDGPUTTIPass(*PassRegistry::getPassRegistry());
>> + }
>> +
>> + virtual void initializePass() { pushTTIStack(this); }
>> +
>> + virtual void finalizePass() { popTTIStack(); }
>> +
>> + virtual void getAnalysisUsage(AnalysisUsage&AU) const {
>> + TargetTransformInfo::getAnalysisUsage(AU);
>> + }
>> +
>> + /// Pass identification.
>> + static char ID;
>> +
>> + /// Provide necessary pointer adjustments for the two base classes.
>> + virtual void *getAdjustedAnalysisPointer(const void *ID) {
>> + if (ID ==&TargetTransformInfo::ID)
>> + return (TargetTransformInfo *)this;
>> + return this;
>> + }
>> +
>> + virtual bool hasBranchDivergence() const;
>> +
>> + /// @}
>> +};
>> +
>> +} // end anonymous namespace
>> +
>> +INITIALIZE_AG_PASS(AMDGPUTTI, TargetTransformInfo, "AMDGPUtti",
>> + "AMDGPU Target Transform Info", true, true, false)
>> +char AMDGPUTTI::ID = 0;
>> +
>> +ImmutablePass *
>> +llvm::createAMDGPUTargetTransformInfoPass(const AMDGPUTargetMachine
>> *TM) {
>> + return new AMDGPUTTI(TM);
>> +}
>> +
>> +bool AMDGPUTTI::hasBranchDivergence() const { return true; }
>> Index: lib/Target/R600/AMDGPUTargetMachine.h
>> ===================================================================
>> --- lib/Target/R600/AMDGPUTargetMachine.h(revision 183763)
>> +++ lib/Target/R600/AMDGPUTargetMachine.h(working copy)
>> @@ -63,6 +63,9 @@
>> }
>> virtual const DataLayout *getDataLayout() const { return&Layout; }
>> virtual TargetPassConfig *createPassConfig(PassManagerBase&PM);
>> +
>> + /// \brief Register R600 analysis passes with a pass manager.
>> + virtual void addAnalysisPasses(PassManagerBase&PM);
>> };
>>
>> } // End namespace llvm
>> Index: lib/Target/R600/AMDGPU.h
>> ===================================================================
>> --- lib/Target/R600/AMDGPU.h(revision 183763)
>> +++ lib/Target/R600/AMDGPU.h(working copy)
>> @@ -46,6 +46,10 @@
>> FunctionPass *createAMDGPUIndirectAddressingPass(TargetMachine&tm);
>> FunctionPass *createAMDGPUISelDag(TargetMachine&tm);
>>
>> +/// \brief Creates an AMDGPU-specific Target Transformation Info pass.
>> +ImmutablePass *
>> + createAMDGPUTargetTransformInfoPass(const AMDGPUTargetMachine *TM);
>> +
>> extern Target TheAMDGPUTarget;
>>
>> } // End namespace llvm
>> Index: lib/Transforms/Utils/SimplifyCFG.cpp
>> ===================================================================
>> --- lib/Transforms/Utils/SimplifyCFG.cpp(revision 184607)
>> +++ lib/Transforms/Utils/SimplifyCFG.cpp(working copy)
>> @@ -17,8 +17,10 @@
>> #include "llvm/ADT/STLExtras.h"
>> #include "llvm/ADT/SetVector.h"
>> #include "llvm/ADT/SmallPtrSet.h"
>> +#include "llvm/ADT/SmallSet.h"
>> #include "llvm/ADT/SmallVector.h"
>> #include "llvm/ADT/Statistic.h"
>> +#include "llvm/Analysis/AliasAnalysis.h"
>> #include "llvm/Analysis/InstructionSimplify.h"
>> #include "llvm/Analysis/TargetTransformInfo.h"
>> #include "llvm/Analysis/ValueTracking.h"
>> @@ -63,6 +65,10 @@
>> HoistCondStores("simplifycfg-hoist-cond-stores", cl::Hidden,
>> cl::init(true),
>> cl::desc("Hoist conditional stores if an unconditional store preceeds"));
>>
>> +static cl::opt<bool>
>> + ParallelAndOr("simplifycfg-parallel-and-or", cl::Hidden, cl::init(true),
>> + cl::desc("Use parallel-and-or mode for branch conditions"));
>> +
>> STATISTIC(NumBitMaps, "Number of switch instructions turned into
>> bitmaps");
>> STATISTIC(NumLookupTables, "Number of switch instructions turned into
>> lookup tables");
>> STATISTIC(NumSinkCommons, "Number of common instructions sunk down to
>> the end block");
>> @@ -88,6 +94,7 @@
>> class SimplifyCFGOpt {
>> const TargetTransformInfo&TTI;
>> const DataLayout *const TD;
>> + AliasAnalysis *AA;
>>
>> Value *isValueEqualityComparison(TerminatorInst *TI);
>> BasicBlock *GetValueEqualityComparisonCases(TerminatorInst *TI,
>> @@ -105,10 +112,25 @@
>> bool SimplifyIndirectBr(IndirectBrInst *IBI);
>> bool SimplifyUncondBranch(BranchInst *BI, IRBuilder<> &Builder);
>> bool SimplifyCondBranch(BranchInst *BI, IRBuilder<>&Builder);
>> + /// \brief Use parallel-and or parallel-or to generate conditions for
>> + /// conditional branches.
>> + bool SimplifyParallelAndOr(BasicBlock *BB, IRBuilder<> &Builder,
>> Pass *P = 0);
>> + /// \brief If \param BB is the merge block of an if-region, attempt
>> to merge
>> + /// the if-region with an adjacent if-region upstream if two if-regions
>> + /// contain identical instructions.
>> + bool MergeIfRegion(BasicBlock *BB, IRBuilder<> &Builder, Pass *P = 0);
>> + /// \brief Compare a pair of blocks: \param Block1 and \param
>> Block2, which
>> + /// are from two if-regions whose head blocks are \param Head1 and
>> \param
>> + /// Head2. \returns true if \param Block1 and \param Block2 contain
>> identical
>> + /// instructions, and none of the instructions alias with \param Head2.
>> + /// This is used as a legality check for merging if-regions.
>> + bool CompareBlock(BasicBlock *Head1, BasicBlock *Head2, BasicBlock
>> *Block1,
>> + BasicBlock *Block2);
>>
>> public:
>> - SimplifyCFGOpt(const TargetTransformInfo&TTI, const DataLayout *TD)
>> - : TTI(TTI), TD(TD) {}
>> + SimplifyCFGOpt(const TargetTransformInfo&TTI, const DataLayout *TD,
>> + AliasAnalysis *aa)
>> + : TTI(TTI), TD(TD), AA(aa) {}
>> bool run(BasicBlock *BB);
>> };
>> }
>> @@ -195,8 +217,8 @@
>> }
>>
>>
>> -/// GetIfCondition - Given a basic block (BB) with two predecessors
>> (and at
>> -/// least one PHI node in it), check to see if the merge at this
>> block is due
>> +/// GetIfCondition - Given a basic block (BB) with two predecessors,
>> +/// check to see if the merge at this block is due
>> /// to an "if condition". If so, return the boolean condition that
>> determines
>> /// which entry into BB will be taken. Also, return by references the
>> block
>> /// that will be entered from if the condition is true, and the block
>> that will
>> @@ -206,12 +228,31 @@
>> /// instructions in them.
>> static Value *GetIfCondition(BasicBlock *BB, BasicBlock *&IfTrue,
>> BasicBlock *&IfFalse) {
>> - PHINode *SomePHI = cast<PHINode>(BB->begin());
>> - assert(SomePHI->getNumIncomingValues() == 2&&
>> - "Function can only handle blocks with 2 predecessors!");
>> - BasicBlock *Pred1 = SomePHI->getIncomingBlock(0);
>> - BasicBlock *Pred2 = SomePHI->getIncomingBlock(1);
>> + PHINode *SomePHI = dyn_cast<PHINode>(BB->begin());
>> + BasicBlock *Pred1 = NULL;
>> + BasicBlock *Pred2 = NULL;
>>
>> + if (SomePHI) {
>> + if (SomePHI->getNumIncomingValues() != 2)
>> + return NULL;
>> + Pred1 = SomePHI->getIncomingBlock(0);
>> + Pred2 = SomePHI->getIncomingBlock(1);
>> + } else {
>> + SmallSetVector<BasicBlock *, 16> Preds(pred_begin(BB), pred_end(BB));
>> + for (SmallSetVector<BasicBlock *, 16>::iterator PI = Preds.begin(),
>> + PE = Preds.end();
>> + PI != PE; ++PI) {
>> + if (Pred1 == NULL)
>> + Pred1 = *PI;
>> + else if (Pred2 == NULL)
>> + Pred2 = *PI;
>> + else
>> + return NULL;
>> + }
>> + if (!Pred1 || !Pred2)
>> + return NULL;
>> + }
>> +
>> // We can only handle branches. Other control flow will be lowered to
>> // branches if possible anyway.
>> BranchInst *Pred1Br = dyn_cast<BranchInst>(Pred1->getTerminator());
>> @@ -4039,6 +4080,402 @@
>> return false;
>> }
>>
>> +/// If \param [in] BB has more than one predecessor that is a conditional
>> +/// branch, attempt to use parallel and/or for the branch condition.
>> \returns
>> +/// true on success.
>> +///
>> +/// Before:
>> +/// ......
>> +/// %cmp10 = fcmp une float %tmp1, %tmp2
>> +/// br i1 %cmp1, label %if.then, label %lor.rhs
>> +///
>> +/// lor.rhs:
>> +/// ......
>> +/// %cmp11 = fcmp une float %tmp3, %tmp4
>> +/// br i1 %cmp11, label %if.then, label %ifend
>> +///
>> +/// if.end: // the merge block
>> +/// ......
>> +///
>> +/// if.then: // has two predecessors, both of them contains
>> conditional branch.
>> +/// ......
>> +/// br label %if.end;
>> +///
>> +/// After:
>> +/// ......
>> +/// %cmp10 = fcmp une float %tmp1, %tmp2
>> +/// ......
>> +/// %cmp11 = fcmp une float %tmp3, %tmp4
>> +/// %cmp12 = or i1 %cmp10, %cmp11 // parallel-or mode.
>> +/// br i1 %cmp12, label %if.then, label %ifend
>> +///
>> +/// if.end:
>> +/// ......
>> +///
>> +/// if.then:
>> +/// ......
>> +/// br label %if.end;
>> +///
>> +/// Current implementation handles two cases.
>> +/// Case 1: \param BB is on the else-path.
>> +///
>> +/// BB1
>> +/// / |
>> +/// BB2 |
>> +/// / \ |
>> +/// BB3 \ | where, BB1, BB2 contain conditional branches.
>> +/// \ | / BB3 contains unconditional branch.
>> +/// \ | / BB4 corresponds to \param BB which is also the merge.
>> +/// BB => BB4
>> +///
>> +///
>> +/// Corresponding source code:
>> +///
>> +/// if (a == b&& c == d)
>> +/// statement; // BB3
>> +///
>> +/// Case 2: \param BB BB is on the then-path.
>> +///
>> +/// BB1
>> +/// / |
>> +/// | BB2
>> +/// \ / | where BB1, BB2 contain conditional branches.
>> +/// BB => BB3 | BB3 contains unconditiona branch and corresponds
>> +/// \ / to \param BB. BB4 is the merge.
>> +/// BB4
>> +///
>> +/// Corresponding source code:
>> +///
>> +/// if (a == b || c == d)
>> +/// statement; // BB3
>> +///
>> +/// In both cases, \param BB is the common successor of conditional
>> branches.
>> +/// In Case 1, \param BB (BB4) has an unconditional branch (BB3) as
>> +/// its predecessor. In Case 2, \param BB (BB3) only has conditional
>> branches
>> +/// as its predecessors.
>> +///
>> +bool SimplifyCFGOpt::SimplifyParallelAndOr(BasicBlock *BB,
>> IRBuilder<> &Builder,
>> + Pass *P) {
>> + PHINode *PHI = dyn_cast<PHINode>(&BB->front());
>> + if (PHI)
>> + return false; // For simplicity, avoid cases containing PHI nodes.
>> +
>> + BasicBlock *LCond = NULL;
>> + BasicBlock *FCond = NULL;
>> + BasicBlock *UCond = NULL;
>> + int Idx = -1;
>> +
>> + SmallSetVector<BasicBlock *, 16> Preds(pred_begin(BB), pred_end(BB));
>> + // Check predecessors of \param BB.
>> + for (SmallSetVector<BasicBlock *, 16>::iterator PI = Preds.begin(),
>> + PE = Preds.end();
>> + PI != PE; ++PI) {
>> + BasicBlock *Pred = *PI;
>> + TerminatorInst *PTI = Pred->getTerminator();
>> + BranchInst *PBI = dyn_cast<BranchInst>(PTI);
>> +
>> + // All predecessors should terminate with a branch.
>> + if (!PBI)
>> + return false;
>> +
>> + BasicBlock *PP = Pred->getSinglePredecessor();
>> +
>> + if (PBI->isUnconditional()) {
>> + // Case 1: Pred (BB3) is an unconditional block, it should
>> + // have a single successor and a single predecessor (BB2) that
>> + // is also a predecessor of \param BB (BB4) and should not have
>> + // address-taken. There should exist only one such unconditional
>> + // branch among the predecessors.
>> + if (UCond || !PP || (Preds.count(PP) == 0) ||
>> + (PTI->getNumSuccessors() != 1) || Pred->hasAddressTaken())
>> + return false;
>> +
>> + UCond = Pred;
>> + continue;
>> + }
>> +
>> + // Only conditional branches are allowed beyond this point.
>> + if (!PBI->isConditional())
>> + return false;
>> +
>> + // Condition's unique use should be the branch instruction.
>> + Value *PC = PBI->getCondition();
>> + if (!PC || (PC->getNumUses() != 1))
>> + return false;
>> +
>> + if (PP&& (Preds.count(PP) != 0)) {
>> + // These are internal condition blocks to be merged from, e.g.,
>> + // BB2 in both cases.
>> + // Should not be address-taken.
>> + if (Pred->hasAddressTaken())
>> + return false;
>> +
>> + // Instructions in the internal condition blocks should be safe
>> + // to hoist up.
>> + for (BasicBlock::iterator BI = Pred->begin(), BE = PBI; BI != BE;) {
>> + Instruction *CI = BI++;
>> + if (isa<PHINode>(CI) || CI->mayHaveSideEffects() ||
>> + !isSafeToSpeculativelyExecute(CI))
>> + return false;
>> + }
>> + } else {
>> + // This is the condition block to be merged into, e.g. BB1 in
>> + // both cases.
>> + if (FCond)
>> + return false;
>> + FCond = Pred;
>> + }
>> +
>> + // The terminator must have exactly two successors.
>> + if (PTI->getNumSuccessors() != 2)
>> + return false;
>> +
>> + // Find whether BB is uniformly on the true (or false) path
>> + // for all of its predecessors.
>> + BasicBlock *PS1 = PTI->getSuccessor(0);
>> + BasicBlock *PS2 = PTI->getSuccessor(1);
>> + BasicBlock *PS = (PS1 == BB) ? PS2 : PS1;
>> + int CIdx = (PS1 == BB) ? 0 : 1;
>> +
>> + if (Idx == -1)
>> + Idx = CIdx;
>> + else if (CIdx != Idx)
>> + return false;
>> +
>> + // PS is the successor which is not BB. Check successors to identify
>> + // the last conditional branch.
>> + if (Preds.count(PS) == 0) {
>> + // Case 2.
>> + // BB must have an unique successor.
>> + TerminatorInst *TBB = BB->getTerminator();
>> + if (TBB->getNumSuccessors() != 1)
>> + return false;
>> +
>> + BasicBlock *SBB = TBB->getSuccessor(0);
>> + PHI = dyn_cast<PHINode>(&SBB->front());
>> + if (PHI)
>> + return false;
>> +
>> + // PS (BB4) should be BB's successor.
>> + if (SBB != PS)
>> + return false;
>> + LCond = Pred;
>> + } else {
>> + TerminatorInst *TPS = PS->getTerminator();
>> + BranchInst *BPS = dyn_cast<BranchInst>(TPS);
>> + if (BPS&& BPS->isUnconditional()) {
>> + // Case 1: PS(BB3) should be an unconditional branch.
>> + LCond = Pred;
>> + }
>> + }
>> + }
>> +
>> + if (FCond&& LCond&& (FCond != LCond)) {
>> + // Do the transformation.
>> + BasicBlock *CB;
>> + bool ITER = true;
>> + BasicBlock::iterator ItOld = Builder.GetInsertPoint();
>> + TerminatorInst *PTI = FCond->getTerminator();
>> + BranchInst *PBI = dyn_cast<BranchInst>(PTI);
>> + Value *PC = PBI->getCondition();
>> + do {
>> + CB = PBI->getSuccessor(1 - Idx);
>> + // Delete the conditional branch.
>> + FCond->getInstList().pop_back();
>> + FCond->getInstList().splice(FCond->end(), CB->getInstList());
>> + PTI = FCond->getTerminator();
>> + PBI = dyn_cast<BranchInst>(PTI);
>> + Value *CC = PBI->getCondition();
>> + // Merge conditions.
>> + Builder.SetInsertPoint(PTI);
>> + Value *NC;
>> + if (Idx == 0)
>> + // Case 2, use parallel or.
>> + NC = Builder.CreateOr(PC, CC);
>> + else
>> + // Case 1, use parallel and.
>> + NC = Builder.CreateAnd(PC, CC);
>> +
>> + PBI->replaceUsesOfWith(CC, NC);
>> + PC = NC;
>> + if (CB == LCond)
>> + ITER = false;
>> + // Remove internal conditional branches.
>> + CB->dropAllReferences();
>> + // make CB unreachable and let downstream to delete the block.
>> + new UnreachableInst(CB->getContext(), CB);
>> + } while (ITER);
>> +
>> + Builder.SetInsertPoint(ItOld);
>> + DEBUG(dbgs()<< "Use parallel and/or in:\n"<< *FCond);
>> + return true;
>> + }
>> +
>> + return false;
>> +}
>> +
>> +/// Compare blocks from two if-regions, where \param Head1 is the
>> head of the
>> +/// 1st if-region. \param Head2 is the head of the 2nd if-region. \param
>> +/// Block1 is a block in the 1st if-region to compare. \param Block2
>> is a block
>> +// in the 2nd if-region to compare. \returns true if the blocks have
>> identical
>> +/// instructions that do not alias with \param Head2 and it is legal
>> to merge
>> +/// the two blocks so that only one instance of each instruction is kept.
>> +///
>> +bool SimplifyCFGOpt::CompareBlock(BasicBlock *Head1, BasicBlock *Head2,
>> + BasicBlock *Block1, BasicBlock *Block2) {
>> + TerminatorInst *PTI2 = Head2->getTerminator();
>> + Instruction *PBI2 = Head2->begin();
>> +
>> + if (Block1 == Head1) {
>> + if (Block2 != Head2)
>> + return false;
>> + } else if (Block2 == Head2)
>> + return false;
>> + else {
>> + // Check whether instructions in Block1 and Block2 are identical
>> + // and do not alias with instructions in Head2.
>> + BasicBlock::iterator iter1 = Block1->begin();
>> + BasicBlock::iterator end1 = Block1->getTerminator();
>> + BasicBlock::iterator iter2 = Block2->begin();
>> + BasicBlock::iterator end2 = Block2->getTerminator();
>> +
>> + while (1) {
>> + if (iter1 == end1) {
>> + if (iter2 != end2)
>> + return false;
>> + break;
>> + }
>> +
>> + if (!iter1->isIdenticalTo(iter2))
>> + return false;
>> +
>> + // Illegal to remove instructions with side effects except
>> + // non-volatile stores.
>> + if (iter1->mayHaveSideEffects()) {
>> + Instruction *CurI =&*iter1;
>> + StoreInst *SI = dyn_cast<StoreInst>(CurI);
>> + if (!SI || SI->isVolatile())
>> + return false;
>> + }
>> +
>> + // For simplicity and speed, data dependency check can be
>> + // avoided if read from memory doesn't exist.
>> + if (iter1->mayReadFromMemory())
>> + return false;
>> +
>> + if (iter1->mayWriteToMemory()) {
>> + for (BasicBlock::iterator BI = PBI2, BE = PTI2; BI != BE; ++BI) {
>> + if (BI->mayReadFromMemory() || BI->mayWriteToMemory()) {
>> + // Check alias with Head2.
>> + if (!AA || AA->alias(iter1, BI))
>> + return false;
>> + }
>> + }
>> + }
>> + ++iter1;
>> + ++iter2;
>> + }
>> + ;
>> + }
>> +
>> + return true;
>> +}
>> +
>> +/// Check whether \param BB is the merge block of a if-region. If
>> yes, check
>> +/// whether there exists an adjacent if-region upstream, the two
>> if-regions
>> +/// contain identical instuctions and can be legally merged. \returns
>> true if
>> +/// the two if-regions are merged.
>> +///
>> +/// From:
>> +/// if (a)
>> +/// statement;
>> +/// if (b)
>> +/// statement;
>> +///
>> +/// To:
>> +/// if (a || b)
>> +/// statement;
>> +///
>> +bool SimplifyCFGOpt::MergeIfRegion(BasicBlock *BB, IRBuilder<> &Builder,
>> + Pass *P) {
>> + BasicBlock *IfTrue2, *IfFalse2;
>> + Value *IfCond2 = GetIfCondition(BB, IfTrue2, IfFalse2);
>> + if (!IfCond2)
>> + return false;
>> +
>> + Instruction *CInst2 = dyn_cast<Instruction>(IfCond2);
>> + if (!CInst2)
>> + return false;
>> +
>> + BasicBlock *Head2 = CInst2->getParent();
>> + if (Head2->hasAddressTaken())
>> + return false;
>> +
>> + BasicBlock *IfTrue1, *IfFalse1;
>> + Value *IfCond1 = GetIfCondition(Head2, IfTrue1, IfFalse1);
>> + if (!IfCond1)
>> + return false;
>> +
>> + Instruction *CInst1 = dyn_cast<Instruction>(IfCond1);
>> + if (!CInst1)
>> + return false;
>> +
>> + BasicBlock *Head1 = CInst1->getParent();
>> +
>> + // Either then-path or else-path should be empty.
>> + if ((IfTrue1 != Head1)&& (IfFalse1 != Head1))
>> + return false;
>> + if ((IfTrue2 != Head2)&& (IfFalse2 != Head2))
>> + return false;
>> +
>> + TerminatorInst *PTI2 = Head2->getTerminator();
>> + Instruction *PBI2 = Head2->begin();
>> +
>> + if (!CompareBlock(Head1, Head2, IfTrue1, IfTrue2))
>> + return false;
>> +
>> + if (!CompareBlock(Head1, Head2, IfFalse1, IfFalse2))
>> + return false;
>> +
>> + // Check whether \param Head2 has side-effect and is safe to speculate.
>> + for (BasicBlock::iterator BI = PBI2, BE = PTI2; BI != BE; ++BI) {
>> + Instruction *CI = BI;
>> + if (isa<PHINode>(CI) || CI->mayHaveSideEffects() ||
>> + !isSafeToSpeculativelyExecute(CI))
>> + return false;
>> + }
>> +
>> + // Merge \param Head2 into \param Head1.
>> + Head1->getInstList().pop_back();
>> + Head1->getInstList().splice(Head1->end(), Head2->getInstList());
>> + TerminatorInst *PTI = Head1->getTerminator();
>> + BranchInst *PBI = dyn_cast<BranchInst>(PTI);
>> + Value *CC = PBI->getCondition();
>> + BasicBlock::iterator ItOld = Builder.GetInsertPoint();
>> + Builder.SetInsertPoint(PTI);
>> + Value *NC = Builder.CreateOr(CInst1, CC);
>> + PBI->replaceUsesOfWith(CC, NC);
>> + Builder.SetInsertPoint(ItOld);
>> +
>> + // Remove IfTrue1
>> + if (IfTrue1 != Head1) {
>> + IfTrue1->dropAllReferences();
>> + IfTrue1->eraseFromParent();
>> + }
>> +
>> + // Remove IfFalse1
>> + if (IfFalse1 != Head1) {
>> + IfFalse1->dropAllReferences();
>> + IfFalse1->eraseFromParent();
>> + }
>> +
>> + // Remove \param Head2
>> + Head2->dropAllReferences();
>> + Head2->eraseFromParent();
>> + DEBUG(dbgs()<< "If conditions merged into:\n"<< *Head1);
>> + return true;
>> +}
>> +
>> /// Check if passing a value to an instruction will cause undefined
>> behavior.
>> static bool passingValueIsAlwaysUndefined(Value *V, Instruction *I) {
>> Constant *C = dyn_cast<Constant>(V);
>> @@ -4142,6 +4579,10 @@
>>
>> IRBuilder<> Builder(BB);
>>
>> + if (ParallelAndOr&& TTI.hasBranchDivergence()&&
>> + SimplifyParallelAndOr(BB, Builder))
>> + return true;
>> +
>> // If there is a trivial two-entry PHI node in this basic block, and
>> we can
>> // eliminate it, do so now.
>> if (PHINode *PN = dyn_cast<PHINode>(BB->begin()))
>> @@ -4169,6 +4610,9 @@
>> if (SimplifyIndirectBr(IBI)) return true;
>> }
>>
>> + if (ParallelAndOr&& TTI.hasBranchDivergence()&& MergeIfRegion(BB,
>> Builder))
>> + return true;
>> +
>> return Changed;
>> }
>>
>> @@ -4178,6 +4622,6 @@
>> /// of the CFG. It returns true if a modification was made.
>> ///
>> bool llvm::SimplifyCFG(BasicBlock *BB, const TargetTransformInfo&TTI,
>> - const DataLayout *TD) {
>> - return SimplifyCFGOpt(TTI, TD).run(BB);
>> + const DataLayout *TD, AliasAnalysis *AA) {
>> + return SimplifyCFGOpt(TTI, TD, AA).run(BB);
>> }
>> Index: lib/Transforms/Scalar/SimplifyCFGPass.cpp
>> ===================================================================
>> --- lib/Transforms/Scalar/SimplifyCFGPass.cpp(revision 184607)
>> +++ lib/Transforms/Scalar/SimplifyCFGPass.cpp(working copy)
>> @@ -27,6 +27,7 @@
>> #include "llvm/ADT/SmallVector.h"
>> #include "llvm/ADT/Statistic.h"
>> #include "llvm/Analysis/TargetTransformInfo.h"
>> +#include "llvm/Analysis/AliasAnalysis.h"
>> #include "llvm/IR/Attributes.h"
>> #include "llvm/IR/Constants.h"
>> #include "llvm/IR/DataLayout.h"
>> @@ -49,10 +50,14 @@
>>
>> virtual bool runOnFunction(Function&F);
>>
>> - virtual void getAnalysisUsage(AnalysisUsage&AU) const {
>> - AU.addRequired<TargetTransformInfo>();
>> - }
>> - };
>> + virtual void getAnalysisUsage(AnalysisUsage&AU) const {
>> + AU.addRequired<TargetTransformInfo>();
>> + AU.addRequired<AliasAnalysis>();
>> + AU.addRequired<AliasAnalysis>();
>> + }
>> +private:
>> + AliasAnalysis *AA;
>> +};
>> }
>>
>> char CFGSimplifyPass::ID = 0;
>> @@ -301,7 +306,7 @@
>> /// iterativelySimplifyCFG - Call SimplifyCFG on all the blocks in the
>> function,
>> /// iterating until no more changes are made.
>> static bool iterativelySimplifyCFG(Function&F, const
>> TargetTransformInfo&TTI,
>> - const DataLayout *TD) {
>> + const DataLayout *TD, AliasAnalysis * AA) {
>> bool Changed = false;
>> bool LocalChange = true;
>> while (LocalChange) {
>> @@ -310,7 +315,7 @@
>> // Loop over all of the basic blocks and remove them if they are
>> unneeded...
>> //
>> for (Function::iterator BBIt = F.begin(); BBIt != F.end(); ) {
>> - if (SimplifyCFG(BBIt++, TTI, TD)) {
>> + if (SimplifyCFG(BBIt++, TTI, TD, AA)) {
>> LocalChange = true;
>> ++NumSimpl;
>> }
>> @@ -324,11 +329,12 @@
>> // simplify the CFG.
>> //
>> bool CFGSimplifyPass::runOnFunction(Function&F) {
>> + AA =&getAnalysis<AliasAnalysis>();
>> const TargetTransformInfo&TTI = getAnalysis<TargetTransformInfo>();
>> const DataLayout *TD = getAnalysisIfAvailable<DataLayout>();
>> bool EverChanged = removeUnreachableBlocksFromFn(F);
>> EverChanged |= mergeEmptyReturnBlocks(F);
>> - EverChanged |= iterativelySimplifyCFG(F, TTI, TD);
>> + EverChanged |= iterativelySimplifyCFG(F, TTI, TD, AA);
>>
>> // If neither pass changed anything, we're done.
>> if (!EverChanged) return false;
>> @@ -342,7 +348,7 @@
>> return true;
>>
>> do {
>> - EverChanged = iterativelySimplifyCFG(F, TTI, TD);
>> + EverChanged = iterativelySimplifyCFG(F, TTI, TD, AA);
>> EverChanged |= removeUnreachableBlocksFromFn(F);
>> } while (EverChanged);
>>
>> Index: lib/CodeGen/BasicTargetTransformInfo.cpp
>> ===================================================================
>> --- lib/CodeGen/BasicTargetTransformInfo.cpp(revision 183763)
>> +++ lib/CodeGen/BasicTargetTransformInfo.cpp(working copy)
>> @@ -63,6 +63,8 @@
>> return this;
>> }
>>
>> + virtual bool hasBranchDivergence() const;
>> +
>> /// \name Scalar TTI Implementations
>> /// @{
>>
>> @@ -122,6 +124,7 @@
>> return new BasicTTI(TLI);
>> }
>>
>> +bool BasicTTI::hasBranchDivergence() const { return false; }
>>
>> bool BasicTTI::isLegalAddImmediate(int64_t imm) const {
>> return TLI->isLegalAddImmediate(imm);
>>
>>
>>
>>
>> _______________________________________________
>> llvm-commits mailing list
>> llvm-commits at cs.uiuc.edu <mailto:llvm-commits at cs.uiuc.edu>
>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>>
>>
>>
>>
>> _______________________________________________
>> llvm-commits mailing list
>> llvm-commits at cs.uiuc.edu <mailto:llvm-commits at cs.uiuc.edu>
>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>> <odc_v5>
>


-------------- next part --------------
A non-text attachment was scrubbed...
Name: odc_v6
Type: application/octet-stream
Size: 39779 bytes
Desc: odc_v6
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20130719/89561c60/attachment.obj>