[patch] simplifyCFG for review

Thu Jul 18 01:25:18 PDT 2013

Andrew Trick wrote:
>
> On Jul 17, 2013, at 11:07 AM, Ye, Mei <Mei.Ye at amd.com
> <mailto:Mei.Ye at amd.com>> wrote:
>
>> Hi Andrew
>> Attached is a patch that follows what you suggested. Thanks for your time.
>
> I’m ok with this patch now. Someone who can test it should commit. Maybe
> Tom?

--- lib/Target/R600/AMDGPUTargetTransformInfo.cpp	(revision 0)
+++ lib/Target/R600/AMDGPUTargetTransformInfo.cpp	(revision 0)
@@ -0,0 +1,90 @@
+//===-- AMDGPUTargetTransformInfo.cpp - AMDGPU specific TTI pass
+//----------------===//

Why is this two lines?

+//===----------------------------------------------------------------------===//
+/// \file
+/// This file implements a TargetTransformInfo analysis pass specific 
to the

Two slashes for these comments, and a blank //-only line after the ruler.

+  /// \brief Compare a pair of blocks: \param Block1 and \param Block2, 
which
+  /// are from two if-regions whose head blocks are \param Head1 and \param
+  /// Head2.  \returns true if \param Block1 and \param Block2 contain 
identical
+  /// instructions, and none of the instructions alias with \param Head2.
+  /// This is used as a legality check for merging if-regions.

Despite reading this, I still don't understand it. It compares Block1 
with Block2. What does "equal" mean in this context? What is a "head 
block" and why does it matter? You say instructions alias, you mean

+  SimplifyCFGOpt(const TargetTransformInfo &TTI, const DataLayout *TD,
+                 AliasAnalysis *aa)
+      : TTI(TTI), TD(TD), AA(aa) {}

Call it 'AA' not 'aa'. Follow the existing pattern.

+    SmallSetVector<BasicBlock *, 16> Preds(pred_begin(BB), pred_end(BB));
+    for (SmallSetVector<BasicBlock *, 16>::iterator PI = Preds.begin(),
+                                                    PE = Preds.end();
+         PI != PE; ++PI) {
+      if (Pred1 == NULL)
+        Pred1 = *PI;
+      else if (Pred2 == NULL)
+        Pred2 = *PI;
+      else
+        return NULL;
+    }

This is inefficient. Remove 'Preds', make the loop iterate from 
pred_begin(BB) to pred_end(BB). Add "if (*PI == Pred1 || *PI == Pred2) 
continue;" to the top of the loop.

+    if (!Pred1 || !Pred2)
+       return NULL;

Fix indent before 'return'.

+  PHINode *PHI = dyn_cast<PHINode>(&BB->front());

You can use "BB->begin()" instead of "&BB->front()".

+  SmallSetVector<BasicBlock *, 16> Preds(pred_begin(BB), pred_end(BB));
+  // Check predecessors of \param BB.
+  for (SmallSetVector<BasicBlock *, 16>::iterator PI = Preds.begin(),
+                                                  PE = Preds.end();
+       PI != PE; ++PI) {

This is inefficient. You can use SmallPtrSet for this. You don't need to 
preserve the iteration order of pred_begin to pred_end, and SmallPtrSet 
gives you iterators (unlike SmallSet).

+    TerminatorInst *PTI = Pred->getTerminator();
+    BranchInst *PBI = dyn_cast<BranchInst>(PTI);

Fold them together:
   BranchInst *PBI = dyn_cast<BranchInst>(Pred->getTerminator());

+    // Only conditional branches are allowed beyond this point.
+    if (!PBI->isConditional())
+      return false;

Remove it or assert it. This condition never happens. Also, you wrote it 
as "PBI->isUnconditional" above, but "!PBI->isConditional" here?

+    if (!PC || (PC->getNumUses() != 1))

This is inefficient. Use !PC-hasOneUse() to avoid walking the linked 
list of uses after we know there is more than one entry.

+    if (PP && (Preds.count(PP) != 0)) {

This is redundant. Preds can't contain any NULL pointer entries.

+      PHI = dyn_cast<PHINode>(&SBB->front());

Again, SBB->begin().

+      TerminatorInst *TPS = PS->getTerminator();
+      BranchInst *BPS = dyn_cast<BranchInst>(TPS);
+      if (BPS && BPS->isUnconditional()) {

At least merge TPS into the dyn_cast<...>(TPS). You could also write
   if (BranchInst *BPS = dyn_cast<BranchInst>(PS->getTerminator())
     if (BPS->isUnconditional())
       ...

+  if (Block1 == Head1) {
+    if (Block2 != Head2)
+      return false;
+  } else if (Block2 == Head2)
+    return false;
+  else {

So if Block1 == Head1 and Block2 == Head2 this function does nothing?

Can you reword the above without the elses following returns? See 
http://llvm.org/docs/CodingStandards.html#don-t-use-else-after-a-return .

+    }
+    ;
+  }

Spurious semi-colon.

+  Value *IfCond2 = GetIfCondition(BB, IfTrue2, IfFalse2);
+  if (!IfCond2)
+    return false;
+
+  Instruction *CInst2 = dyn_cast<Instruction>(IfCond2);
+  if (!CInst2)
+    return false;

You can simplify this using dyn_cast_or_null:

   Value *IfCond2 = GetIfCondition(BB, IfTrue2, IfFalse2);
   Instruction *CInst2 = dyn_cast_or_null<Instruction>(IfCond2);
   if (!CInt2)
     return false;

+  // Check whether \param Head2 has side-effect and is safe to speculate.

Ensure that ... has no side-effects?

+  Head1->getInstList().pop_back();
+  Head1->getInstList().splice(Head1->end(), Head2->getInstList());

You don't update PHI nodes after doing this. Can you show me a 
proof-sketch that Head2 can't be used in any PHI nodes in the function 
if we reach here?

+  bool OptCB = (ParallelAndOr && AA && TTI.hasBranchDivergence()) ? 
true : false;

I think the ?: is redundant here.

-  initializeCFGSimplifyPassPass(Registry);
+  initializeCFGCANONPass(Registry);
+  initializeCFGOPTPass(Registry);

OPT and CANON aren't acronyms.

  namespace {
-  struct CFGSimplifyPass : public FunctionPass {
-    static char ID; // Pass identification, replacement for typeid
-    CFGSimplifyPass() : FunctionPass(ID) {
-      initializeCFGSimplifyPassPass(*PassRegistry::getPassRegistry());
-    }
+struct CFGSimplifyPass : public FunctionPass {
+  CFGSimplifyPass(char &ID, bool isOpt) : FunctionPass(ID), IsOpt(isOpt) {}
+  virtual bool runOnFunction(Function &F);

Why did you unindent the struct? Please put it back. See 
http://llvm.org/docs/CodingStandards.html#namespace-indentation .

+                                   const DataLayout *TD, AliasAnalysis 
* AA) {

Extra space before 'AA'.

+                 const DataLayout *TD = 0, AliasAnalysis * AA = 0);

Extra space again.

Nick

>
> -Andy
>
>> On Jul 12, 2013, at 3:39 PM, Ye, Mei <Mei.Ye at amd.com
>> <mailto:Mei.Ye at amd.com>> wrote:
>>
>>
>> The fundamental issue is whether target-specific works are allowed
>> inside "machine-independent" passes. In fact, almost all compiler
>> optimizations are target-dependent. Having an good infrastructure to
>> enable target-specific tunings or optimizations will promote
>> code-sharing and code-quality. The common practice that I have seen is
>> that compiler vendors keep their work "if-defed" inside their own
>> branches and do not contribute back to the trunk. There are IP and
>> business strategy reasons. But in the long run, code base diverges and
>> merging gets more difficult if not impossible. Not only the vendors
>> will suffer, but also the larger community if less and less
>> contributions flow back into the trunk.
>> Yes, please contribute your target-dependent optimizations! Usually
>> the best way to introduce them is in target passes, which are only
>> built when building your target. You can easily do this by overriding
>> TargetPassConfig::addISelPrepare.
>> For this patch, you could make the argument that your transformations
>> are sufficiently generic that they might as well be made available in
>> SimplifyCFG.cpp. I agree.
>> However, (correct me if I'm wrong) your transformations are not
>> required to expose downstream machine-independent optimizations. They
>> should not run early in the pass pipeline. Doing so will lead to
>> divergence across targets and lack of code sharing and code quality
>> that you are afraid of. There is also no reason to run them evey time
>> we invoke the SimplifyCFG pass.
>> We want to have a core SimplifyCFG pass that can be run repeatedly to
>> canonicalize the IR. It's one of the first things we run. Introducing
>> target heuristics this early would be a mistake.
>> I realize this design goal is not obvious from the current codebase
>> and documentation. Tomorrow, I will send out a design proposal to llvm
>> to make sure contributors are working toward a common goal.
>> You could help move toward this goal though by modifying your patch to
>> run only once in the final round of SimplifyCFG (after jump-threading
>> and DSE).
>> The most straightforward way to handle this is to add a new
>> OptimizeCFG pass for all the "lowering" stuff and simply schedule it
>> after the last SimplifyCFG.
>> If you want to combine it with a round of SimplifyCFG, that's also ok,
>> but we should still have a distinct pass name. In this case you need to:
>> - Define a subclass of CFGSimplifyPass that passes a flag to the
>> CFGSimplifyPass ctor to enable lowering optimizations.
>> - Add INITIALIZE_PASS_... declarations for the new pass name
>> "optimizecfg".
>> - Add the same flag to createCFGSimplificationPass() and make the
>> trivial changes to populateModulePassManager() and
>> populateLTOPassManager().
>> See ScalerReplAggregates.cpp for an example of all this.
>>
>>
>> If we have a good repository of optimizations, each target can
>> selectively enable/disable them to avoid unexpected performance
>> impact. To safeguard the correctness, the community can create a super
>> target (just clone X86) that tests all optimizations.
>>
>> From the aspect of implementation, this patch fits into SimplifyCFG
>> quite naturally. It is often advantageous to change the
>> "machine-independent" codes since you already traverse the whole
>> function and reach the point where certain patterns are recognized in
>> a similar pass. Adding a new optimization/transformation at that point
>> is free, while creating a separated pass will have to repeat works and
>> increases compilation time. JIT and on-line compilation models are
>> very stingy in compilation time. I unfortunately live in a world of
>> compiling-from-source model, adding a new pass is prohibited in my
>> organization. Besides, this patch is not GPU specific. It is good for
>> CPU in general unless you happen to have patterns of very expensive
>> floating point comparisons. Even in that case, it might still win due
>> to reduction in branches.
>> It's a moot point if you piggy back on the last SimplifyCFG as
>> explained above. But I don't buy this argument at all. What work do
>> you need to "redo" if you run this as a separate pass? Iterating over
>> blocks? How does that compare to the cost of analyzing compound
>> if-statements every time we run SimplifyCFG. Please quantify the
>> impact of running this in a separate "LowerCFG" pass before claiming
>> that you're optimizing the pass pipeline.
>> -Andy
>>
>>
>>
>> And, yes, GPU needs to do CFG transformations. But code gen can
>> introduce new control flows. Vendors will need to recognize
>> irreducible CFG in their jitters anyway as a verification.
>>
>> -Mei
>>
>>
>> -----Original Message-----
>> From: Nick Lewycky [mailto:nicholas at mxc.ca]
>> Sent: Thursday, July 11, 2013 10:40 PM
>> To: Ye, Mei
>> Cc: Tom Stellard;llvm-commits at cs.uiuc.edu
>> <mailto:llvm-commits at cs.uiuc.edu>
>> Subject: Re: [patch] simplifyCFG for review
>>
>> The fundamental issue with this patch is that you're saying that
>> simplifycfg should be responsible for structuring the CFG differently
>> based on whether the target supports branches that go different ways in
>> different threads (a common GPU restriction).
>>
>> We try very, very hard to avoid doing target-specific things inside
>> passes like this. Doing so requires an analysis of why it couldn't be
>> done differently. (Don't GPUs have problems with irreducable CFG anyhow?
>> Don't you already need to rewrite the CFG for that anyways? Why not have
>> a GPU CFG simplification pass right before CodeGenPrep?)
>>
>> I have some comments on the details of the patch, but the above issue is
>> big and needs wider discussion and consensus first.
>>
>> Nick
>>
>> Ye, Mei wrote:
>>
>> Thanks Tom. Attached is a new patch that adds comments as requested. I
>> also made a new diff against most-recent trunk. But it looks like many
>> files have locks, so some diffs are based on an earlier revision.
>>
>> With regard to your question:
>> "I'm a little confused about why we need to add a
>> BasicTargetTransformInfo and
>> also an AMDGPUTargetTransformInfo. What is the reason for this?"
>>
>> My answer is that same thing happens to X86, ARM, and other targets. I
>> didn't track whether BasicTargetTransformInfo is always needed, but I
>> think you can build compiler for more than one targets, and for
>> targets that you do not supply a target-spefici Tranform Info, you can
>> always default back to use the basic one.
>>
>> -Mei
>>
>>
>> -----Original Message-----
>> From: Tom Stellard [mailto:tom at stellard.net]
>> Sent: Monday, July 08, 2013 11:14 AM
>> To: Ye, Mei
>> Cc: Evan Cheng; Renato Golin; Sean Silva;llvm-commits at cs.uiuc.edu
>> <mailto:llvm-commits at cs.uiuc.edu>
>> Subject: Re: [patch] simplifyCFG for review
>>
>> On Fri, Jun 28, 2013 at 09:46:13PM +0000, Ye, Mei wrote:
>>
>> Hi all
>>
>> Thank you for your comments. Attached is an updated patch.
>> - add TargetTransformInfo to R600 .
>> - add hasBranchDivergence to query whether a target has branch divergence.
>> - Invoke branch reduction opts only when the underlying target has
>> branch divergence. So currently it only gets triggered for R600.
>> - fixed a bug in opt.cpp
>> - correct clang-format issues.
>>
>> I have tested the correctness on x86 (by adding branch divergence to
>> X86TargetTransformInfo in my local workspace) using existing testing
>> infrastructure and CPU2006, CPU2000. Tom Stellard agrees to run R600
>> testings (thanks a lot, Tom).
>>
>> -Mei
>>
>>
>> Index: test/Transforms/SimplifyCFG/R600/lit.local.cfg
>> ===================================================================
>> --- test/Transforms/SimplifyCFG/R600/lit.local.cfg(revision 0)
>> +++ test/Transforms/SimplifyCFG/R600/lit.local.cfg(revision 0)
>> @@ -0,0 +1,6 @@
>> +config.suffixes = ['.ll', '.c', '.cpp']
>> +
>> +targets = set(config.root.targets_to_build.split())
>> +if not 'R600' in targets:
>> + config.unsupported = True
>> +
>> Index: test/Transforms/SimplifyCFG/R600/parallelorifcollapse.ll
>> ===================================================================
>> --- test/Transforms/SimplifyCFG/R600/parallelorifcollapse.ll(revision 0)
>> +++ test/Transforms/SimplifyCFG/R600/parallelorifcollapse.ll(revision 0)
>> @@ -0,0 +1,51 @@
>> +; Function Attrs: nounwind
>> +; RUN: opt< %s -mtriple=r600-unknown-linux-gnu -simplifycfg -basicaa
>> -S | FileCheck %s
>> +; CHECK: or i1
>> +; CHECK-NEXT: br
>> +; CHECK: br
>> +; CHECK: ret
>>
>> It's not clear to me what the expected output of this test is supposed
>> to be. Could you add a comment explaining what the simplifycfg pass is
>> supposed to be doing here.
>>
>>
>> +define void @_Z9chk1D_512v() #0 {
>> +entry:
>> + %a0 = alloca i32, align 4
>> + %b0 = alloca i32, align 4
>> + %c0 = alloca i32, align 4
>> + %d0 = alloca i32, align 4
>> + %a1 = alloca i32, align 4
>> + %b1 = alloca i32, align 4
>> + %c1 = alloca i32, align 4
>> + %d1 = alloca i32, align 4
>> + %data = alloca i32, align 4
>> + %0 = load i32* %a0, align 4
>> + %1 = load i32* %b0, align 4
>> + %cmp = icmp ne i32 %0, %1
>> + br i1 %cmp, label %land.lhs.true, label %if.end
>> +
>> +land.lhs.true: ; preds = %entry
>> + %2 = load i32* %c0, align 4
>> + %3 = load i32* %d0, align 4
>> + %cmp1 = icmp ne i32 %2, %3
>> + br i1 %cmp1, label %if.then, label %if.end
>> +
>> +if.then: ; preds = %land.lhs.true
>> + store i32 1, i32* %data, align 4
>> + br label %if.end
>> +
>> +if.end: ; preds = %if.then, %land.lhs.true, %entry
>> + %4 = load i32* %a1, align 4
>> + %5 = load i32* %b1, align 4
>> + %cmp2 = icmp ne i32 %4, %5
>> + br i1 %cmp2, label %land.lhs.true3, label %if.end6
>> +
>> +land.lhs.true3: ; preds = %if.end
>> + %6 = load i32* %c1, align 4
>> + %7 = load i32* %d1, align 4
>> + %cmp4 = icmp ne i32 %6, %7
>> + br i1 %cmp4, label %if.then5, label %if.end6
>> +
>> +if.then5: ; preds = %land.lhs.true3
>> + store i32 1, i32* %data, align 4
>> + br label %if.end6
>> +
>> +if.end6: ; preds = %if.then5, %land.lhs.true3, %if.end
>> + ret void
>> +}
>> Index: test/Transforms/SimplifyCFG/R600/parallelandifcollapse.ll
>> ===================================================================
>> --- test/Transforms/SimplifyCFG/R600/parallelandifcollapse.ll(revision 0)
>> +++ test/Transforms/SimplifyCFG/R600/parallelandifcollapse.ll(revision 0)
>> @@ -0,0 +1,58 @@
>> +; Function Attrs: nounwind
>> +; RUN: opt< %s -mtriple=r600-unknown-linux-gnu -simplifycfg -basicaa
>> -S | FileCheck %s
>> +; CHECK: or i1
>> +; CHECK-NEXT: br
>> +; CHECK: br
>> +; CHECK: ret
>>
>> Same thing here, a comment explaining what the expected output should be
>> would be helpful.
>>
>>
>> +define void @_Z9chk1D_512v() #0 {
>> +entry:
>> + %a0 = alloca i32, align 4
>> + %b0 = alloca i32, align 4
>> + %c0 = alloca i32, align 4
>> + %d0 = alloca i32, align 4
>> + %a1 = alloca i32, align 4
>> + %b1 = alloca i32, align 4
>> + %c1 = alloca i32, align 4
>> + %d1 = alloca i32, align 4
>> + %data = alloca i32, align 4
>> + %0 = load i32* %a0, align 4
>> + %1 = load i32* %b0, align 4
>> + %cmp = icmp ne i32 %0, %1
>> + br i1 %cmp, label %land.lhs.true, label %if.else
>> +
>> +land.lhs.true: ; preds = %entry
>> + %2 = load i32* %c0, align 4
>> + %3 = load i32* %d0, align 4
>> + %cmp1 = icmp ne i32 %2, %3
>> + br i1 %cmp1, label %if.then, label %if.else
>> +
>> +if.then: ; preds = %land.lhs.true
>> + br label %if.end
>> +
>> +if.else: ; preds = %land.lhs.true, %entry
>> + store i32 1, i32* %data, align 4
>> + br label %if.end
>> +
>> +if.end: ; preds = %if.else, %if.then
>> + %4 = load i32* %a1, align 4
>> + %5 = load i32* %b1, align 4
>> + %cmp2 = icmp ne i32 %4, %5
>> + br i1 %cmp2, label %land.lhs.true3, label %if.else6
>> +
>> +land.lhs.true3: ; preds = %if.end
>> + %6 = load i32* %c1, align 4
>> + %7 = load i32* %d1, align 4
>> + %cmp4 = icmp ne i32 %6, %7
>> + br i1 %cmp4, label %if.then5, label %if.else6
>> +
>> +if.then5: ; preds = %land.lhs.true3
>> + br label %if.end7
>> +
>> +if.else6: ; preds = %land.lhs.true3, %if.end
>> + store i32 1, i32* %data, align 4
>> + br label %if.end7
>> +
>> +if.end7: ; preds = %if.else6, %if.then5
>> + ret void
>> +}
>> +
>> Index: include/llvm/Analysis/TargetTransformInfo.h
>> ===================================================================
>> --- include/llvm/Analysis/TargetTransformInfo.h(revision 183763)
>> +++ include/llvm/Analysis/TargetTransformInfo.h(working copy)
>> @@ -171,6 +171,9 @@
>> /// comments for a detailed explanation of the cost values.
>> virtual unsigned getUserCost(const User *U) const;
>>
>> + /// \brief hasBranchDivergence - Return true if branch divergence
>> exists.
>> + virtual bool hasBranchDivergence() const;
>> +
>>
>> What exactly is branch divergence? Could you explain this more in
>> the comment and give an example.
>>
>>
>> /// \brief Test whether calls to a function lower to actual program
>> function
>> /// calls.
>> ///
>> Index: include/llvm/Transforms/Utils/Local.h
>> ===================================================================
>> --- include/llvm/Transforms/Utils/Local.h(revision 184607)
>> +++ include/llvm/Transforms/Utils/Local.h(working copy)
>> @@ -39,6 +39,7 @@
>> class TargetLibraryInfo;
>> class TargetTransformInfo;
>> class DIBuilder;
>> +class AliasAnalysis;
>>
>> template<typename T> class SmallVectorImpl;
>>
>> @@ -136,7 +137,7 @@
>> /// the basic block that was pointed to.
>> ///
>> bool SimplifyCFG(BasicBlock *BB, const TargetTransformInfo&TTI,
>> - const DataLayout *TD = 0);
>> + const DataLayout *TD = 0, AliasAnalysis * AA = 0);
>>
>> /// FoldBranchToCommonDest - If this basic block is ONLY a setcc and a
>> branch,
>> /// and if a predecessor branches to us and one of our successors,
>> fold the
>> Index: tools/opt/opt.cpp
>> ===================================================================
>> --- tools/opt/opt.cpp(revision 183763)
>> +++ tools/opt/opt.cpp(working copy)
>> @@ -668,6 +668,9 @@
>> FPasses.reset(new FunctionPassManager(M.get()));
>> if (TD)
>> FPasses->add(new DataLayout(*TD));
>> + if (TM.get())
>> + TM->addAnalysisPasses(*FPasses);
>> +
>> }
>>
>> if (PrintBreakpoints) {
>> Index: lib/Analysis/TargetTransformInfo.cpp
>> ===================================================================
>> --- lib/Analysis/TargetTransformInfo.cpp(revision 183763)
>> +++ lib/Analysis/TargetTransformInfo.cpp(working copy)
>> @@ -88,6 +88,8 @@
>> return PrevTTI->getUserCost(U);
>> }
>>
>> +bool TargetTransformInfo::hasBranchDivergence() const { return false; }
>> +
>> bool TargetTransformInfo::isLoweredToCall(const Function *F) const {
>> return PrevTTI->isLoweredToCall(F);
>> }
>> Index: lib/Target/R600/AMDGPUTargetMachine.cpp
>> ===================================================================
>> --- lib/Target/R600/AMDGPUTargetMachine.cpp(revision 183763)
>> +++ lib/Target/R600/AMDGPUTargetMachine.cpp(working copy)
>> @@ -105,6 +105,18 @@
>> return new AMDGPUPassConfig(this, PM);
>> }
>>
>> +//===----------------------------------------------------------------------===//
>> +// AMDGPU Analysis Pass Setup
>> +//===----------------------------------------------------------------------===//
>> +
>> +void AMDGPUTargetMachine::addAnalysisPasses(PassManagerBase&PM) {
>> + // Add first the target-independent BasicTTI pass, then our AMDGPU
>> pass. This
>> + // allows the AMDGPU pass to delegate to the target independent
>> layer when
>> + // appropriate.
>> + PM.add(createBasicTargetTransformInfoPass(getTargetLowering()));
>> + PM.add(createAMDGPUTargetTransformInfoPass(this));
>>
>> I'm a little confused about why we need to add a
>> BasicTargetTransformInfo and
>> also an AMDGPUTargetTransformInfo. What is the reason for this?
>>
>>
>> +}
>> +
>> bool
>> AMDGPUPassConfig::addPreISel() {
>> const AMDGPUSubtarget&ST = TM->getSubtarget<AMDGPUSubtarget>();
>> Index: lib/Target/R600/AMDGPUTargetTransformInfo.cpp
>> ===================================================================
>> --- lib/Target/R600/AMDGPUTargetTransformInfo.cpp(revision 0)
>> +++ lib/Target/R600/AMDGPUTargetTransformInfo.cpp(revision 0)
>>
>> It looks like you forgot to add this new file to CMakeLists.txt.
>>
>>
>> @@ -0,0 +1,90 @@
>> +//===-- AMDGPUTargetTransformInfo.cpp - AMDGPU specific TTI pass
>> +//----------------===//
>> +//
>> +// The LLVM Compiler Infrastructure
>> +//
>> +// This file is distributed under the University of Illinois Open Source
>> +// License. See LICENSE.TXT for details.
>> +//
>> +//===----------------------------------------------------------------------===//
>> +/// \file
>> +/// This file implements a TargetTransformInfo analysis pass specific
>> to the
>> +/// AMDGPU target machine. It uses the target's detailed information
>> to provide
>> +/// more precise answers to certain TTI queries, while letting the target
>> +/// independent and default TTI implementations handle the rest.
>> +///
>> +//===----------------------------------------------------------------------===//
>> +
>> +#define DEBUG_TYPE "AMDGPUtti"
>> +#include "AMDGPU.h"
>> +#include "AMDGPUTargetMachine.h"
>> +#include "llvm/Analysis/TargetTransformInfo.h"
>> +#include "llvm/Support/Debug.h"
>> +#include "llvm/Target/TargetLowering.h"
>> +#include "llvm/Target/CostTable.h"
>> +using namespace llvm;
>> +
>> +// Declare the pass initialization routine locally as target-specific
>> passes
>> +// don't have a target-wide initialization entry point, and so we
>> rely on the
>> +// pass constructor initialization.
>> +namespace llvm {
>> +void initializeAMDGPUTTIPass(PassRegistry&);
>> +}
>> +
>> +namespace {
>> +
>> +class AMDGPUTTI : public ImmutablePass, public TargetTransformInfo {
>> + const AMDGPUTargetMachine *TM;
>> + const AMDGPUSubtarget *ST;
>> + const AMDGPUTargetLowering *TLI;
>> +
>> + /// Estimate the overhead of scalarizing an instruction. Insert and
>> Extract
>> + /// are set if the result needs to be inserted and/or extracted from
>> vectors.
>> + unsigned getScalarizationOverhead(Type *Ty, bool Insert, bool
>> Extract) const;
>> +
>> +public:
>> + AMDGPUTTI() : ImmutablePass(ID), TM(0), ST(0), TLI(0) {
>> + llvm_unreachable("This pass cannot be directly constructed");
>> + }
>> +
>> + AMDGPUTTI(const AMDGPUTargetMachine *TM)
>> + : ImmutablePass(ID), TM(TM), ST(TM->getSubtargetImpl()),
>> + TLI(TM->getTargetLowering()) {
>> + initializeAMDGPUTTIPass(*PassRegistry::getPassRegistry());
>> + }
>> +
>> + virtual void initializePass() { pushTTIStack(this); }
>> +
>> + virtual void finalizePass() { popTTIStack(); }
>> +
>> + virtual void getAnalysisUsage(AnalysisUsage&AU) const {
>> + TargetTransformInfo::getAnalysisUsage(AU);
>> + }
>> +
>> + /// Pass identification.
>> + static char ID;
>> +
>> + /// Provide necessary pointer adjustments for the two base classes.
>> + virtual void *getAdjustedAnalysisPointer(const void *ID) {
>> + if (ID ==&TargetTransformInfo::ID)
>> + return (TargetTransformInfo *)this;
>> + return this;
>> + }
>> +
>> + virtual bool hasBranchDivergence() const;
>> +
>> + /// @}
>> +};
>> +
>> +} // end anonymous namespace
>> +
>> +INITIALIZE_AG_PASS(AMDGPUTTI, TargetTransformInfo, "AMDGPUtti",
>> + "AMDGPU Target Transform Info", true, true, false)
>> +char AMDGPUTTI::ID = 0;
>> +
>> +ImmutablePass *
>> +llvm::createAMDGPUTargetTransformInfoPass(const AMDGPUTargetMachine
>> *TM) {
>> + return new AMDGPUTTI(TM);
>> +}
>> +
>> +bool AMDGPUTTI::hasBranchDivergence() const { return true; }
>> Index: lib/Target/R600/AMDGPUTargetMachine.h
>> ===================================================================
>> --- lib/Target/R600/AMDGPUTargetMachine.h(revision 183763)
>> +++ lib/Target/R600/AMDGPUTargetMachine.h(working copy)
>> @@ -63,6 +63,9 @@
>> }
>> virtual const DataLayout *getDataLayout() const { return&Layout; }
>> virtual TargetPassConfig *createPassConfig(PassManagerBase&PM);
>> +
>> + /// \brief Register R600 analysis passes with a pass manager.
>> + virtual void addAnalysisPasses(PassManagerBase&PM);
>> };
>>
>> } // End namespace llvm
>> Index: lib/Target/R600/AMDGPU.h
>> ===================================================================
>> --- lib/Target/R600/AMDGPU.h(revision 183763)
>> +++ lib/Target/R600/AMDGPU.h(working copy)
>> @@ -46,6 +46,10 @@
>> FunctionPass *createAMDGPUIndirectAddressingPass(TargetMachine&tm);
>> FunctionPass *createAMDGPUISelDag(TargetMachine&tm);
>>
>> +/// \brief Creates an AMDGPU-specific Target Transformation Info pass.
>> +ImmutablePass *
>> + createAMDGPUTargetTransformInfoPass(const AMDGPUTargetMachine *TM);
>> +
>> extern Target TheAMDGPUTarget;
>>
>> } // End namespace llvm
>> Index: lib/Transforms/Utils/SimplifyCFG.cpp
>> ===================================================================
>> --- lib/Transforms/Utils/SimplifyCFG.cpp(revision 184607)
>> +++ lib/Transforms/Utils/SimplifyCFG.cpp(working copy)
>> @@ -17,8 +17,10 @@
>> #include "llvm/ADT/STLExtras.h"
>> #include "llvm/ADT/SetVector.h"
>> #include "llvm/ADT/SmallPtrSet.h"
>> +#include "llvm/ADT/SmallSet.h"
>> #include "llvm/ADT/SmallVector.h"
>> #include "llvm/ADT/Statistic.h"
>> +#include "llvm/Analysis/AliasAnalysis.h"
>> #include "llvm/Analysis/InstructionSimplify.h"
>> #include "llvm/Analysis/TargetTransformInfo.h"
>> #include "llvm/Analysis/ValueTracking.h"
>> @@ -63,6 +65,10 @@
>> HoistCondStores("simplifycfg-hoist-cond-stores", cl::Hidden,
>> cl::init(true),
>> cl::desc("Hoist conditional stores if an unconditional store preceeds"));
>>
>> +static cl::opt<bool>
>> + ParallelAndOr("simplifycfg-parallel-and-or", cl::Hidden, cl::init(true),
>> + cl::desc("Use parallel-and-or mode for branch conditions"));
>> +
>> STATISTIC(NumBitMaps, "Number of switch instructions turned into
>> bitmaps");
>> STATISTIC(NumLookupTables, "Number of switch instructions turned into
>> lookup tables");
>> STATISTIC(NumSinkCommons, "Number of common instructions sunk down to
>> the end block");
>> @@ -88,6 +94,7 @@
>> class SimplifyCFGOpt {
>> const TargetTransformInfo&TTI;
>> const DataLayout *const TD;
>> + AliasAnalysis *AA;
>>
>> Value *isValueEqualityComparison(TerminatorInst *TI);
>> BasicBlock *GetValueEqualityComparisonCases(TerminatorInst *TI,
>> @@ -105,10 +112,25 @@
>> bool SimplifyIndirectBr(IndirectBrInst *IBI);
>> bool SimplifyUncondBranch(BranchInst *BI, IRBuilder<> &Builder);
>> bool SimplifyCondBranch(BranchInst *BI, IRBuilder<>&Builder);
>> + /// \brief Use parallel-and or parallel-or to generate conditions for
>> + /// conditional branches.
>> + bool SimplifyParallelAndOr(BasicBlock *BB, IRBuilder<> &Builder,
>> Pass *P = 0);
>> + /// \brief If \param BB is the merge block of an if-region, attempt
>> to merge
>> + /// the if-region with an adjacent if-region upstream if two if-regions
>> + /// contain identical instructions.
>> + bool MergeIfRegion(BasicBlock *BB, IRBuilder<> &Builder, Pass *P = 0);
>> + /// \brief Compare a pair of blocks: \param Block1 and \param
>> Block2, which
>> + /// are from two if-regions whose head blocks are \param Head1 and
>> \param
>> + /// Head2. \returns true if \param Block1 and \param Block2 contain
>> identical
>> + /// instructions, and none of the instructions alias with \param Head2.
>> + /// This is used as a legality check for merging if-regions.
>> + bool CompareBlock(BasicBlock *Head1, BasicBlock *Head2, BasicBlock
>> *Block1,
>> + BasicBlock *Block2);
>>
>> public:
>> - SimplifyCFGOpt(const TargetTransformInfo&TTI, const DataLayout *TD)
>> - : TTI(TTI), TD(TD) {}
>> + SimplifyCFGOpt(const TargetTransformInfo&TTI, const DataLayout *TD,
>> + AliasAnalysis *aa)
>> + : TTI(TTI), TD(TD), AA(aa) {}
>> bool run(BasicBlock *BB);
>> };
>> }
>> @@ -195,8 +217,8 @@
>> }
>>
>>
>> -/// GetIfCondition - Given a basic block (BB) with two predecessors
>> (and at
>> -/// least one PHI node in it), check to see if the merge at this
>> block is due
>> +/// GetIfCondition - Given a basic block (BB) with two predecessors,
>> +/// check to see if the merge at this block is due
>> /// to an "if condition". If so, return the boolean condition that
>> determines
>> /// which entry into BB will be taken. Also, return by references the
>> block
>> /// that will be entered from if the condition is true, and the block
>> that will
>> @@ -206,12 +228,31 @@
>> /// instructions in them.
>> static Value *GetIfCondition(BasicBlock *BB, BasicBlock *&IfTrue,
>> BasicBlock *&IfFalse) {
>> - PHINode *SomePHI = cast<PHINode>(BB->begin());
>> - assert(SomePHI->getNumIncomingValues() == 2&&
>> - "Function can only handle blocks with 2 predecessors!");
>> - BasicBlock *Pred1 = SomePHI->getIncomingBlock(0);
>> - BasicBlock *Pred2 = SomePHI->getIncomingBlock(1);
>> + PHINode *SomePHI = dyn_cast<PHINode>(BB->begin());
>> + BasicBlock *Pred1 = NULL;
>> + BasicBlock *Pred2 = NULL;
>>
>> + if (SomePHI) {
>> + if (SomePHI->getNumIncomingValues() != 2)
>> + return NULL;
>> + Pred1 = SomePHI->getIncomingBlock(0);
>> + Pred2 = SomePHI->getIncomingBlock(1);
>> + } else {
>> + SmallSetVector<BasicBlock *, 16> Preds(pred_begin(BB), pred_end(BB));
>> + for (SmallSetVector<BasicBlock *, 16>::iterator PI = Preds.begin(),
>> + PE = Preds.end();
>> + PI != PE; ++PI) {
>> + if (Pred1 == NULL)
>> + Pred1 = *PI;
>> + else if (Pred2 == NULL)
>> + Pred2 = *PI;
>> + else
>> + return NULL;
>> + }
>> + if (!Pred1 || !Pred2)
>> + return NULL;
>> + }
>> +
>> // We can only handle branches. Other control flow will be lowered to
>> // branches if possible anyway.
>> BranchInst *Pred1Br = dyn_cast<BranchInst>(Pred1->getTerminator());
>> @@ -4039,6 +4080,402 @@
>> return false;
>> }
>>
>> +/// If \param [in] BB has more than one predecessor that is a conditional
>> +/// branch, attempt to use parallel and/or for the branch condition.
>> \returns
>> +/// true on success.
>> +///
>> +/// Before:
>> +/// ......
>> +/// %cmp10 = fcmp une float %tmp1, %tmp2
>> +/// br i1 %cmp1, label %if.then, label %lor.rhs
>> +///
>> +/// lor.rhs:
>> +/// ......
>> +/// %cmp11 = fcmp une float %tmp3, %tmp4
>> +/// br i1 %cmp11, label %if.then, label %ifend
>> +///
>> +/// if.end: // the merge block
>> +/// ......
>> +///
>> +/// if.then: // has two predecessors, both of them contains
>> conditional branch.
>> +/// ......
>> +/// br label %if.end;
>> +///
>> +/// After:
>> +/// ......
>> +/// %cmp10 = fcmp une float %tmp1, %tmp2
>> +/// ......
>> +/// %cmp11 = fcmp une float %tmp3, %tmp4
>> +/// %cmp12 = or i1 %cmp10, %cmp11 // parallel-or mode.
>> +/// br i1 %cmp12, label %if.then, label %ifend
>> +///
>> +/// if.end:
>> +/// ......
>> +///
>> +/// if.then:
>> +/// ......
>> +/// br label %if.end;
>> +///
>> +/// Current implementation handles two cases.
>> +/// Case 1: \param BB is on the else-path.
>> +///
>> +/// BB1
>> +/// / |
>> +/// BB2 |
>> +/// / \ |
>> +/// BB3 \ | where, BB1, BB2 contain conditional branches.
>> +/// \ | / BB3 contains unconditional branch.
>> +/// \ | / BB4 corresponds to \param BB which is also the merge.
>> +/// BB => BB4
>> +///
>> +///
>> +/// Corresponding source code:
>> +///
>> +/// if (a == b&& c == d)
>> +/// statement; // BB3
>> +///
>> +/// Case 2: \param BB BB is on the then-path.
>> +///
>> +/// BB1
>> +/// / |
>> +/// | BB2
>> +/// \ / | where BB1, BB2 contain conditional branches.
>> +/// BB => BB3 | BB3 contains unconditiona branch and corresponds
>> +/// \ / to \param BB. BB4 is the merge.
>> +/// BB4
>> +///
>> +/// Corresponding source code:
>> +///
>> +/// if (a == b || c == d)
>> +/// statement; // BB3
>> +///
>> +/// In both cases, \param BB is the common successor of conditional
>> branches.
>> +/// In Case 1, \param BB (BB4) has an unconditional branch (BB3) as
>> +/// its predecessor. In Case 2, \param BB (BB3) only has conditional
>> branches
>> +/// as its predecessors.
>> +///
>> +bool SimplifyCFGOpt::SimplifyParallelAndOr(BasicBlock *BB,
>> IRBuilder<> &Builder,
>> + Pass *P) {
>> + PHINode *PHI = dyn_cast<PHINode>(&BB->front());
>> + if (PHI)
>> + return false; // For simplicity, avoid cases containing PHI nodes.
>> +
>> + BasicBlock *LCond = NULL;
>> + BasicBlock *FCond = NULL;
>> + BasicBlock *UCond = NULL;
>> + int Idx = -1;
>> +
>> + SmallSetVector<BasicBlock *, 16> Preds(pred_begin(BB), pred_end(BB));
>> + // Check predecessors of \param BB.
>> + for (SmallSetVector<BasicBlock *, 16>::iterator PI = Preds.begin(),
>> + PE = Preds.end();
>> + PI != PE; ++PI) {
>> + BasicBlock *Pred = *PI;
>> + TerminatorInst *PTI = Pred->getTerminator();
>> + BranchInst *PBI = dyn_cast<BranchInst>(PTI);
>> +
>> + // All predecessors should terminate with a branch.
>> + if (!PBI)
>> + return false;
>> +
>> + BasicBlock *PP = Pred->getSinglePredecessor();
>> +
>> + if (PBI->isUnconditional()) {
>> + // Case 1: Pred (BB3) is an unconditional block, it should
>> + // have a single successor and a single predecessor (BB2) that
>> + // is also a predecessor of \param BB (BB4) and should not have
>> + // address-taken. There should exist only one such unconditional
>> + // branch among the predecessors.
>> + if (UCond || !PP || (Preds.count(PP) == 0) ||
>> + (PTI->getNumSuccessors() != 1) || Pred->hasAddressTaken())
>> + return false;
>> +
>> + UCond = Pred;
>> + continue;
>> + }
>> +
>> + // Only conditional branches are allowed beyond this point.
>> + if (!PBI->isConditional())
>> + return false;
>> +
>> + // Condition's unique use should be the branch instruction.
>> + Value *PC = PBI->getCondition();
>> + if (!PC || (PC->getNumUses() != 1))
>> + return false;
>> +
>> + if (PP&& (Preds.count(PP) != 0)) {
>> + // These are internal condition blocks to be merged from, e.g.,
>> + // BB2 in both cases.
>> + // Should not be address-taken.
>> + if (Pred->hasAddressTaken())
>> + return false;
>> +
>> + // Instructions in the internal condition blocks should be safe
>> + // to hoist up.
>> + for (BasicBlock::iterator BI = Pred->begin(), BE = PBI; BI != BE;) {
>> + Instruction *CI = BI++;
>> + if (isa<PHINode>(CI) || CI->mayHaveSideEffects() ||
>> + !isSafeToSpeculativelyExecute(CI))
>> + return false;
>> + }
>> + } else {
>> + // This is the condition block to be merged into, e.g. BB1 in
>> + // both cases.
>> + if (FCond)
>> + return false;
>> + FCond = Pred;
>> + }
>> +
>> + // The terminator must have exactly two successors.
>> + if (PTI->getNumSuccessors() != 2)
>> + return false;
>> +
>> + // Find whether BB is uniformly on the true (or false) path
>> + // for all of its predecessors.
>> + BasicBlock *PS1 = PTI->getSuccessor(0);
>> + BasicBlock *PS2 = PTI->getSuccessor(1);
>> + BasicBlock *PS = (PS1 == BB) ? PS2 : PS1;
>> + int CIdx = (PS1 == BB) ? 0 : 1;
>> +
>> + if (Idx == -1)
>> + Idx = CIdx;
>> + else if (CIdx != Idx)
>> + return false;
>> +
>> + // PS is the successor which is not BB. Check successors to identify
>> + // the last conditional branch.
>> + if (Preds.count(PS) == 0) {
>> + // Case 2.
>> + // BB must have an unique successor.
>> + TerminatorInst *TBB = BB->getTerminator();
>> + if (TBB->getNumSuccessors() != 1)
>> + return false;
>> +
>> + BasicBlock *SBB = TBB->getSuccessor(0);
>> + PHI = dyn_cast<PHINode>(&SBB->front());
>> + if (PHI)
>> + return false;
>> +
>> + // PS (BB4) should be BB's successor.
>> + if (SBB != PS)
>> + return false;
>> + LCond = Pred;
>> + } else {
>> + TerminatorInst *TPS = PS->getTerminator();
>> + BranchInst *BPS = dyn_cast<BranchInst>(TPS);
>> + if (BPS&& BPS->isUnconditional()) {
>> + // Case 1: PS(BB3) should be an unconditional branch.
>> + LCond = Pred;
>> + }
>> + }
>> + }
>> +
>> + if (FCond&& LCond&& (FCond != LCond)) {
>> + // Do the transformation.
>> + BasicBlock *CB;
>> + bool ITER = true;
>> + BasicBlock::iterator ItOld = Builder.GetInsertPoint();
>> + TerminatorInst *PTI = FCond->getTerminator();
>> + BranchInst *PBI = dyn_cast<BranchInst>(PTI);
>> + Value *PC = PBI->getCondition();
>> + do {
>> + CB = PBI->getSuccessor(1 - Idx);
>> + // Delete the conditional branch.
>> + FCond->getInstList().pop_back();
>> + FCond->getInstList().splice(FCond->end(), CB->getInstList());
>> + PTI = FCond->getTerminator();
>> + PBI = dyn_cast<BranchInst>(PTI);
>> + Value *CC = PBI->getCondition();
>> + // Merge conditions.
>> + Builder.SetInsertPoint(PTI);
>> + Value *NC;
>> + if (Idx == 0)
>> + // Case 2, use parallel or.
>> + NC = Builder.CreateOr(PC, CC);
>> + else
>> + // Case 1, use parallel and.
>> + NC = Builder.CreateAnd(PC, CC);
>> +
>> + PBI->replaceUsesOfWith(CC, NC);
>> + PC = NC;
>> + if (CB == LCond)
>> + ITER = false;
>> + // Remove internal conditional branches.
>> + CB->dropAllReferences();
>> + // make CB unreachable and let downstream to delete the block.
>> + new UnreachableInst(CB->getContext(), CB);
>> + } while (ITER);
>> +
>> + Builder.SetInsertPoint(ItOld);
>> + DEBUG(dbgs()<< "Use parallel and/or in:\n"<< *FCond);
>> + return true;
>> + }
>> +
>> + return false;
>> +}
>> +
>> +/// Compare blocks from two if-regions, where \param Head1 is the
>> head of the
>> +/// 1st if-region. \param Head2 is the head of the 2nd if-region. \param
>> +/// Block1 is a block in the 1st if-region to compare. \param Block2
>> is a block
>> +// in the 2nd if-region to compare. \returns true if the blocks have
>> identical
>> +/// instructions that do not alias with \param Head2 and it is legal
>> to merge
>> +/// the two blocks so that only one instance of each instruction is kept.
>> +///
>> +bool SimplifyCFGOpt::CompareBlock(BasicBlock *Head1, BasicBlock *Head2,
>> + BasicBlock *Block1, BasicBlock *Block2) {
>> + TerminatorInst *PTI2 = Head2->getTerminator();
>> + Instruction *PBI2 = Head2->begin();
>> +
>> + if (Block1 == Head1) {
>> + if (Block2 != Head2)
>> + return false;
>> + } else if (Block2 == Head2)
>> + return false;
>> + else {
>> + // Check whether instructions in Block1 and Block2 are identical
>> + // and do not alias with instructions in Head2.
>> + BasicBlock::iterator iter1 = Block1->begin();
>> + BasicBlock::iterator end1 = Block1->getTerminator();
>> + BasicBlock::iterator iter2 = Block2->begin();
>> + BasicBlock::iterator end2 = Block2->getTerminator();
>> +
>> + while (1) {
>> + if (iter1 == end1) {
>> + if (iter2 != end2)
>> + return false;
>> + break;
>> + }
>> +
>> + if (!iter1->isIdenticalTo(iter2))
>> + return false;
>> +
>> + // Illegal to remove instructions with side effects except
>> + // non-volatile stores.
>> + if (iter1->mayHaveSideEffects()) {
>> + Instruction *CurI =&*iter1;
>> + StoreInst *SI = dyn_cast<StoreInst>(CurI);
>> + if (!SI || SI->isVolatile())
>> + return false;
>> + }
>> +
>> + // For simplicity and speed, data dependency check can be
>> + // avoided if read from memory doesn't exist.
>> + if (iter1->mayReadFromMemory())
>> + return false;
>> +
>> + if (iter1->mayWriteToMemory()) {
>> + for (BasicBlock::iterator BI = PBI2, BE = PTI2; BI != BE; ++BI) {
>> + if (BI->mayReadFromMemory() || BI->mayWriteToMemory()) {
>> + // Check alias with Head2.
>> + if (!AA || AA->alias(iter1, BI))
>> + return false;
>> + }
>> + }
>> + }
>> + ++iter1;
>> + ++iter2;
>> + }
>> + ;
>> + }
>> +
>> + return true;
>> +}
>> +
>> +/// Check whether \param BB is the merge block of a if-region. If
>> yes, check
>> +/// whether there exists an adjacent if-region upstream, the two
>> if-regions
>> +/// contain identical instuctions and can be legally merged. \returns
>> true if
>> +/// the two if-regions are merged.
>> +///
>> +/// From:
>> +/// if (a)
>> +/// statement;
>> +/// if (b)
>> +/// statement;
>> +///
>> +/// To:
>> +/// if (a || b)
>> +/// statement;
>> +///
>> +bool SimplifyCFGOpt::MergeIfRegion(BasicBlock *BB, IRBuilder<> &Builder,
>> + Pass *P) {
>> + BasicBlock *IfTrue2, *IfFalse2;
>> + Value *IfCond2 = GetIfCondition(BB, IfTrue2, IfFalse2);
>> + if (!IfCond2)
>> + return false;
>> +
>> + Instruction *CInst2 = dyn_cast<Instruction>(IfCond2);
>> + if (!CInst2)
>> + return false;
>> +
>> + BasicBlock *Head2 = CInst2->getParent();
>> + if (Head2->hasAddressTaken())
>> + return false;
>> +
>> + BasicBlock *IfTrue1, *IfFalse1;
>> + Value *IfCond1 = GetIfCondition(Head2, IfTrue1, IfFalse1);
>> + if (!IfCond1)
>> + return false;
>> +
>> + Instruction *CInst1 = dyn_cast<Instruction>(IfCond1);
>> + if (!CInst1)
>> + return false;
>> +
>> + BasicBlock *Head1 = CInst1->getParent();
>> +
>> + // Either then-path or else-path should be empty.
>> + if ((IfTrue1 != Head1)&& (IfFalse1 != Head1))
>> + return false;
>> + if ((IfTrue2 != Head2)&& (IfFalse2 != Head2))
>> + return false;
>> +
>> + TerminatorInst *PTI2 = Head2->getTerminator();
>> + Instruction *PBI2 = Head2->begin();
>> +
>> + if (!CompareBlock(Head1, Head2, IfTrue1, IfTrue2))
>> + return false;
>> +
>> + if (!CompareBlock(Head1, Head2, IfFalse1, IfFalse2))
>> + return false;
>> +
>> + // Check whether \param Head2 has side-effect and is safe to speculate.
>> + for (BasicBlock::iterator BI = PBI2, BE = PTI2; BI != BE; ++BI) {
>> + Instruction *CI = BI;
>> + if (isa<PHINode>(CI) || CI->mayHaveSideEffects() ||
>> + !isSafeToSpeculativelyExecute(CI))
>> + return false;
>> + }
>> +
>> + // Merge \param Head2 into \param Head1.
>> + Head1->getInstList().pop_back();
>> + Head1->getInstList().splice(Head1->end(), Head2->getInstList());
>> + TerminatorInst *PTI = Head1->getTerminator();
>> + BranchInst *PBI = dyn_cast<BranchInst>(PTI);
>> + Value *CC = PBI->getCondition();
>> + BasicBlock::iterator ItOld = Builder.GetInsertPoint();
>> + Builder.SetInsertPoint(PTI);
>> + Value *NC = Builder.CreateOr(CInst1, CC);
>> + PBI->replaceUsesOfWith(CC, NC);
>> + Builder.SetInsertPoint(ItOld);
>> +
>> + // Remove IfTrue1
>> + if (IfTrue1 != Head1) {
>> + IfTrue1->dropAllReferences();
>> + IfTrue1->eraseFromParent();
>> + }
>> +
>> + // Remove IfFalse1
>> + if (IfFalse1 != Head1) {
>> + IfFalse1->dropAllReferences();
>> + IfFalse1->eraseFromParent();
>> + }
>> +
>> + // Remove \param Head2
>> + Head2->dropAllReferences();
>> + Head2->eraseFromParent();
>> + DEBUG(dbgs()<< "If conditions merged into:\n"<< *Head1);
>> + return true;
>> +}
>> +
>> /// Check if passing a value to an instruction will cause undefined
>> behavior.
>> static bool passingValueIsAlwaysUndefined(Value *V, Instruction *I) {
>> Constant *C = dyn_cast<Constant>(V);
>> @@ -4142,6 +4579,10 @@
>>
>> IRBuilder<> Builder(BB);
>>
>> + if (ParallelAndOr&& TTI.hasBranchDivergence()&&
>> + SimplifyParallelAndOr(BB, Builder))
>> + return true;
>> +
>> // If there is a trivial two-entry PHI node in this basic block, and
>> we can
>> // eliminate it, do so now.
>> if (PHINode *PN = dyn_cast<PHINode>(BB->begin()))
>> @@ -4169,6 +4610,9 @@
>> if (SimplifyIndirectBr(IBI)) return true;
>> }
>>
>> + if (ParallelAndOr&& TTI.hasBranchDivergence()&& MergeIfRegion(BB,
>> Builder))
>> + return true;
>> +
>> return Changed;
>> }
>>
>> @@ -4178,6 +4622,6 @@
>> /// of the CFG. It returns true if a modification was made.
>> ///
>> bool llvm::SimplifyCFG(BasicBlock *BB, const TargetTransformInfo&TTI,
>> - const DataLayout *TD) {
>> - return SimplifyCFGOpt(TTI, TD).run(BB);
>> + const DataLayout *TD, AliasAnalysis *AA) {
>> + return SimplifyCFGOpt(TTI, TD, AA).run(BB);
>> }
>> Index: lib/Transforms/Scalar/SimplifyCFGPass.cpp
>> ===================================================================
>> --- lib/Transforms/Scalar/SimplifyCFGPass.cpp(revision 184607)
>> +++ lib/Transforms/Scalar/SimplifyCFGPass.cpp(working copy)
>> @@ -27,6 +27,7 @@
>> #include "llvm/ADT/SmallVector.h"
>> #include "llvm/ADT/Statistic.h"
>> #include "llvm/Analysis/TargetTransformInfo.h"
>> +#include "llvm/Analysis/AliasAnalysis.h"
>> #include "llvm/IR/Attributes.h"
>> #include "llvm/IR/Constants.h"
>> #include "llvm/IR/DataLayout.h"
>> @@ -49,10 +50,14 @@
>>
>> virtual bool runOnFunction(Function&F);
>>
>> - virtual void getAnalysisUsage(AnalysisUsage&AU) const {
>> - AU.addRequired<TargetTransformInfo>();
>> - }
>> - };
>> + virtual void getAnalysisUsage(AnalysisUsage&AU) const {
>> + AU.addRequired<TargetTransformInfo>();
>> + AU.addRequired<AliasAnalysis>();
>> + AU.addRequired<AliasAnalysis>();
>> + }
>> +private:
>> + AliasAnalysis *AA;
>> +};
>> }
>>
>> char CFGSimplifyPass::ID = 0;
>> @@ -301,7 +306,7 @@
>> /// iterativelySimplifyCFG - Call SimplifyCFG on all the blocks in the
>> function,
>> /// iterating until no more changes are made.
>> static bool iterativelySimplifyCFG(Function&F, const
>> TargetTransformInfo&TTI,
>> - const DataLayout *TD) {
>> + const DataLayout *TD, AliasAnalysis * AA) {
>> bool Changed = false;
>> bool LocalChange = true;
>> while (LocalChange) {
>> @@ -310,7 +315,7 @@
>> // Loop over all of the basic blocks and remove them if they are
>> unneeded...
>> //
>> for (Function::iterator BBIt = F.begin(); BBIt != F.end(); ) {
>> - if (SimplifyCFG(BBIt++, TTI, TD)) {
>> + if (SimplifyCFG(BBIt++, TTI, TD, AA)) {
>> LocalChange = true;
>> ++NumSimpl;
>> }
>> @@ -324,11 +329,12 @@
>> // simplify the CFG.
>> //
>> bool CFGSimplifyPass::runOnFunction(Function&F) {
>> + AA =&getAnalysis<AliasAnalysis>();
>> const TargetTransformInfo&TTI = getAnalysis<TargetTransformInfo>();
>> const DataLayout *TD = getAnalysisIfAvailable<DataLayout>();
>> bool EverChanged = removeUnreachableBlocksFromFn(F);
>> EverChanged |= mergeEmptyReturnBlocks(F);
>> - EverChanged |= iterativelySimplifyCFG(F, TTI, TD);
>> + EverChanged |= iterativelySimplifyCFG(F, TTI, TD, AA);
>>
>> // If neither pass changed anything, we're done.
>> if (!EverChanged) return false;
>> @@ -342,7 +348,7 @@
>> return true;
>>
>> do {
>> - EverChanged = iterativelySimplifyCFG(F, TTI, TD);
>> + EverChanged = iterativelySimplifyCFG(F, TTI, TD, AA);
>> EverChanged |= removeUnreachableBlocksFromFn(F);
>> } while (EverChanged);
>>
>> Index: lib/CodeGen/BasicTargetTransformInfo.cpp
>> ===================================================================
>> --- lib/CodeGen/BasicTargetTransformInfo.cpp(revision 183763)
>> +++ lib/CodeGen/BasicTargetTransformInfo.cpp(working copy)
>> @@ -63,6 +63,8 @@
>> return this;
>> }
>>
>> + virtual bool hasBranchDivergence() const;
>> +
>> /// \name Scalar TTI Implementations
>> /// @{
>>
>> @@ -122,6 +124,7 @@
>> return new BasicTTI(TLI);
>> }
>>
>> +bool BasicTTI::hasBranchDivergence() const { return false; }
>>
>> bool BasicTTI::isLegalAddImmediate(int64_t imm) const {
>> return TLI->isLegalAddImmediate(imm);
>>
>>
>>
>>
>> _______________________________________________
>> llvm-commits mailing list
>> llvm-commits at cs.uiuc.edu <mailto:llvm-commits at cs.uiuc.edu>
>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>>
>>
>>
>>
>> _______________________________________________
>> llvm-commits mailing list
>> llvm-commits at cs.uiuc.edu <mailto:llvm-commits at cs.uiuc.edu>
>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>> <odc_v5>
>