<html><head><meta http-equiv="Content-Type" content="text/html; charset=windows-1252"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;"><br><div><div>On Jul 17, 2013, at 11:07 AM, Ye, Mei <<a href="mailto:Mei.Ye@amd.com">Mei.Ye@amd.com</a>> wrote:</div><br class="Apple-interchange-newline"><blockquote type="cite"><div lang="EN-US" link="blue" vlink="purple" style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px; word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;"><div class="WordSection1" style="page: WordSection1;"><div style="margin: 0in 0in 0.0001pt; font-size: 12pt; font-family: 'Times New Roman', serif;"><span style="font-size: 11pt; font-family: Calibri, sans-serif; color: rgb(31, 73, 125);">Hi Andrew<o:p></o:p></span></div><div style="margin: 0in 0in 0.0001pt; font-size: 12pt; font-family: 'Times New Roman', serif;"><span style="font-size: 11pt; font-family: Calibri, sans-serif; color: rgb(31, 73, 125);"> </span></div><div style="margin: 0in 0in 0.0001pt; font-size: 12pt; font-family: 'Times New Roman', serif;"><span style="font-size: 11pt; font-family: Calibri, sans-serif; color: rgb(31, 73, 125);">Attached is a patch that follows what you suggested.  Thanks for your time.</span></div></div></div></blockquote><div><br></div><div>I’m ok with this patch now. Someone who can test it should commit. 
Maybe Tom?</div><div><br></div><div>-Andy</div><div><br></div><blockquote type="cite"><div lang="EN-US" link="blue" vlink="purple" style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px; word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;"><div class="WordSection1" style="page: WordSection1;"><div style="margin: 0in 0in 0.0001pt; font-size: 12pt; font-family: 'Times New Roman', serif;"><o:p> </o:p></div><div style="margin: 0in 0in 0.0001pt; font-size: 12pt; font-family: 'Times New Roman', serif;">On Jul 12, 2013, at 3:39 PM, Ye, Mei <<a href="mailto:Mei.Ye@amd.com" style="color: purple; text-decoration: underline;">Mei.Ye@amd.com</a>> wrote:<o:p></o:p></div><div><div style="margin: 0in 0in 0.0001pt; font-size: 12pt; font-family: 'Times New Roman', serif;"><br><br><o:p></o:p></div><div><div style="margin: 0in 0in 0.0001pt; font-size: 12pt; font-family: 'Times New Roman', serif;"><span style="font-size: 6pt;">The fundamental issue is whether target-specific work is allowed inside "machine-independent" passes.  In fact, almost all compiler optimizations are target-dependent.  Having a good infrastructure to enable target-specific tuning and optimizations will promote code sharing and code quality.  The common practice that I have seen is that compiler vendors keep their work under #ifdefs inside their own branches and do not contribute it back to the trunk.  There are IP and business-strategy reasons for this.  But in the long run, the code bases diverge and merging becomes more difficult, if not impossible.  
Not only will the vendors suffer, but so will the larger community, as fewer and fewer contributions flow back into the trunk.<o:p></o:p></span></div></div><div><div style="margin: 0in 0in 0.0001pt; font-size: 12pt; font-family: 'Times New Roman', serif;"><o:p> </o:p></div></div><div><div><div style="margin: 0in 0in 0.0001pt; font-size: 12pt; font-family: 'Times New Roman', serif;">Yes, please contribute your target-dependent optimizations! Usually the best way to introduce them is in target passes, which are only built when building your target. You can easily do this by overriding TargetPassConfig::addISelPrepare.<o:p></o:p></div></div><div><div style="margin: 0in 0in 0.0001pt; font-size: 12pt; font-family: 'Times New Roman', serif;"><o:p> </o:p></div></div><div><div style="margin: 0in 0in 0.0001pt; font-size: 12pt; font-family: 'Times New Roman', serif;">For this patch, you could make the argument that your transformations are sufficiently generic that they might as well be made available in SimplifyCFG.cpp. I agree.<o:p></o:p></div></div><div><div style="margin: 0in 0in 0.0001pt; font-size: 12pt; font-family: 'Times New Roman', serif;"><o:p> </o:p></div></div><div><div style="margin: 0in 0in 0.0001pt; font-size: 12pt; font-family: 'Times New Roman', serif;">However (correct me if I'm wrong), your transformations are not required to expose downstream machine-independent optimizations. They should not run early in the pass pipeline. Doing so will lead to divergence across targets and to the loss of code sharing and code quality that you are worried about. There is also no reason to run them every time we invoke the SimplifyCFG pass.<o:p></o:p></div></div><div><div style="margin: 0in 0in 0.0001pt; font-size: 12pt; font-family: 'Times New Roman', serif;">We want to have a core SimplifyCFG pass that can be run repeatedly to canonicalize the IR. It's one of the first things we run. 
Introducing target heuristics this early would be a mistake.<o:p></o:p></div></div><div><div style="margin: 0in 0in 0.0001pt; font-size: 12pt; font-family: 'Times New Roman', serif;"><o:p> </o:p></div></div><div><div style="margin: 0in 0in 0.0001pt; font-size: 12pt; font-family: 'Times New Roman', serif;">I realize this design goal is not obvious from the current codebase and documentation. Tomorrow, I will send out a design proposal to llvm to make sure contributors are working toward a common goal.<o:p></o:p></div></div><div><div style="margin: 0in 0in 0.0001pt; font-size: 12pt; font-family: 'Times New Roman', serif;"><o:p> </o:p></div></div><div><div style="margin: 0in 0in 0.0001pt; font-size: 12pt; font-family: 'Times New Roman', serif;">You could help move toward this goal though by modifying your patch to run only once in the final round of SimplifyCFG (after jump-threading and DSE).<o:p></o:p></div></div><div><div style="margin: 0in 0in 0.0001pt; font-size: 12pt; font-family: 'Times New Roman', serif;"><o:p> </o:p></div></div><div><div style="margin: 0in 0in 0.0001pt; font-size: 12pt; font-family: 'Times New Roman', serif;">The most straightforward way to handle this is to add a new OptimizeCFG pass for all the "lowering" stuff and simply schedule it after the last SimplifyCFG.<o:p></o:p></div></div><div><div style="margin: 0in 0in 0.0001pt; font-size: 12pt; font-family: 'Times New Roman', serif;"><o:p> </o:p></div></div><div><div style="margin: 0in 0in 0.0001pt; font-size: 12pt; font-family: 'Times New Roman', serif;">If you want to combine it with a round of SimplifyCFG, that's also ok, but we should still have a distinct pass name. 
In this case you need to:<o:p></o:p></div></div><div><div style="margin: 0in 0in 0.0001pt; font-size: 12pt; font-family: 'Times New Roman', serif;"><o:p> </o:p></div></div><div><div style="margin: 0in 0in 0.0001pt; font-size: 12pt; font-family: 'Times New Roman', serif;">- Define a subclass of CFGSimplifyPass that passes a flag to the CFGSimplifyPass ctor to enable lowering optimizations.<o:p></o:p></div></div><div><div style="margin: 0in 0in 0.0001pt; font-size: 12pt; font-family: 'Times New Roman', serif;"><o:p> </o:p></div></div><div><div style="margin: 0in 0in 0.0001pt; font-size: 12pt; font-family: 'Times New Roman', serif;">- Add INITIALIZE_PASS_... declarations for the new pass name "optimizecfg".<o:p></o:p></div></div><div><div style="margin: 0in 0in 0.0001pt; font-size: 12pt; font-family: 'Times New Roman', serif;"><o:p> </o:p></div></div><div><div style="margin: 0in 0in 0.0001pt; font-size: 12pt; font-family: 'Times New Roman', serif;">- Add the same flag to createCFGSimplificationPass() and make the trivial changes to populateModulePassManager() and populateLTOPassManager().<o:p></o:p></div></div><div><div style="margin: 0in 0in 0.0001pt; font-size: 12pt; font-family: 'Times New Roman', serif;"><o:p> </o:p></div></div><div><div style="margin: 0in 0in 0.0001pt; font-size: 12pt; font-family: 'Times New Roman', serif;">See ScalarReplAggregates.cpp for an example of all this.<o:p></o:p></div></div></div><div style="margin: 0in 0in 0.0001pt; font-size: 12pt; font-family: 'Times New Roman', serif;"><br><br><o:p></o:p></div><div><div style="margin: 0in 0in 0.0001pt; font-size: 12pt; font-family: 'Times New Roman', serif;"><span style="font-size: 6pt;">If we have a good repository of optimizations, each target can selectively enable or disable them to avoid unexpected performance impact.  To safeguard correctness, the community can create a super target (just clone X86) that tests all optimizations.  
<br><br>From an implementation standpoint, this patch fits into SimplifyCFG quite naturally.  It is often advantageous to change the "machine-independent" code, since the pass already traverses the whole function and reaches the point where certain patterns are recognized.   Adding a new optimization/transformation at that point is essentially free, while creating a separate pass would repeat work and increase compilation time.  JIT and online compilation models are very stingy with compilation time.  Unfortunately, I live in a world of compile-from-source models, and adding a new pass is prohibited in my organization.   Besides, this patch is not GPU-specific. It is good for CPUs in general, unless you happen to have patterns of very expensive floating-point comparisons.  Even in that case, it might still win because of the reduction in branches.  <o:p></o:p></span></div></div><div><div style="margin: 0in 0in 0.0001pt; font-size: 12pt; font-family: 'Times New Roman', serif;"><o:p> </o:p></div></div><div><div><div style="margin: 0in 0in 0.0001pt; font-size: 12pt; font-family: 'Times New Roman', serif;">It's a moot point if you piggyback on the last SimplifyCFG as explained above. But I don't buy this argument at all.  What work do you need to "redo" if you run this as a separate pass?  Iterating over blocks? How does that compare to the cost of analyzing compound if-statements every time we run SimplifyCFG? 
Please quantify the impact of running this in a separate "LowerCFG" pass before claiming that you're optimizing the pass pipeline.<o:p></o:p></div></div><div><div style="margin: 0in 0in 0.0001pt; font-size: 12pt; font-family: 'Times New Roman', serif;"><o:p> </o:p></div></div><div><div style="margin: 0in 0in 0.0001pt; font-size: 12pt; font-family: 'Times New Roman', serif;">-Andy<o:p></o:p></div></div></div><div style="margin: 0in 0in 0.0001pt; font-size: 12pt; font-family: 'Times New Roman', serif;"><br><br><o:p></o:p></div><div><div style="margin: 0in 0in 0.0001pt; font-size: 12pt; font-family: 'Times New Roman', serif;"><span style="font-size: 6pt;"><br>And, yes, GPUs need to do CFG transformations.  But codegen can introduce new control flow.  Vendors will need to recognize irreducible CFGs in their JIT compilers anyway, as a verification step.<br><br>-Mei<span class="apple-converted-space"> </span><br><br><br>-----Original Message-----<br>From: Nick Lewycky [<a href="mailto:nicholas@mxc.ca" style="color: purple; text-decoration: underline;">mailto:nicholas@mxc.ca</a>]<span class="apple-converted-space"> </span><br>Sent: Thursday, July 11, 2013 10:40 PM<br>To: Ye, Mei<br>Cc: Tom Stellard;<span class="Apple-converted-space"> </span><a href="mailto:llvm-commits@cs.uiuc.edu" style="color: purple; text-decoration: underline;">llvm-commits@cs.uiuc.edu</a><br>Subject: Re: [patch] simplifyCFG for review<br><br>The fundamental issue with this patch is that you're saying that<span class="apple-converted-space"> </span><br>simplifycfg should be responsible for structuring the CFG differently<span class="apple-converted-space"> </span><br>based on whether the target supports branches that go different ways in<span class="apple-converted-space"> </span><br>different threads (a common GPU restriction).<br><br>We try very, very hard to avoid doing target-specific things inside<span class="apple-converted-space"> </span><br>passes like this. 
Doing so requires an analysis of why it couldn't be<span class="apple-converted-space"> </span><br>done differently. (Don't GPUs have problems with irreducible CFG anyhow?<span class="apple-converted-space"> </span><br>Don't you already need to rewrite the CFG for that anyway? Why not have<span class="apple-converted-space"> </span><br>a GPU CFG simplification pass right before CodeGenPrep?)<br><br>I have some comments on the details of the patch, but the above issue is<span class="apple-converted-space"> </span><br>big and needs wider discussion and consensus first.<br><br>Nick<br><br>Ye, Mei wrote:<br><br><o:p></o:p></span></div><div style="margin: 0in 0in 0.0001pt; font-size: 12pt; font-family: 'Times New Roman', serif;"><span style="font-size: 6pt;">Thanks, Tom.   Attached is a new patch that adds comments as requested. I also made a new diff against the most recent trunk. But it looks like many files have locks, so some diffs are based on an earlier revision.<br><br>With regard to your question:<br>"I'm a little confused about why we need to add a BasicTargetTransformInfo and<br>also an AMDGPUTargetTransformInfo.  What is the reason for this?"<br><br>My answer is that the same thing happens with X86, ARM, and other targets.  
I didn't verify whether BasicTargetTransformInfo is always needed, but I think you can build the compiler for more than one target, and for targets for which you do not supply a target-specific TargetTransformInfo, you can always fall back to the basic one.<br><br>-Mei<br><br><br>-----Original Message-----<br>From: Tom Stellard [<a href="mailto:tom@stellard.net" style="color: purple; text-decoration: underline;">mailto:tom@stellard.net</a>]<br>Sent: Monday, July 08, 2013 11:14 AM<br>To: Ye, Mei<br>Cc: Evan Cheng; Renato Golin; Sean Silva;<span class="Apple-converted-space"> </span><a href="mailto:llvm-commits@cs.uiuc.edu" style="color: purple; text-decoration: underline;">llvm-commits@cs.uiuc.edu</a><br>Subject: Re: [patch] simplifyCFG for review<br><br>On Fri, Jun 28, 2013 at 09:46:13PM +0000, Ye, Mei wrote:<br><br><o:p></o:p></span></div><div style="margin: 0in 0in 0.0001pt; font-size: 12pt; font-family: 'Times New Roman', serif;"><span style="font-size: 6pt;">Hi all<br><br>Thank you for your comments.   Attached is an updated patch.<br>- Add TargetTransformInfo to R600.<br>- Add hasBranchDivergence to query whether a target has branch divergence.<br>- Invoke branch reduction opts only when the underlying target has branch divergence.  Currently this is triggered only for R600.<br>- Fix a bug in opt.cpp.<br>- Correct clang-format issues.<br><br>I have tested correctness on x86 (by adding branch divergence to X86TargetTransformInfo in my local workspace) using the existing testing infrastructure and SPEC CPU2006 and CPU2000.  
Tom Stellard has agreed to run the R600 tests (thanks a lot, Tom).<br><br>-Mei<o:p></o:p></span></div><div style="margin: 0in 0in 0.0001pt; font-size: 12pt; font-family: 'Times New Roman', serif;"><span style="font-size: 6pt;"><br><br><o:p></o:p></span></div><div style="margin: 0in 0in 0.0001pt; font-size: 12pt; font-family: 'Times New Roman', serif;"><span style="font-size: 6pt;">Index: test/Transforms/SimplifyCFG/R600/lit.local.cfg<br>===================================================================<br>--- test/Transforms/SimplifyCFG/R600/lit.local.cfg<span class="apple-tab-span">               <span class="Apple-converted-space"> </span></span>(revision 0)<br>+++ test/Transforms/SimplifyCFG/R600/lit.local.cfg<span class="apple-tab-span">            <span class="Apple-converted-space"> </span></span>(revision 0)<br>@@ -0,0 +1,6 @@<br>+config.suffixes = ['.ll', '.c', '.cpp']<br>+<br>+targets = set(config.root.targets_to_build.split())<br>+if not 'R600' in targets:<br>+    config.unsupported = True<br>+<br>Index: test/Transforms/SimplifyCFG/R600/parallelorifcollapse.ll<br>===================================================================<br>--- test/Transforms/SimplifyCFG/R600/parallelorifcollapse.ll<span class="apple-tab-span">                       <span class="Apple-converted-space"> </span></span>(revision 0)<br>+++ test/Transforms/SimplifyCFG/R600/parallelorifcollapse.ll<span class="apple-tab-span">                    <span class="Apple-converted-space"> </span></span>(revision 0)<br>@@ -0,0 +1,51 @@<br>+; Function Attrs: nounwind<br>+; RUN: opt<  %s -mtriple=r600-unknown-linux-gnu -simplifycfg -basicaa -S | FileCheck %s<br>+; CHECK: or i1<br>+; CHECK-NEXT: br<br>+; CHECK: br<br>+; CHECK: ret<o:p></o:p></span></div><div style="margin: 0in 0in 0.0001pt; font-size: 12pt; font-family: 'Times New Roman', serif;"><span style="font-size: 6pt;"><br>It's not clear to me what the expected output of this test is supposed<br>to be.  
Could you add a comment explaining what the simplifycfg pass is<br>supposed to be doing here.<br><br><br><o:p></o:p></span></div><div style="margin: 0in 0in 0.0001pt; font-size: 12pt; font-family: 'Times New Roman', serif;"><span style="font-size: 6pt;">+define void @_Z9chk1D_512v() #0 {<br>+entry:<br>+  %a0 = alloca i32, align 4<br>+  %b0 = alloca i32, align 4<br>+  %c0 = alloca i32, align 4<br>+  %d0 = alloca i32, align 4<br>+  %a1 = alloca i32, align 4<br>+  %b1 = alloca i32, align 4<br>+  %c1 = alloca i32, align 4<br>+  %d1 = alloca i32, align 4<br>+  %data = alloca i32, align 4<br>+  %0 = load i32* %a0, align 4<br>+  %1 = load i32* %b0, align 4<br>+  %cmp = icmp ne i32 %0, %1<br>+  br i1 %cmp, label %land.lhs.true, label %if.end<br>+<br>+land.lhs.true:                                    ; preds = %entry<br>+  %2 = load i32* %c0, align 4<br>+  %3 = load i32* %d0, align 4<br>+  %cmp1 = icmp ne i32 %2, %3<br>+  br i1 %cmp1, label %if.then, label %if.end<br>+<br>+if.then:                                          ; preds = %land.lhs.true<br>+  store i32 1, i32* %data, align 4<br>+  br label %if.end<br>+<br>+if.end:                                           ; preds = %if.then, %land.lhs.true, %entry<br>+  %4 = load i32* %a1, align 4<br>+  %5 = load i32* %b1, align 4<br>+  %cmp2 = icmp ne i32 %4, %5<br>+  br i1 %cmp2, label %land.lhs.true3, label %if.end6<br>+<br>+land.lhs.true3:                                   ; preds = %if.end<br>+  %6 = load i32* %c1, align 4<br>+  %7 = load i32* %d1, align 4<br>+  %cmp4 = icmp ne i32 %6, %7<br>+  br i1 %cmp4, label %if.then5, label %if.end6<br>+<br>+if.then5:                                         ; preds = %land.lhs.true3<br>+  store i32 1, i32* %data, align 4<br>+  br label %if.end6<br>+<br>+if.end6:                                          ; preds = %if.then5, %land.lhs.true3, %if.end<br>+  ret void<br>+}<br>Index: 
test/Transforms/SimplifyCFG/R600/parallelandifcollapse.ll<br>===================================================================<br>--- test/Transforms/SimplifyCFG/R600/parallelandifcollapse.ll<span class="apple-tab-span">                     <span class="Apple-converted-space"> </span></span>(revision 0)<br>+++ test/Transforms/SimplifyCFG/R600/parallelandifcollapse.ll<span class="apple-tab-span">                  <span class="Apple-converted-space"> </span></span>(revision 0)<br>@@ -0,0 +1,58 @@<br>+; Function Attrs: nounwind<br>+; RUN: opt<  %s -mtriple=r600-unknown-linux-gnu -simplifycfg -basicaa -S | FileCheck %s<br>+; CHECK: or i1<br>+; CHECK-NEXT: br<br>+; CHECK: br<br>+; CHECK: ret<o:p></o:p></span></div><div style="margin: 0in 0in 0.0001pt; font-size: 12pt; font-family: 'Times New Roman', serif;"><span style="font-size: 6pt;"><br>Same thing here, a comment explaining what the expected output should be<br>would be helpful.<br><br><br><o:p></o:p></span></div><div style="margin: 0in 0in 0.0001pt; font-size: 12pt; font-family: 'Times New Roman', serif;"><span style="font-size: 6pt;">+define void @_Z9chk1D_512v() #0 {<br>+entry:<br>+  %a0 = alloca i32, align 4<br>+  %b0 = alloca i32, align 4<br>+  %c0 = alloca i32, align 4<br>+  %d0 = alloca i32, align 4<br>+  %a1 = alloca i32, align 4<br>+  %b1 = alloca i32, align 4<br>+  %c1 = alloca i32, align 4<br>+  %d1 = alloca i32, align 4<br>+  %data = alloca i32, align 4<br>+  %0 = load i32* %a0, align 4<br>+  %1 = load i32* %b0, align 4<br>+  %cmp = icmp ne i32 %0, %1<br>+  br i1 %cmp, label %land.lhs.true, label %if.else<br>+<br>+land.lhs.true:                                    ; preds = %entry<br>+  %2 = load i32* %c0, align 4<br>+  %3 = load i32* %d0, align 4<br>+  %cmp1 = icmp ne i32 %2, %3<br>+  br i1 %cmp1, label %if.then, label %if.else<br>+<br>+if.then:                                          ; preds = %land.lhs.true<br>+  br label %if.end<br>+<br>+if.else:                                          ; preds = 
%land.lhs.true, %entry<br>+  store i32 1, i32* %data, align 4<br>+  br label %if.end<br>+<br>+if.end:                                           ; preds = %if.else, %if.then<br>+  %4 = load i32* %a1, align 4<br>+  %5 = load i32* %b1, align 4<br>+  %cmp2 = icmp ne i32 %4, %5<br>+  br i1 %cmp2, label %land.lhs.true3, label %if.else6<br>+<br>+land.lhs.true3:                                   ; preds = %if.end<br>+  %6 = load i32* %c1, align 4<br>+  %7 = load i32* %d1, align 4<br>+  %cmp4 = icmp ne i32 %6, %7<br>+  br i1 %cmp4, label %if.then5, label %if.else6<br>+<br>+if.then5:                                         ; preds = %land.lhs.true3<br>+  br label %if.end7<br>+<br>+if.else6:                                         ; preds = %land.lhs.true3, %if.end<br>+  store i32 1, i32* %data, align 4<br>+  br label %if.end7<br>+<br>+if.end7:                                          ; preds = %if.else6, %if.then5<br>+  ret void<br>+}<br>+<br>Index: include/llvm/Analysis/TargetTransformInfo.h<br>===================================================================<br>--- include/llvm/Analysis/TargetTransformInfo.h<span class="apple-tab-span">                  <span class="Apple-converted-space"> </span></span>(revision 183763)<br>+++ include/llvm/Analysis/TargetTransformInfo.h<span class="apple-tab-span">               <span class="Apple-converted-space"> </span></span>(working copy)<br>@@ -171,6 +171,9 @@<br>   /// comments for a detailed explanation of the cost values.<br>   virtual unsigned getUserCost(const User *U) const;<br><br>+  /// \brief hasBranchDivergence - Return true if branch divergence exists.<br>+  virtual bool hasBranchDivergence() const;<br>+<o:p></o:p></span></div><div style="margin: 0in 0in 0.0001pt; font-size: 12pt; font-family: 'Times New Roman', serif;"><span style="font-size: 6pt;"><br>What exactly is branch divergence?  
Could you explain this more in<br>the comment and give an example.<br><br><br><o:p></o:p></span></div><div style="margin: 0in 0in 0.0001pt; font-size: 12pt; font-family: 'Times New Roman', serif;"><span style="font-size: 6pt;">   /// \brief Test whether calls to a function lower to actual program function<br>   /// calls.<br>   ///<br>Index: include/llvm/Transforms/Utils/Local.h<br>===================================================================<br>--- include/llvm/Transforms/Utils/Local.h<span class="apple-tab-span">      <span class="Apple-converted-space"> </span></span>(revision 184607)<br>+++ include/llvm/Transforms/Utils/Local.h<span class="apple-tab-span">   <span class="Apple-converted-space"> </span></span>(working copy)<br>@@ -39,6 +39,7 @@<br> class TargetLibraryInfo;<br> class TargetTransformInfo;<br> class DIBuilder;<br>+class AliasAnalysis;<br><br> template<typename T>  class SmallVectorImpl;<br><br>@@ -136,7 +137,7 @@<br> /// the basic block that was pointed to.<br> ///<br> bool SimplifyCFG(BasicBlock *BB, const TargetTransformInfo&TTI,<br>-                 const DataLayout *TD = 0);<br>+                 const DataLayout *TD = 0, AliasAnalysis * AA = 0);<br><br> /// FoldBranchToCommonDest - If this basic block is ONLY a setcc and a branch,<br> /// and if a predecessor branches to us and one of our successors, fold the<br>Index: tools/opt/opt.cpp<br>===================================================================<br>--- tools/opt/opt.cpp<span class="apple-tab-span">                <span class="Apple-converted-space"> </span></span>(revision 183763)<br>+++ tools/opt/opt.cpp<span class="apple-tab-span">             <span class="Apple-converted-space"> </span></span>(working copy)<br>@@ -668,6 +668,9 @@<br>     FPasses.reset(new FunctionPassManager(M.get()));<br>     if (TD)<br>       FPasses->add(new DataLayout(*TD));<br>+    if (TM.get())<br>+      TM->addAnalysisPasses(*FPasses);<br>+<br>   }<br><br>   if (PrintBreakpoints) {<br>Index: 
lib/Analysis/TargetTransformInfo.cpp<br>===================================================================<br>--- lib/Analysis/TargetTransformInfo.cpp<span class="apple-tab-span">      <span class="Apple-converted-space"> </span></span>(revision 183763)<br>+++ lib/Analysis/TargetTransformInfo.cpp<span class="apple-tab-span">   <span class="Apple-converted-space"> </span></span>(working copy)<br>@@ -88,6 +88,8 @@<br>   return PrevTTI->getUserCost(U);<br> }<br><br>+bool TargetTransformInfo::hasBranchDivergence() const { return false; }<br>+<br> bool TargetTransformInfo::isLoweredToCall(const Function *F) const {<br>   return PrevTTI->isLoweredToCall(F);<br> }<br>Index: lib/Target/R600/AMDGPUTargetMachine.cpp<br>===================================================================<br>--- lib/Target/R600/AMDGPUTargetMachine.cpp<span class="apple-tab-span">                <span class="Apple-converted-space"> </span></span>(revision 183763)<br>+++ lib/Target/R600/AMDGPUTargetMachine.cpp<span class="apple-tab-span">             <span class="Apple-converted-space"> </span></span>(working copy)<br>@@ -105,6 +105,18 @@<br>   return new AMDGPUPassConfig(this, PM);<br> }<br><br>+//===----------------------------------------------------------------------===//<br>+// AMDGPU Analysis Pass Setup<br>+//===----------------------------------------------------------------------===//<br>+<br>+void AMDGPUTargetMachine::addAnalysisPasses(PassManagerBase&PM) {<br>+  // Add first the target-independent BasicTTI pass, then our AMDGPU pass. 
This<br>+  // allows the AMDGPU pass to delegate to the target independent layer when<br>+  // appropriate.<br>+  PM.add(createBasicTargetTransformInfoPass(getTargetLowering()));<br>+  PM.add(createAMDGPUTargetTransformInfoPass(this));<o:p></o:p></span></div><div style="margin: 0in 0in 0.0001pt; font-size: 12pt; font-family: 'Times New Roman', serif;"><span style="font-size: 6pt;"><br>I'm a little confused about why we need to add a BasicTargetTransformInfo and<br>also an AMDGPUTargetTransformInfo.  What is the reason for this?<br><br><br><o:p></o:p></span></div><div style="margin: 0in 0in 0.0001pt; font-size: 12pt; font-family: 'Times New Roman', serif;"><span style="font-size: 6pt;">+}<br>+<br> bool<br> AMDGPUPassConfig::addPreISel() {<br>   const AMDGPUSubtarget&ST = TM->getSubtarget<AMDGPUSubtarget>();<br>Index: lib/Target/R600/AMDGPUTargetTransformInfo.cpp<br>===================================================================<br>--- lib/Target/R600/AMDGPUTargetTransformInfo.cpp<span class="apple-tab-span">      <span class="Apple-converted-space"> </span></span>(revision 0)<br>+++ lib/Target/R600/AMDGPUTargetTransformInfo.cpp<span class="apple-tab-span">   <span class="Apple-converted-space"> </span></span>(revision 0)<o:p></o:p></span></div><div style="margin: 0in 0in 0.0001pt; font-size: 12pt; font-family: 'Times New Roman', serif;"><span style="font-size: 6pt;"><br>It looks like you forgot to add this new file to CMakeLists.txt.<br><br><br><o:p></o:p></span></div><div style="margin: 0in 0in 0.0001pt; font-size: 12pt; font-family: 'Times New Roman', serif;"><span style="font-size: 6pt;">@@ -0,0 +1,90 @@<br>+//===-- AMDGPUTargetTransformInfo.cpp - AMDGPU specific TTI pass<br>+//----------------===//<br>+//<br>+//                     The LLVM Compiler Infrastructure<br>+//<br>+// This file is distributed under the University of Illinois Open Source<br>+// License. 
See LICENSE.TXT for details.<br>+//<br>+//===----------------------------------------------------------------------===//<br>+/// \file<br>+/// This file implements a TargetTransformInfo analysis pass specific to the<br>+/// AMDGPU target machine. It uses the target's detailed information to provide<br>+/// more precise answers to certain TTI queries, while letting the target<br>+/// independent and default TTI implementations handle the rest.<br>+///<br>+//===----------------------------------------------------------------------===//<br>+<br>+#define DEBUG_TYPE "AMDGPUtti"<br>+#include "AMDGPU.h"<br>+#include "AMDGPUTargetMachine.h"<br>+#include "llvm/Analysis/TargetTransformInfo.h"<br>+#include "llvm/Support/Debug.h"<br>+#include "llvm/Target/TargetLowering.h"<br>+#include "llvm/Target/CostTable.h"<br>+using namespace llvm;<br>+<br>+// Declare the pass initialization routine locally as target-specific passes<br>+// don't have a target-wide initialization entry point, and so we rely on the<br>+// pass constructor initialization.<br>+namespace llvm {<br>+void initializeAMDGPUTTIPass(PassRegistry&);<br>+}<br>+<br>+namespace {<br>+<br>+class AMDGPUTTI : public ImmutablePass, public TargetTransformInfo {<br>+  const AMDGPUTargetMachine *TM;<br>+  const AMDGPUSubtarget *ST;<br>+  const AMDGPUTargetLowering *TLI;<br>+<br>+  /// Estimate the overhead of scalarizing an instruction. 
Insert and Extract<br>+  /// are set if the result needs to be inserted and/or extracted from vectors.<br>+  unsigned getScalarizationOverhead(Type *Ty, bool Insert, bool Extract) const;<br>+<br>+public:<br>+  AMDGPUTTI() : ImmutablePass(ID), TM(0), ST(0), TLI(0) {<br>+    llvm_unreachable("This pass cannot be directly constructed");<br>+  }<br>+<br>+  AMDGPUTTI(const AMDGPUTargetMachine *TM)<br>+      : ImmutablePass(ID), TM(TM), ST(TM->getSubtargetImpl()),<br>+        TLI(TM->getTargetLowering()) {<br>+    initializeAMDGPUTTIPass(*PassRegistry::getPassRegistry());<br>+  }<br>+<br>+  virtual void initializePass() { pushTTIStack(this); }<br>+<br>+  virtual void finalizePass() { popTTIStack(); }<br>+<br>+  virtual void getAnalysisUsage(AnalysisUsage&AU) const {<br>+    TargetTransformInfo::getAnalysisUsage(AU);<br>+  }<br>+<br>+  /// Pass identification.<br>+  static char ID;<br>+<br>+  /// Provide necessary pointer adjustments for the two base classes.<br>+  virtual void *getAdjustedAnalysisPointer(const void *ID) {<br>+    if (ID ==&TargetTransformInfo::ID)<br>+      return (TargetTransformInfo *)this;<br>+    return this;<br>+  }<br>+<br>+  virtual bool hasBranchDivergence() const;<br>+<br>+  /// @}<br>+};<br>+<br>+} // end anonymous namespace<br>+<br>+INITIALIZE_AG_PASS(AMDGPUTTI, TargetTransformInfo, "AMDGPUtti",<br>+                   "AMDGPU Target Transform Info", true, true, false)<br>+char AMDGPUTTI::ID = 0;<br>+<br>+ImmutablePass *<br>+llvm::createAMDGPUTargetTransformInfoPass(const AMDGPUTargetMachine *TM) {<br>+  return new AMDGPUTTI(TM);<br>+}<br>+<br>+bool AMDGPUTTI::hasBranchDivergence() const { return true; }<br>Index: lib/Target/R600/AMDGPUTargetMachine.h<br>===================================================================<br>--- lib/Target/R600/AMDGPUTargetMachine.h<span class="apple-tab-span">                    <span class="Apple-converted-space"> </span></span>(revision 183763)<br>+++ lib/Target/R600/AMDGPUTargetMachine.h<span 
class="apple-tab-span">                 <span class="Apple-converted-space"> </span></span>(working copy)<br>@@ -63,6 +63,9 @@<br>   }<br>   virtual const DataLayout *getDataLayout() const { return&Layout; }<br>   virtual TargetPassConfig *createPassConfig(PassManagerBase&PM);<br>+<br>+  /// \brief Register R600 analysis passes with a pass manager.<br>+  virtual void addAnalysisPasses(PassManagerBase&PM);<br> };<br><br> } // End namespace llvm<br>Index: lib/Target/R600/AMDGPU.h<br>===================================================================<br>--- lib/Target/R600/AMDGPU.h<span class="apple-tab-span">                   <span class="Apple-converted-space"> </span></span>(revision 183763)<br>+++ lib/Target/R600/AMDGPU.h<span class="apple-tab-span">                <span class="Apple-converted-space"> </span></span>(working copy)<br>@@ -46,6 +46,10 @@<br> FunctionPass *createAMDGPUIndirectAddressingPass(TargetMachine&tm);<br> FunctionPass *createAMDGPUISelDag(TargetMachine&tm);<br><br>+/// \brief Creates an AMDGPU-specific Target Transformation Info pass.<br>+ImmutablePass *<br>+    createAMDGPUTargetTransformInfoPass(const AMDGPUTargetMachine *TM);<br>+<br> extern Target TheAMDGPUTarget;<br><br> } // End namespace llvm<br>Index: lib/Transforms/Utils/SimplifyCFG.cpp<br>===================================================================<br>--- lib/Transforms/Utils/SimplifyCFG.cpp<span class="apple-tab-span">     <span class="Apple-converted-space"> </span></span>(revision 184607)<br>+++ lib/Transforms/Utils/SimplifyCFG.cpp<span class="apple-tab-span">  <span class="Apple-converted-space"> </span></span>(working copy)<br>@@ -17,8 +17,10 @@<br> #include "llvm/ADT/STLExtras.h"<br> #include "llvm/ADT/SetVector.h"<br> #include "llvm/ADT/SmallPtrSet.h"<br>+#include "llvm/ADT/SmallSet.h"<br> #include "llvm/ADT/SmallVector.h"<br> #include "llvm/ADT/Statistic.h"<br>+#include "llvm/Analysis/AliasAnalysis.h"<br> #include "llvm/Analysis/InstructionSimplify.h"<br> #include 
"llvm/Analysis/TargetTransformInfo.h"<br> #include "llvm/Analysis/ValueTracking.h"<br>@@ -63,6 +65,10 @@<br> HoistCondStores("simplifycfg-hoist-cond-stores", cl::Hidden, cl::init(true),<br>        cl::desc("Hoist conditional stores if an unconditional store preceeds"));<br><br>+static cl::opt<bool><br>+    ParallelAndOr("simplifycfg-parallel-and-or", cl::Hidden, cl::init(true),<br>+                  cl::desc("Use parallel-and-or mode for branch conditions"));<br>+<br> STATISTIC(NumBitMaps, "Number of switch instructions turned into bitmaps");<br> STATISTIC(NumLookupTables, "Number of switch instructions turned into lookup tables");<br> STATISTIC(NumSinkCommons, "Number of common instructions sunk down to the end block");<br>@@ -88,6 +94,7 @@<br> class SimplifyCFGOpt {<br>   const TargetTransformInfo &TTI;<br>   const DataLayout *const TD;<br>+  AliasAnalysis *AA;<br><br>   Value *isValueEqualityComparison(TerminatorInst *TI);<br>   BasicBlock *GetValueEqualityComparisonCases(TerminatorInst *TI,<br>@@ -105,10 +112,25 @@<br>   bool SimplifyIndirectBr(IndirectBrInst *IBI);<br>   bool SimplifyUncondBranch(BranchInst *BI, IRBuilder<>  &Builder);<br>   bool SimplifyCondBranch(BranchInst *BI, IRBuilder<>&Builder);<br>+  /// \brief Use parallel-and or parallel-or to generate conditions for<br>+  /// conditional branches.<br>+  bool SimplifyParallelAndOr(BasicBlock *BB, IRBuilder<> &Builder, Pass *P = 0);<br>+  /// \brief If \param BB is the merge block of an if-region, attempt to merge<br>+  /// the if-region with an adjacent if-region upstream if the two if-regions<br>+  /// contain identical instructions.<br>+  bool MergeIfRegion(BasicBlock *BB, IRBuilder<> &Builder, Pass *P = 0);<br>+  /// \brief Compare a pair of blocks: \param Block1 and \param Block2, which<br>+  /// are from two if-regions whose head blocks are \param Head1 and \param<br>+  /// Head2.
\returns true if \param Block1 and \param Block2 contain identical<br>+  /// instructions, and none of the instructions alias with \param Head2.<br>+  /// This is used as a legality check for merging if-regions.<br>+  bool CompareBlock(BasicBlock *Head1, BasicBlock *Head2, BasicBlock *Block1,<br>+                    BasicBlock *Block2);<br><br> public:<br>-  SimplifyCFGOpt(const TargetTransformInfo &TTI, const DataLayout *TD)<br>-      : TTI(TTI), TD(TD) {}<br>+  SimplifyCFGOpt(const TargetTransformInfo &TTI, const DataLayout *TD,<br>+                 AliasAnalysis *aa)<br>+      : TTI(TTI), TD(TD), AA(aa) {}<br>   bool run(BasicBlock *BB);<br> };<br> }<br>@@ -195,8 +217,8 @@<br> }<br><br><br>-/// GetIfCondition - Given a basic block (BB) with two predecessors (and at<br>-/// least one PHI node in it), check to see if the merge at this block is due<br>+/// GetIfCondition - Given a basic block (BB) with two predecessors,<br>+/// check to see if the merge at this block is due<br> /// to an "if condition".  If so, return the boolean condition that determines<br> /// which entry into BB will be taken.
Also, return by references the block<br> /// that will be entered from if the condition is true, and the block that will<br>@@ -206,12 +228,31 @@<br> /// instructions in them.<br> static Value *GetIfCondition(BasicBlock *BB, BasicBlock *&IfTrue,<br>                              BasicBlock *&IfFalse) {<br>-  PHINode *SomePHI = cast<PHINode>(BB->begin());<br>-  assert(SomePHI->getNumIncomingValues() == 2 &&<br>-         "Function can only handle blocks with 2 predecessors!");<br>-  BasicBlock *Pred1 = SomePHI->getIncomingBlock(0);<br>-  BasicBlock *Pred2 = SomePHI->getIncomingBlock(1);<br>+  PHINode *SomePHI = dyn_cast<PHINode>(BB->begin());<br>+  BasicBlock *Pred1 = NULL;<br>+  BasicBlock *Pred2 = NULL;<br><br>+  if (SomePHI) {<br>+    if (SomePHI->getNumIncomingValues() != 2)<br>+      return NULL;<br>+    Pred1 = SomePHI->getIncomingBlock(0);<br>+    Pred2 = SomePHI->getIncomingBlock(1);<br>+  } else {<br>+    SmallSetVector<BasicBlock *, 16> Preds(pred_begin(BB), pred_end(BB));<br>+    for (SmallSetVector<BasicBlock *, 16>::iterator PI = Preds.begin(),<br>+                                                    PE = Preds.end();<br>+         PI != PE; ++PI) {<br>+      if (Pred1 == NULL)<br>+        Pred1 = *PI;<br>+      else if (Pred2 == NULL)<br>+        Pred2 = *PI;<br>+      else<br>+        return NULL;<br>+    }<br>+    if (!Pred1 || !Pred2)<br>+      return NULL;<br>+  }<br>+<br>   // We can only handle branches.  Other control flow will be lowered to<br>   // branches if possible anyway.<br>   BranchInst *Pred1Br = dyn_cast<BranchInst>(Pred1->getTerminator());<br>@@ -4039,6 +4080,402 @@<br>   return false;<br> }<br><br>+/// If \param [in] BB has more than one predecessor that ends in a conditional<br>+/// branch, attempt to use parallel and/or for the branch condition.
\returns<br>+/// true on success.<br>+///<br>+/// Before:<br>+///   ......<br>+///   %cmp10 = fcmp une float %tmp1, %tmp2<br>+///   br i1 %cmp10, label %if.then, label %lor.rhs<br>+///<br>+/// lor.rhs:<br>+///   ......<br>+///   %cmp11 = fcmp une float %tmp3, %tmp4<br>+///   br i1 %cmp11, label %if.then, label %if.end<br>+///<br>+/// if.end:  // the merge block<br>+///   ......<br>+///<br>+/// if.then: // has two predecessors, both ending in conditional branches.<br>+///   ......<br>+///   br label %if.end;<br>+///<br>+/// After:<br>+///  ......<br>+///  %cmp10 = fcmp une float %tmp1, %tmp2<br>+///  ......<br>+///  %cmp11 = fcmp une float %tmp3, %tmp4<br>+///  %cmp12 = or i1 %cmp10, %cmp11    // parallel-or mode.<br>+///  br i1 %cmp12, label %if.then, label %if.end<br>+///<br>+///  if.end:<br>+///    ......<br>+///<br>+///  if.then:<br>+///    ......<br>+///    br label %if.end;<br>+///<br>+///  The current implementation handles two cases.<br>+///  Case 1: \param BB is on the else-path.<br>+///<br>+///          BB1<br>+///        /     |<br>+///       BB2    |<br>+///      /   \   |<br>+///     BB3   \  |     where BB1 and BB2 contain conditional branches.<br>+///      \    |  /     BB3 contains an unconditional branch.<br>+///       \   | /      BB4 corresponds to \param BB, which is also the merge.<br>+///  BB =>  BB4<br>+///<br>+///<br>+///  Corresponding source code:<br>+///<br>+///  if (a == b && c == d)<br>+///    statement; // BB3<br>+///<br>+///  Case 2: \param BB is on the then-path.<br>+///<br>+///             BB1<br>+///          /      |<br>+///         |      BB2<br>+///         \    /    |  where BB1 and BB2 contain conditional branches.<br>+///  BB =>    BB3      |  BB3 contains an unconditional branch and corresponds<br>+///           \     /    to \param BB.
BB4 is the merge.<br>+///             BB4<br>+///<br>+///  Corresponding source code:<br>+///<br>+///  if (a == b || c == d)<br>+///    statement;  // BB3<br>+///<br>+///  In both cases, \param BB is the common successor of conditional branches.<br>+///  In Case 1, \param BB (BB4) has an unconditional branch (BB3) as<br>+///  its predecessor.  In Case 2, \param BB (BB3) has only conditional branches<br>+///  as its predecessors.<br>+///<br>+bool SimplifyCFGOpt::SimplifyParallelAndOr(BasicBlock *BB, IRBuilder<> &Builder,<br>+                                           Pass *P) {<br>+  PHINode *PHI = dyn_cast<PHINode>(&BB->front());<br>+  if (PHI)<br>+    return false; // For simplicity, avoid cases containing PHI nodes.<br>+<br>+  BasicBlock *LCond = NULL;<br>+  BasicBlock *FCond = NULL;<br>+  BasicBlock *UCond = NULL;<br>+  int Idx = -1;<br>+<br>+  SmallSetVector<BasicBlock *, 16> Preds(pred_begin(BB), pred_end(BB));<br>+  // Check predecessors of \param BB.<br>+  for (SmallSetVector<BasicBlock *, 16>::iterator PI = Preds.begin(),<br>+                                                  PE = Preds.end();<br>+       PI != PE; ++PI) {<br>+    BasicBlock *Pred = *PI;<br>+    TerminatorInst *PTI = Pred->getTerminator();<br>+    BranchInst *PBI = dyn_cast<BranchInst>(PTI);<br>+<br>+    // All predecessors should terminate with a branch.<br>+    if (!PBI)<br>+      return false;<br>+<br>+    BasicBlock *PP = Pred->getSinglePredecessor();<br>+<br>+    if (PBI->isUnconditional()) {<br>+      // Case 1: Pred (BB3) ends in an unconditional branch. It should<br>+      // have a single successor and a single predecessor (BB2) that<br>+      // is also a predecessor of \param BB (BB4), and its address must<br>+      // not be taken.
Only one such unconditional<br>+      // branch may exist among the predecessors.<br>+      if (UCond || !PP || (Preds.count(PP) == 0) ||<br>+          (PTI->getNumSuccessors() != 1) || Pred->hasAddressTaken())<br>+        return false;<br>+<br>+      UCond = Pred;<br>+      continue;<br>+    }<br>+<br>+    // Only conditional branches are allowed beyond this point.<br>+    if (!PBI->isConditional())<br>+      return false;<br>+<br>+    // The condition's only use should be the branch instruction.<br>+    Value *PC = PBI->getCondition();<br>+    if (!PC || (PC->getNumUses() != 1))<br>+      return false;<br>+<br>+    if (PP && (Preds.count(PP) != 0)) {<br>+      // These are internal condition blocks to be merged from, e.g.,<br>+      // BB2 in both cases.<br>+      // Their addresses must not be taken.<br>+      if (Pred->hasAddressTaken())<br>+        return false;<br>+<br>+      // Instructions in the internal condition blocks should be safe<br>+      // to hoist up.<br>+      for (BasicBlock::iterator BI = Pred->begin(), BE = PBI; BI != BE;) {<br>+        Instruction *CI = BI++;<br>+        if (isa<PHINode>(CI) || CI->mayHaveSideEffects() ||<br>+            !isSafeToSpeculativelyExecute(CI))<br>+          return false;<br>+      }<br>+    } else {<br>+      // This is the condition block to be merged into, e.g., BB1 in<br>+      // both cases.<br>+      if (FCond)<br>+        return false;<br>+      FCond = Pred;<br>+    }<br>+<br>+    // The terminator must have exactly two successors.<br>+    if (PTI->getNumSuccessors() != 2)<br>+      return false;<br>+<br>+    // Check whether BB is uniformly on the true (or false) path<br>+    // for all of its predecessors.<br>+    BasicBlock *PS1 = PTI->getSuccessor(0);<br>+    BasicBlock *PS2 = PTI->getSuccessor(1);<br>+    BasicBlock *PS = (PS1 == BB) ? PS2 : PS1;<br>+    int CIdx = (PS1 == BB) ?
0 : 1;<br>+<br>+    if (Idx == -1)<br>+      Idx = CIdx;<br>+    else if (CIdx != Idx)<br>+      return false;<br>+<br>+    // PS is the successor which is not BB. Check successors to identify<br>+    // the last conditional branch.<br>+    if (Preds.count(PS) == 0) {<br>+      // Case 2.<br>+      // BB must have a unique successor.<br>+      TerminatorInst *TBB = BB->getTerminator();<br>+      if (TBB->getNumSuccessors() != 1)<br>+        return false;<br>+<br>+      BasicBlock *SBB = TBB->getSuccessor(0);<br>+      PHI = dyn_cast<PHINode>(&SBB->front());<br>+      if (PHI)<br>+        return false;<br>+<br>+      // PS (BB4) should be BB's successor.<br>+      if (SBB != PS)<br>+        return false;<br>+      LCond = Pred;<br>+    } else {<br>+      TerminatorInst *TPS = PS->getTerminator();<br>+      BranchInst *BPS = dyn_cast<BranchInst>(TPS);<br>+      if (BPS && BPS->isUnconditional()) {<br>+        // Case 1: PS (BB3) should end in an unconditional branch.<br>+        LCond = Pred;<br>+      }<br>+    }<br>+  }<br>+<br>+  if (FCond && LCond && (FCond != LCond)) {<br>+    // Do the transformation.<br>+    BasicBlock *CB;<br>+    bool ITER = true;<br>+    BasicBlock::iterator ItOld = Builder.GetInsertPoint();<br>+    TerminatorInst *PTI = FCond->getTerminator();<br>+    BranchInst *PBI = dyn_cast<BranchInst>(PTI);<br>+    Value *PC = PBI->getCondition();<br>+    do {<br>+      CB = PBI->getSuccessor(1 - Idx);<br>+      // Delete the conditional branch.<br>+      FCond->getInstList().pop_back();<br>+      FCond->getInstList().splice(FCond->end(), CB->getInstList());<br>+      PTI = FCond->getTerminator();<br>+      PBI = dyn_cast<BranchInst>(PTI);<br>+      Value *CC = PBI->getCondition();<br>+      // Merge conditions.<br>+      Builder.SetInsertPoint(PTI);<br>+      Value *NC;<br>+      if (Idx == 0)<br>+        // Case 2, use parallel or.<br>+        NC =
Builder.CreateAnd(PC, CC);<br>+<br>+      PBI->replaceUsesOfWith(CC, NC);<br>+      PC = NC;<br>+      if (CB == LCond)<br>+        ITER = false;<br>+      // Remove internal conditional branches.<br>+      CB->dropAllReferences();<br>+      // Make CB unreachable and let later cleanup delete the block.<br>+      new UnreachableInst(CB->getContext(), CB);<br>+    } while (ITER);<br>+<br>+    Builder.SetInsertPoint(ItOld);<br>+    DEBUG(dbgs() << "Use parallel and/or in:\n" << *FCond);<br>+    return true;<br>+  }<br>+<br>+  return false;<br>+}<br>+<br>+/// Compare blocks from two if-regions, where \param Head1 is the head of the<br>+/// 1st if-region. \param Head2 is the head of the 2nd if-region. \param<br>+/// Block1 is a block in the 1st if-region to compare. \param Block2 is a block<br>+/// in the 2nd if-region to compare.  \returns true if the blocks have identical<br>+/// instructions that do not alias with \param Head2 and it is legal to merge<br>+/// the two blocks so that only one instance of each instruction is kept.<br>+///<br>+bool SimplifyCFGOpt::CompareBlock(BasicBlock *Head1, BasicBlock *Head2,<br>+                                  BasicBlock *Block1, BasicBlock *Block2) {<br>+  TerminatorInst *PTI2 = Head2->getTerminator();<br>+  Instruction *PBI2 = Head2->begin();<br>+<br>+  if (Block1 == Head1) {<br>+    if (Block2 != Head2)<br>+      return false;<br>+  } else if (Block2 == Head2)<br>+    return false;<br>+  else {<br>+    // Check whether instructions in Block1 and Block2 are identical<br>+    // and do not alias with instructions in Head2.<br>+    BasicBlock::iterator iter1 = Block1->begin();<br>+    BasicBlock::iterator end1 = Block1->getTerminator();<br>+    BasicBlock::iterator iter2 = Block2->begin();<br>+    BasicBlock::iterator end2 = Block2->getTerminator();<br>+<br>+    while (1) {<br>+      if (iter1 == end1) {<br>+        if (iter2 != end2)<br>+          return false;<br>+        break;<br>+      }<br>+<br>+      if
(!iter1->isIdenticalTo(iter2))<br>+        return false;<br>+<br>+      // It is illegal to remove instructions with side effects, except<br>+      // non-volatile stores.<br>+      if (iter1->mayHaveSideEffects()) {<br>+        Instruction *CurI = &*iter1;<br>+        StoreInst *SI = dyn_cast<StoreInst>(CurI);<br>+        if (!SI || SI->isVolatile())<br>+          return false;<br>+      }<br>+<br>+      // For simplicity and speed, give up on instructions that read<br>+      // memory, so that no data-dependency check is needed for them.<br>+      if (iter1->mayReadFromMemory())<br>+        return false;<br>+<br>+      if (iter1->mayWriteToMemory()) {<br>+        for (BasicBlock::iterator BI = PBI2, BE = PTI2; BI != BE; ++BI) {<br>+          if (BI->mayReadFromMemory() || BI->mayWriteToMemory()) {<br>+            // Check alias with Head2.<br>+            if (!AA || AA->alias(iter1, BI))<br>+              return false;<br>+          }<br>+        }<br>+      }<br>+      ++iter1;<br>+      ++iter2;<br>+    }<br>+    ;<br>+  }<br>+<br>+  return true;<br>+}<br>+<br>+/// Check whether \param BB is the merge block of an if-region.  If so, check<br>+/// whether there exists an adjacent upstream if-region such that the two<br>+/// if-regions contain identical instructions and can be legally merged.
\returns true if<br>+/// the two if-regions are merged.<br>+///<br>+/// From:<br>+/// if (a)<br>+///   statement;<br>+/// if (b)<br>+///   statement;<br>+///<br>+/// To:<br>+/// if (a || b)<br>+///   statement;<br>+///<br>+bool SimplifyCFGOpt::MergeIfRegion(BasicBlock *BB, IRBuilder<> &Builder,<br>+                                   Pass *P) {<br>+  BasicBlock *IfTrue2, *IfFalse2;<br>+  Value *IfCond2 = GetIfCondition(BB, IfTrue2, IfFalse2);<br>+  if (!IfCond2)<br>+    return false;<br>+<br>+  Instruction *CInst2 = dyn_cast<Instruction>(IfCond2);<br>+  if (!CInst2)<br>+    return false;<br>+<br>+  BasicBlock *Head2 = CInst2->getParent();<br>+  if (Head2->hasAddressTaken())<br>+    return false;<br>+<br>+  BasicBlock *IfTrue1, *IfFalse1;<br>+  Value *IfCond1 = GetIfCondition(Head2, IfTrue1, IfFalse1);<br>+  if (!IfCond1)<br>+    return false;<br>+<br>+  Instruction *CInst1 = dyn_cast<Instruction>(IfCond1);<br>+  if (!CInst1)<br>+    return false;<br>+<br>+  BasicBlock *Head1 = CInst1->getParent();<br>+<br>+  // Either the then-path or the else-path should be empty.<br>+  if ((IfTrue1 != Head1) && (IfFalse1 != Head1))<br>+    return false;<br>+  if ((IfTrue2 != Head2) && (IfFalse2 != Head2))<br>+    return false;<br>+<br>+  TerminatorInst *PTI2 = Head2->getTerminator();<br>+  Instruction *PBI2 = Head2->begin();<br>+<br>+  if (!CompareBlock(Head1, Head2, IfTrue1, IfTrue2))<br>+    return false;<br>+<br>+  if (!CompareBlock(Head1, Head2, IfFalse1, IfFalse2))<br>+    return false;<br>+<br>+  // Check that \param Head2 has no side effects and is safe to speculate.<br>+  for (BasicBlock::iterator BI = PBI2, BE = PTI2; BI != BE; ++BI) {<br>+    Instruction *CI = BI;<br>+    if (isa<PHINode>(CI) || CI->mayHaveSideEffects() ||<br>+        !isSafeToSpeculativelyExecute(CI))<br>+      return false;<br>+  }<br>+<br>+  // Merge \param Head2 into \param Head1.<br>+  Head1->getInstList().pop_back();<br>+  Head1->getInstList().splice(Head1->end(), Head2->getInstList());<br>+
TerminatorInst *PTI = Head1->getTerminator();<br>+  BranchInst *PBI = dyn_cast<BranchInst>(PTI);<br>+  Value *CC = PBI->getCondition();<br>+  BasicBlock::iterator ItOld = Builder.GetInsertPoint();<br>+  Builder.SetInsertPoint(PTI);<br>+  Value *NC = Builder.CreateOr(CInst1, CC);<br>+  PBI->replaceUsesOfWith(CC, NC);<br>+  Builder.SetInsertPoint(ItOld);<br>+<br>+  // Remove IfTrue1<br>+  if (IfTrue1 != Head1) {<br>+    IfTrue1->dropAllReferences();<br>+    IfTrue1->eraseFromParent();<br>+  }<br>+<br>+  // Remove IfFalse1<br>+  if (IfFalse1 != Head1) {<br>+    IfFalse1->dropAllReferences();<br>+    IfFalse1->eraseFromParent();<br>+  }<br>+<br>+  // Remove \param Head2<br>+  Head2->dropAllReferences();<br>+  Head2->eraseFromParent();<br>+  DEBUG(dbgs() << "If conditions merged into:\n" << *Head1);<br>+  return true;<br>+}<br>+<br> /// Check if passing a value to an instruction will cause undefined behavior.<br> static bool passingValueIsAlwaysUndefined(Value *V, Instruction *I) {<br>   Constant *C = dyn_cast<Constant>(V);<br>@@ -4142,6 +4579,10 @@<br><br>   IRBuilder<> Builder(BB);<br><br>+  if (ParallelAndOr && TTI.hasBranchDivergence() &&<br>+      SimplifyParallelAndOr(BB, Builder))<br>+    return true;<br>+<br>   // If there is a trivial two-entry PHI node in this basic block, and we can<br>   // eliminate it, do so now.<br>   if (PHINode *PN = dyn_cast<PHINode>(BB->begin()))<br>@@ -4169,6 +4610,9 @@<br>     if (SimplifyIndirectBr(IBI)) return true;<br>   }<br><br>+  if (ParallelAndOr && TTI.hasBranchDivergence() && MergeIfRegion(BB, Builder))<br>+    return true;<br>+<br>   return Changed;<br> }<br><br>@@ -4178,6 +4622,6 @@<br> /// of the CFG.
It returns true if a modification was made.<br> ///<br> bool llvm::SimplifyCFG(BasicBlock *BB, const TargetTransformInfo &TTI,<br>-                       const DataLayout *TD) {<br>-  return SimplifyCFGOpt(TTI, TD).run(BB);<br>+                       const DataLayout *TD, AliasAnalysis *AA) {<br>+  return SimplifyCFGOpt(TTI, TD, AA).run(BB);<br> }<br>Index: lib/Transforms/Scalar/SimplifyCFGPass.cpp<br>===================================================================<br>--- lib/Transforms/Scalar/SimplifyCFGPass.cpp<span class="apple-tab-span">                   <span class="Apple-converted-space"> </span></span>(revision 184607)<br>+++ lib/Transforms/Scalar/SimplifyCFGPass.cpp<span class="apple-tab-span">                <span class="Apple-converted-space"> </span></span>(working copy)<br>@@ -27,6 +27,7 @@<br> #include "llvm/ADT/SmallVector.h"<br> #include "llvm/ADT/Statistic.h"<br> #include "llvm/Analysis/TargetTransformInfo.h"<br>+#include "llvm/Analysis/AliasAnalysis.h"<br> #include "llvm/IR/Attributes.h"<br> #include "llvm/IR/Constants.h"<br> #include "llvm/IR/DataLayout.h"<br>@@ -49,10 +50,13 @@<br><br>     virtual bool runOnFunction(Function &F);<br><br>-    virtual void getAnalysisUsage(AnalysisUsage &AU) const {<br>-      AU.addRequired<TargetTransformInfo>();<br>-    }<br>-  };<br>+  virtual void getAnalysisUsage(AnalysisUsage &AU) const {<br>+    AU.addRequired<TargetTransformInfo>();<br>+    AU.addRequired<AliasAnalysis>();<br>+  }<br>+private:<br>+  AliasAnalysis *AA;<br>+};<br> }<br><br> char CFGSimplifyPass::ID = 0;<br>@@ -301,7 +305,7 @@<br> /// iterativelySimplifyCFG - Call SimplifyCFG on all the blocks in the function,<br> /// iterating until no more changes are made.<br> static bool iterativelySimplifyCFG(Function &F, const TargetTransformInfo &TTI,<br>-                                   const DataLayout *TD) {<br>+                                   const DataLayout *TD, AliasAnalysis *AA) {<br>   bool Changed =
false;<br>   bool LocalChange = true;<br>   while (LocalChange) {<br>@@ -310,7 +314,7 @@<br>     // Loop over all of the basic blocks and remove them if they are unneeded...<br>     //<br>     for (Function::iterator BBIt = F.begin(); BBIt != F.end(); ) {<br>-      if (SimplifyCFG(BBIt++, TTI, TD)) {<br>+      if (SimplifyCFG(BBIt++, TTI, TD, AA)) {<br>         LocalChange = true;<br>         ++NumSimpl;<br>       }<br>@@ -324,11 +328,12 @@<br> // simplify the CFG.<br> //<br> bool CFGSimplifyPass::runOnFunction(Function &F) {<br>+  AA = &getAnalysis<AliasAnalysis>();<br>   const TargetTransformInfo &TTI = getAnalysis<TargetTransformInfo>();<br>   const DataLayout *TD = getAnalysisIfAvailable<DataLayout>();<br>   bool EverChanged = removeUnreachableBlocksFromFn(F);<br>   EverChanged |= mergeEmptyReturnBlocks(F);<br>-  EverChanged |= iterativelySimplifyCFG(F, TTI, TD);<br>+  EverChanged |= iterativelySimplifyCFG(F, TTI, TD, AA);<br><br>   // If neither pass changed anything, we're done.<br>   if (!EverChanged) return false;<br>@@ -342,7 +347,7 @@<br>     return true;<br><br>   do {<br>-    EverChanged = iterativelySimplifyCFG(F, TTI, TD);<br>+    EverChanged = iterativelySimplifyCFG(F, TTI, TD, AA);<br>     EverChanged |= removeUnreachableBlocksFromFn(F);<br>   } while (EverChanged);<br><br>Index: lib/CodeGen/BasicTargetTransformInfo.cpp<br>===================================================================<br>--- lib/CodeGen/BasicTargetTransformInfo.cpp<span class="apple-tab-span">                    <span class="Apple-converted-space"> </span></span>(revision 183763)<br>+++ lib/CodeGen/BasicTargetTransformInfo.cpp<span class="apple-tab-span">                 <span class="Apple-converted-space"> </span></span>(working copy)<br>@@ -63,6 +63,8 @@<br>     return this;<br>   }<br><br>+  virtual bool hasBranchDivergence() const;<br>+<br>   /// \name Scalar TTI Implementations<br>   /// @{<br><br>@@ -122,6 +124,7 @@<br>   return new BasicTTI(TLI);<br> }<br><br>+bool
BasicTTI::hasBranchDivergence() const { return false; }<br><br> bool BasicTTI::isLegalAddImmediate(int64_t imm) const {<br>   return TLI->isLegalAddImmediate(imm);<o:p></o:p></span></div><div style="margin: 0in 0in 0.0001pt; font-size: 12pt; font-family: 'Times New Roman', serif;"><span style="font-size: 6pt;"><br><br><br><br>_______________________________________________<br>llvm-commits mailing list<br><a href="mailto:llvm-commits@cs.uiuc.edu" style="color: purple; text-decoration: underline;">llvm-commits@cs.uiuc.edu</a><br><a href="http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits">http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits</a><o:p></o:p></span></div></div></div><div style="margin: 0in 0in 0.0001pt; font-size: 12pt; font-family: 'Times New Roman', serif;"><o:p> </o:p></div></div></div></blockquote></div><br></body></html>