submit [PATCH] SimplifyCFG for code review

Fri Jun 21 00:31:40 PDT 2013

Sent from my iPad

On Jun 20, 2013, at 10:38 PM, "Ye, Mei" <Mei.Ye at amd.com> wrote:

> Hi Evan
>  
> This transformation reduces branches.  It can benefit architectures with deep pipelines, less powerful branch predictors, or branch divergence on GPUs.
> I did have a GPU benchmark that shows roughly 1.5X performance improvement.

Yes, I except this can make improvements to GPUs. The problem is the same transformation can hurt many CPUs. 

>  
> But on the other hand, there is probably very few optimizations that can benefit all architectures.  And it is also unrealistic to have performance measurement on all architectures to justify an optimization item.    What I am seeing is that compiler vendors have a tendency to push codes into their target space as much as possible, often at the expense of code quality that minimizes code-sharing and increases compilation time.   
>  
> There is definitely a need to enable target-specific tuning in machine-independent optimizations.  Is there a guide line on a good approach to do this?  I have seen some cases that rely on threshold tuning, which can be non-deterministic and therefore unreliable.

LLVM's philosophy on this is a bit different from other compilers. The transformation passes fall under two general categories: 1. Optimizations, 2. Canonicalization. Unfortunately the optimization pipeline currently doesn't make a good distinction of what passes fall in which category. SimplifyCFG is *mostly* about canonicalization. That is, it's meant to transform llvm IR which expose more optimization opportunities. I think it is not an ideal place for the proposed transformation. 

The approach I would take is a separate height reduction pass. It should be a late pass that is run before codegen (think codegenprep). You can introduce target hooks that allow each target to specify when is the transformation profitable for the given target. If the community can't decide on the appropriate target hooks, then at least you can insert the pass into the pipeline for your target of interests. As it is, I don't think the patch would be acceptable without having proven it provides benefit to at least the most widely used CPU targets (e.g. ARM, x86). 

Evan

>  
> -Mei    
>  
>  
>  
>  
> From: Evan Cheng [mailto:evan.cheng at apple.com] 
> Sent: Thursday, June 20, 2013 4:51 PM
> To: Ye, Mei
> Cc: Sean Silva; llvm-commits at cs.uiuc.edu
> Subject: Re: submit [PATCH] SimplifyCFG for code review
>  
> Hi Mei,
>  
> I have concerns about the general approach. It's not clear to me this kind of transformation is generally a win on all modern architectures. As an example from your code:
>  
> +/// Before:
> +///   ......
> +///   %cmp10 = fcmp une float %tmp1, %tmp2
> +///   br i1 %cmp1, label %if.then, label %lor.rhs
> +///
> +/// lor.rhs:
> +///   ......
> +///   %cmp11 = fcmp une float %tmp3, %tmp4
> +///   br i1 %cmp11, label %if.then, label %ifend
> +///
> +/// if.end:  // the merge block
> +///   ......
> +///
> +/// if.then: // has two predecessors, both of them contains conditional branch.
> +///   ......
> +///   br label %if.end;
> +///
> +/// After:
> +///  ......
> +///  %cmp10 = fcmp une float %tmp1, %tmp2
> +///  ......
> +///  %cmp11 = fcmp une float %tmp3, %tmp4
> +///  %cmp12 = or i1 %cmp10, %cmp11    // parallel-or mode.
> +///  br i1 %cmp12, label %if.then, label %ifend
> +///
> +///  if.end:
> +///    ......
> +/// 
> +///  if.then:
> +///    ......
> +///    br label %if.end;
>  
> The pre-transformed code is probably going to run faster than the transformed code on a modern Intel x86 processor. On ARM processors, saving the value of conditional registers and or'ing them is probably also quite expensive. I also suspect the transformation will prohibit the target from some if-conversion opportunities.
>  
> Have you thought about having targets providing cost information to the pass? Alternatively this may be done as a codegen pass?
>  
> Evan
>  
>  
> On Jun 20, 2013, at 4:26 PM, "Ye, Mei" <Mei.Ye at amd.com> wrote:
> 
> 
> Hi Sean
>  
> See the attached new patch, which fixes all issues brought up by you.
> I ran “clang-format” on whole files which makes more diffs.  Hopefully this is OK with you.
> It will be nice if the check-in process can automatically run “clang-format”.
>  
> -Mei
>  
> From: Sean Silva [mailto:silvas at purdue.edu] 
> Sent: Tuesday, June 18, 2013 4:48 PM
> To: Ye, Mei
> Cc: llvm-commits at cs.uiuc.edu
> Subject: Re: submit [PATCH] SimplifyCFG for code review
>  
>  
>  
> 
> On Tue, Jun 18, 2013 at 4:27 PM, Ye, Mei <Mei.Ye at amd.com> wrote:
> Hi Sean
>  
> There are some considerations that go into this test sample to create an opportunity for the transformation.
> The test case should run *only* the simplifycfg pass (I see now that you have `-O3` in your test case; please remove `-O3`). You can probably just write a simple .ll function with IR that looks like the "Before" case in your documentation comment. 
>  
> Also,
>  
> +; RUN: opt < %s -O3 -simplifycfg -S | grep br | count 2
> +%struct.float2_tag = type { float, float }
>  
> Please use FileCheck for all new tests. Using grep is deprecated.
>  
> -- Sean Silva
> <odc_simplifycfg>_______________________________________________
> llvm-commits mailing list
> llvm-commits at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>  
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20130621/06c07434/attachment.html>