[patch] simplifyCFG for review

Sun Jul 28 15:10:22 PDT 2013

Hi Nick

The transformation is not expensive.  It is early inlining that bloats the code, so each newly added pass has to process a significant amount of code.  I agree that this is a design mishap inside my organization.  I will refactor the code following your suggestion.  If the check-in process can't be finished next week, please allow some delay, since my sabbatical is coming up.

-Mei

From: Andrew Trick [mailto:atrick at apple.com]
Sent: Sunday, July 28, 2013 2:43 AM
To: Ye, Mei
Cc: Matt Arsenault; Evan Cheng; llvm commits; Chandler Carruth; Koenig, Christian
Subject: Re: [patch] simplifyCFG for review

On Jul 27, 2013, at 12:17 PM, "Ye, Mei" <Mei.Ye at amd.com> wrote:

Hi all

Code refactoring and reorganization to separate canonicalization and optimization sounds like a good long-term solution.   These two functionalities aren't always well separated in existing "generic" passes.  From time to time, there are demands to fine-tune a "generic" pass for the underlying target, e.g., CSE can increase register pressure, and strength reduction must consider the target's addressing modes.   It is probably a no-win battle to argue whether certain opts are "generic" enough.

W.R.T. Andy's question on the ordering of MergeIfRegion and SimplifyParallelAndOr: if the optimization bails out and iterates whenever the CFG changes, then there is no ordering concern.   If one iteration can invoke more than one transformation (which can improve compilation time), then it makes sense to put SimplifyParallelAndOr before MergeIfRegion, since the former can expose opportunities for the latter.  For example, when an if-region is inside a loop and the loop is completely unrolled, the resulting instances of the if-region can be merged after the height of the conditions is reduced.
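
A source-level sketch of the ordering argument above, under two assumptions that the thread does not spell out: that SimplifyParallelAndOr collapses a short-circuit condition chain into a single branch, and that MergeIfRegion merges adjacent if-regions guarded by the same condition. The function names are hypothetical and this is an analogy in plain C++, not the patch's code.

```cpp
#include <cassert>

// Original shape: two adjacent if-regions. Each short-circuit `a || b`
// typically lowers to a chain of two conditional branches.
int original(bool a, bool b, int x) {
    if (a || b) x += 1;
    if (a || b) x *= 2;
    return x;
}

// After a SimplifyParallelAndOr-like step (assumed behaviour): the chain
// becomes one side-effect-free condition, so each region has one branch.
int flattened(bool a, bool b, int x) {
    bool c = a | b;
    if (c) x += 1;
    if (c) x *= 2;
    return x;
}

// After a MergeIfRegion-like step (assumed behaviour): the two regions now
// visibly share one condition and can be merged. This is valid here only
// because the first body does not modify the condition.
int merged(bool a, bool b, int x) {
    bool c = a | b;
    if (c) { x += 1; x *= 2; }
    return x;
}

// Exhaustively checks that the three shapes agree on a small input range.
void checkEquivalence() {
    for (int a = 0; a < 2; ++a)
        for (int b = 0; b < 2; ++b)
            for (int x = -3; x <= 3; ++x) {
                assert(original(a != 0, b != 0, x) == flattened(a != 0, b != 0, x));
                assert(flattened(a != 0, b != 0, x) == merged(a != 0, b != 0, x));
            }
}
```

The merge only becomes a simple pattern match once both regions branch on a single condition, which is why running the condition flattening first can expose the merge.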

Mei,

It does not appear that you need to interleave your new transforms with the standard SimplifyCFG transforms. Making them a utility for use in target passes would add a lot of flexibility.

The only issue driving the less flexible design then is compile time. I think I misunderstood the issue, which you and Christian have just clarified. It's not that the SimplifyCFG pass carries a significant cost (I wouldn't expect it to); it's actually that your transformation itself is costly, and you want to apply it before inlining.

If my assertions above are true, then Christian's suggestion was excellent if it can be made to work. Here's a concrete proposal:

- Move SimplifyParallelAndOr and MergeIfRegion into a new file, Transforms/Utils/FlattenCFG.cpp, or pick a better filename. I actually think what you're doing is branch folding, but that name is taken.

- Expose the SimplifyParallelAndOr and MergeIfRegion in Transforms/Utils/FlattenCFG.hpp.

- Create a simple pass in Target/R600 to run your utils on the CFG. Invoke it in the target pass pipeline, maybe in addCodeGenPrepare(). I don't see much value in sharing the pass driver code, and it can always be factored later.

- Measure the compile-time impact. How bad is it really? If it's a problem, your current approach won't work in the long-term anyway because we plan to move OptimizeCFG outside of CGSCC (after inlining).

- If you need to run FlattenCFG stuff before inlining, follow Christian's suggestion.
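
The split proposed above (two standalone utilities plus a thin target-side driver) can be sketched with a toy model. Everything here is hypothetical: `IfRegion`, `flattenGuard`, `mergeWithNext`, and `runFlattenCFG` are illustration-only stand-ins and do not use or mirror any LLVM API or the patch itself.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy stand-in for an if-region: a guard made of OR-ed terms plus a body.
// This is NOT LLVM IR; every name in this block is hypothetical.
struct IfRegion {
    std::vector<std::string> terms;  // guard: terms[0] || terms[1] || ...
    bool singleBranch;               // true once the guard is one branch
    std::vector<std::string> body;   // statements run when the guard holds
};

// Utility 1 (stand-in for SimplifyParallelAndOr): mark a short-circuit
// chain as collapsed into one branch. Returns true if the region changed.
bool flattenGuard(IfRegion &r) {
    assert(!r.terms.empty());
    if (r.singleBranch) return false;
    r.singleBranch = true;
    return true;
}

// Utility 2 (stand-in for MergeIfRegion): merge region i+1 into region i
// when both are single-branch and share the same guard.
bool mergeWithNext(std::vector<IfRegion> &f, std::size_t i) {
    if (i + 1 >= f.size()) return false;
    IfRegion &a = f[i];
    const IfRegion &b = f[i + 1];
    if (!a.singleBranch || !b.singleBranch || a.terms != b.terms) return false;
    a.body.insert(a.body.end(), b.body.begin(), b.body.end());
    f.erase(f.begin() + static_cast<std::ptrdiff_t>(i) + 1);
    return true;
}

// Thin driver (stand-in for the small target pass): flatten every guard,
// then merge, repeating until nothing changes. Returns true on any change.
bool runFlattenCFG(std::vector<IfRegion> &f) {
    bool everChanged = false;
    bool changed = true;
    while (changed) {
        changed = false;
        for (IfRegion &r : f) changed |= flattenGuard(r);
        for (std::size_t i = 0; i < f.size(); ++i)
            while (mergeWithNext(f, i)) changed = true;
        everChanged |= changed;
    }
    return everChanged;
}
```

Keeping the utilities free of driver logic is what lets a target pass, or a different driver later, call them in whatever order and at whatever pipeline position suits it.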

Incidentally, it seems logical to me that you should be able to implement inlining as a function pass provided by TargetMachine. I know that's verboten in the initial passmanager, but by the time we reach codegen, all the function bodies should be complete. Then somehow forcing a CGSCC pass manager would allow you to visit functions in call-tree bottom-up order. It's likely this is impossible though, and you'd need to split your target pass pipeline with a CGSCC inliner in the middle.

-Andy

Thanks.

-Mei

From: Andrew Trick [mailto:atrick at apple.com]
Sent: Saturday, July 27, 2013 1:56 AM
To: Ye, Mei
Cc: Matt Arsenault; Evan Cheng; llvm commits; Chandler Carruth
Subject: Re: [patch] simplifyCFG for review

On Jul 27, 2013, at 1:30 AM, Chandler Carruth <chandlerc at google.com> wrote:

On Sat, Jul 27, 2013 at 1:07 AM, Andrew Trick <atrick at apple.com> wrote:
(1) In terms of code organization, these anti-canonical, target-selected transforms should live somewhere else. I kept quiet because I hadn't come up with an alternative, but we can do that now. OptimizeCFG.cpp?

It would be even more self-evident to group all of these types of branch avoidance utils into a FlattenCFG package. So I'm changing my vote to FlattenCFG.cpp.

To focus on the immediate issue: I agree.

Design-wise, I would suggest one step further: I think that CFG-flattening of this form is somewhat specialized and we should just build a nice specialized set of optimizations targeting that use case, and have the targets add those passes from their target machine rather than monkey with the general purpose PMB. We can't easily get this right in the PMB because of the silly way CGSCC stuff is defined, and I think that this is likely to be best as a late-stage CFG pass anyways, not unlike LSR, etc. I'm not sure that it really has anything to do with SimplifyCFG (or OptimizeCFG, which I've begun to think is somewhat likely to make more sense in MI w/ register pressure and critical path info). So, my vote is for a more targeted tool in the toolbox.

Mei,

You seem to have a reason for running SimplifyParallelAndOr after the basic cleanup transforms but before branch simplification, whereas MergeIfRegion runs only after all other simplifications. That does seem intuitive to me, but I wonder if it is absolutely necessary. Can you illustrate with an example or two why the simplifications need to be interleaved this way? Are you just trying to avoid the need for another round of iteration in the SimplifyCFGPass driver? I think the direction of design refactoring depends on it.

-Andy
