[llvm-commits] PATCH: Add a function splitting pass to LLVM which extracts cold regions into their own functions

Mon May 21 05:06:17 PDT 2012

Hi Andrew, hi Chandler,

sorry for jumping in a little late.

On 05/14/2012 10:52 PM, Andrew Trick wrote:
> On May 14, 2012, at 11:30 AM, Chandler Carruth <chandlerc at gmail.com
> <mailto:chandlerc at gmail.com>> wrote:
>
>> On Mon, May 14, 2012 at 11:23 AM, Andrew Trick <atrick at apple.com
>> <mailto:atrick at apple.com>> wrote:
>>
>>
>>     On May 5, 2012, at 12:54 AM, Chandler Carruth <chandlerc at gmail.com
>>     <mailto:chandlerc at gmail.com>> wrote:
>>     > I haven't run benchmarks, and this patch doesn't turn the pass
>>     on, I just want to get initial feedback, and get the code into the
>>     shape where I can put it in-tree, and then look at turning it on
>>     if benchmarks prove positive.
>>
>>     Chandler,
>>
>>     Very nice.
>>
>>     I'm dismayed that you depend on RegionInfo, but it is
>>     understandable. We could probably improve RegionInfo to be more
>>     efficient and self-contained (no DomFrontier), but I think we'd be
>>     much better off dropping RegionInfo and handling SEME regions.

First of all some information about the RegionInfo pass:

=======================================================================
The very first paper I looked at when implementing it was "The
Program Structure Tree: Computing Control Regions in Linear Time
(1994)". Nadav already mentioned it, and it is also cited in the
"RegionInfo.h" file.

The algorithm described in this paper is very easy to implement.
However, it only calculates SESE regions, where SESE means a region with
a single entry and a single exit _edge_. The word _edge_ is very
important here, as the algorithm misses a lot of regions that are not
SESE _edge_ regions but could easily be transformed into them.

Have a look at:

opt -view-regions-only -only-simple-regions 
test/Analysis/RegionInfo/condition_complicated.ll

None of the regions in this example would be detected by the first
algorithm, because they have both multiple entry and multiple exit
edges. However, transforming them into regions with a single entry and a
single exit _edge_ is simple. We call regions that are not single entry
single exit edge regions, but can easily be transformed into such
regions, 'refined' regions. Plain single entry single exit _edge_
regions are called 'simple'.
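
To make the distinction concrete, here is a minimal Python sketch on a
toy CFG given as an edge list. It is not RegionInfo code: the function
name classify_region and the block names are made up for illustration,
and it only counts entry/exit edges and their target blocks. It ignores
the dominance conditions the real analysis checks.

```python
def classify_region(edges, blocks):
    """Classify a candidate block set as 'simple', 'refined' or 'none'.

    edges  -- iterable of (src, dst) CFG edges
    blocks -- set of blocks inside the candidate region; the exit
              block itself is NOT part of the set
    """
    # Edges entering the block set from outside, and edges leaving it.
    entry_edges = [(u, v) for u, v in edges if u not in blocks and v in blocks]
    exit_edges = [(u, v) for u, v in edges if u in blocks and v not in blocks]
    entry_targets = {v for _, v in entry_edges}
    exit_targets = {v for _, v in exit_edges}

    # All entry edges must hit one block, all exit edges must hit one block.
    if len(entry_targets) != 1 or len(exit_targets) != 1:
        return "none"
    # Exactly one entry edge and one exit edge: a SESE _edge_ region.
    if len(entry_edges) == 1 and len(exit_edges) == 1:
        return "simple"
    # Several edges, but a single entry block and a single exit block.
    return "refined"


# Toy CFG: two predecessors jump to 'head', both arms leave to 'join'.
EDGES = [
    ("a", "head"), ("b", "head"),
    ("head", "left"), ("head", "right"),
    ("left", "join"), ("right", "join"),
]

print(classify_region(EDGES, {"head", "left", "right"}))
print(classify_region(EDGES, {"left"}))
print(classify_region(EDGES, {"left", "right"}))
```

In this toy graph the diamond {head, left, right} has two entry edges
and two exit edges, so an edge-based SESE algorithm would skip it, even
though inserting one block in front of 'head' and one in front of
'join' would turn it into a 'simple' region.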

For my work on Polly, I wanted to detect both 'simple' and 'refined'
regions. The paper 'The Refined Process Structure Tree - Jussi
Vanhatalo, Hagen Voelzer, Jana Koehler - 2009' describes an algorithm
that detects such regions in linear time. Unfortunately, the proposed
algorithm is complex to implement (and I am not convinced it is fast in
the common cases).

At that point I started an IRC discussion about what to do, and Chris
suggested looking for an algorithm based on dominance information. I did
what he said and came up with the existing algorithm, which is rather
simple because it takes advantage of dominance information.
Unfortunately, I think I misunderstood Chris, as I later realized he did
not want me to use the DominanceFrontier analysis, but just the normal
DominatorTree.

So why did I not change the algorithm and remove the use of 
DominanceFrontier? There are two reasons:

1) It was not, and still is not, obvious to me how to do so

I did some analysis on SPEC and polyhedron.com and there are 3-8 times 
more 'refined' regions than 'simple' regions [1]. For the use in Polly, 
I wanted to retain the ability to optimize 'refined' regions.

Unfortunately, until today I was not aware of an algorithm that has
linear complexity, is fast in practice, and can calculate 'refined'
regions.

2) For many use cases the existing solution works perfectly fine

The RegionInfo analysis is obviously non-linear as it uses the 
DominanceFrontier analysis. However, for me it works surprisingly well.
When I committed it I tested it on the LLVM test suite, polyhedron.com 
and the SPEC 2006 benchmarks [1]. I got the following timings (in seconds):

Name             DomTree  PostDomTree DomFrontier RegionTree
SPEC 2006        1.109    0.911       0.525       0.662
Polyhedron.com   0.034    0.029       0.016       0.022

On these examples the DomFrontier and RegionTree calculation is very
fast. Also, I have used RegionInfo for over two years in my daily Polly
work, and I know people who test Polly on their larger internal test
suites. I _never_ got any complaints about RegionInfo speed issues. So
RegionInfo is for now good enough for my work.

I also heard that RegionInfo is used in some commercial compilers, but 
did not hear of any performance problems there.

Reason 2) does not mean I am not in favor of improving RegionInfo. Quite
the contrary, I support any work that makes RegionInfo usable on the
LLVM machine CFGs or that allows it to be used as a clang default
analysis pass. I just raised that point to explain why I did not spend a
large amount of time on 1).
=======================================================================

Andrew:
 > I'm dismayed that you depend on RegionInfo, but it is understandable. 
 > We could probably improve RegionInfo to be more efficient and
 > self-contained (no DomFrontier), but I think we'd be much better off
 > dropping RegionInfo and handling SEME regions. AFAIK CodeExtractor
 > can already handle that. Is there any particular reason you need
 > RegionInfo other than finding a connected set of blocks dominated by 
 > a cold block? Can FunctionSplitter just find its own regions?

Andrew, can you define what you mean by SEME region? As written above, 
RegionInfo already detects SEME regions with a single entry block and 
multiple exit edges that terminate at the same block. What is the 
_exact_ kind of SEME region you are talking about? Does it require 
single entry edge or do you allow various entry edges if they terminate 
at the same basic block?

Also, is there a good algorithm available to detect such regions? I am
very much in favor of replacing the existing RegionInfo algorithm with
something faster _and_ more generic.

Also, supporting SEME regions with one entry block would be great
as this would make the LoopInfo tree a subtree of the RegionTree. So
a LoopTree would be a simple RegionTree that is already available by 
default. Also, the code extractor

Chandler:
 > I'm dismayed that RegionInfo is still in the tree and not being
 > improved. ;] I think we should fix RegionInfo to be efficient and
 > reasonably well implemented. It as *already* handling SEME regions
 > except for the "ME" case of function returns (something that is
 > trivial to fix).

I explained above why RegionInfo is good enough for our cases and why
improving it is not straightforward. Advice on how to make it better is
highly welcome.

Andrew:
> There are two issues then, Region API vs. Region discovery.
>
> A convenient API for visiting and traversing regions is nice, though by
> design it's never needed in LLVM. A problem with the existing RegionInfo
> API is that it is superficially limited to SESE--though you say
> otherwise.

Not exactly SESE, but yes. I don't see a reason not to widen the API to
a more generic SEME region, provided there is an algorithm to detect
such regions.

Andrew:
> That may be right for some clients, but then those clients
> likely don't need the region discovery that it provides. They can
> probably get by with finding control equivalent regions, which is
> trivial if you have postdominators--walk down in the dom tree and up in
> the posdom tree at the same time.

Are you referring to the SESE regions as defined in the paper
"The Program Structure Tree: Computing Control Regions in Linear Time"?

As explained above, they have a very low resolution.
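
For reference, the control equivalence test Andrew describes can be
sketched on a toy CFG as follows: block A and block B are control
equivalent when A dominates B and B postdominates A. This is a naive
set-based sketch in Python, not LLVM code; the helper names
(dominators, reverse, control_equivalent_pairs) are made up, the
dominator computation is the simple iterative one and not linear, and
all blocks are assumed reachable from the entry and able to reach the
exit.

```python
def dominators(succ, root):
    """Return {node: set of its dominators} by naive fixed-point iteration."""
    nodes = set(succ) | {s for ss in succ.values() for s in ss}
    pred = {n: set() for n in nodes}
    for n, ss in succ.items():
        for s in ss:
            pred[s].add(n)
    # Start from "everything dominates everything" and shrink.
    dom = {n: set(nodes) for n in nodes}
    dom[root] = {root}
    changed = True
    while changed:
        changed = False
        for n in nodes - {root}:
            preds = [dom[p] for p in pred[n]]
            new = ({n} | set.intersection(*preds)) if preds else {n}
            if new != dom[n]:
                dom[n] = new
                changed = True
    return dom


def reverse(succ):
    """Return the CFG with every edge flipped."""
    rev = {}
    for n, ss in succ.items():
        rev.setdefault(n, [])
        for s in ss:
            rev.setdefault(s, []).append(n)
    return rev


def control_equivalent_pairs(succ, entry, exit_):
    """Pairs (a, b), a != b, where a dominates b and b postdominates a."""
    dom = dominators(succ, entry)
    # Postdominators are dominators of the reversed CFG, rooted at the exit.
    pdom = dominators(reverse(succ), exit_)
    return sorted((a, b) for b in dom for a in dom[b]
                  if a != b and b in pdom[a])


# Toy CFG: a single if/else diamond.
CFG = {
    "entry": ["cond"],
    "cond": ["then", "else"],
    "then": ["join"],
    "else": ["join"],
    "join": ["exit"],
    "exit": [],
}

for pair in control_equivalent_pairs(CFG, "entry", "exit"):
    print(pair)
```

On this diamond, 'cond' and 'join' come out as control equivalent,
while 'then' and 'else' pair with nothing, since neither arm
postdominates 'cond'.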

Andrew:
> Maybe you just want a neat Region iterator API utility that could be
> used by anyone doing region discovery, including RegionInfo. But you
> wouldn't need to pull in RegionInfo analysis.

Sounds like a good idea.

Chandler:
 > It would be better to have such clarification in documentation,
 > ideally in the header files of these passes. This is a very "meta"
 > point, but it is frustrating to contributors to have arbitrary
 > restrictions materialize only after designing an optimization. I
 > would rather that people are aware of the constraints they need to
 > work within if they want to implement a particular optimization pass.
 > Essentially, if there are analyses or passes which are known to be
 > unacceptable for the normal compilation pipeline, I think that would
 > be an important thing to mention in the high-level comments for the
 > pass. (It is possible that I missed such comments, or that I never
 > looked in the right place, but I feel like the fact that dom-frontier
 > and region-info is essentially on the chopping block should be more
 > clear than it currently is... ;]

True. We should clearly document which passes are acceptable in the 
default optimization chain.

I also looked into the literature again and found two interesting papers:

1) Simplified Computation and Generalization of
    the Refined Process Structure Tree
    Artem Polyvyanyy, Jussi Vanhatalo, and Hagen Völzer, 2010

A very recent paper that gives us high-resolution SESE regions
('simple' and 'refined' and even a little bit more) in linear time.
They state that this algorithm is simpler than the previous one.
Basically, it seems to be equivalent to calculating the triconnected
components.

2) Code Compaction of Matching Single-Entry Multiple-Exit Regions
    Wen-ke Chen, Bengu Li, Rajiv Gupta, 2003

I missed this one earlier. They calculate SEME regions based on the 
control dependence graph.

I am currently a little busy, but I like the idea of moving to SEME
regions and having a generic Region iterator API. We could test this API
with the current RegionInfo and the LoopInfo passes as region discovery
algorithms (one more precise, the other available by default). If a
more generic and faster SEME region algorithm is implemented later, the
users of the Region iterator API would automatically take advantage of
it.
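
A rough sketch of what I mean by separating the iterator API from the
discovery algorithm, written as a toy Python model rather than LLVM
code. The Region class and walk_regions are hypothetical names; the
only point is that the traversal works on any region tree, no matter
which analysis built it.

```python
from dataclasses import dataclass, field


@dataclass
class Region:
    """A node of a region tree; how it was discovered is irrelevant here."""
    entry: str
    blocks: frozenset
    children: list = field(default_factory=list)


def walk_regions(root):
    """Yield (depth, region) pairs in pre-order, outermost region first."""
    stack = [(0, root)]
    while stack:
        depth, region = stack.pop()
        yield depth, region
        # Push children reversed so they are visited in their listed order.
        for child in reversed(region.children):
            stack.append((depth + 1, child))


# A tree that could equally come from a loop nest or a region analysis.
inner = Region("inner.header", frozenset({"inner.header", "inner.body"}))
outer = Region("outer.header",
               frozenset({"outer.header", "inner.header", "inner.body"}),
               [inner])
cold = Region("cold", frozenset({"cold"}))
top = Region("entry", outer.blocks | cold.blocks | {"entry"}, [outer, cold])

for depth, region in walk_regions(top):
    print("  " * depth + region.entry)
```

A client such as a function splitter would only iterate over the tree
and would not care whether it came from RegionInfo, LoopInfo, or a
future SEME algorithm.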

Cheers
Tobi