[llvm-commits] PATCH: Add a function splitting pass to LLVM which extracts cold regions into their own functions
Tobias Grosser
tobias at grosser.es
Mon May 21 05:06:17 PDT 2012
Hi Andrew, hi Chandler,
sorry for jumping in a little late.
On 05/14/2012 10:52 PM, Andrew Trick wrote:
> On May 14, 2012, at 11:30 AM, Chandler Carruth <chandlerc at gmail.com
> <mailto:chandlerc at gmail.com>> wrote:
>
>> On Mon, May 14, 2012 at 11:23 AM, Andrew Trick <atrick at apple.com
>> <mailto:atrick at apple.com>> wrote:
>>
>>
>> On May 5, 2012, at 12:54 AM, Chandler Carruth <chandlerc at gmail.com
>> <mailto:chandlerc at gmail.com>> wrote:
>> > I haven't run benchmarks, and this patch doesn't turn the pass
>> on, I just want to get initial feedback, and get the code into the
>> shape where I can put it in-tree, and then look at turning it on
>> if benchmarks prove positive.
>>
>> Chandler,
>>
>> Very nice.
>>
>> I'm dismayed that you depend on RegionInfo, but it is
>> understandable. We could probably improve RegionInfo to be more
>> efficient and self-contained (no DomFrontier), but I think we'd be
>> much better off dropping RegionInfo and handling SEME regions.
First of all some information about the RegionInfo pass:
=======================================================================
The very first paper I looked at when implementing it was "The
Program Structure Tree: Computing Control Regions in Linear Time
(1994)". Nadav already mentioned it, and it is also cited in the
"RegionInfo.h" file.
The algorithm described in this paper is very easy to implement.
However, it only calculates SESE regions, where SESE means a region with
a single entry and a single exit _edge_. The word _edge_ is very
important here, as the algorithm misses a lot of regions that are not
SESE _edge_ regions themselves but could easily be transformed into them.
Have a look at:
  opt -view-regions-only -only-simple-regions test/Analysis/RegionInfo/condition_complicated.ll
None of the regions in this example would be detected by the first
algorithm, because they have both multiple entry and multiple exit
edges. However, transforming them into regions with a single entry and a
single exit _edge_ is simple. We call regions that are not single entry
single exit edge regions, but can easily be transformed into such
regions, 'refined' regions. Plain single entry single exit _edge_
regions are called 'simple'.
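To make the distinction concrete, here is a minimal Python sketch (not the RegionInfo implementation). It assumes a CFG given as a successor dictionary, and it assumes that 'refined' means: all entry edges target one block inside the set and all exit edges target one block outside it. The name classify_region and the example CFG are made up for illustration, and the dominance conditions a real region must also satisfy are ignored.

```python
def classify_region(succ, blocks):
    """Return 'simple', 'refined', or None for a set of CFG blocks.

    succ maps each block to its list of successors. Only the edge counts
    are checked here; dominance requirements are deliberately left out.
    """
    blocks = set(blocks)
    # Edges that cross the region boundary, in each direction.
    entry_edges = [(u, v) for u in succ for v in succ[u]
                   if u not in blocks and v in blocks]
    exit_edges = [(u, v) for u in succ for v in succ[u]
                  if u in blocks and v not in blocks]
    entry_blocks = {v for _, v in entry_edges}
    exit_blocks = {v for _, v in exit_edges}
    if len(entry_blocks) != 1 or len(exit_blocks) != 1:
        return None  # more than one entry block or exit block
    if len(entry_edges) == 1 and len(exit_edges) == 1:
        return "simple"  # single entry edge, single exit edge
    return "refined"  # several edges, but one entry block and one exit block


# Hypothetical CFG: two predecessors jump to h, which heads a diamond.
cfg = {
    "entry": ["p1", "p2"],
    "p1": ["h"],
    "p2": ["h"],
    "h": ["x", "y"],
    "x": ["exit"],
    "y": ["exit"],
    "exit": [],
}
```

In this CFG the set {h, x, y} has two entry edges and two exit edges, yet one entry block and one exit block, so the sketch reports it as 'refined'. The single block {x} is 'simple', and {h, x} is rejected because its exit edges lead to two different blocks.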
For my work on Polly, I wanted to detect both 'simple' and 'refined'
regions. The paper 'The Refined Process Structure Tree - Jussi
Vanhatalo, Hagen Voelzer, Jana Koehler - 2009' describes an algorithm
that detects such regions in linear time. Unfortunately, the
proposed algorithm is complex to implement (and I am not convinced it is
fast in the common cases).
At that point I started an IRC discussion about what to do, and Chris
suggested looking for an algorithm based on dominance information. I did
what he said and came up with the existing algorithm, which is, due to
taking advantage of dominance information, rather simple. Unfortunately,
I think I misunderstood Chris, as I later realized he did not want me to
use the DominanceFrontier analysis, but just the normal DominatorTree.
So why did I not change the algorithm and remove the use of
DominanceFrontier? There are two reasons:
1) It was and is not obvious to me how to do so
I did some analysis on SPEC and polyhedron.com and there are 3-8 times
more 'refined' regions than 'simple' regions [1]. For the use in Polly,
I wanted to retain the ability to optimize 'refined' regions.
Unfortunately, until today I was not aware of an algorithm that has
linear complexity, is fast in practice, and can calculate 'refined' regions.
2) For many use cases the existing solution works perfectly fine
The RegionInfo analysis is obviously non-linear as it uses the
DominanceFrontier analysis. However, for me it works surprisingly well.
When I committed it I tested it on the LLVM test suite, polyhedron.com
and the SPEC 2006 benchmarks [1]. I got the following timings (in seconds):
Name            DomTree  PostDomTree  DomFrontier  RegionTree
SPEC 2006         1.109        0.911        0.525       0.662
Polyhedron.com    0.034        0.029        0.016       0.022
On these examples DomFrontier and RegionTree calculation is very fast.
Also, I used RegionInfo for over two years in my daily Polly work. And I
know people who test Polly on their larger internal test suites. I have
_never_ received any complaints about RegionInfo speed issues. So RegionInfo is
for now good enough for my work.
I also heard that RegionInfo is used in some commercial compilers, but
did not hear of any performance problems there.
Reason 2) does not mean I am against improving RegionInfo. Quite the
contrary, I support any work that makes RegionInfo usable on
the LLVM machine CFGs or that allows it to be used as a clang default
analysis pass. I just raised that point to explain why I did not spend a
large amount of time on 1).
=======================================================================
Andrew:
> I'm dismayed that you depend on RegionInfo, but it is understandable.
> We could probably improve RegionInfo to be more efficient and
> self-contained (no DomFrontier), but I think we'd be much better off
> dropping RegionInfo and handling SEME regions. AFAIK CodeExtractor
> can already handle that. Is there any particular reason you need
> RegionInfo other than finding a connected set of blocks dominated by
> a cold block? Can FunctionSplitter just find its own regions?
Andrew, can you define what you mean by SEME region? As written above,
RegionInfo already detects SEME regions with a single entry block and
multiple exit edges that terminate at the same block. What is the
_exact_ kind of SEME region you are talking about? Does it require
a single entry edge, or do you allow several entry edges if they terminate
at the same basic block?
Also, is there a good algorithm available to detect such regions? I am
very much in favor of replacing the existing RegionInfo algorithm with
something faster _and_ more generic.
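For reference, the simplest SEME candidate mentioned in the quoted text, the set of blocks dominated by a cold block, can be sketched in a few lines of Python. This is only an illustration under assumptions: the CFG is a successor dictionary, the names reachable, dominated_by and exit_edges are made up, and the dominated set is computed by a plain reachability trick (a block is dominated by `block` if it becomes unreachable from the entry once `block` is removed) instead of a DominatorTree.

```python
def reachable(succ, start, removed=None):
    """Blocks reachable from start without passing through `removed`."""
    seen, stack = set(), [start]
    while stack:
        n = stack.pop()
        if n in seen or n == removed:
            continue
        seen.add(n)
        stack.extend(succ[n])
    return seen


def dominated_by(succ, entry, block):
    """Blocks that every path from entry can only reach through `block`."""
    return reachable(succ, entry) - reachable(succ, entry, removed=block)


def exit_edges(succ, region):
    """Edges leaving the region, sorted for stable output."""
    return sorted((u, v) for u in region for v in succ[u] if v not in region)


# Hypothetical CFG with one cold block guarding a few more blocks.
cfg = {
    "entry": ["hot", "cold"],
    "hot": ["join"],
    "cold": ["c1", "c2"],
    "c1": ["join"],
    "c2": ["join", "ret2"],
    "ret2": [],
    "join": ["ret"],
    "ret": [],
}
region = dominated_by(cfg, "entry", "cold")
```

Here the region is {cold, c1, c2, ret2}: it has the single entry block cold, two exit edges that both end at join, and one function return inside the region. Whether that return counts as an additional exit is exactly the kind of detail the definition of SEME has to settle.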
Also, supporting SEME regions with one entry block would be great
as this would make the LoopInfo tree a subtree of the RegionTree. So
a LoopTree would be a simple RegionTree that is already available by
default. Also, the code extractor
Chandler:
> I'm dismayed that RegionInfo is still in the tree and not being
> improved. ;] I think we should fix RegionInfo to be efficient and
> reasonably well implemented. It as *already* handling SEME regions
> except for the "ME" case of function returns (something that is
> trivial to fix).
I explained above why RegionInfo is good enough for our cases and why
improving it is not straightforward. Advice on how to make it better is
highly welcome.
Andrew:
> There are two issues then, Region API vs. Region discovery.
>
> A convenient API for visiting and traversing regions is nice, though by
> design it's never needed in LLVM. A problem with the existing RegionInfo
> API is that it is superficially limited to SESE--though you say
> otherwise.
Not exactly SESE, but yes. I don't see a reason not to widen the API to a
more generic SEME region, provided there is an algorithm to detect such
regions.
Andrew:
> That may be right for some clients, but then those clients
> likely don't need the region discovery that it provides. They can
> probably get by with finding control equivalent regions, which is
> trivial if you have postdominators--walk down in the dom tree and up in
> the posdom tree at the same time.
Are you referring to the SESE regions as defined in the paper
"The Program Structure Tree: Computing Control Regions in Linear Time"?
As explained above, they have a very low resolution.
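As a rough illustration of the control equivalence check from the quoted text, here is a Python sketch. It is not the tree walk Andrew describes: it uses plain dominator sets computed by iteration, and it takes "a and b are control equivalent" to mean "a dominates b and b postdominates a", which is a common approximation and an assumption on my side. All function names and the example CFG are made up, and the CFG is assumed to have a single exit block that every block can reach.

```python
def dominators(succ, root):
    """Map each block to the set of blocks that dominate it."""
    preds = {n: [] for n in succ}
    for u, vs in succ.items():
        for v in vs:
            preds[v].append(u)
    dom = {n: set(succ) for n in succ}  # start from "everything dominates"
    dom[root] = {root}
    changed = True
    while changed:
        changed = False
        for n in succ:
            if n == root:
                continue
            ps = [dom[p] for p in preds[n]]
            new = {n} | (set.intersection(*ps) if ps else set())
            if new != dom[n]:
                dom[n], changed = new, True
    return dom


def reverse(succ):
    """Reverse every edge of the CFG."""
    rev = {n: [] for n in succ}
    for u, vs in succ.items():
        for v in vs:
            rev[v].append(u)
    return rev


def control_equivalent_pairs(succ, entry, exit_block):
    """Pairs (a, b) with a dominating b and b postdominating a."""
    dom = dominators(succ, entry)
    # Postdominators are dominators of the reversed CFG, rooted at the exit.
    pdom = dominators(reverse(succ), exit_block)
    return {(a, b) for b in succ for a in dom[b]
            if a != b and b in pdom[a]}


# Hypothetical diamond: a branches to b and c, which rejoin at d.
cfg = {
    "entry": ["a"],
    "a": ["b", "c"],
    "b": ["d"],
    "c": ["d"],
    "d": ["exit"],
    "exit": [],
}
pairs = control_equivalent_pairs(cfg, "entry", "exit")
```

For the diamond, (a, d) is reported because a dominates d and d postdominates a, while (a, b) is not, since b does not postdominate a.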
Andrew:
> Maybe you just want a neat Region iterator API utility that could be
> used by anyone doing region discovery, including RegionInfo. But you
> wouldn't need to pull in RegionInfo analysis.
Sounds like a good idea.
Chandler:
> It would be better to have such clarification in documentation,
> ideally in the header files of these passes. This is a very "meta"
> point, but it is frustrating to contributors to have arbitrary
> restrictions materialize only after designing an optimization. I
> would rather that people are aware of the constraints they need to
> work within if they want to implement a particular optimization pass.
> Essentially, if there are analyses or passes which are known to be
> unacceptable for the normal compilation pipeline, I think that would
> be an important thing to mention in the high-level comments for the
> pass. (It is possible that I missed such comments, or that I never
> looked in the right place, but I feel like the fact that dom-frontier
> and region-info is essentially on the chopping block should be more
> clear than it currently is... ;]
True. We should clearly document which passes are acceptable in the
default optimization chain.
I also looked into the literature again myself and found two interesting papers:
1) Simplified Computation and Generalization of
the Refined Process Structure Tree
Artem Polyvyanyy, Jussi Vanhatalo, and Hagen Völzer, 2010
A very recent paper that gives us high resolution SESE regions
('simple' and 'refined' and even a little bit more) in linear time.
They state this algorithm is simpler than the previous one. Basically, it
seems to be equivalent to calculating the triconnected components.
2) Code Compaction of Matching Single-Entry Multiple-Exit Regions (2003)
Wen-Ke Chen, Bengu Li, Rajiv Gupta, 2003
I missed this one earlier. They calculate SEME regions based on the
control dependence graph.
I am currently a little busy, but I like the idea of moving to SEME
regions and having a generic Region iterator API. We could test this API
with the current RegionInfo and the LoopInfo passes as region discovery
algorithms (one more precise, the other available by default). In case a
more generic and faster SEME region algorithm is implemented, the Region
iterator API users would automatically take advantage of it.
Cheers
Tobi