[LLVMdev] Add a function splitting pass to LLVM which extracts cold regions into their own functions

Mon May 21 16:44:25 PDT 2012

On 05/21/2012 10:36 PM, Andrew Trick wrote:
> Tobias,
>
> Thanks for taking the time to summarize all this. It's a great writeup. I'm moving the thread to llvm-dev. My responses below.
>

[..]

>> Name             DomTree  PostDomTree DomFrontier RegionTree
>> SPEC 2006        1.109    0.911       0.525       0.662
>> Polyhedron.com   0.034    0.029       0.016       0.022
>>
>> On these examples DomFrontier and RegionTree calculation is very fast. Also, I used RegionInfo for over two years in my daily Polly work. And I know people who test Polly on their larger internal test suites. I got _never_ any complains about RegionInfo speed issues. So RegionInfo is for now good enough for my work.
>>
>> I also heard that RegionInfo is used in some commercial compilers, but did not hear of any performance problems there.
>>
>> Reason 2) does not mean I am not in favor of improving RegionInfo. Very much in contrast, I support any work that makes RegionInfo useable on the LLVM machine cfgs or that allows it to be used as a clang default analysis pass. I just raised that point to explain why I did not spend a large amount of time on 1)
>> =======================================================================
>
> That's interesting. It looks like PostDomTree is the only thing we're really concerned with pulling in. If RegionInfo can be further improved, that's great, but I'm confident that any algorithm that does what you're doing will require PostDomTree.

I thought the reason DomFrontier is a problem was the worst case 
quadratic complexity. I have never seen a case triggering that behavior, 
but I must admit I did not specifically look for it.

> As a side note, in a perfectly efficient compiler, we should only have to pay for either LoopInfo or Dominators, not both, since one can be efficiently computed from the other. However, I think we're always stuck computing PostDomTree from scratch whenever we need to determine control dependence, or regions based on it.

Yes, probably the PostDomtree or something equally expensive is 
necessary. I believe in case a transformation needs detailed control 
regions, we need to see if the benefits it gives are worth the added 
analysis overhead. In case we can get around calculating detailed 
control regions, we should definitely try to go this way.

>> Andrew:
>>> I'm dismayed that you depend on RegionInfo, but it is understandable.>  We could probably improve RegionInfo to be more efficient and
>>> self-contained (no DomFrontier), but I think we'd be much better off
>>> dropping RegionInfo and handling SEME regions. AFAIK CodeExtractor
>>> can already handle that. Is there any particular reason you need
>>> RegionInfo other than finding a connected set of blocks dominated by>  a cold block? Can FunctionSplitter just find its own regions?
>>
>> Andrew, can you define what you mean by SEME region? As written above, RegionInfo already detects SEME regions with a single entry block and multiple exit edges that terminate at the same block. What is the _exact_ kind of SEME region you are talking about? Does it require single entry edge or do you allow various entry edges if they terminate at the same basic block?
>>
>> Also, is there a good algorithm available to detect such regions. I am very much in favor of replacing the existing RegionInfo algorithm with something faster _and_ more generic.
>
> I was calling your regions SESE, otherwise I need a new acronym!
>
> I'm not considering the number of entry or exit edges, because restricting them to end in the same block gives you the same properties as a single block entry/exit AFAICT.

Yes. It is trivial to transform between them, but it is a lot easier to 
detect SESE regions if they have just a single entry and exit edge.

> I'm not aware of a better algorithm for building RegionInfo when one of the goals is to represent nested control dependence.

Even without. I did not find a way to check for two given basic blocks, 
if they form a SESE region without using DomFrontier (just using DomTree 
and PostDomTree). It is trivial if there is a single entry and a single 
exit edge, but for multiple edges (ending at the same block) I could not 
find a good solution.

(Solving this problem, would remove the requirement for DomFrontier in 
RegionInfo)

> I don't know if we need a standalone SEME RegionInfo pass. Beyond LoopInfo, I don't know what criteria would be used to pick the regions, but I haven't read the papers you mention below.
> Some passes, like FunctionSplitting, may have ad-hoc profile-driven region formation logic. For them, the Region API is just a way to package a SEME-block set of connected blocks.
>
>> Also, supporting SEME regions with one entry block would be great
>> as this would make the LoopInfo tree a subtree of the RegionTree. So
>> a LoopTree would be a simple RegionTree that is already available by default. Also, the code extractor
>
> Yes, exactly.
>
>> Chandler:
>>> I'm dismayed that RegionInfo is still in the tree and not being
>>> improved. ;] I think we should fix RegionInfo to be efficient and
>>> reasonably well implemented. It as *already* handling SEME regions
>>> except for the "ME" case of function returns (something that is
>>> trivial to fix).
>>
>> I explained above, why RegionInfo is good enough for our cases and why improving it is not straightforward. Advice how to make it better is highly welcome.
>
> I would have to do a lot more homework before I could suggest improving RegionInfo. DomFrontiers seemed unnecessary, but if you say it's an efficient and easy to reason about way of handling CFGs that aren't already neatly nested, then I believe you.

I said it's good enough for my uses. Different tools may have different 
requirements.

> The problem I was pointing out is that FunctionSplitting does not need to be informed by control dependence. So it's pulling in a costly analysis (PostDominators), and the only thing it gets in return is an unnecessary limitation. FunctionSplitting should be able to work on any region that LoopInfo can detect.

Sure. If FunctionSplitting can be written without using the RegionInfo 
analysis, avoiding the (re)calculation of it is the way to go. To me it 
seems RegionInfo gives more details. It can e.g. give separate regions 
for the if/else parts of a condition. If this level of detail is needed, 
should to be evaluated. For this it seems to be good to define generic 
regions that can either be calculated based on LoopInfo or on 
RegionInfo. Like this the same pass can be used with different analysis 
depending on how much details are required.

>> As explained above, they have a very low resolution.
>>
>> Andrew:
>>> Maybe you just want a neat Region iterator API utility that could be
>>> used by anyone doing region discovery, including RegionInfo. But you
>>> wouldn't need to pull in RegionInfo analysis.
>>
>> Sounds like a good idea.
>>
>> Chandler:
>>> It would be better to have such clarification in documentation,
>>> ideally in the header files of these passes. This is a very "meta"
>>> point, but it is frustrating to contributors to have arbitrary
>>> restrictions materialize only after designing an optimization. I
>>> would rather that people are aware of the constraints they need to
>>> work within if they want to implement a particular optimization pass.
>>> Essentially, if there are analyses or passes which are known to be
>>> unacceptable for the normal compilation pipeline, I think that would
>>> be an important thing to mention in the high-level comments for the
>>> pass. (It is possible that I missed such comments, or that I never
>>> looked in the right place, but I feel like the fact that dom-frontier
>>> and region-info is essentially on the chopping block should be more
>>> clear than it currently is... ;]
>>
>> True. We should clearly document which passes are acceptable in the default optimization chain.
>
> One (hopefully obvious) rule of thumb. When looking at the cost/benefit of a new transformation, the compile time must include time spent in  analysis passes that now get pulled in at a new point in the standard pass order. If you're pulling in a new Analysis, you need to be able to show that it solves an important problem for you in the most reasonably efficient way.

Yes, that's why I want to the RegionInfo to give high detail results. 
They are costly, but allow us to measure the benefit of a high 
resolution. When pulling passes in the default optimization chain, it 
obviously needs to be understood if the high resolution is necessary or 
if a faster, lower resolution analysis is the better option.

> There may be some analysis passes that are currently in use but are out-of-favor and we would like to remove. This is a collective "we", so my strategy is to look at commit logs and ask around. There are many cases in which I would have benefitted from more documentation. Stale docs hurt though.

Maybe we should just add your 'rule of thumb' to some documentation and 
mention the PostDom or RegionInfo pass as an example.

Tobi