[LLVMdev] Make LoopBase inherit from "RegionBase"?
Tobias Grosser
grosser at fim.uni-passau.de
Wed Jan 13 01:19:09 PST 2010
On 01/13/10 05:09, Jan Sjodin wrote:
> Hi Tobias
>
>> In general this approach saves a preliminary pass that has to insert new
>> bbs to generate these edges. As we do not have to modify the CFG, other
>> passes like dominance information are still valid, and we do not have to
>> create a lot of auxiliary bbs to be able to detect all regions. This
>> saves memory and runtime. In general it is probably not too easy to
>> decide where to insert these bbs either:
>
> The general rule is to split all blocks with multiple in-edges and multiple out-edges
> into blocks with either multiple in-edges or multiple out-edges, but not both.
This is not sufficient, as shown in the example below. It would allow
only one region.
> One option is to keep this as an invariant throughout the compiler and require
> the merge blocks (multiple in-edges) to contain only PHI-nodes, keeping all
> other code in regular basic blocks. There are different variations on this
> that may or may not be useful.
This might be possible; however, it would probably double the number of bbs.
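Just to make sure we mean the same transformation, here is a rough sketch
of that splitting rule (the Node struct and splitNode helper are invented
for illustration, this is not LLVM code):

#include <memory>
#include <vector>

// Toy CFG node, invented just for this sketch.
struct Node {
  std::vector<Node *> Preds;
  std::vector<Node *> Succs;
};

// Split N so that no block has both multiple in-edges and multiple
// out-edges: N keeps the in-edges and becomes a pure merge block, and a
// new block takes over all out-edges and becomes a pure branch block.
Node *splitNode(Node *N, std::vector<std::unique_ptr<Node>> &CFG) {
  if (N->Preds.size() <= 1 || N->Succs.size() <= 1)
    return N; // Already a pure merge or a pure branch block.

  CFG.push_back(std::make_unique<Node>());
  Node *BranchBB = CFG.back().get();

  // Move all out-edges of N to the new block and rewrite the successors'
  // predecessor lists accordingly.
  BranchBB->Succs = N->Succs;
  for (Node *S : BranchBB->Succs)
    for (Node *&P : S->Preds)
      if (P == N)
        P = BranchBB;

  // N now falls through to the new branch block only.
  N->Succs.assign(1, BranchBB);
  BranchBB->Preds.push_back(N);
  return BranchBB;
}

Every block that has both kinds of edges would get split this way, which
is where the additional bbs come from.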
>
>> CFG:
>>       0
>>       |
>>       1
>>      / |
>>     2  |
>>    / \  3
>>   4   5 |
>>   |   | |
>>   6   7 8
>>    \  | /
>>     \ |/      region A: 1 -> 9 {1,2,3,4,5,6,7,8}
>>       9       region B: 2 -> 9 {2,4,5,6,7}
>>
>> So we need one bb that joins 6 and 7 and one that joins the two regions
>>
>> CFG:  0
>>       |
>>       1
>>      / |
>>     2  |
>>    / \  3
>>   4   5 |
>>   |   | |
>>   6   7 8
>>    \  | |
>>     \ | |     region A: (0,1) -> (9b,9) {1,2,3,4,5,6,7,8,9a,9b}
>>      9a |     region B: (1,2) -> (9a,9b) {2,4,5,6,7,9a}
>>       \ /
>>       9b
>>        |
>>        9
>
> It is fairly simple to use the information from the algorithm to decide
> where those merges should be inserted to get the expected regions.
> This may be needed in the cases where a sub-region is too complicated
> to be represented and must be abstracted into a "black box".
From which algorithm? The program structure tree does not give this
information, does it?
>> My approach is comparable to this paper:
>> The Refined Process Structure Tree by Jussi Vanhatalo, Hagen Völzer,
>> Jana Koehler
>
> I was looking through some slides that described their algorithm. One case that
> seems to be legal is this:
>
>      |
>    Entry
>    /   \
>   R0   R1
>    \   /
>    Exit
>      |
>
> With two fragments: Entry->R0->Exit and Entry->R1->Exit, which means
> that a fragment cannot be identified using only the entry and exit blocks, but
> the internal blocks or edges will also need to be listed. I don't know if this is
> relevant to your implementation.
No. The ideas are comparable; however, I believe their implementation is
a little complicated. ;-)
I would mark the regions as:
Region A: R0 -> Exit, containing {R0}
Region B: R1 -> Exit, containing {R1}
>> The implementation however takes advantage of the existence of
>> Dominance/PostDominance information. Therefore it is simpler and
>> hopefully faster. At the moment run time is comparable to dominance tree
>> calculation.
>
> Both algorithms are linear so there is really no big difference in time imo.
Sure. However, in terms of maintainability it is nice to be able to reuse
existing analyses instead of writing another triconnected-component
analysis up front.
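To give an idea of what reusing the existing analyses looks like, here is
a very much simplified sketch (not the actual implementation; the header
paths depend on the LLVM version):

#include "llvm/Analysis/Dominators.h"
#include "llvm/Analysis/PostDominators.h"

using namespace llvm;

// A pair (Entry, Exit) can only form a single-entry single-exit region if
// Entry dominates Exit and Exit post-dominates Entry. The real algorithm
// does additional refinement work on top of this basic check.
static bool isRegionCandidate(BasicBlock *Entry, BasicBlock *Exit,
                              DominatorTree &DT, PostDominatorTree &PDT) {
  return DT.dominates(Entry, Exit) && PDT.dominates(Exit, Entry);
}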
> I believe the biggest difference that you mention is that you can capture more
> complicated regions without having to modify the CFG with the current
> algorithm.
Yes.
>> If you want, have a look at some results I got with a pass extracting
>> maximal non-trivial regions that do not contain loops from libbz2:
>>
>> http://tobias.osaft.eu/llvm/region/bz2NoLoops/
>>
>> Interesting ones:
>>
>> regions_without_loops.BZ2_bzBuffToBuffCompress.dot.png
>> has a lot of exit edges.
>
> I think this example proves the strengths and weaknesses of both
> approaches. Making that region into structured control flow would add a lot
> of additional blocks. This will also happen after generating code
> from the polyhedral model, so either way the cost is there if the optimization
> is successful.
Yes, but only in this case, and not for regions where the model cannot be
applied. If the regions pass is used only for analysis purposes, nothing
has to be touched.
> The second case is where the optimization fails (no profitable
> transformation found) and the CFG can remain untouched.
>
> The third case is if one of those blocks contains something complicated.
> I believe the current algorithm simply fails and cannot detect the region.
Which algorithm? The one in Graphite, or the region detection I wrote
here? This is just plain region detection that does not even look at the
content, but builds a region tree (program structure tree). It simply
detects every possible region; the selection would be a later pass.
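Conceptually the result is nothing more than a tree of regions, roughly
along these lines (the names are invented for illustration and are not
the actual RegionInfo interface):

#include <memory>
#include <set>
#include <vector>

namespace llvm { class BasicBlock; }

// Invented illustration of a program structure tree node: each region is
// described by its entry and exit blocks plus the blocks it contains, and
// nests inside a parent region.
struct RegionNode {
  llvm::BasicBlock *Entry = nullptr;
  llvm::BasicBlock *Exit = nullptr;
  std::set<llvm::BasicBlock *> Blocks;
  RegionNode *Parent = nullptr;
  std::vector<std::unique_ptr<RegionNode>> Children;
};

A later selection pass would then just walk this tree and pick the regions
it knows how to handle.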
> If the CFG is modified, this would allow an internal SESE-region to become a
> black box, and the outer regions could be optimized.
This is an optimization; however, I think it is orthogonal to the region
detection problem. I would say it works with any algorithm.
>> regions_without_loops.bzopen_or_bzdopen.dot.png
>> the first region has two entry edges. One is the loop latch.
>> (Keep in mind all regions have the same color, so if it seems there is
>> an edge into a region, there are just two regions close by)
>>
>> Without a prepass that exposes the edges almost no region could be
>> detected with the "standard" approach.
>
> Indeed the CFG will have to be modified for these cases. It seems to me that the trade-off
> between the two approaches is that the algorithm that you currently have is cheaper up
> front, but may be less capable in some cases, while the "standard" algorithm will be more
> expensive, but can handle problematic regions better. Would you agree?
I agree that the algorithm I have is cheaper up front, but I do not yet
see a case where it is less capable. Would you mind giving an example or
highlighting the relevant part of the discussion?
Thanks a lot
Tobias