[LLVMdev] Make LoopBase inherit from "RegionBase"?

Wed Jan 20 09:05:50 PST 2010

>>> bbs, to generate these edges. As we do not have to modify the CFG other
>>> passes like dominance information are still valid, and we do not have to
>>> create a lot of auxiliary bbs, to be able to detect all regions. This
>>> saves memory and runtime. In general it is probably not too easy to
>>> decide where to insert these bbs either:
>>
>> The general rule is to split all blocks with multiple in-edges and multiple out-edges
>> into blocks with either multiple in-edges or multiple out-edges, but not both.
> This is not sufficient, as shown in the example below. It would allow
> only one region.

Yes, but with the insertion of merge-blocks it will allow more (see below).

>> One option is to keep this as an invariant throughout the compiler and make use
>> of the merge blocks (multiple in-edges) to contain only PHI-nodes, and all other code
>> in regular basic blocks. There are different variations on this that may or may not be
>> useful.
> This might be possible, however probably doubling the number of bbs.

I don't know if it will be double, but there will be more basic blocks for sure.

>>>> CFG:
>>>        0
>>>        |
>>>        1
>>>       / |
>>>      2   |
>>>     / \   3
>>>    4   5  |
>>>    |   |  |
>>>    6   7  8
>>>     \  | /
>>>      \ |/       region A: 1 -> 9 {1,2,3,4,5,6,7,8}
>>>        9        region B: 2 -> 9 {2,4,5,6,7}
>>>
>>> So we need one bb that joins 6 and 7 and one that joins the two regions
>>>
>>> CFG:    0
>>>        |
>>>        1
>>>       / |
>>>      2   |
>>>     / \   3
>>>    4   5  |
>>>    |   |  |
>>>    6   7  8
>>>     \  |  |
>>>      \ |  |     region A: (0,1) -> (9b,9)  {1,2,3,4,5,6,7,8,9a,9b}
>>>       9a  |     region B: (1,2) -> (9a,9b) {2,4,5,6,7,9a}
>>>        \ /
>>>         9b
>>>         |
>>>         9
>>
>> It is fairly simple to use the information from the algorithm to decide
>> where those merges should be inserted to get the expected regions.
>> This may be needed in the cases where a sub-region is too complicated
>> to be represented and must be abstracted into a "black box".
> From which algorithm?  The program structure tree does not give this
> information, does it?

The algorithm that computes the SESE-regions can be used to determine
where the merge-nodes should be inserted. There are a couple of ways
of doing it, but if the bracket sets on two eges can be intersected to
match a third edge (which dominates the first two), you can insert a
merge block for the two edges. You don't have to compute the
dominators, but it helps to explain the problem that way.

>>>> My approach is comparable to this paper:
>>>> The Refined Process Structure Tree by JussiVanhatalo, Hagen Völzer,
>>>> Jana Koehler
>>>
>>> I was looking through some slides that described their algorithm. One case that
>>> seems to be legal is this:
>>>
>>>       |
>>>      Entry
>>>     /    \
>>>   R0      R1
>>>     \     /
>>>      Exit
>>>        |
>>>
>>> With two fragments: Entry->R0->Exit and Entry->R1->Exit, which means
>>> that a fragment cannot be identified using only the entry and exit blocks, but
>>> the internal blocks or edges will also need to be listed. I don't know if this is
>>> relevant to your implementation.
>
> No. The ideas are comparable, however I believe their implementation is
> a little complicated. ;-)

Do you have the same definition of a region and entry/exit blocks as they do?

> I would mark the regions as:
>
> Region A: R0 -> Exit, containing {R0}
> Region B: R1 -> Exit, containing {R1}

Is the entry always contained and is the exit never contained, or is that specified
per region? Depending on the restrictions of entry and exit blocks a loop with a single
basic block cannot be an entry or exit by itself. Example:

    |
    A
   /| _
  / |/ \
 B  R  |
  \ |\_/
   \|
    C
    |

If you only care about R in this case how is the region formed?

>>> The implementation however takes advantage of the existence of
>>> Dominance/PostDominance information. Therefore it is simpler and
>>> hopefully faster. At the moment run time is comparable to dominance tree
>>> calculation.
>>
>> Both algorithms are linear so there is really no big difference in time imo.
> Sure. However in terms of maintainability it is nice to be able to reuse
> existing analysis instead of write another triconnected component
> analysis upfront.
>
>> I believe the biggest difference that you mention is that you can capture more
>> complicated regions without having to modify the CFG with the current
>> algorithm.
> Yes.
>
>>> If you want, have a look into some results I got with a pass extracting
>>> maximal non trivial regions that do not contain loops from libbz2:
>>>
>>> http://tobias.osaft.eu/llvm/region/bz2NoLoops/
>>>
>>> Interesting ones:
>>>
>>> regions_without_loops.BZ2_bzBuffToBuffCompress.dot.png
>>> has a lot of exit edges.
>>
>> I think this example proves the strengths and weaknesses of both
>> approaches. Making that region into structured control flow would add a lot
>> of additional blocks. This will also happen after generating code
>> from the polyhedral model, so either way the cost is there if the optimization
>> is successful.
> Yes, but just in this case and not for regions where the model cannot be
> applied. If the regions pass is used for analysis purposes, nothing has
> to be touched.
>
>> The second  case is where the optimization fails (no profitable
>> transformation found) and the CFG can remain untouched. 
>>
>> The third case is if one of those blocks contains something complicated.
>> I believe the current algorithm simply fails and cannot detect the region.
> Which algorithm? The one in Graphite? Or the region detection I wrote
> here? This is just plain region detection, that does not even look at
> the content but builds a region tree (program structure tree). It just
> detects every possible region.
> The selection would be a later pass.

My assumption was that there is a selection in there somewhere.
Do you plan to refine the regions in the selection phase in any way?

>> If the
>> CFG is modified this would allow an internal SESE-region to become a black box, and the
>> the outer regions could be optimized.
>This is an optimization, however I think it is orthogonal to the region
>detection problem. Say it works with any algorithm.

I believe that creating a black-box will map a lot more cleanly to an edge-based
region definition, since block-based may include multiple entry/exit sub-regions
that will not encapsulate control flow in a reasonable way.

>>> regions_without_loops.bzopen_or_bzdopen.dot.png
>>> the first region has two entry edges. One is the loop latch.
>>> (Keep in mind all regions have the same color, so if it seems there is
>>> an edge into a region, there are just two regions close by)
>>>
>>> Without a prepass that exposes the edges almost no region could be
>>> detected with the "standard" approach.
>>
>> Indeed the CFG will have to be modified for these cases. I it seems to me that the trade-off
>> between the two approaches is that the algorithm that you currently have is a cheaper up
>> front, but may be less capable in some cases, while the "standard" algorithm will be more
>> expensive, but can handle problematic regions better. Would you agree?
>
> I agree that the algorithm I have is cheaper upfront, but I do not yet
> see a case where the algorithm is less capable. Would you mind to give
> an example or to highlight the relevant part of the discussion?

With the insertion of extra merge-blocks the code becomes more structured and the PST
can be refined further. A more fine-grained PST may allow more cases to be handled. 

Thanks

Jan
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20100120/6ce1a3c3/attachment.html>