[LLVMdev] Make LoopBase inherit from "RegionBase"?

Tue Jan 12 20:09:10 PST 2010

Hi Tobias

> In general this approach saves a preliminary pass that has to insert new

> bbs, to generate these edges. As we do not have to modify the CFG other
> passes like dominance information are still valid, and we do not have to
> create a lot of auxiliary bbs, to be able to detect all regions. This
> saves memory and runtime. In general it is probably not too easy to
> decide where to insert these bbs either:

The general rule is to split all blocks with multiple in-edges and multiple out-edges
into blocks with either multiple in-edges or multiple out-edges, but not both.
One option is to keep this as an invariant throughout the compiler and make use
of the merge blocks (multiple in-edges) to contain only PHI-nodes, and all other code
in regular basic blocks. There are different variations on this that may or may not be
useful.

> CFG:
>        0
>        |
>        1
>       / |
>      2   |
>     / \   3
>    4   5  |
>    |   |  |
>    6   7  8
>     \  | /
>      \ |/       region A: 1 -> 9 {1,2,3,4,5,6,7,8}
>        9        region B: 2 -> 9 {2,4,5,6,7}
>
> So we need one bb that joins 6 and 7 and one that joins the two regions
>
> CFG:    0
>        |
>        1
>       / |
>      2   |
>     / \   3
>    4   5  |
>    |   |  |
>    6   7  8
>     \  |  |
>      \ |  |     region A: (0,1) -> (9b,9)  {1,2,3,4,5,6,7,8,9a,9b}
>       9a  |     region B: (1,2) -> (9a,9b) {2,4,5,6,7,9a}
>        \ /
>         9b
>         |
>         9

It is fairly simple to use the information from the algorithm to decide
where those merges should be inserted to get the expected regions.
This may be needed in the cases where a sub-region is too complicated
to be represented and must be abstracted into a "black box". 

> My approach is comparable to this paper:
> The Refined Process Structure Tree by JussiVanhatalo, Hagen Völzer,
> Jana Koehler

I was looking through some slides that described their algorithm. One case that
seems to be legal is this:

      |
     Entry
    /    \
  R0      R1
    \     /
     Exit
       |

With two fragments: Entry->R0->Exit and Entry->R1->Exit, which means
that a fragment cannot be identified using only the entry and exit blocks, but
the internal blocks or edges will also need to be listed. I don't know if this is
relevant to your implementation.

> The implementation however takes advantage of the existence of
> Dominance/PostDominance information. Therefore it is simpler and
> hopefully faster. At the moment run time is comparable to dominance tree
> calculation.

Both algorithms are linear so there is really no big difference in time imo. 
I believe the biggest difference that you mention is that you can capture more 
complicated regions without having to modify the CFG with the current
algorithm.

> If you want, have a look into some results I got with a pass extracting
> maximal non trivial regions that do not contain loops from libbz2:
> 
> http://tobias.osaft.eu/llvm/region/bz2NoLoops/
> 
> Interesting ones:
> 
> regions_without_loops.BZ2_bzBuffToBuffCompress.dot.png
> has a lot of exit edges.

I think this example proves the strengths and weaknesses of both
approaches. Making that region into structured control flow would add a lot
of additional blocks. This will also happen after generating code
from the polyhedral model, so either way the cost is there if the optimization
is successful.

The second  case is where the optimization fails (no profitable
transformation found) and the CFG can remain untouched.  

The third case is if one of those blocks contains something complicated. 
I believe the current algorithm simply fails and cannot detect the region. If the
CFG is modified this would allow an internal SESE-region to become a black box, and the
the outer regions could be optimized. 

> regions_without_loops.bzopen_or_bzdopen.dot.png
> the first region has two entry edges. One is the loop latch.
> (Keep in mind all regions have the same color, so if it seems there is
> an edge into a region, there are just two regions close by)
>
> Without a prepass that exposes the edges almost no region could be
> detected with the "standard" approach.

Indeed the CFG will have to be modified for these cases. I it seems to me that the trade-off
between the two approaches is that the algorithm that you currently have is a cheaper up 
front, but may be less capable in some cases, while the "standard" algorithm will be more 
expensive, but can handle problematic regions better. Would you agree?

- Jan