<html><head><style type="text/css"><!-- DIV {margin:0px;} --></style></head><body><div style="font-family:Courier New,courier,monaco,monospace,sans-serif;font-size:12pt">>>> bbs, to generate these edges. As we do not have to modify the CFG other<br>>>> passes like dominance information are still valid, and we do not have to<br>>>> create a lot of auxiliary bbs, to be able to detect all regions. This<br>>>> saves memory and runtime. In general it is probably not too easy to<br>>>> decide where to insert these bbs either:<br>>><br>>> The general rule is to split all blocks with multiple in-edges and multiple out-edges<br>>> into blocks with either multiple in-edges or multiple out-edges, but not both.<br>> This is not sufficient, as shown in the example below. It would allow<br>> only one region.<br><br>Yes, but with the insertion of merge-blocks it will allow more (see

 below).<br><br>>> One option is to keep this as an invariant throughout the compiler and make use<br>>> of the merge blocks (multiple in-edges) to contain only PHI-nodes, and all other code<br>>> in regular basic blocks. There are different variations on this that may or may not be<br>>> useful.<br>> This might be possible, however probably doubling the number of bbs.<br><br>I don't know if it will be double, but there will be more basic blocks for sure.<br><br>>>>> CFG:<br>>>>        0<br>>>>        |<br>>>>        1<br>>>>       / |<br>>>>      2   |<br>>>>     / \   3<br>>>>    4   5  |<br>>>>   

 |   |  |<br>>>>    6   7  8<br>>>>     \  | /<br>>>>      \ |/       region A: 1 -> 9 {1,2,3,4,5,6,7,8}<br>>>>        9        region B: 2 -> 9 {2,4,5,6,7}<br>>>><br>>>> So we need one bb that joins 6 and 7 and one that joins the two regions<br>>>><br>>>> CFG:    0<br>>>>        |<br>>>>        1<br>>>>       / |<br>>>>      2   |<br>>>>     / \   3<br>>>>    4   5  |<br>>>>    |   | 

 |<br>>>>    6   7  8<br>>>>     \  |  |<br>>>>      \ |  |     region A: (0,1) -> (9b,9)  {1,2,3,4,5,6,7,8,9a,9b}<br>>>>       9a  |     region B: (1,2) -> (9a,9b) {2,4,5,6,7,9a}<br>>>>        \ /<br>>>>         9b<br>>>>         |<br>>>>         9<br>>><br>>> It is fairly simple to use the information from the algorithm to decide<br>>> where those merges should be inserted to get the expected regions.<br>>> This may be needed in the cases where a sub-region is too complicated<br>>> to be represented and must be abstracted into

 a "black box".<br>> From which algorithm?  The program structure tree does not give this<br>> information, does it?<br><br>The algorithm that computes the SESE-regions can be used to determine<br>where the merge-nodes should be inserted. There are a couple of ways<br>of doing it, but if the bracket sets on two eges can be intersected to<br>match a third edge (which dominates the first two), you can insert a<br>merge block for the two edges. You don't have to compute the<br>dominators, but it helps to explain the problem that way.<br><br>>>>> My approach is comparable to this paper:<br>>>>> The Refined Process Structure Tree by JussiVanhatalo, Hagen Völzer,<br>>>>> Jana Koehler<br>>>><br>>>> I was looking through some slides that described their algorithm. One case that<br>>>> seems to be legal is this:<br>>>><br>>>>      

 |<br>>>>      Entry<br>>>>     /    \<br>>>>   R0      R1<br>>>>     \     /<br>>>>      Exit<br>>>>        |<br>>>><br>>>> With two fragments: Entry->R0->Exit and Entry->R1->Exit, which means<br>>>> that a fragment cannot be identified using only the entry and exit blocks, but<br>>>> the internal blocks or edges will also need to be listed. I don't know if this is<br>>>> relevant to your implementation.<br>><br>> No. The ideas are comparable, however I believe their implementation is<br>> a little complicated. ;-)<br><br>Do you have the same definition of a region and entry/exit blocks as they do?<br><br>> I would mark the regions

 as:<br>><br>> Region A: R0 -> Exit, containing {R0}<br>> Region B: R1 -> Exit, containing {R1}<br><br>Is the entry always contained and is the exit never contained, or is that specified<br>per region? Depending on the restrictions of entry and exit blocks a loop with a single<br>basic block cannot be an entry or exit by itself. Example:<br><br>    |<br>    A<br>   /| _<br>  / |/ \<br> B  R  |<br>  \ |\_/<br>   \|<br>    C<br>    |<br><br>If you only care about R in this case how is the region formed?<br><br><br>>>> The implementation however takes advantage of the existence of<br>>>> Dominance/PostDominance information. Therefore it is simpler and<br>>>> hopefully faster. At the moment run time is comparable to dominance tree<br>>>> calculation.<br>>><br>>> Both algorithms are linear so

 there is really no big difference in time imo.<br>> Sure. However in terms of maintainability it is nice to be able to reuse<br>> existing analysis instead of write another triconnected component<br>> analysis upfront.<br>><br>>> I believe the biggest difference that you mention is that you can capture more<br>>> complicated regions without having to modify the CFG with the current<br>>> algorithm.<br>> Yes.<br>><br>>>> If you want, have a look into some results I got with a pass extracting<br>>>> maximal non trivial regions that do not contain loops from libbz2:<br>>>><br><span>>>> <a target="_blank" href="http://tobias.osaft.eu/llvm/region/bz2NoLoops/">http://tobias.osaft.eu/llvm/region/bz2NoLoops/</a></span><br>>>><br>>>> Interesting ones:<br>>>><br>>>> regions_without_loops.BZ2_bzBuffToBuffCompress.dot.png<br>>>> has a lot of exit

 edges.<br>>><br>>> I think this example proves the strengths and weaknesses of both<br>>> approaches. Making that region into structured control flow would add a lot<br>>> of additional blocks. This will also happen after generating code<br>>> from the polyhedral model, so either way the cost is there if the optimization<br>>> is successful.<br>> Yes, but just in this case and not for regions where the model cannot be<br>> applied. If the regions pass is used for analysis purposes, nothing has<br>> to be touched.<br>><br>>> The second  case is where the optimization fails (no profitable<br>>> transformation found) and the CFG can remain untouched. <br>>><br>>> The third case is if one of those blocks contains something complicated.<br>>> I believe the current algorithm simply fails and cannot detect the region.<br>> Which algorithm? The one in Graphite? Or the region

 detection I wrote<br>> here? This is just plain region detection, that does not even look at<br>> the content but builds a region tree (program structure tree). It just<br>> detects every possible region.<br>> The selection would be a later pass.<br><br>My assumption was that there is a selection in there somewhere.<br>Do you plan to refine the regions in the selection phase in any way?<br><br>>> If the<br>>> CFG is modified this would allow an internal SESE-region to become a black box, and the<br>>> the outer regions could be optimized.<br>>This is an optimization, however I think it is orthogonal to the region<br>>detection problem. Say it works with any algorithm.<br><br>I believe that creating a black-box will map a lot more cleanly to an edge-based<br>region definition, since block-based may include multiple entry/exit sub-regions<br>that will not encapsulate control flow in a reasonable way.<br><br>>>>

 regions_without_loops.bzopen_or_bzdopen.dot.png<br>>>> the first region has two entry edges. One is the loop latch.<br>>>> (Keep in mind all regions have the same color, so if it seems there is<br>>>> an edge into a region, there are just two regions close by)<br>>>><br>>>> Without a prepass that exposes the edges almost no region could be<br>>>> detected with the "standard" approach.<br>>><br>>> Indeed the CFG will have to be modified for these cases. I it seems to me that the trade-off<br>>> between the two approaches is that the algorithm that you currently have is a cheaper up<br>>> front, but may be less capable in some cases, while the "standard" algorithm will be more<br>>> expensive, but can handle problematic regions better. Would you agree?<br>><br>> I agree that the algorithm I have is cheaper upfront, but I do not yet<br>> see a case where the

 algorithm is less capable. Would you mind to give<br>> an example or to highlight the relevant part of the discussion?<br><br>With the insertion of extra merge-blocks the code becomes more structured and the PST<br>can be refined further. A more fine-grained PST may allow more cases to be handled. <br><br>Thanks<br><br>Jan<br></div></body></html>