<div class="gmail_quote">Ok, I think I have a working pass that is based much more on what we've talked about here. The patch is attached. I'd love to commit it soon-ish and then start tweaking it based on feedback from you, others, and looking at how it actually works in the wild.</div>

<div class="gmail_quote"><br></div><div class="gmail_quote">It's essentially a re-write, so it may be hard to read... Let me know if some other form would be easier.</div><div class="gmail_quote"><br></div><div class="gmail_quote">

Some responses to your comments below, perhaps giving insight into the choices I made or what the pass is currently doing and what I haven't addressed yet.</div><div class="gmail_quote"><br></div><div class="gmail_quote">

On Fri, Oct 21, 2011 at 6:17 PM, Andrew Trick <span dir="ltr"><<a href="mailto:atrick@apple.com">atrick@apple.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">

<div style="word-wrap:break-word"><div><div class="im"><div><span style="background-color: transparent; ">Once you decide to break these constraints, you have effectively</span></div></div></div><div><div>designated the offending paths as "cold". It's up to you how to</div>

<div>decide which paths are cold. You'll need some threshold based on</div><div>confidence in the profile, and we can measure the effectiveness of</div><div>your heuristics and add more over time.</div></div></div></blockquote>

<div><br></div><div>FYI, I ended up with a heuristic that requires a hot edge to be 4x more likely than any other competing in the CFG. This is very loosely based on the definition of a 'hot' edge in BranchProbabilityInfo (80% vs. 20%). </div>

<div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;"><div style="word-wrap:break-word"><div><div><span style="background-color: transparent; ">I don't care much about how you layout the cold chains as long as they</span></div>

<div>are not interleaved with the rest of the code, thus breaking the</div><div>topological ordering and forcing extra branches. In practice, this</div><div>means they get their own little Siberia after the function return,</div>

<div>effectively splitting the function into two sections. However, the</div><div>notion of "coldness" is really relative to the current loop head.</div></div></div></blockquote><div><br></div><div>I've not really tried to ensure this happens correctly. Mostly, this is just forming hot-paths through the code when they are *really* hot. I'll work on better sectioning off the cold paths that are cut off in a followup patch.</div>

<div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;"><div style="word-wrap:break-word"><div><div><span style="background-color: transparent; ">2) Construct chains of blocks within the loop. Here the "entry" chain</span></div>

<div>   is the loop header, or function entry. You can probably represent</div><div>   each already laid out inner loop as a single chain. The graph of</div><div>   chains within a loop is *acyclic* except for irreducible control</div>

<div>   flow, which doesn't have any structure to preserve anyway. There's</div><div>   no need to compute SCCs.</div></div></div></blockquote><div><br></div><div>I'm actually not satisfied with how I've done this. I think there is likely a cleaner / better solution than the one I have in the patch, but it largely works, and is easy to change later. The patch just walks forward over the function, and tries to chain blocks together with their successors when the CFG permits, choosing successors based on whatever (even weak) signal the probability info gives us.</div>

<div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;"><div style="word-wrap:break-word"><div><div><span style="background-color: transparent; ">4) Splice the chains together, preserving topological order. Assigning</span></div>

<div>   each block to a DFS reverse-post-order number provides a nice</div><div>   ordering that is easy to compare or sort.</div></div></div></blockquote><div><br></div><div>Currently I'm doing this once at the very end. I wonder if it would be better to do iteratively as we walk up through the loops?</div>

<div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;"><div style="word-wrap:break-word"><div><div><span style="background-color: transparent; ">Let me give you a little example to think about:</span></div>

<div><br></div><div>A:</div><div>  br B(80%), C(20%)</div><div>B:</div><div>  br D</div><div>C:</div><div>  br D(90%), E(10%)</div><div>D:</div><div>  br F</div><div>E:</div><div>  br F</div><div>F:</div><div>  ret</div><div>

<br></div><div>This is a fine topological sort but bad layout based on expected</div><div>branch probability (even with low confidence in the profile).</div><div><br></div><div>The only good layout that preserves topological ordering is:</div>

<div>A, B, C, E, D, F</div></div></div></blockquote><div><br></div><div>I made a test case out of this, and I think it works. It's a bit confusing at it just replaces branches to F with ret. That removes the edge from D to F, and makes the following ordering also viable: A, B, C, D, E. That's the ordering it chooses, and it also duplicates D up into B instead of branching from B to D. :: shrug ::. It looks pretty good to me though. =]</div>

<div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;"><div style="word-wrap:break-word"><div><div><span style="background-color: transparent; ">Finally, we need some way to validate the design and verify the</span></div>

<div>implementation. Weighted statistics will be very useful here, similar</div><div>to those that Jakob used in the register allocator. For example:</div><div><br></div><div>For each nonadjacent edge:</div><div>  TakenBranchFrequency += frequency(Edge)</div>

<div><br></div><div>This metric will prefer the most "extreme" layout because it assumes</div><div>perfect accuracy of block frequencies. It would be nice to look at the</div><div>same metric computed from skewed branch probabilities to see how</div>

<div>sensitive it is to small shifts in branch behavior.</div><div><br></div><div>Taken branch frequency is the most obvious, but doesn't completely</div><div>capture locality. You could also measure distance jumped fairly</div>

<div>easily, also weighted by edge frequency.</div><div><br></div><div>It might be useful to gather these stats as late as possible to take</div><div>into consideration the final impact of layout, including the effect on</div>

<div>any later passes.</div></div></div></blockquote><div><br></div><div>Cool, I'm working on this next. I've also uncovered some *completely* broken aspects of branch probability. Going back and adding tests and fixes for it. =]</div>

</div>