[LLVMdev] Question regarding basic-block placement optimization

Sun Oct 23 01:11:52 PDT 2011

Ok, I think I have a working pass that is based much more on what we've
talked about here. The patch is attached. I'd love to commit it soon-ish and
then start tweaking it based on feedback from you, others, and looking at
how it actually works in the wild.

It's essentially a re-write, so it may be hard to read... Let me know if
some other form would be easier.

Some responses to your comments below, perhaps giving insight into the
choices I made or what the pass is currently doing and what I haven't
addressed yet.

On Fri, Oct 21, 2011 at 6:17 PM, Andrew Trick <atrick at apple.com> wrote:

> Once you decide to break these constraints, you have effectively
> designated the offending paths as "cold". It's up to you how to
> decide which paths are cold. You'll need some threshold based on
> confidence in the profile, and we can measure the effectiveness of
> your heuristics and add more over time.
>

FYI, I ended up with a heuristic that requires a hot edge to be 4x more
likely than any other competing in the CFG. This is very loosely based on
the definition of a 'hot' edge in BranchProbabilityInfo (80% vs. 20%).

I don't care much about how you layout the cold chains as long as they
> are not interleaved with the rest of the code, thus breaking the
> topological ordering and forcing extra branches. In practice, this
> means they get their own little Siberia after the function return,
> effectively splitting the function into two sections. However, the
> notion of "coldness" is really relative to the current loop head.
>

I've not really tried to ensure this happens correctly. Mostly, this is just
forming hot-paths through the code when they are *really* hot. I'll work on
better sectioning off the cold paths that are cut off in a followup patch.

2) Construct chains of blocks within the loop. Here the "entry" chain
>    is the loop header, or function entry. You can probably represent
>    each already laid out inner loop as a single chain. The graph of
>    chains within a loop is *acyclic* except for irreducible control
>    flow, which doesn't have any structure to preserve anyway. There's
>    no need to compute SCCs.
>

I'm actually not satisfied with how I've done this. I think there is likely
a cleaner / better solution than the one I have in the patch, but it largely
works, and is easy to change later. The patch just walks forward over the
function, and tries to chain blocks together with their successors when the
CFG permits, choosing successors based on whatever (even weak) signal the
probability info gives us.

4) Splice the chains together, preserving topological order. Assigning
>    each block to a DFS reverse-post-order number provides a nice
>    ordering that is easy to compare or sort.
>

Currently I'm doing this once at the very end. I wonder if it would be
better to do iteratively as we walk up through the loops?

Let me give you a little example to think about:
>
> A:
>   br B(80%), C(20%)
> B:
>   br D
> C:
>   br D(90%), E(10%)
> D:
>   br F
> E:
>   br F
> F:
>   ret
>
> This is a fine topological sort but bad layout based on expected
> branch probability (even with low confidence in the profile).
>
> The only good layout that preserves topological ordering is:
> A, B, C, E, D, F
>

I made a test case out of this, and I think it works. It's a bit confusing
at it just replaces branches to F with ret. That removes the edge from D to
F, and makes the following ordering also viable: A, B, C, D, E. That's the
ordering it chooses, and it also duplicates D up into B instead of branching
from B to D. :: shrug ::. It looks pretty good to me though. =]

> Finally, we need some way to validate the design and verify the
> implementation. Weighted statistics will be very useful here, similar
> to those that Jakob used in the register allocator. For example:
>
> For each nonadjacent edge:
>   TakenBranchFrequency += frequency(Edge)
>
> This metric will prefer the most "extreme" layout because it assumes
> perfect accuracy of block frequencies. It would be nice to look at the
> same metric computed from skewed branch probabilities to see how
> sensitive it is to small shifts in branch behavior.
>
> Taken branch frequency is the most obvious, but doesn't completely
> capture locality. You could also measure distance jumped fairly
> easily, also weighted by edge frequency.
>
> It might be useful to gather these stats as late as possible to take
> into consideration the final impact of layout, including the effect on
> any later passes.
>

Cool, I'm working on this next. I've also uncovered some *completely* broken
aspects of branch probability. Going back and adding tests and fixes for it.
=]
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20111023/8781c895/attachment.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: mbp-redux.patch
Type: application/octet-stream
Size: 33590 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20111023/8781c895/attachment.obj>