[llvm-commits] [llvm] r142743 - in /llvm/trunk: lib/CodeGen/MachineBlockPlacement.cpp test/CodeGen/X86/block-placement.ll

Tue Oct 25 23:18:36 PDT 2011

On Oct 23, 2011, at 2:18 AM, Chandler Carruth wrote:

> Author: chandlerc
> Date: Sun Oct 23 04:18:45 2011
> New Revision: 142743
> 
> URL: http://llvm.org/viewvc/llvm-project?rev=142743&view=rev
> Log:
> Completely re-write the algorithm behind MachineBlockPlacement based on
> discussions with Andy. Fundamentally, the previous algorithm is both
> counter productive on several fronts and prioritizing things which
> aren't necessarily the most important: static branch prediction.
> 
> The new algorithm uses the existing loop CFG structure information to
> walk through the CFG itself to layout blocks. It coalesces adjacent
> blocks within the loop where the CFG allows based on the most likely
> path taken. Finally, it topologically orders the block chains that have
> been formed. This allows it to choose a (mostly) topologically valid
> ordering which still priorizes fallthrough within the structural
> constraints.
> 
> As a final twist in the algorithm, it does violate the CFG when it
> discovers a "hot" edge, that is an edge that is more than 4x hotter than
> the competing edges in the CFG. These are forcibly merged into
> a fallthrough chain.
> 
> Future transformations that need te be added are rotation of loop exit
> conditions to be fallthrough, and better isolation of cold block chains.
> I'm also planning on adding statistics to model how well the algorithm
> does at laying out blocks based on the probabilities it receives.
> 
> The old tests mostly still pass, and I have some new tests to add, but
> the nested loops are still behaving very strangely. This almost seems
> like working-as-intended as it rotated the exit branch to be
> fallthrough, but I'm not convinced this is actually the best layout. It
> is well supported by the probabilities for loops we currently get, but
> those are pretty broken for nested loops, so this may change later.
> 
> Modified:
>    llvm/trunk/lib/CodeGen/MachineBlockPlacement.cpp
>    llvm/trunk/test/CodeGen/X86/block-placement.ll

Hi Chandler,

Sorry, I've been neglecting this a bit. My initial impression of your
current design is that the framework is good and the intention of the
algorithm is good, but placeChainsTopogically could do a better job in
few areas (1) biasing "warm" edges, (2) placing "cold" chains, (3)
loop handling.

(1) "Warm" edges are those that cross chains, and whose successor is
the head of a chain. In other words, you've decided to place the
chains "in-line" in topological order. Placing them in any arbitrary
topological/RPO order is not bad at all since you've already chosen
good chains. But consider a merge point with three or more
predecessors. You actually want the most frequent predecessor to fall
through to the merge, unless the head of that chain dominates the
other predecessors. This is tricky, rife with corner cases, and
probably not too important, especially without perfect profile
information. So it would be fair to punt on this issue.

(2) "Cold" edges branch into the middle of another chain. But it looks
like placeChainsTopogically happily follows the RPO order of the
chain's head. I'm guessing they'll end up in strange places unrelated
to their adjacent chains. This doesn't seem like what you intended. If
you avoid placing those in RPO order, then they should float down to
the end of the function.

(3) I've always thought we would regret that LoopInfo doesn't keep
it's list of blocks in RPO order. You're currently sidestepping the
issue by placing chains in a single iteration through the function's
blocks. If that works well, I don't have a problem with it, but it
seems to complicate the issue of loop layout. For example, our simple
RPO iterator makes no attempt to visit blocks in loop order, so
following this order will not result in contiguous loops. I think
laying out one loop at a time is much easier to deal with. If you want
to follow blocks within a loop in RPO order, you can use the
LoopIterator that I recently added for this purpose. It does need to
be adapted for MachineLoopInfo. I actually like the BranchProbability
approach of implementing the generic code in an implementation header
that's only included once per instantiation
(BranchProbabilityInfo.cpp + MachineBranchProbabilityInfo.cpp).

Another alternative is to do the topological sort without relying on
DFS. This can be done using a worklist of valid blocks. You need to
attach a "remaining predecessor" counter to each block. When all preds
have been laid out (or pushed out-of-line), the block can be pushed on
the valid queue. This might provide the freedom you need to implement
better profile-based heuristics, but could also result in more
arbitrary shuffling of blocks in the absence of good heuristics for
picking the next candidate. In other words, it could be the basis of a
solution to both the "warm" layout problem, and loop handling, but you
have an additional problem of choosing a "valid" block that is somehow
related to the last block that was laid out. I supposed you could just
sort the chains within the valid set similar to how you sorted the
full chain list in the Pettis & Hansen prototype. I think it's an
interesting idea, but haven't completely thought through it.

Another thing to keep in mind: where the profile doesn't indicate
otherwise, it's not a bad idea to preserve the incoming layout. So if
you have cases where the algorithm can't make a good choice, falling
back on the current layout within some region is a way to avoid making
things worse.

Implementation Details:

Do you think it's important to keep a vector of blocks inside each
chain, as opposed to incrementally updating the existing ilist and
storing pointers to the chain head and tail? AFAIK, you can reorder
the blocks all you want then update terminators later, but I could be
wrong. Either way, chain merge can't be constant time because of the
BlockChain map update. But you could avoid regrowing a lot of
vectors. Along those lines, if you ever decide to use union-find, we
have IntEqClasses.

Should we have an inline version of getEdgeProbability() that simply
returns BranchProbability::getOne() for single-successor blocks?

-Andy