[LLVMdev] Question regarding basic-block placement optimization

Wed Oct 19 18:48:26 PDT 2011

On Oct 19, 2011, at 7:56 AM, Jakob Stoklund Olesen wrote:
>> This is still *very* much a rough draft, but it's probably better to review that the previous patch. One big caveat, I know I have an iteration bug in here somewhere that is inf-looping. Just ran out of steam debugging it, will pick it back up again later today to shake it out.
> 
> Some random notes:
> 
> - Please add a description of the algorithm.
> - Please add a comment to the BlockChain class.
> - Use a separate anonymous namespace per class, and don't indent for the namespace.
> 
...
> Note that MachineFunction::iterator decays into an MBB pointer, so you can say FI->canFallThrough() and AnalyzeBranch(*FI...)
> 
> +      WeightedEdge WE = { BaseFrequency * MBPI->getEdgeProbability(From, To),
> 
> Did we add API for this computation? We intended to add a function that computes this and saturates on overflow etc.

Overflow is handled transparently in the overloaded BlockFrequency::operator*(BranchProbability). But you could use the existing API anyway by adding a helper to MachineBlockFrequencyInfo calling BlockFrequencyImpl::getEdgeFreq(Src,Dst).

>> Still, for the test cases that don't tickle the iteration bug, it generates beautiful, well laid out code. =]
> 
> I am sure others have strong opinions about the algorithm ;-)

I may be one of those others ;-). Although handling SCCs (loops) probably saves Chandler's design from going too far off the rails. I don't remember that being in the published algorithm.

> I would like to see a more explicit handling of loop layout. There is a tradeoff between entering the loop with a branch, and having the exit block last. It is not clear to me that this algorithm handles that properly.

How do you ensure that BlockChains start at loop headers?

How do you ensure that loop exits are laid out following the loop body, especially in the presence of critical edges?

Why not layout blocks according to MachineLoopInfo? The SCC abstraction seems overly generic and unnecessary.

When you merge chains, why don't you delete the edge between the chains?

Why do you run profile guided block layout after the existing CodePlacementOpt? Shouldn't it be the other way around so that CodePlacementOpt can cleanup loops, which BlockPlacement2 doesn't handle well? I think the answer is that BlockPlacement2 actually depends on loops being laid out sensibly before running, but that needs to be explained.

> 
> Be careful about placing too much trust in the behavior of branch probabilities. They go out of whack when they saturate.
> 
> Second, you should handle blocks that can never fall through (indirect branches from jump tables). There is no need to try to place their successors.

Please put this under some flag for now. Enabling it by default opens up a whole set of issues that are premature to discuss. Until the overall code layout strategy is more coherent, people will only want to enable profile guided layout when they have complete profile info, or really need to take full advantage of builtin expect. If you're not one of those people, you'll want to leave this disabled to avoid unnecessary misery when debugging generated disassembly, and reduce the number of places that codegen makes semi-random decisions that perturb the code.

Otherwise, the code looks nice! Good comments.

-Andy