[PATCH] D11662: Filter cold blocks off the loop chain when profile data is available.

Mon Oct 12 17:07:30 PDT 2015

congh added a comment.

In http://reviews.llvm.org/D11662#261683, @djasper wrote:

> Why only do this if profile data is available? Are we scared that the statically derived probabilities are off by too much?

Without precise profile data, it is safer to keep all blocks of the same loop together: if hot blocks end up scattered across several places at runtime, we will get very poor cache locality.

> How does this affect nested loops? Is the 'outlined' block still in the outer loop? I am not sure which one would be better, but maybe we should add a comment and a test to document the behavior?

You can treat each loop like a function: if a block is cold in an inner loop, it will probably be placed together with the blocks of its outer loop. But that also depends on whether the block is cold in the outer loop (e.g., it is not cold there if its frequency is more than 20% of the frequency of the outer loop). That is, we keep passing the cold block to outer loops until we reach one in which it is no longer cold. I have updated the comment to explain this. I also updated the test case with a nested loop in which a block is cold for the inner loop but not cold for the outer loop.
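
To make the nested-loop rule concrete, here is a minimal standalone sketch. It is not the code in the patch: the Loop struct, isColdIn, placementLoop, and the exact 20% comparison are hypothetical stand-ins chosen for illustration, and "loop frequency" is treated as a single abstract number.

```cpp
#include <cstdint>

// Hypothetical stand-in for a loop: one frequency value plus a link to the
// enclosing loop. This is not an LLVM data structure.
struct Loop {
    std::uint64_t freq;   // frequency attributed to the loop (assumption)
    const Loop* parent;   // nullptr for an outermost loop
};

// Assumed threshold: a block is cold in a loop when its frequency is at most
// 20% of the loop's frequency.
constexpr std::uint64_t kColdPercent = 20;

bool isColdIn(std::uint64_t blockFreq, const Loop& loop) {
    // Integer form of: blockFreq / loop.freq <= kColdPercent / 100
    return blockFreq * 100 <= loop.freq * kColdPercent;
}

// Walk outward from the innermost loop containing the block and return the
// first loop in which the block is not cold. Returns nullptr when the block
// is cold in every enclosing loop.
const Loop* placementLoop(std::uint64_t blockFreq, const Loop* innermost) {
    for (const Loop* l = innermost; l != nullptr; l = l->parent) {
        if (!isColdIn(blockFreq, *l))
            return l;
    }
    return nullptr;
}
```

For example, with an outer loop of frequency 100 and an inner loop of frequency 1000, a block of frequency 50 is cold in the inner loop (5%) but not in the outer loop (50%), so placementLoop returns the outer loop. A block of frequency 10 is cold in both, so it is passed out of both loops.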

> Should there be some minimum size for the block to be outlined? A very short block will still increase branch count, but not really affect cache locality. For those, the trade-off might be different.

If we decide not to outline cold but very short blocks, we should do so uniformly in other cases as well. Take a diamond branch for instance:

A; if (...) B; else C; D;

Suppose B is very cold but very short: should we lay out this branch as ACBD..., or ACD...B?

One optimization I can think of is to put B as close as possible to D without affecting the overall branch cost.
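
The two layouts of the diamond can be compared with a toy cost model. The sketch below is only an illustration under assumptions of mine: the Edge struct, takenBranchCost, and diamond are hypothetical helpers, the cost is simply the total frequency of edges that are not fall-throughs, and it ignores cache effects and whatever blocks follow D.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical CFG edge between single-letter blocks, with an execution count.
struct Edge {
    char from;
    char to;
    std::uint64_t freq;
};

// Toy cost model (assumption): an edge is free when its target is laid out
// immediately after its source (a fall-through); otherwise it costs its
// frequency, because a branch or jump is taken every time it executes.
std::uint64_t takenBranchCost(const std::string& layout,
                              const std::vector<Edge>& edges) {
    std::uint64_t cost = 0;
    for (const Edge& e : edges) {
        const std::size_t from = layout.find(e.from);
        const std::size_t to = layout.find(e.to);
        const bool fallThrough = from != std::string::npos &&
                                 to != std::string::npos && to == from + 1;
        if (!fallThrough)
            cost += e.freq;
    }
    return cost;
}

// The diamond "A; if (...) B; else C; D;" with B on the cold side.
std::vector<Edge> diamond(std::uint64_t hot, std::uint64_t cold) {
    return {{'A', 'B', cold}, {'A', 'C', hot}, {'B', 'D', cold}, {'C', 'D', hot}};
}
```

With hot = 99 and cold = 1, takenBranchCost("ACBD", diamond(99, 1)) is 100, because the hot path must jump over B, while takenBranchCost("ACDB", diamond(99, 1)) is 2, because only the cold path takes branches. Under this model, moving B out of line pays off whenever B is taken less than half the time, regardless of how short B is; the cache-locality side of the trade-off is not modeled here.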

http://reviews.llvm.org/D11662