[LLVMdev] Greedy register allocation

Chris Lattner clattner at apple.com
Wed May 4 05:17:42 PDT 2011


On May 3, 2011, at 4:08 PM, David A. Greene wrote:

>> 
>> It's just that an REX prefix is required on some instructions when
>> %xmm8 is used. Is it worth it to undo LICM just for that? In this
>> case, probably. In general, no.
> 
> Ah, so you're saying the regression is due to the inner loop icache
> footprint increasing.  Ok, that makes total sense to me.  I agree this
> is a difficult thing to get right in a general sort of way.  Perhaps the
> CostPerUse (or whatever heuristics use it) can factor in the loop body
> size so that tight loops are favored for smaller encodings.

Almost certainly the problem is that the inner loop doesn't fit in the processor's predecode loop buffer.  Modern Intel x86 chips have a buffer that can hold a very small number of instructions and is bounded by instruction count, code size, and sometimes the number of cache lines.  If a loop fits in this buffer, the processor can turn off the decoder completely for the loop, a significant power and performance win.

I don't know how realistic it is to model the loop buffer in the register allocator, but this would be a very interesting thing to try to optimize for in a later pass.  If an inner loop "almost" fits, then it would probably be worth heroic effort to shave a few bytes off its size.
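
As a rough sketch of the kind of check such a later pass might make, the code below tests a loop body against the three bounds mentioned above (instruction count, code size, and cache lines) and reports how many bytes would need to be shaved off to meet the size bound.  Everything here is hypothetical: LoopBufferLimits, LoopFit and checkLoopFit are made-up names rather than LLVM APIs, and the limits are parameters that would have to come from the target's documentation, not known hardware values.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical description of a loop buffer. The actual limits differ per
// microarchitecture and are NOT specified here; callers must supply them.
struct LoopBufferLimits {
  std::size_t MaxInstructions; // bound on instruction count
  std::size_t MaxBytes;        // bound on total encoded size
  std::size_t MaxCacheLines;   // bound on cache lines (or fetch lines) touched
  std::size_t LineSize;        // line size in bytes, must be nonzero
};

struct LoopFit {
  bool Fits;             // true if all three bounds are satisfied
  std::size_t BytesOver; // bytes over the size bound (0 if within it)
};

// InstrSizes holds the encoded size in bytes of each instruction in the loop
// body, in layout order; StartAddr is the address of the first instruction.
LoopFit checkLoopFit(const std::vector<std::size_t> &InstrSizes,
                     std::uint64_t StartAddr, const LoopBufferLimits &L) {
  std::size_t TotalBytes = 0;
  for (std::size_t Size : InstrSizes)
    TotalBytes += Size;

  // Number of lines spanned depends on alignment as well as on size.
  std::size_t Lines = 0;
  if (TotalBytes > 0) {
    std::uint64_t First = StartAddr / L.LineSize;
    std::uint64_t Last = (StartAddr + TotalBytes - 1) / L.LineSize;
    Lines = static_cast<std::size_t>(Last - First + 1);
  }

  bool Fits = InstrSizes.size() <= L.MaxInstructions &&
              TotalBytes <= L.MaxBytes && Lines <= L.MaxCacheLines;
  std::size_t BytesOver = TotalBytes > L.MaxBytes ? TotalBytes - L.MaxBytes : 0;
  return {Fits, BytesOver};
}
```

A loop that "almost" fits would show up as Fits == false with a small BytesOver, or with BytesOver == 0 when only the alignment pushes it across one line too many.  In the first case dropping a prefix byte from a few instructions could be enough; in the second, realigning the loop might do it.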

-Chris


