[LLVMdev] BigBlock register allocator

Fri Jun 22 03:20:05 PDT 2007

Hi Duraid,

> Hi everyone,
> 
> Quick summary:
> 
> 	LLVM now has a new register allocator particularly suitable for 
> compiling (very) large, machine-generated functions.

Congrats! Very good job!

> Longer story:
> 
> 	I've recently been using LLVM in an application that involves JITing
> 
> fairly large functions that have no control flow - they're just flat 
> sequences of instructions, anywhere from 100 to 10000+ in size. (The 
> control flow is all in the host program, which works out which
> monster  function to call and when.)

> The default (linearscan) register allocator wasn't doing a good job,
as 
> it doesn't (yet) have live range splitting. It would quickly use all 
> available registers and then get stuck, using only a single register 
> (and the stack, a lot!) for hundreds or thousands of instructions at
> a time, greatly slowing (+bloating) the code. 

True. I'm working on the version of the linear scan based on Wimmer's
thesis. It supports live range splitting. I'd like to compare it with
yours. Do you have any good examples of those fairly large functions
that are just flat sequences of instructions, anywhere from 100 to
10000+ in size??? It would be nice, if you could send me those test
cases (as C or ll files). I could use it then as a basis for a
comparision and report about results.

> 	The good news is the new "BigBlock" allocator turns out to 
> produce even  better code than the local allocator when blocks are
very
> large. We're talking a +10~20% speed boost on average. (If your basic

> blocks are small, or there's not much register pressure, you'll 
> actually get the same code out of both local and BigBlock.)

Do you have numbers comparing it to the current version of the LLVM's
linear scan? The win of your allocator over the linear scan should be
even better, I guess. 

> 	While BigBlock isn't (and never will be) as fast as the local 
> allocator, it's not much slower, doesn't use much memory, and is 
> certainly faster than linearscan. So if you're compiling very large, 
> (probably) machine-generated blocks of straight-line code, give the 
> Local and BigBlock allocators a try, especially if you're JITing
> things and compile time is important.

I looked at your code. And I see some things that could be
significantlty sped up, e.g.
 - InsnTimes handling. I have the feeling, this map can be eliminated
completely.
 - use of the VRegReadTable. The vector of read occurences can be
shortened every time, you processed the corresponding intruction. This
makes it shorter and makes searches inside this vector faster, thus
making chooseReg much faster. Probably also some other optimizations
can be applied to the chooseReg function.
 - PhysRegsUseOrder - you remove some elements from the middle of this
vector in removePhysReg. This is not a very efficient operation on the
vectors, since it need to copy the tail of the vector. I think using a
list data-structure could be much more efficient for this purpose

I think these changes may significantely improve the performance of
your BigBlock register allocator. I'll try to come up with some more
concrete proposals or even patches over the week-end or next week. 

-Roman

__________________________________________________
Do You Yahoo!?
Sie sind Spam leid? Yahoo! Mail verfügt über einen herausragenden Schutz gegen Massenmails. 
http://mail.yahoo.com