[LLVMdev] New Register Allocation Algorithm

Wed Jul 18 10:57:46 PDT 2007

Dear LLVMers,

     we here in the UCLA compiler's lab have an implementation of a very 
cool register allocator in LLVM. It finds an optimal register assignment 
for the x86 machine, via live range splitting, that is, if it is possible 
to have a register assignment without register spilling, our allocator can 
find it.
We have the short and long versions of our paper here:

http://compilers.cs.ucla.edu/fernando/projects/puzzles/

     Our algorithm is polynomial time, and up to now, it is adding about 
15% more to the total compilation time of LLVM. It does not iterate like 
the current implementation of RegAllocLinearScan (LN), and does not have 
to worry about conflicts between virtuals and physical registers, two 
things that bring the running time of RegAllocLinearScan down. I am not an 
expert in C++, but I think that with a careful implementation, it would 
even be possible to match the compilation time of LN. Indeed, for the very 
large benchmarks, the algorithm sometimes is faster.
     We have not implemented instruction folding yet. Nothing prevent us of 
implementing it, it is just a matter of time (and patience). Currently the 
code that our algorithm produces is faster than LN, if I use 
-disable-spill-fusing, and sometimes it is faster even when I let LN to 
use instruction folding.
     Of course, because the algorithm is very new, there are a lot of 
things yet to improve, and we would like to hear comments from you guys. 
We did not do any global coalescing, and our spilling heuristics is the 
simplest possible: when having to decide among register to spill, evict 
those that have he longest interval. I had implemented a pass to build the 
dominator tree of MachineBasicBlocks, but my pass uses the naive quadratic 
algorithm, and I have this in my list of things to improve. Also, our 
algorithm only works for x86, because I hard coded the relations between 
aliased registers in the target independent part of the implementation. 
This is also something to improve. To implement the allocator, I had to 
change the implementation of LLVM code here and there: adding a function 
to insert swaps in MRegisterInfo, changing the coalescing routine in 
LiveIntervalAnalysis, etc.

     Anyway, if you guys could take a look into our paper, I think you 
would like it. The idea is quite new, and if possible, we would like to 
produce a version that could be distributed with LLVM. But in this case, 
the work is huge, for the quality of my implementation is way below the 
quality of the implementation of the current linear scan algorithm.

All the best,

Fernando