[llvm-dev] Register Allocation Graph Coloring algorithm and Others

Thu Dec 14 20:40:19 PST 2017

On 12/14/2017 10:18 PM, Leslie Zhai wrote:
> Hi GCC and LLVM developers,
>
> I am learning Register Allocation algorithms and I am clear that:
>
> * Unlimited VirtReg (pseudo) -> limited or fixed or alias[1] PhysReg 
> (hard)
>
> * Memory (20 - 100 cycles) is expensive than Register (1 cycle), but 
> it has to spill code when PhysReg is unavailable
>
It might be much less if memory value is in L1 cache.
> * Folding spill code into instructions, handling register coallescing, 
> splitting live ranges, doing rematerialization, doing shrink wrapping 
> are harder than RegAlloc
>
RegAlloc is in a wide sense includes all this tasks and more.  For some 
architectures, other tasks like a right live range splitting might be 
even more important for generated code quality than just better graph 
coloring.
> * LRA and IRA is default Passes in RA for GCC:
>
> $ /opt/gcc-git/bin/gcc hello.c
> DEBUG: ../../gcc/lra.c, lra_init_once, line 2441
> DEBUG: ../../gcc/ira-build.c, ira_build, line 3409
>
> * Greedy is default Pass for LLVM
>
> But I have some questions, please give me some hint, thanks a lot!
>
> * IRA is regional register allocator performing graph coloring on a 
> top-down traversal of nested regions, is it Global? compares with 
> Local LRA
IRA is a global RA.  The description of its initial version can be found

https://vmakarov.fedorapeople.org/vmakarov-submission-cgo2008.pdf

LRA in some way is also global RA but it is a very simplified version of 
global RA (e.g. LRA does not use conflict graph and its coloring 
algoritm is closer to priority coloring).  LRA does a lot of other very 
complicated things besides RA, for example instruction selection which 
is quite specific to GCC machine description. Usually code selection 
task is a separate pass in other compilers. Generally speaking LRA is 
more complicated, machine dependent and more buggy than IRA.  But 
fortunately LRA is less complicated than its predecessor so called 
reload pass.

IRA and LRA names have a long history and they do not reflect correctly 
the current situation.

It would be possible to incorporate LRA tasks into IRA, but the final RA 
would be much slower, even more complicated and hard to maintain and the 
generated code would be not much better.  So to improve RA 
maintainability, RA is divided on two parts solving a bit different 
tasks.  This is a typical engineering approach.
>
> * The papers by Briggs and Chaiten contradict[2] themselves when 
> examine the text of the paper vs. the pseudocode provided?
I don't examine Preston Briggs work so thoroughly.  So I can not say 
that is true.  Even so it is natural that there are discrepancy in 
pseudocode and its description especially for such size description.

For me Preston Briggs is famous for his introduction of optimistic coloring.
>
> * Why  interference graph is expensive to build[3]?
>
That is because it might be N^2 algorithm.  There are a lot of 
publications investigating building conflict graphs and its cost in RAs.
> And I am practicing[4] to use HEA, developed by Dr. Rhydian Lewis, for 
> LLVM firstly.
>
When I just started to work on RAs very long ago I used about the same 
approach: a lot of tiny transformations directed by a cost function and 
using metaheuristics (I also used tabu search as HEA). Nothing good came 
out of this.

If you are interesting in RA algorithms and architectures, I'd recommend 
Michael Matz article

ftp://gcc.gnu.org/pub/gcc/summit/2003/Graph%20Coloring%20Register%20Allocation.pdf

as a start point.
>
> [1] https://reviews.llvm.org/D39712
>
> [2] http://lists.llvm.org/pipermail/llvm-dev/2008-March/012940.html
>
> [3] https://github.com/joaotavio/llvm-register-allocator
>
> [4] https://github.com/xiangzhai/llvm/tree/avr/include/llvm/CodeGen/GCol
>