[LLVMdev] Greedy Register Allocation in LLVM 3.0

Andrew Trick atrick at apple.com
Mon Sep 26 22:11:42 PDT 2011


On Sep 26, 2011, at 4:22 AM, Leo Romanoff wrote:
> Hi Jakob,
> 
> Thanks for a very interesting description of the new register allocation algorithm in LLVM!!!
> 
> Could you elaborate a bit on the following topics:
> 
> 1) Do you have any plans to publish something more formal and detailed about this algorithm, e.g. a paper or something similar? It would be nice to better understand how this algorithm relates to well-known algorithms described in the literature, e.g. linear scan, Chaitin-Briggs graph coloring, iterated register coalescing, or register allocation by priority-based coloring by Chow and Hennessy, and whether it is an extension of one of those algorithms or a totally new algorithm. It would also be nice if the terminology used by this description followed the terminology used in the literature, as use of different terminology can be misleading. For example, it is rather well known that the Linear Scan register allocator in LLVM was not a real linear scan allocator, as it allowed backtracking and re-coloring; it was de facto more of a special case of a graph-coloring register allocator. But due to its name, many
> published research papers assumed that it was a real linear scan implementation and took it as a representative example of such allocators.
> 
> 2) A comparison of the quality of generated allocations  and allocator performance with well-known register allocation algorithms would be very interesting to better understand pros and cons of this new algorithm. Do you have any numbers at hand? If not, do you plan to produce a more thorough analysis?
> 
> 3) It is stated in the blog that this allocator is much simpler than the old linear scan implementation of LLVM. Can you provide a bit more information?
>    For example:
>     - How many lines of code it is and how it compares in this regard to the old linear scan?
>     - Are there any weird places like the former SimpleRegisterCoalescer, which was not that simple at all and had some extremely weird logic ;-)
>     - You explained that it produces better code than linear scan in most situations, which is cool! But it would also be interesting to know in which pathological cases it still loses against linear scan, and why
>     - Can you provide a rough estimate of the algorithmic complexity of the new algorithm? Is it O(n), O(n^2) or something else?
> 
> Thanks,
>   Roman

It may be more helpful to explain how LLVM's register allocator came
into existence before debating the high-level algorithm.

When I began working on LLVM last October, Jakob was developing an
infrastructure for global live range splitting. It was just becoming
clear that LLVM's Linear Scan was inadequate for this work, and no one
I talked to knew how to fix it. Chris and Evan were very receptive to
replacing the register allocator, but the design space was wide
open. I knew from experience that we did not want to build a
traditional interference graph. Chris, Evan, and Jakob all agreed on
this point. I reasoned that an efficient implementation of
LiveInterval along with a new "LiveIntervalUnion" data structure and
mechanism for caching interference tests would be sufficient, and
offered to build a prototype to prove it.
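
In very rough terms, and with names I am inventing here purely for
illustration (this is not the actual LLVM interface), the idea was that
each physical register keeps a "union" of the segments of every virtual
register assigned to it, so an interference test against a candidate
physical register becomes a handful of ordered-map lookups rather than a
query against a precomputed interference graph:

    #include <iterator>
    #include <map>
    #include <utility>
    #include <vector>

    struct Segment { unsigned Start, End; };        // half-open [Start, End)

    struct LiveInterval {                           // one virtual register's liveness
      unsigned VirtReg;
      std::vector<Segment> Segments;                // sorted and disjoint
    };

    // One of these per physical register: all segments currently assigned
    // to it, keyed by start point. (A real implementation would also cache
    // the result of the last query per virtual register.)
    class LiveIntervalUnionSketch {
      std::map<unsigned, std::pair<unsigned, unsigned>> SegMap; // Start -> (End, VirtReg)
    public:
      void unify(const LiveInterval &LI) {          // assign LI to this physreg
        for (const Segment &S : LI.Segments)
          SegMap.emplace(S.Start, std::make_pair(S.End, LI.VirtReg));
      }
      void extract(const LiveInterval &LI) {        // unassign LI (e.g. on eviction)
        for (const Segment &S : LI.Segments)
          SegMap.erase(S.Start);
      }
      // True if LI overlaps anything already assigned to this physreg.
      bool checkInterference(const LiveInterval &LI) const {
        for (const Segment &S : LI.Segments) {
          auto It = SegMap.upper_bound(S.Start);    // first segment starting after S
          if (It != SegMap.end() && It->first < S.End)
            return true;                            // some segment begins inside S
          if (It != SegMap.begin() && std::prev(It)->second.first > S.Start)
            return true;                            // the previous segment runs into S
        }
        return false;
      }
    };

Assigning, evicting, and re-querying are all cheap updates to these maps,
which is what makes it practical to redo interference tests as splitting
keeps changing the picture.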

This was an important step. My goal was never to develop a
theoretically superior register allocator algorithm in its own
right. On the contrary, Jakob and I wanted the global splitter to
drive allocation, rather than an allocation algorithm that fell back
to splitting when it failed. The important stakeholders at the time
strongly agreed that the problem Jakob was solving, global live range
splitting, was the difficult problem that deserved attention, and that
the register allocation algorithm was only important in that it did
not inhibit splitting or spilling or unduly impact compile time. I
believe this goal led to a register allocator that is fundamentally
different from anything I'm aware of at the level of software design.

To summarize, Jakob and I both strongly agreed on the following design
goals, primarily for engineering reasons, not theoretical reasons:

- early aggressive coalescing

- no predetermined allocation order
  i.e. no fixed order CFG traversal,
       splitting changes allocation order

- no explicit interference graph

- on-the-fly live range splitting at the finest possible granularity

  When I say the new allocator is fundamentally different from the
  other allocators I've seen, this is mainly what I'm referring to.
  At each decision point made by the allocator, the splitter may
  rewrite the code, modify live ranges, and incrementally update every
  bit of state in the allocator. There is no abandoning the current
  solution and no iterative application of an algorithm. A rough
  sketch of what this decision loop looks like follows below.
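
For concreteness, here is a rough, hypothetical C++ sketch of that loop
shape. None of the names below come from LLVM; they are placeholders for
illustration, and the real decisions are of course far richer. The
structural point is that every virtual register taken off the priority
queue gets exactly one of four outcomes, and whatever ranges an eviction
or split produces are simply requeued, so the allocation order itself
changes as splitting proceeds:

    #include <queue>
    #include <vector>

    using VirtReg = unsigned;

    enum class Decision { Assign, Evict, Split, Spill };

    struct QueueEntry {
      VirtReg Reg;
      float Priority;                               // e.g. live-range size
      bool operator<(const QueueEntry &O) const { return Priority < O.Priority; }
    };

    // Placeholder for the real work: interference tests, eviction cost
    // comparison, and split point selection, all of which update live
    // ranges and assignments in place. Here it simply gives up and spills.
    Decision decide(VirtReg, std::vector<VirtReg> &) { return Decision::Spill; }

    // Placeholder priority; see below on ordering by live-range size.
    float priorityOf(VirtReg) { return 0.0f; }

    void allocate(const std::vector<VirtReg> &VRegs) {
      std::priority_queue<QueueEntry> Q;            // largest Priority first
      for (VirtReg R : VRegs)
        Q.push({R, priorityOf(R)});
      while (!Q.empty()) {
        VirtReg R = Q.top().Reg;
        Q.pop();
        std::vector<VirtReg> Requeue;               // evicted or newly split ranges
        switch (decide(R, Requeue)) {
        case Decision::Assign:                      // found an interference-free physreg
          break;
        case Decision::Evict:                       // R displaced weaker occupants
        case Decision::Split:                       // R was divided into smaller ranges
          for (VirtReg S : Requeue)
            Q.push({S, priorityOf(S)});
          break;
        case Decision::Spill:                       // last resort: R lives in memory
          break;
        }
      }
    }

Because the queue is repopulated on the fly, there is no fixed traversal
order and no global restart; every decision sees the state left behind by
the previous one.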

When I implemented the Basic/Greedy prototypes, I was able to replace
the code in RegAllocLinearScan.cpp with the code you can still see in
RegAllocBasic.cpp. You can see for yourself what Jakob means by the
new allocator being much simpler. And as Jakob explained, the real
benefit will be in removing VirtRegRewriter.cpp, which is made possible
by his spill code optimizations. Those optimizations are in turn made
practical by the new framework for incrementally updating the register
allocator's state.

The naming decision was literally a five-minute brainstorm driven by
the need for unique filenames that are easy to type. "Greedy" is no
better description of our design than "Linear Scan" is of the LLVM
implementation by that name or "Graph Coloring" is of most
Chaitin-Briggs implementations. I believe the best name for the new
design would be one that conveys that it is actually a live range
splitter that happens to perform allocation. But I am not attached to
the names, and always figured the final algorithm, if successful,
would simply be known as the "LLVM Register Allocator".

It's worth mentioning that we considered calling it a "Priority"
allocator, but I objected to this because I felt it to be
misleading. Register reassignment and eviction were part of my
original prototype and I felt those features were more important than
allocation order. I used a priority queue for convenience, but
believed the choice was somewhat insignificant. In fact, the design
was intended to work reasonably well even if priorities were
inverted. I initially used spill weight as a placeholder for whatever
dynamic heuristic Jakob would find works well with the splitter. One
thing Jakob discovered is that an "inverted" heuristic actually
cooperates better with his splitter. The Greedy allocator now
allocates registers to the largest live ranges first.
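
For illustration only (my own approximation of the idea, not the LLVM
code), the contrast is roughly between ordering the queue by spill
weight, a use-density measure, and ordering it by sheer live-range size,
so that the big ranges are allocated, and if necessary split, before the
small ones fill in around them:

    #include <vector>

    struct Segment { unsigned Start, End; };        // half-open [Start, End)

    // "Inverted" priority: larger live ranges come off the queue first,
    // getting first pick of registers and early exposure to the splitter.
    unsigned sizePriority(const std::vector<Segment> &Segments) {
      unsigned Size = 0;
      for (const Segment &S : Segments)
        Size += S.End - S.Start;
      return Size;
    }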

To be fair to Linear Scan, it is also worth noting that the framework
for live intervals was directly inspired by the LLVM implementation of
the linear scan algorithm.

In the end I believe the effectiveness of any allocator reflects the
work that goes into modeling the details of the target
architecture. This is not easy to achieve in a retargetable code
generator. Consequently, that is where much of the effort continues to
be directed in LLVM, rather than in attempting to rank algorithms in
their most abstract interpretation.

I hope this brief chronicle of the algorithm's inception dispels some of
its mystery.

-Andy


