[LLVMdev] Greedy Register Allocation in LLVM 3.0
romixlev at yahoo.com
Tue Sep 27 00:11:06 PDT 2011
Hi Jakob, Hi Andy,
First of all, thanks a lot for very elaborative and interesting answers!
> It may be more helpful to explain how LLVM's register allocator came
> into existence before debating the high level algorithm.
> When I began working on LLVM last October, Jakob was developing an
> infrastructure for global live range splitting. It was just becoming
> clear that LLVM's Linear Scan was inadequate for this work, and no one
> I talked to knew how to fix it. Chris and Evan were very receptive to
> replacing the register allocator, but the design space was wide
> open. I knew from experience that we did not want to build a
> traditional interference graph. Chris, Evan, and Jakob all agreed on
> this point. I reasoned that an efficient implementation of
> LiveInterval along with a new "LiveIntervalUnion" data structure and
> mechanism for caching interference tests would be sufficient, and
> offered to build a prototype to prove it.
> This was an important step. My goal was never to develop a
> theoretically superior register allocator algorithm in its own
> right. On the contrary, Jakob and I wanted the global splitter to
> drive allocation, rather than an allocation algorithm that fell back
> to splitting when it failed. The important stake holders at the time
> strongly agreed that the problem Jakob was solving, global live range
> splitting, was the difficult problem that deserved attention, and that
> the register allocation algorithm was only important in that it did
> not inhibit splitting or spilling or unduly impact compile time. I
> believe this goal led to a register allocator that is fundamentally
> different from anything I'm aware of at the level of software design.
> To summarize, Jakob and I both strongly agreed on the following design
> goals, primarily for engineering reasons, not theoretical reasons:
I totally understand that the goal was/is to build a real, efficient register allocator, instead of producing a theoretical result, which may sound good in theory, but would not be very practical in terms of quality or compilation speed on real world use-cases.
> - early aggressive coalescing
> - no predetermined allocation order
> i.e. no fixed order CFG traversal,
> splitting changes allocation order
> - no explicit interference graph
> - on-the-fly live range splitting at the finest possible granularity
This very nicely summarizes main objectives of the new allocator.
> When I say this the new allocator is fundamentally different from
> the other allocators I've seen, this is mainly what I'm referring
> to. At each decision point made by the allocator, the splitter may
> rewrite the code, modify live ranges, and incrementally update every
> bit of state in the allocator. There is no abandoning the current
> solution and no iterative application of an algorithm.
While I agree that most "classical" allocators (e.g. graph coloring) do not do fine grained splitting and usually build an explicit interference graph, I also want to mention that many of the modern proposals published in the literature often do very agressive splitting (sometimes even before and after at each instruction, or at the beginning and end of each basic block). Many of proposals also avoid construction of interference graphs and replace it with on-demand interference computation, which is often very cheap and efficient, e.g. for SSA-progams. There are also descriptions how to compute liveness information very efficiently. Quite some of these register allocation proposals are also able to handle overlapping register classes. A few names to be mentioned in these areas are for example Hack, Bouchez, Boissinot, Pereira, Wimmer, etc
> When I implemented the Basic/Greedy prototypes, I was able to replace
> the code in RegAllocLinearScan.cpp with the code you can still see in
> RegAllocBasic.cpp. You can see for yourself what Jakob means by the
> new allocator being much simpler. And as Jakob explained, the real
> benefit will be in removing VirtRegWriter.cpp which is made possibler c
> by his spill code optimizations. Those optimizations are in turn made
> practical by the new framework for incrementally updating the register
> allocator's state.
> The naming decision was literally a five minute brainstorm driven by
> the need for unique filenames that are easy to type. "Greedy" is no
> better description of our design than "Linear Scan" is of the LLVM
> implementation by that name or "Graph Coloring" is of most
> Chaitin-Briggs implementations. I believe the best name for the new
> design would be one that conveys that it is actually a live range
> splitter that happens to perform allocation. But I am not attached to
> the names, and always figured the final algorithm, if successful,
> would simply be known as the "LLVM Register Allocator".
It is true that names are not always reflecting the essense. But on the other hand, there is a lot of ongoing research on register allocation (and compilers in general) and it looks like more and more such efforts choose LLVM as a platform for experimentation. Quite some results and comparisons are published. So, it would be nice to have a somewhat concrete (but may be not ideal) name for the LLVM allocator rather than "LLVM register allocator". In a few years, there may be another, even better register alloctor for LLVM. If it will be called "LLVM register allocator" again, it would lead to a lot of misunderstandings when reading descriptions and papers. The typical question then will be "Which LLVM register allocator was meant?" ;-)
> It's worth mentioning that we considered calling it a "Priority"
> allocator, but I objected to this because I felt it to be
> misleading. Register reassignment and eviction were part of my
> original prototype and I felt those features were more important than
> allocation order. I used a priority queue for convenience, but
> believed the choice was somewhat insignificant. In fact, the design
> was intended to work reasonably well even if priorities were
> inverted. I initially used spill weight as a place holder for whatever
> dynamic heuristic Jakob would find works well with the splitter. One
> thing Jakob discovered is that an "inverted" heuristic actually
> cooperates better with his splitter. The Greedy allocator now
> allocates registers to the largest live ranges first.
> To be fair to Linear Scan, it is also worth noting that the framework
> for live intervals was directly inspired by the LLVM implementation of
> the linear scan algorithm.
> In the end I believe the effectiveness of any allocator reflects the
> work that goes into modeling the details of the target
> architecture. This is not easy to achieve in a retargetable code
> generator. Consequently, that is where much of the effort continues to
> be directed in LLVM, rather than in attempting to rank algorithms in
> their most abstract interpretation.
Absolutely. And yet a detailed description, deep explanation and comparison (either theoretical or practical based on benchmarks) is always useful as it introduces more clarity and helps to avoid misunderstandings.
The aim is not to produce a paper of "academic" quality or something comparable just for the sake of it, because you want it to be published at a conference. There are enough people in academia who would do it ;-)
The reason should be that it makes it easier to further develop and improve the LLVM register allocator and may also lead to better register allocation algorithms overall. And there is also a lot to gain from developments outside of LLVM. Once register allocation people (both from academia and from industry) have understood how this algorithm works and why certain (practical) decisions were made, they may realize how to improve it even further by using and borrowing some approaches and tricks invented in scope of other frameworks and register allocation algorithms.
> I hope this brief chronicle of the algorithm's inception dispels some of
> its mystery.
Yes. It was very educative! Actually, it would be nice to have a page with a detailed description of the current and may be old register allocator as a part of the LLVM web-site. For starters, the blog entry from Jakob and explanations from this mail thread could be put there. Later it could be extended and improved. What do you think about this idea?
More information about the llvm-dev