[LLVMdev] Supporting Complex Register Allocation Constraints (PBQP Allocator Status Update)
Lang Hames
lhames at gmail.com
Mon Sep 20 07:53:03 PDT 2010
Hi All,
I've just committed some changes to the PBQP allocator which are designed to
make it easier to implement custom register allocation constraints. This is
a quick summary of those changes, and of the status of the PBQP allocator in
general.
First a quick bit of background:
The PBQP allocator is based on ideas described in [1]. I implemented this
algorithm (with the improved heuristic from [2]) in LLVM, and committed it
to the mainline back in October 2008. Since then it has consistently passed
correctness tests (compiling all benchmarks for which linear scan also
succeeds, including the SPEC benchmarks), and produces similar code quality
to linear scan on average (on my AMD64 test box at least). It currently has
significant compile-time costs, requiring more memory and roughly triple the
total compile time on average (compared to linear scan). Hopefully these
will come down somewhat if/when I find more time to devote to optimizing the
allocator, however I believe they will remain on the high end. The reason
for this (and at the same time the source of PBQP's strengths) is the
generality of the mathematical optimization problem which underlies the
allocator. This optimization problem allows a wide variety of register
allocation costs and constraints to be expressed in a straightforward
manner. If you have a target architecture with complex constraints and you
want to get start getting good allocations with minimal engineering work,
this may be the allocator for you.
Here's how the PBQP allocator works:
(Note: This section dives in to a few details of how PBQP works, and how to
use it, so you can skip it if you're not interested)
Starting with some pseudo-C++-code for the allocator, and taking a very
high-level view:
void pbqpAllocate() {
do {
problem = builder->buildPBQPProblem(machineFunction);
solution = solvePBQP(problem);
allocation = mapToAllocation(solution);
} while (containsSpills(allocation));
}
Essentially the allocator loops mapping the current machine function to an
optimization problem, solving the problem, them mapping the solution back to
an allocation. If there are any spills then they're applied (using LLVM's
spilling infrastructure), and iteration terminates when there are no further
spills required.
It's a very similar process to graph coloring, only instead of building an
interference graph and solving a k-colouring problem, you're building a PBQP
instance and minimizing an objective function. The good news is that there's
a PBQP solver implemented in LLVM, and the mapping from solutions to
allocations is generic, so the only thing that needs to be done to support a
new architecture is to supply a mapping from machine functions to PBQP
instances for the new target. If you can do that, the existing
infrastructure can handle the rest. (A caveat is appropriate here: PBQP is *
only* solving the labeling problem. That's powerful, but it doesn't solve
every problem in regalloc. If you have issues with the spiller/rewriter, or
need high-quality bank-selection, or something like that, I'm afraid this
won't help).
So how do you write mappings?
First you need to know what a PBQP representation of a register allocation
problem looks like. They're represented as graphs, and those who know graph
coloring will find the structure familiar. Each virtual register in the
machine function is represented by a node in the PBQP graph, and constraints
between virtual registers (such as interference) are represented as edges.
in the PBQP graph each node is associated with a cost vector, which gives
the cost of each allocation option for the virtual register. E.g.
spill 500 // <- Spill cost estimate
ax 0
cx 0
si 0
Typically you will use the spill cost estimate for the spill option, and
zero for each register, however you can attach positive or negative costs to
the individual registers to preference them as you like (which is handy if
you've got preferred registers with shorter instruction encodings for
instance).
Each edge in the PBQP graph is associated with a cost matrix. This gives the
cost for combinations of allocations for the virtual registers it
connects. So to represent an interference constraint you would have a matrix
with infinite costs for aliasing physical registers, and zero everywhere
else. Here a picture helps:
spill al ah cl dl
spill 0 0 0 0 0
ax 0 inf inf 0 0
cx 0 0 0 inf 0
si 0 0 0 0 0
If you want a coalescing edge you'd use a matrix with negative costs for
elements that allocate virtual registers to the same physical register. To
represent a pairing constraint you'd put an infinite cost on any pair whose
allocation doesn't obey the pairing restriction (See [1] for several example
constraints).
The overall cost of a specific solution (i.e. a register assignment) is
given by summing the costs over all nodes and edges in the graph. Finding an
optimal solution is NP-complete (since you can map k-colouring problems to
PBQP), however the solver should find you a decent solution relatively
quickly.
There's a lot more detail on all of this in [1] and [2], so I'll leave you
to read them for more information if you're interested (or feel free to
shoot me questions via email). For now I'll get on to the the how-to bit for
LLVM.
The new header include/llvm/CodeGen/RegAllocPBQP.h includes a class called
PBQPBuilder. This class represents the mapping between machine functions
and PBQP problems. By extending it and implementing the build method you can
add your own register allocation constraints to the PBQP problem. Just call
back to the PBQPBuilder base to have the spill costs and interference
constraints added for you, then iterate over the instructions in the
MachineFunction and add any additional constraints for your architecture.
Finally you'll need to add your own "createRegisterAllocator" function to
the passmanager in your Target, and have this function construct a
RegAllocPBQP (PBQP allocator) instance with your Builder. That should do
it*.
Summing up:
The PBQP allocator remains solid, generates reasonable code, and has just
been generalized to allow you to add custom register allocation constraints.
People who want a (comparatively) easy way to get an allocator running for
architectures with unusual constraints may find it useful.
Hopefully the above can serve as a starting point for anyone who's
interested to play around with the new features, and as always I'm very
happy to hear questions and comments.
Regards,
Lang.
* This claim is only partially tested, but I'm running the allocator right
now with the base constraints plus an extension which adds coalescing costs
and it's just passed CINT2006 without error. Not bad given that the
coalescing extension only took a couple of hours to write.
References:
[1] Scholz, B., Eckstein, E. 2002. Register allocation for
irregular architectures. In Proceedings of the Joint Conference on
Languages,
Compilers and Tools for Embedded Systems (LCTES'02), ACM Press, New
York, NY, USA, 139-148.
[2] Hames, L. and Scholz, B. 2006. Nearly optimal register allocation with
PBQP. In Proceedings of the 7th Joint Modular Languages Conference
(JMLC'06). LNCS, vol. 4228. Springer, New York, NY, USA. 346-361.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20100921/6244ebfd/attachment.html>
More information about the llvm-dev
mailing list