[LLVMdev] Is it possible to use the SimpleRegisterCoalescing pass in an iterative way?

Tue Jan 13 00:54:28 PST 2009

Hi Dave,

Thanks a lot for such an elaborative answer!

2009/1/12 David Greene <dag at cray.com>:
> On Friday 09 January 2009 03:36, Roman Levenstein wrote:
>> Hi,
>>
>> I'm implementing some variations of graph-coloring register allocators for
>> LLVM. Many of them perform their phases (e.g. coalescing, graph
>> simplification, spilling,  color selection) in an iterative way. Since
>> LLVM provides an implementation of the coalescing in the
>> SimpleRegisterCoalescing class already, I would like to reuse it (even
>> though I could of course create my own coalescing implementation). But
>> this class is a MachineFunctionPass.
>>
>> I'm wondering, if it is possible to use this pass from another pass
>> (i.e. regalloc) in an iterative way:
>> I need to invoke it multiple times for the same MachineFunction at
>> certain places in my regalloc pass implementation.
>> May be some sort of analysis update can be performed? Or is it
>> designed to be used just once per MachineFunction?
>
> Hi Roman,
>
> I did this a while back when implementing iterative register coalescing.  My
> guess is you're trying to do the same with the Generaslized Graph Coloring
> algorithm,.

You are absolutely right.

> It's not easy to accomplish within the LLVM framework.  I've made a number of
> structural changes to our code here and it's in a bit of a state of bitrot due
> to upstream changes we haven't merged in yet.
>
> I got some of this work merged upstream in the RegisterCoalescer and
> RegallocQuery interfaces.  RegallocQuery is supposed to be an opaque
> communication conduit between register allocators and coalescers such that
> the allocator can update the coalescer when it makes changes and vice versa.

Yes. Initially I didn't notice RegallocQuery in the header files. But
after Even pointed out that it can be used, I had a closer look at it.

> When I did the full implementation I found I had to make more changes to
> RegallocQuery.

Yes. This was exactly my conclusion. Namely, it would require quite
some work to implement a coalescer this way using this interface.
Too many pieces are missing yet.

> I have not pushed those upstream for various reasons,
> including that at the time the register coalescing code was in a great state
> of flux and it was difficult to track.  It will be a bit of work to get it all
> up-to-date for merging.

I see.

> Evan is right that coalesceFunction is the primary interface to invoke the
> coalescer.  The iterative coalescer I wrote implements that function.  The
> iterative coalescer is also a FunctionPass which does nothing.  It's simply
> there to fulfill the requirement that the register allocator depends on a
> coalescer.  This way I could use the SimpleRegisterCoalescer or the new
> iterative coalescer in a pluggable manner.

Yes. Using coalescers in a pluggable manner would be the ideal solution.

> The generalized graph coloring
> allocator holds a reference to the coalescer.  SimpleRegisterCoalescing does
> nothing in its implementation of coalesceFunction.

The idea with the empty pass is really nice, but looks a bit like a hack ;-)

> This is all very far from ideal.

;-)

> What we really need is a PassManager that
> knows about multually dependent, iterative algorithms.  I thought about this a
> bit back in the day and came up with the idea of a hierarchy of PassManagers.
> A set of iterative passes could live in a PassManager under the top-level
> PassManager and the child PassManager could specify dependencies for all of
> the passes it manages.  I didn't get very far with this so I don't know it
> it's a viable solution.

I was also thinking about something of this kind. But I was too lazy
to come up with a practical solution for it.

> The iterative coalescer does its own correctness and profitability analysis
> but reuses the code from SimpleRegisterCoalescing to do the actual LLVM IR and
> dataflow updates.  That's non-trivial work.

Oh, yes.

> The code to do this in SimpleRegisterCoalescing is a bit...well...convoluted.  :)

I'd say that the whole SimpleRegisterCoalescing is a
bit...well...overcomplicated :-)

It does a very, very good job doing aggressive coalescing (and taking
into account subtle subregs related things and updating the
LiveIntervals information), but IMHO it became so complex that it is
almost unmaintainable. I've seen different coalescers for graph
coloring (Chitin, Chaitin-Briggs, iterated register coalescing,
chordal graph based ones) and linear scan (Wimmer-based) register
allocators. But SimpleRegisterCoalescing in LLVM (I particulary like
"Simple" in its name ;-) is by far the most complex one I've seen so
far.  I think, only Evan can completely understand it by now. And
there are still subtle bugs in it being found every week or two and
fixed in the mainline. It is not a very good sign...

Not that I have a constructive solution for it, but may at some point
it should be split into independent parts that are more
understandable? Or may be it should be even completely redesigned to
become more understandable?

> I did a significant amount of work to separate that code out from the dataflow and
> profitability analysis that determines when to coalesce copies.

> This I think is very valuable work and I'd like to push it back.  It's always
> good to separate analyses and updates as much as possible so that things can
> be reused.

Absolutely!

> But I can't promise that I'll be able to merge all of the work I've  done.  We
> have quite a bit going on here right now and I've been moved to other bits of
> work.  I'll see if I can get some things accomplished out-of-band.

It would be very good, if you could push this code back to LLVM and
make it available for others.
I hope very much that you'll find some time for it!

On my side, after looking at alternatives, I decided to use my own
custom coalescers for now. They are small, clean and do what I need.
And I have even a limited reuse among the three GC-based register
allocators I'm working on (Chitin, Chitin-Briggs, iterated register
coalescing). It is not an ideal solution; but for now I'm more
concerned about the quality and speed of my register allocators. Once
I'm happy with that, I'll probably have a closer look and refactor the
code to make it more reusable as a part of a framework.

-Roman