[LLVMdev] [RFC] Scheduler Rework

Fri Apr 20 10:31:28 PDT 2012

Hey Everyone,

I'd like to begin a project to rework the scheduler to address some
problems we've discovered on this end.  The goal is to get a more
configurable/flexible scheduler while simplifying maintenance by
separating policy from implementation to get independent and
interchangeable parts.

This is going to be challenging because we are still stuck on LLVM 2.9.
We will be upgrading to LLVM 3.1 when it is released but I really need
to start this work now to meet work deadlines.  But I want to make sure
I have something I can contribute upstream so I want to present the idea
and get some input on whether it is a good design and how best to move
it upstream.  It looks like the fundamentals of the scheduler haven't
changed much from 2.9 to 3.1 so that makes it a bit easier.

Most specifically, I would like some comment on whether it is reasonable
to just start a new RegReduction-like scheduler from scratch rather than
try to incrementally change the existing one.  I'm looking at a pretty
invasive reengineering of the scheduler structure so that it will have
familiar elements but be put together quite differently.  That's why I'm
not sure incremental change to the existing code makes sense.  The two
implementations could live side-by-side for a while if that makes sense.

The existing RegReduction scheduler design goes something like this:

            SchedulingPriorityQueue
                      ^
                      |
              RegReductionPQBase
                      ^
                      |
          RegReductionPriorityQueue<SF>
                                    ^
                                   /|\
                            ______/ | \______
                           |        |        |
                      Heuristic 1  ...  Heuristic N

The only point of customization right now is the template parameter to
RegReductionPriorityQueue.  This is the sort criteria for prioritizing
which node gets scheduled next.  SchedulingPriorityQueue has some
virtual functions which can be (and are) overridden by
RegReductionPQBase and/or RegReductionPriorityQueue.  However, the
current inheritance structure makes it impossible to tweak much even by
subclassing RegReductionPriorityQueue, either because the customization
points aren't virtual (easy to fix) or because customization points are
coupled (hard to fix without a redesign).

Here are some things we have found useful to tune.  Most are not
currently orthogonal and thus are not easily separated from each other:

- Node priority function (the SF parameter above).  This is
  well-separated and easily changed.

- Queue implementation.  The current implementation uses a std::vector<>
  and has very bad asymptotic performance (N^2 currently).  Previously
  this was a std::priority_queue<> but was changed because it missed
  some of the dynamic variance of node priorities.  We have found it
  useful and essential to be able to replace this but even after
  inheriting from RegReductionPriorityQueue it's pretty ugly because
  there are two spearate queues in the scheduler, one completely unused.
  Different queue types exhibit different heuristic behaviors based on
  how much dynamism they provide.  Thus the queue implementation is
  policy that really ought to be separated from the scheduler proper.

- Register pressure heuristic.  The ILP scheduler uses a fairly
  simplistic register pressure calculation which is pretty good in most
  cases but we have found it to not work well in a few important cases.
  It would be nice to make this a customization point.  It is a separate
  heuristic from the node priority function.  In fact some node priority
  functions look at the register pressure heuristic to calculate their
  result.  The two are orthogonal and we should be able to vary them
  independently.  Currently the register pressure heuristic is coupled
  tightly to RegReductionPQBase.  Again due to the data structures begin
  in RegReductionPQBase it is not convenient to simply make some of the
  heuristic methods virtual and override them.  Such an arrangement
  doesn't allow easy mix-and-match among register pressure and node
  priority heuristics.  Moreover, more elaborate register pressure
  calculations will almost certainly involve very different data
  structures.

Here's a quick sketch of a new design floating in my head.  I don't have
any code yet so this is subject to change but it gets the general idea
across.  The idea is to configure the scheduler via policy classes.

            SchedulingPriorityQueue
                      ^
                      |
          RegReductionPriorityQueue<Queue,   Pressure,         Sort>
                                     ^           ^              ^
                                    /|\          |              |
                               ____/ | \____     |              |
                              |      |      |    |              |
                           Queue 1  ...  Queue N |              |
                                                 |              |
                                                 |              |
                                                /|\             |
                                         ______/ | \_____       |
                                        |        |       |      |
                                    Pressure 1  ...  Pressure N |
                                                               /|\
                                                           ___/ | \___
                                                          |     |     |
                                                       Sort 1  ...  Sort N

I'll have to work out exactly how to allow the policy classes to
communicate, for example to allow the Sort heuristic to contact the
Pressure heuristic for information.  There are a few ways to do that.
One is to reintroduce RegReductionPQBase with some pure virtual
functions and use multiple inheritance from RegReductionPriorityQueue to
let the policy classes override them.  A better way, I think, would be
to use composition and pass pointers to each policy object to the
others.

The above design also should allow simple expansion in the future.  If
we identify more customization points we can introduce new policy
classes pretty easily.

Does the above look like a good design?  If so, I believe it is easiest
to build this from scratch (borrowing a lot of code from the current
scheduler) rather than try to incrementally morph the curent scheduler
to this design but I am open to guidance.

Thanks for your help!

                             -Dave