[LLVMdev] [RFC] Scheduler Rework

Mon Apr 23 11:09:52 PDT 2012

On Apr 20, 2012, at 10:31 AM, dag at cray.com wrote:

> I'd like to begin a project to rework the scheduler to address some
> problems we've discovered on this end.  The goal is to get a more
> configurable/flexible scheduler while simplifying maintenance by
> separating policy from implementation to get independent and
> interchangeable parts.
> 
> This is going to be challenging because we are still stuck on LLVM 2.9.
> We will be upgrading to LLVM 3.1 when it is released but I really need
> to start this work now to meet work deadlines.  But I want to make sure
> I have something I can contribute upstream so I want to present the idea
> and get some input on whether it is a good design and how best to move
> it upstream.  It looks like the fundamentals of the scheduler haven't
> changed much from 2.9 to 3.1 so that makes it a bit easier.

We plan to move to the MachineScheduler by 3.2. The structure is:

ScheduleDAG: Abstract DAG of SUnits and SDeps
  |
  v
ScheduleDAGInstrs: Build the DAG from MachineInstrs, each SUnit tied to an MI
                   Delimit the current "region" of code being scheduled.
  |
  v
ScheduleDAGMI: Concrete implementation that supports both top-down and bottom-up scheduling
               with live interval update. It divides the region into three zones:
               Top-scheduled, bottom-scheduled, and unscheduled.

The ScheduleDAGMI constructor takes an instance of MachineSchedStrategy. This is currently a very simply interface that provides pickNode(), releaseTopNode(), releaseBottomNode().

The MachineScheduler is plugable at every level.

1. The pass itself is optional. Targets may disable or override it
   completely. For example, a target that implements global scheduling
   would need to override the standard pass.

2. Targets may create a MachineSchedRegistry instance that knows how
   to construct a custom ScheduleDAGInstrs. This is an easy way to
   reuse the standard scheduling pass and DAG builder. The driver will
   determine the scheduling region, but everything else is
   customizable.

3. New scheduling algorithms can be introduced by implementing new
   MachineSchedStrategies.  We can add a target hook to select the
   strategy if needed.

There's still plenty of opportunity to restructure the code and add customization points. Keep in mind that the goal for the target independent code is clarity and maintainability. Code reuse across custom schedulers is secondary and I'm not currently thinking of designing a template library for custom schedulers.

As much as possible, the standard scheduling algorithm will be built from standalone utilities and data structures. The customizations that you describe would all be handled by providing a new MachineSchedStrategy. Start by composing your scheduler from the pieces that are available, e.g. HazardChecker, RegisterPressure... (There's not much value providing a scheduling queue abstraction on top of vector or priority_queue). From that point, we can improve the design.

Here's what you can expect in the MachineScheduler:

1. Precise register pressure tracking with back off. I'm in the process of checking in the pieces for this. It should be usable sometime in the next couple of weeks.

2. Division of the target-defined resources into "interlocked" vs. "buffered". The HazardChecker will continue to handle the interlocks for the resources that the hardware handles in order. Buffered resources, which the hardware can handle out of order. These will be considered separately for scheduling priority. We will also make a distinction between minimum and expected operation latency.

3. A register pressure reducing scheduler that considers interlocks and register pressure. This will prioritize register reuse above the pressure limits and include a very simple heuristic for avoiding high register pressure situations.

4. A balanced scheduler that still prioritizes interlocks and register pressure limits, but balances buffered resource usage with register pressure avoidance.

5. Improved heuristics. For example, we can precompute register lineages and use that information to avoid high register pressure situations.

The overarching goal of the standard scheduling algorithm is to handle the most important cases that the SDScheduler and PostRAScheduler handle today without unnecessarily shuffling instructions. This should result in more debugable generated code with more stable performance. The goal of the scheduler framework is to support the standard scheduler while providing a place for targets to introduce a custom scheduler, or simply any optimization that requires the scheduling DAG.

-Andy
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20120423/e008fc18/attachment.html>