[LLVMdev] Scheduler Roadmap

Fri May 11 09:53:14 PDT 2012

  My 2c...

  Even though I understand it might be way off in the future, but we are
talking about long term plans here anyway. Also as a VLIW backend
maintainer, I just have to say it :)

  - We do need to have a way to assign bundles much earlier than we do now.
And it needs to be intertwined with scheduling (Bundler currently reuses a
good chunk of scheduler infrastructure). It is also obvious that it will
have adverse effect on all the downstream passes. It is further insulting
due to the fact that bundling is trivial to do during scheduling, but it
goes hard against the original assumptions made elsewhere. Re-bundling is
also a distinct task that might need to be addressed in this context.

  - We do need to have at least a distant plan for global scheduling. BB
scope is nice and manageable, but I can easily get several percent of
"missed" performance by simply implementing a pull-up pass at the very end
of code generation... meaning multiple opportunities were lost earlier.
Current way to express and maintain state needed for global scheduling
remains to be improved.

  - SW pipelining and scheduler interaction with it. When (not if:) we will
have a robust SW pipeliner it will likely to take place before first
scheduling pass, and we do not want to "undo" some decision made there.

Sergei

--
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum.

> -----Original Message-----
> From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu]
> On Behalf Of Andrew Trick
> Sent: Friday, May 11, 2012 12:29 AM
> To: Hal Finkel
> Cc: dag at cray.com; wrf at cray.com; LLVM Developers Mailing List
> Subject: Re: [LLVMdev] Scheduler Roadmap
> 
> On May 10, 2012, at 9:06 PM, Hal Finkel <hfinkel at anl.gov> wrote:
> >> - Target pass configuration: DONE
> >> - MachineScheduler pass framework: DONE
> >> - MI Scheduling DAG: DONE
> >> - AliasAnalysis aware DAG option: In review (Sergei)
> >> - Bidirectional list scheduling: DONE
> >> - LiveInterval Update: WIP (simple instruction reordering is
> >> supported)
> >> - Target-independent precise modeling of register pressure: DONE
> >> - Register pressure reduction scheduler: WIP
> >> - Support for existing HazardChecker plugin
> >
> > Is support for the existing hazard detectors working now? [it does
> not
> > say DONE or WIP here, but your comment below implies, I think, that
> it
> > is at least partially working].
> 
> Glad you're interested. I can explain. We have several important tools
> in LLVM that most schedulers will need. That's what I was listing below
> (Configurable pass, DAG, LI update, RegPressure, Itinerary,
> HazardChecker--normally called a reservation table).
> 
> I really should have also mentioned the DFAPacketizer developed by the
> Hexagon team. It's being used by their VLIW scheduler, but not by the
> new "standard" scheduler that I'm working on.
> 
> Now that I mentioned that, I should mention MachineInstrBundles, which
> was a necessary IR feature to support the VLIW scheduler, but has other
> random uses--sometimes we want to glue machine instructions.
> 
> HazardChecker was already being used by the PostRA scheduler before I
> started working on infrastructure for a new scheduler. So it's there,
> and can be used by custom schedulers.
> 
> My first goal was to complete all of these pieces. They're in pretty
> good shape now but not well tested. The target independent model for
> register pressure derived from arbitrary register definitions was by
> far the most difficult aspect. Now I need to develop a standard
> scheduling algorithm that will work reasonably well for any target
> given the register description and optionally a scheduling itinerary.
> 
> The register pressure reduction heuristic was the first that I threw
> into the standard scheduler because it's potentially useful by itself.
> It's WIP.
> 
> I haven't plugged in the HazardChecker, but it's quite straightforward.
> 
> At that point, I'll have two competing scheduling constraints and will
> begin implementing a framework for balancing those constraints. I'll
> also add fuzzy constraints such as expected latency and other cpu
> resources.  When I get to that point, I'll explain more, and I hope you
> and others will follow along and help with performance analysis and
> heuristics.
> 
> I will point out one important aspect of the design now. If scheduling
> is very important for your target's performance, and you are highly
> confident that you model your microarchitecture effectively and have
> just the right heuristics, then it might make sense to give the
> scheduler free reign to shuffle the instructions. The standard
> MachineScheduler will not make that assumption. It certainly can be
> tuned to be as aggressive as we like, but unless there is high level of
> confidence that reordering instructions will be beneficial, we don't
> want to do it. Rule #1 is not to follow blind heuristics that
> reschedule reasonable code into something pathologically bad. This
> notion of confidence is not something schedulers typically have, and is
> fundamental to the design.
> 
> For example, most schedulers have to deal with opposing constraints of
> register pressure and ILP. An aggressive way to deal with this is by
> running two separate scheduling passes. First top-down to find the
> optimal latency, then bottom-up to minimize resources needed to achieve
> that latency. Naturally, after the first pass, you've shuffled
> instructions beyond all recognition. Instead, we deal with this problem
> by scheduling in both directions simultaneously. At each point, we know
> which resources and constraints are likely to impact the cpu pipeline
> in both the top and bottom of the scheduling region. Doing this doesn't
> solve any fundamental problem, but it gives the scheduler great freedom
> at each point, including the freedom to do absolutely nothing, which is
> probably exactly what you want for a fair amount of X86 code.
> 
> >> - New target description feature: buffered resources
> >> - Modeling min latency, expected latency, and resources constraints
> >
> > Can you comment on how min and expected latency will be used in the
> > scheduling?
> 
> In the new scheduler's terminology, min latency is an interlocked
> resource, and expected latency is a buffered resource. Interlocked
> resources are used to form instruction groups (for performance only,
> not correctness). For out-of-order targets with register rename,  we
> can use zero-cycle min latency so there is no interlock within an issue
> groups. Instead we know expected latency of the scheduled instructions
> relative to the critical path. We can balance the schedule so that
> neither the expected latency of the top nor bottom scheduled
> instructions exceed the overall critical path. This way, we will slice
> up two very long independent chains into neat chunks, instead of the
> random shuffling that we do today.
> 
> >> - Heuristics that balance interlocks, regpressure, latency and
> >> buffered resources
> >>
> >> For targets where scheduling is critical, I encourage developers who
> >> stay in sync with trunk to write their own target-specific scheduler
> >> based on the pieces that are already available. Hexagon developers
> >> are doing this now. The LLVM toolkit for scheduling is all there--
> not
> >> perfect, but ready for developers.
> >>
> >> - Pluggable MachineScheduler pass
> >> - Scheduling DAG
> >> - LiveInterval Update
> >> - RegisterPressure tracking
> >> - InstructionItinerary and HazardChecker (to be extended)
> >>
> >> If you would simply like improved X86 scheduling without rolling
> your
> >> own, then providing feedback and test cases is useful so we can
> >> incorporate improvements into the standard scheduler while it's
> being
> >> developed.
> >
> > Does this mean that we're going to see a new X86 scheduling paradigm,
> > or is the existing ILP heuristic, in large part, expected to stay?
> 
> It's a new paradigm but not a change in focus--we're not modeling the
> microarchitecture in any greater detail. Although other contributors
> are encouraged to do that.
> 
> Both schedulers will be supported for a time. In fact it will make
> sense to run both in the same compile, until MISched is good enough to
> take over. It will be easy to determine when one scheduler is doing
> better than the other. I'm relying on you to tell me when it's doing
> the wrong thing.
> 
> -Andy
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev