[LLVMdev] Enabling MI Scheduler on x86 (was Experimental Evaluation of the Schedulers in LLVM 3.3)
Stefan Hepp
stefan at stefant.org
Thu Sep 26 06:33:25 PDT 2013
Hi,
Thanks for your explanations!
What is the big picture for supporting in-order VLIW architectures and
the like, though?
I am asking because I am currently implementing instruction scheduling
in our own backend for our custom Patmos processor, for which I need to
support both branch delay slots and bundles, with some restrictions on
what may go into a bundle.
For the moment, I am quite happy with a simple bottom-up basic-block
scheduler. I tried to use a combination of the DFAPacketizer and a
simple delay-slot-filler pass first, but the results were quite bad, both
in terms of performance and in terms of maintainability/code quality.
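
To make clear what I mean by a simple bottom-up basic-block scheduler,
here is a toy sketch in Python (not our actual backend code). Everything
in it is a made-up illustration: the Instr record, the register-only
dependence model (no memory dependences) and the "largest depth first"
priority are all assumptions, not anything taken from LLVM.

```python
from dataclasses import dataclass


@dataclass
class Instr:
    # Hypothetical instruction record, for illustration only.
    name: str
    defs: tuple = ()
    uses: tuple = ()
    latency: int = 1


def build_preds(block):
    """preds[i] = indices j < i that i depends on (RAW, WAR or WAW on registers)."""
    preds = [set() for _ in block]
    for i, a in enumerate(block):
        for j in range(i):
            b = block[j]
            raw = set(b.defs) & set(a.uses)
            war = set(b.uses) & set(a.defs)
            waw = set(b.defs) & set(a.defs)
            if raw or war or waw:
                preds[i].add(j)
    return preds


def schedule_bottom_up(block):
    """Return instruction indices in scheduled order, built from the block end upwards."""
    n = len(block)
    preds = build_preds(block)
    succs = [set() for _ in range(n)]
    for i, ps in enumerate(preds):
        for j in ps:
            succs[j].add(i)
    # depth[i] = longest latency-weighted path from the block start to i.
    depth = [0] * n
    for i in range(n):
        depth[i] = max((depth[p] + block[p].latency for p in preds[i]), default=0)
    # A node becomes ready once all of its successors have been placed.
    pending = [len(s) for s in succs]
    ready = [i for i in range(n) if pending[i] == 0]
    bottom_up = []
    while ready:
        # Assumed priority: deepest node first, ties broken by source order.
        pick = max(ready, key=lambda i: (depth[i], i))
        ready.remove(pick)
        bottom_up.append(pick)
        for p in preds[pick]:
            pending[p] -= 1
            if pending[p] == 0:
                ready.append(p)
    bottom_up.reverse()
    return bottom_up


block = [
    Instr("load r1", defs=("r1",), uses=("r0",), latency=2),
    Instr("add r2", defs=("r2",), uses=("r1",)),
    Instr("mov r3", defs=("r3",)),
    Instr("store r2", uses=("r2", "r0")),
]
order = schedule_bottom_up(block)
print([block[i].name for i in order])
```

The sketch only reorders instructions; it does not model issue width,
delay slots or hazards, which is exactly where the real work is.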
I found that the PostRA scheduler is currently very similar to the MI
scheduler, except that it uses the Anti-Dep-Breaker instead of live
register tracking and that it is not customizable, while the MI
scheduler cannot be run post-RA because it depends on the live
variable analysis, which requires SSA code.
We would like to be able to schedule spill code and prologue/epilogue
code (and if-converted code, which is currently required to be post-RA,
I think?). Hence, I basically created a new post-RA scheduler similar to
MI scheduler, which does bundling and handles delay slots and NOOP
insertion. The downside is that there is a lot of code duplication,
since the MI scheduler usually uses ScheduleDAGMI and not the more
generic ScheduleDAGInstrs at the interfaces.
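
As a rough illustration of the bundling and NOOP insertion part (again a
toy Python sketch, not our pass), assume an in-order machine without
hazard detection, a fixed issue width, and every latency >= 1. The Op
tuple, the width parameter and the conservative "no register conflicts
inside a bundle" rule are my own assumptions here; delay slots are left
out entirely.

```python
from collections import namedtuple

# Hypothetical op record: name, registers written, registers read, latency.
Op = namedtuple("Op", "name defs uses latency")


def bundle_with_nops(ops, width=2):
    """Greedily pack an already ordered op list into one bundle per cycle.

    A cycle in which nothing can issue becomes the bundle ["nop"].
    """
    ready_at = {}    # register -> first cycle in which it may be touched again
    bundles = [[]]   # bundles[c] is issued in cycle c
    read = set()     # registers read by ops in the currently open bundle
    for op in ops:
        regs = set(op.uses) | set(op.defs)
        # Conservative: wait for pending writes to registers we read or write.
        earliest = max((ready_at.get(r, 0) for r in regs), default=0)
        cur = bundles[-1]
        cycle = len(bundles) - 1
        need_new = (
            len(cur) >= width              # bundle is full
            or bool(read & set(op.defs))   # conservatively avoid WAR in a bundle
            or (bool(cur) and cycle < earliest)  # an operand is not ready yet
        )
        if need_new:
            bundles.append([])
            read = set()
        # The open bundle is empty here; pad with NOP cycles until operands are ready.
        while len(bundles) - 1 < earliest:
            bundles[-1].append("nop")
            bundles.append([])
        bundles[-1].append(op.name)
        read |= set(op.uses)
        for r in op.defs:
            ready_at[r] = len(bundles) - 1 + op.latency
    return [b for b in bundles if b]


ops = [
    Op("load r1", ("r1",), ("r0",), 2),
    Op("mov r3", ("r3",), (), 1),
    Op("add r2", ("r2",), ("r1",), 1),
    Op("store r2", (), ("r2", "r0"), 1),
]
packed = bundle_with_nops(ops, width=2)
for cycle, bundle in enumerate(packed):
    print(cycle, bundle)
```

With these four ops the load and the mov share the first bundle, one NOP
cycle covers the load latency, and the add and the store each get a
bundle of their own.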
So here are my questions:
- Are there any plans for a (more generic) post-RA scheduler
replacement, or the possibility to run the MI scheduler post-RA (i.e.,
without the live variable analysis dependency), or is simply creating a
completely separate pass based on ScheduleDAGInstrs the 'official' way to
handle hardware with no hazard detection?
- Is the MI scheduler supposed to create bundles (there is no support
for this now as far as I can see, and some passes might need to break up
some bundles later on) or should this only be done post-RA? Is the
register allocator (supposed to be) able to handle bundles, or should
the MI scheduler just order the instructions in the right sequence
without actually creating bundles (which might cause some live ranges to
seem to overlap when in fact they don't)?
- Will there be any generic pass or framework for filling delay slots
and inserting NOOPs on hazards, similar to post-RA-sched, or does this
always have to be a custom scheduler/delay-slot-filler/...?
- And slightly unrelated: At which point should/must bundles be finalized?
Kind regards,
Stefan
On 2013-09-24 08:11, Andrew Trick wrote:
> In my last message, I explained the goals of the generic MI scheduler and current status. This week, I'll see if we can enable MI scheduling by default for x86. I'm not sure which flags you're using to test it now. But by making it default and enabling the corresponding coalescer changes, we can be confident that benchmarking efforts are improving on the same baseline. At that point, I expect bugs to be filed for specific instances of badly scheduled code. Getting a fix committed may not be easy, because we have to show that new heuristics aren't likely to pessimize other code. But at least I'll be able to provide an explanation for why MI isn't currently handling it.
>
> There are other reasons that MI sched should be enabled now on x86 anyway:
>
> (1) The Selection DAG scheduler will be disabled as soon as I can implement a complete replacement. That should eliminate about 10% of codegen (llc) compile time. The Selection DAG scheduler has also long suffered from unacceptable worst-case compile time behavior and unresolved defects. We've been chipping away at the problems, but some remain: PR15941, PR16365. This is a fundamentally bad place to perform scheduling.
>
> (2) The postRA scheduler will also be eliminated. That will eliminate another 10% of compile time for targets that currently enable it. It also eliminates a maintenance problem because its dependence on kill flags and implicit operands is frightening--these can easily be valid for some targets but not others.
>
> (3) Non-x86 targets have been using MI sched for the past year to achieve important performance and compile time benefits. For quality and maintenance reasons, we should use the same scheduling infrastructure for mainstream targets.
>
> The basic theme here is that we want a single scheduling infrastructure that is efficient enough to enable by default--even if it is typically performance-neutral, can leverage verification across many targets, and can be safely customized by plugging in heuristics.
>
> -Andy