[LLVMdev] Enabling MI Scheduler on x86 (was Experimental Evaluation of the Schedulers in LLVM 3.3)

Thu Sep 26 06:33:25 PDT 2013

Hi,

Thanks for your explanations!

What is the big picture for supporting in-order VLIW architectures and
the like, though?

I am asking because I am currently implementing instruction scheduling 
in our own backend for our custom Patmos processor, for which I need to 
support both branch delay slots and bundles, with some restrictions on
what can go into a bundle.
For the moment, I am quite happy with a simple bottom-up basic-block
scheduler. I first tried a combination of the DFAPacketizer and a
simple delay-slot-filler pass, but the results were quite bad, both
in terms of performance and in terms of maintainability/code quality.
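
To illustrate what a simple delay-slot filler does, here is a minimal
sketch on a toy model. It is not LLVM code and not our Patmos pass: the
Instr struct, the register sets and fillDelaySlot are hypothetical names
made up for this example. It assumes a single, always-executed delay slot
after a block-terminating branch and ignores memory dependencies and side
effects.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Toy model only: this is not LLVM's MachineInstr.
struct Instr {
  std::string name;
  std::set<std::string> defs;  // registers written
  std::set<std::string> uses;  // registers read
  bool isBranch;
};

static bool overlaps(const std::set<std::string>& a,
                     const std::set<std::string>& b) {
  for (const std::string& r : a)
    if (b.count(r)) return true;
  return false;
}

// True if the two instructions have no RAW, WAR or WAW register dependency,
// i.e. their order can be swapped (memory and side effects are ignored).
static bool independent(const Instr& a, const Instr& b) {
  return !overlaps(a.defs, b.uses) && !overlaps(a.uses, b.defs) &&
         !overlaps(a.defs, b.defs);
}

// Fill the single delay slot of a block-terminating branch (assumption: one
// slot, always executed). The closest earlier instruction that is independent
// of everything after it is moved behind the branch; otherwise a NOP is added.
std::vector<Instr> fillDelaySlot(std::vector<Instr> block) {
  if (block.empty() || !block.back().isBranch) return block;
  const std::size_t br = block.size() - 1;
  for (std::size_t k = 0; k < br; ++k) {
    const std::size_t i = br - 1 - k;  // walk backwards from the branch
    assert(!block[i].isBranch && "only the terminator may be a branch");
    bool movable = true;
    for (std::size_t j = i + 1; j <= br && movable; ++j)
      movable = independent(block[i], block[j]);
    if (movable) {
      Instr filler = block[i];
      block.erase(block.begin() + static_cast<std::ptrdiff_t>(i));
      block.push_back(filler);  // now sits in the delay slot
      return block;
    }
  }
  block.push_back(Instr{"nop", {}, {}, false});
  return block;
}
```

Because such a filler only looks at one branch at a time and runs after
bundling decisions have been made, it cannot trade off slot filling
against bundle packing, which is one reason a combined scheduler is
attractive.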

I found that the PostRA scheduler is currently very similar to the MI
scheduler, except that it uses the Anti-Dep-Breaker instead of live
register tracking and that it is not customizable, while the MI
scheduler cannot be run post-RA because of its dependency on live
variable analysis, which requires SSA code.

We would like to be able to schedule spill code and prologue/epilogue
code (and if-converted code, which is currently required to be post-RA,
I think?). Hence, I basically created a new post-RA scheduler, similar to
the MI scheduler, which does bundling and handles delay slots and NOOP
insertion. The downside is that there is a lot of code duplication,
since the MI scheduler usually uses ScheduleDAGMI and not the more
generic ScheduleDAGInstrs at the interfaces.
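
For readers who want a concrete picture of bundling plus NOOP insertion,
here is a minimal sketch of bottom-up list scheduling on a toy model. It
is not the actual scheduler and uses no LLVM classes: Dep, Instr, Bundle
and scheduleBottomUp are hypothetical names, the issue width is a
parameter, and the only constraints modelled are edge latencies and the
number of slots per bundle.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy model only: these are not LLVM classes.
struct Dep {
  std::size_t succ;  // index of a later instruction that depends on this one
  unsigned latency;  // minimum distance in cycles between the two
};
struct Instr {
  std::string name;
  std::vector<Dep> succs;
};
using Bundle = std::vector<std::string>;

// Bottom-up list scheduling of one basic block into bundles of at most
// `width` instructions. Cycle 0 is the LAST bundle of the block. An
// instruction may go into cycle c only if every successor s is already
// placed and c >= cycle(s) + latency. A cycle with nothing ready becomes
// a bundle holding a single "nop".
std::vector<Bundle> scheduleBottomUp(const std::vector<Instr>& block,
                                     unsigned width) {
  assert(width > 0 && "a bundle needs at least one slot");
  const std::size_t n = block.size();
  const unsigned kUnscheduled = ~0u;
  std::vector<unsigned> cycleOf(n, kUnscheduled);
  std::vector<Bundle> schedule;
  std::size_t remaining = n;

  for (unsigned cycle = 0; remaining > 0; ++cycle) {
    Bundle bundle;
    // Visit candidates from the end of the block; program order breaks ties.
    for (std::size_t k = 0; k < n && bundle.size() < width; ++k) {
      const std::size_t i = n - 1 - k;
      if (cycleOf[i] != kUnscheduled) continue;
      bool ready = true;
      for (const Dep& d : block[i].succs) {
        assert(d.succ > i && d.succ < n && "edges must point forward");
        if (cycleOf[d.succ] == kUnscheduled ||
            cycleOf[d.succ] + d.latency > cycle) {
          ready = false;
          break;
        }
      }
      if (!ready) continue;
      cycleOf[i] = cycle;
      bundle.push_back(block[i].name);
      --remaining;
    }
    if (bundle.empty()) bundle.push_back("nop");  // hazard: nothing can issue
    std::reverse(bundle.begin(), bundle.end());   // restore program order
    schedule.push_back(bundle);
  }
  std::reverse(schedule.begin(), schedule.end());  // cycle 0 was the last bundle
  return schedule;
}
```

For a block "load, add, sub, store" where add needs the load result two
cycles later and the store depends on both add and sub with latency 1,
a width of 2 yields the bundles {load}, {nop}, {add, sub}, {store}. The
greedy placement puts sub as late as possible, so the NOOP stays; a real
scheduler would use priority heuristics and resource tables here.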

So here are my questions:

- Are there any plans for a (more generic) post-RA scheduler
replacement, or for the possibility to run the MI scheduler post-RA (i.e.,
without the live variable analysis dependency), or is simply creating a
completely separate pass based on ScheduleDAGInstrs the 'official' way to
handle hardware with no hazard detection?

- Is the MI scheduler supposed to create bundles (there is no support 
for this now as far as I can see, and some passes might need to break up 
some bundles later on) or should this only be done post-RA? Is the 
register allocator (supposed to be) able to handle bundles, or should 
the MI scheduler just order the instructions in the right sequence 
without actually creating bundles (which might cause some live ranges to 
seem to overlap when in fact they don't)?

- Will there be any generic pass or framework for filling delay slots
and inserting NOOPs on hazards, similar to post-RA-sched, or does this
always have to be a custom scheduler/delay-slot-filler/...?

- And slightly unrelated: At which point should/must bundles be finalized?

Kind regards,
  Stefan

On 2013-09-24 08:11, Andrew Trick wrote:
> In my last message, I explained the goals of the generic MI scheduler and current status. This week, I'll see if we can enable MI scheduling by default for x86. I'm not sure which flags you're using to test it now. But by making it default and enabling the corresponding coalescer changes, we can be confident that benchmarking efforts are improving on the same baseline. At that point, I expect bugs to be filed for specific instances of badly scheduled code. Getting a fix committed may not be easy, because we have to show that new heuristics aren't likely to pessimize other code. But at least I'll be able to provide an explanation for why MI isn't currently handling it.
>
> There are other reasons that MI sched should be enabled now on x86 anyway:
>
> (1) The Selection DAG scheduler will be disabled as soon as I can implement a complete replacement. That should eliminate about 10% of codegen (llc) compile time. The Selection DAG scheduler has also long suffered from unacceptable worst-case compile time behavior and unresolved defects. We've been chipping away at the problems, but some remain: PR15941, PR16365. This is a fundamentally bad place to perform scheduling.
>
> (2) The postRA scheduler will also be eliminated. That will eliminate another 10% of compile time for targets that currently enable it. It also eliminates a maintenance problem because its dependence on kill flags and implicit operands is frightening--these can easily be valid for some targets but not others.
>
> (3) Non-x86 targets have been using MI sched for the past year to achieve important performance and compile time benefits. For quality and maintenance reasons, we should use the same scheduling infrastructure for mainstream targets.
>
> The basic theme here is that we want a single scheduling infrastructure that is efficient enough to enable by default--even if it is typically performance-neutral, can leverage verification across many targets, and can be safely customized by plugging in heuristics.
>
> -Andy