[LLVMdev] Enabling MI Scheduler on x86 (was Experimental Evaluation of the Schedulers in LLVM 3.3)
atrick at apple.com
Thu Sep 26 09:38:44 PDT 2013
On Sep 26, 2013, at 6:33 AM, Stefan Hepp <stefan at stefant.org> wrote:
> Thanks for your explanations!
> What is the big picture for supporting in-order VLIW architectures and the like, though?
The short answer is to schedule and bundle in the MachineSchedulerPass.
To do your own bundling, you may want to override the driver for list scheduling, like VLIWMachineScheduler in HexagonMachineScheduler.cpp.
In theory you should be able to form bundles as you schedule or immediately afterward, see MIBundleBuilder. However, updating the LiveIntervals is not easy. We don’t have great utilities yet. Also, no in-tree targets do that, so you will hit bugs in the regalloc pipeline. This is the right thing to do though, so don’t be discouraged!
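As a rough illustration of forming bundles while scheduling, here is a self-contained C++ sketch of top-down list scheduling that emits one bundle per cycle. It deliberately does not use LLVM's ScheduleDAGMI or MIBundleBuilder: the `Instr` struct, `scheduleIntoBundles`, the issue width, and the unit-latency default are all hypothetical, and it ignores LiveIntervals entirely, which is exactly the hard part described above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical toy model, not LLVM's ScheduleDAG: each instruction lists the
// instructions it depends on (by index) and its result latency in cycles.
struct Instr {
  std::vector<std::size_t> preds;
  int latency = 1;
};

// Top-down list scheduling that emits one bundle per cycle. Each cycle issues,
// in index order, up to `width` instructions whose operands are ready.
// An empty bundle stands for a NOP cycle spent waiting on latency.
// Assumes the dependence graph is acyclic; otherwise this never terminates.
std::vector<std::vector<std::size_t>>
scheduleIntoBundles(const std::vector<Instr> &instrs, std::size_t width) {
  assert(width > 0 && "issue width must be positive");
  std::vector<int> issuedAt(instrs.size(), -1);  // -1 = not yet scheduled
  std::vector<std::vector<std::size_t>> bundles;
  std::size_t remaining = instrs.size();
  for (int cycle = 0; remaining > 0; ++cycle) {
    std::vector<std::size_t> bundle;
    for (std::size_t i = 0; i < instrs.size() && bundle.size() < width; ++i) {
      if (issuedAt[i] >= 0)
        continue;
      bool ready = true;
      for (std::size_t p : instrs[i].preds)
        if (issuedAt[p] < 0 || issuedAt[p] + instrs[p].latency > cycle)
          ready = false;
      if (ready)
        bundle.push_back(i);
    }
    // Commit only after the scan, so an instruction never lands in the same
    // bundle as one of its own producers.
    for (std::size_t i : bundle)
      issuedAt[i] = cycle;
    remaining -= bundle.size();
    bundles.push_back(bundle);
  }
  return bundles;
}
```

For example, with two independent instructions feeding a third and `width == 2`, the result is the two bundles {0, 1} and {2}. A real target would replace the index-order pick with its own priority heuristics and resource model.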
Another option is to schedule as if bundles were formed, but not actually bundle until after regalloc. You won’t get the best regalloc that way, though.
> I am asking because I am currently implementing instruction scheduling in our own backend for our custom Patmos processor, for which I need to support both branch delay slots and bundles, with some restrictions regarding bundles.
> For the moment, I am quite happy with a simple bottom-up basic-block scheduler. I tried to use a combination of the DFAPacketizer and a simple delay-slot-filler pass first, but the results are quite bad, both in terms of performance and in terms of maintainability/code quality.
> I found that the PostRA scheduler is currently very similar to the MI Scheduler, except that it uses the Anti-Dep-Breaker instead of live register tracking and that it is not customizable, while the MI scheduler cannot be run post-RA due to its dependency on the live variable analysis, which requires SSA code.
Live variables/intervals are unnecessary for post-RA; nothing in the MI scheduler absolutely requires them. We probably just need to define a new wrapper pass with the proper set of pass requirements.
> We would like to be able to schedule spill code and prologue/epilogue code (and if-converted code, which is currently required to be post-RA, I think?). Hence, I basically created a new post-RA scheduler similar to MI scheduler, which does bundling and handles delay slots and NOOP insertion. The downside is that there is a lot of code duplication, since the MI scheduler usually uses ScheduleDAGMI and not the more generic ScheduleDAGInstr at the interfaces.
Sure, let’s fix ScheduleDAGMI so it has no preRA assumptions.
> So here are my questions:
> - Are there any plans for a (more generic) post-RA scheduler replacement, or the possibility to run the MI scheduler post-RA (i.e., without the live variable analysis dependency), or is simply creating a completely separate pass based on ScheduleDAGInstr the 'official' way to handle hardware with no hazard detection?
Let's run the MI scheduler post-RA. All I need is a free afternoon and a client to test it out.
I would like to figure out a way to run it conditionally on prolog/epilog and spill blocks. That may not matter for you if you need to run the pass to do bundling.
> - Is the MI scheduler supposed to create bundles (there is no support for this now as far as I can see, and some passes might need to break up some bundles later on) or should this only be done post-RA? Is the register allocator (supposed to be) able to handle bundles, or should the MI scheduler just order the instructions in the right sequence without actually creating bundles (which might cause some live ranges to seem to overlap when in fact they don't)?
MI scheduler is supposed to allow creating bundles, but support for it is weak. The only major challenge is updating live intervals. LiveIntervals::repairIntervalsInRange is only partially implemented.
Each time an instruction is scheduled it is moved in place and LiveIntervals are updated. LiveIntervals needs to remain valid. So for example, you can’t reorder conflicting coalesced vregs or physregs into a bundle without actually forming the bundle on-the-fly:
I0: v1 = ...
I1: v1' = ...
I2: v2 = v1
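One way to read the example above: if v1 and v1' are coalesced into the same register, the sequential order I0, I1, I2 is illegal because I1 overwrites the register while v1 is still live for I2, but it becomes legal once I1 and I2 issue in the same bundle and I2 reads v1 before I1's write lands. The toy C++ model below makes that concrete. It is not LLVM's LiveIntervals; `MInstr`, `Range`, `computeRanges`, and the split read/write slot convention are assumptions, and v1' is spelled `v1p`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical toy model: each instruction defines at most one virtual
// register and reads a list of them. Assumes SSA and defs before uses.
struct MInstr {
  std::string def;  // empty if the instruction defines nothing
  std::vector<std::string> uses;
};

struct Range {
  int start;  // half-open [start, end)
  int end;
};

// Computes live ranges from a slot assignment; bundled instructions share a
// slot. Each slot s is split in two: reads happen at 2*s, writes at 2*s + 1,
// so a value read and a value written in the same bundle do not overlap.
std::map<std::string, Range> computeRanges(const std::vector<MInstr> &code,
                                           const std::vector<int> &slot) {
  assert(code.size() == slot.size());
  std::map<std::string, Range> ranges;
  for (std::size_t i = 0; i < code.size(); ++i)
    if (!code[i].def.empty())  // start as a dead def covering the write half
      ranges[code[i].def] = {2 * slot[i] + 1, 2 * slot[i] + 2};
  for (std::size_t i = 0; i < code.size(); ++i)
    for (const std::string &u : code[i].uses) {
      auto it = ranges.find(u);
      if (it != ranges.end())  // extend the range up to its last read
        it->second.end = std::max(it->second.end, 2 * slot[i]);
    }
  return ranges;
}

// Two values can share one register only if their ranges are disjoint.
bool interfere(const Range &a, const Range &b) {
  return a.start < b.end && b.start < a.end;
}
```

With slots {0, 1, 2} the ranges of `v1` and `v1p` overlap; with slots {0, 1, 1}, where I1 and I2 form one bundle, they do not. That is why the bundle has to exist, with LiveIntervals kept valid, at the moment the scheduler commits to that order.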
The scheduler, regalloc, and the rest of the post-scheduling/RA backend don’t assume anything about bundle semantics. A bundle just appears to be a single instruction, so the operands of all bundled instrs look like one large operand list. So regalloc should handle bundles; it just isn’t well tested.
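A toy illustration of that view, assuming hypothetical `Operand` and `Inst` types rather than LLVM's MachineOperand and MachineInstr: flattening a bundle yields one operand list in which a register can appear both as a def and as a use.

```cpp
#include <cassert>
#include <vector>

// Hypothetical toy operand and instruction types.
struct Operand {
  unsigned reg;
  bool isDef;
};

struct Inst {
  std::vector<Operand> ops;
};

// Seen from outside, a bundle is one instruction whose operand list is the
// concatenation of its members' operands.
std::vector<Operand> bundleOperands(const std::vector<Inst> &bundle) {
  assert(!bundle.empty() && "a bundle has at least one instruction");
  std::vector<Operand> all;
  for (const Inst &mi : bundle)
    all.insert(all.end(), mi.ops.begin(), mi.ops.end());
  return all;
}

// True if `reg` is both read and written in the flattened list, as happens
// when one member reads a register that another member writes.
bool readsAndWrites(const std::vector<Operand> &ops, unsigned reg) {
  bool reads = false;
  bool writes = false;
  for (const Operand &op : ops) {
    if (op.reg != reg)
      continue;
    if (op.isDef)
      writes = true;
    else
      reads = true;
  }
  return reads && writes;
}
```

For a bundle of `r1 = ...` and `r2 = r1`, the flattened list has three operands and `r1` is both read and written, which is how the bundle looks to any pass that treats it as a single instruction.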
If you defer bundle creation, regalloc will see false interferences among the instructions that should be bundled. Hexagon does this, so you might be able to get more pointers from them. I think this is a quicker route to get something working, but not ideal. Assuming you have a classic VLIW, I think you could pseudo-bundle during the preRA MachineScheduler, but not actually form the bundles and avoid reordering instructions within the bundle. Then during postRA scheduling, you can rebundle (spilling may have changed things anyway) and reorder within the bundle to meet any VLIW encoding constraints.
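The pseudo-bundling approach can be sketched roughly as follows. This is a hypothetical C++ model, not an LLVM pass: before regalloc each instruction carries only a group id (`-1` for code that belongs to no bundle, such as spill code inserted later), and after regalloc consecutive instructions with the same id are regrouped and then stably reordered to satisfy an assumed encoding rule (ALU, then memory, then branch slots).

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical functional units; the enum order is the assumed slot order
// required by the encoding.
enum class Unit { Alu, Mem, Branch };

struct PInstr {
  int id;     // instruction number, for illustration only
  int group;  // pre-RA pseudo-bundle id; -1 means "not part of any bundle"
  Unit unit;
};

// Post-RA: regroup adjacent instructions that share a pseudo-bundle id, then
// reorder each bundle into the assumed encoding order. stable_sort keeps the
// original order among instructions that use the same unit.
std::vector<std::vector<PInstr>> rebundle(const std::vector<PInstr> &seq) {
  std::vector<std::vector<PInstr>> bundles;
  for (const PInstr &mi : seq) {
    assert(mi.group >= -1 && "group ids are -1 or non-negative");
    const bool joinsPrev = !bundles.empty() && mi.group >= 0 &&
                           bundles.back().back().group == mi.group;
    if (joinsPrev)
      bundles.back().push_back(mi);
    else
      bundles.push_back({mi});
  }
  for (std::vector<PInstr> &b : bundles)
    std::stable_sort(b.begin(), b.end(), [](const PInstr &x, const PInstr &y) {
      return x.unit < y.unit;
    });
  return bundles;
}
```

Because regrouping only merges adjacent instructions, spill code inserted between two members of a pseudo-bundle splits it, which is one reason to rebundle after regalloc instead of trusting the pre-RA groups.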
> - Will there be any generic pass or framework for filling delay slots and inserting NOOPs on hazards, similar to post-RA-sched, or has this always to be a custom scheduler/delay-slot-filler/…?
No plans here. I suggest overriding ScheduleDAGMI initially, with some code duplication, then migrating it into the generic code if that will help with future maintenance.
Are you talking about branch delay slots? I don’t know how that’s handled today. I would probably handle it with a bundle.
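Since there is no generic pass for this, here is a minimal sketch of what a target-specific branch delay-slot filler might do, using hypothetical toy types rather than MachineInstr: move the longest legal suffix of the block's instructions into the slots after the final branch, and pad the rest with NOPs. A real filler would also need target-specific rules (for example, which instructions are allowed in a delay slot), which this sketch does not model.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical toy instruction: register numbers it writes and reads.
struct Ins {
  std::string name;
  std::set<int> defs;
  std::set<int> uses;
  bool isBranch;
};

// Fills the `slots` delay slots of the block's final branch by moving the
// longest legal suffix of the preceding instructions after the branch (their
// relative order is kept), then pads any remaining slots with NOPs.
std::vector<Ins> fillDelaySlots(std::vector<Ins> block, std::size_t slots) {
  assert(!block.empty() && block.back().isBranch);
  const Ins br = block.back();
  block.pop_back();
  std::size_t moved = 0;
  while (moved < slots && moved < block.size()) {
    const Ins &cand = block[block.size() - 1 - moved];
    // The branch now issues first, so the candidate must not write anything
    // the branch reads or writes, nor read anything the branch writes.
    bool conflict = cand.isBranch;
    for (int r : cand.defs)
      if (br.uses.count(r) || br.defs.count(r))
        conflict = true;
    for (int r : cand.uses)
      if (br.defs.count(r))
        conflict = true;
    if (conflict)
      break;
    ++moved;
  }
  const auto split = block.end() - static_cast<std::ptrdiff_t>(moved);
  std::vector<Ins> out(block.begin(), split);
  out.push_back(br);
  out.insert(out.end(), split, block.end());
  for (std::size_t i = moved; i < slots; ++i)
    out.push_back(Ins{"nop", {}, {}, false});
  return out;
}
```

Following the suggestion above, the branch and its filled slots could then be kept together as one bundle so that later passes cannot separate them.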
> - And slightly unrelated: At which point should/must bundles be finalized?
The FinalizeMachineBundles pass is vestigial. It inserts a BUNDLE marker instruction. No target-independent code requires that, so you can ignore it.
> Kind regards,
> On 2013-09-24 08:11, Andrew Trick wrote:
>> In my last message, I explained the goals of the generic MI scheduler and current status. This week, I'll see if we can enable MI scheduling by default for x86. I'm not sure which flags you're using to test it now. But by making it default and enabling the corresponding coalescer changes, we can be confident that benchmarking efforts are improving on the same baseline. At that point, I expect bugs to be filed for specific instances of badly scheduled code. Getting a fix committed may not be easy, because we have to show that new heuristics aren't likely to pessimize other code. But at least I'll be able to provide an explanation for why MI isn't currently handling it.
>> There are other reasons that MI sched should be enabled now on x86 anyway:
>> (1) The Selection DAG scheduler will be disabled as soon as I can implement a complete replacement. That should eliminate about 10% of codegen (llc) compile time. The Selection DAG scheduler has also long suffered from unacceptable worst-case compile time behavior and unresolved defects. We've been chipping away at the problems, but some remain: PR15941, PR16365. This is a fundamentally bad place to perform scheduling.
>> (2) The postRA scheduler will also be eliminated. That will eliminate another 10% of compile time for targets that currently enable it. It also eliminates a maintenance problem because its dependence on kill flags and implicit operands is frightening--these can easily be valid for some targets but not others.
>> (3) Non-x86 targets have been using MI sched for the past year to achieve important performance and compile time benefits. For quality and maintenance reasons, we should use the same scheduling infrastructure for mainstream targets.
>> The basic theme here is that we want a single scheduling infrastructure that is efficient enough to enable by default--even if it is typically performance-neutral, can leverage verification across many targets, and can be safely customized by plugging in heuristics.
>> LLVM Developers mailing list
>> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu