[llvm] [MachineScheduler] Experimental option to partially disable pre-ra scheduling. (PR #90181)

Wed May 8 02:38:52 PDT 2024

JonPsson1 wrote:

> Thanks for your patch. I have a few questions and observations.
> 
> > I found a big regression on SPEC/cactus which comes from enabling the machine scheduler before register allocation
> 
> Are you running pre-ra and post-ra scheduler? Do you see the regression if you run both vs only running pre-ra?

On SystemZ, both machine schedulers are run. Post-RA scheduling uses a target-specific MachineSchedStrategy and HazardRecognizer that have been in use for years, so I wouldn't suspect it. In this case it seems obvious to me that the pre-RA scheduler is to blame, given the increase in spilling, which only it can affect.

I checked this anyway and found that disabling the post-RA scheduler adds some variation, but the significant part is definitely the pre-RA pass.

> 
> > The reason seems to be that it produces a schedule that gives heavily increased spilling during register allocation.
> 
> It sounds like the long term solution is to address spilling directly. Do you know if the issue is spilling in general or if it has to do specifically with larger regions?

As can be seen in the table I posted, this does indeed seem to be a problem in larger regions. The scheduler has a lot of nodes to pick from and, judging by the DAG, fails to make sensible choices, which results in a lot of unfortunate overlapping. I am fairly optimistic that this could be fixed by some new heuristic, but I am not sure exactly how.

> 
> > cactus ran 8% faster by just disabling the pre-ra machine scheduler
> 
> Is this dynamic instruction count or runtime?

Runtime (and spill/reload instructions).

> 
> Do you see any regressions due to this change in SPEC?

On SystemZ, no. This is probably because disabling scheduling of bigger regions mainly affects this benchmark.

> 
> What is the difference between the performance with this change vs disabling pre-ra scheduling entirely?

I have seen a few minor regressions (<5%) by disabling pre-RA scheduling altogether.

> 
> I can take a look to see how this change is impacting benchmarks on RISC-V.

That would be nice - thanks! It would be interesting to see what size limit you end up using as that may be slightly different (number of target instructions).
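
For illustration only, a size-based cutoff of the kind discussed here could be sketched roughly as below. This is a standalone sketch, not code from the patch: `Region`, `shouldSchedulePreRA`, `countScheduled` and the `SizeLimit` parameter are hypothetical names, and treating a limit of 0 as "no limit" is an assumption.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a scheduling region; this is not an LLVM type.
struct Region {
  std::size_t NumInstrs; // number of target instructions in the region
};

// Hypothetical policy: run pre-RA scheduling only on regions whose size is
// at or below SizeLimit. A SizeLimit of 0 is assumed to mean "no limit".
bool shouldSchedulePreRA(const Region &R, std::size_t SizeLimit) {
  return SizeLimit == 0 || R.NumInstrs <= SizeLimit;
}

// Count how many regions would still be scheduled under a given limit,
// which is one way to compare candidate limits across targets.
std::size_t countScheduled(const std::vector<Region> &Regions,
                           std::size_t SizeLimit) {
  std::size_t N = 0;
  for (const Region &R : Regions)
    if (shouldSchedulePreRA(R, SizeLimit))
      ++N;
  return N;
}
```

Since the region size is counted in target instructions, the same source code may land on different sides of a given limit on different targets.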

https://github.com/llvm/llvm-project/pull/90181