[llvm-commits] [PATCH] Allow SelectionDAGBuilder to reorder loads past stores

Tue Jan 10 12:10:56 PST 2012

On Tue, 2012-01-10 at 09:55 -0800, Andrew Trick wrote:
> On Dec 21, 2011, at 12:36 PM, Hal Finkel wrote:
> 
> > For the purpose of discussion, I have attached an updated version of my
> > patch. This version has partially resolved the scheduling-after-call
> > (excess-spilling) problem by introducing an overriding heuristic into
> > BURRSort which prioritizes all non-loads over calls. Roughly, this
> > causes stuff that can be done before a call (except for loads) to be
> > done prior to the call. In practice, this still does not always happen
> > b/c of the local nature of the scheduling decisions. Surprisingly (to me
> > at least), this relatively simple change seems to have eliminated all of
> > the major performance regressions while leaving a substantial aggregate
> > speedup [for some code it seems better to even prioritize the loads, and
> > I don't understand that at all]. Nevertheless, this change has its
> > downsides, and I would like to discuss various alternatives to
> > mitigating the scheduling problems caused by call-induced spilling.
> > 
> > I consider this patch to be for discussion only b/c I have not included
> > any regression-test updates, and there still is a test-suite failure on
> > x86_64 (and likely on other platforms as well) that needs to be resolved.
> 
> Hal,
> 
> Are you still investigating this? FYI: I pretty much threw my hands up when it came to scheduling around call sites. So if you have a better heuristic, I don't see a problem taking the patch. When you think you've resolved the significant regressions (>5%), I'll run my own set of benchmarks to cross check.

Yes, and I've improved the heuristics compared to what I had posted
previously. Thanks for reminding me; I should post an updated patch
[I've been tuning mostly on PPC over the past few weeks, but I should
check the x86 performance again].

> 
> The preRA scheduler already knows how many live ranges an instruction is gen/killing and their register class. You could simply prioritize calls over instructions that net terminate callee save live ranges, and prioritize instructions that net initiate callee save live ranges over calls (i.e. schedule them below calls).

This sounds close to what I ended up doing. The trick is to schedule
things before calls only when doing so results in less spilling.

> 
> You want to be a little careful to keep the queue comparisons transitive (no A < B < A), but we already have at least one case where that's violated.

I think that I was careful to do this, but I'll look at it again.
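
One way to look at it again is a brute-force check over a small set of sample
nodes. The sketch below is only an illustration under my own assumptions:
isConsistentOrder is a hypothetical helper, not something in LLVM, and it
tests only irreflexivity, asymmetry and transitivity on the samples it is
given (it does not check transitivity of equivalence, and it proves nothing
about inputs outside the sample set).

```cpp
#include <cstddef>
#include <vector>

// Returns true if `less` behaves like a strict ordering on the samples:
// no x < x, no a < b together with b < a, and a < b, b < c implies a < c.
// Cost is cubic in the number of samples, so keep the sample set small.
template <typename T, typename Less>
bool isConsistentOrder(const std::vector<T> &xs, Less less) {
  for (std::size_t i = 0; i < xs.size(); ++i) {
    if (less(xs[i], xs[i]))
      return false; // irreflexivity violated
    for (std::size_t j = 0; j < xs.size(); ++j) {
      if (less(xs[i], xs[j]) && less(xs[j], xs[i]))
        return false; // asymmetry violated: the A < B < A case
      for (std::size_t k = 0; k < xs.size(); ++k) {
        if (less(xs[i], xs[j]) && less(xs[j], xs[k]) &&
            !less(xs[i], xs[k]))
          return false; // transitivity violated
      }
    }
  }
  return true;
}
```

Passing an ordinary integer comparison over {0, 1, 2} returns true, while a
cyclic "rock-paper-scissors" comparison such as (a + 1) % 3 == b returns
false, which is the kind of failure a stack of overriding heuristics can
introduce by accident.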

Thanks again,
Hal

> 
> -Andy

-- 
Hal Finkel
Postdoctoral Appointee
Leadership Computing Facility
Argonne National Laboratory