[llvm-dev] Register pressure calculation in the machine scheduler and live-through registers

Thu Sep 14 18:55:11 PDT 2017

Jonas and Matthias,

Thank you for the informative replies!

Implementing that naive and inefficient procedure for getting complete live-in/live-out info sounds like a suitable solution for us at this point. We are working on a research project in which we are experimenting with relatively slow scheduling algorithms to explore the limits of register pressure (RP) reduction. I agree with Matthias that we should worry about putting these algorithms in production only once our experiments show that they make a significant difference. So, for now, we will try to implement this naive approach.

We realize that the register allocator will always try to place spills in basic blocks with lower execution frequencies. This implies that locally reducing RP within a basic block (or a few basic blocks) will not necessarily reduce the static spill count for the whole function, which is the metric that we are currently using. However, with the large size of our data set (all CPU2006 benchmarks with about 900 thousand scheduling regions), we should statistically get a very strong (but not perfect) correlation between local RP and the static spill count. The current correlation is not as strong as expected, which is why we think that there is a problem with the cost function that we are minimizing. It does not seem to be fully capturing the register info.

After reading Matthias's comment, though, I concluded that the weighted spill count (with each spill multiplied by its block's frequency of execution) would be a better metric for us to use. Of course, we do run the benchmarks and look at the actual execution time once in a while, but we cannot afford to do this after every change that we make. We have started looking into calculating a weighted spill count, but we are having problems with this. We will address that in a separate email.
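
To make the difference between the two metrics concrete, here is a minimal sketch of a static versus a frequency-weighted spill count. The `Block` record, the block names, and the frequency values are all made up for illustration; none of this is LLVM code, and in a real compiler the frequencies would presumably come from the compiler's block-frequency estimates rather than hard-coded numbers.

```python
from dataclasses import dataclass


@dataclass
class Block:
    """Hypothetical per-block summary; not an LLVM data structure."""
    name: str
    frequency: float  # assumed estimated execution frequency of the block
    spills: int       # number of spill instructions placed in the block


def static_spill_count(blocks):
    # Every spill counts once, no matter where it was placed.
    return sum(b.spills for b in blocks)


def weighted_spill_count(blocks):
    # Each spill is multiplied by the execution frequency of its block.
    return sum(b.frequency * b.spills for b in blocks)


# Two made-up outcomes for the same function.
# A has fewer spills in total, but one of them sits in the hot loop.
schedule_a = [Block("entry", 1.0, 2), Block("loop", 100.0, 1), Block("exit", 1.0, 0)]
# B has more spills in total, but none of them is in the loop.
schedule_b = [Block("entry", 1.0, 3), Block("loop", 100.0, 0), Block("exit", 1.0, 2)]

for label, blocks in (("A", schedule_a), ("B", schedule_b)):
    print(label, static_spill_count(blocks), weighted_spill_count(blocks))
```

With these invented numbers the static count prefers A while the weighted count prefers B, which is the kind of disagreement that can weaken the correlation between local RP and the static spill count.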

Overall, we have reasons to believe that precise scheduling algorithms can significantly impact the performance of scientific programs (at least), because most FP2006 benchmarks have significant amounts of spilling in their hot functions, and substantial reductions in these hot spills always lead to significant performance gains. Our framework is general enough to support balancing RP and instruction-level parallelism (ILP), but we are currently focusing on the RP part, targeting processors with powerful out-of-order execution, such as Intel x86. Once we complete our exploration of the limits of RP reduction, we will start experimenting with the balance between RP and ILP on in-order processors.

Thank you again for your insightful comments!

Ghassan Shobaki
Assistant Professor of Computer Science
California State University, Sacramento

________________________________
From: Matthias Braun <matze at braunis.de>
Sent: Tuesday, September 12, 2017 9:50:22 AM
To: Jonas Paulsson
Cc: Shobaki, Ghassan; Andrew Trick; llvm-dev at lists.llvm.org; Kerbow, Austin Michael; ghassanshobaki at gmail.com
Subject: Re: [llvm-dev] Register pressure calculation in the machine scheduler and live-through registers

On Sep 12, 2017, at 6:44 AM, Jonas Paulsson via llvm-dev <llvm-dev at lists.llvm.org> wrote:

Hi Ghassan,

As for live-through information, we found that the machine scheduler does call initLiveThru() and here is a pointer to the code:

https://gitlab.com/CSUS_LLVM/LLVM_DRAGONEGG/blob/master/Generic/llvmTip/llvm-master/lib/CodeGen/MachineScheduler.cpp#L921

The first part of the comment above initLiveThru() says "The register tracker is unaware of global liveness so ignores normal live-thru ranges...". It is then of course confusing to see these methods like initLiveThru()...

My understanding is that (please correct me if I'm wrong)
1. All instructions are traversed bottom-up during DAG building. While doing this, reg pressure is tracked based on just looking at those instructions. So if a def has no use in an MBB it is a "live-out" reg, and if there is a use with no def, it would become "live-in". This is then a kind of local live-through concept, in contrast to a true live-through analysis, which would also be aware of registers that are neither used nor defined in the region.

Yes, the first pass during DAG construction determines the maximum register pressure and the list of live-out values. I think the code consults LiveIntervals to differentiate dead-defs from true live-outs or detect killing uses that aren't marked as such.
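
The local notion of live-in and live-out described in item 1 can be sketched as follows. This is only an illustration under simplifying assumptions: instructions are plain `(defs, uses)` tuples, register names are strings, and there is no equivalent of the LiveIntervals query mentioned above, so a def without a later use is reported as a live-out candidate even if it is really a dead def. It is not the machine scheduler's actual code.

```python
def local_liveness(region):
    """Bottom-up scan of one region.

    region: list of (defs, uses) tuples in program order.
    Returns (live_in, live_out_candidates) as sets of register names.
    """
    live = set()      # registers live just above the instruction being visited
    live_out = set()  # defs that have no later use inside the region
    for defs, uses in reversed(region):
        for reg in defs:
            if reg not in live:
                # No use below this def in the region: a live-out candidate
                # (or a dead def, which this sketch cannot tell apart).
                live_out.add(reg)
            live.discard(reg)
        for reg in uses:
            live.add(reg)
    # Whatever is still live at the top was used without a def in the region.
    return live, live_out


# Hypothetical three-instruction region.
region = [
    (["v2"], ["v0"]),        # v2 = op v0
    (["v3"], ["v2", "v1"]),  # v3 = op v2, v1
    (["v4"], ["v3"]),        # v4 = op v3
]
live_in, live_out = local_liveness(region)
print(sorted(live_in), sorted(live_out))
```

A register that is live across the whole region but is neither used nor defined in it never appears in either set, which is exactly the live-through information that a purely local scan cannot see.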

2. We should ideally have an analysis of global liveness so that the regpressure trackers could be properly initialized, but this is currently missing. Of course, one might hope that it wouldn't be too hard to extend LiveIntervals to also provide this information... It would be interesting to merely try this and see how valuable it would be...

Again, a globally minimal number of spills/reloads is not necessarily better than a good schedule inside a loop with extra spills/reloads outside the loop. But it would certainly be worth exploring.
Writing a naive function to determine all live values at a certain point is easy: just iterate over all vregs and check for each whether the point in question is covered by a live interval. However, doing this efficiently enough for production use would probably require a concept such as keeping live-in/live-out lists on basic blocks up-to-date. To put that into production, we should first demonstrate that it is worth it.
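
The naive query described above might look roughly like this. The representation is an assumption made for the sketch: each vreg maps to a list of half-open `(start, end)` segments over integer program points, standing in for whatever live-interval data the compiler keeps. In LLVM itself the per-register check would presumably be a query on that register's live interval (for example something like `LiveRange::liveAt`), but nothing below uses LLVM.

```python
def live_at(intervals, point):
    """Return the set of vregs whose live interval covers `point`.

    intervals: dict mapping a vreg name to a list of half-open
               (start, end) segments; a hypothetical stand-in for
               real live-interval data.
    """
    return {
        vreg
        for vreg, segments in intervals.items()
        if any(start <= point < end for start, end in segments)
    }


# Made-up intervals for three vregs.
intervals = {
    "v0": [(0, 10)],
    "v1": [(4, 6), (12, 20)],  # an interval with a hole between 6 and 12
    "v2": [(10, 14)],
}

for point in (5, 10, 12):
    print(point, sorted(live_at(intervals, point)))
```

Each query walks every vreg and every segment, so the cost grows with the size of the whole function rather than the size of the region. That is acceptable for an experiment but is the inefficiency that would have to be addressed before production use.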

- Matthias
