[llvm-commits] PATCH: Expose the merge interface for LiveInterval and use it to speed up a few hot paths in LiveIntervalAnalysis

Jakob Stoklund Olesen stoklund at 2pi.dk
Sun Jul 15 19:41:41 PDT 2012


On Jul 15, 2012, at 3:00 PM, Chandler Carruth <chandlerc at gmail.com> wrote:

> On Sun, Jul 15, 2012 at 11:43 AM, Jakob Stoklund Olesen <stoklund at 2pi.dk> wrote:
> 
> On Jul 15, 2012, at 5:00 AM, Chandler Carruth <chandlerc at gmail.com> wrote:
> 
> > These two callsites to addRange showed up as fairly hot, taking about 10% of some executions with many basic blocks and variables.
> 
> You mean the asan test case, right?
> 
> Yes, but only after I've switched it to produce more "normal" looking functions. They now look essentially the same as metaprogramming-unrolled loops coming out of uses of Boost and other C++ libraries.

Not quite; there is an important difference. The loop-invariant virtual registers that LICM loves to hoist normally compress extremely well. A 1000-block loop is usually laid out contiguously, so the live range of a loop-invariant register can be represented as a single LiveRange spanning all 1000 blocks. That is the big deal about live intervals. (My inner mathematician really, really wants to swap the names of LiveInterval and LiveRange, BTW.)

The asan code contains lots of blocks that end in 'unreachable'. The loop-invariant registers are not live in these blocks, and their live intervals explode because each discontinuity requires a new LiveRange entry.

If you move the 'unreachable' blocks out of the contiguous loop, everything LiveInterval-related should go much faster. Uniquing them seems like a good idea as well.
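
To put numbers on the difference, here is a tiny standalone sketch (plain C++, not LLVM's actual classes) of a live interval as a sorted list of half-open segments. A register live across a contiguous 1000-block loop needs a single segment; the same register with a hole at, say, every other block needs one segment per contiguous piece:

  #include <cstdio>
  #include <vector>

  // Hypothetical stand-in for a live range segment: half-open [Start, End)
  // over instruction slots (roughly 10 slots per block in this sketch).
  struct Segment { unsigned Start, End; };

  int main() {
    // A loop-invariant register across a contiguous 1000-block loop:
    // one segment covers everything, no matter how many blocks there are.
    std::vector<Segment> Contiguous = {{0, 1000 * 10}};

    // The same register when every other block is a hole (e.g. it ends in
    // 'unreachable' and the register is not live there): each hole starts a
    // new segment, so the interval grows with the number of discontinuities.
    std::vector<Segment> Fragmented;
    for (unsigned BB = 0; BB != 1000; BB += 2)
      Fragmented.push_back({BB * 10, (BB + 1) * 10});

    std::printf("contiguous: %zu segment(s), fragmented: %zu segments\n",
                Contiguous.size(), Fragmented.size());
  }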

> It is important to pay attention to the performance of typical compilations which look very different.
> 
> Very aware. I'm working on a bit of infrastructure here that will let me quickly test a patch out against our entire codebase and see if either the average time to compile or the shape of the distribution of compile times changes. I'm happy to hold back from too many more patches until that is done. We're planning on also running it nightly to track changes over time.

That sounds great. I am particularly concerned about L1 cache and memory bandwidth in general with the changes you are proposing. You may want to keep that in mind if you're running this on server-grade hardware; the tradeoffs can be different from those on desktop systems.

> My expectation is that in the small case, the lack of allocation makes these copies not an issue, and in the large case, the actual merge is the dominant issue. The profiles I've seen thus far seem to back that up. I also don't think a batching interface is a huge imposition on the callers. It added two lines of code in both cases?

A case that particularly concerns me is the virtual register that spans a 1000-block loop, as I described above. The SparseBitVector, with all its flaws, uses about 2 bits per live block in the LiveVariables representation. The final LiveInterval will have a single 24-byte LiveRange entry. Your intermediate buffer is 24 kB, or in more relevant units: one L1 cache.

The behavior when transferring LiveVariables::VarInfo to a LiveInterval is very different from the join() and mergeValuesInto() cases. There is some serious compression going on. You don't want to write the intermediate uncompressed form to memory: it blows your cache, and it wastes memory bandwidth.
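
For concreteness, a quick back-of-the-envelope sketch of the sizes involved (standalone C++; the 24-byte entry and 2-bits-per-block figures are the rough numbers from above, and the exact L1 size obviously depends on the machine):

  #include <cstdio>

  int main() {
    const unsigned Blocks = 1000;       // the contiguous loop from the example
    const unsigned EntryBytes = 24;     // one {start, end, valno}-style entry
    const unsigned LVBitsPerBlock = 2;  // rough SparseBitVector cost per live block

    std::printf("LiveVariables bitmap : ~%u bytes\n",
                Blocks * LVBitsPerBlock / 8);
    std::printf("uncompressed buffer  : %u bytes (about one L1 cache)\n",
                Blocks * EntryBytes);
    std::printf("final LiveInterval   : %u bytes (a single entry)\n",
                EntryBytes);
  }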


As I told you, I am planning on killing LiveVariables. That actually involves removing most of LiveIntervalAnalysis.cpp as well. Besides depending on LV, the code has these weird quirks where it can handle non-SSA form, but only the specific limited version that comes out of 2-addr and phi-elim.

The replacement code already exists; it goes:

For each virtual register's LiveInterval LI:
  LRCalc->reset(LI);           // prepare LiveRangeCalc for this interval
  LRCalc->createDeadDefs(LI);  // add a value (initially dead) at each def
  LRCalc->extendToUses(LI);    // extend each value to reach all of its uses

This doesn't depend on LV, but it is still the same fundamental algorithm: A backwards search from each use finds the dominating def and the set of live blocks.

I just noticed that this new code would have the same problem of building up a huge buffer of per-block entries (LiveRangeCalc::LiveIn, built by findReachingDefs()). It currently uses a backwards BFS (don't ask), but if I switched it to a backwards DFS post-order, it would be possible to color in the LiveInterval right away for those blocks that are dominated by a single value. (That is all of them when called from LiveIntervalAnalysis.)

The post-order of a backwards DFS is going to be very close to layout order. I suspect that addRange() would perform really well since most calls affect the last few LiveRanges.
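
Concretely, the traversal would look something like the sketch below (standalone C++ with made-up types, not LiveRangeCalc itself; it only shows the traversal order): walk predecessors depth-first from each use and emit the live-in blocks in post-order, so their segments can mostly be appended at the end of the interval.

  #include <vector>

  // Hypothetical CFG node: a block number plus its predecessor list.
  struct Block {
    unsigned Number;
    std::vector<Block *> Preds;
  };

  // Post-order of a backwards DFS from the block containing a use, stopping
  // at the block containing the dominating def. The blocks come out roughly
  // in layout order, so appending their segments to the interval stays cheap.
  static void backwardPostOrder(Block *BB, Block *DefBB,
                                std::vector<bool> &Visited,
                                std::vector<Block *> &LiveIn) {
    if (BB == DefBB || Visited[BB->Number])
      return;                        // stop at the def; don't revisit blocks
    Visited[BB->Number] = true;
    for (Block *Pred : BB->Preds)    // search backwards toward the def
      backwardPostOrder(Pred, DefBB, Visited, LiveIn);
    LiveIn.push_back(BB);            // post-order: emitted after its preds
  }

  int main() {
    // Tiny chain B0 -> B1 -> B2 with the def in B0 and a use in B2.
    Block B0{0, {}}, B1{1, {&B0}}, B2{2, {&B1}};
    std::vector<bool> Visited(3);
    std::vector<Block *> LiveIn;
    backwardPostOrder(&B2, &B0, Visited, LiveIn);
    // LiveIn now holds B1, B2 in that (layout) order.
  }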

> Anyways, let me know if you find my arguments about batch interfaces versus a merge class unconvincing. ;] I'm not in any hurry with this patch as it's just a 10% or 20% drop in very edge-case compilations.

The last issue is that I am trying to kill off the LiveRange struct. It is an implementation detail that I don't want to leak; I would prefer it to be a private class in LiveInterval. We're not quite there yet, but methods like getVNInfoAt() and extendInBlock() have significantly limited its exposure.

/jakob
