[llvm-commits] [PATCH] Stack Coloring optimization

Mon Sep 3 01:05:11 PDT 2012

Hi Nuno!

Thanks for the feedback. Comments below.

> - AFAICT most (IR) optimizations do not make any effort to keep these lifetime markers. They simply discard them.  Probably some optimizations must be tweaked to try to preserve those.

Yes. I plan to go over IR optimizations and see what I can do there. 

> - Probably there's also juice to be extracted from single functions where none of the callees was inlined (and therefore no markers have been placed). Should this pass try to compute the liveness intervals for allocas and then intersect the computed intervals with the lifetime markers (that may or may not exist)?  Or should we have an IR pass that inserts (and tightens) these markers?

The Clang folks said that they will look into placing lifetime markers in more places. 

> - Can you please add a comment describing the merging algorithm?  It seems to be a greedy, O(n^2) algorithm, where n is the number of Allocas.  It's good to have it documented because we may want to experiment with different heuristics in the future.

Done.
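
For anyone following the thread without the patch open, here is a minimal sketch of the kind of greedy, O(n^2) merge described above. It is not the code from the patch. The names SlotInterval, overlaps and mergeDisjointSlots are hypothetical, and it assumes each slot has a single half-open live range [Start, End), which is simpler than what real lifetime markers give you.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical model (an assumption, not the patch's data structure):
// each stack slot is live over one half-open range [Start, End).
struct SlotInterval {
  unsigned Start;
  unsigned End;
};

// Two half-open ranges overlap if each one starts before the other ends.
static bool overlaps(const SlotInterval &A, const SlotInterval &B) {
  return A.Start < B.End && B.Start < A.End;
}

// Greedy merge: walk the slots in order and put each one into the first
// earlier slot whose accumulated live ranges it does not overlap.
// Returns Color, where Color[I] is the index of the slot that I shares.
std::vector<std::size_t>
mergeDisjointSlots(const std::vector<SlotInterval> &Slots) {
  std::vector<std::size_t> Color(Slots.size());
  // Occupied[J] holds every range already assigned to representative J.
  std::vector<std::vector<SlotInterval>> Occupied(Slots.size());

  for (std::size_t I = 0; I < Slots.size(); ++I) {
    Color[I] = I; // By default a slot keeps its own storage.
    for (std::size_t J = 0; J < I; ++J) {
      if (Color[J] != J)
        continue; // J was itself merged into another slot; skip it.
      bool Clash = false;
      for (const SlotInterval &R : Occupied[J]) {
        if (overlaps(R, Slots[I])) {
          Clash = true;
          break;
        }
      }
      if (!Clash) {
        Color[I] = J; // Share storage with representative J.
        break;
      }
    }
    Occupied[Color[I]].push_back(Slots[I]);
  }
  return Color;
}
```

Each slot is compared against at most all of the ranges placed before it, which is where the quadratic bound comes from. The first-fit choice is only one possible heuristic; sorting by size or by interval start first would be an obvious thing to experiment with.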

> - Another interesting but more complex trick to try would be to merge a big object with several smaller ones. Right now the merging algorithm doesn't split objects and doesn't allow objects with overlapping liveness intervals to be merged with a single object (into disjoint parts, of course).
> - You're missing a test for a lifetime marker region that never ends.

Done. I also added some other tests. 

> 
> Ok, I guess it's enough of random stuff for now. I pretty much like the idea.
> 
> Nuno
> 

Thanks,
Nadav

> 
> ----- Original Message ----- From: "Nadav Rotem" <nrotem at apple.com>
> To: <llvm-commits at cs.uiuc.edu>
> Sent: Thursday, August 30, 2012 10:51 AM
> Subject: [llvm-commits] [PATCH] Stack Coloring optimization
> 
> 
>> Hi All,
>> 
>> I've been working on a new optimization for reducing the stack size. Currently, when we declare allocas in LLVM IR, these allocas are directly translated to stack slots. And when we inline small functions into a larger function, these allocas add up and take up lots of space.  In some cases we know that the use of the allocas is bounded by disjoint regions.  In this optimization we merge multiple disjoint slots into a single slot. LLVM uses the lifetime markers for specifying the regions in which the alloca is used.  This patch propagates the lifetime markers through SelectionDAG and makes them pseudo ops.  Later, a pre-register-allocator pass constructs live intervals which represent the liveness of different stack slots. Next, the pass merges disjoint intervals.  Notice that lifetime markers are not perfect single-entry-single-exit regions. They may be removed by optimizations, they may start with two markers and end with one, or even not end at all!
>> 
>> So, why is this done in codegen?  There are a number of reasons. First, joining allocas may hinder alias analysis. Second, in the future we would like to share the alloca space with spill slots.
>> 
>> The inliner has a 'hack' for merging allocas when inlining functions. We plan to remove this hack once this pass is tuned and we see that there are no regressions.  Also, we plan to look at joining multiple non-disjoint slots into a bigger disjoint slot.
>> 
>> This work is based on code by Owen, and on feedback and ideas from a number of other engineers at Apple.
>> 
>> Any comments or review are much appreciated.
>> 
>> 
>> Thanks,
>> Nadav 
>