[llvm-commits] [PATCH] Stack Coloring optimization

Sun Sep 2 11:57:36 PDT 2012

Hi,

I generally like the approach. It makes sense to me to do it in codegen as 
well.

Let me just give you a few random thoughts (mostly for future work):
 - AFAICT most (IR) optimizations do not make any effort to keep these 
lifetime markers. They simply discard them.  Probably some optimizations 
must be tweaked to try to preserve those.
 - Probably there's also juice to be extracted from single functions where 
none of the callees was inlined (and therefore no markers have been placed). 
Should this pass try to compute the liveness intervals for allocas and then 
intersect the computed intervals with the lifetime markers (that may or may 
not exist)?  Or should we have an IR pass that inserts (and tightens) these 
markers?
 - Can you please add a comment describing the merging algorithm?  It seems 
to be a greedy, O(n^2) algorithm, where n is the number of Allocas.  It's 
good to have it documented because we may want to experiment with different 
heuristics in the future.
 - Another interesting but more complex trick to try would be to let one big 
object share its slot with several smaller ones. Right now the merging 
algorithm doesn't split objects, and it doesn't allow objects with 
overlapping liveness intervals to be merged with a single larger object 
(into disjoint parts of it, of course).
 - You're missing a test for a lifetime region that never ends (a start 
marker with no matching end marker).
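
To make the kind of comment I'm asking for concrete, here is a minimal 
sketch of a greedy, O(n^2) first-fit merge. It is not taken from the patch: 
the Slot struct, the half-open [begin, end) ranges and the names mergeSlots 
and stackSize are all hypothetical, and alignment is ignored. The real pass 
may well use a different heuristic.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of a stack slot: a half-open live range [begin, end)
// in instruction indices, plus the slot size in bytes.
struct Slot {
  unsigned begin;
  unsigned end;
  unsigned size;
};

// Two half-open ranges overlap if each one starts before the other ends.
bool overlaps(const Slot &a, const Slot &b) {
  return a.begin < b.end && b.begin < a.end;
}

// Greedy first-fit merge. Each slot joins the first group in which it
// overlaps no existing member; otherwise it opens a new group.
// Returns the group ("color") assigned to each slot.
// Worst case is n*(n-1)/2 overlap checks, i.e. O(n^2).
std::vector<std::size_t> mergeSlots(const std::vector<Slot> &slots) {
  std::vector<std::vector<std::size_t>> groups;
  std::vector<std::size_t> color(slots.size());
  for (std::size_t j = 0; j < slots.size(); ++j) {
    assert(slots[j].begin <= slots[j].end && "malformed live range");
    std::size_t g = 0;
    for (; g < groups.size(); ++g) {
      bool clash = false;
      for (std::size_t member : groups[g]) {
        if (overlaps(slots[member], slots[j])) {
          clash = true;
          break;
        }
      }
      if (!clash)
        break; // slot j fits into group g
    }
    if (g == groups.size())
      groups.emplace_back(); // no existing group fits, open a new one
    groups[g].push_back(j);
    color[j] = g;
  }
  return color;
}

// Stack bytes needed after merging: each group is as large as its
// largest member (alignment is ignored in this sketch).
unsigned stackSize(const std::vector<Slot> &slots,
                   const std::vector<std::size_t> &color) {
  std::vector<unsigned> groupSize;
  for (std::size_t i = 0; i < slots.size(); ++i) {
    if (color[i] >= groupSize.size())
      groupSize.resize(color[i] + 1, 0);
    groupSize[color[i]] = std::max(groupSize[color[i]], slots[i].size);
  }
  unsigned total = 0;
  for (unsigned s : groupSize)
    total += s;
  return total;
}
```

For example, with slots {0,10,16}, {10,20,32}, {5,15,8} and {20,30,4}, the 
first, second and fourth share one 32-byte slot and the third gets its own, 
so the frame needs 40 bytes instead of 60. Visiting the slots in a different 
order (say, largest first) is exactly the kind of heuristic we may want to 
experiment with later.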

Ok, I guess it's enough of random stuff for now. I pretty much like the 
idea.

Nuno

----- Original Message ----- 
From: "Nadav Rotem" <nrotem at apple.com>
To: <llvm-commits at cs.uiuc.edu>
Sent: Thursday, August 30, 2012 10:51 AM
Subject: [llvm-commits] [PATCH] Stack Coloring optimization

> Hi All,
>
> I've been working on a new optimization for reducing the stack size. 
> Currently, when we declare allocas in LLVM IR, these allocas are directly 
> translated to stack slots. And when we inline small functions into larger 
> functions, these allocas add up and take up lots of space.  In some cases 
> we know that the use of the allocas is bounded by disjoint regions.  In 
> this optimization we merge multiple disjoint slots into a single slot. 
> LLVM uses the lifetime markers for specifying the regions in which the 
> alloca is used.  This patch propagates the lifetime markers through 
> SelectionDAG and makes them pseudo ops.  Later, a pre-register-allocator 
> pass constructs live intervals which represent the liveness of different 
> stack slots. Next, the pass merges disjoint intervals.  Notice that 
> lifetime markers are not perfect single-entry, single-exit regions. They 
> may be removed by optimizations, they may start with two markers, and end 
> with one, or even not end at all!
>
> So, why is this done in codegen?  There are a number of reasons. First, 
> joining allocas may hinder alias analysis. Second, in the future we would 
> like to share the alloca space with spill slots.
>
> The inliner has a 'hack' for merging allocas when inlining functions. We 
> plan to remove this hack once this pass is tuned and we see that there are 
> no regressions.  Also, we plan to look at joining multiple non-disjoint 
> slots into a bigger disjoint slot.
>
> This work is based on code by Owen, and on feedback and ideas from a 
> number of other engineers at Apple.
>
> Any comments or review are much appreciated.
>
>
> Thanks,
> Nadav