[llvm-commits] [PATCH] Stack Coloring optimization
Nuno Lopes
nunoplopes at sapo.pt
Sun Sep 2 11:57:36 PDT 2012
Hi,
I generally like the approach. It makes sense to me to do it in codegen as
well.
Let me just give you a few random thoughts (mostly for future work):
- AFAICT most (IR) optimizations do not make any effort to keep these
lifetime markers. They simply discard them. Probably some optimizations
must be tweaked to try to preserve those.
- Probably there's also juice to be extracted from single functions where
none of the callees was inlined (and therefore no markers have been placed).
Should this pass try to compute the liveness intervals for allocas and then
intersect the computed intervals with the lifetime markers (that may or may
not exist)? Or should we have an IR pass that inserts (and tightens) these
markers?
- Can you please add a comment describing the merging algorithm? It seems
to be a greedy, O(n^2) algorithm, where n is the number of Allocas. It's
good to have it documented because we may want to experiment with different
heuristics in the future.
- Another interesting but more complex trick to try would be to share the
slot of a big object with several smaller ones. Right now the merging
algorithm doesn't split objects, so it doesn't allow objects with
overlapping liveness intervals to be merged with a single larger object
(into disjoint parts of it, of course).
- You're missing a test for a lifetime marker region that never ends.
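
To make the comment request concrete, here is how I read the merging step, as
a minimal standalone sketch rather than the code in the patch. The names
(Slot, overlaps, mergeSlots), the half-open interval model and the
largest-first ordering are all my own assumptions for illustration; the patch
may order and represent things differently.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <utility>
#include <vector>

// Hypothetical model of a stack slot (not the patch's data structure):
// a size in bytes plus the half-open [start, end) instruction-index
// ranges in which the slot is live.
struct Slot {
  std::uint64_t size;
  std::vector<std::pair<unsigned, unsigned>> live;
};

// True if any live range of a intersects any live range of b.
static bool overlaps(const Slot &a, const Slot &b) {
  for (const auto &x : a.live)
    for (const auto &y : b.live)
      if (x.first < y.second && y.first < x.second)
        return true;
  return false;
}

// Greedy merge over all pairs of slots, so O(n^2) overlap checks for n
// slots. Returns, for each slot, the index of the slot whose storage it
// ends up using (its own index if it was not merged into another slot).
std::vector<std::size_t> mergeSlots(std::vector<Slot> slots) {
  // Assumed heuristic: visit larger slots first, so the surviving slot
  // is always big enough to hold whatever is merged into it.
  std::vector<std::size_t> order(slots.size());
  std::iota(order.begin(), order.end(), std::size_t{0});
  std::stable_sort(order.begin(), order.end(),
                   [&](std::size_t a, std::size_t b) {
                     return slots[a].size > slots[b].size;
                   });

  std::vector<std::size_t> rep(slots.size());
  std::iota(rep.begin(), rep.end(), std::size_t{0});
  std::vector<bool> merged(slots.size(), false);

  for (std::size_t i = 0; i < order.size(); ++i) {
    const std::size_t a = order[i];
    if (merged[a])
      continue;
    for (std::size_t j = i + 1; j < order.size(); ++j) {
      const std::size_t b = order[j];
      if (merged[b] || overlaps(slots[a], slots[b]))
        continue;
      // Disjoint liveness: b reuses a's storage, and a's live ranges
      // grow to cover b's so later candidates are checked against both.
      slots[a].live.insert(slots[a].live.end(), slots[b].live.begin(),
                           slots[b].live.end());
      merged[b] = true;
      rep[b] = a;
    }
  }
  return rep;
}
```

With three slots of 16, 8 and 4 bytes live over [0,10), [10,20) and [5,15),
the second shares storage with the first and the third keeps its own slot.
Writing the heuristic down at this level of detail in the pass would make it
much easier to swap in a different ordering later.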
Ok, I guess it's enough of random stuff for now. I pretty much like the
idea.
Nuno
----- Original Message -----
From: "Nadav Rotem" <nrotem at apple.com>
To: <llvm-commits at cs.uiuc.edu>
Sent: Thursday, August 30, 2012 10:51 AM
Subject: [llvm-commits] [PATCH] Stack Coloring optimization
> Hi All,
>
> I've been working on a new optimization for reducing the stack size.
> Currently, when we declare allocas in LLVM IR, these allocas are directly
> translated to stack slots. And when we inline small functions into a larger
> function, these allocas add up and take up lots of space. In some cases
> we know that the use of the allocas is bounded by disjoint regions. In
> this optimization we merge multiple disjoint slots into a single slot.
> LLVM uses the lifetime markers for specifying the regions in which the
> alloca is used. This patch propagates the lifetime markers through
> SelectionDAG and makes them pseudo ops. Later, a pre-register-allocator
> pass constructs live intervals which represent the liveness of different
> stack slots. Next, the pass merges disjoint intervals. Notice that
> lifetime markers are not perfect single-entry, single-exit regions. They
> may be removed by optimizations, they may start with two markers and end
> with one, or even not end at all!
>
> So, why is this done in codegen? There are a number of reasons. First,
> joining allocas may hinder alias analysis. Second, in the future we would
> like to share the alloca space with spill slots.
>
> The inliner has a 'hack' for merging allocas when inlining functions. We
> plan to remove this hack once this pass is tuned and we see that there are
> no regressions. Also, we plan to look at joining multiple non-disjoint
> slots into a bigger disjoint slot.
>
> This work is based on code by Owen, and on feedback and ideas from a
> number of other engineers at Apple.
>
> Any comments or review are much appreciated.
>
>
> Thanks,
> Nadav