[llvm-commits] [PATCH] Stack Coloring optimization

Daniel Berlin dberlin at dberlin.org
Thu Aug 30 07:19:11 PDT 2012


On Thu, Aug 30, 2012 at 5:51 AM, Nadav Rotem <nrotem at apple.com> wrote:
> Hi All,
>
> I've been working on a new optimization for reducing the stack size.  Currently, when we declare allocas in LLVM IR, these allocas are directly translated to stack slots, and when we inline small functions into larger functions, these allocas add up and take up lots of space.  In some cases we know that the use of the allocas is bounded by disjoint regions.  In this optimization we merge multiple disjoint slots into a single slot.  LLVM uses the lifetime markers for specifying the regions in which the alloca is used.  This patch propagates the lifetime markers through SelectionDAG and makes them pseudo ops.  Later, a pre-register-allocation pass constructs live intervals which represent the liveness of the different stack slots. Next, the pass merges disjoint intervals.  Notice that lifetime markers are not perfect single-entry, single-exit regions. They may be removed by optimizations, they may start with two markers and end with one, or even not end at all!
>
> So, why is this done in codegen?  There are a number of reasons. First, joining allocas may hinder alias analysis. Second, in the future we would like to share the alloca space with spill slots.
>
> The inliner has a 'hack' for merging allocas when inlining functions. We plan to remove this hack once this pass is tuned and we see that there are no regressions.  We also plan to look at joining multiple non-disjoint slots into a single bigger slot.
>
> This work is based on code by Owen, and on feedback and ideas from a number of other engineers at Apple.
>
> Any comments or review are much appreciated.
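
To make the interval-merging idea in the description above concrete, here is a
toy sketch of my own (not code from the patch, and all names are made up for
illustration): each alloca's lifetime is reduced to a single interval over an
instruction numbering, and a greedy sweep over start points reuses a physical
slot whenever its previous occupant's interval has already ended. The real
pass works on LiveIntervals, which need not be contiguous; this simplification
is just to show why disjointness lets slots share storage.

#include <algorithm>
#include <cstddef>
#include <vector>

// Toy model: one interval per virtual stack slot, numbered 0..N-1.
struct SlotInterval {
  unsigned Start, End;  // half-open [Start, End) over an instruction numbering
  unsigned VirtSlot;    // the original (virtual) stack slot, in [0, N)
};

// Assign each virtual slot a shared physical slot such that two virtual slots
// only share when their intervals are disjoint.
std::vector<unsigned> colorSlots(std::vector<SlotInterval> Intervals) {
  std::sort(Intervals.begin(), Intervals.end(),
            [](const SlotInterval &A, const SlotInterval &B) {
              return A.Start < B.Start;
            });
  std::vector<unsigned> Assignment(Intervals.size(), 0);
  std::vector<unsigned> FreeAt;  // per physical slot: point where it frees up
  for (const SlotInterval &I : Intervals) {
    std::size_t Phys = FreeAt.size();
    for (std::size_t S = 0; S != FreeAt.size(); ++S)
      if (FreeAt[S] <= I.Start) { Phys = S; break; }  // disjoint: reuse it
    if (Phys == FreeAt.size())
      FreeAt.push_back(0);  // no reusable slot, allocate a fresh physical one
    FreeAt[Phys] = I.End;
    Assignment[I.VirtSlot] = static_cast<unsigned>(Phys);
  }
  return Assignment;
}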

+      BitVector LiveInToggle = LocalLiveIn;
+      LiveInToggle.reset(LIVE_IN[BB]);
+      if (LiveInToggle.any()) {
+        changed = true;
+        LIVE_IN[BB] |= LocalLiveIn;
+
...

+      }


It looks like you are copying the entire bitvector just to figure out
whether the reset changes anything (this is done in a few other places
too).
That seems ugly and expensive (in both space and time) compared to
picking a good name for such a function in BitVector (say
"Difference" or "EmptyDifference" or something), implementing it
there, and returning a bool from it.  Besides the space cost of the
copy, a difference test can return true the moment it finds any
BitWord that differs in A - B, whereas yours processes the entire
"B" bitmap, resets all of those bits, and only *then* checks
whether anything has changed.

i.e. you should just write

// Return true if lhs - rhs is nonempty.
bool BitVector::Difference(const BitVector &lhs, const BitVector &rhs);

and use that.
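
A rough standalone sketch of that early-exit check, written against plain
word arrays rather than the real BitVector internals (the function and
variable names here are hypothetical):

#include <cstddef>
#include <cstdint>
#include <vector>

// Return true as soon as any word of lhs has a bit that rhs does not,
// i.e. lhs - rhs is nonempty.  No temporary copy, no full reset.
static bool differenceIsNonEmpty(const std::vector<uint64_t> &lhsWords,
                                 const std::vector<uint64_t> &rhsWords) {
  for (std::size_t i = 0, e = lhsWords.size(); i != e; ++i) {
    uint64_t rhsWord = i < rhsWords.size() ? rhsWords[i] : 0;
    if (lhsWords[i] & ~rhsWord)
      return true;  // early exit on the first differing word
  }
  return false;
}

With the same idea folded into BitVector, the caller side above would shrink
to roughly: if (Difference(LocalLiveIn, LIVE_IN[BB])) { changed = true;
LIVE_IN[BB] |= LocalLiveIn; }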

You also iterate over more than just the dirty blocks on each iteration
of the dataflow computation, but I guess it's not expensive enough to
matter.
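
For reference, the usual way to avoid that is a worklist that only re-queues
a block when one of its inputs actually changed.  A generic sketch, with
placeholder types and a placeholder transfer function rather than anything
from the patch:

#include <deque>
#include <functional>
#include <set>
#include <vector>

// Placeholder CFG node; the real pass would hang LIVE_IN/LIVE_OUT off it.
struct Block {
  // Blocks whose input reads this block's output: successors for a forward
  // problem, predecessors for a backward one like liveness.
  std::vector<Block *> Dependents;
};

// Recompute(BB) should apply the transfer/meet for BB and return true when
// BB's sets changed.
void iterateToFixedPoint(const std::vector<Block *> &Blocks,
                         const std::function<bool(Block *)> &Recompute) {
  std::deque<Block *> Worklist(Blocks.begin(), Blocks.end());
  std::set<Block *> InWorklist(Blocks.begin(), Blocks.end());
  while (!Worklist.empty()) {
    Block *BB = Worklist.front();
    Worklist.pop_front();
    InWorklist.erase(BB);
    if (Recompute(BB))
      // Only blocks that consume BB's output can be affected by the change.
      for (Block *Dep : BB->Dependents)
        if (InWorklist.insert(Dep).second)
          Worklist.push_back(Dep);
  }
}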



>
> Thanks,
> Nadav
>
> _______________________________________________
> llvm-commits mailing list
> llvm-commits at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>



