[LLVMdev] Stack roots and function parameters

Wed Sep 22 09:12:53 PDT 2010

On Wed, Sep 22, 2010 at 5:58 AM, Kenneth Uildriks <kennethuil at gmail.com>wrote:

> On Tue, Sep 21, 2010 at 8:20 PM, Talin <viridia at gmail.com> wrote:
> > On Mon, Sep 20, 2010 at 3:16 PM, Talin <viridia at gmail.com> wrote:
> >>
> >> So I've managed to get my stack crawler working and passing its unit
> tests
> >> - this is the one I've been working on as an alternative to
> shadow-stack: it
> >> uses only static constant data structures (no global variables or
> >> thread-local data), which means that it's fully compatible with a
> >> multi-threaded environment.
> >> One question that has arisen, however, is what to do about function
> >> parameters that may be stack roots. You can't call llvm.gcroot() on a
> >> function parameter. The only thing I can think of is to copy any
> function
> >> parameter that is a pointer, or which contains a pointer, into a local
> >> variable. This seems complicated and inefficient to me (for one thing,
> it
> >> means that every stack frame just got 8 bytes bigger due to the need to
> >> store the 'this' pointer locally)  - is there a better way?
>
> If a function parameter is a stack root, then so is the value passed
> through that parameter.  Since they both point to the same thing, only
> one stack root is needed to keep it reachable.  If you've got a root
> for it in the caller, you don't need one in the callee for the
> argument.
>
> >
> > Anyone?
> > Here's some examples of what I am talking about (I'll use Java for the
> > examples):
> > Example 1:
> >   class Foo {
> >     static String st = "Hello";
> >     void f1() {
> >        f2(st);
> >     }
> >     void f2(String s) {
> >       st = null; // Destroy static root
> >       // Now allocate something, which might trigger
> >       // a garbage collection.
> >       StringBuffer sb = new StringBuffer();
> >       // Is 's' still live? It's not stored in any
> >       // alloca, or in any global variable. It only
> >       // exists as a function parameter, which cannot
> >       // be declared as a root via llvm.gcroot.
>
> s is definitely still live assuming f1 copied st into a root before
> using it, which it generally should.
>
> >       sb.append(s);
> >     }
> >   }
> > Example 2:
> >   class Foo {
> >     static void f1() {
> >       // Create a new 'Foo' but don't store it
> >       // anywhere.
> >       new Foo().f2();
> >     }
> >     void f2() {
> >       // Possibly trigger a collection
> >       StringBuffer sb = new StringBuffer();
> >       // Is 'this' still live at this point?
>
> Again, yes, if f1 copied the result of "new Foo()" into a root.
>
> >       sb.append(toString());
> >     }
> >   }
> > Now, I know that the LLVM "Accurate GC" document says that any
> "intermediate
> > values" must be declared as a GC root. My question is, does merely
> _loading_
> > a global variable - or any mutable, non-strictly-local variable for that
> > matter - count as an "intermediate" value in this case?
>
> Yes, I believe it would.  In a multithreaded system, if you load a
> pointer from a global or the heap, and some other thread changes the
> pointer you loaded, you'd better have a stack root for the original
> pointer value you loaded while you're using it.
>
> Now if the global is thread-local, you don't need another root for it
> since no other threads can see it.  And if you can prove that a global
> isn't going to change, you also don't need another root for it.
>
>
> > My problem is that I can't see how to address this without generating
> > horribly bloated and inefficient code. Pretty much every function
> argument
> > that isn't a simple integer or float will have to be copied to a hidden
> > local variable before every function call - in addition to copying it to
> the
> > stack to create the call frame. Even local variables are not immune if
> you
> > allow closures in your language. Under this scenario, calling conventions
> > that pass arguments in registers are utterly futile and save nothing,
> > because everything has to get copied to the stack anyway.
>
> One way or another, stack roots *must* get copied to the stack, so
> that the GC can find them.  Remember the GC is usually running in some
> other thread and can *only* find pointers that live somewhere in
> memory (either the stack or the heap).
>
> > I really wish I could simply declare function arguments as llvm.gcroots.
> For
> > arguments that are not in registers, it would treat it just like an
> alloca,
> > since they both represent stack slots. For arguments that are passed in
> > registers, LLVM should automatically lower it to a non-register argument,
> > making a copy if needed. (I can't do this myself, since I'm not supposed
> to
> > *know* which arguments are passed in registers or not - that depends on
> the
> > target and the code generators and such).
>
> I *believe* the code generator has some optimizations that eliminate
> redundant copies within a block.  At any rate, if you keep your stack
> roots in the caller rather than the callee, then there'll be a number
> of cases where the caller needs to keep that stack root for purposes
> other than that one call.  In those cases, having the stack root
> doesn't impose a cost you aren't already paying just to allow the GC
> to work.
>
> > That still leaves me with the problem of declaring as roots function
> > arguments that are the result of complex calculations, but those are far
> > fewer - declaring temporary vars for those won't bloat the code so badly.
> > But having to copy every single load of a non-local variable into a
> > temporary is a nightmare.
>
> You only have to copy pointers.  If you've got a struct, you only need
> to keep a root for the pointer member(s) of that struct.
>

Here's another way to think about the issue: Compare this whole approach to
stack roots to that of a conservative collector. Since we know conservative
collectors work, the whole reason for making an accurate collector in the
first place is efficiency, right? If we couldn't make an accurate collector
that was more efficient than a conservative collector, then no one would
bother because it's such a pain.

However, looking strictly at stack frames, a conservative collector has a
much more compact stack layout. There's no need to copy anything to local
variables, because by the time you call into the collector proper,
everything - every register variable, every function parameter, and so on -
is already on the stack *somewhere*.

The stack frame layout dictated by llvm.gcroot, by contrast, is much, much
fatter. In a typical deeply-nested call stack, there will be many copies of
the same value on the stack. Let's take the 'this' pointer as an example.
Say we have some method that calls itself recursively, and we are currently
10 levels deep. That means that we have 20 copies of the 'this' pointer on
the stack: Because each invocation of the method has to push the 'this'
pointer on the stack as part of the calling protocol (even if it's passed in
registers, it still has to save the prior value of the register somewhere),
and in addition we also have to save the 'this' pointer as a stack root. A
conservative collector, on the other hand, would be able to get by with only
10 copies of the 'this' pointer.

As far as the issue of eliminating redundant stores, I'm pretty sure that
the optimizers cannot eliminate redundant *roots*. Figuring out the minimal
set of temporary roots needed for a function call is a non-trivial problem
(you don't want to just blindly make every pointer argument to a function a
temporary root, since that just bloats the size of the call frame
needlessly.)

My entire goal here is to get a GC that has high performance and efficiency.
I want my GC to be able to run on everything from a data center to a beagle
board, and use the minimum resources possible.

-- 
-- Talin
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20100922/b5103a7f/attachment.html>