On Wed, Sep 22, 2010 at 5:58 AM, Kenneth Uildriks <span dir="ltr"><<a href="mailto:kennethuil@gmail.com">kennethuil@gmail.com</a>></span> wrote:<br><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">


<div class="im">On Tue, Sep 21, 2010 at 8:20 PM, Talin <<a href="mailto:viridia@gmail.com">viridia@gmail.com</a>> wrote:<br>

> On Mon, Sep 20, 2010 at 3:16 PM, Talin <<a href="mailto:viridia@gmail.com">viridia@gmail.com</a>> wrote:<br>

>><br>

>> So I've managed to get my stack crawler working and passing its unit tests<br>

>> - this is the one I've been working on as an alternative to shadow-stack: it<br>

>> uses only static constant data structures (no global variables or<br>

>> thread-local data), which means that it's fully compatible with a<br>

>> multi-threaded environment.<br>

>> One question that has arisen, however, is what to do about function<br>

>> parameters that may be stack roots. You can't call llvm.gcroot() on a<br>

>> function parameter. The only thing I can think of is to copy any function<br>

>> parameter that is a pointer, or which contains a pointer, into a local<br>

>> variable. This seems complicated and inefficient to me (for one thing, it<br>

>> means that every stack frame just got 8 bytes bigger due to the need to<br>

>> store the 'this' pointer locally)  - is there a better way?<br>

<br>

</div>If a function parameter is a stack root, then so is the value passed<br>

through that parameter.  Since they both point to the same thing, only<br>

one stack root is needed to keep it reachable.  If you've got a root<br>

for it in the caller, you don't need one in the callee for the<br>

argument.<br>

<div class="im"><br>

><br>

> Anyone?<br>

> Here's some examples of what I am talking about (I'll use Java for the<br>

> examples):<br>

> Example 1:<br>

>   class Foo {<br>

>     static String st = "Hello";<br>

>     void f1() {<br>

>        f2(st);<br>

>     }<br>

>     void f2(String s) {<br>

>       st = null; // Destroy static root<br>

>       // Now allocate something, which might trigger<br>

>       // a garbage collection.<br>

>       StringBuffer sb = new StringBuffer();<br>

>       // Is 's' still live? It's not stored in any<br>

>       // alloca, or in any global variable. It only<br>

>       // exists as a function parameter, which cannot<br>

>       // be declared as a root via llvm.gcroot.<br>

<br>

</div>s is definitely still live assuming f1 copied st into a root before<br>

using it, which it generally should.<br>

<div class="im"><br>

>       sb.append(s);<br>

>     }<br>

>   }<br>

> Example 2:<br>

>   class Foo {<br>

>     static void f1() {<br>

>       // Create a new 'Foo' but don't store it<br>

>       // anywhere.<br>

>       new Foo().f2();<br>

>     }<br>

>     void f2() {<br>

>       // Possibly trigger a collection<br>

>       StringBuffer sb = new StringBuffer();<br>

>       // Is 'this' still live at this point?<br>

<br>

</div>Again, yes, if f1 copied the result of "new Foo()" into a root.<br>

<div class="im"><br>

>       sb.append(toString());<br>

>     }<br>

>   }<br>

> Now, I know that the LLVM "Accurate GC" document says that any "intermediate<br>

> values" must be declared as a GC root. My question is, does merely _loading_<br>

> a global variable - or any mutable, non-strictly-local variable for that<br>

> matter - count as an "intermediate" value in this case?<br>

<br>

</div>Yes, I believe it would.  In a multithreaded system, if you load a<br>

pointer from a global or the heap, and some other thread changes the<br>

pointer you loaded, you'd better have a stack root for the original<br>

pointer value you loaded while you're using it.<br>

<br>

Now if the global is thread-local, you don't need another root for it<br>

since no other threads can see it.  And if you can prove that a global<br>

isn't going to change, you also don't need another root for it.<br>

<div class="im"><br>

<br>

> My problem is that I can't see how to address this without generating<br>

> horribly bloated and inefficient code. Pretty much every function argument<br>

> that isn't a simple integer or float will have to be copied to a hidden<br>

> local variable before every function call - in addition to copying it to the<br>

> stack to create the call frame. Even local variables are not immune if you<br>

> allow closures in your language. Under this scenario, calling conventions<br>

> that pass arguments in registers are utterly futile and save nothing,<br>

> because everything has to get copied to the stack anyway.<br>

<br>

</div>One way or another, stack roots *must* get copied to the stack, so<br>

that the GC can find them.  Remember the GC is usually running in some<br>

other thread and can *only* find pointers that live somewhere in<br>

memory (either the stack or the heap).<br>

<div class="im"><br>

> I really wish I could simply declare function arguments as llvm.gcroots. For<br>

> arguments that are not in registers, it would treat it just like an alloca,<br>

> since they both represent stack slots. For arguments that are passed in<br>

> registers, LLVM should automatically lower it to a non-register argument,<br>

> making a copy if needed. (I can't do this myself, since I'm not supposed to<br>

> *know* which arguments are passed in registers or not - that depends on the<br>

> target and the code generators and such).<br>

<br>

</div>I *believe* the code generator has some optimizations that eliminate<br>

redundant copies within a block.  At any rate, if you keep your stack<br>

roots in the caller rather than the callee, then there'll be a number<br>

of cases where the caller needs to keep that stack root for purposes<br>

other than that one call.  In those cases, having the stack root<br>

doesn't impose a cost you aren't already paying just to allow the GC<br>

to work.<br>

<div class="im"><br>

> That still leaves me with the problem of declaring as roots function<br>

> arguments that are the result of complex calculations, but those are far<br>

> fewer - declaring temporary vars for those won't bloat the code so badly.<br>

> But having to copy every single load of a non-local variable into a<br>

> temporary is a nightmare.<br>

<br>

</div>You only have to copy pointers.  If you've got a struct, you only need<br>

to keep a root for the pointer member(s) of that struct.<br>

</blockquote></div><div><br></div>Here's another way to think about the issue: Compare this whole approach to stack roots to that of a conservative collector. Since we know conservative collectors work, the whole reason for making an accurate collector in the first place is efficiency, right? If we couldn't make an accurate collector that was more efficient than a conservative collector, then no one would bother because it's such a pain.<div>


<br></div><div>However, looking strictly at stack frames, a conservative collector has a much more compact stack layout. There's no need to copy anything to local variables, because by the time you call into the collector proper, everything - every register variable, every function parameter, and so on - is already on the stack *somewhere*.</div>


<div><br></div><div>The stack frame layout dictated by llvm.gcroot, by contrast, is much, much fatter. In a typical deeply-nested call stack, there will be many copies of the same value on the stack. Let's take the 'this' pointer as an example. Say we have some method that calls itself recursively, and we are currently 10 levels deep. That means that we have 20 copies of the 'this' pointer on the stack: Because each invocation of the method has to push the 'this' pointer on the stack as part of the calling protocol (even if it's passed in registers, it still has to save the prior value of the register somewhere), and in addition we also have to save the 'this' pointer as a stack root. A conservative collector, on the other hand, would be able to get by with only 10 copies of the 'this' pointer.</div>


<div><br></div><div>As far as the issue of eliminating redundant stores, I'm pretty sure that the optimizers cannot eliminate redundant <i>roots</i>. Figuring out the minimal set of temporary roots needed for a function call is a non-trivial problem (you don't want to just blindly make every pointer argument to a function a temporary root, since that just bloats the size of the call frame needlessly.)</div>


<div><br></div><div>My entire goal here is to get a GC that has high performance and efficiency. I want my GC to be able to run on everything from a data center to a beagle board, and use the minimum resources possible.<br clear="all">


<br>-- <br>-- Talin<br>

</div>