[LLVMdev] Garbage Collection Roots
Talin
viridia at gmail.com
Tue Sep 4 01:15:16 PDT 2007
Gordon Henriksen wrote:
> The intrinsics are entirely neutral to collector implementation, and
> thus to threading. They could easily be used to implement reference
> counting, for instance, which may or may not be implemented in a
> threadsafe manner. However, as with your algorithm, reference counting
> does not require code generator support, and so would not justify the
> introduction of the intrinsics.
Well, I might rethink my algorithm based on what I learn here. :)
> But to elaborate upon the described class of collectors: The emitted
> GC tables do not describe the dynamic call stack; rather, they are
> static data compiled into the executable. Only when cross-referenced
> with the dynamic call stack do they identify the roots. The garbage
> collectors for .NET and Java work in this manner.
>
> Here's an example of the sort of data these tables might contain. (I
> generated this using the -print-gc feature which I have added to llc
> locally; the llc in Subversion does cannot print this data because it
> does not collect it.) This test case has a simple function with 2
> roots and 1 call.
>
> GC roots for fun:
> 0 8[sp]
> 1 4[sp]
> GC safe points for fun:
> label 1: post-call, live = { 0, 1 }
>
> Provided this information, the collector can easily identify the live
> roots within fun's stack frame at runtime—so long as fun is paused at
> the safe point, which it must be if it is not the topmost stack frame.
How do you get the info for the non-topmost stack frames in this case?
Am I going to need to implement my own version of llvm_cg_walk_gcroots() ?
My current mad scheme (which I'm not sure will work) for safe points in
a multi-threaded environment uses a global POSIX reader/writer lock.
Each running thread keeps a read lock on the global lock when it's
running. The old-generation collector attempts to acquire a write-lock
when it needs to do a collection. At some semi-regular interval, the
application threads will need to momentarily release the lock, telling
the global collector its OK to run. (This is only for the old generation
- young generation collections are handled per-thread, hopefully.
Although cross-thread references make that complex...) The advantage of
this scheme is that it doesn't require preemptive thread freezing, icky
pagefault-based write barriers or other platform-specific hacks, the
disadvantage is that all threads have to wait if some thread doesn't
want to release the read lock in a timely manner.
The hard part will be analyzing loops in the generated IR to decide
whether a call to pulse the lock needs to be inserted in the loop body
or not. Obviously, dropping and reacquiring a lock is expensive, so if
the loop is provably short, I don't want to do that.
So it sounds like I should be passing the pointer to the stack root list
at the same time I release the lock? I supposed I could store it in the
TLD which I guess wouldn't be too expensive.
> Walking the stack; identifying the current safe point and stack
> pointer; stopping concurrent threads at safe points; and actually
> tracing the heap—all left as exercises to the interested reader. :)
>
> Here's a resource you might find interesting for further reading:
>
> http://www.cs.kent.ac.uk/people/staff/rej/gcbib/gcbib.html
Cool. I have tons of papers on garbage collection already, though :) But
I will add it to the list.
>> BTW, I'm also noticing that the current SemiSpace example code seems
>> quite incomplete.
>
> I observed that myself.
>
> — Gordon
> ------------------------------------------------------------------------
>
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>
More information about the llvm-dev
mailing list