[LLVMdev] Garbage collection

Gabor Greif ggreif at gmail.com
Thu Feb 26 08:22:27 PST 2009


On Feb 26, 2:18 pm, Ralf Schneider <li... at gestaltgeber.com> wrote:
> Hello,
>
> 2009/2/26 Talin <viri... at gmail.com>
>
> > The IR-level intrinsics themselves don't much help you *write* a GC, so
> > much as to integrate one with LLVM. What is provided is essentially a
> > mechanism for walking the stack, and a means to insert read/write
> > barriers into the generated code, which together form a tiny fraction of
> > what it would take to design a working collector.
>
> <...>
>
>
>
> > In other words, I think that it should be possible to create a fairly
> > comprehensive and efficient collector implementation in which 80% of the
> > implementation lives in a library, and the language front end generates
> > the remaining 20%. Moreover, it should be possible to design this
> > library in a way that makes it possible for people to replace
> > significant parts of the collector, so that it can be used as a basis
> > for developing a diverse selection of collection strategies. And because
> > all of the parts that are specific to the object model are generated by
> > the front-end, the collector should work for a large number of languages
> > with different object semantics.
>
> IMO LLVM should try to keep its <L>ow <L>evel design. Garbage collection is
> already very high level.
> A library which supports building garbage collectors on the other hand would
> be very helpful of course.
> Basically such a library boils down to an operatic system abstraction
> library. Functions like "stop the world" are completely managed by the OS
> and not the CPU. I'm not sure if such functionality is part of LLVMs goals.
>
> Having that said; functions like :
>
> - Enumerate threads
> - Pause/Resume thread
> - Get root pointers for a thread (including registers)
> - Get a list of modified memory-pages (GetWriteWatch in Windows - used in
> the .net GC)
> - ...
>
> for different platforms - would definitely help building a GC. Just look at
> the source code of the Boehm GC: It's a completely unmaintainable mess of
> #ifdefs
>
> A little bit off topic: Has anybody tried building a concurrent GC - running
> in a different _process_, instead of a thread?

Yes, I had a proof of concept implementation of a GC with
- shared memory as the GC arena,
- (C++) throw-catch-based marking
- simple lookup rules for (in-arena) associated
  instance metadata.

I never had the need to finish the implementation, but
the fork approach worked reasonably well, and the mark
and sweep parts ran in the forked process, with the
shared memory communicating back the collection results.

The most amusing thing was to see how the stack unwinding
in the forked process did the marking and the original process
was not harmed.

I hope to extend the concept to multiple threads by (m)protecting
increasing parts of the arena and hoping that all threads get
caught with time. Finally the last running threads must be
stopped and forced to fork. This last question and how to recover
the threads from the protection violations in the original process
are the big open questions to be solved.

Cheers,

    Gabor

> The idea: To perform a collection you do a fork(). The child process
> collects all unreferenced memory regions and reports them back to the parent
> process. The parrent process waits for the result (in a sperate thread) and
> if it gets the result it frees the memory regions.
> This way you do not have to worry about barriers and all the nasty stuff.
> The maximum pause time is close to zero.
> Of course this is only useful with a working "copy on write" implementation
> of fork().
>
> - Ralf
>
> _______________________________________________
> LLVM Developers mailing list
> LLVM... at cs.uiuc.edu        http://llvm.cs.uiuc.eduhttp://lists.cs.uiuc.edu/mailman/listinfo/llvmdev




More information about the llvm-dev mailing list