[LLVMdev] Garbage collection

Thu Feb 26 08:40:22 PST 2009

On 2009-02-26 18:22, Gabor Greif wrote:
> On Feb 26, 2:18 pm, Ralf Schneider <li... at gestaltgeber.com> wrote:
>   
>> Hello,
>>
>> 2009/2/26 Talin <viri... at gmail.com>
>>
>>     
>>> The IR-level intrinsics themselves don't much help you *write* a GC, so
>>> much as to integrate one with LLVM. What is provided is essentially a
>>> mechanism for walking the stack, and a means to insert read/write
>>> barriers into the generated code, which together form a tiny fraction of
>>> what it would take to design a working collector.
>>>       
>> <...>
>>
>>
>>
>>     
>>> In other words, I think that it should be possible to create a fairly
>>> comprehensive and efficient collector implementation in which 80% of the
>>> implementation lives in a library, and the language front end generates
>>> the remaining 20%. Moreover, it should be possible to design this
>>> library in a way that makes it possible for people to replace
>>> significant parts of the collector, so that it can be used as a basis
>>> for developing a diverse selection of collection strategies. And because
>>> all of the parts that are specific to the object model are generated by
>>> the front-end, the collector should work for a large number of languages
>>> with different object semantics.
>>>       
>> IMO LLVM should try to keep its <L>ow <L>evel design. Garbage collection is
>> already very high level.
>> A library which supports building garbage collectors on the other hand would
>> be very helpful of course.
>> Basically such a library boils down to an operatic system abstraction
>> library. Functions like "stop the world" are completely managed by the OS
>> and not the CPU. I'm not sure if such functionality is part of LLVMs goals.
>>
>> Having that said; functions like :
>>
>> - Enumerate threads
>> - Pause/Resume thread
>> - Get root pointers for a thread (including registers)
>> - Get a list of modified memory-pages (GetWriteWatch in Windows - used in
>> the .net GC)
>> - ...
>>
>> for different platforms - would definitely help building a GC. Just look at
>> the source code of the Boehm GC: It's a completely unmaintainable mess of
>> #ifdefs
>>
>> A little bit off topic: Has anybody tried building a concurrent GC - running
>> in a different _process_, instead of a thread?
>>     
>
> Yes, I had a proof of concept implementation of a GC with
> - shared memory as the GC arena,
> - (C++) throw-catch-based marking
> - simple lookup rules for (in-arena) associated
>   instance metadata.
>
> I never had the need to finish the implementation, but
> the fork approach worked reasonably well, and the mark
> and sweep parts ran in the forked process, with the
> shared memory communicating back the collection results.
>
> The most amusing thing was to see how the stack unwinding
> in the forked process did the marking and the original process
> was not harmed.
>
> I hope to extend the concept to multiple threads by (m)protecting
> increasing parts of the arena and hoping that all threads get
> caught with time. Finally the last running threads must be
> stopped and forced to fork. This last question and how to recover
> the threads from the protection violations in the original process
> are the big open questions to be solved.

What do you do if the program is multi-threaded? fork()-ing a
multithreaded program can lead to problems ...
The functions you can call after forking is very limited according to
POSIX: "If a multi-threaded process calls /fork/(), the new process
shall contain a replica of the calling thread and its entire address
space, possibly including the states of mutexes and other resources.
Consequently, to avoid errors, the child process may only execute
async-signal-safe operations until such time as one of the /exec
<http://www.opengroup.org/onlinepubs/9699919799/functions/exec.html>/
functions is called."

Best regards,
--Edwin