[LLVMdev] Garbage collection
Mark Shannon
marks at dcs.gla.ac.uk
Fri Feb 27 01:32:30 PST 2009
Jon Harrop wrote:
> On Thursday 26 February 2009 17:25:56 Chris Lattner wrote:
>> In my ideal world, this would be:
>>
>> 1. Subsystems [with clean interfaces] for thread management,
>> finalization, object model interactions, etc.
>> 2. Within different high-level designs (e.g. copying, mark/sweep, etc)
>> there can be replaceable policy components etc.
>> 3. A couple of actual GC implementations built on top of #1/2.
>> Ideally there would only be a couple of high-level collectors that can
>> be parameterized by replacing subsystems and policies.
>> 4. A very simple language implementation that uses the facilities, on
>> the order of complexity as the kaleidoscope tutorial.
>>
>> As far as I know, there is nothing that prevents this from happening
>> today, we just need leadership in the area to drive it. To avoid the
>> "ivory tower" problem, I'd strongly recommend starting with a simple
>> GC and language and get the whole thing working top to bottom. From
>> there, the various pieces can be generalized out etc. This ensures
>> that there is always a *problem being solved* and something that works
>> and is testable.
>
> I fear that the IR generator and GC are too tightly coupled.
>
> For example, the IR I am generating shares pointers read from the heap even
> across function calls. That is built on the assumption that the pointers are
> immutable and, therefore, that the GC is non-moving. The generated code is
> extremely efficient even though I have not even enabled LLVM's optimizations
> yet precisely because of all this shared immutable data.
>
> If you wanted to add a copying GC to my VM you would probably replace every
> lookup of the IR register with a lookup of the code to reload it, generating
> a lot of redundant loads that would greatly degrade performance so you would
> rely upon LLVM's optimization passes to clean it up again. However, I bet
> they do not have enough information to recover all of the lost performance.
> So there is a fundamental conflict here where a simple GC design decision has
> a drastic effect on the IR generator.
>
> Although it is theoretically possible to parameterize the IR generator
> sufficiently to account for all possible combinations of GC designs I suspect
> the result would be a mess. Consequently, perhaps it would be better to
> consider IR generation and the GC as a single entity and, instead, factor
> them both out using a common high-level representation not dissimilar to JVM
> or CLR bytecode in terms of functionality but much more closely related to
> LLVM's IR?
>
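To make the redundant-load concern above concrete, here is a toy C sketch. It is not code from either VM, and every name in it (Obj, push_root, collect, and so on) is made up for illustration. It assumes a collector that moves objects and patches registered root slots. Under that assumption a pointer held across a call has to live in memory the collector can rewrite, so each use after the call is a reload.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy object and heap; none of these names come from LLVM. */
typedef struct { int value; } Obj;

enum { MAX_ROOTS = 8 };

static Obj to_space[MAX_ROOTS];      /* where the toy collector moves objects */
static Obj **root_slots[MAX_ROOTS];  /* addresses of live pointer variables */
static size_t root_count = 0;

void push_root(Obj **slot) {
    assert(root_count < MAX_ROOTS);
    root_slots[root_count++] = slot;
}

void pop_root(void) {
    assert(root_count > 0);
    root_count--;
}

/* Toy "copying collection": move every rooted object and patch its slot. */
void collect(void) {
    for (size_t i = 0; i < root_count; i++) {
        to_space[i] = **root_slots[i];
        *root_slots[i] = &to_space[i];
    }
}

/* Any call may reach a collection, so treat every call as a safepoint. */
void callee(void) { collect(); }

/* Reads p->value before and after a call; returns the sum. */
int sum_across_call(Obj *p) {
    push_root(&p);          /* p now lives in a slot the collector can rewrite */
    int before = p->value;
    callee();               /* p may point somewhere else after this */
    int after = p->value;   /* forced reload of p from its slot */
    pop_root();
    return before + after;
}

/* Returns 1 if the object was moved by the call, 0 otherwise. */
int moved_across_call(Obj *p) {
    Obj *original = p;
    push_root(&p);
    callee();
    pop_root();
    return p != original;
}
```

With a non-moving collector the push_root/reload pair could be dropped and p kept in a register for the whole function, which is the sharing the quoted text describes.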
IMHO, it would be better if support for GC were dropped from LLVM
altogether. I say this having written a copying GC for my VM toolkit,
which also uses LLVM to do its JIT compilation. And it works just fine!
I have simply avoided the GC intrinsics.

The problem is that to write a GC using the LLVM intrinsics, you have
to mess around with the code-gen part of LLVM.

When I add a generational collector to my toolkit in the future, it
will be easy to specify write barriers directly in the IR. Modifying
code-gen to handle the intrinsics is a task I would rather avoid.
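
As a rough illustration of why a write barrier needs nothing beyond ordinary loads and stores, here is a minimal C sketch of a card-marking barrier. Card marking is an assumption made for the example, since the text above does not say which barrier scheme would be used, and all names and sizes (heap, card_table, write_ref, CARD_SHIFT) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sizes: a 1024-word heap split into cards of 64 words each. */
enum { HEAP_WORDS = 1024, CARD_SHIFT = 6, CARD_COUNT = HEAP_WORDS >> CARD_SHIFT };

static void *heap[HEAP_WORDS];          /* toy heap of pointer-sized fields */
static uint8_t card_table[CARD_COUNT];  /* 1 = card holds a modified field */

/* Store a reference and mark the card containing the field as dirty.
   The field must point into heap[]; this sketch does not handle others. */
void write_ref(void **field, void *new_value) {
    size_t index = (size_t)(field - heap);
    assert(index < HEAP_WORDS);
    *field = new_value;                      /* the actual store */
    card_table[index >> CARD_SHIFT] = 1;     /* the barrier: one extra store */
}

/* Count dirty cards, as a minor collection would before scanning them. */
size_t dirty_cards(void) {
    size_t count = 0;
    for (size_t i = 0; i < CARD_COUNT; i++) {
        count += card_table[i];
    }
    return count;
}
```

The barrier is a subtraction, a shift and a byte store, all of which a front end can emit as plain IR next to each pointer store without touching code-gen.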
Mark.