[LLVMdev] Garbage collection
Jon Harrop
jon at ffconsultancy.com
Thu Feb 26 18:17:50 PST 2009
On Thursday 26 February 2009 17:25:56 Chris Lattner wrote:
> In my ideal world, this would be:
>
> 1. Subsystems [with clean interfaces] for thread management,
> finalization, object model interactions, etc.
> 2. Within different high-level designs (e.g. copying, mark/sweep, etc)
> there can be replaceable policy components etc.
> 3. A couple of actual GC implementations built on top of #1/2.
> Ideally there would only be a couple of high-level collectors that can
> be parameterized by replacing subsystems and policies.
> 4. A very simple language implementation that uses the facilities, on
> the order of complexity as the kaleidoscope tutorial.
>
> As far as I know, there is nothing that prevents this from happening
> today, we just need leadership in the area to drive it. To avoid the
> "ivory tower" problem, I'd strongly recommend starting with a simple
> GC and language and get the whole thing working top to bottom. From
> there, the various pieces can be generalized out etc. This ensures
> that there is always a *problem being solved* and something that works
> and is testable.

I fear that the IR generator and GC are too tightly coupled.

For example, the IR I am generating shares pointers read from the heap even
across function calls. That is built on the assumption that the pointers are
immutable and, therefore, that the GC is non-moving. The generated code is
extremely efficient, although I have not yet enabled LLVM's optimizations,
precisely because of all this shared immutable data.
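
To make the assumption concrete, here is a minimal sketch of why a pointer
held across a call is only safe under a non-moving GC. It is a toy model and
not code from my VM: ToyHeap, collect_moving, cached_across_call and
reloaded_after_call are all hypothetical names, addresses are plain integers,
and the "collector" simply copies each rooted object to a fresh address
without forwarding pointers.

```python
class ToyHeap:
    """Toy heap: objects live in a dict keyed by an integer address."""

    def __init__(self):
        self.space = {}      # address -> value
        self.next_addr = 0   # next unused address
        self.roots = []      # root slots that the collector updates

    def alloc(self, value):
        addr = self.next_addr
        self.next_addr += 1
        self.space[addr] = value
        return addr

    def add_root(self, addr):
        self.roots.append(addr)
        return len(self.roots) - 1   # index of the new root slot

    def collect_moving(self):
        """Copy every rooted object to a fresh address and update its root slot."""
        new_space = {}
        for slot, old_addr in enumerate(self.roots):
            new_addr = self.next_addr
            self.next_addr += 1
            new_space[new_addr] = self.space[old_addr]
            self.roots[slot] = new_addr
        self.space = new_space       # every old address is now invalid

    def load(self, addr):
        return self.space[addr]      # KeyError models a dangling pointer


def cached_across_call(heap, slot, call):
    """Read the pointer once, keep it 'in a register' across the call."""
    p = heap.roots[slot]
    call()
    return heap.load(p)


def reloaded_after_call(heap, slot, call):
    """Re-read the pointer from its root slot after the call."""
    call()
    p = heap.roots[slot]
    return heap.load(p)
```

With a call that never collects, both functions return the same value. Once
the call runs collect_moving, only reloaded_after_call still finds the object;
the cached address no longer exists in the heap.
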

If you wanted to add a copying GC to my VM, you would probably replace every
lookup of an IR register holding a heap pointer with code that reloads the
pointer, generating a lot of redundant loads that would greatly degrade
performance, so you would have to rely upon LLVM's optimization passes to
clean them up again. However, I bet they do not have enough information to
recover all of the lost performance.
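
A rough sketch of the cost, assuming a generator that must refresh the pointer
from its root slot after every call when the GC may move objects. The emitted
lines are pseudo-IR for illustration only, not valid LLVM IR, and emit_uses,
count_loads and root_slot are hypothetical names.

```python
def emit_uses(num_calls, gc_moves_objects):
    """Emit pseudo-IR that reads a heap pointer and uses it after each call."""
    code = ["%p0 = load ptr, root_slot"]
    current = "%p0"
    for i in range(1, num_calls + 1):
        code.append(f"call @f{i}()")
        if gc_moves_objects:
            # The call may have triggered a collection that moved the object,
            # so the old register is stale and is refreshed from the root slot.
            current = f"%p{i}"
            code.append(f"{current} = load ptr, root_slot")
        code.append(f"use {current}")
    return code


def count_loads(code):
    """Count the load instructions in a list of pseudo-IR lines."""
    return sum(1 for line in code if " = load " in line)
```

For three calls, count_loads(emit_uses(3, False)) is 1 and
count_loads(emit_uses(3, True)) is 4: one extra load per call, each of which
an optimizer can only remove if it can prove that the call never collects.
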

So there is a fundamental conflict here where a simple GC design decision has
a drastic effect on the IR generator.

Although it is theoretically possible to parameterize the IR generator
sufficiently to account for all possible combinations of GC designs, I suspect
the result would be a mess. Consequently, perhaps it would be better to
consider IR generation and the GC as a single entity and, instead, factor
them both out using a common high-level representation, not dissimilar to JVM
or CLR bytecode in terms of functionality but much more closely related to
LLVM's IR?
--
Dr Jon Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?e