[llvm] r230359 - Starting to cleanup the garbage collection documentation

Philip Reames listmail at philipreames.com
Tue Feb 24 11:44:46 PST 2015


Author: reames
Date: Tue Feb 24 13:44:46 2015
New Revision: 230359

URL: http://llvm.org/viewvc/llvm-project?rev=230359&view=rev
Log:
Starting to cleanup the garbage collection documentation

In this change:
- Put the getting started section first
- Create a dedicated section to document the built in collector strategies
- Move discuss of ShadowStack into new section
- Add placeholders for erlang, ocaml, and statepoint-example collectors

There will be many more changes following.  I plan on full integrating the documentation for gc.statepoint and gc.root.  I want to make it much clearer on how to get started and what users should expect in terms of effort.


Modified:
    llvm/trunk/docs/GarbageCollection.rst

Modified: llvm/trunk/docs/GarbageCollection.rst
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/docs/GarbageCollection.rst?rev=230359&r1=230358&r2=230359&view=diff
==============================================================================
--- llvm/trunk/docs/GarbageCollection.rst (original)
+++ llvm/trunk/docs/GarbageCollection.rst Tue Feb 24 13:44:46 2015
@@ -1,5 +1,5 @@
 =====================================
-Accurate Garbage Collection with LLVM
+Garbage Collection with LLVM
 =====================================
 
 .. contents::
@@ -8,113 +8,9 @@ Accurate Garbage Collection with LLVM
 Introduction
 ============
 
-Garbage collection is a widely used technique that frees the programmer from
-having to know the lifetimes of heap objects, making software easier to produce
-and maintain.  Many programming languages rely on garbage collection for
-automatic memory management.  There are two primary forms of garbage collection:
-conservative and accurate.
-
-Conservative garbage collection often does not require any special support from
-either the language or the compiler: it can handle non-type-safe programming
-languages (such as C/C++) and does not require any special information from the
-compiler.  The `Boehm collector
-<http://www.hpl.hp.com/personal/Hans_Boehm/gc/>`__ is an example of a
-state-of-the-art conservative collector.
-
-Accurate garbage collection requires the ability to identify all pointers in the
-program at run-time (which requires that the source-language be type-safe in
-most cases).  Identifying pointers at run-time requires compiler support to
-locate all places that hold live pointer variables at run-time, including the
-:ref:`processor stack and registers <gcroot>`.
-
-Conservative garbage collection is attractive because it does not require any
-special compiler support, but it does have problems.  In particular, because the
-conservative garbage collector cannot *know* that a particular word in the
-machine is a pointer, it cannot move live objects in the heap (preventing the
-use of compacting and generational GC algorithms) and it can occasionally suffer
-from memory leaks due to integer values that happen to point to objects in the
-program.  In addition, some aggressive compiler transformations can break
-conservative garbage collectors (though these seem rare in practice).
-
-Accurate garbage collectors do not suffer from any of these problems, but they
-can suffer from degraded scalar optimization of the program.  In particular,
-because the runtime must be able to identify and update all pointers active in
-the program, some optimizations are less effective.  In practice, however, the
-locality and performance benefits of using aggressive garbage collection
-techniques dominates any low-level losses.
-
-This document describes the mechanisms and interfaces provided by LLVM to
-support accurate garbage collection.
-
-Goals and non-goals
--------------------
-
-LLVM's intermediate representation provides :ref:`garbage collection intrinsics
-<gc_intrinsics>` that offer support for a broad class of collector models.  For
-instance, the intrinsics permit:
-
-* semi-space collectors
-
-* mark-sweep collectors
-
-* generational collectors
-
-* reference counting
-
-* incremental collectors
-
-* concurrent collectors
-
-* cooperative collectors
-
-We hope that the primitive support built into the LLVM IR is sufficient to
-support a broad class of garbage collected languages including Scheme, ML, Java,
-C#, Perl, Python, Lua, Ruby, other scripting languages, and more.
-
-However, LLVM does not itself provide a garbage collector --- this should be
-part of your language's runtime library.  LLVM provides a framework for compile
-time :ref:`code generation plugins <plugin>`.  The role of these plugins is to
-generate code and data structures which conforms to the *binary interface*
-specified by the *runtime library*.  This is similar to the relationship between
-LLVM and DWARF debugging info, for example.  The difference primarily lies in
-the lack of an established standard in the domain of garbage collection --- thus
-the plugins.
-
-The aspects of the binary interface with which LLVM's GC support is
-concerned are:
-
-* Creation of GC-safe points within code where collection is allowed to execute
-  safely.
-
-* Computation of the stack map.  For each safe point in the code, object
-  references within the stack frame must be identified so that the collector may
-  traverse and perhaps update them.
-
-* Write barriers when storing object references to the heap.  These are commonly
-  used to optimize incremental scans in generational collectors.
-
-* Emission of read barriers when loading object references.  These are useful
-  for interoperating with concurrent collectors.
-
-There are additional areas that LLVM does not directly address:
-
-* Registration of global roots with the runtime.
-
-* Registration of stack map entries with the runtime.
-
-* The functions used by the program to allocate memory, trigger a collection,
-  etc.
-
-* Computation or compilation of type maps, or registration of them with the
-  runtime.  These are used to crawl the heap for object references.
-
-In general, LLVM's support for GC does not include features which can be
-adequately addressed with other features of the IR and does not specify a
-particular binary interface.  On the plus side, this means that you should be
-able to integrate LLVM with an existing runtime.  On the other hand, it leaves a
-lot of work for the developer of a novel language.  However, it's easy to get
-started quickly and scale up to a more sophisticated implementation as your
-compiler matures.
+This document covers how to integrate LLVM into a compiler for a language which
+supports garbage collection.  **Note that LLVM itself does not provide a 
+garbage collector.**  You must provide your own.  
 
 Getting started
 ===============
@@ -177,7 +73,7 @@ To turn the shadow stack on for your fun
 
 .. code-block:: c++
 
-  F.setGC("shadow-stack");
+  F.setGC(<collector description name>);
 
 for each function your compiler emits. Since the shadow stack is built into
 LLVM, you do not need to load a plugin.
@@ -252,27 +148,120 @@ data structure, but there are only 20 li
     }
   }
 
-About the shadow stack
-----------------------
 
-Unlike many GC algorithms which rely on a cooperative code generator to compile
-stack maps, this algorithm carefully maintains a linked list of stack roots
-[:ref:`Henderson2002 <henderson02>`].  This so-called "shadow stack" mirrors the
-machine stack.  Maintaining this data structure is slower than using a stack map
-compiled into the executable as constant data, but has a significant portability
-advantage because it requires no special support from the target code generator,
-and does not require tricky platform-specific code to crawl the machine stack.
 
-The tradeoff for this simplicity and portability is:
 
-* High overhead per function call.
 
-* Not thread-safe.
+What is Garbage Collection?
+===========================
 
-Still, it's an easy way to get started.  After your compiler and runtime are up
-and running, writing a :ref:`plugin <plugin>` will allow you to take advantage
-of :ref:`more advanced GC features <collector-algos>` of LLVM in order to
-improve performance.
+Garbage collection is a widely used technique that frees the programmer from
+having to know the lifetimes of heap objects, making software easier to produce
+and maintain.  Many programming languages rely on garbage collection for
+automatic memory management.  There are two primary forms of garbage collection:
+conservative and accurate.
+
+Conservative garbage collection often does not require any special support from
+either the language or the compiler: it can handle non-type-safe programming
+languages (such as C/C++) and does not require any special information from the
+compiler.  The `Boehm collector
+<http://www.hpl.hp.com/personal/Hans_Boehm/gc/>`__ is an example of a
+state-of-the-art conservative collector.
+
+Accurate garbage collection requires the ability to identify all pointers in the
+program at run-time (which requires that the source-language be type-safe in
+most cases).  Identifying pointers at run-time requires compiler support to
+locate all places that hold live pointer variables at run-time, including the
+:ref:`processor stack and registers <gcroot>`.
+
+Conservative garbage collection is attractive because it does not require any
+special compiler support, but it does have problems.  In particular, because the
+conservative garbage collector cannot *know* that a particular word in the
+machine is a pointer, it cannot move live objects in the heap (preventing the
+use of compacting and generational GC algorithms) and it can occasionally suffer
+from memory leaks due to integer values that happen to point to objects in the
+program.  In addition, some aggressive compiler transformations can break
+conservative garbage collectors (though these seem rare in practice).
+
+Accurate garbage collectors do not suffer from any of these problems, but they
+can suffer from degraded scalar optimization of the program.  In particular,
+because the runtime must be able to identify and update all pointers active in
+the program, some optimizations are less effective.  In practice, however, the
+locality and performance benefits of using aggressive garbage collection
+techniques dominates any low-level losses.
+
+This document describes the mechanisms and interfaces provided by LLVM to
+support accurate garbage collection.
+
+Goals and non-goals
+-------------------
+
+LLVM's intermediate representation provides :ref:`garbage collection intrinsics
+<gc_intrinsics>` that offer support for a broad class of collector models.  For
+instance, the intrinsics permit:
+
+* semi-space collectors
+
+* mark-sweep collectors
+
+* generational collectors
+
+* reference counting
+
+* incremental collectors
+
+* concurrent collectors
+
+* cooperative collectors
+
+We hope that the primitive support built into the LLVM IR is sufficient to
+support a broad class of garbage collected languages including Scheme, ML, Java,
+C#, Perl, Python, Lua, Ruby, other scripting languages, and more.
+
+However, LLVM does not itself provide a garbage collector --- this should be
+part of your language's runtime library.  LLVM provides a framework for compile
+time :ref:`code generation plugins <plugin>`.  The role of these plugins is to
+generate code and data structures which conforms to the *binary interface*
+specified by the *runtime library*.  This is similar to the relationship between
+LLVM and DWARF debugging info, for example.  The difference primarily lies in
+the lack of an established standard in the domain of garbage collection --- thus
+the plugins.
+
+The aspects of the binary interface with which LLVM's GC support is
+concerned are:
+
+* Creation of GC-safe points within code where collection is allowed to execute
+  safely.
+
+* Computation of the stack map.  For each safe point in the code, object
+  references within the stack frame must be identified so that the collector may
+  traverse and perhaps update them.
+
+* Write barriers when storing object references to the heap.  These are commonly
+  used to optimize incremental scans in generational collectors.
+
+* Emission of read barriers when loading object references.  These are useful
+  for interoperating with concurrent collectors.
+
+There are additional areas that LLVM does not directly address:
+
+* Registration of global roots with the runtime.
+
+* Registration of stack map entries with the runtime.
+
+* The functions used by the program to allocate memory, trigger a collection,
+  etc.
+
+* Computation or compilation of type maps, or registration of them with the
+  runtime.  These are used to crawl the heap for object references.
+
+In general, LLVM's support for GC does not include features which can be
+adequately addressed with other features of the IR and does not specify a
+particular binary interface.  On the plus side, this means that you should be
+able to integrate LLVM with an existing runtime.  On the other hand, it leaves a
+lot of work for the developer of a novel language.  However, it's easy to get
+started quickly and scale up to a more sophisticated implementation as your
+compiler matures.
 
 .. _gc_intrinsics:
 
@@ -447,6 +436,64 @@ greater performance impact since pointer
 
 .. _plugin:
 
+Built In Collectors
+====================
+
+LLVM includes built in support for several varieties of garbage collectors.  
+
+The Shadow Stack GC
+----------------------
+
+To use this collector strategy, mark your functions with:
+
+.. code-block:: c++
+
+  F.setGC("shadow-stack");
+
+Unlike many GC algorithms which rely on a cooperative code generator to compile
+stack maps, this algorithm carefully maintains a linked list of stack roots
+[:ref:`Henderson2002 <henderson02>`].  This so-called "shadow stack" mirrors the
+machine stack.  Maintaining this data structure is slower than using a stack map
+compiled into the executable as constant data, but has a significant portability
+advantage because it requires no special support from the target code generator,
+and does not require tricky platform-specific code to crawl the machine stack.
+
+The tradeoff for this simplicity and portability is:
+
+* High overhead per function call.
+
+* Not thread-safe.
+
+Still, it's an easy way to get started.  After your compiler and runtime are up
+and running, writing a :ref:`plugin <plugin>` will allow you to take advantage
+of :ref:`more advanced GC features <collector-algos>` of LLVM in order to
+improve performance.
+
+The 'Erlang' and 'Ocaml' GCs
+-----------------------------
+
+LLVM ships with two example collectors which leverage the ''gcroot'' 
+mechanisms.  To our knowledge, these are not actually used by any language 
+runtime, but they do provide a reasonable starting point for someone interested 
+in writing an ''gcroot' compatible GC plugin.  In particular, these are the 
+only in tree examples of how to produce a custom binary stack map format using 
+a ''gcroot'' strategy.
+
+As there names imply, the binary format produced is intended to model that 
+used by the Erlang and OCaml compilers respectively.  
+
+
+The Statepoint Example GC
+-------------------------
+
+.. code-block:: c++
+
+  F.setGC("statepoint-example");
+
+This GC provides an example of how one might use the infrastructure provided 
+by ''gc.statepoint''.  
+
+
 Implementing a collector plugin
 ===============================
 





More information about the llvm-commits mailing list