[llvm] r223143 - [Statepoints 4/4] Statepoint infrastructure for garbage collection: Documentation

Sean Silva chisophugis at gmail.com
Tue Feb 24 14:37:20 PST 2015


Necro-nit (wasn't sure where to post this feedback; I realize that this has
been slightly updated in ToT): please update the prototypes here to match
their current definitions (e.g. `llvm.experimental.` prefix).

(sorry for the delay in getting to this)

-- Sean Silva

On Tue, Dec 2, 2014 at 11:37 AM, Philip Reames <listmail at philipreames.com>
wrote:

> Author: reames
> Date: Tue Dec  2 13:37:00 2014
> New Revision: 223143
>
> URL: http://llvm.org/viewvc/llvm-project?rev=223143&view=rev
> Log:
> [Statepoints 4/4] Statepoint infrastructure for garbage collection:
> Documentation
>
> This is the fourth and final patch in the statepoint series.  It contains
> the documentation for the statepoint intrinsics and their usage.
>
> There's definitely still room to improve the documentation here, but I
> wanted to get this landed so it was available for others.  There will
> likely be a series of small cleanup changes over the next few weeks as we
> work to clarify and revise the documentation.  If you have comments or
> questions, please feel free to discuss them either in this commit thread,
> the original review thread, or on llvmdev.  Comments are more than welcome.
>
> Reviewed by: atrick, ributzka
> Differential Revision: http://reviews.llvm.org/D5683
>
>
>
> Added:
>     llvm/trunk/docs/Statepoints.rst
>
> Added: llvm/trunk/docs/Statepoints.rst
> URL:
> http://llvm.org/viewvc/llvm-project/llvm/trunk/docs/Statepoints.rst?rev=223143&view=auto
>
> ==============================================================================
> --- llvm/trunk/docs/Statepoints.rst (added)
> +++ llvm/trunk/docs/Statepoints.rst Tue Dec  2 13:37:00 2014
> @@ -0,0 +1,209 @@
> +=====================================
> +Garbage Collection Safepoints in LLVM
> +=====================================
> +
> +.. contents::
> +   :local:
> +   :depth: 2
> +
> +Status
> +=======
> +
> +This document describes a set of experimental extensions to LLVM. Use
> with caution.  Because the intrinsics have experimental status,
> compatibility across LLVM releases is not guaranteed.
> +
> +LLVM currently supports an alternate mechanism for conservative garbage
> collection support using the gc_root intrinsic.  The mechanism described
> here shares little in common with the alternate implementation and it is
> hoped that this mechanism will eventually replace the gc_root mechanism.
> +
> +Overview
> +========
> +
> +To collect dead objects, garbage collectors must be able to identify any
> references to objects contained within executing code, and, depending on
> the collector, potentially update them.  The collector does not need this
> information at all points in code - that would make the problem much harder
> - but only at well defined points in the execution known as 'safepoints'.
> For most collectors, it is sufficient to track at least one copy of each
> unique pointer value.  However, for a collector which wishes to relocate
> objects directly reachable from running code, a higher standard is required.
> +
> +One additional challenge is that the compiler may compute intermediate
> results ("derived pointers") which point outside of the allocation or even
> into the middle of another allocation.  The eventual use of this
> intermediate value must yield an address within the bounds of the
> allocation, but such "exterior derived pointers" may be visible to the
> collector.  Given this, a garbage collector can not safely rely on the
> runtime value of an address to indicate the object it is associated with.
> If the garbage collector wishes to move any object, the compiler must
> provide a mapping for each pointer to an indication of its allocation.
> +
> +To simplify the interaction between a collector and the compiled code,
> most garbage collectors are organized in terms of three abstractions:
> load barriers, store barriers, and safepoints.
> +
> +#. A load barrier is a bit of code executed immediately after the machine
> load instruction, but before any use of the value loaded.  Depending on the
> collector, such a barrier may be needed for all loads, merely loads of a
> particular type (in the original source language), or none at all.
> +#. Analogously, a store barrier is a code fragment that runs immediately
> before the machine store instruction, but after the computation of the
> value stored.  The most common use of a store barrier is to update a 'card
> table' in a generational garbage collector.
> +
> +#. A safepoint is a location at which pointers visible to the compiled
> code (i.e. currently in registers or on the stack) are allowed to change.
> After the safepoint completes, the actual pointer value may differ, but the
> 'object' (as seen by the source language) pointed to will not.
> +
> +  Note that the term 'safepoint' is somewhat overloaded.  It refers to
> both the location at which the machine state is parsable and the
> coordination protocol involved in bringing application threads to a point at
> which the collector can safely use that information.  The term "statepoint"
> as used in this document refers exclusively to the former.
> +
> +This document focuses on the last item - compiler support for safepoints
> in generated code.  We will assume that an outside mechanism has decided
> where to place safepoints.  From our perspective, all safepoints will be
> function calls.  To support relocation of objects directly reachable from
> values in compiled code, the collector must be able to:
> +
> +#. identify every copy of a pointer (including copies introduced by the
> compiler itself) at the safepoint,
> +#. identify which object each pointer relates to, and
> +#. potentially update each of those copies.
> +
> +This document describes the mechanism by which an LLVM based compiler can
> provide this information to a language runtime/collector and ensure that
> all pointers can be read and updated if desired.  The heart of the approach
> is to construct (or rewrite) the IR in a manner where the possible updates
> performed by the garbage collector are explicitly visible in the IR.  Doing
> so requires that we:
> +
> +#. create a new SSA value for each potentially relocated pointer, and
> ensure that no use of the original (non relocated) value is reachable
> after the safepoint,
> +#. specify the relocation in a way which is opaque to the compiler to
> ensure that the optimizer can not introduce new uses of an unrelocated
> value after a statepoint. This prevents the optimizer from performing
> unsound optimizations.
> +#. record a mapping of live pointers (and the allocation they're
> associated with) for each statepoint.
> +
> +At the most abstract level, inserting a safepoint can be thought of as
> replacing a call instruction with a call to a multiple return value
> function which calls the original target of the call, returns its
> result, and returns updated values for any live pointers to garbage
> collected objects.
> +
> +  Note that the task of identifying all live pointers to garbage
> collected values, transforming the IR to expose a pointer giving the base
> object for every such live pointer, and inserting all the intrinsics
> correctly is explicitly out of scope for this document.  The recommended
> approach is described in the section of Late Safepoint Placement below.
> +
> +This abstract function call is concretely represented by a sequence of
> intrinsic calls known as a 'statepoint sequence'.
> +
> +
> +Let's consider a simple call in LLVM IR:
> +  todo
> +
> +Depending on our language we may need to allow a safepoint during the
> execution of the function called from this site.  If so, we need to let the
> collector update local values in the current frame.
> +
> +Let's say we need to relocate SSA values 'a', 'b', and 'c' at this
> safepoint.  To represent this, we would generate the statepoint sequence::
> +  put an example sequence here
> +
> +Ideally, this sequence would have been represented as an M argument, N
> return value function (where M is the number of values being relocated +
> the original call arguments and N is the original return value + each
> relocated value), but LLVM does not easily support such a representation.
> +
> +Instead, the statepoint intrinsic marks the actual site of the safepoint
> or statepoint.  The statepoint returns a token value (which exists only at
> compile time).  To get back the original return value of the call, we use
> the 'gc_result' intrinsic.  To get the relocation of each pointer in turn,
> we use the 'gc_relocate' intrinsic with the appropriate index.  Note that
> both the gc_relocate and gc_result are tied to the statepoint.  The
> combination forms a "statepoint sequence" and represents the entirety of a
> parseable call or 'statepoint'.
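
To make the token-plus-projections shape above concrete, here is a toy
model in Python. It is only an illustration, not LLVM API: `Token`,
`statepoint`, `gc_result`, `gc_relocate` and the `relocate` callback are
hypothetical names chosen to mirror the intrinsics, and `gc_relocate` is
simplified to take a single index into the relocated values rather than
the base/derived index pair described later in the document.

```python
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass(frozen=True)
class Token:
    """Stands in for the compile-time token returned by the statepoint."""
    result: Any            # return value of the wrapped call
    relocated: List[Any]   # gc values as they look after the safepoint


def statepoint(target: Callable[..., Any], call_args: List[Any],
               gc_values: List[Any],
               relocate: Callable[[Any], Any]) -> Token:
    """Toy statepoint: run the call, then let the 'collector' move pointers."""
    result = target(*call_args)
    return Token(result, [relocate(v) for v in gc_values])


def gc_result(token: Token) -> Any:
    """Projection of the original call's return value out of the token."""
    return token.result


def gc_relocate(token: Token, index: int) -> Any:
    """Projection of one (possibly moved) pointer out of the token."""
    return token.relocated[index]


# Hypothetical collector that moves every object by a fixed offset.
def moved(addr: int) -> int:
    return addr + 0x1000


tok = statepoint(lambda x: x * 2, [21], [0x10, 0x20, 0x30], moved)
ret = gc_result(tok)
a1, b1, c1 = (gc_relocate(tok, i) for i in range(3))
```

After the "call", only `ret`, `a1`, `b1` and `c1` should be used; the
original addresses play the role of the unrelocated SSA values.
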
> +
> +When lowered, this example would generate the following x86 assembly::
> +  put assembly here
> +
> +Each of the potentially relocated values has been spilled to the stack,
> and that location has been recorded in the StackMap section.
> If the garbage collector needs to update any of these pointers during the
> call, it knows exactly what to change.
> +
> +Intrinsics
> +===========
> +
> +'''gc_statepoint''' Intrinsic
> +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> +
> +Syntax:
> +"""""""
> +
> +::
> +
> +      declare i32
> +        @gc_statepoint(func_type <target>, i64 <#call args>,
> +                       i64 <unused>, ... (call parameters),
> +                       i64 <# deopt args>, ... (deopt parameters),
> +                       ... (gc parameters))
> +
> +Overview:
> +"""""""""
> +
> +The statepoint intrinsic represents a call which is parse-able by the
> runtime.
> +
> +Operands:
> +"""""""""
> +
> +The 'target' operand is the function actually being called.  The target
> can be specified as either a symbolic LLVM function, or as an arbitrary
> Value of appropriate function type.  Note that the function type must match
> the signature of the callee and the types of the 'call parameters'
> arguments.
> +
> +The '#call args' operand is the number of arguments to the actual call.
> It must exactly match the number of arguments passed in the 'call
> parameters' variable length section.
> +
> +The 'unused' operand is unused and likely to be removed.  Please do not
> use.
> +
> +The 'call parameters' arguments are simply the arguments which need to be
> passed to the call target.  They will be lowered according to the specified
> calling convention and otherwise handled like a normal call instruction.
> The number of arguments must exactly match what is specified in '# call
> args'.  The types must match the signature of 'target'.
> +
> +The 'deopt parameters' arguments contain an arbitrary list of Values
> which is meaningful to the runtime.  The runtime may read any of these
> values, but is assumed not to modify them.  If the garbage collector might
> need to modify one of these values, it must also be listed in the 'gc
> parameters' argument list.  The '# deopt args' field indicates how many
> operands are to be interpreted as 'deopt parameters'.
> +
> +The 'gc parameters' arguments contain every pointer to a garbage
> collected object which potentially needs to be updated by the garbage
> collector.  Note that the argument list must explicitly contain a base
> pointer for every derived pointer listed.  The order of arguments is
> unimportant.  Unlike the other variable length parameter sets, this list is
> not length prefixed.
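
The operand layout described above (two length-prefixed sections followed
by an unprefixed tail) can be sketched as a small splitter. This is
Python for illustration only; `split_statepoint_operands` and the flat
list encoding are assumptions made here, not anything LLVM provides.

```python
from typing import Any, List, NamedTuple


class StatepointOperands(NamedTuple):
    target: Any
    call_params: List[Any]
    deopt_params: List[Any]
    gc_params: List[Any]
    gc_start: int  # index of the first gc parameter in the flat list


def split_statepoint_operands(ops: List[Any]) -> StatepointOperands:
    """Split a flat operand list laid out as in the declaration above:
    target, #call args, unused, call params..., #deopt args,
    deopt params..., gc params...
    """
    if len(ops) < 3:
        raise ValueError("need target, '#call args' and 'unused' operands")
    target, num_call = ops[0], ops[1]   # ops[2] is the 'unused' operand
    call_begin = 3
    call_end = call_begin + num_call
    if call_end >= len(ops):
        raise ValueError("missing '# deopt args' operand")
    num_deopt = ops[call_end]
    deopt_begin = call_end + 1
    deopt_end = deopt_begin + num_deopt
    if deopt_end > len(ops):
        raise ValueError("fewer deopt parameters than '# deopt args'")
    # Everything left over is the gc parameter section (not length prefixed).
    return StatepointOperands(target, ops[call_begin:call_end],
                              ops[deopt_begin:deopt_end], ops[deopt_end:],
                              deopt_end)


example = split_statepoint_operands(
    ["@foo", 2, 0, "x", "y", 1, "d0", "base", "derived"])
```

Here `example.gc_start` is 7, which is the smallest index a gc_relocate
could legally name for this operand list.
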
> +
> +Semantics:
> +""""""""""
> +
> +A statepoint is assumed to read and write all memory.  As a result,
> memory operations can not be reordered past a statepoint.  It is illegal to
> mark a statepoint as being either 'readonly' or 'readnone'.
> +
> +Note that legal IR can not perform any memory operation on a 'gc pointer'
> argument of the statepoint in a location statically reachable from the
> statepoint.  Instead, the explicitly relocated value (from a
> ''gc_relocate'') must be used.
> +
> +'''gc_result''' Intrinsic
> +^^^^^^^^^^^^^^^^^^^^^^^^^^
> +
> +Syntax:
> +"""""""
> +
> +::
> +
> +      declare type*
> +        @gc_result_ptr(i32 %statepoint_token)
> +
> +      declare fX
> +        @gc_result_float(i32 %statepoint_token)
> +
> +      declare iX
> +        @gc_result_int(i32 %statepoint_token)
> +
> +Overview:
> +"""""""""
> +
> +'''gc_result''' extracts the result of the original call instruction
> which was replaced by the '''gc_statepoint'''.  The '''gc_result'''
> intrinsic is actually a family of three intrinsics due to an implementation
> limitation.  Other than the type of the return value, the semantics are the
> same.
> +
> +Operands:
> +"""""""""
> +
> +The first and only argument is the '''gc.statepoint''' which starts the
> safepoint sequence of which this '''gc_result''' is a part.  Despite the
> typing of this as a generic i32, *only* the value defined by a
> '''gc.statepoint''' is legal here.
> +
> +Semantics:
> +""""""""""
> +
> +The ''gc_result'' represents the return value of the call target of the
> ''statepoint''.  The type of the ''gc_result'' must exactly match the type
> of the target.  If the call target returns void, there will be no
> ''gc_result''.
> +
> +A ''gc_result'' is modeled as a 'readnone' pure function.  It has no side
> effects since it is just a projection of the return value of the previous
> call represented by the ''gc_statepoint''.
> +
> +'''gc_relocate''' Intrinsic
> +^^^^^^^^^^^^^^^^^^^^^^^^^^^
> +
> +Syntax:
> +"""""""
> +
> +::
> +
> +      declare <type> addrspace(1)*
> +        @gc_relocate(i32 %token, i32 %base_offset, i32 %pointer_offset)
> +
> +Overview:
> +"""""""""
> +
> +A ''gc_relocate'' returns the potentially relocated value of a pointer at
> the safepoint.
> +
> +Operands:
> +"""""""""
> +
> +The first argument is the '''gc.statepoint''' which starts the safepoint
> sequence of which this '''gc_relocate''' is a part.  Despite the typing of
> this as a generic i32, *only* the value defined by a '''gc.statepoint''' is
> legal here.
> +
> +The second argument is an index into the statepoint's list of arguments
> which specifies the base pointer for the pointer being relocated.  This
> index must land within the 'gc parameter' section of the statepoint's
> argument list.
> +
> +The third argument is an index into the statepoint's list of arguments
> which specifies the (potentially) derived pointer being relocated.  It is
> legal for this index to be the same as the second argument if-and-only-if a
> base pointer is being relocated. This index must land within the 'gc
> parameter' section of the statepoint's argument list.
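
The two index rules above can be written down as a small check. This is
a Python sketch under assumptions: `check_relocate_indices` is a
hypothetical helper, and `gc_start` is assumed to be the position of the
first gc parameter in the statepoint's flat argument list.

```python
def check_relocate_indices(num_operands: int, gc_start: int,
                           base_index: int, derived_index: int) -> bool:
    """Validate the two index operands of a gc_relocate.

    Both indices must land inside the gc parameter section, i.e. in
    [gc_start, num_operands).  Returns True when the two indices are
    equal, which per the text is the case of relocating a base pointer.
    """
    for name, idx in (("base", base_index), ("derived", derived_index)):
        if not gc_start <= idx < num_operands:
            raise ValueError(
                f"{name} index {idx} is outside the gc parameter section "
                f"[{gc_start}, {num_operands})")
    return base_index == derived_index


# Statepoint with 9 operands whose gc parameters start at index 7.
relocating_derived = check_relocate_indices(9, 7, 7, 8)
relocating_base = check_relocate_indices(9, 7, 7, 7)
```
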
> +
> +Semantics:
> +""""""""""
> +The return value of ''gc_relocate'' is the potentially relocated value of
> the pointer specified by its arguments.  It is unspecified how the value
> of the returned pointer relates to the argument to the ''gc_statepoint''
> other than that a) it points to the same source language object with the
> same offset, and b) the 'based-on' relationship of the newly relocated
> pointers is a projection of the unrelocated pointers.  In particular, the
> integer value of the pointer returned is unspecified.
> +
> +A ''gc_relocate'' is modeled as a 'readnone' pure function.  It has no
> side effects since it is just a way to extract information about work done
> during the actual call modeled by the ''gc_statepoint''.
> +
> +
> +StackMap Format
> +================
> +
> +Locations for each pointer value which may need to be read and/or updated by
> the runtime or collector are provided via the StackMap format specified in
> the PatchPoint documentation.
> +
> +.. TODO: link
> +
> +Each statepoint generates the following Locations:
> +
> +* Constant which describes number of following deopt *Locations* (not
> operands)
> +* Variable number of Locations, one for each deopt parameter listed in
> the IR statepoint (same number as described by previous Constant)
> +* Variable number of Location pairs, one pair for each unique pointer
> which needs to be relocated.  The first Location in each pair describes the base
> pointer for the object.  The second is the derived pointer actually being
> relocated.  It is guaranteed that the base pointer must also appear
> explicitly as a relocation pair if used after the statepoint. There may be
> fewer pairs than gc parameters in the IR statepoint. Each *unique* pair
> will occur at least once; duplicates are possible.
> +
> +Note that the Locations used in each section may describe the same
> physical location.  e.g. A stack slot may appear as a deopt location, a gc
> base pointer, and a gc derived pointer.
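
The Location ordering listed above (a count, that many deopt Locations,
then base/derived pairs) can be sketched as a decoder. This is Python for
illustration only: strings stand in for real Location records, the
leading int stands in for the Constant Location, and
`decode_statepoint_locations` is a hypothetical name. It is not a parser
for the binary StackMap section.

```python
from typing import List, NamedTuple, Tuple


class StatepointLocations(NamedTuple):
    deopt: List[str]
    gc_pairs: List[Tuple[str, str]]  # (base location, derived location)


def decode_statepoint_locations(locations: list) -> StatepointLocations:
    """Split one statepoint record's Locations into deopt and gc pairs."""
    if not locations:
        raise ValueError("expected a leading deopt-count constant")
    num_deopt = locations[0]
    if not isinstance(num_deopt, int) or num_deopt < 0:
        raise ValueError("first Location must be a non-negative constant")
    deopt_end = 1 + num_deopt
    if deopt_end > len(locations):
        raise ValueError("fewer deopt Locations than the constant claims")
    rest = locations[deopt_end:]
    if len(rest) % 2 != 0:
        raise ValueError("gc Locations must come in base/derived pairs")
    pairs = [(rest[i], rest[i + 1]) for i in range(0, len(rest), 2)]
    return StatepointLocations(list(locations[1:deopt_end]), pairs)


# One deopt value, one base pointer, and one pointer derived from it.
# Note "rsp+8" appears in several roles, as the text above allows.
decoded = decode_statepoint_locations(
    [1, "rsp+8", "rsp+8", "rsp+8", "rsp+8", "rsp+0"])
```
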
> +
> +The ID field of the 'StkMapRecord' for a statepoint is meaningless and
> its value is explicitly unspecified.
> +
> +The LiveOut section of the StkMapRecord will be empty for a statepoint
> record.
> +
> +Safepoint Semantics & Verification
> +==================================
> +
> +The fundamental correctness property for the compiled code's correctness
> w.r.t. the garbage collector is a dynamic one.  It must be the case that
> there is no dynamic trace such that an operation involving a potentially
> relocated pointer is observably-after a safepoint which could relocate it.
> 'observably-after' in this usage means that an outside observer could
> observe this sequence of events in a way which precludes the operation
> being performed before the safepoint.
> +
> +To understand why this 'observably-after' property is required, consider
> a null comparison performed on the original copy of a relocated pointer.
> Assuming that control flow follows the safepoint, there is no way to
> observe externally whether the null comparison is performed before or after
> the safepoint.  (Remember, the original Value is unmodified by the
> safepoint.)  The compiler is free to make either scheduling choice.
> +
> +The actual correctness property implemented is slightly stronger than
> this.  We require that there be no *static path* on which a potentially
> relocated pointer is 'observably-after' it may have been relocated.  This
> is slightly stronger than is strictly necessary (and thus may disallow some
> otherwise valid programs), but greatly simplifies reasoning about
> correctness of the compiled code.
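
A static-path check in this spirit can be sketched over a toy control
flow graph. This Python sketch is deliberately cruder than the property
stated above: it flags *any* use of an unrelocated value reachable from
the statepoint, and does not attempt to model 'observably-after'. The
graph encoding and the name `find_stale_uses` are assumptions made for
illustration; this is not the verifier mentioned in the document.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

# node name -> (values used, values relocated here, successor names).
# The 'relocated' set is non-empty only for statepoint nodes.
Node = Tuple[Set[str], Set[str], List[str]]


def find_stale_uses(cfg: Dict[str, Node]) -> List[Tuple[str, str, str]]:
    """Return (statepoint, user, value) for each use of an unrelocated
    value on some static path after the statepoint that relocates it."""
    violations = []
    for sp, (_uses, relocated, succs) in cfg.items():
        if not relocated:
            continue
        seen: Set[str] = set()
        work = deque(succs)
        while work:  # breadth-first walk of everything after the statepoint
            node = work.popleft()
            if node in seen:
                continue
            seen.add(node)
            uses, _relocated, node_succs = cfg[node]
            for value in sorted(uses & relocated):
                violations.append((sp, node, value))
            work.extend(node_succs)
    return violations


cfg: Dict[str, Node] = {
    "sp":   ({"a"}, {"a"}, ["good", "bad"]),
    "good": ({"a.relocated"}, set(), []),
    "bad":  ({"a"}, set(), []),
}
stale = find_stale_uses(cfg)
```

In this example only the `bad` node is reported, since it still uses `a`
rather than the relocated value.
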
> +
> +By construction, this property will be upheld by the optimizer if
> correctly established in the source IR.  This is a key invariant of the
> design.
> +
> +The existing IR Verifier pass has been extended to check most of the
> local restrictions on the intrinsics mentioned in their respective
> documentation.  The current implementation in LLVM does not check the key
> relocation invariant, but there is ongoing work on developing such a
> verifier.  Please ask on llvmdev if you're interested in experimenting with
> the current version.
> +
>
>