[llvm] r223143 - [Statepoints 4/4] Statepoint infrastructure for garbage collection: Documentation

Philip Reames listmail at philipreames.com
Tue Feb 24 17:31:16 PST 2015


On the queue for tomorrow.

Other things which need to happen:
- Move intrinsic definitions into LangRef
- Flesh out a description of the "statepoint-example" GC.
- Document the fact there's no a form of statepoint sequence without 
explicit relocations, update code with asserts & flags respectively

I'm considering just removing the Statepoints page entirely and merging 
the content into GarbageCollection.  I probably won't actually go ahead 
with that just yet.

I also need to transcribe my private TODO list somewhere 
public.  The docs probably aren't the right place for this though.

On 02/24/2015 04:56 PM, Sean Silva wrote:
> There are a couple todo/"put assembly here" in the file currently. It 
> would be nice to flesh those out.
>
> On Tue, Feb 24, 2015 at 4:24 PM, Philip Reames 
> <listmail at philipreames.com <mailto:listmail at philipreames.com>> wrote:
>
>     Fixed.  Other comments welcome.
>
>
>     On 02/24/2015 02:44 PM, Philip Reames wrote:
>>     Your timing is good.  I'm working on docs today and should get to
>>     this by end of day.  :)
>>
>>     Philip
>>
>>     On 02/24/2015 02:37 PM, Sean Silva wrote:
>>>     Necro-nit (wasn't sure where to post this feedback; I realize
>>>     that this has been slightly updated in ToT): please update the
>>>     prototypes here to match their current definitions (e.g.
>>>     `llvm.experimental.` prefix).
>>>
>>>     (sorry for the delay in getting to this)
>>>
>>>     -- Sean Silva
>>>
>>>     On Tue, Dec 2, 2014 at 11:37 AM, Philip Reames
>>>     <listmail at philipreames.com <mailto:listmail at philipreames.com>>
>>>     wrote:
>>>
>>>         Author: reames
>>>         Date: Tue Dec  2 13:37:00 2014
>>>         New Revision: 223143
>>>
>>>         URL: http://llvm.org/viewvc/llvm-project?rev=223143&view=rev
>>>         Log:
>>>         [Statepoints 4/4] Statepoint infrastructure for garbage
>>>         collection: Documentation
>>>
>>>         This is the fourth and final patch in the statepoint
>>>         series.  It contains the documentation for the statepoint
>>>         intrinsics and their usage.
>>>
>>>         There's definitely still room to improve the documentation
>>>         here, but I wanted to get this landed so it was available
>>>         for others. There will likely be a series of small cleanup
>>>         changes over the next few weeks as we work to clarify and
>>>         revise the documentation.  If you have comments or
>>>         questions, please feel free to discuss them either in this
>>>         commit thread, the original review thread, or on llvmdev. 
>>>         Comments are more than welcome.
>>>
>>>         Reviewed by: atrick, ributzka
>>>         Differential Revision: http://reviews.llvm.org/D5683
>>>
>>>
>>>
>>>         Added:
>>>             llvm/trunk/docs/Statepoints.rst
>>>
>>>         Added: llvm/trunk/docs/Statepoints.rst
>>>         URL:
>>>         http://llvm.org/viewvc/llvm-project/llvm/trunk/docs/Statepoints.rst?rev=223143&view=auto
>>>         ==============================================================================
>>>         --- llvm/trunk/docs/Statepoints.rst (added)
>>>         +++ llvm/trunk/docs/Statepoints.rst Tue Dec 2 13:37:00 2014
>>>         @@ -0,0 +1,209 @@
>>>         +=====================================
>>>         +Garbage Collection Safepoints in LLVM
>>>         +=====================================
>>>         +
>>>         +.. contents::
>>>         +   :local:
>>>         +   :depth: 2
>>>         +
>>>         +Status
>>>         +=======
>>>         +
>>>         +This document describes a set of experimental extensions to
>>>         LLVM. Use with caution.  Because the intrinsics have
>>>         experimental status, compatibility across LLVM releases is
>>>         not guaranteed.
>>>         +
>>>         +LLVM currently supports an alternate mechanism for
>>>         conservative garbage collection support using the gc_root
>>>         intrinsic.  The mechanism described here shares little in
>>>         common with the alternate implementation and it is hoped
>>>         that this mechanism will eventually replace the gc_root
>>>         mechanism.
>>>         +
>>>         +Overview
>>>         +========
>>>         +
>>>         +To collect dead objects, garbage collectors must be able to
>>>         identify any references to objects contained within
>>>         executing code, and, depending on the collector, potentially
>>>         update them.  The collector does not need this information
>>>         at all points in code - that would make the problem much
>>>         harder - but only at well defined points in the execution
>>>         known as 'safepoints'.  For most collectors, it is
>>>         sufficient to track at least one copy of each unique pointer
>>>         value.  However, for a collector which wishes to relocate
>>>         objects directly reachable from running code, a higher
>>>         standard is required.
>>>         +
>>>         +One additional challenge is that the compiler may compute
>>>         intermediate results ("derived pointers") which point
>>>         outside of the allocation or even into the middle of another
>>>         allocation.  The eventual use of this intermediate value
>>>         must yield an address within the bounds of the allocation,
>>>         but such "exterior derived pointers" may be visible to the
>>>         collector.  Given this, a garbage collector can not safely
>>>         rely on the runtime value of an address to indicate the
>>>         object it is associated with.  If the garbage collector
>>>         wishes to move any object, the compiler must provide a
>>>         mapping for each pointer to an indication of its allocation.
>>>         +
>>>         +To simplify the interaction between a collector and the
>>>         compiled code, most garbage collectors are organized in
>>>         terms of three abstractions: load barriers, store
>>>         barriers, and safepoints.
>>>         +
>>>         +#. A load barrier is a bit of code executed immediately
>>>         after the machine load instruction, but before any use of
>>>         the value loaded.  Depending on the collector, such a
>>>         barrier may be needed for all loads, merely loads of a
>>>         particular type (in the original source language), or none
>>>         at all.
>>>         +#. Analogously, a store barrier is a code fragment that
>>>         runs immediately before the machine store instruction, but
>>>         after the computation of the value stored.  The most common
>>>         use of a store barrier is to update a 'card table' in a
>>>         generational garbage collector.
>>>         +
>>>         +#. A safepoint is a location at which pointers visible to
>>>         the compiled code (i.e. currently in registers or on the
>>>         stack) are allowed to change.  After the safepoint
>>>         completes, the actual pointer value may differ, but the
>>>         'object' (as seen by the source language) pointed to will not.
>>>         +
>>>         +  Note that the term 'safepoint' is somewhat overloaded. 
>>>         It refers to both the location at which the machine state is
>>>         parsable and the coordination protocol involved in bringing
>>>         application threads to a point at which the collector can
>>>         safely use that information.  The term "statepoint" as used
>>>         in this document refers exclusively to the former.
>>>         +
>>>         +This document focuses on the last item - compiler support
>>>         for safepoints in generated code.  We will assume that an
>>>         outside mechanism has decided where to place safepoints. 
>>>         From our perspective, all safepoints will be function
>>>         calls.  To support relocation of objects directly reachable
>>>         from values in compiled code, the collector must be able to:
>>>         +
>>>         +#. identify every copy of a pointer (including copies
>>>         introduced by the compiler itself) at the safepoint,
>>>         +#. identify which object each pointer relates to, and
>>>         +#. potentially update each of those copies.
>>>         +
>>>         +This document describes the mechanism by which an LLVM
>>>         based compiler can provide this information to a language
>>>         runtime/collector and ensure that all pointers can be read
>>>         and updated if desired.  The heart of the approach is to
>>>         construct (or rewrite) the IR in a manner where the possible
>>>         updates performed by the garbage collector are explicitly
>>>         visible in the IR.  Doing so requires that we:
>>>         +
>>>         +#. create a new SSA value for each potentially relocated
>>>         pointer, and ensure that no use of the original (non
>>>         relocated) value is reachable after the safepoint,
>>>         +#. specify the relocation in a way which is opaque to the
>>>         compiler to ensure that the optimizer can not introduce new
>>>         uses of an unrelocated value after a statepoint. This
>>>         prevents the optimizer from performing unsound optimizations.
>>>         +#. record a mapping of live pointers (and the allocation
>>>         they're associated with) for each statepoint.
>>>         +
>>>         +At the most abstract level, inserting a safepoint can be
>>>         thought of as replacing a call instruction with a call to a
>>>         multiple return value function which both calls the original
>>>         target of the call, returns its result, and returns updated
>>>         values for any live pointers to garbage collected objects.
>>>         +
>>>         +  Note that the task of identifying all live pointers to
>>>         garbage collected values, transforming the IR to expose a
>>>         pointer giving the base object for every such live pointer,
>>>         and inserting all the intrinsics correctly is explicitly out
>>>         of scope for this document.  The recommended approach is
>>>         described in the section of Late Safepoint Placement below.
>>>         +
>>>         +This abstract function call is concretely represented by a
>>>         sequence of intrinsic calls known as a 'statepoint sequence'.
>>>         +
>>>         +
>>>         +Let's consider a simple call in LLVM IR:
>>>         +  todo
>>>         +
>>>         +Depending on our language we may need to allow a safepoint
>>>         during the execution of the function called from this site. 
>>>         If so, we need to let the collector update local values in
>>>         the current frame.
>>>         +
>>>         +Let's say we need to relocate SSA values 'a', 'b', and 'c'
>>>         at this safepoint.  To represent this, we would generate the
>>>         statepoint sequence::
>>>         +  put an example sequence here
>>>         +
>>>         +Ideally, this sequence would have been represented as a M
>>>         argument, N return value function (where M is the number of
>>>         values being relocated + the original call arguments and N
>>>         is the original return value + each relocated value), but
>>>         LLVM does not easily support such a representation.
>>>         +
>>>         +Instead, the statepoint intrinsic marks the actual site of
>>>         the safepoint or statepoint. The statepoint returns a token
>>>         value (which exists only at compile time).  To get back the
>>>         original return value of the call, we use the 'gc_result'
>>>         intrinsic.  To get the relocation of each pointer in turn,
>>>         we use the 'gc_relocate' intrinsic with the appropriate
>>>         index.  Note that both the gc_relocate and gc_result are
>>>         tied to the statepoint.  The combination forms a "statepoint
>>>         sequence" and represents the entirety of a parseable call or
>>>         'statepoint'.
>>>         +
>>>         +When lowered, this example would generate the following x86
>>>         assembly::
>>>         +  put assembly here
>>>         +
>>>         +Each of the potentially relocated values has been spilled
>>>         to the stack, and a record of that location has been
>>>         recorded to the StackMap section.  If the garbage collector
>>>         needs to update any of these pointers during the call, it
>>>         knows exactly what to change.
>>>         +
>>>         +Intrinsics
>>>         +===========
>>>         +
>>>         +'''gc_statepoint''' Intrinsic
>>>         +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>>         +
>>>         +Syntax:
>>>         +"""""""
>>>         +
>>>         +::
>>>         +
>>>         +      declare i32
>>>         +        @gc_statepoint(func_type <target>, i64 <#call args>,
>>>         +                       i64 <unused>, ... (call parameters),
>>>         +                       i64 <# deopt args>, ... (deopt
>>>         parameters),
>>>         +                       ... (gc parameters))
>>>         +
>>>         +Overview:
>>>         +"""""""""
>>>         +
>>>         +The statepoint intrinsic represents a call which is
>>>         parse-able by the runtime.
>>>         +
>>>         +Operands:
>>>         +"""""""""
>>>         +
>>>         +The 'target' operand is the function actually being
>>>         called.  The target can be specified as either a symbolic
>>>         LLVM function, or as an arbitrary Value of appropriate
>>>         function type.  Note that the function type must match the
>>>         signature of the callee and the types of the 'call
>>>         parameters' arguments.
>>>         +
>>>         +The '#call args' operand is the number of arguments to the
>>>         actual call.  It must exactly match the number of arguments
>>>         passed in the 'call parameters' variable length section.
>>>         +
>>>         +The 'unused' operand is unused and likely to be removed. 
>>>         Please do not use.
>>>         +
>>>         +The 'call parameters' arguments are simply the arguments
>>>         which need to be passed to the call target.  They will be
>>>         lowered according to the specified calling convention and
>>>         otherwise handled like a normal call instruction.  The
>>>         number of arguments must exactly match what is specified in
>>>         '# call args'.  The types must match the signature of 'target'.
>>>         +
>>>         +The 'deopt parameters' arguments contain an arbitrary list
>>>         of Values which is meaningful to the runtime.  The runtime
>>>         may read any of these values, but is assumed not to modify
>>>         them.  If the garbage collector might need to modify one of
>>>         these values, it must also be listed in the 'gc pointer'
>>>         argument list.  The '# deopt args' field indicates how many
>>>         operands are to be interpreted as 'deopt parameters'.
>>>         +
>>>         +The 'gc parameters' arguments contain every pointer to a
>>>         garbage collector object which potentially needs to be
>>>         updated by the garbage collector.  Note that the argument
>>>         list must explicitly contain a base pointer for every
>>>         derived pointer listed.  The order of arguments is
>>>         unimportant.  Unlike the other variable length parameter
>>>         sets, this list is not length prefixed.
>>>         +
>>>         +Semantics:
>>>         +""""""""""
>>>         +
>>>         +A statepoint is assumed to read and write all memory.  As a
>>>         result, memory operations can not be reordered past a
>>>         statepoint.  It is illegal to mark a statepoint as being
>>>         either 'readonly' or 'readnone'.
>>>         +
>>>         +Note that legal IR can not perform any memory operation on
>>>         a 'gc pointer' argument of the statepoint in a location
>>>         statically reachable from the statepoint.  Instead, the
>>>         explicitly relocated value (from a ''gc_relocate'') must be
>>>         used.
>>>         +
>>>         +'''gc_result''' Intrinsic
>>>         +^^^^^^^^^^^^^^^^^^^^^^^^^^
>>>         +
>>>         +Syntax:
>>>         +"""""""
>>>         +
>>>         +::
>>>         +
>>>         +      declare type*
>>>         +        @gc_result_ptr(i32 %statepoint_token)
>>>         +
>>>         +      declare fX
>>>         +        @gc_result_float(i32 %statepoint_token)
>>>         +
>>>         +      declare iX
>>>         +        @gc_result_int(i32 %statepoint_token)
>>>         +
>>>         +Overview:
>>>         +"""""""""
>>>         +
>>>         +'''gc_result''' extracts the result of the original call
>>>         instruction which was replaced by the '''gc_statepoint'''. 
>>>         The '''gc_result''' intrinsic is actually a family of three
>>>         intrinsics due to an implementation limitation.  Other than
>>>         the type of the return value, the semantics are the same.
>>>         +
>>>         +Operands:
>>>         +"""""""""
>>>         +
>>>         +The first and only argument is the '''gc.statepoint'''
>>>         which starts the safepoint sequence of which this
>>>         '''gc_result''' is a part.  Despite the typing of this as a
>>>         generic i32, *only* the value defined by a
>>>         '''gc.statepoint''' is legal here.
>>>         +
>>>         +Semantics:
>>>         +""""""""""
>>>         +
>>>         +The ''gc_result'' represents the return value of the call
>>>         target of the ''statepoint''.  The type of the ''gc_result''
>>>         must exactly match the type of the target.  If the call
>>>         target returns void, there will be no ''gc_result''.
>>>         +
>>>         +A ''gc_result'' is modeled as a 'readnone' pure function. 
>>>         It has no side effects since it is just a projection of the
>>>         return value of the previous call represented by the
>>>         ''gc_statepoint''.
>>>         +
>>>         +'''gc_relocate''' Intrinsic
>>>         +^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>>         +
>>>         +Syntax:
>>>         +"""""""
>>>         +
>>>         +::
>>>         +
>>>         +      declare <type> addrspace(1)*
>>>         +        @gc_relocate(i32 %token, i32 %base_offset, i32
>>>         %pointer_offset)
>>>         +
>>>         +Overview:
>>>         +"""""""""
>>>         +
>>>         +A ''gc_relocate'' returns the potentially relocated value
>>>         of a pointer at the safepoint.
>>>         +
>>>         +Operands:
>>>         +"""""""""
>>>         +
>>>         +The first argument is the '''gc.statepoint''' which starts
>>>         the safepoint sequence of which this '''gc_relocate''' is a
>>>         part.  Despite the typing of this as a generic i32, *only*
>>>         the value defined by a '''gc.statepoint''' is legal here.
>>>         +
>>>         +The second argument is an index into the statepoint's list
>>>         of arguments which specifies the base pointer for the
>>>         pointer being relocated.  This index must land within the
>>>         'gc parameter' section of the statepoint's argument list.
>>>         +
>>>         +The third argument is an index into the statepoint's list
>>>         of arguments which specifies the (potentially) derived pointer
>>>         being relocated.  It is legal for this index to be the same
>>>         as the second argument if-and-only-if a base pointer is
>>>         being relocated. This index must land within the 'gc
>>>         parameter' section of the statepoint's argument list.
>>>         +
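As a rough illustration of the index constraint described above, here is a minimal Python sketch. It assumes the operand order from the gc_statepoint prototype earlier in this document (target, #call args, unused, call parameters, # deopt args, deopt parameters, gc parameters) and assumes indices are zero-based, counted from the target operand. `StatepointLayout` and `check_relocate` are hypothetical names for illustration only, not part of LLVM.

```python
from dataclasses import dataclass

# Fixed operands before the call parameters: target, #call args, unused.
NUM_HEADER_OPERANDS = 3


@dataclass
class StatepointLayout:
    """Hypothetical description of one statepoint's operand counts."""
    num_call_args: int
    num_deopt_args: int
    num_gc_args: int

    @property
    def gc_begin(self) -> int:
        # Header, call parameters, the '# deopt args' operand, deopt parameters.
        return NUM_HEADER_OPERANDS + self.num_call_args + 1 + self.num_deopt_args

    @property
    def gc_end(self) -> int:
        # One past the last gc parameter.
        return self.gc_begin + self.num_gc_args


def check_relocate(layout: StatepointLayout, base_index: int, derived_index: int) -> None:
    """Raise ValueError unless both indices land in the gc parameter section."""
    for name, index in (("base", base_index), ("derived", derived_index)):
        if not layout.gc_begin <= index < layout.gc_end:
            raise ValueError(
                f"{name} index {index} is outside the gc parameter section "
                f"[{layout.gc_begin}, {layout.gc_end})"
            )
```

With two call parameters, one deopt parameter, and three gc parameters, the gc parameter section under these assumptions would occupy indices 7 through 9, so a relocate using indices (7, 8) passes and one using (6, 7) is rejected.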
>>>         +Semantics:
>>>         +""""""""""
>>>         +The return value of ''gc_relocate'' is the potentially
>>>         relocated value of the pointer specified by its arguments. 
>>>         It is unspecified how the value of the returned pointer
>>>         relates to the argument to the ''gc_statepoint'' other than
>>>         that a) it points to the same source language object with
>>>         the same offset, and b) the 'based-on' relationship of the
>>>         newly relocated pointers is a projection of the unrelocated
>>>         pointers.  In particular, the integer value of the pointer
>>>         returned is unspecified.
>>>         +
>>>         +A ''gc_relocate'' is modeled as a 'readnone' pure
>>>         function.  It has no side effects since it is just a way to
>>>         extract information about work done during the actual call
>>>         modeled by the ''gc_statepoint''.
>>>         +
>>>         +
>>>         +StackMap Format
>>>         +================
>>>         +
>>>         +Locations for each pointer value which may need to be read and/or
>>>         updated by the runtime or collector are provided via the
>>>         StackMap format specified in the PatchPoint documentation.
>>>         +
>>>         +.. TODO: link
>>>         +
>>>         +Each statepoint generates the following Locations:
>>>         +
>>>         +* Constant which describes number of following deopt
>>>         *Locations* (not operands)
>>>         +* Variable number of Locations, one for each deopt
>>>         parameter listed in the IR statepoint (same number as
>>>         described by previous Constant)
>>>         +* Variable number of Locations pairs, one pair for each
>>>         unique pointer which needs to be relocated.  The first Location in
>>>         each pair describes the base pointer for the object. The
>>>         second is the derived pointer actually being relocated.  It
>>>         is guaranteed that the base pointer must also appear
>>>         explicitly as a relocation pair if used after the
>>>         statepoint. There may be fewer pairs than gc parameters in
>>>         the IR statepoint. Each *unique* pair will occur at least
>>>         once; duplicates are possible.
>>>         +
>>>         +Note that the Locations used in each section may describe
>>>         the same physical location.  e.g. A stack slot may appear as
>>>         a deopt location, a gc base pointer, and a gc derived pointer.
>>>         +
>>>         +The ID field of the 'StkMapRecord' for a statepoint is
>>>         meaningless and its value is explicitly unspecified.
>>>         +
>>>         +The LiveOut section of the StkMapRecord will be empty for a
>>>         statepoint record.
>>>         +
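To make the Locations layout above concrete, here is a minimal Python sketch of how a runtime might split one statepoint record's Locations into the deopt section and the (base, derived) pairs. It assumes the Locations have already been decoded from the StackMap section into `(kind, value)` tuples; that tuple representation and the function name `split_statepoint_locations` are hypothetical, chosen only for illustration.

```python
def split_statepoint_locations(locations):
    """Split decoded Locations into (deopt_locations, relocation_pairs).

    `locations` is a list of (kind, value) tuples. The first entry is
    expected to be a Constant giving the number of deopt Locations.
    """
    if not locations:
        raise ValueError("statepoint record has no Locations")

    kind, num_deopt = locations[0]
    if kind != "Constant":
        raise ValueError("first Location must be the deopt count Constant")

    # Deopt Locations directly follow the count.
    deopt = locations[1:1 + num_deopt]
    if len(deopt) != num_deopt:
        raise ValueError("record is shorter than its deopt count claims")

    # Everything after the deopt section is (base, derived) pairs.
    rest = locations[1 + num_deopt:]
    if len(rest) % 2 != 0:
        raise ValueError("relocation Locations must come in pairs")

    pairs = [(rest[i], rest[i + 1]) for i in range(0, len(rest), 2)]
    return deopt, pairs
```

A pair whose base and derived entries describe the same slot corresponds to relocating a base pointer itself. Since duplicates are possible, a consumer would presumably want to de-duplicate pairs before updating any slots.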
>>>         +Safepoint Semantics & Verification
>>>         +==================================
>>>         +
>>>         +The fundamental correctness property for the compiled
>>>         code w.r.t. the garbage collector is a dynamic
>>>         one.  It must be the case that there is no dynamic trace
>>>         such that an operation involving a potentially relocated
>>>         pointer is observably-after a safepoint which could relocate
>>>         it.  'observably-after' in this usage means that an outside
>>>         observer could observe this sequence of events in a way
>>>         which precludes the operation being performed before the
>>>         safepoint.
>>>         +
>>>         +To understand why this 'observably-after' property is
>>>         required, consider a null comparison performed on the
>>>         original copy of a relocated pointer.  Assuming that control
>>>         flow follows the safepoint, there is no way to observe
>>>         externally whether the null comparison is performed before
>>>         or after the safepoint.  (Remember, the original Value is
>>>         unmodified by the safepoint.)  The compiler is free to make
>>>         either scheduling choice.
>>>         +
>>>         +The actual correctness property implemented is slightly
>>>         stronger than this.  We require that there be no *static
>>>         path* on which a potentially relocated pointer is
>>>         'observably-after' it may have been relocated.  This is
>>>         slightly stronger than is strictly necessary (and thus may
>>>         disallow some otherwise valid programs), but greatly
>>>         simplifies reasoning about correctness of the compiled code.
>>>         +
>>>         +By construction, this property will be upheld by the
>>>         optimizer if correctly established in the source IR.  This
>>>         is a key invariant of the design.
>>>         +
>>>         +The existing IR Verifier pass has been extended to check
>>>         most of the local restrictions on the intrinsics mentioned
>>>         in their respective documentation.  The current
>>>         implementation in LLVM does not check the key relocation
>>>         invariant, but there is ongoing work on developing such a
>>>         verifier. Please ask on llvmdev if you're interested in
>>>         experimenting with the current version.
>>>         +
>>>
>>>
>>>         _______________________________________________
>>>         llvm-commits mailing list
>>>         llvm-commits at cs.uiuc.edu <mailto:llvm-commits at cs.uiuc.edu>
>>>         http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>>>
>>>
>>
>>
>>
>
>
