[llvm] r223143 - [Statepoints 4/4] Statepoint infrastructure for garbage collection: Documentation

Sean Silva chisophugis at gmail.com
Thu Feb 26 14:29:07 PST 2015


On Wed, Feb 25, 2015 at 3:04 PM, Philip Reames <listmail at philipreames.com>
wrote:

>
> On 02/25/2015 02:14 PM, Sean Silva wrote:
>
>
>
> On Tue, Feb 24, 2015 at 5:31 PM, Philip Reames <listmail at philipreames.com>
> wrote:
>
>>  On the queue for tomorrow.
>>
>> Other things which need to happen:
>> - Move intrinsic definitions into LangRef
>>
>
>  Please don't do this until they are non-experimental (or have they
> become non-experimental?). Actually, I don't think this would be the first
> time we've promoted intrinsics to non-experimental, and I'm not sure how
> to handle it best (what to move and what to copy, etc.).
>
> They are still experimental at this point.  I'll abide by your wishes as
> that seems to fit what's in LangRef now.  I will add a section to the
> LangRef linking to the Statepoint docs analogous to what is done for
> patchpoints.
>

Great, thanks!

-- Sean Silva


>
>
>
>
>>  - Flesh out a description of the "statepoint-example" GC.
>> - Document the fact there's no a form of statepoint sequence without
>> explicitly relocations, update code with asserts & flags respectively
>>
>> I'm considering just removing the Statepoints page entirely and merging
>> the content into GarbageCollection.  I probably won't actually go ahead with
>> that just yet.
>>
>
>  If you do this, please leave the Statepoints.rst page empty with a link
> to GarbageCollection.rst; that way, links to Statepoints across the net
> don't break. The real solution is for the server to serve an http redirect,
> but we don't have a way of indicating that currently.
>
>
>>
>> I also need a place to transcribe my private TODO list somewhere public.
>> The docs probably aren't the right place for this though.
>>
>>
>> On 02/24/2015 04:56 PM, Sean Silva wrote:
>>
>> There are a couple todo/"put assembly here" in the file currently. It
>> would be nice to flesh those out.
>>
>> On Tue, Feb 24, 2015 at 4:24 PM, Philip Reames <listmail at philipreames.com
>> > wrote:
>>
>>>  Fixed.  Other comments welcome.
>>>
>>>
>>> On 02/24/2015 02:44 PM, Philip Reames wrote:
>>>
>>> Your timing is good.  I'm working on docs today and should get to this
>>> by end of day.  :)
>>>
>>> Philip
>>>
>>> On 02/24/2015 02:37 PM, Sean Silva wrote:
>>>
>>> Necro-nit (wasn't sure where to post this feedback; I realize that this
>>> has been slightly updated in ToT): please update the prototypes here to
>>> match their current definitions (e.g. `llvm.experimental.` prefix).
>>>
>>>  (sorry for the delay in getting to this)
>>>
>>>  -- Sean Silva
>>>
>>> On Tue, Dec 2, 2014 at 11:37 AM, Philip Reames <
>>> listmail at philipreames.com> wrote:
>>>
>>>> Author: reames
>>>> Date: Tue Dec  2 13:37:00 2014
>>>> New Revision: 223143
>>>>
>>>> URL: http://llvm.org/viewvc/llvm-project?rev=223143&view=rev
>>>> Log:
>>>> [Statepoints 4/4] Statepoint infrastructure for garbage collection:
>>>> Documentation
>>>>
>>>> This is the fourth and final patch in the statepoint series.  It
>>>> contains the documentation for the statepoint intrinsics and their usage.
>>>>
>>>> There's definitely still room to improve the documentation here, but I
>>>> wanted to get this landed so it was available for others.  There will
>>>> likely be a series of small cleanup changes over the next few weeks as we
>>>> work to clarify and revise the documentation.  If you have comments or
>>>> questions, please feel free to discuss them either in this commit thread,
>>>> the original review thread, or on llvmdev.  Comments are more than welcome.
>>>>
>>>> Reviewed by: atrick, ributzka
>>>> Differential Revision: http://reviews.llvm.org/D5683
>>>>
>>>>
>>>>
>>>> Added:
>>>>     llvm/trunk/docs/Statepoints.rst
>>>>
>>>> Added: llvm/trunk/docs/Statepoints.rst
>>>> URL:
>>>> http://llvm.org/viewvc/llvm-project/llvm/trunk/docs/Statepoints.rst?rev=223143&view=auto
>>>>
>>>> ==============================================================================
>>>> --- llvm/trunk/docs/Statepoints.rst (added)
>>>> +++ llvm/trunk/docs/Statepoints.rst Tue Dec  2 13:37:00 2014
>>>> @@ -0,0 +1,209 @@
>>>> +=====================================
>>>> +Garbage Collection Safepoints in LLVM
>>>> +=====================================
>>>> +
>>>> +.. contents::
>>>> +   :local:
>>>> +   :depth: 2
>>>> +
>>>> +Status
>>>> +=======
>>>> +
>>>> +This document describes a set of experimental extensions to LLVM. Use
>>>> with caution.  Because the intrinsics have experimental status,
>>>> compatibility across LLVM releases is not guaranteed.
>>>> +
>>>> +LLVM currently supports an alternate mechanism for conservative
>>>> garbage collection support using the gc_root intrinsic.  The mechanism
>>>> described here shares little in common with the alternate implementation
>>>> and it is hoped that this mechanism will eventually replace the gc_root
>>>> mechanism.
>>>> +
>>>> +Overview
>>>> +========
>>>> +
>>>> +To collect dead objects, garbage collectors must be able to identify
>>>> any references to objects contained within executing code, and, depending
>>>> on the collector, potentially update them.  The collector does not need
>>>> this information at all points in code - that would make the problem much
>>>> harder - but only at well defined points in the execution known as
>>>> 'safepoints'.  For most collectors, it is sufficient to track at least one
>>>> copy of each unique pointer value.  However, for a collector which wishes
>>>> to relocate objects directly reachable from running code, a higher standard
>>>> is required.
>>>> +
>>>> +One additional challenge is that the compiler may compute intermediate
>>>> results ("derived pointers") which point outside of the allocation or even
>>>> into the middle of another allocation.  The eventual use of this
>>>> intermediate value must yield an address within the bounds of the
>>>> allocation, but such "exterior derived pointers" may be visible to the
>>>> collector.  Given this, a garbage collector can not safely rely on the
>>>> runtime value of an address to indicate the object it is associated with.
>>>> If the garbage collector wishes to move any object, the compiler must
>>>> provide a mapping for each pointer to an indication of its allocation.
>>>> +
>>>> +To simplify the interaction between a collector and the compiled code,
>>>> most garbage collectors are organized in terms of three abstractions:
>>>> load barriers, store barriers, and safepoints.
>>>> +
>>>> +#. A load barrier is a bit of code executed immediately after the
>>>> machine load instruction, but before any use of the value loaded.
>>>> Depending on the collector, such a barrier may be needed for all loads,
>>>> merely loads of a particular type (in the original source language), or
>>>> none at all.
>>>> +#. Analogously, a store barrier is a code fragment that runs
>>>> immediately before the machine store instruction, but after the computation
>>>> of the value stored.  The most common use of a store barrier is to update a
>>>> 'card table' in a generational garbage collector.
>>>> +
>>>> +#. A safepoint is a location at which pointers visible to the compiled
>>>> code (i.e. currently in registers or on the stack) are allowed to change.
>>>> After the safepoint completes, the actual pointer value may differ, but the
>>>> 'object' (as seen by the source language) pointed to will not.
>>>> +
>>>> +  Note that the term 'safepoint' is somewhat overloaded.  It refers to
>>>> both the location at which the machine state is parsable and the
>>>> coordination protocol involved in bringing application threads to a point at
>>>> which the collector can safely use that information.  The term "statepoint"
>>>> as used in this document refers exclusively to the former.
>>>> +
>>>> +This document focuses on the last item - compiler support for
>>>> safepoints in generated code.  We will assume that an outside mechanism has
>>>> decided where to place safepoints.  From our perspective, all safepoints
>>>> will be function calls.  To support relocation of objects directly
>>>> reachable from values in compiled code, the collector must be able to:
>>>> +
>>>> +#. identify every copy of a pointer (including copies introduced by
>>>> the compiler itself) at the safepoint,
>>>> +#. identify which object each pointer relates to, and
>>>> +#. potentially update each of those copies.
>>>> +
>>>> +This document describes the mechanism by which an LLVM based compiler
>>>> can provide this information to a language runtime/collector and ensure
>>>> that all pointers can be read and updated if desired.  The heart of the
>>>> approach is to construct (or rewrite) the IR in a manner where the possible
>>>> updates performed by the garbage collector are explicitly visible in the
>>>> IR.  Doing so requires that we:
>>>> +
>>>> +#. create a new SSA value for each potentially relocated pointer, and
>>>> ensure that no use of the original (non-relocated) value is reachable
>>>> after the safepoint,
>>>> +#. specify the relocation in a way which is opaque to the compiler to
>>>> ensure that the optimizer can not introduce new uses of an unrelocated
>>>> value after a statepoint. This prevents the optimizer from performing
>>>> unsound optimizations.
>>>> +#. record a mapping of live pointers (and the allocation they're
>>>> associated with) for each statepoint.
>>>> +
>>>> +At the most abstract level, inserting a safepoint can be thought of as
>>>> replacing a call instruction with a call to a multiple return value
>>>> function which calls the original target of the call, returns its
>>>> result, and returns updated values for any live pointers to garbage
>>>> collected objects.
>>>> +
>>>> +  Note that the task of identifying all live pointers to garbage
>>>> collected values, transforming the IR to expose a pointer giving the base
>>>> object for every such live pointer, and inserting all the intrinsics
>>>> correctly is explicitly out of scope for this document.  The recommended
>>>> approach is described in the section of Late Safepoint Placement below.
>>>> +
>>>> +This abstract function call is concretely represented by a sequence of
>>>> intrinsic calls known as a 'statepoint sequence'.
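A rough sketch of the abstraction described in this section, in Python rather than IR. The names `statepoint_call` and `relocate` are hypothetical, and the toy "collector" simply slides every pointer by a fixed amount. It only illustrates the shape of a call that returns the original result together with updated values for the live pointers; it is not how the intrinsics are implemented.

```python
from typing import Callable, List, Tuple


def statepoint_call(
    target: Callable[..., int],
    call_args: List[int],
    live_ptrs: List[int],
    relocate: Callable[[int], int],
) -> Tuple[int, List[int]]:
    """Model a safepoint as a multiple-return-value call (hypothetical helper)."""
    result = target(*call_args)  # the original call
    # Objects may move while the callee runs, so hand back an updated value
    # for every live pointer instead of letting the caller reuse the old ones.
    relocated = [relocate(p) for p in live_ptrs]
    return result, relocated


# Toy collector that moves every object up by 0x100 bytes.
result, (a, b, c) = statepoint_call(
    target=lambda x, y: x + y,
    call_args=[2, 3],
    live_ptrs=[0x1000, 0x2000, 0x2008],
    relocate=lambda p: p + 0x100,
)
```

After the call, only `a`, `b`, and `c` (the returned values) should be used; the addresses originally passed in play the role of the unrelocated SSA values.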
>>>> +
>>>> +
>>>> +Let's consider a simple call in LLVM IR:
>>>> +  todo
>>>> +
>>>> +Depending on our language we may need to allow a safepoint during the
>>>> execution of the function called from this site.  If so, we need to let the
>>>> collector update local values in the current frame.
>>>> +
>>>> +Let's say we need to relocate SSA values 'a', 'b', and 'c' at this
>>>> safepoint.  To represent this, we would generate the statepoint sequence::
>>>> +  put an example sequence here
>>>> +
>>>> +Ideally, this sequence would have been represented as a M argument, N
>>>> return value function (where M is the number of values being relocated +
>>>> the original call arguments and N is the original return value + each
>>>> relocated value), but LLVM does not easily support such a representation.
>>>> +
>>>> +Instead, the statepoint intrinsic marks the actual site of the
>>>> safepoint or statepoint.  The statepoint returns a token value (which
>>>> exists only at compile time).  To get back the original return value of the
>>>> call, we use the 'gc_result' intrinsic.  To get the relocation of each
>>>> pointer in turn, we use the 'gc_relocate' intrinsic with the appropriate
>>>> index.  Note that both the gc_relocate and gc_result are tied to the
>>>> statepoint.  The combination forms a "statepoint sequence" and represents
>>>> the entirety of a parseable call or 'statepoint'.
>>>> +
>>>> +When lowered, this example would generate the following x86 assembly::
>>>> +  put assembly here
>>>> +
>>>> +Each of the potentially relocated values has been spilled to the
>>>> stack, and a record of that location has been recorded to the StackMap
>>>> section.  If the garbage collector needs to update any of these pointers
>>>> during the call, it knows exactly what to change.
>>>> +
>>>> +Intrinsics
>>>> +===========
>>>> +
>>>> +'''gc_statepoint''' Intrinsic
>>>> +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>>> +
>>>> +Syntax:
>>>> +"""""""
>>>> +
>>>> +::
>>>> +
>>>> +      declare i32
>>>> +        @gc_statepoint(func_type <target>, i64 <#call args>,
>>>> +                       i64 <unused>, ... (call parameters),
>>>> +                       i64 <# deopt args>, ... (deopt parameters),
>>>> +                       ... (gc parameters))
>>>> +
>>>> +Overview:
>>>> +"""""""""
>>>> +
>>>> +The statepoint intrinsic represents a call which is parse-able by the
>>>> runtime.
>>>> +
>>>> +Operands:
>>>> +"""""""""
>>>> +
>>>> +The 'target' operand is the function actually being called.  The
>>>> target can be specified as either a symbolic LLVM function, or as an
>>>> arbitrary Value of appropriate function type.  Note that the function type
>>>> must match the signature of the callee and the types of the 'call
>>>> parameters' arguments.
>>>> +
>>>> +The '#call args' operand is the number of arguments to the actual
>>>> call.  It must exactly match the number of arguments passed in the 'call
>>>> parameters' variable length section.
>>>> +
>>>> +The 'unused' operand is unused and likely to be removed.  Please do
>>>> not use.
>>>> +
>>>> +The 'call parameters' arguments are simply the arguments which need to
>>>> be passed to the call target.  They will be lowered according to the
>>>> specified calling convention and otherwise handled like a normal call
>>>> instruction.  The number of arguments must exactly match what is specified
>>>> in '# call args'.  The types must match the signature of 'target'.
>>>> +
>>>> +The 'deopt parameters' arguments contain an arbitrary list of Values
>>>> which is meaningful to the runtime.  The runtime may read any of these
>>>> values, but is assumed not to modify them.  If the garbage collector might
>>>> need to modify one of these values, it must also be listed in the 'gc
>>>> pointer' argument list.  The '# deopt args' field indicates how many
>>>> operands are to be interpreted as 'deopt parameters'.
>>>> +
>>>> +The 'gc parameters' arguments contain every pointer to a garbage
>>>> collected object which potentially needs to be updated by the garbage
>>>> collector.  Note that the argument list must explicitly contain a base
>>>> pointer for every derived pointer listed.  The order of arguments is
>>>> unimportant.  Unlike the other variable length parameter sets, this list is
>>>> not length prefixed.
>>>> +
>>>> +Semantics:
>>>> +""""""""""
>>>> +
>>>> +A statepoint is assumed to read and write all memory.  As a result,
>>>> memory operations can not be reordered past a statepoint.  It is illegal to
>>>> mark a statepoint as being either 'readonly' or 'readnone'.
>>>> +
>>>> +Note that legal IR can not perform any memory operation on a 'gc
>>>> pointer' argument of the statepoint in a location statically reachable from
>>>> the statepoint.  Instead, the explicitly relocated value (from a
>>>> ''gc_relocate'') must be used.
>>>> +
>>>> +'''gc_result''' Intrinsic
>>>> +^^^^^^^^^^^^^^^^^^^^^^^^^^
>>>> +
>>>> +Syntax:
>>>> +"""""""
>>>> +
>>>> +::
>>>> +
>>>> +      declare type*
>>>> +        @gc_result_ptr(i32 %statepoint_token)
>>>> +
>>>> +      declare fX
>>>> +        @gc_result_float(i32 %statepoint_token)
>>>> +
>>>> +      declare iX
>>>> +        @gc_result_int(i32 %statepoint_token)
>>>> +
>>>> +Overview:
>>>> +"""""""""
>>>> +
>>>> +'''gc_result''' extracts the result of the original call instruction
>>>> which was replaced by the '''gc_statepoint'''.  The '''gc_result'''
>>>> intrinsic is actually a family of three intrinsics due to an implementation
>>>> limitation.  Other than the type of the return value, the semantics are the
>>>> same.
>>>> +
>>>> +Operands:
>>>> +"""""""""
>>>> +
>>>> +The first and only argument is the '''gc.statepoint''' which starts
>>>> the safepoint sequence of which this '''gc_result''' is a part.  Despite the
>>>> typing of this as a generic i32, *only* the value defined by a
>>>> '''gc.statepoint''' is legal here.
>>>> +
>>>> +Semantics:
>>>> +""""""""""
>>>> +
>>>> +The ''gc_result'' represents the return value of the call target of
>>>> the ''statepoint''.  The type of the ''gc_result'' must exactly match the
>>>> type of the target.  If the call target returns void, there will be no
>>>> ''gc_result''.
>>>> +
>>>> +A ''gc_result'' is modeled as a 'readnone' pure function.  It has no
>>>> side effects since it is just a projection of the return value of the
>>>> previous call represented by the ''gc_statepoint''.
>>>> +
>>>> +'''gc_relocate''' Intrinsic
>>>> +^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>>> +
>>>> +Syntax:
>>>> +"""""""
>>>> +
>>>> +::
>>>> +
>>>> +      declare <type> addrspace(1)*
>>>> +        @gc_relocate(i32 %token, i32 %base_offset, i32 %pointer_offset)
>>>> +
>>>> +Overview:
>>>> +"""""""""
>>>> +
>>>> +A ''gc_relocate'' returns the potentially relocated value of a pointer
>>>> at the safepoint.
>>>> +
>>>> +Operands:
>>>> +"""""""""
>>>> +
>>>> +The first argument is the '''gc.statepoint''' which starts the
>>>> safepoint sequence of which this '''gc_relocate''' is a part.  Despite the
>>>> typing of this as a generic i32, *only* the value defined by a
>>>> '''gc.statepoint''' is legal here.
>>>> +
>>>> +The second argument is an index into the statepoint's list of arguments
>>>> which specifies the base pointer for the pointer being relocated.  This
>>>> index must land within the 'gc parameter' section of the statepoint's
>>>> argument list.
>>>> +
>>>> +The third argument is an index into the statepoint's list of arguments
>>>> which specifies the (potentially) derived pointer being relocated.  It is
>>>> legal for this index to be the same as the second argument if-and-only-if a
>>>> base pointer is being relocated. This index must land within the 'gc
>>>> parameter' section of the statepoint's argument list.
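The index rules for the second and third arguments can be sketched as a small checker. This is illustrative Python, not LLVM code: `StatepointOperands` and `check_relocate` are hypothetical names, and the sketch assumes indices count from 0 at the 'target' operand of the prototype quoted in the gc_statepoint Syntax block, which the text does not state explicitly.

```python
from dataclasses import dataclass


@dataclass
class StatepointOperands:
    """Operand counts of one statepoint call (hypothetical model)."""
    num_call_args: int
    num_deopt_args: int
    num_gc_params: int

    def gc_section(self) -> range:
        # Layout from the quoted prototype: target, #call args, unused,
        # call parameters, #deopt args, deopt parameters, gc parameters.
        # Assumption: index 0 is the 'target' operand.
        start = 3 + self.num_call_args + 1 + self.num_deopt_args
        return range(start, start + self.num_gc_params)


def check_relocate(sp: StatepointOperands, base_index: int,
                   derived_index: int, relocating_base: bool) -> bool:
    """Return True if the two indices obey the rules in the Operands text."""
    gc = sp.gc_section()
    # Both indices must land inside the gc parameter section.
    if base_index not in gc or derived_index not in gc:
        return False
    # The indices may coincide if and only if a base pointer is relocated.
    return (base_index == derived_index) == relocating_base


sp = StatepointOperands(num_call_args=2, num_deopt_args=1, num_gc_params=3)
```

With two call arguments and one deopt argument, the gc parameter section under this assumption covers indices 7 through 9, so an index of 6 (the deopt parameter) is rejected.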
>>>> +
>>>> +Semantics:
>>>> +""""""""""
>>>> +The return value of ''gc_relocate'' is the potentially relocated value
>>>> of the pointer specified by its arguments.  It is unspecified how the
>>>> value of the returned pointer relates to the argument to the
>>>> ''gc_statepoint'' other than that a) it points to the same source language
>>>> object with the same offset, and b) the 'based-on' relationship of the
>>>> newly relocated pointers is a projection of the unrelocated pointers.  In
>>>> particular, the integer value of the pointer returned is unspecified.
>>>> +
>>>> +A ''gc_relocate'' is modeled as a 'readnone' pure function.  It has no
>>>> side effects since it is just a way to extract information about work done
>>>> during the actual call modeled by the ''gc_statepoint''.
>>>> +
>>>> +
>>>> +StackMap Format
>>>> +================
>>>> +
>>>> +Locations for each pointer value which may need to be read and/or updated by
>>>> the runtime or collector are provided via the StackMap format specified in
>>>> the PatchPoint documentation.
>>>> +
>>>> +.. TODO: link
>>>> +
>>>> +Each statepoint generates the following Locations:
>>>> +
>>>> +* Constant which describes number of following deopt *Locations* (not
>>>> operands)
>>>> +* Variable number of Locations, one for each deopt parameter listed in
>>>> the IR statepoint (same number as described by previous Constant)
>>>> +* Variable number of Locations pairs, one pair for each unique pointer
>>>> which needs to be relocated.  The first Location in each pair describes the base
>>>> pointer for the object.  The second is the derived pointer actually being
>>>> relocated.  It is guaranteed that the base pointer must also appear
>>>> explicitly as a relocation pair if used after the statepoint. There may be
>>>> fewer pairs than gc parameters in the IR statepoint. Each *unique* pair
>>>> will occur at least once; duplicates are possible.
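For readers consuming these records, the three-part layout in the list above might be decoded roughly as follows. This is a toy Python sketch under stated assumptions: a Location is modeled as an `int` when it is the leading Constant and as a string otherwise, and `split_statepoint_locations` is a hypothetical helper, not part of any LLVM API or the real StackMap binary encoding.

```python
from typing import List, Tuple, Union

# Toy encoding: the leading Constant is an int, every other Location a string.
Location = Union[int, str]


def split_statepoint_locations(
    locs: List[Location],
) -> Tuple[List[Location], List[Tuple[Location, Location]]]:
    """Split a statepoint's Locations into deopt entries and (base, derived) pairs."""
    if not locs or not isinstance(locs[0], int):
        raise ValueError("expected a leading Constant with the deopt count")
    num_deopt = locs[0]
    deopt = locs[1:1 + num_deopt]
    if len(deopt) != num_deopt:
        raise ValueError("fewer deopt Locations than the Constant promises")
    rest = locs[1 + num_deopt:]
    if len(rest) % 2 != 0:
        raise ValueError("relocation Locations must come in pairs")
    # The first Location of each pair is the base, the second the derived pointer.
    pairs = [(rest[i], rest[i + 1]) for i in range(0, len(rest), 2)]
    return deopt, pairs


deopt, pairs = split_statepoint_locations(
    [2, "rsp+8", "rsp+16", "rsp+24", "rsp+24", "rsp+24", "rsp+32"]
)
```

In the sample input the first pair has identical base and derived Locations (a base pointer being relocated), and the second pair is a derived pointer sharing that base.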
>>>> +
>>>> +Note that the Locations used in each section may describe the same
>>>> physical location.  e.g. A stack slot may appear as a deopt location, a gc
>>>> base pointer, and a gc derived pointer.
>>>> +
>>>> +The ID field of the 'StkMapRecord' for a statepoint is meaningless and
>>>> its value is explicitly unspecified.
>>>> +
>>>> +The LiveOut section of the StkMapRecord will be empty for a statepoint
>>>> record.
>>>> +
>>>> +Safepoint Semantics & Verification
>>>> +==================================
>>>> +
>>>> +The fundamental correctness property for the compiled code
>>>> w.r.t. the garbage collector is a dynamic one.  It must be the
>>>> case that there is no dynamic trace such that an operation involving a
>>>> potentially relocated pointer is observably-after a safepoint which could
>>>> relocate it.  'observably-after' in this usage means that an outside
>>>> observer could observe this sequence of events in a way which precludes the
>>>> operation being performed before the safepoint.
>>>> +
>>>> +To understand why this 'observably-after' property is required,
>>>> consider a null comparison performed on the original copy of a relocated
>>>> pointer.  Assuming that control flow follows the safepoint, there is no way
>>>> to observe externally whether the null comparison is performed before or
>>>> after the safepoint.  (Remember, the original Value is unmodified by the
>>>> safepoint.)  The compiler is free to make either scheduling choice.
>>>> +
>>>> +The actual correctness property implemented is slightly stronger than
>>>> this.  We require that there be no *static path* on which a potentially
>>>> relocated pointer is 'observably-after' it may have been relocated.  This
>>>> is slightly stronger than is strictly necessary (and thus may disallow some
>>>> otherwise valid programs), but greatly simplifies reasoning about
>>>> correctness of the compiled code.
>>>> +
>>>> +By construction, this property will be upheld by the optimizer if
>>>> correctly established in the source IR.  This is a key invariant of the
>>>> design.
>>>> +
>>>> +The existing IR Verifier pass has been extended to check most of the
>>>> local restrictions on the intrinsics mentioned in their respective
>>>> documentation.  The current implementation in LLVM does not check the key
>>>> relocation invariant, but there is ongoing work on developing such a
>>>> verifier.  Please ask on llvmdev if you're interested in experimenting with
>>>> the current version.
>>>> +
>>>>
>>>>
>>>> _______________________________________________
>>>> llvm-commits mailing list
>>>> llvm-commits at cs.uiuc.edu
>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>
>>
>
>