[llvm] r223143 - [Statepoints 4/4] Statepoint infrastructure for garbage collection: Documentation
Philip Reames
listmail at philipreames.com
Tue Feb 24 16:24:28 PST 2015
Fixed. Other comments welcome.
On 02/24/2015 02:44 PM, Philip Reames wrote:
> Your timing is good. I'm working on docs today and should get to this
> by end of day. :)
>
> Philip
>
> On 02/24/2015 02:37 PM, Sean Silva wrote:
>> Necro-nit (wasn't sure where to post this feedback; I realize that
>> this has been slightly updated in ToT): please update the prototypes
>> here to match their current definitions (e.g. `llvm.experimental.`
>> prefix).
>>
>> (sorry for the delay in getting to this)
>>
>> -- Sean Silva
>>
>> On Tue, Dec 2, 2014 at 11:37 AM, Philip Reames
>> <listmail at philipreames.com> wrote:
>>
>> Author: reames
>> Date: Tue Dec 2 13:37:00 2014
>> New Revision: 223143
>>
>> URL: http://llvm.org/viewvc/llvm-project?rev=223143&view=rev
>> Log:
>> [Statepoints 4/4] Statepoint infrastructure for garbage
>> collection: Documentation
>>
>> This is the fourth and final patch in the statepoint series. It
>> contains the documentation for the statepoint intrinsics and
>> their usage.
>>
>> There's definitely still room to improve the documentation here,
>> but I wanted to get this landed so it was available for others.
>> There will likely be a series of small cleanup changes over the
>> next few weeks as we work to clarify and revise the
>> documentation. If you have comments or questions, please feel
>> free to discuss them either in this commit thread, the original
>> review thread, or on llvmdev. Comments are more than welcome.
>>
>> Reviewed by: atrick, ributzka
>> Differential Revision: http://reviews.llvm.org/D5683
>>
>>
>>
>> Added:
>> llvm/trunk/docs/Statepoints.rst
>>
>> Added: llvm/trunk/docs/Statepoints.rst
>> URL:
>> http://llvm.org/viewvc/llvm-project/llvm/trunk/docs/Statepoints.rst?rev=223143&view=auto
>> ==============================================================================
>> --- llvm/trunk/docs/Statepoints.rst (added)
>> +++ llvm/trunk/docs/Statepoints.rst Tue Dec 2 13:37:00 2014
>> @@ -0,0 +1,209 @@
>> +=====================================
>> +Garbage Collection Safepoints in LLVM
>> +=====================================
>> +
>> +.. contents::
>> + :local:
>> + :depth: 2
>> +
>> +Status
>> +=======
>> +
>> +This document describes a set of experimental extensions to
>> LLVM. Use with caution. Because the intrinsics have experimental
>> status, compatibility across LLVM releases is not guaranteed.
>> +
>> +LLVM currently supports an alternate mechanism for conservative
>> garbage collection support using the gc_root intrinsic. The
>> mechanism described here shares little in common with the
>> alternate implementation and it is hoped that this mechanism will
>> eventually replace the gc_root mechanism.
>> +
>> +Overview
>> +========
>> +
>> +To collect dead objects, garbage collectors must be able to
>> identify any references to objects contained within executing
>> code, and, depending on the collector, potentially update them.
>> The collector does not need this information at all points in
>> code - that would make the problem much harder - but only at well
>> defined points in the execution known as 'safepoints'. For most
>> collectors, it is sufficient to track at least one copy of each
>> unique pointer value. However, for a collector which wishes to
>> relocate objects directly reachable from running code, a higher
>> standard is required.
>> +
>> +One additional challenge is that the compiler may compute
>> intermediate results ("derived pointers") which point outside of
>> the allocation or even into the middle of another allocation.
>> The eventual use of this intermediate value must yield an address
>> within the bounds of the allocation, but such "exterior derived
>> pointers" may be visible to the collector. Given this, a garbage
>> collector can not safely rely on the runtime value of an address
>> to indicate the object it is associated with. If the garbage
>> collector wishes to move any object, the compiler must provide a
>> mapping for each pointer to an indication of its allocation.
>> +
>> +To simplify the interaction between a collector and the compiled
>> code, most garbage collectors are organized in terms of three
>> abstractions: load barriers, store barriers, and safepoints.
>> +
>> +#. A load barrier is a bit of code executed immediately after
>> the machine load instruction, but before any use of the value
>> loaded. Depending on the collector, such a barrier may be needed
>> for all loads, merely loads of a particular type (in the original
>> source language), or none at all.
>> +#. Analogously, a store barrier is a code fragment that runs
>> immediately before the machine store instruction, but after the
>> computation of the value stored. The most common use of a store
>> barrier is to update a 'card table' in a generational garbage
>> collector.
>> +
>> +#. A safepoint is a location at which pointers visible to the
>> compiled code (i.e. currently in registers or on the stack) are
>> allowed to change. After the safepoint completes, the actual
>> pointer value may differ, but the 'object' (as seen by the source
>> language) pointed to will not.
>> +
>> + Note that the term 'safepoint' is somewhat overloaded. It
>> refers to both the location at which the machine state is
>> parsable and the coordination protocol involved in bringing
>> application threads to a point at which the collector can safely
>> use that information. The term "statepoint" as used in this
>> document refers exclusively to the former.
>> +
>> +This document focuses on the last item - compiler support for
>> safepoints in generated code. We will assume that an outside
>> mechanism has decided where to place safepoints. From our
>> perspective, all safepoints will be function calls. To support
>> relocation of objects directly reachable from values in compiled
>> code, the collector must be able to:
>> +
>> +#. identify every copy of a pointer (including copies introduced
>> by the compiler itself) at the safepoint,
>> +#. identify which object each pointer relates to, and
>> +#. potentially update each of those copies.
>> +
>> +This document describes the mechanism by which an LLVM based
>> compiler can provide this information to a language
>> runtime/collector and ensure that all pointers can be read and
>> updated if desired. The heart of the approach is to construct
>> (or rewrite) the IR in a manner where the possible updates
>> performed by the garbage collector are explicitly visible in the
>> IR. Doing so requires that we:
>> +
>> +#. create a new SSA value for each potentially relocated
>> pointer, and ensure that no use of the original (non relocated)
>> value is reachable after the safepoint,
>> +#. specify the relocation in a way which is opaque to the
>> compiler to ensure that the optimizer can not introduce new uses
>> of an unrelocated value after a statepoint. This prevents the
>> optimizer from performing unsound optimizations.
>> +#. record a mapping of live pointers (and the allocation
>> they're associated with) for each statepoint.
>> +
>> +At the most abstract level, inserting a safepoint can be thought
>> of as replacing a call instruction with a call to a multiple
>> return value function which calls the original target of the
>> call, returns its result, and returns updated values for any
>> live pointers to garbage collected objects.
>> +
>> + Note that the task of identifying all live pointers to garbage
>> collected values, transforming the IR to expose a pointer giving
>> the base object for every such live pointer, and inserting all
>> the intrinsics correctly is explicitly out of scope for this
>> document. The recommended approach is described in the section
>> of Late Safepoint Placement below.
>> +
>> +This abstract function call is concretely represented by a
>> sequence of intrinsic calls known as a 'statepoint sequence'.
>> +
>> +
>> +Let's consider a simple call in LLVM IR:
>> + todo
>> +
>> +Depending on our language we may need to allow a safepoint
>> during the execution of the function called from this site. If
>> so, we need to let the collector update local values in the
>> current frame.
>> +
>> +Let's say we need to relocate SSA values 'a', 'b', and 'c' at
>> this safepoint. To represent this, we would generate the
>> statepoint sequence::
>> + put an example sequence here
>> +
>> +Ideally, this sequence would have been represented as a M
>> argument, N return value function (where M is the number of
>> values being relocated + the original call arguments and N is the
>> original return value + each relocated value), but LLVM does not
>> easily support such a representation.
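Inline note: the "M argument, N return value" view described above can be modelled outside of LLVM. The following is only a rough Python sketch of that abstraction, not the intrinsic itself. The names `abstract_statepoint` and `forwarding` are hypothetical, and the forwarding table stands in for whatever the collector decides to move while the call is running.

```python
from typing import Any, Callable, Dict, List, Sequence, Tuple


def abstract_statepoint(
    target: Callable[..., Any],
    call_args: Sequence[Any],
    gc_pointers: Sequence[int],
    forwarding: Dict[int, int],
) -> Tuple[Any, List[int]]:
    """Model of the abstract multiple-return-value call.

    Inputs:  the original call arguments plus every pointer being relocated.
    Outputs: the original return value plus one updated value per pointer.
    """
    # Perform the original call; a collection may happen while it runs.
    result = target(*call_args)
    # Assumption for this sketch: the collector reports moved objects as an
    # old-address -> new-address table. Unmoved pointers keep their value.
    relocated = [forwarding.get(ptr, ptr) for ptr in gc_pointers]
    return result, relocated


def add(x: int, y: int) -> int:
    return x + y


# Relocating 'a', 'b', and 'c' across a call with two arguments:
# M = 2 + 3 inputs, N = 1 + 3 outputs.
result, (a1, b1, c1) = abstract_statepoint(
    add, (2, 3), [0x1000, 0x2000, 0x3000], {0x2000: 0x8000}
)
```

After the call, only `a1`, `b1`, and `c1` may be used; the original values play the role of the unrelocated SSA values that must not be reachable past the safepoint.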
>> +
>> +Instead, the statepoint intrinsic marks the actual site of the
>> safepoint or statepoint. The statepoint returns a token value
>> (which exists only at compile time). To get back the original
>> return value of the call, we use the 'gc_result' intrinsic. To
>> get the relocation of each pointer in turn, we use the
>> 'gc_relocate' intrinsic with the appropriate index. Note that
>> both the gc_relocate and gc_result are tied to the statepoint.
>> The combination forms a "statepoint sequence" and represents the
>> entirety of a parseable call or 'statepoint'.
>> +
>> +When lowered, this example would generate the following x86
>> assembly::
>> + put assembly here
>> +
>> +Each of the potentially relocated values has been spilled to the
>> stack, and a record of that location has been recorded to the
>> StackMap section. If the garbage collector needs to update any
>> of these pointers during the call, it knows exactly what to change.
>> +
>> +Intrinsics
>> +===========
>> +
>> +'''gc_statepoint''' Intrinsic
>> +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>> +
>> +Syntax:
>> +"""""""
>> +
>> +::
>> +
>> + declare i32
>> + @gc_statepoint(func_type <target>, i64 <#call args>,
>> + i64 <unused>, ... (call parameters),
>> + i64 <# deopt args>, ... (deopt parameters),
>> + ... (gc parameters))
>> +
>> +Overview:
>> +"""""""""
>> +
>> +The statepoint intrinsic represents a call which is parse-able
>> by the runtime.
>> +
>> +Operands:
>> +"""""""""
>> +
>> +The 'target' operand is the function actually being called. The
>> target can be specified as either a symbolic LLVM function, or as
>> an arbitrary Value of appropriate function type. Note that the
>> function type must match the signature of the callee and the
>> types of the 'call parameters' arguments.
>> +
>> +The '#call args' operand is the number of arguments to the
>> actual call. It must exactly match the number of arguments
>> passed in the 'call parameters' variable length section.
>> +
>> +The 'unused' operand is unused and likely to be removed. Please
>> do not use.
>> +
>> +The 'call parameters' arguments are simply the arguments which
>> need to be passed to the call target. They will be lowered
>> according to the specified calling convention and otherwise
>> handled like a normal call instruction. The number of arguments
>> must exactly match what is specified in '# call args'. The types
>> must match the signature of 'target'.
>> +
>> +The 'deopt parameters' arguments contain an arbitrary list of
>> Values which is meaningful to the runtime. The runtime may read
>> any of these values, but is assumed not to modify them. If the
>> garbage collector might need to modify one of these values, it
>> must also be listed in the 'gc parameters' argument list. The '#
>> deopt args' field indicates how many operands are to be
>> interpreted as 'deopt parameters'.
>> +
>> +The 'gc parameters' arguments contain every pointer to a garbage
>> collected object which potentially needs to be updated by the
>> garbage collector. Note that the argument list must explicitly
>> contain a base pointer for every derived pointer listed. The
>> order of arguments is unimportant. Unlike the other variable
>> length parameter sets, this list is not length prefixed.
>> +
>> +Semantics:
>> +""""""""""
>> +
>> +A statepoint is assumed to read and write all memory. As a
>> result, memory operations can not be reordered past a
>> statepoint. It is illegal to mark a statepoint as being either
>> 'readonly' or 'readnone'.
>> +
>> +Note that legal IR can not perform any memory operation on a 'gc
>> pointer' argument of the statepoint in a location statically
>> reachable from the statepoint. Instead, the explicitly relocated
>> value (from a ''gc_relocate'') must be used.
>> +
>> +'''gc_result''' Intrinsic
>> +^^^^^^^^^^^^^^^^^^^^^^^^^^
>> +
>> +Syntax:
>> +"""""""
>> +
>> +::
>> +
>> + declare type*
>> + @gc_result_ptr(i32 %statepoint_token)
>> +
>> + declare fX
>> + @gc_result_float(i32 %statepoint_token)
>> +
>> + declare iX
>> + @gc_result_int(i32 %statepoint_token)
>> +
>> +Overview:
>> +"""""""""
>> +
>> +'''gc_result''' extracts the result of the original call
>> instruction which was replaced by the '''gc_statepoint'''. The
>> '''gc_result''' intrinsic is actually a family of three
>> intrinsics due to an implementation limitation. Other than the
>> type of the return value, the semantics are the same.
>> +
>> +Operands:
>> +"""""""""
>> +
>> +The first and only argument is the '''gc.statepoint''' which
>> starts the safepoint sequence of which this '''gc_result''' is a
>> part. Despite the typing of this as a generic i32, *only* the
>> value defined by a '''gc.statepoint''' is legal here.
>> +
>> +Semantics:
>> +""""""""""
>> +
>> +The ''gc_result'' represents the return value of the call target
>> of the ''statepoint''. The type of the ''gc_result'' must
>> exactly match the type of the target. If the call target returns
>> void, there will be no ''gc_result''.
>> +
>> +A ''gc_result'' is modeled as a 'readnone' pure function. It
>> has no side effects since it is just a projection of the return
>> value of the previous call represented by the ''gc_statepoint''.
>> +
>> +'''gc_relocate''' Intrinsic
>> +^^^^^^^^^^^^^^^^^^^^^^^^^^^
>> +
>> +Syntax:
>> +"""""""
>> +
>> +::
>> +
>> + declare <type> addrspace(1)*
>> + @gc_relocate(i32 %token, i32 %base_offset, i32
>> %pointer_offset)
>> +
>> +Overview:
>> +"""""""""
>> +
>> +A ''gc_relocate'' returns the potentially relocated value of a
>> pointer at the safepoint.
>> +
>> +Operands:
>> +"""""""""
>> +
>> +The first argument is the '''gc.statepoint''' which starts the
>> safepoint sequence of which this '''gc_relocate''' is a part.
>> Despite the typing of this as a generic i32, *only* the value
>> defined by a '''gc.statepoint''' is legal here.
>> +
>> +The second argument is an index into the statepoint's list of
>> arguments which specifies the base pointer for the pointer being
>> relocated. This index must land within the 'gc parameter'
>> section of the statepoint's argument list.
>> +
>> +The third argument is an index into the statepoint's list of
>> arguments which specifies the (potentially) derived pointer being
>> relocated. It is legal for this index to be the same as the
>> second argument if-and-only-if a base pointer is being relocated.
>> This index must land within the 'gc parameter' section of the
>> statepoint's argument list.
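Inline note: the requirement that both indices land within the 'gc parameter' section can be checked mechanically from the operand layout given for the statepoint. This is a minimal Python sketch under two assumptions that the text does not state: indices are zero-based, and they count from the 'target' operand. `StatepointLayout` and `check_relocate` are hypothetical names, not part of LLVM.

```python
from typing import NamedTuple


class StatepointLayout(NamedTuple):
    """Operand counts of one statepoint, in the order the syntax lists them."""

    num_call_args: int
    num_deopt_args: int
    num_gc_args: int

    @property
    def gc_begin(self) -> int:
        # Operands before the gc parameters: target, #call args, unused,
        # the call parameters, # deopt args, and the deopt parameters.
        return 3 + self.num_call_args + 1 + self.num_deopt_args

    @property
    def gc_end(self) -> int:
        # One past the last gc parameter (the list is not length prefixed,
        # so it simply runs to the end of the operand list).
        return self.gc_begin + self.num_gc_args


def check_relocate(layout: StatepointLayout, base_index: int, derived_index: int) -> bool:
    """Return True if both indices fall inside the gc parameter section."""

    def in_gc_section(index: int) -> bool:
        return layout.gc_begin <= index < layout.gc_end

    return in_gc_section(base_index) and in_gc_section(derived_index)


# Two call arguments, one deopt argument, three gc parameters.
layout = StatepointLayout(num_call_args=2, num_deopt_args=1, num_gc_args=3)
```

With this layout the gc parameters occupy indices 7 through 9, so `check_relocate(layout, 7, 8)` accepts a derived pointer at index 8 based on index 7, while index 6 (the deopt parameter) is rejected.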
>> +
>> +Semantics:
>> +""""""""""
>> +The return value of ''gc_relocate'' is the potentially relocated
>> value of the pointer specified by its arguments. It is
>> unspecified how the value of the returned pointer relates to the
>> argument to the ''gc_statepoint'' other than that a) it points to
>> the same source language object with the same offset, and b) the
>> 'based-on' relationship of the newly relocated pointers is a
>> projection of the unrelocated pointers. In particular, the
>> integer value of the pointer returned is unspecified.
>> +
>> +A ''gc_relocate'' is modeled as a 'readnone' pure function. It
>> has no side effects since it is just a way to extract information
>> about work done during the actual call modeled by the
>> ''gc_statepoint''.
>> +
>> +
>> +StackMap Format
>> +================
>> +
>> +Locations for each pointer value which may need to be read and/or
>> updated by the runtime or collector are provided via the StackMap
>> format specified in the PatchPoint documentation.
>> +
>> +.. TODO: link
>> +
>> +Each statepoint generates the following Locations:
>> +
>> +* Constant which describes number of following deopt *Locations*
>> (not operands)
>> +* Variable number of Locations, one for each deopt parameter
>> listed in the IR statepoint (same number as described by previous
>> Constant)
>> +* Variable number of Locations pairs, one pair for each unique
>> pointer which needs to be relocated. The first Location in each pair
>> describes the base pointer for the object. The second is the
>> derived pointer actually being relocated. It is guaranteed that
>> the base pointer must also appear explicitly as a relocation pair
>> if used after the statepoint. There may be fewer pairs than gc
>> parameters in the IR statepoint. Each *unique* pair will occur at
>> least once; duplicates are possible.
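Inline note: a runtime consuming these records has to split the flat Location list back into its sections. The sketch below shows one way to do that in Python, assuming the Locations have already been decoded from the StackMap section into simple records. The `Location` type, its `kind` labels, and `split_statepoint_locations` are hypothetical and only illustrate the ordering described in the list above.

```python
from typing import List, NamedTuple, Tuple


class Location(NamedTuple):
    kind: str   # illustrative label, e.g. "Constant" or "Indirect"
    value: int  # the constant itself, or a stack offset for "Indirect"


def split_statepoint_locations(
    locations: List[Location],
) -> Tuple[List[Location], List[Tuple[Location, Location]]]:
    """Split one statepoint's Locations into deopt entries and (base, derived) pairs."""
    if not locations or locations[0].kind != "Constant":
        raise ValueError("expected a leading Constant giving the deopt count")
    num_deopt = locations[0].value
    deopt = locations[1 : 1 + num_deopt]
    if len(deopt) != num_deopt:
        raise ValueError("record is shorter than its deopt count")
    rest = locations[1 + num_deopt :]
    if len(rest) % 2 != 0:
        raise ValueError("gc Locations must come in (base, derived) pairs")
    pairs = [(rest[i], rest[i + 1]) for i in range(0, len(rest), 2)]
    return deopt, pairs


record = [
    Location("Constant", 1),   # one deopt Location follows
    Location("Indirect", 8),   # the deopt Location
    Location("Indirect", 16),  # base pointer ...
    Location("Indirect", 16),  # ... relocated as itself
    Location("Indirect", 16),  # same base ...
    Location("Indirect", 24),  # ... with a derived pointer
]
deopt, pairs = split_statepoint_locations(record)
```

Here the base pointer at offset 16 appears as its own pair because it is also used after the statepoint, which matches the guarantee stated above.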
>> +
>> +Note that the Locations used in each section may describe the
>> same physical location; e.g. a stack slot may appear as a deopt
>> location, a gc base pointer, and a gc derived pointer.
>> +
>> +The ID field of the 'StkMapRecord' for a statepoint is
>> meaningless and its value is explicitly unspecified.
>> +
>> +The LiveOut section of the StkMapRecord will be empty for a
>> statepoint record.
>> +
>> +Safepoint Semantics & Verification
>> +==================================
>> +
>> +The fundamental correctness property for the compiled code
>> w.r.t. the garbage collector is a dynamic one. It
>> must be the case that there is no dynamic trace such that an
>> operation involving a potentially relocated pointer is
>> observably-after a safepoint which could relocate it.
>> 'observably-after' in this usage means that an outside observer
>> could observe this sequence of events in a way which precludes
>> the operation being performed before the safepoint.
>> +
>> +To understand why this 'observably-after' property is required,
>> consider a null comparison performed on the original copy of a
>> relocated pointer. Assuming that control flow follows the
>> safepoint, there is no way to observe externally whether the null
>> comparison is performed before or after the safepoint.
>> (Remember, the original Value is unmodified by the safepoint.)
>> The compiler is free to make either scheduling choice.
>> +
>> +The actual correctness property implemented is slightly stronger
>> than this. We require that there be no *static path* on which a
>> potentially relocated pointer is 'observably-after' it may have
>> been relocated. This is slightly stronger than is strictly
>> necessary (and thus may disallow some otherwise valid programs),
>> but greatly simplifies reasoning about correctness of the
>> compiled code.
>> +
>> +By construction, this property will be upheld by the optimizer
>> if correctly established in the source IR. This is a key
>> invariant of the design.
>> +
>> +The existing IR Verifier pass has been extended to check most of
>> the local restrictions on the intrinsics mentioned in their
>> respective documentation. The current implementation in LLVM
>> does not check the key relocation invariant, but there is ongoing
>> work on developing such a verifier. Please ask on llvmdev if
>> you're interested in experimenting with the current version.
>> +
>>
>>
>> _______________________________________________
>> llvm-commits mailing list
>> llvm-commits at cs.uiuc.edu <mailto:llvm-commits at cs.uiuc.edu>
>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>>
>>
>
>
>