<html>
  <head>
    <meta content="text/html; charset=utf-8" http-equiv="Content-Type">
  </head>
  <body bgcolor="#FFFFFF" text="#000000">
    On the queue for tomorrow.<br>
    <br>
    Other things which need to happen:<br>
    - Move intrinsic definitions into LangRef<br>
    - Flesh out a description of the "statepoint-example" GC.<br>
    - Document the fact there's no a form of statepoint sequence without
    explicit relocations, update code with asserts & flags
    respectively<br>
    <br>
    I'm considering just removing the Statepoints page entirely and
    merging the content into GarbageCollection.  I probably won't
    actually go ahead with that just yet.<br>
    <br>
    I also need a place to transcribe my private TODO list somewhere
    public.  The docs probably aren't the right place for this though.<br>
    <br>
    <div class="moz-cite-prefix">On 02/24/2015 04:56 PM, Sean Silva
      wrote:<br>
    </div>
    <blockquote
cite="mid:CAHnXoakEmEbr6hAa9oEDaVQe60Qf9raCr_z_4v1OHxh69s6AmQ@mail.gmail.com"
      type="cite">
      <div dir="ltr">There are a couple todo/"put assembly here" in the
        file currently. It would be nice to flesh those out.</div>
      <div class="gmail_extra"><br>
        <div class="gmail_quote">On Tue, Feb 24, 2015 at 4:24 PM, Philip
          Reames <span dir="ltr">&lt;<a moz-do-not-send="true"
              href="mailto:listmail@philipreames.com" target="_blank">listmail@philipreames.com</a>&gt;</span>
          wrote:<br>
          <blockquote class="gmail_quote" style="margin:0 0 0
            .8ex;border-left:1px #ccc solid;padding-left:1ex">
            <div bgcolor="#FFFFFF" text="#000000"> Fixed.  Other
              comments welcome.
              <div>
                <div class="h5"><br>
                  <br>
                  <div>On 02/24/2015 02:44 PM, Philip Reames wrote:<br>
                  </div>
                  <blockquote type="cite"> Your timing is good.  I'm
                    working on docs today and should get to this by end
                    of day.  :)<br>
                    <br>
                    Philip<br>
                    <br>
                    <div>On 02/24/2015 02:37 PM, Sean Silva wrote:<br>
                    </div>
                    <blockquote type="cite">
                      <div dir="ltr">Necro-nit (wasn't sure where to
                        post this feedback; I realize that this has been
                        slightly updated in ToT): please update the
                        prototypes here to match their current
                        definitions (e.g. `llvm.experimental.` prefix).
                        <div><br>
                        </div>
                        <div>(sorry for the delay in getting to this)</div>
                        <div><br>
                        </div>
                        <div>-- Sean Silva</div>
                      </div>
                      <div class="gmail_extra"><br>
                        <div class="gmail_quote">On Tue, Dec 2, 2014 at
                          11:37 AM, Philip Reames <span dir="ltr">&lt;<a
                              moz-do-not-send="true"
                              href="mailto:listmail@philipreames.com"
                              target="_blank">listmail@philipreames.com</a>&gt;</span>
                          wrote:<br>
                          <blockquote class="gmail_quote"
                            style="margin:0 0 0 .8ex;border-left:1px
                            #ccc solid;padding-left:1ex">Author: reames<br>
                            Date: Tue Dec  2 13:37:00 2014<br>
                            New Revision: 223143<br>
                            <br>
                            URL: <a moz-do-not-send="true"
                              href="http://llvm.org/viewvc/llvm-project?rev=223143&view=rev"
                              target="_blank">http://llvm.org/viewvc/llvm-project?rev=223143&view=rev</a><br>
                            Log:<br>
                            [Statepoints 4/4] Statepoint infrastructure
                            for garbage collection: Documentation<br>
                            <br>
                            This is the fourth and final patch in the
                            statepoint series.  It contains the
                            documentation for the statepoint intrinsics
                            and their usage.<br>
                            <br>
                            There's definitely still room to improve the
                            documentation here, but I wanted to get this
                            landed so it was available for others. 
                            There will likely be a series of small
                            cleanup changes over the next few weeks as
                            we work to clarify and revise the
                            documentation.  If you have comments or
                            questions, please feel free to discuss them
                            either in this commit thread, the original
                            review thread, or on llvmdev.  Comments are
                            more than welcome.<br>
                            <br>
                            Reviewed by: atrick, ributzka<br>
                            Differential Revision: <a
                              moz-do-not-send="true"
                              href="http://reviews.llvm.org/D5683"
                              target="_blank">http://reviews.llvm.org/D5683</a><br>
                            <br>
                            <br>
                            <br>
                            Added:<br>
                                llvm/trunk/docs/Statepoints.rst<br>
                            <br>
                            Added: llvm/trunk/docs/Statepoints.rst<br>
                            URL: <a moz-do-not-send="true"
href="http://llvm.org/viewvc/llvm-project/llvm/trunk/docs/Statepoints.rst?rev=223143&view=auto"
                              target="_blank">http://llvm.org/viewvc/llvm-project/llvm/trunk/docs/Statepoints.rst?rev=223143&view=auto</a><br>
==============================================================================<br>
                            --- llvm/trunk/docs/Statepoints.rst (added)<br>
                            +++ llvm/trunk/docs/Statepoints.rst Tue Dec 
                            2 13:37:00 2014<br>
                            @@ -0,0 +1,209 @@<br>
                            +=====================================<br>
                            +Garbage Collection Safepoints in LLVM<br>
                            +=====================================<br>
                            +<br>
                            +.. contents::<br>
                            +   :local:<br>
                            +   :depth: 2<br>
                            +<br>
                            +Status<br>
                            +=======<br>
                            +<br>
                            +This document describes a set of
                            experimental extensions to LLVM. Use with
                            caution.  Because the intrinsics have
                            experimental status, compatibility across
                            LLVM releases is not guaranteed.<br>
                            +<br>
                            +LLVM currently supports an alternate
                            mechanism for conservative garbage
                            collection support using the gc_root
                            intrinsic.  The mechanism described here
                            shares little in common with the alternate
                            implementation and it is hoped that this
                            mechanism will eventually replace the
                            gc_root mechanism.<br>
                            +<br>
                            +Overview<br>
                            +========<br>
                            +<br>
                            +To collect dead objects, garbage collectors
                            must be able to identify any references to
                            objects contained within executing code,
                            and, depending on the collector, potentially
                            update them.  The collector does not need
                            this information at all points in code -
                            that would make the problem much harder -
                            but only at well defined points in the
                            execution known as 'safepoints'.  For most
                            collectors, it is sufficient to track at
                            least one copy of each unique pointer
                            value.  However, for a collector which
                            wishes to relocate objects directly
                            reachable from running code, a higher
                            standard is required.<br>
                            +<br>
                            +One additional challenge is that the
                            compiler may compute intermediate results
                            ("derived pointers") which point outside of
                            the allocation or even into the middle of
                            another allocation.  The eventual use of
                            this intermediate value must yield an
                            address within the bounds of the allocation,
                            but such "exterior derived pointers" may be
                            visible to the collector.  Given this, a
                            garbage collector can not safely rely on the
                            runtime value of an address to indicate the
                            object it is associated with.  If the
                            garbage collector wishes to move any object,
                            the compiler must provide a mapping for each
                            pointer to an indication of its allocation.<br>
                            +<br>
                            +To simplify the interaction between a
                            collector and the compiled code, most
                            garbage collectors are organized in terms of
                            three abstractions: load barriers, store
                            barriers, and safepoints.<br>
                            +<br>
                            +#. A load barrier is a bit of code executed
                            immediately after the machine load
                            instruction, but before any use of the value
                            loaded.  Depending on the collector, such a
                            barrier may be needed for all loads, merely
                            loads of a particular type (in the original
                            source language), or none at all.<br>
                            +#. Analogously, a store barrier is a code
                            fragment that runs immediately before the
                            machine store instruction, but after the
                            computation of the value stored.  The most
                            common use of a store barrier is to update a
                            'card table' in a generational garbage
                            collector.<br>
                            +<br>
                            +#. A safepoint is a location at which
                            pointers visible to the compiled code (i.e.
                            currently in registers or on the stack) are
                            allowed to change.  After the safepoint
                            completes, the actual pointer value may
                            differ, but the 'object' (as seen by the
                            source language) pointed to will not.<br>
                            +<br>
                            +  Note that the term 'safepoint' is
                            somewhat overloaded.  It refers to both the
                            location at which the machine state is
                            parsable and the coordination protocol
                            involved in bringing application threads to a
                            point at which the collector can safely use
                            that information.  The term "statepoint" as
                            used in this document refers exclusively to
                            the former.<br>
                            +<br>
                            +This document focuses on the last item -
                            compiler support for safepoints in generated
                            code.  We will assume that an outside
                            mechanism has decided where to place
                            safepoints.  From our perspective, all
                            safepoints will be function calls.  To
                            support relocation of objects directly
                            reachable from values in compiled code, the
                            collector must be able to:<br>
                            +<br>
                            +#. identify every copy of a pointer
                            (including copies introduced by the compiler
                            itself) at the safepoint,<br>
                            +#. identify which object each pointer
                            relates to, and<br>
                            +#. potentially update each of those copies.<br>
                            +<br>
                            +This document describes the mechanism by
                            which an LLVM based compiler can provide
                            this information to a language
                            runtime/collector and ensure that all
                            pointers can be read and updated if
                            desired.  The heart of the approach is to
                            construct (or rewrite) the IR in a manner
                            where the possible updates performed by the
                            garbage collector are explicitly visible in
                            the IR.  Doing so requires that we:<br>
                            +<br>
                            +#. create a new SSA value for each
                            potentially relocated pointer, and ensure
                            that no use of the original (non-relocated)
                            value is reachable after the safepoint,<br>
                            +#. specify the relocation in a way which is
                            opaque to the compiler to ensure that the
                            optimizer can not introduce new uses of an
                            unrelocated value after a statepoint. This
                            prevents the optimizer from performing
                            unsound optimizations.<br>
                            +#. record a mapping of live pointers
                            (and the allocation they're associated with)
                            for each statepoint.<br>
                            +<br>
                            +At the most abstract level, inserting a
                            safepoint can be thought of as replacing a
                            call instruction with a call to a multiple
                            return value function which calls the
                            original target of the call, returns its
                            result, and returns updated values for any
                            live pointers to garbage collected objects.<br>
                            +<br>
                            +  Note that the task of identifying all
                            live pointers to garbage collected values,
                            transforming the IR to expose a pointer
                            giving the base object for every such live
                            pointer, and inserting all the intrinsics
                            correctly is explicitly out of scope for
                            this document.  The recommended approach is
                            described in the section on Late Safepoint
                            Placement below.<br>
                            +<br>
                            +This abstract function call is concretely
                            represented by a sequence of intrinsic calls
                            known as a 'statepoint sequence'.<br>
                            +<br>
                            +<br>
                            +Let's consider a simple call in LLVM IR:<br>
                            +  todo<br>
                            +<br>
                            +Depending on our language we may need to
                            allow a safepoint during the execution of
                            the function called from this site.  If so,
                            we need to let the collector update local
                            values in the current frame.<br>
                            +<br>
                            +Let's say we need to relocate SSA values
                            'a', 'b', and 'c' at this safepoint.  To
                            represent this, we would generate the
                            statepoint sequence::<br>
                            +  put an example sequence here<br>
                            +<br>
                            +Ideally, this sequence would have been
                            represented as a M argument, N return value
                            function (where M is the number of values
                            being relocated + the original call
                            arguments and N is the original return value
                            + each relocated value), but LLVM does not
                            easily support such a representation.<br>
                            +<br>
                            +Instead, the statepoint intrinsic marks the
                            actual site of the safepoint or statepoint. 
                            The statepoint returns a token value (which
                            exists only at compile time).  To get back
                            the original return value of the call, we
                            use the 'gc_result' intrinsic.  To get the
                            relocation of each pointer in turn, we use
                            the 'gc_relocate' intrinsic with the
                            appropriate index.  Note that both the
                            gc_relocate and gc_result are tied to the
                            statepoint.  The combination forms a
                            "statepoint sequence" and represents the
                            entirety of a parseable call or
                            'statepoint'.<br>
                            +<br>
                            +When lowered, this example would generate
                            the following x86 assembly::<br>
                            +  put assembly here<br>
                            +<br>
                            +Each of the potentially relocated values
                            has been spilled to the stack, and a record
                            of that location has been recorded to the
                            StackMap section.  If the garbage collector
                            needs to update any of these pointers during
                            the call, it knows exactly what to change.<br>
                            +<br>
                            +Intrinsics<br>
                            +===========<br>
                            +<br>
                            +'''gc_statepoint''' Intrinsic<br>
                            +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^<br>
                            +<br>
                            +Syntax:<br>
                            +"""""""<br>
                            +<br>
                            +::<br>
                            +<br>
                            +      declare i32<br>
                            +        @gc_statepoint(func_type
                            &lt;target&gt;, i64 &lt;#call args&gt;,<br>
                            +                       i64 &lt;unused&gt;,
                            ... (call parameters),<br>
                            +                       i64 &lt;# deopt
                            args&gt;, ... (deopt parameters),<br>
                            +                       ... (gc parameters))<br>
                            +<br>
                            +Overview:<br>
                            +"""""""""<br>
                            +<br>
                            +The statepoint intrinsic represents a call
                            which is parse-able by the runtime.<br>
                            +<br>
                            +Operands:<br>
                            +"""""""""<br>
                            +<br>
                            +The 'target' operand is the function
                            actually being called.  The target can be
                            specified as either a symbolic LLVM
                            function, or as an arbitrary Value of
                            appropriate function type.  Note that the
                            function type must match the signature of
                            the callee and the types of the 'call
                            parameters' arguments.<br>
                            +<br>
                            +The '#call args' operand is the number of
                            arguments to the actual call.  It must
                            exactly match the number of arguments passed
                            in the 'call parameters' variable length
                            section.<br>
                            +<br>
                            +The 'unused' operand is unused and likely
                            to be removed.  Please do not use.<br>
                            +<br>
                            +The 'call parameters' arguments are simply
                            the arguments which need to be passed to the
                            call target.  They will be lowered according
                            to the specified calling convention and
                            otherwise handled like a normal call
                            instruction.  The number of arguments must
                            exactly match what is specified in '# call
                            args'.  The types must match the signature
                            of 'target'.<br>
                            +<br>
                            +The 'deopt parameters' arguments contain an
                            arbitrary list of Values which is meaningful
                            to the runtime.  The runtime may read any of
                            these values, but is assumed not to modify
                            them.  If the garbage collector might need
                            to modify one of these values, it must also
                            be listed in the 'gc parameters' argument
                            list.  The '# deopt args' field indicates
                            how many operands are to be interpreted as
                            'deopt parameters'.<br>
                            +<br>
                            +The 'gc parameters' arguments contain every
                            pointer to a garbage collector object which
                            potentially needs to be updated by the
                            garbage collector.  Note that the argument
                            list must explicitly contain a base pointer
                            for every derived pointer listed.  The order
                            of arguments is unimportant.  Unlike the
                            other variable length parameter sets, this
                            list is not length prefixed.<br>
                            +<br>
                            +Semantics:<br>
                            +""""""""""<br>
                            +<br>
                            +A statepoint is assumed to read and write
                            all memory.  As a result, memory operations
                            can not be reordered past a statepoint.  It
                            is illegal to mark a statepoint as being
                            either 'readonly' or 'readnone'.<br>
                            +<br>
                            +Note that legal IR can not perform any
                            memory operation on a 'gc pointer' argument
                            of the statepoint in a location statically
                            reachable from the statepoint.  Instead, the
                            explicitly relocated value (from a
                            ''gc_relocate'') must be used.<br>
                            +<br>
                            +'''gc_result''' Intrinsic<br>
                            +^^^^^^^^^^^^^^^^^^^^^^^^^^<br>
                            +<br>
                            +Syntax:<br>
                            +"""""""<br>
                            +<br>
                            +::<br>
                            +<br>
                            +      declare type*<br>
                            +        @gc_result_ptr(i32
                            %statepoint_token)<br>
                            +<br>
                            +      declare fX<br>
                            +        @gc_result_float(i32
                            %statepoint_token)<br>
                            +<br>
                            +      declare iX<br>
                            +        @gc_result_int(i32
                            %statepoint_token)<br>
                            +<br>
                            +Overview:<br>
                            +"""""""""<br>
                            +<br>
                            +'''gc_result''' extracts the result of the
                            original call instruction which was replaced
                            by the '''gc_statepoint'''.  The
                            '''gc_result''' intrinsic is actually a
                            family of three intrinsics due to an
                            implementation limitation.  Other than the
                            type of the return value, the semantics are
                            the same.<br>
                            +<br>
                            +Operands:<br>
                            +"""""""""<br>
                            +<br>
                            +The first and only argument is the
                            '''gc.statepoint''' which starts the
                            safepoint sequence of which this
                            '''gc_result''' is a part.  Despite the
                            typing of this as a generic i32, *only* the
                            value defined by a '''gc.statepoint''' is
                            legal here.<br>
                            +<br>
                            +Semantics:<br>
                            +""""""""""<br>
                            +<br>
                            +The ''gc_result'' represents the return
                            value of the call target of the
                            ''statepoint''.  The type of the
                            ''gc_result'' must exactly match the type of
                            the target.  If the call target returns
                            void, there will be no ''gc_result''.<br>
                            +<br>
                            +A ''gc_result'' is modeled as a 'readnone'
                            pure function.  It has no side effects since
                            it is just a projection of the return value
                            of the previous call represented by the
                            ''gc_statepoint''.<br>
                            +<br>
                            +'''gc_relocate''' Intrinsic<br>
                            +^^^^^^^^^^^^^^^^^^^^^^^^^^^<br>
                            +<br>
                            +Syntax:<br>
                            +"""""""<br>
                            +<br>
                            +::<br>
                            +<br>
                            +      declare &lt;type&gt; addrspace(1)*<br>
                            +        @gc_relocate(i32 %token, i32
                            %base_offset, i32 %pointer_offset)<br>
                            +<br>
                            +Overview:<br>
                            +"""""""""<br>
                            +<br>
                            +A ''gc_relocate'' returns the potentially
                            relocated value of a pointer at the
                            safepoint.<br>
                            +<br>
                            +Operands:<br>
                            +"""""""""<br>
                            +<br>
                            +The first argument is the
                            '''gc.statepoint''' which starts the
                            safepoint sequence of which this
                            '''gc_relocate''' is a part.  Despite the
                            typing of this as a generic i32, *only* the
                            value defined by a '''gc.statepoint''' is
                            legal here.<br>
                            +<br>
                            +The second argument is an index into the
                            statepoint's list of arguments which
                            specifies the base pointer for the pointer
                            being relocated.  This index must land
                            within the 'gc parameter' section of the
                            statepoint's argument list.<br>
                            +<br>
                            +The third argument is an index into the
                            statepoint's list of arguments which specifies
                            the (potentially) derived pointer being
                            relocated.  It is legal for this index to be
                            the same as the second argument
                            if-and-only-if a base pointer is being
                            relocated. This index must land within the
                            'gc parameter' section of the statepoint's
                            argument list.<br>
                            +<br>
                            +Semantics:<br>
                            +""""""""""<br>
                            +The return value of ''gc_relocate'' is the
                            potentially relocated value of the pointer
                            specified by its arguments.  It is
                            unspecified how the value of the returned
                            pointer relates to the argument to the
                            ''gc_statepoint'' other than that a) it
                            points to the same source language object
                            with the same offset, and b) the 'based-on'
                            relationship of the newly relocated pointers
                            is a projection of the unrelocated
                            pointers.  In particular, the integer value
                            of the pointer returned is unspecified.<br>
                            +<br>
                            +A ''gc_relocate'' is modeled as a
                            'readnone' pure function.  It has no side
                            effects since it is just a way to extract
                            information about work done during the
                            actual call modeled by the
                            ''gc_statepoint''.<br>
                            +<br>
                            +<br>
                            +StackMap Format<br>
                            +================<br>
                            +<br>
                            +Locations for each pointer value which may
                            need to be read and/or updated by the runtime
                            or collector are provided via the StackMap
                            format specified in the PatchPoint
                            documentation.<br>
                            +<br>
                            +.. TODO: link<br>
                            +<br>
                            +Each statepoint generates the following
                            Locations:<br>
                            +<br>
                            +* Constant which describes number of
                            following deopt *Locations* (not operands)<br>
                            +* Variable number of Locations, one for
                            each deopt parameter listed in the IR
                            statepoint (same number as described by
                            previous Constant)<br>
                            +* Variable number of Location pairs, one
                            pair for each unique pointer which needs to be
                            relocated.  The first Location in each pair
                            describes the base pointer for the object. 
                            The second is the derived pointer actually
                            being relocated.  If the base pointer is used
                            after the statepoint, it is guaranteed to also
                            appear explicitly as a relocation pair.
                            There may be fewer pairs than gc
                            parameters in the IR statepoint. Each
                            *unique* pair will occur at least once;
                            duplicates are possible.<br>
                            +<br>
                            +Note that the Locations used in each
                            section may describe the same physical
                            location.  For example, a stack slot may
                            appear as a deopt location, a gc base pointer,
                            and a gc derived pointer.<br>
                            +<br>
                            +The ID field of the 'StkMapRecord' for a
                            statepoint is meaningless and its value is
                            explicitly unspecified.<br>
                            +<br>
                            +The LiveOut section of the StkMapRecord
                            will be empty for a statepoint record.<br>
                            +<br>
                            +Safepoint Semantics & Verification<br>
                            +==================================<br>
                            +<br>
                            +The fundamental correctness property of
                            the compiled code w.r.t. the
                            garbage collector is a dynamic one.  It must
                            be the case that there is no dynamic trace
                            such that an operation involving a
                            potentially relocated pointer is
                            observably-after a safepoint which could
                            relocate it.  'observably-after' in this
                            usage means that an outside observer could
                            observe this sequence of events in a way
                            which precludes the operation being
                            performed before the safepoint.<br>
                            +<br>
                            +To understand why this 'observably-after'
                            property is required, consider a null
                            comparison performed on the original copy of
                            a relocated pointer.  Assuming that control
                            flow follows the safepoint, there is no way
                            to observe externally whether the null
                            comparison is performed before or after the
                            safepoint.  (Remember, the original Value is
                            unmodified by the safepoint.)  The compiler
                            is free to make either scheduling choice.<br>
                            +<br>
                            +The actual correctness property implemented
                            is slightly stronger than this.  We require
                            that there be no *static path* on which a
                            use of a potentially relocated pointer is
                            'observably-after' the point where it may
                            have been relocated.  This is slightly
                            stronger than is strictly necessary (and thus
                            may disallow some otherwise valid programs),
                            but greatly simplifies reasoning about
                            correctness of the compiled code.<br>
                            +<br>
                            +By construction, this property will be
                            upheld by the optimizer if correctly
                            established in the source IR.  This is a key
                            invariant of the design.<br>
                            +<br>
                            +The existing IR Verifier pass has been
                            extended to check most of the local
                            restrictions on the intrinsics mentioned in
                            their respective documentation.  The current
                            implementation in LLVM does not check the
                            key relocation invariant, but there is
                            ongoing work on developing such a verifier. 
                            Please ask on llvmdev if you're interested
                            in experimenting with the current version.<br>
                            +<br>
                            <br>
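                            The index rules for the second and third
                            arguments of ''gc_relocate'' in the patch
                            above can be sketched as a small check.  This
                            is only an illustration, not code from LLVM:
                            the function name and the
                            <code>gc_start</code> / <code>num_args</code>
                            parameters are hypothetical, and the real
                            statepoint argument layout is not modeled.<br>
<pre>
```python
def check_relocate_indices(base_idx, derived_idx, gc_start, num_args):
    """Validate gc_relocate's base/derived indices (hypothetical helper).

    gc_start and num_args are assumptions: the first index of the
    'gc parameter' section and the total statepoint argument count.
    Returns True when a base pointer itself is being relocated.
    """
    gc_section = range(gc_start, num_args)
    if base_idx not in gc_section:
        raise ValueError("base index is outside the gc parameter section")
    if derived_idx not in gc_section:
        raise ValueError("derived index is outside the gc parameter section")
    # Equal indices are legal if-and-only-if a base pointer is relocated.
    return base_idx == derived_idx
```
</pre>
                            A caller would reject the relocate when this
                            raises, and treat a True result as "the
                            relocated pointer is its own base".<br>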
                            <br>
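                            The Location layout listed under "StackMap
                            Format" in the patch above could be consumed
                            roughly as follows.  This is a sketch under
                            stated assumptions: Locations are modeled as
                            opaque strings, the leading Constant as a
                            plain integer, and the function and type names
                            are invented for illustration rather than
                            taken from any LLVM API.<br>
<pre>
```python
from typing import NamedTuple


class StatepointLocations(NamedTuple):
    deopt: list      # one Location per deopt parameter
    gc_pairs: list   # unique (base, derived) Location pairs


def split_statepoint_locations(locations):
    """Split a statepoint record's Locations into its sections.

    Assumed input shape: [deopt_count, deopt..., base, derived, ...].
    """
    if not locations:
        raise ValueError("missing deopt-count Constant")
    num_deopt = locations[0]
    rest = locations[1:]
    if num_deopt not in range(len(rest) + 1):
        raise ValueError("deopt count does not fit the record")
    deopt = rest[:num_deopt]
    gc = rest[num_deopt:]
    if len(gc) % 2 != 0:
        raise ValueError("gc Locations must come in (base, derived) pairs")
    pairs = [(gc[i], gc[i + 1]) for i in range(0, len(gc), 2)]
    # Duplicates are possible, so keep the first occurrence of each pair.
    unique_pairs = list(dict.fromkeys(pairs))
    return StatepointLocations(deopt, unique_pairs)
```
</pre>
                            Deduplicating here is a design choice of the
                            sketch; a consumer could equally process
                            repeated pairs as long as relocating a slot
                            twice is harmless.<br>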
_______________________________________________<br>
                            llvm-commits mailing list<br>
                            <a moz-do-not-send="true"
                              href="mailto:llvm-commits@cs.uiuc.edu"
                              target="_blank">llvm-commits@cs.uiuc.edu</a><br>
                            <a moz-do-not-send="true"
                              href="http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits"
                              target="_blank">http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits</a><br>
                          </blockquote>
                        </div>
                        <br>
                      </div>
                    </blockquote>
                    <br>
                    <br>
                    <fieldset></fieldset>
                    <br>
                  </blockquote>
                  <br>
                </div>
              </div>
            </div>
          </blockquote>
        </div>
        <br>
      </div>
    </blockquote>
    <br>
  </body>
</html>