<html>
<head>
<meta content="text/html; charset=utf-8" http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
<br>
<div class="moz-cite-prefix">On 02/25/2015 02:14 PM, Sean Silva
wrote:<br>
</div>
<blockquote
cite="mid:CAHnXoanG64E-CCa1aE=kwCQ=yAaJRsUf5Fi4d_AWYz0LtTgFQw@mail.gmail.com"
type="cite">
<div dir="ltr"><br>
<div class="gmail_extra"><br>
<div class="gmail_quote">On Tue, Feb 24, 2015 at 5:31 PM,
Philip Reames <span dir="ltr"><<a moz-do-not-send="true"
href="mailto:listmail@philipreames.com" target="_blank">listmail@philipreames.com</a>></span>
wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000"> On the queue for
tomorrow.<br>
<br>
Other things which need to happen:<br>
- Move intrinsic definitions into LangRef<br>
</div>
</blockquote>
<div><br>
</div>
<div>Please don't do this until they are non-experimental
(or have they become non-experimental?). Actually, I don't
think this would be the first time we've promoted
instrinsics to non-experimental, and I'm not sure how to
handle it best (what to move and what to copy, etc.).</div>
</div>
</div>
</div>
</blockquote>
They are still experimental at this point. I'll abide by your
wishes as that seems to fit what's in LangRef now. I will add a
section to the LangRef linking to the Statepoint docs analogous for
what is done for patchpoints. <br>
<br>
<br>
<blockquote
cite="mid:CAHnXoanG64E-CCa1aE=kwCQ=yAaJRsUf5Fi4d_AWYz0LtTgFQw@mail.gmail.com"
type="cite">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote">
<div> </div>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000"> - Flesh out a
description of the "statepoint-example" GC.<br>
- Document the fact there's no a form of statepoint
sequence without explicitly relocations, update code
with asserts & flags respectively<br>
<br>
I'm considering just removing the Statepoints page
entirely and merging the content into
GarbageCollection. I probably wont actually go ahead
with that just yet.<br>
</div>
</blockquote>
<div><br>
</div>
<div>If you do this, please leave the Statepoints.rst page
empty with a link to GarbageCollection.rst; that way,
links to Statepoints across the net don't break. The real
solution is for the server to serve an http redirect, but
we don't have a way of indicating that currently.</div>
<div> </div>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000"> <br>
I also need a place to transcribe my private TODO list
somewhere public. The docs probably aren't the right
place for this though.
<div>
<div class="h5"><br>
<br>
<div>On 02/24/2015 04:56 PM, Sean Silva wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">There are a couple todo/"put
assembly here" in the file currently. It would
be nice to flesh those out.</div>
<div class="gmail_extra"><br>
<div class="gmail_quote">On Tue, Feb 24, 2015 at
4:24 PM, Philip Reames <span dir="ltr"><<a
moz-do-not-send="true"
href="mailto:listmail@philipreames.com"
target="_blank">listmail@philipreames.com</a>></span>
wrote:<br>
<blockquote class="gmail_quote"
style="margin:0 0 0 .8ex;border-left:1px
#ccc solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000">
Fixed. Other comments welcome.
<div>
<div><br>
<br>
<div>On 02/24/2015 02:44 PM, Philip
Reames wrote:<br>
</div>
<blockquote type="cite"> Your timing
is good. I'm working on docs today
and should get to this by end of
day. :)<br>
<br>
Philip<br>
<br>
<div>On 02/24/2015 02:37 PM, Sean
Silva wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">Necro-nit (wasn't
sure where to post this
feedback; I realize that this
has been slightly updated in
ToT): please update the
prototypes here to match their
current definitions (e.g.
`llvm.experimental.` prefix).
<div><br>
</div>
<div>(sorry for the delay in
getting to this)</div>
<div><br>
</div>
<div>-- Sean Silva</div>
</div>
<div class="gmail_extra"><br>
<div class="gmail_quote">On Tue,
Dec 2, 2014 at 11:37 AM,
Philip Reames <span dir="ltr"><<a
moz-do-not-send="true"
href="mailto:listmail@philipreames.com"
target="_blank">listmail@philipreames.com</a>></span>
wrote:<br>
<blockquote
class="gmail_quote"
style="margin:0 0 0
.8ex;border-left:1px #ccc
solid;padding-left:1ex">Author:
reames<br>
Date: Tue Dec 2 13:37:00
2014<br>
New Revision: 223143<br>
<br>
URL: <a
moz-do-not-send="true"
href="http://llvm.org/viewvc/llvm-project?rev=223143&view=rev"
target="_blank">http://llvm.org/viewvc/llvm-project?rev=223143&view=rev</a><br>
Log:<br>
[Statepoints 4/4] Statepoint
infrastructure for garbage
collection: Documentation<br>
<br>
This is the fourth and final
patch in the statepoint
series. It contains the
documentation for the
statepoint intrinsics and
their usage.<br>
<br>
There's definitely still
room to improve the
documentation here, but I
wanted to get this landed so
it was available for
others. There will likely
be a series of small cleanup
changes over the next few
weeks as we work to clarify
and revise the
documentation. If you have
comments or questions,
please feel free to discuss
them either in this commit
thread, the original review
thread, or on llvmdev.
Comments are more than
welcome.<br>
<br>
Reviewed by: atrick,
ributzka<br>
Differential Revision: <a
moz-do-not-send="true"
href="http://reviews.llvm.org/D5683"
target="_blank">http://reviews.llvm.org/D5683</a><br>
<br>
<br>
<br>
Added:<br>
llvm/trunk/docs/Statepoints.rst<br>
<br>
Added:
llvm/trunk/docs/Statepoints.rst<br>
URL: <a
moz-do-not-send="true"
href="http://llvm.org/viewvc/llvm-project/llvm/trunk/docs/Statepoints.rst?rev=223143&view=auto"
target="_blank">http://llvm.org/viewvc/llvm-project/llvm/trunk/docs/Statepoints.rst?rev=223143&view=auto</a><br>
==============================================================================<br>
---
llvm/trunk/docs/Statepoints.rst
(added)<br>
+++
llvm/trunk/docs/Statepoints.rst
Tue Dec 2 13:37:00 2014<br>
@@ -0,0 +1,209 @@<br>
+=====================================<br>
+Garbage Collection
Safepoints in LLVM<br>
+=====================================<br>
+<br>
+.. contents::<br>
+ :local:<br>
+ :depth: 2<br>
+<br>
+Status<br>
+=======<br>
+<br>
+This document describes a
set of experimental
extensions to LLVM. Use with
caution. Because the
intrinsics have experimental
status, compatibility across
LLVM releases is not
guaranteed.<br>
+<br>
+LLVM currently supports an
alternate mechanism for
conservative garbage
collection support using the
gc_root intrinsic. The
mechanism described here
shares little in common with
the alternate implementation
and it is hoped that this
mechanism will eventually
replace the gc_root
mechanism.<br>
+<br>
+Overview<br>
+========<br>
+<br>
+To collect dead objects,
garbage collectors must be
able to identify any
references to objects
contained within executing
code, and, depending on the
collector, potentially
update them. The collector
does not need this
information at all points in
code - that would make the
problem much harder - but
only at well defined points
in the execution known as
'safepoints' For a most
collectors, it is sufficient
to track at least one copy
of each unique pointer
value. However, for a
collector which wishes to
relocate objects directly
reachable from running code,
a higher standard is
required.<br>
+<br>
+One additional challenge is
that the compiler may
compute intermediate results
("derived pointers") which
point outside of the
allocation or even into the
middle of another
allocation. The eventual
use of this intermediate
value must yield an address
within the bounds of the
allocation, but such
"exterior derived pointers"
may be visible to the
collector. Given this, a
garbage collector can not
safely rely on the runtime
value of an address to
indicate the object it is
associated with. If the
garbage collector wishes to
move any object, the
compiler must provide a
mapping for each pointer to
an indication of its
allocation.<br>
+<br>
+To simplify the interaction
between a collector and the
compiled code, most garbage
collectors are organized in
terms of two three
abstractions: load barriers,
store barriers, and
safepoints.<br>
+<br>
+#. A load barrier is a bit
of code executed immediately
after the machine load
instruction, but before any
use of the value loaded.
Depending on the collector,
such a barrier may be needed
for all loads, merely loads
of a particular type (in the
original source language),
or none at all.<br>
+#. Analogously, a store
barrier is a code fragement
that runs immediately before
the machine store
instruction, but after the
computation of the value
stored. The most common use
of a store barrier is to
update a 'card table' in a
generational garbage
collector.<br>
+<br>
+#. A safepoint is a
location at which pointers
visible to the compiled code
(i.e. currently in registers
or on the stack) are allowed
to change. After the
safepoint completes, the
actual pointer value may
differ, but the 'object' (as
seen by the source language)
pointed to will not.<br>
+<br>
+ Note that the term
'safepoint' is somewhat
overloaded. It refers to
both the location at which
the machine state is
parsable and the
coordination protocol
involved in bring
application threads to a
point at which the collector
can safely use that
information. The term
"statepoint" as used in this
document refers exclusively
to the former.<br>
+<br>
+This document focuses on
the last item - compiler
support for safepoints in
generated code. We will
assume that an outside
mechanism has decided where
to place safepoints. From
our perspective, all
safepoints will be function
calls. To support
relocation of objects
directly reachable from
values in compiled code, the
collector must be able to:<br>
+<br>
+#. identify every copy of a
pointer (including copies
introduced by the compiler
itself) at the safepoint,<br>
+#. identify which object
each pointer relates to, and<br>
+#. potentially update each
of those copies.<br>
+<br>
+This document describes the
mechanism by which an LLVM
based compiler can provide
this information to a
language runtime/collector
and ensure that all pointers
can be read and updated if
desired. The heart of the
approach is to construct (or
rewrite) the IR in a manner
where the possible updates
performed by the garbage
collector are explicitly
visible in the IR. Doing so
requires that we:<br>
+<br>
+#. create a new SSA value
for each potentially
relocated pointer, and
ensure that no uses of the
original (non relocated)
value is reachable after the
safepoint,<br>
+#. specify the relocation
in a way which is opaque to
the compiler to ensure that
the optimizer can not
introduce new uses of an
unrelocated value after a
statepoint. This prevents
the optimizer from
performing unsound
optimizations.<br>
+#. recording a mapping of
live pointers (and the
allocation they're
associated with) for each
statepoint.<br>
+<br>
+At the most abstract level,
inserting a safepoint can be
thought of as replacing a
call instruction with a call
to a multiple return value
function which both calls
the original target of the
call, returns it's result,
and returns updated values
for any live pointers to
garbage collected objects.<br>
+<br>
+ Note that the task of
identifying all live
pointers to garbage
collected values,
transforming the IR to
expose a pointer giving the
base object for every such
live pointer, and inserting
all the intrinsics correctly
is explicitly out of scope
for this document. The
recommended approach is
described in the section of
Late Safepoint Placement
below.<br>
+<br>
+This abstract function call
is concretely represented by
a sequence of intrinsic
calls known as a 'statepoint
sequence'.<br>
+<br>
+<br>
+Let's consider a simple
call in LLVM IR:<br>
+ todo<br>
+<br>
+Depending on our language
we may need to allow a
safepoint during the
execution of the function
called from this site. If
so, we need to let the
collector update local
values in the current frame.<br>
+<br>
+Let's say we need to
relocate SSA values 'a',
'b', and 'c' at this
safepoint. To represent
this, we would generate the
statepoint sequence::<br>
+ put an example sequence
here<br>
+<br>
+Ideally, this sequence
would have been represented
as a M argument, N return
value function (where M is
the number of values being
relocated + the original
call arguments and N is the
original return value + each
relocated value), but LLVM
does not easily support such
a representation.<br>
+<br>
+Instead, the statepoint
intrinsic marks the actual
site of the safepoint or
statepoint. The statepoint
returns a token value (which
exists only at compile
time). To get back the
original return value of the
call, we use the 'gc_result'
intrinsic. To get the
relocation of each pointer
in turn, we use the
'gc_relocate' intrinsic with
the appropriate index. Note
that both the gc_relocate
and gc_result are tied to
the statepoint. The
combination forms a
"statepoint sequence" and
represents the entitety of a
parseable call or
'statepoint'.<br>
+<br>
+When lowered, this example
would generate the following
x86 assembly::<br>
+ put assembly here<br>
+<br>
+Each of the potentially
relocated values has been
spilled to the stack, and a
record of that location has
been recorded to the
StackMap section. If the
garbage collector needs to
update any of these pointers
during the call, it knows
exactly what to change.<br>
+<br>
+Intrinsics<br>
+===========<br>
+<br>
+'''gc_statepoint'''
Intrinsic<br>
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^<br>
+<br>
+Syntax:<br>
+"""""""<br>
+<br>
+::<br>
+<br>
+ declare i32<br>
+
@gc_statepoint(func_type
<target>, i64
<#call args>.<br>
+ i64
<unused>, ... (call
parameters),<br>
+ i64
<# deopt args>, ...
(deopt parameters),<br>
+ ...
(gc parameters))<br>
+<br>
+Overview:<br>
+"""""""""<br>
+<br>
+The statepoint intrinsic
represents a call which is
parse-able by the runtime.<br>
+<br>
+Operands:<br>
+"""""""""<br>
+<br>
+The 'target' operand is the
function actually being
called. The target can be
specified as either a
symbolic LLVM funciton, or
as an arbitrary Value of
appropriate function type.
Note that the function type
must match the signature of
the callee and the types of
the 'call parameters'
arguments.<br>
+<br>
+The '#call args' operand is
the number of arguments to
the actual call. It must
exactly match the number of
arguments passed in the
'call parameters' variable
length section.<br>
+<br>
+The 'unused' operand is
unused and likely to be
removed. Please do not use.<br>
+<br>
+The 'call parameters'
arguments are simply the
arguments which need to be
passed to the call target.
They will be lowered
according to the specified
calling convention and
otherwise handled like a
normal call instruction.
The number of arguments must
exactly match what is
specified in '# call args'.
The types must match the
signature of 'target'.<br>
+<br>
+The 'deopt parameters'
arguments contain an
arbitrary list of Values
which is meaningful to the
runtime. The runtime may
read any of these values,
but is assumed not to modify
them. If the garbage
collector might need to
modify one of these values,
it must also be listed in
the 'gc pointer' argument
list. The '# deopt args'
field indicates how many
operands are to be
interpreted as 'deopt
parameters'.<br>
+<br>
+The 'gc parameters'
arguments contain every
pointer to a garbage
collector object which
potentially needs to be
updated by the garbage
collector. Note that the
argument list must
explicitly contain a base
pointer for every derived
pointer listed. The order
of arguments is
unimportant. Unlike the
other variable length
parameter sets, this list is
not length prefixed.<br>
+<br>
+Semantics:<br>
+""""""""""<br>
+<br>
+A statepoint is assumed to
read and write all memory.
As a result, memory
operations can not be
reordered past a
statepoint. It is illegal
to mark a statepoint as
being either 'readonly' or
'readnone'.<br>
+<br>
+Note that legal IR can not
perform any memory operation
on a 'gc pointer' argument
of the statepoint in a
location statically
reachable from the
statepoint. Instead, the
explicitly relocated value
(from a ''gc_relocate'')
must be used.<br>
+<br>
+'''gc_result''' Intrinsic<br>
+^^^^^^^^^^^^^^^^^^^^^^^^^^<br>
+<br>
+Syntax:<br>
+"""""""<br>
+<br>
+::<br>
+<br>
+ declare type*<br>
+ @gc_result_ptr(i32
%statepoint_token)<br>
+<br>
+ declare fX<br>
+
@gc_result_float(i32
%statepoint_token)<br>
+<br>
+ declare iX<br>
+ @gc_result_int(i32
%statepoint_token)<br>
+<br>
+Overview:<br>
+"""""""""<br>
+<br>
+'''gc_result''' extracts
the result of the original
call instruction which was
replaced by the
'''gc_statepoint'''. The
'''gc_result''' intrinsic is
actually a family of three
intrinsics due to an
implementation limitation.
Other than the type of the
return value, the semantics
are the same.<br>
+<br>
+Operands:<br>
+"""""""""<br>
+<br>
+The first and only argument
is the '''gc.statepoint'''
which starts the safepoint
sequence of which this
'''gc_result'' is a part.
Despite the typing of this
as a generic i32, *only* the
value defined by a
'''gc.statepoint''' is legal
here.<br>
+<br>
+Semantics:<br>
+""""""""""<br>
+<br>
+The ''gc_result''
represents the return value
of the call target of the
''statepoint''. The type of
the ''gc_result'' must
exactly match the type of
the target. If the call
target returns void, there
will be no ''gc_result''.<br>
+<br>
+A ''gc_result'' is modeled
as a 'readnone' pure
function. It has no side
effects since it is just a
projection of the return
value of the previous call
represented by the
''gc_statepoint''.<br>
+<br>
+'''gc_relocate''' Intrinsic<br>
+^^^^^^^^^^^^^^^^^^^^^^^^^^^<br>
+<br>
+Syntax:<br>
+"""""""<br>
+<br>
+::<br>
+<br>
+ declare <type>
addrspace(1)*<br>
+ @gc_relocate(i32
%token, i32 %base_offset,
i32 %pointer_offset)<br>
+<br>
+Overview:<br>
+"""""""""<br>
+<br>
+A ''gc_relocate'' returns
the potentially relocated
value of a pointer at the
safepoint.<br>
+<br>
+Operands:<br>
+"""""""""<br>
+<br>
+The first argument is the
'''gc.statepoint''' which
starts the safepoint
sequence of which this
'''gc_relocation'' is a
part. Despite the typing of
this as a generic i32,
*only* the value defined by
a '''gc.statepoint''' is
legal here.<br>
+<br>
+The second argument is an
index into the statepoints
list of arguments which
specifies the base pointer
for the pointer being
relocated. This index must
land within the 'gc
parameter' section of the
statepoint's argument list.<br>
+<br>
+The third argument is an
index into the statepoint's
list of arguments which
specify the (potentially)
derived pointer being
relocated. It is legal for
this index to be the same as
the second argument
if-and-only-if a base
pointer is being relocated.
This index must land within
the 'gc parameter' section
of the statepoint's argument
list.<br>
+<br>
+Semantics:<br>
+""""""""""<br>
+The return value of
''gc_relocate'' is the
potentially relocated value
of the pointer specified by
it's arguments. It is
unspecified how the value of
the returned pointer relates
to the argument to the
''gc_statepoint'' other than
that a) it points to the
same source language object
with the same offset, and b)
the 'based-on' relationship
of the newly relocated
pointers is a projection of
the unrelocated pointers.
In particular, the integer
value of the pointer
returned is unspecified.<br>
+<br>
+A ''gc_relocate'' is
modeled as a 'readnone' pure
function. It has no side
effects since it is just a
way to extract information
about work done during the
actual call modeled by the
''gc_statepoint''.<br>
+<br>
+<br>
+StackMap Format<br>
+================<br>
+<br>
+Locations for each pointer
value which may need read
and/or updated by the
runtime or collector are
provided via the StackMap
format specified in the
PatchPoint documentation.<br>
+<br>
+.. TODO: link<br>
+<br>
+Each statepoint generates
the following Locations:<br>
+<br>
+* Constant which describes
number of following deopt
*Locations* (not operands)<br>
+* Variable number of
Locations, one for each
deopt parameter listed in
the IR statepoint (same
number as described by
previous Constant)<br>
+* Variable number of
Locations pairs, one pair
for each unique pointer
which needs relocated. The
first Location in each pair
describes the base pointer
for the object. The second
is the derived pointer
actually being relocated.
It is guaranteed that the
base pointer must also
appear explicitly as a
relocation pair if used
after the statepoint. There
may be fewer pairs then gc
parameters in the IR
statepoint. Each *unique*
pair will occur at least
once; duplicates are
possible.<br>
+<br>
+Note that the Locations
used in each section may
describe the same physical
location. e.g. A stack slot
may appear as a deopt
location, a gc base pointer,
and a gc derived pointer.<br>
+<br>
+The ID field of the
'StkMapRecord' for a
statepoint is meaningless
and it's value is explicitly
unspecified.<br>
+<br>
+The LiveOut section of the
StkMapRecord will be empty
for a statepoint record.<br>
+<br>
+Safepoint Semantics &
Verification<br>
+==================================<br>
+<br>
+The fundamental correctness
property for the compiled
code's correctness w.r.t.
the garbage collector is a
dynamic one. It must be the
case that there is no
dynamic trace such that a
operation involving a
potentially relocated
pointer is observably-after
a safepoint which could
relocate it.
'observably-after' is this
usage means that an outside
observer could observe this
sequence of events in a way
which precludes the
operation being performed
before the safepoint.<br>
+<br>
+To understand why this
'observable-after' property
is required, consider a null
comparison performed on the
original copy of a relocated
pointer. Assuming that
control flow follows the
safepoint, there is no way
to observe externally
whether the null comparison
is performed before or after
the safepoint. (Remember,
the original Value is
unmodified by the
safepoint.) The compiler is
free to make either
scheduling choice.<br>
+<br>
+The actual correctness
property implemented is
slightly stronger than
this. We require that there
be no *static path* on which
a potentially relocated
pointer is
'observably-after' it may
have been relocated. This
is slightly stronger than is
strictly necessary (and thus
may disallow some otherwise
valid programs), but greatly
simplifies reasoning about
correctness of the compiled
code.<br>
+<br>
+By construction, this
property will be upheld by
the optimizer if correctly
established in the source
IR. This is a key invariant
of the design.<br>
+<br>
+The existing IR Verifier
pass has been extended to
check most of the local
restrictions on the
intrinsics mentioned in
their respective
documentation. The current
implementation in LLVM does
not check the key relocation
invariant, but this is
ongoing work on developing
such a verifier. Please ask
on llvmdev if you're
interested in experimenting
with the current version.<br>
+<br>
<br>
<br>
_______________________________________________<br>
llvm-commits mailing list<br>
<a moz-do-not-send="true"
href="mailto:llvm-commits@cs.uiuc.edu"
target="_blank">llvm-commits@cs.uiuc.edu</a><br>
<a moz-do-not-send="true"
href="http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits"
target="_blank">http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits</a><br>
</blockquote>
</div>
<br>
</div>
</blockquote>
<br>
<br>
<fieldset></fieldset>
<br>
<pre>_______________________________________________
llvm-commits mailing list
<a moz-do-not-send="true" href="mailto:llvm-commits@cs.uiuc.edu" target="_blank">llvm-commits@cs.uiuc.edu</a>
<a moz-do-not-send="true" href="http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits" target="_blank">http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits</a>
</pre>
</blockquote>
<br>
</div>
</div>
</div>
</blockquote>
</div>
<br>
</div>
</blockquote>
<br>
</div>
</div>
</div>
</blockquote>
</div>
<br>
</div>
</div>
</blockquote>
<br>
</body>
</html>