<div dir="ltr"><br><div class="gmail_extra"><br><br><div class="gmail_quote">On Fri, Oct 25, 2013 at 8:35 PM, Philip Reames <span dir="ltr"><<a href="mailto:listmail@philipreames.com" target="_blank">listmail@philipreames.com</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000"><div class="im">
<div>On 10/25/13 1:10 PM, Ben Karel wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr"><br>
<div class="gmail_extra"><br>
<br>
<div class="gmail_quote">On Thu, Oct 24, 2013 at 6:42 PM,
Sanjoy Das <span dir="ltr"><<a href="mailto:sanjoy@azulsystems.com" target="_blank">sanjoy@azulsystems.com</a>></span>
wrote:<br>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">Hi
Rafael, Andrew,<br>
<br>
Thank you for the prompt reply.<br>
<br>
One approach we've been considering involves representing
the<br>
constraint "pointers to heap objects are invalidated at
every<br>
safepoint" somehow in the IR itself. So, if %a and %b are
values the<br>
GC is interested in, the safepoint at the back-edge of a
loop might<br>
look like:<br>
<br>
; <label>: body<br>
%a = phi i32 [ %a.relocated, %body ] [
%a.initial_value, %pred ]<br>
%b = phi i32 [ %b.relocated, %body ] [
%b.initial_value, %pred ]<br>
;; Use %a and %b<br>
<br>
;; The safepoint starts here<br>
%a.relocated = @llvm.gc_relocate(%a)<br>
%b.relocated = @llvm.gc_relocate(%b)<br>
br %body<br>
<br>
This allows us to not bother with relocating derived
pointers pointing<br>
inside %a and %b, since it is semantically incorrect for
llvm to reuse<br>
them in the next iteration of the loop. </blockquote>
<div><br>
</div>
<div>This is the right general idea, but you can already
express this constraint in LLVM as it exists today, by
using llvm.gcroot(). As you noted, this also solves the
interior-pointer problem by making it the front end's job
to convey to LLVM when it would/would not be safe to cache
interior pointers across loop iterations. The invariant
that a front-end must maintain is that any pointer which
is live across a given potential-GC-point must be reloaded
from its root after a (relocating) GC might have occurred.
This falls naturally out of the viewpoint that %a is not
an object pointer, it's the name of an object pointer's
value at a given point in time. So, of course, whenever
that object pointer's value might change, there must be a
new name.</div>
</div>
</div>
</div>
</blockquote></div>
To rephrase, you're saying that the current gcroot mechanism is
analogous to a boxed object pointer. We could model a safepoint by
storing the unboxed value into the box, applying an opaque operation
to the box, and reloading the raw object pointer from the box. (The
opaque operator is key to prevent the store/load pair from being
removed. It also implements the actual safepoint operation.) Is
this a correct restatement?<br></div></blockquote><div><br></div><div>What does it mean to model a safepoint?</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000">
Here's an example:<br>
gcroot a = ...; b = ...;<br>
object* ax = *a, bx = *b;<br>
repeat 500000 iterations { <br>
spin for 50 instructions<br>
*a = ax;<br>
*b = bx;<br>
safepoint(a, b)<br>
ax = *a;<br>
bx = *b;<br>
} <br>
<br>
If so, I would agree with you that this is a correct encoding of
object relocation. I think what got me confused was using the
in-tree examples to reverse engineer the semantics. Both the Erlang
and Ocaml examples insert safepoints after all the LLVM IR passes
are run and derived pointers were inserted. This was the part that
got me thinking that the gcroot semantics were unsuitable for a
precise relocating collector. If you completely ignored these
implementations and inserted the full box/safepoint call/unbox
implementation at each safepoint before invoking any optimizations,
I think this would be correct.<br>
<br>
One problem with this encoding is that there does not currently
exist an obvious mechanism to describe a safepoint in the IR
itself. (i.e. there is no llvm.gcsafepoint) You could model this
with a "well known" function which a custom GCStrategy recorded a
stackmap on every call. It would be good to extend the existing
intrinsics with such a mechanism. <br>
<br>
At least if I'm reading things right, the current *implementation*
is not a correct implementation of the intended semantics. In
particular, the safepoint operation which provides the opaque
manipulation of the boxed gcroots is not preserved into the
SelectionDAG. As a result, optimizations on the SelectionDAG could
eliminate the now adjacent box/unbox pairs. This would break a
relocating collector. <br></div></blockquote><div><br></div><div>Since GC safepoints aren't explicitly represented at the IR level, either, SelectionDAG is a red herring: regular IR-level optimizations need the same scrutiny. But safepoints won't be inserted arbitrarily. Because there are only four places where safe points can occur (pre/post call, loop, return), there are essentially only two optimizations that might break a relocating collector: a call being reordered with the stores and loads surrounding it, or constant propagation of an alloca's contents across a loop backedge without eliminating the backedge (iff the GC plugin requests loop safepoints). Neither of these are performed by LLVM, AFAICT, which implies that the implementation is not broken in the way you suggested.</div>
<div><br></div><div>But if you can prove me wrong by producing an example that LLVM optimizations break, that would be cool :-)</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000">
Leaving this aside, there are also a number of performance problems
with the current implementation. I'll summarize them here for now,
but will expand into approaches for resolving each if this looks
like the most likely implementation strategy. <br>
1) Explicitly adding box/safepoint/unbox prevents creation of
derived pointers. This prevents optimizations such as
loop-blocking. Ideally, we'd rather let the derived pointers be
created and capture them for updating at the safepoint. <br>
2) The redundant loads and stores required for box/unbox. (These
might be partially eliminated by other optimization passes.)<br>
3) The inability to allocate GC roots to registers. <br>
4) Frame sizes are implicitly doubled since both an unboxed and
boxed representation of every object pointer is required. <br>
<br>
Frankly, I think that (3) is likely solvable, but that (1) and (4)
could be fatal for a high performance implementation. I don't have
hard numbers on that; I'm simply stating my intuition. <br><div class="im">
<br>
<blockquote type="cite">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote">
<div><br>
</div>
<div>The fact that the mutable memory associated with a
gcroot() is allocated on the stack (rather than, say, a
machine register) is an implementation detail; fixing it
doesn't require altering the (conceptual) interface for
LLVM's existing GC support, AFAICT.</div>
</div>
</div>
</div>
</blockquote></div>
While in principal, I agree with you here, in practice the
difference is actually quite significant. I am not convinced that
the implementation would be straight forward. Let's set this aside
for the moment and focus on the more fundamental questions. <br><div class="im">
<blockquote type="cite">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote">
<div> </div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">We
lower gc_relocate to a<br>
pseudo opcode which lowered into nothing after register
allocation.<br>
<br>
The problem is, of course, the performance penalty. Does
it make<br>
sense to get the register allocator "see" the gc_relocate
instruction<br>
as a copy so that they get the same register / slot? Will
that<br>
violate the intended semantics of gc_relocate (for
example, after<br>
assigning the same register / slot to %a and %a.relocated,
are there<br>
passes that will try to cache derived pointers across loop<br>
iterations)?<br>
<br>
Thanks,<br>
-- Sanjoy<br>
<div>
<div><br>
<br>
_______________________________________________<br>
LLVM Developers mailing list<br>
<a href="mailto:LLVMdev@cs.uiuc.edu" target="_blank">LLVMdev@cs.uiuc.edu</a>
<a href="http://llvm.cs.uiuc.edu" target="_blank">http://llvm.cs.uiuc.edu</a><br>
<a href="http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev" target="_blank">http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev</a><br>
</div>
</div>
</blockquote>
</div>
<br>
</div>
</div>
<br>
<fieldset></fieldset>
<br>
<pre>_______________________________________________
LLVM Developers mailing list
<a href="mailto:LLVMdev@cs.uiuc.edu" target="_blank">LLVMdev@cs.uiuc.edu</a> <a href="http://llvm.cs.uiuc.edu" target="_blank">http://llvm.cs.uiuc.edu</a>
<a href="http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev" target="_blank">http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev</a>
</pre>
</blockquote>
<br>
</div></div>
</blockquote></div><br></div></div>