<html>
<head>
<meta content="text/html; charset=windows-1252"
http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
<br>
<div class="moz-cite-prefix">On 10/16/2014 02:51 PM, Philip Reames
wrote:<br>
</div>
<blockquote cite="mid:54403DFE.5010807@philipreames.com" type="cite">
<br>
On 10/15/2014 02:52 PM, Philip Reames wrote:
<br>
<blockquote type="cite">Kevin,
<br>
<br>
Let me try to answer the point you're getting at. In doing so,
I want to explicitly separate the statepoint intrinsics which
are currently up for review, and the future late safepoint
placement. The statepoint intrinsics have value separate from
the late safepoint placement approach, and I want to justify
them on their own merits.
<br>
<br>
The basic problem we're trying to solve with these intrinsics is
supporting fully relocating collectors. By definition, such a
collector needs to be precise w.r.t. root tracking. Even worse,
we need to ensure that *all copies* of a pointer are updated.
It is not acceptable to make two copies of a pointer, update one
of them, then use the other for a memory access.
<br>
<br>
If the compiler is allowed to introduce derived pointers (i.e.
pointer valued temporaries created by the compiler which point
somewhere within an object, or outside it, but associated with
it), we also need to track which *object* each *pointer* to be
updated is associated with. This is required to safely update
the pointers.
<br>
<br>
For the sake of argument, let's say our frontend does safepoint
insertion.
<br>
<br>
There are a couple of approaches which seem like they might work;
let's explore each in turn:
<br>
- We could use patchpoints to record all the values needed for
the GC stack map. This mostly works, but requires that the
patchpoint not be marked readonly or readnone (to prevent
illegal reorderings). That could be a usage convention. The
real problem is that the compiler is still free to introduce
multiple *copies* of an SSA value over the patchpoint. (This is
completely legal under SSA semantics.) When it does so, it
creates a situation where the gc could fail to update a pointer
which will then be dereferenced. That's a bug. Worth stating
explicitly, I believe the patchpoint scheme would be sufficient
*if you do not ever relocate a root*.
<br>
- We could use the gc.root. gc.root defines the allocs, but
does not define the call format, or any of the mechanisms to
ensure proper relocation. As such, it *by itself* is not
viable. Also, gc.root inherently assumes every value will have
a stack slot. Without *heavy* reengineering, there's no way to
have a gc pointer in a callee saved register over a call site.
This is an unfortunate limitation. Any call representation
without explicit relocation suffers from the same bug as the
patchpoint scheme.
<br>
- We could combine gc.root allocas and patchpoints. This
essentially combines the flaws (no gc pointers in callee saved
registers over calls, and missed copies), with no benefit.
<br>
<br>
The statepoint intrinsics are basically the patchpoint option
above, but with relocation made explicit in the IR. While it's
still legal for the optimizer to create a copy of the value
feeding a statepoint, that's now okay. By construction, there
can be no use of the original SSA value (and thus the copy)
after the statepoint. Instead, the explicitly relocated value is
used.
<br>
<br>
To summarize: We need (something like) statepoints for
correctness of fully relocating collectors.
<br>
<br>
(The points I'm making here are somewhat subtle. If it would
help to have IR examples here, ask. I'm deferring writing them
because it's time consuming.)
<br>
</blockquote>
I need to withdraw this part of my comments. After further
reflection and discussion offline, I was reminded that you can
implement full relocation semantics with gcroot. The part about
patchpoints stands, but the gcroot comments are inaccurate.
<br>
<br>
I need to leave early today, but I plan to respond tomorrow with a
more complete analysis of the tradeoffs between gcroots and
statepoints. Sorry for the confusion.
<br>
</blockquote>
Ok, let's take a second try at explaining the differences between
statepoints and gc.roots. I managed to get myself confused last
time and made a couple of statements which were inaccurate. As a
reminder, this is not talking about late safepoint placement at
all. LSP can work with either mechanism. <br>
<br>
From a functional correctness standpoint, gc.root and statepoint are
equivalent. They can both support relocating collectors, including
those which relocate roots. To prevent future confusion, let me
review how each works. <br>
<br>
gc.root uses explicit spill slots in the IR in the form of allocas.
Each alloca escapes (through the gcroot call itself); as a result,
the compiler must assume that any readwrite call can both consume
and update the values in question. Additionally, the fact that all
calls are readwrite prevents reordering of unrelated loads past the
call. gcroot relies on the fact that no SSA value relocated at a
call site is used at a site reachable from the call. Instead, a new
SSA value (whose relation to the original is unknown by the
compiler) is introduced by loading from the (potentially clobbered)
alloca. gcroot creates a single stack map table for the entire
function. It is the compiled code's responsibility to ensure that
all values in the allocas are either valid live pointers or null. <br>
<br>
Statepoints use most of the same techniques. We rely on not having
an SSA value used on both sides of a call, but we manage the
relocation via explicit IR relocation operations, not loads and
stores. We require the call to be read/write to prevent reordering
of unrelated loads. Since the spill slots are not visible in the
IR, we do not need the reasoning about escapes that gc.root does. <br>
<br>
To explicitly state this again since I screwed this up once before,
both statepoints and gc.roots can correctly represent relocation
semantics in the IR. In fact, the underlying reasoning about their
correctness is rather similar. <br>
<br>
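To make the shared reasoning concrete, here is a minimal sketch in Python of a toy moving collector. It is not LLVM code: the Heap class, the helper names, and the "add 1000 to every address" relocation rule are all assumptions made up for illustration. It shows the gc.root style (reload from a slot the collector rewrote), the statepoint style (use an explicitly relocated value), and why a stale pre-safepoint copy is invalid in both. <br>
<pre>
```python
# Toy model only: a "heap" maps addresses to payloads, and a moving
# collector changes every address. All names here are hypothetical.

class Heap:
    def __init__(self):
        self.objects = {}      # address to payload
        self.next_addr = 100

    def alloc(self, payload):
        addr = self.next_addr
        self.next_addr += 8
        self.objects[addr] = payload
        return addr

    def collect(self, roots):
        """Move every rooted object; return a map from old to new address."""
        forwarding = {}
        moved = {}
        for addr in roots:
            if addr is None or addr in forwarding:
                continue
            new_addr = addr + 1000
            forwarding[addr] = new_addr
            moved[new_addr] = self.objects[addr]
        self.objects = moved   # old addresses are now invalid
        return forwarding


def gcroot_style(heap):
    # Explicit spill slots (the "allocas"). The collector rewrites them in
    # place, and the code reloads from the slot after the call.
    slots = [heap.alloc("obj")]
    forwarding = heap.collect(slots)
    slots[:] = [forwarding.get(a) for a in slots]  # collector clobbers slots
    reloaded = slots[0]                            # new value via a load
    return heap.objects[reloaded]


def statepoint_style(heap):
    # No visible slots: the safepoint hands back explicitly relocated values.
    p = heap.alloc("obj")
    forwarding = heap.collect([p])
    p_relocated = forwarding[p]                    # models a gc.relocate
    return heap.objects[p_relocated]


def stale_copy_is_invalid(heap):
    # Using the pre-safepoint value after the call is the bug that both
    # schemes are constructed to rule out.
    p = heap.alloc("obj")
    heap.collect([p])
    return p not in heap.objects
```
</pre>
In this sketch, gcroot_style(Heap()) and statepoint_style(Heap()) both reach the moved object, while stale_copy_is_invalid(Heap()) confirms that the original address no longer names anything. <br>
<br>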
They do differ fairly substantially in the details though. Let's
consider a few examples.<br>
<br>
<b>SSA vs Memory</b> - gcroot encodes relocations as memory
operations (stores, clobbering calls, loads) where statepoint uses
first class SSA values. We believe this makes optimizations more
straightforward.<br>
<br>
Consider a simple optimization for null pointer relocation. If the
optimizer manages to establish that one of the values being relocated
is null, propagating this across a statepoint is straightforward.
(For each gc.relocate, if source is null, replaceAllUsesWith null.)
Implementing this same optimization for gc.root is harder since the
store and load may have been reordered from immediately around the
call. This isn't an unsolvable problem by any means, but it would
be a GVN change, not an InstCombine one. In practice, we believe
InstCombine style optimizations to be advantageous since they're
simpler to write and debug. Arguably, they're also more powerful
given the current pipeline since they have multiple opportunities to
trigger.<br>
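<br>
As a rough illustration of why the statepoint form of this is a local rewrite, here is a minimal Python sketch over a made-up instruction model. The Inst class and the assumption that operand 0 of a "gc.relocate" is the pointer being relocated are simplifications for this example, not the actual intrinsic signature. <br>
<pre>
```python
class Inst:
    """A hypothetical, minimal IR instruction: an opcode plus operands."""
    def __init__(self, opcode, operands=()):
        self.opcode = opcode
        self.operands = list(operands)


NULL = Inst("null")  # stand-in for the null pointer constant


def replace_all_uses_with(insts, old, new):
    """Rewrite every operand that refers to `old` so it refers to `new`."""
    for inst in insts:
        inst.operands = [new if op is old else op for op in inst.operands]


def fold_null_relocates(insts):
    """Drop each gc.relocate whose source is null and forward null to its users.

    Assumption: operand 0 of a gc.relocate is the value being relocated.
    """
    kept = []
    for inst in insts:
        if inst.opcode == "gc.relocate" and inst.operands[0] is NULL:
            replace_all_uses_with(insts, inst, NULL)
        else:
            kept.append(inst)
    return kept
```
</pre>
The rewrite only ever looks at one relocate and its users, which is what makes it an InstCombine style transform rather than one that has to reason about stores and loads around the call. <br>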
<b><br>
</b><b>Derived Pointers</b> - gcroot can represent derived pointers,
but only via convention. There is no convention specified, so it's
up to the frontend to create its own. Statepoints define a
convention (explicitly in the relocation operation) which makes
describing optimizations straightforward.<br>
<br>
One thing we plan to do with the statepoint representation is to
implement an "easily derived pointer" optimization (to run near
CodeGenPrep). On X86, it's far cheaper to recreate a GEP base + 5
derived pointer than to relocate it. Recognizing this case is quite
straightforward given the statepoint representation.<br>
<br>
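A minimal sketch of the idea, in Python and under stated assumptions: live pointers are described by a hypothetical table mapping each name to its base and a constant offset (None when the offset is not a known constant), and addresses are plain integers. None of these names come from the patch. <br>
<pre>
```python
def plan_relocations(live_pointers):
    """Split live pointers into ones to relocate and ones to recompute.

    live_pointers maps a name to (base_name, constant_offset). A base
    pointer names itself with offset 0. An offset of None models a
    derived pointer whose offset is not a known constant.
    """
    relocate = []
    rematerialize = {}
    for name, (base, offset) in live_pointers.items():
        if name != base and offset is not None and base in live_pointers:
            rematerialize[name] = (base, offset)   # cheap: base + constant
        else:
            relocate.append(name)                  # must be relocated
    return relocate, rematerialize


def apply_plan(new_addresses, rematerialize):
    """Recompute each cheap derived pointer from its relocated base."""
    result = dict(new_addresses)
    for name, (base, offset) in rematerialize.items():
        result[name] = new_addresses[base] + offset
    return result
```
</pre>
For example, with live pointers obj, field = obj + 5, and elem (derived from obj at an unknown offset), only obj and elem would be relocated, and field would be recomputed from the new address of obj. <br>
<br>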
A frontend could implement a similar optimization for gcroot at IR
generation time. You could also implement such an optimization over
the load/call/store representation, but the implementation would be
much more complex (analogous to the null optimization above). <br>
<br>
To be fair, gc.root may need such an optimization less. Since
call-safepoints are inserted early, CSE has not yet run. As a
result, there may be fewer "easily derived pointers" live across a
call. <br>
<br>
<b>Format</b> - Statepoints use a standard format. gc.root supports
custom formats. Either could be extended to support the other
without much difficulty. <br>
<br>
The more material difference between the two is that gc.root
generates a single stack map for the entire function while
statepoints generate a unique stack map per call site. Having a
single stack map imposes a slight penalty on code compiled with
gc.root since dead values must explicitly be removed from the alloca
(by a write of null). In the wrong situation (say a tight loop with
two calls), this could be material. <br>
<br>
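To illustrate the difference, here is a small Python sketch. It assumes liveness is already known and is given as a hypothetical table from call site name to the set of pointers live across that call; the function names are made up for this example. <br>
<pre>
```python
def per_call_site_maps(live_at_call):
    """Statepoint style: each call site records exactly what is live there."""
    return {site: sorted(live) for site, live in live_at_call.items()}


def function_wide_map(live_at_call):
    """gc.root style: one table of slots covers every call site."""
    slots = set()
    for live in live_at_call.values():
        slots |= set(live)
    return sorted(slots)


def slots_required_null(live_at_call):
    """Slots that hold no live value at each call and so must contain null.

    With a single function-wide table the collector scans every slot at
    every call, so a slot whose value is dead must have been nulled.
    """
    slots = function_wide_map(live_at_call)
    return {site: [s for s in slots if s not in live]
            for site, live in live_at_call.items()}
```
</pre>
With two calls in a loop where a is live only across the first and b only across the second, the per call site maps list one pointer each, while the function-wide table has two slots and each call needs the other slot to hold null, which is where the extra stores come from. <br>
<br>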
<b>Lowering </b>- Currently, both gc.root and statepoint lower to
stack slots. gc.root does this at the IR level, statepoints do so
in SelectionDAG. <br>
<br>
The design of statepoints is intended to allow pushing the explicit
relocations back through the backend. The reason this is desirable
is that pointers can be left in callee saved registers over call
sites. Without substantial re-engineering, such a thing is not
possible for gc.root. The importance of this from a performance
perspective is debatable. It is my belief that the key benefit
would be in a) reducing frame sizes (by not requiring spill slots),
and b) avoiding spills around calls.<br>
<br>
An advantage of gc.root is that the backend can remain largely
ignorant of the gc.root mechanism. By the point the backend
encounters them, a gc.root is just another alloca. One potential
problem with the current implementation is that the escape is lost
when lowering; the gcroot call is lowered to an entry into a side
table and the alloca no longer escapes. This is a source of
possible bugs, but is also a straightforward fix. <br>
<br>
As to the lowering currently implemented, it's debatable which is
better. The statepoint lowering optimizes constants and unifies values
based on SDValue. As a result, two IR level values of different types (with
the same bit pattern) can end up sharing the same stack slot.
However, it suffers when trying to assign stack slots. We currently
use heuristics, but you can end up with ugly shuffling of values
around on the stack across basic blocks. (There are a number of ways
to improve that, but none are implemented yet.) gc.root doesn't
suffer from this problem since stack slots are assigned by the
frontend. <br>
<br>
Since the stack spills and reloads are visible at the IR layer,
gcroot gets the full ability of the optimizer to remove redundant
reloads. Statepoints only get to leverage the pieces in the
backend. In theory, this could result in materially worse
spill/reload code for statepoints. In practice, this appears not to
matter much provided the same value is assigned to the same slot
across both calls, but I don't actually have much data here to say
anything conclusively yet.<br>
<br>
I haven't tried to measure frame size for gc.root vs statepoints. I
suspect that statepoints may come out slightly ahead, but I doubt
this is material. There are also cases (see "easily derived
pointers" above), where gc.root may come out ahead. <br>
<br>
<b>IR Level Optimization</b> - Both gc.root and statepoints cripple
optimization (by design!). gcroot works better with inlining today,
but statepoints could be easily enhanced to handle this case. (The
same work would benefit symbolic patchpoints.) <br>
<br>
It is my belief that statepoints are easier to optimize (e.g. by teaching
LICM about them), but this is purely my guess with no real evidence. Both
suffer from the fact that calls must be marked readwrite. Not
having to reason about memory seems easier, but I'm open to other
arguments here. <br>
<br>
<b>Community Support</b><b> &amp; Compatibility</b><br>
From a practical perspective, statepoints have active users behind
them. We are interested in continuing to enhance and optimize them
in the public tree. The same support does not seem to exist for
gcroot. <br>
<br>
The implementation of statepoints is largely aligned with that of
patchpoints. The implementation of gcroot is completely separate
and poorly understood by the majority of the community.<br>
<br>
It wouldn't be hard to write a translation pass from gcroot to
statepoints or from statepoints to gcroot. If folks are concerned
about compatibility, this would be a reasonable option. The largest
challenge to transparently replacing one with the other is in
generating the right output format. <br>
<b><br>
</b><b>Summary</b><br>
To summarize, gcroot and statepoints are functionally equivalent
(modulo possible bugs). In their current form, the two are largely
comparable with each having some benefits. Long term, we believe a
statepoint representation will allow better code generation and IR
level optimization of code with safepoints inserted. We believe
statepoints to be easier to optimize both at the IR level and
backend. <br>
<br>
Again, the late safepoint proposal is independent and could be done
with either representation. It's currently implemented on
statepoints, but it could be extended to gcroot without too much
work. <br>
<blockquote cite="mid:54403DFE.5010807@philipreames.com" type="cite">
<blockquote type="cite">
<br>
<br>
Other advantages of the statepoint approach:
<br>
<br>
The gc.relocate intrinsics (part of the statepoint proposal)
also make it explicit in the IR what the base object of each
pointer to be relocated is. This isn't *required* (you could
encode the same information in the arguments of the statepoint),
but making it explicit is much cleaner.
<br>
<br>
The explicit relocation notation has the potential to be
extended into the backend. With some register allocator
changes (not part of this patch!), we could support gc pointers
in callee saved registers. This is possible with the
(incorrect) patchpoint scheme. It is possible, but *hard*, with
the gc.root scheme.
<br>
<br>
The posted patch includes a couple of small optimizations (i.e.
null forwarding) that help performance, but could (probably) be
implemented on top of another scheme. We have a number of
planned optimizations on the statepoint mechanism.
<br>
<br>
<br>
Now, let me finally bring up late safepoint placement. The only
real impact on this patch is that, to date, we have only focused
on the *correctness* of a statepoint passing through the
optimizer. We have not attempted to teach the optimizer about
how to leverage one or perform optimizations over one. There's
room for improvement here (i.e. not completely blocking
inlining), but we prefer to approach this problem by simply
inserting them late. You could instead choose to insert them
at generation time, and teach the optimizer about their
semantics. That *strategy choice* is independent of the
representation chosen provided that representation is
*correct*.
<br>
<br>
Yours,
<br>
Philip
<br>
<br>
On 10/14/2014 07:01 PM, Kevin Modzelewski wrote:
<br>
<blockquote type="cite">I think a change like this might be more
compelling if you could give more detail on how it would
actually help (I can't find the detail I'm looking for in your
blog posts). It seems like the value of this patch is that it
will work with late safepoint placement, but it'd be nice to
see some examples of cases where late safepoint placement
gives you something that early safepoint placement (ie by the
frontend) doesn't. It kind of feels like either approach will
work well with only non-gc values, and neither approach will
be able to do much optimization when you do function calls.
I'm not trying to claim that that's necessarily true, but it'd
be easier to understand your point if there was some example
IR.
<br>
<br>
<a class="moz-txt-link-freetext" href="http://reviews.llvm.org/D5683">http://reviews.llvm.org/D5683</a>
<br>
<br>
<br>
</blockquote>
<br>
</blockquote>
<br>
</blockquote>
<br>
</body>
</html>