<html>
<head>
<meta content="text/html; charset=utf-8" http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
<br>
<div class="moz-cite-prefix">On 12/09/2014 03:12 AM, Ben Karel
wrote:<br>
</div>
<blockquote
cite="mid:CABQwL+EFAw=iVR+EV8=d9Ppz+FGMY1vkmE-m844ZNwMdNAgSQQ@mail.gmail.com"
type="cite">
<div dir="ltr">Hello Philip,
<div><br>
</div>
<div>I am an active user of LLVM's GC infrastructure. Here are
some notes on how I currently use gcroot():</div>
</div>
</blockquote>
Wow, thank you for sharing! Your usage is far beyond anything else
I've heard of. Can I ask which project/language this is? Or is
that proprietary? <br>
<blockquote
cite="mid:CABQwL+EFAw=iVR+EV8=d9Ppz+FGMY1vkmE-m844ZNwMdNAgSQQ@mail.gmail.com"
type="cite">
<div dir="ltr">
<div><br>
</div>
<div>1) I use several IRs. A high-level (CPS/SSA hybrid) IR is
the target of inlining and other such high-level optimization;
most allocation is implicit in the high level IR. Closure
conversion/lambda-lifting produces a variant of that IR,
extended with explicit allocation constructs. A separate pass
introduces the allocations themselves.</div>
<div>2) Next, a dataflow-based root insertion pass inserts the
minimal(ish) set of roots and root reloads to preserve the
invariant that every live GCable pointer is in a root at each
potential GC point. The root insertion pass also computes
liveness and uses it to minimize the set of roots via slot
coloring. The result of root insertion is (very close to) LLVM
IR.</div>
</div>
</blockquote>
I would be really interested in hearing more about your
implementation for this part. We need to solve a similar problem in
the statepoint lowering code. What we have to date is a simply
greedy scheme that works reasonable well in practice, but leaves
lots of room for improvement. <br>
<blockquote
cite="mid:CABQwL+EFAw=iVR+EV8=d9Ppz+FGMY1vkmE-m844ZNwMdNAgSQQ@mail.gmail.com"
type="cite">
<div dir="ltr">
<div>3) Potential GC points are determined inter-procedurally,
and call sites which are statically known to (transitively)
not GC do not break GCable pointer live ranges.</div>
</div>
</blockquote>
I was planning something like this for LLVM at some point. It's
relatively low on my priority list since IPO tends to be of minimal
interest when JITing which is my primary use case. <br>
<blockquote
cite="mid:CABQwL+EFAw=iVR+EV8=d9Ppz+FGMY1vkmE-m844ZNwMdNAgSQQ@mail.gmail.com"
type="cite">
<div dir="ltr">
<div>4) I use a lightly-modified variant of the OCaml plugin's
stackmap format.</div>
<div>5) I don't currently use load or store barriers, but do
eventually plan to.</div>
<div>6) I do support GCing through global variables.</div>
<div>7) I wrote a custom LLVM pass to verify that values loaded
from GC roots aren't used across GC points.</div>
</div>
</blockquote>
Cool. We have a similar one for statepoints. (If you can share,
getting both into the common tree would be nice.)<br>
<blockquote
cite="mid:CABQwL+EFAw=iVR+EV8=d9Ppz+FGMY1vkmE-m844ZNwMdNAgSQQ@mail.gmail.com"
type="cite">
<div dir="ltr">
<div><br>
</div>
<div>When I first started using gcroot(), I used the naïve
approach of reloading roots after every call, and encountered
significant overhead. The optimizations sketched above
significantly reduced that overhead. Unfortunately, I don't
know of any meaningful language-independent benchmark suites
for measuring that sort of overhead, and of course the
overhead on any given program will strongly depend on details
of how that program is written...</div>
<div><br>
</div>
<div>Also, FWIW when I first started, I got up and running with
the shadow stack, just as Gordon described, before
transitioning to the "real" infrastructure. As long as it
doesn't carry a significant burden on the implementation side,
I think it's worth having, because it significantly improves
the learning curve for new users.</div>
</div>
</blockquote>
I'm planning on retaining the shadow stack mechanism. <br>
<blockquote
cite="mid:CABQwL+EFAw=iVR+EV8=d9Ppz+FGMY1vkmE-m844ZNwMdNAgSQQ@mail.gmail.com"
type="cite">
<div dir="ltr">
<div><br>
</div>
<div>OK, direct answers to some of your questions:</div>
<div><br>
</div>
<div>* I think custom stack map formats have small but non-zero
value. There have been a few papers (most in the early 90's, I
think) which showed that stack maps can make up a non-trivial
fraction of a binary's size, and thus are increasingly
desirable to optimize as programs grow larger. My verdict: if
support for custom formats are ever actively impeding forward
progress, toss 'em; otherwise, there should (eventually) be a
more detailed look at the costs and benefits.</div>
</div>
</blockquote>
My preferred usage model would be: LLVM generates standard format,
runtime parses and saves in custom binary format.<br>
<br>
Having said that, retaining the capability doesn't seem to involve
too much complexity. I see no reason to kill it since multiple
folks seem to want it. <br>
<blockquote
cite="mid:CABQwL+EFAw=iVR+EV8=d9Ppz+FGMY1vkmE-m844ZNwMdNAgSQQ@mail.gmail.com"
type="cite">
<div dir="ltr">
<div><br>
</div>
<div>* As above, I think the primary benefit of
one-stackmap-per-safepoint is saving space at the cost of time
(and root traffic). AFAIK the primary virtue of the
one-stack-slot-per-gcroot() implementation is implementation
simplicity, nothing more.</div>
<div><br>
</div>
<div>* The ultimate strength & weakness of gcroot() is that
basically everything is left to the frontend. There's very
little unavoidable overhead imposed on a frontend that is
willing to go the distance to generate non-naïve code -- any
knowledge that the frontend has can be used to generate better
code. Unfortunately, this also places a rather heavy burden on
the frontend to generate valid & efficient code.</div>
<div><br>
</div>
<div><br>
</div>
<div>Since I haven't had the chance to look beyond mailing list
& blog posts on statepoints, I can't comment much on their
tradeoffs vs gcroots. The high-level impression I get is that
statepoints will allow a less-sophisticated frontend to get
better results than naïve usage of gcroot(). It also looks
like statepoints will do a better job of steering frontends
away from the pitfalls of using GC-invalidated pointer
values. I have no idea whether there will be any codegen
benefit for more sophisticated frontends, but I'll be happy to
port my implementation to statepoints once they've gotten a
chance to settle down somewhat, and provide more detailed
feedback to help determine what to eventually do with
gcroot(). I can currently only do ad-hoc benchmarking, but
hopefully by the time gcroot() is on the chopping block, I'll
have a more extensive automated performance regression test
suite ;-)</div>
</div>
</blockquote>
I'll be very interested in your results. I'm quite sure you'll find
stability and performance bugs. The existing code 'works' but has
really only been hammered in one particular use case (source
language, virtual machine). Every new consumer will help to make
the code more robust. Let me know when you're ready to start
prototyping this. I'm happy to answer questions and help guide you
around any bugs you might find. <br>
<br>
I suspect from what you've said above that we might want to extend
the mechanism slightly to allow you to retain your preassigned stack
slots. You may be getting better spilling code than the current
implementation would give you by default. This seems like it might
be a generally useful mechanism. We could either use a call
attribute on the gc.relocate, a bit of metadata, or possible an
optional third argument that specifies an 'abstract slot'. It would
depend on your initial results and how much we've managed to improve
the lowering code by then. :)<br>
<blockquote
cite="mid:CABQwL+EFAw=iVR+EV8=d9Ppz+FGMY1vkmE-m844ZNwMdNAgSQQ@mail.gmail.com"
type="cite">
<div dir="ltr">
<div><br>
</div>
<div><br>
</div>
<div>One quick question: I think I understand why address spaces
are needed to distinguish GCable pointers for late safepoint
placement, but are they also needed if the frontend is
inserting all safepoints itself?</div>
</div>
</blockquote>
Nope. They might allow some additional sanity checks, but nothing
that is currently checked in relies on address spaces in any way.
My plan is to make the pointer distinction mechanism (addrspace,
gcroot, vs custom) a property of the GCStrategy with easily control
extension points. <br>
<blockquote
cite="mid:CABQwL+EFAw=iVR+EV8=d9Ppz+FGMY1vkmE-m844ZNwMdNAgSQQ@mail.gmail.com"
type="cite">
<div dir="ltr">
<div><br>
</div>
<div><br>
</div>
<div>Finally, thank you for taking on the burden of improving
LLVM's GC functionality! <br>
</div>
</div>
</blockquote>
Thank you for sharing your experience!<br>
<blockquote
cite="mid:CABQwL+EFAw=iVR+EV8=d9Ppz+FGMY1vkmE-m844ZNwMdNAgSQQ@mail.gmail.com"
type="cite">
<div dir="ltr">
<div><br>
</div>
</div>
<div class="gmail_extra"><br>
<div class="gmail_quote">On Thu, Dec 4, 2014 at 8:50 PM, Philip
Reames <span dir="ltr"><<a moz-do-not-send="true"
href="mailto:listmail@philipreames.com" target="_blank">listmail@philipreames.com</a>></span>
wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">Now that
the statepoint changes have landed, I wanted to start a
discussion about what's next for GC support in LLVM. I'm
going to sketch out a strawman proposal, but I'm not set on
any of this. I mostly just want to draw interested parties
out of the woodwork. :)<br>
<br>
Overall Direction:<br>
In the short term, my intent is to preserve the
functionality of the existing code, but migrate towards a
position where the gcroot specific pieces are optional and
well separated. I also plan to start updating the
documentation to reflect a separation between the general
support for garbage collection (function attributes,
identifying references, load and store barrier lowering,
generating stack maps) and the implementation choices
(gcroot & it's lowering vs statepoints & addr spaces
for identifying references).<br>
<br>
Longer term, I plan to *EVENTUALLY DELETE* the existing
gcroot lowering code and in tree GCStrategies unless an
interesting party speaks up. I have no problem with
retaining some of the existing pieces for legacy support or
helping users to migrate, but as of right now, I don't know
of any such active users. The only exception to this might
be the shadow stack GC. Eventually in this context is at
least six months from now, but likely less than 18 months.
Hopefully, that's vague enough. :)<br>
<br>
HELP - If anyone knows which Ocaml implementation and which
Erlang implementation triggered the in tree GC strategies,
please let me know!<br>
<br>
<br>
Near Term Changes:<br>
- Migrate ownership of GCStrategy objects from GCModuleInfo
to LLVMContext. In theory, this looses the ability for two
different Modules to have the same collector with different
state, but I know of no use case for this.<br>
- Modify the primary Function::getGC/setGC interface to
return a reference the GCStrategy object, not a string. I
will provide a Function::setGCString and getGCString.<br>
- Extend the GCStrategy class to include a notion of which
compilation strategy is being used. The two choices right
now will be Legacy and Statepoint. (Longer term, this will
likely become a more fine grained choice.)<br>
- Separate GCStategy and related pieces from the
GCFunctionInfo/GCModuleInfo/GCMetadataPrinter lowering
code. At first, this will simply mean clarifying
documentation and rearranging code a bit.<br>
- Document/clarify the callbacks used to customize the
lowering. Decide which of these make sense to preserve and
document.<br>
<br>
(Lest anyone get the wrong idea, the above changes are
intended to be minor cleanup. I'm not looking to do
anything controversial yet.)<br>
<br>
Questions:<br>
- Is proving the ability to generate a custom binary stack
map format a valuable feature? Adapting the new statepoint
infrastructure to work with the existing GCMetadataPrinter
classes wouldn't be particularly hard.<br>
- Are there any GCs out there that need gcroot's single
stack slot per value implementation? By default,
statepoints may generate a different stackmap for every
safepoint in a function.<br>
- Is using gcroot and allocas to mark pointers as garbage
collected references valuable? (As opposed to using address
spaces on the SSA values themselves?) Long term, should we
retain the gcroot marker intrinsics at all?<br>
<br>
<br>
Philip<br>
<br>
Appendix: The Current Implementations Key Classes:<br>
<br>
GCStrategy - Provides a configurable description of the
collector. The strategy can also override parts of the
default GC root lowering strategy. The concept of such a
collector description is very valuable, but the current
implementation could use some cleanup. In particular, the
custom lowering hooks are a bit of a mess.<br>
<br>
GCMetadataPrinter - Provides a means to dump a custom binary
format describing each functions safepoints. All safepoints
in a function must share a single root Value to stack slot
mapping.<br>
<br>
GCModuleInfo/GCFunctionInfo - These contain the metadata
which is saved to enable GCMetadataPrinter.<br>
<br>
_______________________________________________<br>
LLVM Developers mailing list<br>
<a moz-do-not-send="true" href="mailto:LLVMdev@cs.uiuc.edu"
target="_blank">LLVMdev@cs.uiuc.edu</a> <a
moz-do-not-send="true" href="http://llvm.cs.uiuc.edu"
target="_blank">http://llvm.cs.uiuc.edu</a><br>
<a moz-do-not-send="true"
href="http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev"
target="_blank">http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev</a><br>
</blockquote>
</div>
<br>
</div>
</blockquote>
<br>
</body>
</html>