<html>

  <head>

    <meta content="text/html; charset=utf-8" http-equiv="Content-Type">

  </head>

  <body bgcolor="#FFFFFF" text="#000000">

    <br>

    <div class="moz-cite-prefix">On 12/09/2014 03:12 AM, Ben Karel

      wrote:<br>

    </div>

    <blockquote

cite="mid:CABQwL+EFAw=iVR+EV8=d9Ppz+FGMY1vkmE-m844ZNwMdNAgSQQ@mail.gmail.com"

      type="cite">

      <div dir="ltr">Hello Philip,

        <div><br>

        </div>

        <div>I am an active user of LLVM's GC infrastructure. Here are

          some notes on how I currently use gcroot():</div>

      </div>

    </blockquote>

    Wow, thank you for sharing!  Your usage is far beyond anything else

    I've heard of.  Can I ask which project/language this is?  Or is

    that proprietary?  <br>

    <blockquote

cite="mid:CABQwL+EFAw=iVR+EV8=d9Ppz+FGMY1vkmE-m844ZNwMdNAgSQQ@mail.gmail.com"

      type="cite">

      <div dir="ltr">

        <div><br>

        </div>

        <div>1) I use several IRs. A high-level (CPS/SSA hybrid) IR is

          the target of inlining and other such high-level optimization;

          most allocation is implicit in the high level IR. Closure

          conversion/lambda-lifting produces a variant of that IR,

          extended with explicit allocation constructs. A separate pass

          introduces the allocations themselves.</div>

        <div>2) Next, a dataflow-based root insertion pass inserts the

          minimal(ish) set of roots and root reloads to preserve the

          invariant that every live GCable pointer is in a root at each

          potential GC point. The root insertion pass also computes

          liveness and uses it to minimize the set of roots via slot

          coloring. The result of root insertion is (very close to) LLVM

          IR.</div>

      </div>

    </blockquote>

    I would be really interested in hearing more about your

    implementation for this part.  We need to solve a similar problem in

    the statepoint lowering code.  What we have to date is a simply

    greedy scheme that works reasonable well in practice, but leaves

    lots of room for improvement.  <br>

    <blockquote

cite="mid:CABQwL+EFAw=iVR+EV8=d9Ppz+FGMY1vkmE-m844ZNwMdNAgSQQ@mail.gmail.com"

      type="cite">

      <div dir="ltr">

        <div>3) Potential GC points are determined inter-procedurally,

          and call sites which are statically known to (transitively)

          not GC do not break GCable pointer live ranges.</div>

      </div>

    </blockquote>

    I was planning something like this for LLVM at some point.  It's

    relatively low on my priority list since IPO tends to be of minimal

    interest when JITing which is my primary use case.  <br>

    <blockquote

cite="mid:CABQwL+EFAw=iVR+EV8=d9Ppz+FGMY1vkmE-m844ZNwMdNAgSQQ@mail.gmail.com"

      type="cite">

      <div dir="ltr">

        <div>4) I use a lightly-modified variant of the OCaml plugin's

          stackmap format.</div>

        <div>5) I don't currently use load or store barriers, but do

          eventually plan to.</div>

        <div>6) I do support GCing through global variables.</div>

        <div>7) I wrote a custom LLVM pass to verify that values loaded

          from GC roots aren't used across GC points.</div>

      </div>

    </blockquote>

    Cool.  We have a similar one for statepoints.  (If you can share,

    getting both into the common tree would be nice.)<br>

    <blockquote

cite="mid:CABQwL+EFAw=iVR+EV8=d9Ppz+FGMY1vkmE-m844ZNwMdNAgSQQ@mail.gmail.com"

      type="cite">

      <div dir="ltr">

        <div><br>

        </div>

        <div>When I first started using gcroot(), I used the naïve

          approach of reloading roots after every call, and encountered

          significant overhead. The optimizations sketched above

          significantly reduced that overhead.  Unfortunately, I don't

          know of any meaningful language-independent benchmark suites

          for measuring that sort of overhead, and of course the

          overhead on any given program will strongly depend on details

          of how that program is written...</div>

        <div><br>

        </div>

        <div>Also, FWIW when I first started, I got up and running with

          the shadow stack, just as Gordon described, before

          transitioning to the "real" infrastructure. As long as it

          doesn't carry a significant burden on the implementation side,

          I think it's worth having, because it significantly improves

          the learning curve for new users.</div>

      </div>

    </blockquote>

    I'm planning on retaining the shadow stack mechanism.  <br>

    <blockquote

cite="mid:CABQwL+EFAw=iVR+EV8=d9Ppz+FGMY1vkmE-m844ZNwMdNAgSQQ@mail.gmail.com"

      type="cite">

      <div dir="ltr">

        <div><br>

        </div>

        <div>OK, direct answers to some of your questions:</div>

        <div><br>

        </div>

        <div>* I think custom stack map formats have small but non-zero

          value. There have been a few papers (most in the early 90's, I

          think) which showed that stack maps can make up a non-trivial

          fraction of a binary's size, and thus are increasingly

          desirable to optimize as programs grow larger. My verdict: if

          support for custom formats are ever actively impeding forward

          progress, toss 'em; otherwise, there should (eventually) be a

          more detailed look at the costs and benefits.</div>

      </div>

    </blockquote>

    My preferred usage model would be: LLVM generates standard format,

    runtime parses and saves in custom binary format.<br>

    <br>

    Having said that, retaining the capability doesn't seem to involve

    too much complexity.  I see no reason to kill it since multiple

    folks seem to want it.  <br>

    <blockquote

cite="mid:CABQwL+EFAw=iVR+EV8=d9Ppz+FGMY1vkmE-m844ZNwMdNAgSQQ@mail.gmail.com"

      type="cite">

      <div dir="ltr">

        <div><br>

        </div>

        <div>* As above, I think the primary benefit of

          one-stackmap-per-safepoint is saving space at the cost of time

          (and root traffic). AFAIK the primary virtue of the

          one-stack-slot-per-gcroot() implementation is implementation

          simplicity, nothing more.</div>

        <div><br>

        </div>

        <div>* The ultimate strength & weakness of gcroot() is that

          basically everything is left to the frontend. There's very

          little unavoidable overhead imposed on a frontend that is

          willing to go the distance to generate non-naïve code -- any

          knowledge that the frontend has can be used to generate better

          code. Unfortunately, this also places a rather heavy burden on

          the frontend to generate valid & efficient code.</div>

        <div><br>

        </div>

        <div><br>

        </div>

        <div>Since I haven't had the chance to look beyond mailing list

          & blog posts on statepoints, I can't comment much on their

          tradeoffs vs gcroots. The high-level impression I get is that

          statepoints will allow a less-sophisticated frontend to get

          better results than naïve usage of gcroot(). It also looks

          like statepoints will do a better job of steering frontends

          away from the pitfalls of using GC-invalidated pointer

          values.  I have no idea whether there will be any codegen

          benefit for more sophisticated frontends, but I'll be happy to

          port my implementation to statepoints once they've gotten a

          chance to settle down somewhat, and provide more detailed

          feedback to help determine what to eventually do with

          gcroot(). I can currently only do ad-hoc benchmarking, but

          hopefully by the time gcroot() is on the chopping block, I'll

          have a more extensive automated performance regression test

          suite ;-)</div>

      </div>

    </blockquote>

    I'll be very interested in your results.  I'm quite sure you'll find

    stability and performance bugs.  The existing code 'works' but has

    really only been hammered in one particular use case (source

    language, virtual machine).  Every new consumer will help to make

    the code more robust.  Let me know when you're ready to start

    prototyping this.  I'm happy to answer questions and help guide you

    around any bugs you might find.  <br>

    <br>

    I suspect from what you've said above that we might want to extend

    the mechanism slightly to allow you to retain your preassigned stack

    slots.  You may be getting better spilling code than the current

    implementation would give you by default.  This seems like it might

    be a generally useful mechanism.  We could either use a call

    attribute on the gc.relocate, a bit of metadata, or possible an

    optional third argument that specifies an 'abstract slot'.  It would

    depend on your initial results and how much we've managed to improve

    the lowering code by then.  :)<br>

    <blockquote

cite="mid:CABQwL+EFAw=iVR+EV8=d9Ppz+FGMY1vkmE-m844ZNwMdNAgSQQ@mail.gmail.com"

      type="cite">

      <div dir="ltr">

        <div><br>

        </div>

        <div><br>

        </div>

        <div>One quick question: I think I understand why address spaces

          are needed to distinguish GCable pointers for late safepoint

          placement, but are they also needed if the frontend is

          inserting all safepoints itself?</div>

      </div>

    </blockquote>

    Nope.  They might allow some additional sanity checks, but nothing

    that is currently checked in relies on address spaces in any way. 

    My plan is to make the pointer distinction mechanism (addrspace,

    gcroot, vs custom) a property of the GCStrategy with easily control

    extension points.  <br>

    <blockquote

cite="mid:CABQwL+EFAw=iVR+EV8=d9Ppz+FGMY1vkmE-m844ZNwMdNAgSQQ@mail.gmail.com"

      type="cite">

      <div dir="ltr">

        <div><br>

        </div>

        <div><br>

        </div>

        <div>Finally, thank you for taking on the burden of improving

          LLVM's GC functionality! <br>

        </div>

      </div>

    </blockquote>

    Thank you for sharing your experience!<br>

    <blockquote

cite="mid:CABQwL+EFAw=iVR+EV8=d9Ppz+FGMY1vkmE-m844ZNwMdNAgSQQ@mail.gmail.com"

      type="cite">

      <div dir="ltr">

        <div><br>

        </div>

      </div>

      <div class="gmail_extra"><br>

        <div class="gmail_quote">On Thu, Dec 4, 2014 at 8:50 PM, Philip

          Reames <span dir="ltr"><<a moz-do-not-send="true"

              href="mailto:listmail@philipreames.com" target="_blank">listmail@philipreames.com</a>></span>

          wrote:<br>

          <blockquote class="gmail_quote" style="margin:0 0 0

            .8ex;border-left:1px #ccc solid;padding-left:1ex">Now that

            the statepoint changes have landed, I wanted to start a

            discussion about what's next for GC support in LLVM.  I'm

            going to sketch out a strawman proposal, but I'm not set on

            any of this.  I mostly just want to draw interested parties

            out of the woodwork.  :)<br>

            <br>

            Overall Direction:<br>

            In the short term, my intent is to preserve the

            functionality of the existing code, but migrate towards a

            position where the gcroot specific pieces are optional and

            well separated.  I also plan to start updating the

            documentation to reflect a separation between the general

            support for garbage collection (function attributes,

            identifying references, load and store barrier lowering,

            generating stack maps) and the implementation choices

            (gcroot & it's lowering vs statepoints & addr spaces

            for identifying references).<br>

            <br>

            Longer term, I plan to *EVENTUALLY DELETE* the existing

            gcroot lowering code and in tree GCStrategies unless an

            interesting party speaks up.  I have no problem with

            retaining some of the existing pieces for legacy support or

            helping users to migrate, but as of right now, I don't know

            of any such active users.  The only exception to this might

            be the shadow stack GC.  Eventually in this context is at

            least six months from now, but likely less than 18 months. 

            Hopefully, that's vague enough.  :)<br>

            <br>

            HELP - If anyone knows which Ocaml implementation and which

            Erlang implementation triggered the in tree GC strategies,

            please let me know!<br>

            <br>

            <br>

            Near Term Changes:<br>

            - Migrate ownership of GCStrategy objects from GCModuleInfo

            to LLVMContext.  In theory, this looses the ability for two

            different Modules to have the same collector with different

            state, but I know of no use case for this.<br>

            - Modify the primary Function::getGC/setGC interface to

            return a reference the GCStrategy object, not a string.  I

            will provide a Function::setGCString and getGCString.<br>

            - Extend the GCStrategy class to include a notion of which

            compilation strategy is being used.  The two choices right

            now will be Legacy and Statepoint.  (Longer term, this will

            likely become a more fine grained choice.)<br>

            - Separate GCStategy and related pieces from the

            GCFunctionInfo/GCModuleInfo/GCMetadataPrinter lowering

            code.  At first, this will simply mean clarifying

            documentation and rearranging code a bit.<br>

            - Document/clarify the callbacks used to customize the

            lowering. Decide which of these make sense to preserve and

            document.<br>

            <br>

            (Lest anyone get the wrong idea, the above changes are

            intended to be minor cleanup.  I'm not looking to do

            anything controversial yet.)<br>

            <br>

            Questions:<br>

            - Is proving the ability to generate a custom binary stack

            map format a valuable feature?  Adapting the new statepoint

            infrastructure to work with the existing GCMetadataPrinter

            classes wouldn't be particularly hard.<br>

            - Are there any GCs out there that need gcroot's single

            stack slot per value implementation?   By default,

            statepoints may generate a different stackmap for every

            safepoint in a function.<br>

            - Is using gcroot and allocas to mark pointers as garbage

            collected references valuable?  (As opposed to using address

            spaces on the SSA values themselves?)  Long term, should we

            retain the gcroot marker intrinsics at all?<br>

            <br>

            <br>

            Philip<br>

            <br>

            Appendix: The Current Implementations Key Classes:<br>

            <br>

            GCStrategy - Provides a configurable description of the

            collector. The strategy can also override parts of the

            default GC root lowering strategy.  The concept of such a

            collector description is very valuable, but the current

            implementation could use some cleanup.  In particular, the

            custom lowering hooks are a bit of a mess.<br>

            <br>

            GCMetadataPrinter - Provides a means to dump a custom binary

            format describing each functions safepoints.  All safepoints

            in a function must share a single root Value to stack slot

            mapping.<br>

            <br>

            GCModuleInfo/GCFunctionInfo - These contain the metadata

            which is saved to enable GCMetadataPrinter.<br>

            <br>

            _______________________________________________<br>

            LLVM Developers mailing list<br>

            <a moz-do-not-send="true" href="mailto:LLVMdev@cs.uiuc.edu"

              target="_blank">LLVMdev@cs.uiuc.edu</a>         <a

              moz-do-not-send="true" href="http://llvm.cs.uiuc.edu"

              target="_blank">http://llvm.cs.uiuc.edu</a><br>

            <a moz-do-not-send="true"

              href="http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev"

              target="_blank">http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev</a><br>

          </blockquote>

        </div>

        <br>

      </div>

    </blockquote>

    <br>

  </body>

</html>