<html>
  <head>
    <meta content="text/html; charset=ISO-8859-1"
      http-equiv="Content-Type">
  </head>
  <body bgcolor="#FFFFFF" text="#000000">
    <div class="moz-cite-prefix">On 10/25/13 9:37 PM, Michael Lewis
      wrote:<br>
    </div>
    <blockquote
cite="mid:CAEm7p3tJdP6dKcL8Q4gFeVS_5U=6Yc5D5uR_kx6-4PCAtN0gzg@mail.gmail.com"
      type="cite">
      <div dir="ltr">I'm also highly interested in relocating-GC support
        from LLVM. Up until now my GC implementation has been
        non-relocating which is obviously kind of a bummer given that it
        inhibits certain classes of memory allocation/deallocation
        tricks.</div>
    </blockquote>
    Out of idle curiosity, which optimizations were you most interested
    in?<br>
    <blockquote
cite="mid:CAEm7p3tJdP6dKcL8Q4gFeVS_5U=6Yc5D5uR_kx6-4PCAtN0gzg@mail.gmail.com"
      type="cite">
      <div dir="ltr">
        <div>I wrote up a bunch of my findings on the implementation of
          my GC here: <a moz-do-not-send="true"
href="https://code.google.com/p/epoch-language/wiki/GarbageCollectionScheme">https://code.google.com/p/epoch-language/wiki/GarbageCollectionScheme</a></div>
      </div>
    </blockquote>
    Thanks for sharing your experiences.  I had actually run across this
    a while ago and read through it.  It's always nice to learn from
    others rather than repeating the same lessons. For example, the
    stack coloring problem you mention was one I'm glad to know I don't
    have to discover myself.  :)<br>
    <blockquote
cite="mid:CAEm7p3tJdP6dKcL8Q4gFeVS_5U=6Yc5D5uR_kx6-4PCAtN0gzg@mail.gmail.com"
      type="cite">
      <div dir="ltr">
        <div><br>
        </div>
        <div><br>
        </div>
        <div>Frankly I haven't had time to get much further than idle
          pondering of how I'd go about implementing relocation, but it
          seems to me like the existing read/write barrier intrinsics
          might be sufficient if abused properly by the front-end and
          lowered carefully. My operating hypothesis has been to stash
          barriers at key points - identified by the front-end itself -
          and possibly elide them during lowering if it's safe to do so.
          If my understanding is correct, it should be possible to lower
          the barriers into exactly the kind of boxing/unboxing
          procedure described in this thread.</div>
        <div><br>
        </div>
        <div>Based on my experimentation so far, which is admittedly
          minimal, I think the overhead of updating relocated pointers
          is actually not as bad as it sounds. It isn't strictly
          necessary to store both a boxed *and* unboxed pointer in every
          frame. In fact, in the current incarnation of gcroot, marking
          an alloca as a gcroot effectively forces a stack spill for
          that alloca; this means that the relocating collector just
          updates the single existing pointer on the stack when it does
          its magic, and you're done. With proper barriers in place,
          and/or careful location of safepoints by the front-end, it's
          not that hard to make sure that a relocated pointer gets
          updated.</div>
      </div>
    </blockquote>
    I'm not concerned about the correctness of such an implementation. 
    I am concerned about the performance.  I think it's time for us to
    do some prototyping on our side to see how things work out in
    practice.  I can't promise to share the detailed results, but I will
    try to share as much as I can.  <br>
    <blockquote
cite="mid:CAEm7p3tJdP6dKcL8Q4gFeVS_5U=6Yc5D5uR_kx6-4PCAtN0gzg@mail.gmail.com"
      type="cite">
      <div dir="ltr">
        <div><br>
        </div>
        <div>The real trick, as near as I can tell, would be updating
          registers that have live roots during a collection. But again
          based on my past investigations this should just be a matter
          of ensuring that post-collection you have a barrier that is
          lowered into a register refresh based on the on-stack
          relocated pointer value.</div>
        <div><br>
        </div>
        <div><br>
        </div>
        <div>One thing I've been meaning to try and do is use this
          barrier-abuse scheme to work around the existing lack of
          support for tracing roots in machine registers, by effectively
          setting up an artificial stack spill just prior to a
          collection, and otherwise operating on registers as much as
          possible instead of gcroot'ed allocas. Again, I've only
          considered this as a thought exercise until now, so I
          apologize if there's some obvious flaw I'm not aware of in my
          reasoning. I haven't actually done any of this stuff yet so
          it's more speculative than anything - just trying to
          creatively engineer around the existing limitations :-)</div>
      </div>
    </blockquote>
    Hey speculation is fine!  That's what I'm doing at the moment.  You
    have to speculate while trying figure out if something is worth
    actually implementing.  :)<br>
    <br>
    This is conceptually close to the scheme I'm considering, though
    different in details.  If you get a chance, let me know how it works
    out.  <br>
    <blockquote
cite="mid:CAEm7p3tJdP6dKcL8Q4gFeVS_5U=6Yc5D5uR_kx6-4PCAtN0gzg@mail.gmail.com"
      type="cite">
      <div dir="ltr">
        <div><br>
        </div>
        <div><br>
        </div>
        <div><br>
        </div>
        <div> - Mike</div>
        <div><br>
        </div>
        <div><br>
        </div>
        <div class="gmail_extra"><br>
          <br>
          <div class="gmail_quote">On Fri, Oct 25, 2013 at 5:35 PM,
            Philip Reames <span dir="ltr"><<a moz-do-not-send="true"
                href="mailto:listmail@philipreames.com" target="_blank">listmail@philipreames.com</a>></span>
            wrote:<br>
            <blockquote class="gmail_quote" style="margin:0 0 0
              .8ex;border-left:1px #ccc solid;padding-left:1ex">
              <div bgcolor="#FFFFFF" text="#000000">
                <div class="im">
                  <div>On 10/25/13 1:10 PM, Ben Karel wrote:<br>
                  </div>
                  <blockquote type="cite">
                    <div dir="ltr"><br>
                      <div class="gmail_extra"><br>
                        <br>
                        <div class="gmail_quote">On Thu, Oct 24, 2013 at
                          6:42 PM, Sanjoy Das <span dir="ltr"><<a
                              moz-do-not-send="true"
                              href="mailto:sanjoy@azulsystems.com"
                              target="_blank">sanjoy@azulsystems.com</a>></span>
                          wrote:<br>
                          <blockquote class="gmail_quote"
                            style="margin:0px 0px 0px
0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">Hi


                            Rafael, Andrew,<br>
                            <br>
                            Thank you for the prompt reply.<br>
                            <br>
                            One approach we've been considering involves
                            representing the<br>
                            constraint "pointers to heap objects are
                            invalidated at every<br>
                            safepoint" somehow in the IR itself.  So, if
                            %a and %b are values the<br>
                            GC is interested in, the safepoint at the
                            back-edge of a loop might<br>
                            look like:<br>
                            <br>
                              ; <label>: body<br>
                                %a = phi i32 [ %a.relocated, %body ] [
                            %a.initial_value, %pred ]<br>
                                %b = phi i32 [ %b.relocated, %body ] [
                            %b.initial_value, %pred ]<br>
                                ;; Use %a and %b<br>
                            <br>
                                ;; The safepoint starts here<br>
                                %a.relocated = @llvm.gc_relocate(%a)<br>
                                %b.relocated = @llvm.gc_relocate(%b)<br>
                                br %body<br>
                            <br>
                            This allows us to not bother with relocating
                            derived pointers pointing<br>
                            inside %a and %b, since it is semantically
                            incorrect for llvm to reuse<br>
                            them in the next iteration of the loop. </blockquote>
                          <div><br>
                          </div>
                          <div>This is the right general idea, but you
                            can already express this constraint in LLVM
                            as it exists today, by using
                            llvm.gcroot().  As you noted, this also
                            solves the interior-pointer problem by
                            making it the front end's job to convey to
                            LLVM when it would/would not be safe to
                            cache interior pointers across loop
                            iterations. The invariant that a front-end
                            must maintain is that any pointer which is
                            live across a given potential-GC-point must
                            be reloaded from its root after a
                            (relocating) GC might have occurred. This
                            falls naturally out of the viewpoint that %a
                            is not an object pointer, it's the name of
                            an object pointer's value at a given point
                            in time. So, of course, whenever that object
                            pointer's value might change, there must be
                            a new name.</div>
                        </div>
                      </div>
                    </div>
                  </blockquote>
                </div>
                To rephrase, you're saying that the current gcroot
                mechanism is analogous to a boxed object pointer.  We
                could model a safepoint by storing the unboxed value
                into the box, applying an opaque operation to the box,
                and reloading the raw object pointer from the box.  (The
                opaque operator is key to prevent the store/load pair
                from being removed.  It also implements the actual
                safepoint operation.)  Is this a correct restatement?<br>
                <br>
                Here's an example:<br>
                gcroot a = ...; b = ...;<br>
                object* ax = *a, bx = *b;<br>
                repeat 500000 iterations { <br>
                  spin for 50 instructions<br>
                  *a = ax;<br>
                  *b = bx;<br>
                  safepoint(a, b)<br>
                  ax = *a;<br>
                  bx = *b;<br>
                } <br>
                <br>
                If so, I would agree with you that this is a correct
                encoding of object relocation.  I think what got me
                confused was using the in-tree examples to reverse
                engineer the semantics.  Both the Erlang and Ocaml
                examples insert safepoints after all the LLVM IR passes
                are run and derived pointers were inserted.  This was
                the part that got me thinking that the gcroot semantics
                were unsuitable for a precise relocating collector.  If
                you completely ignored these implementations and
                inserted the full box/safepoint call/unbox
                implementation at each safepoint before invoking any
                optimizations, I think this would be correct.  <br>
                <br>
                One problem with this encoding is that there does not
                currently exist an obvious mechanism to describe a
                safepoint in the IR itself.  (i.e. there is no
                llvm.gcsafepoint)  You could model this with a "well
                known" function which a custom GCStrategy recorded a
                stackmap on every call.  It would be good to extend the
                existing intrinsics with such a mechanism.  <br>
                <br>
                At least if I'm reading things right, the current
                *implementation* is not a correct implementation of the
                intended semantics.  In particular, the safepoint
                operation which provides the opaque manipulation of the
                boxed gcroots is not preserved into the SelectionDAG. 
                As a result, optimizations on the SelectionDAG could
                eliminate the now adjacent box/unbox pairs.  This would
                break a relocating collector.  <br>
                <br>
                Leaving this aside, there are also a number of
                performance problems with the current implementation. 
                I'll summarize them here for now, but will expand into
                approaches for resolving each if this looks like the
                most likely implementation strategy.  <br>
                1) Explicitly adding box/safepoint/unbox prevents
                creation of derived pointers.  This prevents
                optimizations such as loop-blocking.  Ideally, we'd
                rather let the derived pointers be created and capture
                them for updating at the safepoint.  <br>
                2) The redundant loads and stores required for
                box/unbox.  (These might be partially eliminated by
                other optimization passes.)<br>
                3) The inability to allocate GC roots to registers.  <br>
                4) Frame sizes are implicitly doubled since both an
                unboxed and boxed representation of every object pointer
                is required.  <br>
                <br>
                Frankly, I think that (3) is likely solvable, but that
                (1) and (4) could be fatal for a high performance
                implementation.  I don't have hard numbers on that; I'm
                simply stating my intuition.  <br>
                <div class="im"> <br>
                  <blockquote type="cite">
                    <div dir="ltr">
                      <div class="gmail_extra">
                        <div class="gmail_quote">
                          <div><br>
                          </div>
                          <div>The fact that the mutable memory
                            associated with a gcroot() is allocated on
                            the stack (rather than, say, a machine
                            register) is an implementation detail;
                            fixing it doesn't require altering the
                            (conceptual) interface for LLVM's existing
                            GC support, AFAICT.</div>
                        </div>
                      </div>
                    </div>
                  </blockquote>
                </div>
                While in principal, I agree with you here, in practice
                the difference is actually quite significant.  I am not
                convinced that the implementation would be straight
                forward.  Let's set this aside for the moment and focus
                on the more fundamental questions.  <br>
                <div class="im">
                  <blockquote type="cite">
                    <div dir="ltr">
                      <div class="gmail_extra">
                        <div class="gmail_quote">
                          <div> </div>
                          <blockquote class="gmail_quote"
                            style="margin:0px 0px 0px
0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">We


                            lower gc_relocate to a<br>
                            pseudo opcode which lowered into nothing
                            after register allocation.<br>
                            <br>
                            The problem is, of course, the performance
                            penalty.  Does it make<br>
                            sense to get the register allocator "see"
                            the gc_relocate instruction<br>
                            as a copy so that they get the same register
                            / slot?  Will that<br>
                            violate the intended semantics of
                            gc_relocate (for example, after<br>
                            assigning the same register / slot to %a and
                            %a.relocated, are there<br>
                            passes that will try to cache derived
                            pointers across loop<br>
                            iterations)?<br>
                            <br>
                            Thanks,<br>
                            -- Sanjoy<br>
                            <div>
                              <div><br>
                                <br>
_______________________________________________<br>
                                LLVM Developers mailing list<br>
                                <a moz-do-not-send="true"
                                  href="mailto:LLVMdev@cs.uiuc.edu"
                                  target="_blank">LLVMdev@cs.uiuc.edu</a>
                                        <a moz-do-not-send="true"
                                  href="http://llvm.cs.uiuc.edu"
                                  target="_blank">http://llvm.cs.uiuc.edu</a><br>
                                <a moz-do-not-send="true"
                                  href="http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev"
                                  target="_blank">http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev</a><br>
                              </div>
                            </div>
                          </blockquote>
                        </div>
                        <br>
                      </div>
                    </div>
                    <br>
                    <fieldset></fieldset>
                    <br>
                    <pre>_______________________________________________
LLVM Developers mailing list
<a moz-do-not-send="true" href="mailto:LLVMdev@cs.uiuc.edu" target="_blank">LLVMdev@cs.uiuc.edu</a>         <a moz-do-not-send="true" href="http://llvm.cs.uiuc.edu" target="_blank">http://llvm.cs.uiuc.edu</a>
<a moz-do-not-send="true" href="http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev" target="_blank">http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev</a>
</pre>
                  </blockquote>
                  <br>
                </div>
              </div>
              <br>
              _______________________________________________<br>
              LLVM Developers mailing list<br>
              <a moz-do-not-send="true"
                href="mailto:LLVMdev@cs.uiuc.edu">LLVMdev@cs.uiuc.edu</a>
                      <a moz-do-not-send="true"
                href="http://llvm.cs.uiuc.edu" target="_blank">http://llvm.cs.uiuc.edu</a><br>
              <a moz-do-not-send="true"
                href="http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev"
                target="_blank">http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev</a><br>
              <br>
            </blockquote>
          </div>
          <br>
        </div>
      </div>
    </blockquote>
    <br>
  </body>
</html>