<html>

  <head>

    <meta content="text/html; charset=windows-1252"

      http-equiv="Content-Type">

  </head>

  <body bgcolor="#FFFFFF" text="#000000">

    <br>

    <div class="moz-cite-prefix">On 10/16/2014 02:51 PM, Philip Reames

      wrote:<br>

    </div>

    <blockquote cite="mid:54403DFE.5010807@philipreames.com" type="cite">

      <br>

      On 10/15/2014 02:52 PM, Philip Reames wrote:

      <br>

      <blockquote type="cite">Kevin,

        <br>

        <br>

        Let me try to answer the point you're getting at.  In doing so,

        I want to explicitly separate the statepoint intrinsics which

        are currently up for review, and the future late safepoint

        placement. The statepoint intrinsics have value separate from

        the late safepoint placement approach, and I want to justify

        them on their own merits.

        <br>

        <br>

        The basic problem we're trying to solve with these intrinsics is

        supporting fully relocating collectors.  By definition, such a

        collector needs to be precise w.r.t. root tracking.  Even worse,

        we need to ensure that *all copies* of a pointer are updated. 

        It is not acceptable to make two copies of a pointer, update one

        of them, then use the other for a memory access.

        <br>

        <br>

        If the compiler is allowed to introduce derived pointers (i.e.

        pointer valued temporaries created by the compiler which point

        somewhere within an object, or outside it, but associated with

        it), we also need to track which *object* each *pointer* to be

        updated is associated with.  This is required to safely update

        the pointers.

        <br>

        <br>

        For the sake of argument, let's say our frontend does safepoint

        insertion.

        <br>

        <br>

        There are a couple of approaches which seem like they might work;
        let's explore each in turn:

        <br>

        - We could use patchpoints to record all the values needed for

        the GC stack map.  This mostly works, but requires that the

        patchpoint not be marked readonly or readnone (to prevent

        illegal reorderings).  That could be a usage convention.  The

        real problem is that the compiler is still free to introduce

        multiple *copies* of an SSA value over the patchpoint.  (This is

        completely legal under SSA semantics.)  When it does so, it

        creates a situation where the gc could fail to update a pointer

        which will then be dereferenced. That's a bug.  It's worth stating
        explicitly that I believe the patchpoint scheme would be sufficient
        *if you do not ever relocate a root*.

        <br>

        - We could use gc.root.  gc.root defines the allocas, but

        does not define the call format, or any of the mechanisms to

        ensure proper relocation.  As such, it *by itself* is not

        viable.  Also, gc.root inherently assumes every value will have

        a stack slot. Without *heavy* reengineering, there's no way to

        have a gc pointer in a callee saved register over a call site. 

        This is an unfortunate limitation.  Any call representation

        without explicit relocation suffers from the same bug as the

        patchpoint scheme.

        <br>

        - We could combine gc.root allocas and patchpoints.  This

        essentially combines the flaws (no gc pointers in callee saved

        registers over calls, and missed copies), with no benefit.

        <br>

        <br>

        The statepoint intrinsics are basically the patchpoint option

        above, but with relocation made explicit in the IR.  While it's

        still legal for the optimizer to create a copy of the value

        feeding a statepoint, that's now okay.  By construction, there

        can be no use of the original SSA value (and thus the copy)

        after the statepoint. Instead, the explicitly relocated value is

        used.

        <br>

        <br>

        To summarize: We need (something like) statepoints for

        correctness of fully relocating collectors.

        <br>

        <br>

        (The points I'm making here are somewhat subtle.  If it would

        help to have IR examples here, ask.  I'm deferring writing them

        because it's time consuming.)

        <br>

      </blockquote>

      I need to withdraw this part of my comments.  After further

      reflection and discussion offline, I was reminded that you can

      implement full relocation semantics with gcroot.  The parts about
      patchpoints stand, but the gcroot comments are inaccurate.

      <br>

      <br>

      I need to leave early today, but I plan to respond tomorrow with a

      more complete analysis of the tradeoffs between gcroots and

      statepoints.  Sorry for the confusion.

      <br>

    </blockquote>

    Ok, let's take a second try at explaining the differences between

    statepoints and gc.roots.  I managed to get myself confused last

    time and made a couple of statements which were inaccurate.  As a

    reminder, this discussion is not about late safepoint placement
    (LSP) at all.  LSP can work with either mechanism.  <br>

    <br>

    From a functional correctness standpoint, gc.root and statepoint are

    equivalent.  They can both support relocating collectors, including

    those which relocate roots.  To prevent future confusion, let me

    review how each works.  <br>

    <br>

    gc.root uses explicit spill slots in the IR in the form of allocas. 

    Each alloca escapes (through the gcroot call itself); as a result,

    the compiler must assume that any readwrite call can both consume

    and update the values in question.  Additionally, the fact that all

    calls are readwrite prevents reordering of unrelated loads past the

    call.  gcroot relies on the fact that no SSA value relocated at a

    call site is used at a site reachable from the call.  Instead, a new

    SSA value (whose relation to the original is unknown to the

    compiler) is introduced by loading from the (potentially clobbered)

    alloca.  gcroot creates a single stack map table for the entire

    function.  It is the compiled code's responsibility to ensure that

    all values in the allocas are either valid live pointers or null.  <br>
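    <br>
    To make the gc.root scheme more concrete, here is a toy model.  It is
    a Python sketch, not LLVM IR, and the Heap class, its collect() method
    and the slot names are hypothetical; it only mimics the semantics
    described above.  The collector rewrites the root slot in place, so
    the code must reload from the slot after the call, and the SSA value
    from before the call is stale.<br>
<pre>
```python
class Heap:
    """Toy moving heap: objects are keyed by integer address."""

    def __init__(self):
        self.objects = {}       # address -> payload
        self.next_addr = 100

    def alloc(self, payload):
        addr = self.next_addr
        self.next_addr += 10
        self.objects[addr] = payload
        return addr

    def collect(self, root_slots):
        """Model a relocating collection at a call site: move every object
        named by a root slot and rewrite the slot in place, as the
        collector would through the function's stack map."""
        forwarded = {}
        for name, addr in list(root_slots.items()):
            if addr is None:    # null slots are legal and skipped
                continue
            if addr not in forwarded:
                forwarded[addr] = self.alloc(self.objects.pop(addr))
            root_slots[name] = forwarded[addr]


def gcroot_style(heap):
    slots = {"p.slot": None}    # the escaped alloca registered via gcroot
    p = heap.alloc("payload")
    slots["p.slot"] = p         # store p into its root slot before the call
    heap.collect(slots)         # readwrite call: may clobber the slot
    p2 = slots["p.slot"]        # new SSA value loaded after the call
    # p is stale here; only p2 may be used past the call.
    return p, p2


heap = Heap()
p, p2 = gcroot_style(heap)
```
</pre>
    Any use of p after the call would reach the old address, which is
    why gcroot requires that only the reloaded value be used past the
    call.<br>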

    <br>

    Statepoints use most of the same techniques.  We rely on not having

    an SSA value used on both sides of a call, but we manage the

    relocation via explicit IR relocation operations, not loads and

    stores.  We require the call to be read/write to prevent reordering

    of unrelated loads.  Since the spill slots are not visible in the

    IR, we do not need the reasoning about escapes that gc.root does.  <br>
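    <br>
    The same toy model can be recast in statepoint style.  Again this is a
    hypothetical Python sketch rather than LLVM IR: statepoint() stands in
    for the call, its return value for the statepoint token, and
    gc_relocate() for a gc.relocate reading one relocated value out of that
    token.  The optimizer-made copy q is harmless only because, by
    construction, neither p nor q is used after the statepoint.<br>
<pre>
```python
class Heap:
    """Toy moving heap: objects are keyed by integer address."""

    def __init__(self):
        self.objects = {}
        self.next_addr = 100

    def alloc(self, payload):
        addr = self.next_addr
        self.next_addr += 10
        self.objects[addr] = payload
        return addr

    def move(self, addr):
        """Copy an object to a fresh address and free the old one."""
        return self.alloc(self.objects.pop(addr))


def statepoint(heap, live_ptrs):
    """Hypothetical model of a statepoint call: the callee may move any
    object, and the returned token records the new location of each
    pointer in the statepoint's live set (None stays None)."""
    forwarded = {}
    for addr in live_ptrs:
        if addr is not None and addr not in forwarded:
            forwarded[addr] = heap.move(addr)
    return [forwarded.get(addr) for addr in live_ptrs]


def gc_relocate(token, index):
    """Model of gc.relocate: read one relocated value out of the token."""
    return token[index]


heap = Heap()
p = heap.alloc("payload")
q = p                            # the optimizer may copy an SSA value
token = statepoint(heap, [p, None])
p_reloc = gc_relocate(token, 0)  # the only value used after the call
```
</pre>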

    <br>

    To explicitly state this again since I screwed this up once before,

    both statepoints and gc.roots can correctly represent relocation

    semantics in the IR.  In fact, the underlying reasoning about their
    correctness is rather similar.  <br>

    <br>

    They do differ fairly substantially in the details though.  Let's

    consider a few examples.<br>

    <br>

    <b>SSA vs Memory</b> - gcroot encodes relocations as memory
    operations (stores, clobbering calls, loads), whereas statepoints use
    first class SSA values.  We believe this makes optimizations more
    straightforward.<br>

    <br>

    Consider a simple optimization for null pointer relocation.  If the
    optimizer manages to establish that one of the values being relocated
    is null, propagating this across a statepoint is straightforward. 
    (For each gc.relocate, if the source is null, replaceAllUsesWith null.) 
    Implementing this same optimization for gc.root is harder since the
    store and load may have been moved away from their original positions
    immediately around the call.  This isn't an unsolvable problem by any
    means, but it would

    be a GVN change, not an InstCombine one.  In practice, we believe

    InstCombine style optimizations to be advantageous since they're

    simpler to write and debug.  Arguably, they're also more powerful

    given the current pipeline since they have multiple opportunities to

    trigger.<br>
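    <br>
    A minimal sketch of the null forwarding rewrite, written over a
    hypothetical toy IR in Python rather than against LLVM's C++ API.  It
    assumes a gc.relocate is modelled with just two operands (the
    statepoint token and the source value); the real intrinsic's operands
    differ.<br>
<pre>
```python
from dataclasses import dataclass, field

NULL = "null"   # stands in for the IR null constant


@dataclass
class Inst:
    """Hypothetical toy IR instruction: a named result, an opcode, operands."""
    name: str
    op: str
    operands: list = field(default_factory=list)


def forward_null_relocates(insts):
    """InstCombine-style peephole: a gc.relocate whose source is null
    yields null, so replace all of its uses with null and drop it.
    In this toy IR a gc.relocate's operands are (token, source value)."""
    replaced = {
        i.name: NULL
        for i in insts
        if i.op == "gc.relocate" and i.operands[1] == NULL
    }
    out = []
    for i in insts:
        if i.name in replaced:
            continue                      # the relocate is now dead
        ops = [replaced.get(o, o) for o in i.operands]
        out.append(Inst(i.name, i.op, ops))
    return out


before = [
    Inst("tok", "statepoint", ["@foo", "%p", NULL]),
    Inst("p.reloc", "gc.relocate", ["tok", "%p"]),
    Inst("n.reloc", "gc.relocate", ["tok", NULL]),
    Inst("c", "icmp.eq", ["n.reloc", NULL]),
]
after = forward_null_relocates(before)
```
</pre>
    After the rewrite, %c compares null with null, which later constant
    folding can clean up; each such fold is another chance for this style
    of rewrite to trigger.<br>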

    <br>
    <b>Derived Pointers</b> - gcroot can represent derived pointers,
    but only via convention.  There is no convention specified, so it's
    up to the frontend to create its own.  Statepoints define a
    convention (explicitly in the relocation operation) which makes
    describing optimizations straightforward.<br>
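    <br>
    The arithmetic behind relocating a derived pointer is simple once the
    base is known, which is what an explicit convention buys.  The sketch
    below is a hypothetical Python illustration (integer addresses,
    made-up function names), not the collector interface of either
    mechanism: a derived pointer keeps its offset from its base, even when
    that offset is negative or points past the end of the object.<br>
<pre>
```python
def relocate_derived(old_base, old_derived, new_base):
    """Relocate one derived pointer given where its base object moved.
    The offset from the base is preserved.  It may be negative or point
    past the end of the object, which is why the collector must be told
    the base rather than guess it from the derived address."""
    return new_base + (old_derived - old_base)


def relocate_all(pairs, forwarding):
    """Each live pointer is recorded as a (base, derived) pair, which is
    the information gc.relocate makes explicit.  A plain object pointer
    is simply its own base.  forwarding maps old base -> new base."""
    return [relocate_derived(b, d, forwarding[b]) for b, d in pairs]


# The object at 100 moves to 500: the base itself, an interior pointer
# at +8, and an out-of-bounds pointer at -4 all follow it.
forwarding = {100: 500}
new_ptrs = relocate_all([(100, 100), (100, 108), (100, 96)], forwarding)
```
</pre>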

    <br>

    One thing we plan to do with the statepoint representation is to
    implement an "easily derived pointer" optimization (to run near
    CodeGenPrep).  On X86, it's far cheaper to recreate a "GEP base + 5"
    derived pointer than to relocate it.  Recognizing this case is quite
    straightforward given the statepoint representation.<br>
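    <br>
    Recognizing an "easily derived pointer" could look roughly like the
    sketch below.  It is hypothetical and written in Python over a toy
    representation (the Gep record and plan_relocations() are made up for
    illustration, not the planned pass): any live pointer defined as a
    constant offset from another live pointer is rematerialized after the
    call from the relocated base instead of being relocated itself.<br>
<pre>
```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gep:
    """Hypothetical toy IR definition: derived = base + constant offset."""
    base: str
    offset: int


def plan_relocations(live, defs):
    """Split a statepoint's live set into values to relocate and values to
    rematerialize after the call.  A pointer is treated as 'easily
    derived' when it is a constant-offset Gep of a base that is itself
    live across the call; it is then recomputed from the relocated base."""
    live_set = set(live)
    relocate, remat = [], {}
    for v in live:
        d = defs.get(v)
        if isinstance(d, Gep) and d.base in live_set:
            remat[v] = d
        else:
            relocate.append(v)
    return relocate, remat


live = ["%obj", "%field", "%other"]
defs = {"%field": Gep("%obj", 5)}   # %field = gep %obj, 5
relocate, remat = plan_relocations(live, defs)
```
</pre>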

    <br>

    A frontend could implement a similar optimization for gcroot at IR

    generation time.  You could also implement such an optimization over

    the load/call/store representation, but the implementation would be

    much more complex (analogous to the null optimization above).  <br>

    <br>

    To be fair, gc.root may need such an optimization less.  Since

    call-safepoints are inserted early, CSE has not yet run.  As a

    result, there may be fewer "easily derived pointers" live across a

    call.  <br>

    <br>

    <b>Format</b> - Statepoints use a standard format.  gc.root supports

    custom formats.  Either could be extended to support the other

    without much difficulty.  <br>

    <br>

    The more material difference between the two is that gc.root

    generates a single stack map for the entire function while

    statepoints generate a unique stack map per call site.  Having a

    single stack map imposes a slight penalty on code compiled with

    gc.root since dead values must explicitly be removed from the alloca

    (by a write of null).  In the wrong situation (say a tight loop with

    two calls), this could be material.  <br>
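    <br>
    The cost difference can be sketched with a small Python model
    (hypothetical, not how either lowering is implemented).  With one
    function-wide stack map, every root slot is reported at every call, so
    a slot whose value has died must be nulled before the next call; with
    per-call-site maps, a dead value is simply absent from the map.  The
    example is the tight loop with two calls mentioned above, run for two
    iterations.<br>
<pre>
```python
def null_stores_single_map(live_at, call_order):
    """gc.root-style cost model.  live_at maps a call site to the set of
    root slots live across it; call_order is the dynamic sequence of
    calls.  Slots start out null.  Returns the (call site, slot) null
    stores executed so that no dead pointer is reported at a call."""
    holds_value = set()     # slots currently holding a non-null pointer
    stores = []
    all_slots = set().union(*live_at.values())
    for site in call_order:
        live = live_at[site]
        for slot in sorted(all_slots - live):
            if slot in holds_value:
                stores.append((site, slot))
                holds_value.discard(slot)
        holds_value |= live
    return stores


live_at = {"call1": {"a"}, "call2": {"b"}}
stores = null_stores_single_map(live_at, ["call1", "call2"] * 2)
```
</pre>
    In the steady state this loop pays two null stores per iteration;
    per-call-site stack maps would need none.<br>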

    <br>

    <b>Lowering</b> - Currently, both gc.root and statepoint lower to
    stack slots.  gc.root does this at the IR level, statepoints do so
    in SelectionDAG.  <br>

    <br>

    The design of statepoints is intended to allow pushing the explicit

    relocations back through the backend.  The reason this is desirable

    is that pointers can be left in callee saved registers over call

    sites.  Without substantial re-engineering, such a thing is not

    possible for gc.root.  The importance of this from a performance

    perspective is debatable.  It is my belief that the key benefits
    would be a) reducing frame sizes (by not requiring spill slots)
    and b) avoiding spills around calls.<br>

    <br>

    An advantage of gc.root is that the backend can remain largely

    ignorant of the gc.root mechanism.  By the time the backend
    encounters them, a gc.root is just another alloca.  One potential

    problem with the current implementation is that the escape is lost

    when lowering; the gcroot call is lowered to an entry into a side

    table and the alloca no longer escapes.  This is a source of

    possible bugs, but is also a straightforward fix.    <br>

    <br>

    As to the lowering currently implemented, it's debatable which is
    better.  The statepoint lowering optimizes constants and unifies
    based on SDValue.  As a result, two IR level values of different
    types (with the same bit pattern) can end up sharing the same stack
    slot.  However, it suffers when trying to assign stack slots.  We
    currently use heuristics, but you can end up with ugly shuffling of
    values around on the stack across basic blocks.  (There are a number
    of ways to improve that, but they're not yet implemented.)  gc.root
    doesn't suffer from this problem since stack slots are assigned by
    the frontend.  <br>
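    <br>
    The shuffling problem can be illustrated with a simplified Python
    model that only looks at a straight-line sequence of call sites (no
    basic blocks) and counts how often a live value changes stack slot
    between consecutive calls.  Both strategies are hypothetical stand-ins,
    not the heuristics actually used in the SelectionDAG lowering: one
    hands out slots independently at each call, the other lets a value
    keep its previous slot when possible.<br>
<pre>
```python
def count_slot_moves(call_sites, sticky):
    """Assign a stack slot to every live gc value at each call site and
    count how often a value changes slot between consecutive calls
    (each change is a stack-to-stack shuffle).
    sticky=False: slots are handed out in name order at each call.
    sticky=True: a value keeps its previous slot whenever it can."""
    prev = {}                      # value -> slot at the previous call
    moves = 0
    for live in call_sites:
        cur = {v: prev[v] for v in live if sticky and v in prev}
        used = set(cur.values())
        free = (s for s in range(len(live) + len(prev)) if s not in used)
        for v in sorted(live):
            if v not in cur:
                cur[v] = next(free)
        moves += sum(1 for v in cur if v in prev and prev[v] != cur[v])
        prev = cur
    return moves


sites = [{"a", "b"}, {"b", "c"}, {"a", "b", "c"}]
naive_moves = count_slot_moves(sites, sticky=False)
sticky_moves = count_slot_moves(sites, sticky=True)
```
</pre>
    Keeping values in their previous slots avoids every stack-to-stack
    move in this example; the independent assignment needs three.<br>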

    <br>

    Since the stack spills and reloads are visible at the IR layer,
    gcroot gets the full benefit of the optimizer's ability to remove
    redundant reloads.  Statepoints only get to leverage the pieces in the

    backend.  In theory, this could result in materially worse

    spill/reload code for statepoints.  In practice, this appears not to

    matter much provided the same value is assigned to the same slot

    across both calls, but I don't actually have much data here to say

    anything conclusively yet.<br>

    <br>

    I haven't tried to measure frame size for gc.root vs statepoints.  I

    suspect that statepoints may come out slightly ahead, but I doubt

    this is material.  There are also cases (see "easily derived
    pointers" above) where gc.root may come out ahead.  <br>

    <br>

    <b>IR Level Optimization</b> - Both gc.root and statepoints cripple

    optimization (by design!).  gcroot works better with inlining today,

    but statepoints could be easily enhanced to handle this case.  (The

    same work would benefit symbolic patchpoints.)  <br>

    <br>

    It is my belief that statepoints are easier to optimize (e.g. easier
    to teach LICM about), but this is purely a guess with no real
    evidence.  Both

    suffer from the fact that calls must be marked readwrite.  Not

    having to reason about memory seems easier, but I'm open to other

    arguments here.  <br>

    <br>

    <b>Community Support &amp; Compatibility</b><br>

    From a practical perspective, statepoints have active users behind

    them.  We are interested in continuing to enhance and optimize them

    in the public tree.  The same support does not seem to exist for

    gcroot.  <br>

    <br>

    The implementation of statepoints is largely aligned with that of

    patchpoints.  The implementation of gcroot is completely separate

    and poorly understood by the majority of the community.<br>

    <br>

    It wouldn't be hard to write a translation pass from gcroot to

    statepoints or from statepoints to gcroot.  If folks are concerned

    about compatibility, this would be a reasonable option.  The largest

    challenge to transparently replacing one with the other is in

    generating the right output format.  <br>
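    <br>
    As a rough illustration of the gcroot to statepoint direction, here is
    a Python sketch over a made-up tuple encoding of straight-line code.
    It is hypothetical and deliberately simplified: it assumes each root
    alloca is accessed only through explicit stores and loads, ignores
    control flow, and uses an invented naming scheme for relocated values.
    Each call becomes a statepoint whose live set is the set of non-null
    values currently held in root slots, followed by one gc.relocate per
    value; the slots then hold the relocated values.<br>
<pre>
```python
def gcroot_to_statepoint(insts):
    """Rewrite straight-line gc.root-style code into statepoint form.
    Input tuples (hypothetical encoding):
      ("store", slot, value)   spill a gc pointer into its root alloca
      ("call", callee)         a readwrite call at a safepoint
      ("load", result, slot)   reload a fresh SSA value from the alloca
    Output tuples:
      ("statepoint", token, callee, live_values)
      ("gc.relocate", result, token, value)
      ("copy", result, value)"""
    slot_value = {}              # root alloca -> SSA value it holds now
    out = []
    ntok = 0
    for inst in insts:
        kind = inst[0]
        if kind == "store":
            _, slot, value = inst
            slot_value[slot] = value
        elif kind == "call":
            token = f"tok{ntok}"
            ntok += 1
            live = sorted({v for v in slot_value.values() if v != "null"})
            out.append(("statepoint", token, inst[1], live))
            relocated = {}
            for value in live:
                relocated[value] = f"{value}.{token}"
                out.append(("gc.relocate", relocated[value], token, value))
            # The slots now hold the relocated values, not the originals.
            for slot, value in list(slot_value.items()):
                if value in relocated:
                    slot_value[slot] = relocated[value]
        elif kind == "load":
            _, result, slot = inst
            out.append(("copy", result, slot_value[slot]))
        else:
            out.append(inst)
    return out


code = [
    ("store", "%p.slot", "%p"),
    ("store", "%q.slot", "null"),   # a dead root nulled by the frontend
    ("call", "@foo"),
    ("load", "%p1", "%p.slot"),
    ("load", "%q1", "%q.slot"),
]
converted = gcroot_to_statepoint(code)
```
</pre>
    Each reload becomes a plain copy of a relocated value, which ordinary
    copy propagation would remove.  A real pass would also have to handle
    control flow and, as noted above, produce the right output format.<br>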

    <br>
    <b>Summary</b><br>

    To summarize, gcroot and statepoints are functionally equivalent
    (modulo possible bugs).  In their current form, the two are largely
    comparable, with each having some benefits.  Long term, we believe a
    statepoint representation will allow better code generation and IR
    level optimization of code with safepoints inserted.  We believe
    statepoints to be easier to optimize both at the IR level and in the
    backend. <br>

    <br>

    Again, the late safepoint proposal is independent and could be done

    with either representation.  It's currently implemented on

    statepoints, but it could be extended to gcroot without too much

    work.  <br>

    <blockquote cite="mid:54403DFE.5010807@philipreames.com" type="cite">

      <blockquote type="cite">

        <br>

        <br>

        Other advantages of the statepoint approach:

        <br>

        <br>

        The gc.relocate intrinsics (part of the statepoint proposal)
        also make it explicit in the IR what the base object of each

        pointer to be relocated is.  This isn't *required* (you could

        encode the same information in the arguments of the statepoint),

        but making it explicit is much cleaner.

        <br>

        <br>

        The explicit relocation notation has the potential to be
        extended into the backend.  With some register allocator

        changes (not part of this patch!), we could support gc pointers

        in callee saved registers.  This is possible with the

        (incorrect) patchpoint scheme.  It is possible, but *hard*, with

        the gc.root scheme.

        <br>

        <br>

        The posted patch includes a couple of small optimizations (i.e.

        null forwarding) that help performance, but could (probably) be

        implemented on top of another scheme.  We have a number of

        planned optimizations on the statepoint mechanism.

        <br>

        <br>

        <br>

        Now, let me finally bring up late safepoint placement. The only

        real impact on this patch is that, to date, we have only focused

        on the *correctness* of a statepoint passing through the

        optimizer.  We have not attempted to teach the optimizer about

        how to leverage one or perform optimizations over one.  There's

        room for improvement here (i.e. not completely blocking

        inlining), but we prefer to approach this problem by simply

        inserting them late.   You could instead choose to insert them

        at generation time, and teach the optimizer about their

        semantics.  That *strategy choice* is independent of the

        representation chosen provided that representation is

        *correct*.

        <br>

        <br>

        Yours,

        <br>

        Philip

        <br>

        <br>

        On 10/14/2014 07:01 PM, Kevin Modzelewski wrote:

        <br>

        <blockquote type="cite">I think a change like this might be more

          compelling if you could give more detail on how it would

          actually help (I can't find the detail I'm looking for in your

          blog posts).  It seems like the value of this patch is that it

          will work with late safepoint placement, but it'd be nice to

          see some examples of cases where late safepoint placement

          gives you something that early safepoint placement (ie by the

          frontend) doesn't.  It kind of feels like either approach will

          work well with only non-gc values, and neither approach will

          be able to do much optimization when you do function calls. 

          I'm not trying to claim that that's necessarily true, but it'd

          be easier to understand your point if there was some example

          IR.

          <br>

          <br>

          <a class="moz-txt-link-freetext" href="http://reviews.llvm.org/D5683">http://reviews.llvm.org/D5683</a>

          <br>

          <br>

          <br>

        </blockquote>

        <br>

      </blockquote>

      <br>

      _______________________________________________

      <br>

      llvm-commits mailing list

      <br>

      <a class="moz-txt-link-abbreviated" href="mailto:llvm-commits@cs.uiuc.edu">llvm-commits@cs.uiuc.edu</a>

      <br>

      <a class="moz-txt-link-freetext" href="http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits">http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits</a>

      <br>

    </blockquote>

    <br>

  </body>

</html>