<html>

  <head>

    <meta content="text/html; charset=utf-8" http-equiv="Content-Type">

  </head>

  <body bgcolor="#FFFFFF" text="#000000">

    + llvmdev in the hopes that a broader audience sees this and chimes

    in.<br>

    <br>

    At this point, it's clear Chandler and I disagree on the right

    approach going forward.  What do others think?<br>

    <br>

    There are really two independent questions here.  I've tried to

    separate them and provide appropriate context for the discussion so

    far.  <br>

    <br>

    Chandler, if I managed to misrepresent your positions at all,

    sorry!, and please correct me.<br>

    <div class="moz-cite-prefix"><br>

      On 01/15/2015 05:41 PM, Chandler Carruth wrote:<br>

      <br>

      <b>ISSUE #1</b> - Should GC specific properties be checked in the

      IR Verifier?<br>

    </div>

    <blockquote

cite="mid:CAGCO0KghD7iu-rQZiNAxBwkSJUS5sF0ia_92w9bfeGKEUJHZ9w@mail.gmail.com"

      type="cite">

      <div dir="ltr">

        <div class="gmail_extra"><br>

          <div class="gmail_quote">On Thu, Jan 15, 2015 at 4:58 PM,

            Philip Reames <span dir="ltr"><<a moz-do-not-send="true"

                href="mailto:listmail@philipreames.com" target="_blank">listmail@philipreames.com</a>></span>

            wrote:<br>

            <blockquote class="gmail_quote" style="margin:0 0 0

              .8ex;border-left:1px #ccc solid;padding-left:1ex"><span

                class="">

                <blockquote type="cite">

                  <div dir="ltr">

                    <div class="gmail_extra">

                      <div class="gmail_quote">

                        <div>So, I'm not sold on this being in the

                          verifier at all. But that seems quite likely a

                          separate discussion. </div>

                      </div>

                    </div>

                  </div>

                </blockquote>

              </span> Why?  Feel free to forward this to llvm-dev

              depending on the answer and it's length.<br>

              <br>

              Being able to state things like: "statepoints in functions

              with GC X only relocate values in an address space X

              expects" seems to be entirely reasonable and desirable to

              me.</blockquote>

            <div><br>

            </div>

            <div>Absolutely, but I don't think it belongs in the IR

              verifier. The verifier needs to be extremely light weight

              and I don't think it makes sense to check these kinds of

              constraints there. The contract of the verifier is that it

              validates those properties which you should feel free to

              assert() on when writing other passes. I don't think we

              should add such properties lightly.</div>

            <div><br>

            </div>

            <div>But of course it makes sense to check these properties.

              One way is through the verifyAnalysis functionality in the

              old pass manager (and there will be some largely-automated

              analog in the new one). This allows an analysis to check

              that its analysis specific invariants continue to hold.

              Another strategy is the (somewhat poorly named) Lint pass.

              The idea is that for some invariants it may make more

              sense to add a custom pass which does nothing other than

              check and diagnose violations of an invariant.</div>

            <div><br>

            </div>

            <div>If checking this invariant in other ways were really

              hard to do, then maybe it would make sense in the

              verifier. But I'm not seeing anything like that at this

              point.</div>

          </div>

        </div>

      </div>

    </blockquote>

    I believe it does make sense to check *some* GC related properties

    in the Verifier.  My position is that the GC attribute essentially

    imposes an additional set of restrictions on what it means to have

    valid IR.  Specifically, we should be able to check cheap, local

    facts about the IR such as:<br>

    - A GC which uses gc.statepoints does not also use gc.roots and vice

    versa.<br>

    - A gc.statepoint/gc.relocate sequence only relocates pointers with

    the "right" address space.  (This applies only to GCs which use

    address spaces to distinguish pointers.)<br>

    - If a derived pointer is relocated, it's base must also be

    available in the gc.statepoint.  (This applies only to relocating

    GCs.)  Otherwise, the base pointer (must?, may?) be (null?, undef?).<br>

    - There should be no gc.relocates in IR used by a non-relocating GC.<br>

    - A gc.statepoint whose relocations haven't been made explicit yet -

    indicated by a possible future flag on gc.statepoint - isn't legal

    for a GC which doesn't use late rewriting.  (See

    <a class="moz-txt-link-freetext" href="http://reviews.llvm.org/D6975">http://reviews.llvm.org/D6975</a> for context.)<br>

    <br>

    (The previous list is meant to be illustrated examples.  Only some

    of them actually exist so far, and I make no commitment to ever

    implement the others since they involve infrastructure and design

    work which hasn't happened yet.)<br>

    <br>

    To use your previously stated criteria, every one of the above is

    (or at least should/would be) an assertion in the lowering code or

    in some pass.  <br>

    <br>

    I do *not* believe that the Verifier is the right place to check

    more complicated GC properties.  For example, trying to check that a

    gc.statepoint relocates all pointer values which might be live over

    the gc.statepoint is an expensive validation check which does *not*

    belong in the Verifier.  I have no opinion on what mechanism might

    be best to represent such checks (yet.)  Both the verifyAnalysis,

    and Lint mechanisms might be reasonable choices.<br>

    <br>

    There's also a gray area in between that I'm really not sure about. 

    For example, the late safepoint placement/rewriting stuff relies on

    there not being new addrspacecast instructions inserted by the

    optimizer.  (And none in the original IR.)  It's unclear whether

    having that as a GC specific check in the Verifier is the right

    approach or not.  I think we can address these case by case as they

    come up.<br>

    <br>

    <br>

    <br>

    <br>

    <br>

    <br>

    <br>

    <b>ISSUE #2 </b>- What should the access model for GCStrategy be? 

    Is it part of the IR?  Or should it be treated like an Analysis

    result?<br>

    <br>

    <blockquote

cite="mid:CAGCO0KghD7iu-rQZiNAxBwkSJUS5sF0ia_92w9bfeGKEUJHZ9w@mail.gmail.com"

      type="cite">

      <div dir="ltr">

        <div class="gmail_extra">

          <div class="gmail_quote">

            <div> <br>

              <span class=""></span></div>

            <blockquote class="gmail_quote" style="margin:0 0 0

              .8ex;border-left:1px #ccc solid;padding-left:1ex"> "If so,

              have you looked at possibly making this a function

              analysis that has a cache of these which it populates

              exactly the same way you populate the context? That should

              still allow you to use essentially the same lookup path

              here.

              <br>

              <br>

              The reason I suggest this approach is that it helps

              simplify the core IR a bit by letting the IR just deal in

              an opaque attribute and relying on an analysis to reason

              about it. I think the code would be almost identical to

              what you have here just shuffled around a bit. I'd be

              happy with that either as the first version or with that

              happening as the immediate follow-up refactoring after

              this patch.

              " -- previous comment by Chandler<br>

              <br>

              "But I think it would be better to not have this in the

              Function API, and instead be an analysis over the

              function." -- previous comment by Chandler<br>

              <br>

              Response by Philip follows:<br>

              <br>

              The analogous thing would be a GCStrategyCache and a

              wrapping GCStrategyCacheWrapper pass.  I have no objection

              to doing this if that's what the community would prefer,

              but I believe this is undesirable.  My reasoning is:<br>

              1) There is no possibility of invalidation.  A GCStrategy

              is immutable and can only be changed at compile time (of

              LLVM itself).</blockquote>

            <div><br>

            </div>

            <div>There are actually quite a few things that have no

              invalidation criteria but I think make sense being exposed

              via the pass machinery. TargetLibraryInfo is an example

              I've just been touching.</div>

            <div> </div>

            <blockquote class="gmail_quote" style="margin:0 0 0

              .8ex;border-left:1px #ccc solid;padding-left:1ex"> 2) If

              we have multiple instances of GCStrategyCache, we have

              multiple copies of each GCStrategy floating around with

              different lifetimes.  This seems potentially confusing at

              best.  (Particularly for any out of tree users... I'd

              really like the lifetime model to be simple.)<br>

            </blockquote>

            <div><br>

            </div>

            <div>I don't think this is necessary. You could easily have

              these be strict lazily created singletons.</div>

            <div> </div>

            <blockquote class="gmail_quote" style="margin:0 0 0

              .8ex;border-left:1px #ccc solid;padding-left:1ex"> 3)

              We're adding an IMHO utterly unnecessary abstraction which

              only increases the complexity of an already fairly complex

              mechanism.  I see no benefit here.  </blockquote>

          </div>

          <br>

          I think there is a significant benefit to separating the

          concerns of GC strategy from the concerns of the core IR. I

          also clearly don't think the complexity is going to be that

          large.</div>

      </div>

    </blockquote>

    Chandler - Thinking about this overnight, I've realized I don't

    actually have much of an objection here.  I still think the

    direction you're pushing is the wrong one, but it doesn't actually

    matter that much to my overall goals.  Having said that, let me try

    to convince you why the current (as of change 226311 which started

    this thread) is better.  <br>

    <br>

    (For the record, I will respect whatever the community decides and

    migrate the code in that direction.  I submitted the change in

    question while we debate - with Chandler's permission - while we

    debate, but I will update as needed.)<br>

    <br>

    My position comes down to two key points: the GCStrategy associated

    with a given Function is fundamental part of the IR definition, and

    the complexity we're talking about for the core IR is a single class

    with a set of boolean flags* - i.e. pretty minor.<br>

    <br>

    * Let's ignore the performCustomLowering mechanism.  I'd like to

    factor that out, or get rid of it, anyway.    <br>

    <br>

    The properties of the garbage collector that a given piece of code

    is being compiled for is a fundamental part of the IR definition. 

    There is no "safe default" to use.  If information about the GC is

    not available, the code will be likely be miscompiled.  As a result,

    a missing GCStrategy (i.e. one whose string key we don't recognize)

    is, and should be, a fatal error.  (Chandler's proposal wouldn't

    change this last point.)<br>

    <br>

    The analogy I'd make would be to data layout information.  The

    DataLayout effects the semantics of the IR, and is thus considered

    "part of" the Module, rather than something "computed from" a

    Module.  Similarly, if you're given a gc.statepoint for a relocating

    GC, it's not legal to treat it like one for a non-relocating GC.  In

    particular, there are legal optimizations for the later which are

    illegal for the former. The targeted GC is semantically "part of"

    the IR.  <br>

    <br>

    Of course, you could also analogize this to information about the

    sub-target a particular module is targeting.  We clearly don't treat

    sub-targets as "part of" the Module.  So honestly, it could go

    either way.  <br>

    <br>

    I had previously made an argument based on object lifetime.  After

    further reflection, having GCStrategy be a immutable object is fine

    by me.  It does imply we need to get rid of the

    performCustomLowering mechanism, but that's something I'd wanted to

    do anyway.  Note that the in tree ShadowStackGC subclass of

    GCStrategy is not an immutable object today.  There may also be out

    of tree subclasses with assumptions about mutability and lifetime

    baked in.  This is likely the case - if there are actually any users

    out there other than the ones who've already spoken up - since the

    ShadowStackGC was the logic starting place for a custom GC strategy

    implementation.  <br>

    <br>

    Philip<br>

    <br>

    <br>

  </body>

</html>