<html>

  <head>


    <meta http-equiv="content-type" content="text/html; charset=utf-8">

  </head>

  <body bgcolor="#FFFFFF" text="#000000">

    Apologies for any duplicates, I didn't see this go through to the

    mailing list.<br>

    <div class="moz-forward-container"><br>

      -------- Forwarded Message --------

      <table class="moz-email-headers-table" border="0" cellpadding="0"

        cellspacing="0">

        <tbody>

          <tr>

            <th align="RIGHT" nowrap="nowrap" valign="BASELINE">Subject:

            </th>

            <td>Re: [PATCH] Move ownership of GCStrategy objects to

              LLVMContext</td>

          </tr>

          <tr>

            <th align="RIGHT" nowrap="nowrap" valign="BASELINE">Date: </th>

            <td>Fri, 16 Jan 2015 14:56:13 -0800</td>

          </tr>

          <tr>

            <th align="RIGHT" nowrap="nowrap" valign="BASELINE">From: </th>

            <td>Philip Reames <a class="moz-txt-link-rfc2396E" href="mailto:listmail@philipreames.com"><listmail@philipreames.com></a></td>

          </tr>

          <tr>

            <th align="RIGHT" nowrap="nowrap" valign="BASELINE">To: </th>

            <td>Chandler Carruth <a class="moz-txt-link-rfc2396E" href="mailto:chandlerc@google.com"><chandlerc@google.com></a></td>

          </tr>

          <tr>

            <th align="RIGHT" nowrap="nowrap" valign="BASELINE">CC: </th>

            <td><a class="moz-txt-link-abbreviated" href="mailto:LLVMdev@cs.uiuc.edu">LLVMdev@cs.uiuc.edu</a> <a class="moz-txt-link-rfc2396E" href="mailto:LLVMdev@cs.uiuc.edu"><LLVMdev@cs.uiuc.edu></a></td>

          </tr>

        </tbody>

      </table>

      <br>

      <br>

      <meta content="text/html; charset=utf-8" http-equiv="Content-Type">

      + llvmdev in the hopes that a broader audience sees this and

      chimes in.<br>

      <br>

      At this point, it's clear Chandler and I disagree on the right

      approach going forward.  What do others think?<br>

      <br>

      There are really two independent questions here.  I've tried to

      separate them and provide appropriate context for the discussion

      so far.  <br>

      <br>

      Chandler, if I managed to misrepresent your positions at all,

      sorry!, and please correct me.<br>

      <div class="moz-cite-prefix"><br>

        On 01/15/2015 05:41 PM, Chandler Carruth wrote:<br>

        <br>

        <b>ISSUE #1</b> - Should GC specific properties be checked in

        the IR Verifier?<br>

      </div>

      <blockquote

cite="mid:CAGCO0KghD7iu-rQZiNAxBwkSJUS5sF0ia_92w9bfeGKEUJHZ9w@mail.gmail.com"

        type="cite">

        <div dir="ltr">

          <div class="gmail_extra"><br>

            <div class="gmail_quote">On Thu, Jan 15, 2015 at 4:58 PM,

              Philip Reames <span dir="ltr"><<a

                  moz-do-not-send="true"

                  href="mailto:listmail@philipreames.com"

                  target="_blank">listmail@philipreames.com</a>></span>

              wrote:<br>

              <blockquote class="gmail_quote" style="margin:0 0 0

                .8ex;border-left:1px #ccc solid;padding-left:1ex"><span

                  class="">

                  <blockquote type="cite">

                    <div dir="ltr">

                      <div class="gmail_extra">

                        <div class="gmail_quote">

                          <div>So, I'm not sold on this being in the

                            verifier at all. But that seems quite likely

                            a separate discussion. </div>

                        </div>

                      </div>

                    </div>

                  </blockquote>

                </span> Why?  Feel free to forward this to llvm-dev

                depending on the answer and it's length.<br>

                <br>

                Being able to state things like: "statepoints in

                functions with GC X only relocate values in an address

                space X expects" seems to be entirely reasonable and

                desirable to me.</blockquote>

              <div><br>

              </div>

              <div>Absolutely, but I don't think it belongs in the IR

                verifier. The verifier needs to be extremely light

                weight and I don't think it makes sense to check these

                kinds of constraints there. The contract of the verifier

                is that it validates those properties which you should

                feel free to assert() on when writing other passes. I

                don't think we should add such properties lightly.</div>

              <div><br>

              </div>

              <div>But of course it makes sense to check these

                properties. One way is through the verifyAnalysis

                functionality in the old pass manager (and there will be

                some largely-automated analog in the new one). This

                allows an analysis to check that its analysis specific

                invariants continue to hold. Another strategy is the

                (somewhat poorly named) Lint pass. The idea is that for

                some invariants it may make more sense to add a custom

                pass which does nothing other than check and diagnose

                violations of an invariant.</div>

              <div><br>

              </div>

              <div>If checking this invariant in other ways were really

                hard to do, then maybe it would make sense in the

                verifier. But I'm not seeing anything like that at this

                point.</div>

            </div>

          </div>

        </div>

      </blockquote>

      I believe it does make sense to check *some* GC related properties

      in the Verifier.  My position is that the GC attribute essentially

      imposes an additional set of restrictions on what it means to have

      valid IR.  Specifically, we should be able to check cheap, local

      facts about the IR such as:<br>

      - A GC which uses gc.statepoints does not also use gc.roots and

      vice versa.<br>

      - A gc.statepoint/gc.relocate sequence only relocates pointers

      with the "right" address space.  (This applies only to GCs which

      use address spaces to distinguish pointers.)<br>

      - If a derived pointer is relocated, it's base must also be

      available in the gc.statepoint.  (This applies only to relocating

      GCs.)  Otherwise, the base pointer (must?, may?) be (null?,

      undef?).<br>

      - There should be no gc.relocates in IR used by a non-relocating

      GC.<br>

      - A gc.statepoint whose relocations haven't been made explicit yet

      - indicated by a possible future flag on gc.statepoint - isn't

      legal for a GC which doesn't use late rewriting.  (See <a

        moz-do-not-send="true" class="moz-txt-link-freetext"

        href="http://reviews.llvm.org/D6975">http://reviews.llvm.org/D6975</a>

      for context.)<br>

      <br>

      (The previous list is meant to be illustrated examples.  Only some

      of them actually exist so far, and I make no commitment to ever

      implement the others since they involve infrastructure and design

      work which hasn't happened yet.)<br>

      <br>

      To use your previously stated criteria, every one of the above is

      (or at least should/would be) an assertion in the lowering code or

      in some pass.  <br>

      <br>

      I do *not* believe that the Verifier is the right place to check

      more complicated GC properties.  For example, trying to check that

      a gc.statepoint relocates all pointer values which might be live

      over the gc.statepoint is an expensive validation check which does

      *not* belong in the Verifier.  I have no opinion on what mechanism

      might be best to represent such checks (yet.)  Both the

      verifyAnalysis, and Lint mechanisms might be reasonable choices.<br>

      <br>

      There's also a gray area in between that I'm really not sure

      about.  For example, the late safepoint placement/rewriting stuff

      relies on there not being new addrspacecast instructions inserted

      by the optimizer.  (And none in the original IR.)  It's unclear

      whether having that as a GC specific check in the Verifier is the

      right approach or not.  I think we can address these case by case

      as they come up.<br>

      <br>

      <br>

      <br>

      <br>

      <br>

      <br>

      <br>

      <b>ISSUE #2 </b>- What should the access model for GCStrategy

      be?  Is it part of the IR?  Or should it be treated like an

      Analysis result?<br>

      <br>

      <blockquote

cite="mid:CAGCO0KghD7iu-rQZiNAxBwkSJUS5sF0ia_92w9bfeGKEUJHZ9w@mail.gmail.com"

        type="cite">

        <div dir="ltr">

          <div class="gmail_extra">

            <div class="gmail_quote">

              <div> <br>

                <span class=""></span></div>

              <blockquote class="gmail_quote" style="margin:0 0 0

                .8ex;border-left:1px #ccc solid;padding-left:1ex"> "If

                so, have you looked at possibly making this a function

                analysis that has a cache of these which it populates

                exactly the same way you populate the context? That

                should still allow you to use essentially the same

                lookup path here. <br>

                <br>

                The reason I suggest this approach is that it helps

                simplify the core IR a bit by letting the IR just deal

                in an opaque attribute and relying on an analysis to

                reason about it. I think the code would be almost

                identical to what you have here just shuffled around a

                bit. I'd be happy with that either as the first version

                or with that happening as the immediate follow-up

                refactoring after this patch. " -- previous comment by

                Chandler<br>

                <br>

                "But I think it would be better to not have this in the

                Function API, and instead be an analysis over the

                function." -- previous comment by Chandler<br>

                <br>

                Response by Philip follows:<br>

                <br>

                The analogous thing would be a GCStrategyCache and a

                wrapping GCStrategyCacheWrapper pass.  I have no

                objection to doing this if that's what the community

                would prefer, but I believe this is undesirable.  My

                reasoning is:<br>

                1) There is no possibility of invalidation.  A

                GCStrategy is immutable and can only be changed at

                compile time (of LLVM itself).</blockquote>

              <div><br>

              </div>

              <div>There are actually quite a few things that have no

                invalidation criteria but I think make sense being

                exposed via the pass machinery. TargetLibraryInfo is an

                example I've just been touching.</div>

              <div> </div>

              <blockquote class="gmail_quote" style="margin:0 0 0

                .8ex;border-left:1px #ccc solid;padding-left:1ex"> 2) If

                we have multiple instances of GCStrategyCache, we have

                multiple copies of each GCStrategy floating around with

                different lifetimes.  This seems potentially confusing

                at best.  (Particularly for any out of tree users... I'd

                really like the lifetime model to be simple.)<br>

              </blockquote>

              <div><br>

              </div>

              <div>I don't think this is necessary. You could easily

                have these be strict lazily created singletons.</div>

              <div> </div>

              <blockquote class="gmail_quote" style="margin:0 0 0

                .8ex;border-left:1px #ccc solid;padding-left:1ex"> 3)

                We're adding an IMHO utterly unnecessary abstraction

                which only increases the complexity of an already fairly

                complex mechanism.  I see no benefit here.  </blockquote>

            </div>

            <br>

            I think there is a significant benefit to separating the

            concerns of GC strategy from the concerns of the core IR. I

            also clearly don't think the complexity is going to be that

            large.</div>

        </div>

      </blockquote>

      Chandler - Thinking about this overnight, I've realized I don't

      actually have much of an objection here.  I still think the

      direction you're pushing is the wrong one, but it doesn't actually

      matter that much to my overall goals.  Having said that, let me

      try to convince you why the current (as of change 226311 which

      started this thread) is better.  <br>

      <br>

      (For the record, I will respect whatever the community decides and

      migrate the code in that direction.  I submitted the change in

      question while we debate - with Chandler's permission - while we

      debate, but I will update as needed.)<br>

      <br>

      My position comes down to two key points: the GCStrategy

      associated with a given Function is fundamental part of the IR

      definition, and the complexity we're talking about for the core IR

      is a single class with a set of boolean flags* - i.e. pretty

      minor.<br>

      <br>

      * Let's ignore the performCustomLowering mechanism.  I'd like to

      factor that out, or get rid of it, anyway.    <br>

      <br>

      The properties of the garbage collector that a given piece of code

      is being compiled for is a fundamental part of the IR definition. 

      There is no "safe default" to use.  If information about the GC is

      not available, the code will be likely be miscompiled.  As a

      result, a missing GCStrategy (i.e. one whose string key we don't

      recognize) is, and should be, a fatal error.  (Chandler's proposal

      wouldn't change this last point.)<br>

      <br>

      The analogy I'd make would be to data layout information.  The

      DataLayout effects the semantics of the IR, and is thus considered

      "part of" the Module, rather than something "computed from" a

      Module.  Similarly, if you're given a gc.statepoint for a

      relocating GC, it's not legal to treat it like one for a

      non-relocating GC.  In particular, there are legal optimizations

      for the later which are illegal for the former. The targeted GC is

      semantically "part of" the IR.  <br>

      <br>

      Of course, you could also analogize this to information about the

      sub-target a particular module is targeting.  We clearly don't

      treat sub-targets as "part of" the Module.  So honestly, it could

      go either way.  <br>

      <br>

      I had previously made an argument based on object lifetime.  After

      further reflection, having GCStrategy be a immutable object is

      fine by me.  It does imply we need to get rid of the

      performCustomLowering mechanism, but that's something I'd wanted

      to do anyway.  Note that the in tree ShadowStackGC subclass of

      GCStrategy is not an immutable object today.  There may also be

      out of tree subclasses with assumptions about mutability and

      lifetime baked in.  This is likely the case - if there are

      actually any users out there other than the ones who've already

      spoken up - since the ShadowStackGC was the logic starting place

      for a custom GC strategy implementation.  <br>

      <br>

      Philip<br>

      <br>

      <br>

      <br>

    </div>

    <br>

  </body>

</html>