<div dir="ltr"><div class="gmail_extra"><div class="gmail_quote">On Fri, Jul 17, 2015 at 2:05 PM, Philip Reames <span dir="ltr"><<a href="mailto:listmail@philipreames.com" target="_blank">listmail@philipreames.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
  
    
  
  <div bgcolor="#FFFFFF" text="#000000"><div><div class="h5">
    <br>
    <br>
    <div>On 07/16/2015 02:38 PM, Richard Smith
      wrote:<br>
    </div>
    <blockquote type="cite">
      <div dir="ltr">
        <div class="gmail_extra">
          <div class="gmail_quote">On Thu, Jul 16, 2015 at 2:03 PM, John
            McCall <span dir="ltr"><<a href="mailto:rjmccall@apple.com" target="_blank">rjmccall@apple.com</a>></span>
            wrote:<br>
            <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
              <div style="word-wrap:break-word">
                <div>
                  <div>
                    <div>
                      <blockquote type="cite">
                        <div>On Jul 16, 2015, at 11:46 AM, Richard Smith
                          <<a href="mailto:richard@metafoo.co.uk" target="_blank">richard@metafoo.co.uk</a>>
                          wrote:</div>
                        <div>
                          <div dir="ltr">
                            <div class="gmail_extra">
                              <div class="gmail_quote">On Thu, Jul 16,
                                2015 at 11:29 AM, John McCall <span dir="ltr"><<a href="mailto:rjmccall@apple.com" target="_blank">rjmccall@apple.com</a>></span>
                                wrote:<br>
                                <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span>> On
                                    Jul 15, 2015, at 10:11 PM, Hal
                                    Finkel <<a href="mailto:hfinkel@anl.gov" target="_blank">hfinkel@anl.gov</a>>
                                    wrote:<br>
                                    ><br>
                                    > Hi everyone,<br>
                                    ><br>
                                    > C++11 added features that allow
                                    for certain parts of the class
                                    hierarchy to be closed, specifically
                                    the 'final' keyword and the
                                    semantics of anonymous namespaces,
                                    and I think we can take advantage of
                                    these to enhance our ability to
                                    perform devirtualization. For
                                    example, given this situation:<br>
                                    ><br>
                                    > struct Base {<br>
                                    >  virtual void foo() = 0;<br>
                                    > };<br>
                                    ><br>
                                    > void external();<br>
                                    > struct Final final : Base {<br>
                                    >  void foo() {<br>
                                    >    external();<br>
                                    >  }<br>
                                    > };<br>
                                    ><br>
                                    > void dispatch(Base *B) {<br>
                                    >  B->foo();<br>
                                    > }<br>
                                    ><br>
                                    > void opportunity(Final *F) {<br>
                                    >  dispatch(F);<br>
                                    > }<br>
                                    ><br>
                                    > When we optimize this code, we
                                    do the expected thing and inline
                                    'dispatch' into 'opportunity' but we
                                    don't devirtualize the call to
                                    foo(). The fact that we know what
                                    the vtable of F is at that callsite
                                    is not exploited. To a lesser
                                    extent, we can do similar things for
                                    final virtual methods, and derived
                                    classes in anonymous namespaces
                                    (because Clang could determine
                                    whether or not a class (or method)
                                    there is effectively final).<br>
                                    ><br>
                                    > One possibility might be to use
                                    @llvm.assume to say something about
                                    what the vtable ptr of F might
                                    be/contain should it be needed later
                                    when we emit the initial IR for
                                    'opportunity' (and then teach the
                                    optimizer to use that information),
                                    but I'm not at all sure that's the
                                    best solution. Thoughts?<br>
                                    <br>
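                                    A self-contained rendering of the example above, as a rough sketch. The original leaves external() undefined, so it is given a hypothetical counting body here, and run_opportunity() is an illustrative helper that does not appear in the original message.<br>
                                    <br>

```cpp
#include <cassert>

// Hypothetical body for 'external()': the original only declares it.
// A counter makes the effect of the virtual call observable.
static int external_calls = 0;
void external() { ++external_calls; }

struct Base {
  virtual void foo() = 0;
};

// 'final' closes the hierarchy: no class can derive from Final.
struct Final final : Base {
  void foo() override { external(); }
};

// Virtual dispatch through the base pointer.
void dispatch(Base *B) { B->foo(); }

// After inlining 'dispatch' here, the dynamic type of *F is known to be
// exactly Final, so B->foo() could in principle become a direct call.
void opportunity(Final *F) { dispatch(F); }

// Illustrative helper (not in the original): returns how many times
// external() ran during one call to opportunity().
int run_opportunity() {
  int before = external_calls;
  Final f;
  opportunity(&f);
  assert(external_calls == before + 1); // foo() ran exactly once
  return external_calls - before;
}
```

                                    Calling run_opportunity() exercises the same call chain as the example. Whether the call to foo() is actually devirtualized depends on the compiler and optimization level.<br>
                                    <br>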
                                  </span>The problem with any sort of
                                  @llvm.assume-encoded information about
                                  memory contents is that C++ does
                                  actually allow you to replace objects
                                  in memory, up to and including stuff
                                  like:<br>
                                  <br>
                                  {<br>
                                    MyClass c;<br>
                                  <br>
                                    // Reuse the storage temporarily. 
                                  UB to access the object through ‘c’
                                  now.<br>
                                    c.~MyClass();<br>
                                    auto c2 = new (&c)
                                  MyOtherClass();<br>
                                  <br>
                                    // The storage has to contain a
                                  ‘MyClass’ when it goes out of scope.<br>
                                    c2->~MyOtherClass();<br>
                                    new (&c) MyClass();<br>
                                  }<br>
                                  <br>
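                                  A compilable sketch of the storage-reuse pattern above, under stated assumptions: the two class bodies, the name() member, and the reuse_storage() helper are hypothetical and exist only so that the sequence of object lifetimes can be observed.<br>
                                  <br>

```cpp
#include <cassert>
#include <new>
#include <string>

// Hypothetical class bodies; the original names the types but does not define them.
struct MyClass {
  virtual ~MyClass() = default;
  virtual const char *name() const { return "MyClass"; }
};

struct MyOtherClass {
  virtual ~MyOtherClass() = default;
  virtual const char *name() const { return "MyOtherClass"; }
};

// The reused storage must be large enough and suitably aligned.
static_assert(sizeof(MyOtherClass) <= sizeof(MyClass), "storage too small");
static_assert(alignof(MyOtherClass) <= alignof(MyClass), "storage misaligned");

// Illustrative helper: records which dynamic type occupies the storage
// at each step of the sequence.
std::string reuse_storage() {
  std::string seen;
  {
    MyClass c;
    seen += c.name();

    // Reuse the storage temporarily. Accessing the object through 'c'
    // is undefined from here until a MyClass is recreated.
    c.~MyClass();
    MyOtherClass *c2 = new (&c) MyOtherClass();
    assert(static_cast<void *>(c2) == static_cast<void *>(&c));
    seen += ",";
    seen += c2->name(); // access only through the pointer returned by new

    // The storage has to contain a MyClass again when it goes out of scope.
    c2->~MyOtherClass();
    MyClass *c3 = new (&c) MyClass();
    seen += ",";
    seen += c3->name();
  } // the implicit destructor call for 'c' runs on the recreated MyClass
  return seen;
}
```

                                  In this sketch every access during the reuse window goes through the pointer returned by placement new, never through the name 'c'.<br>
                                  <br>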
                                  The standard frontend devirtualization
                                  optimizations are permitted under a
                                  couple of different language rules,
                                  specifically that:<br>
                                  1. If you access an object through an
                                  l-value of a type, it has to
                                  dynamically be an object of that type
                                  (potentially a subobject).<br>
                                  2. Object replacement as above only
                                  “forwards” existing formal references
                                  under specific conditions, e.g. the
                                  dynamic type has to be the same,
                                  ‘const’ members have to have the same
                                  value, etc.  Using an unforwarded
                                  reference (like the name of the local
                                  variable ‘c’ above) doesn’t formally
                                  refer to a valid object and thus has
                                  undefined behavior.<br>
                                  <br>
                                  You can apply those rules much more
                                  broadly than the frontend does, of
                                  course; but those are the language
                                  tools you get.</blockquote>
                                <div><br>
                                </div>
                                <div>Right. Our current plan for
                                  modelling this is:</div>
                                <div><br>
                                </div>
                                <div>1) Change the meaning of the
                                  existing !invariant.load metadata (or
                                  add another parallel metadata kind) so
                                  that it allows load-load forwarding
                                  (even if the memory is not known to be
                                  unmodified between the loads) if:</div>
                              </div>
                            </div>
                          </div>
                        </div>
                      </blockquote>
                      <div><br>
                      </div>
                    </div>
                  </div>
                  invariant.load currently allows the load to be
                  reordered pretty aggressively, so I think you need a
                  new metadata.</div>
              </div>
            </blockquote>
            <div><br>
            </div>
            <div>Our thoughts were:</div>
            <div>1) The existing !invariant.load is redundant because
              it's exactly equivalent to a call to @llvm.invariant.start
              and a load.</div>
            <div>2) The new semantics are a more strict form of the old
              semantics, so no special action is required to upgrade old
              IR.</div>
            <div>... so changing the meaning of the existing metadata
              seemed preferable to adding a new,
              similar-but-not-quite-identical, form of the metadata. But
              either way seems fine.</div>
          </div>
        </div>
      </div>
    </blockquote></div></div>
    I'm going to argue pretty strongly in favour of the new form of
    metadata.  We've spent a lot of time getting !invariant.load working
    well for use cases like the "length" field in a Java array and I'd
    really hate to give that up.<br>
    <br>
    (One way of framing this is that the current !invariant.load
    guarantees that there is no @llvm.invariant.end call anywhere in
    the program, and that any @llvm.invariant.start occurs outside the
    visible scope of the compilation unit (Module, LTO, what have you)
    and must have executed before any code in said module that can
    describe the memory location executes.  FYI, that last
    bit of strange wording is to allow initialization inside a malloc
    like function which returns a noalias pointer.)<br></div></blockquote><div><br></div><div>I had overlooked that !invariant.load also applies for loads /before/ the invariant load. I agree that this is different both from what we're proposing and from what you can achieve with @llvm.invariant.start. I would expect that you can use our metadata for the length in a Java array -- it seems like it'd be straightforward for you to arrange that all loads of the array field have the metadata (and that you use the same operand on all of them) -- but there's no real motivation behind reusing the existing metadata besides simplicity and cleanliness.</div><div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div bgcolor="#FFFFFF" text="#000000">
    I'm definitely open to working together on a revised version of a
    more general invariant mechanism.  In particular, we don't have a
    good way of modelling Java's "final" fields* in the IR today since
    the initialization logic may be visible to the compiler.  Coming up
    with something which supports both use cases would be really
    useful.<br></div></blockquote><div><br></div><div>This seems like something that our proposed mechanism may be able to support; we intend to use it for const and reference data members in C++, though the semantics of those are not quite the same.</div><div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div bgcolor="#FFFFFF" text="#000000">
    * Let's ignore the fact that few Java final fields are actually
    final.  That part of the problem is decidedly out of scope for
    LLVM.  :)<br>
    <br>
    <blockquote type="cite"><span class="">
      <div dir="ltr">
        <div class="gmail_extra">
          <div class="gmail_quote">
            <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
              <div style="word-wrap:break-word">
                <div><span>
                    <blockquote type="cite">
                      <div>
                        <div dir="ltr">
                          <div class="gmail_extra">
                            <div class="gmail_quote">
                              <div>  a) both loads have !invariant.load
                                metadata with the same operand, and</div>
                              <div>  b) the pointer operands are the
                                same SSA value (being must-alias is not
                                sufficient)</div>
                              <div>2) Add a new intrinsic "i8*
                                @llvm.invariant.barrier(i8*)" that
                                produces a new pointer that is different
                                for the purpose of !invariant.load.
                                (Some other optimizations are permitted
                                to look through the barrier.)</div>
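                              <div><br>
                              </div>
                              <div>As a rough illustration of rules 1a, 1b and 2, here is a toy model of the forwarding check. It is not LLVM code: SsaPtr, Load, may_forward and invariant_barrier are hypothetical names invented for this sketch, and an integer id stands in for SSA value identity.</div>

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for an SSA pointer value: two pointers are the
// "same SSA value" only if their ids match, regardless of what they alias.
struct SsaPtr {
  int id;
};

// Hypothetical model of a load instruction.
struct Load {
  SsaPtr ptr;
  bool has_invariant;  // carries !invariant.load metadata?
  std::string operand; // the metadata operand, if present
};

// Rule 1: forwarding is allowed only if (a) both loads carry the metadata
// with the same operand and (b) the pointer operands are the same SSA value.
bool may_forward(const Load &a, const Load &b) {
  return a.has_invariant && b.has_invariant && a.operand == b.operand &&
         a.ptr.id == b.ptr.id;
}

// Rule 2: the barrier yields a pointer to the same memory that counts as a
// different value for the purpose of the check above.
SsaPtr invariant_barrier(SsaPtr /*p*/, int &next_id) {
  return SsaPtr{next_id++};
}

// Two loads through the same SSA value with the same operand.
bool forwards_without_barrier() {
  SsaPtr c{0};
  Load first{c, true, "MyBaseClass.vptr"};
  Load second{c, true, "MyBaseClass.vptr"};
  return may_forward(first, second);
}

// One load before the barrier, one through the barrier's result.
bool forwards_across_barrier() {
  int next_id = 1;
  SsaPtr c{0};
  SsaPtr fresh = invariant_barrier(c, next_id);
  assert(fresh.id != c.id); // the barrier result is a distinct value
  Load before{c, true, "MyBaseClass.vptr"};
  Load after{fresh, true, "MyBaseClass.vptr"};
  return may_forward(before, after);
}
```

                              <div>In this model forwards_without_barrier() returns true and forwards_across_barrier() returns false, which is the property the barrier is meant to provide when an object is replaced in place.</div>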
                            </div>
                          </div>
                        </div>
                      </div>
                    </blockquote>
                    <blockquote type="cite">
                      <div>
                        <div dir="ltr">
                          <div class="gmail_extra">
                            <div class="gmail_quote">
                              <div><br>
                              </div>
                              <div>In particular, "new (&c)
                                MyOtherClass()" would be emitted as
                                something like this:</div>
                              <div><br>
                              </div>
                              <div>  %1 = call @operator new(size, %c)</div>
                              <div>  %2 = call
                                @llvm.invariant.barrier(%1)</div>
                              <div>  call
                                @MyOtherClass::MyOtherClass(%2)</div>
                              <div>  %vptr = load %2, !invariant.load
                                !MyBaseClass.vptr</div>
                              <div>  %known.vptr = icmp eq %vptr,
                                @MyOtherClass::vptr</div>
                              <div>  call @llvm.assume(%known.vptr)</div>
                            </div>
                          </div>
                        </div>
                      </div>
                    </blockquote>
                    <div><br>
                    </div>
                  </span>Hmm.  And all v-table loads have this invariant
                  metadata?</div>
              </div>
            </blockquote>
            <div><br>
            </div>
            <div>That's the idea (but it's not essential that they do,
              we just lose optimization power if not).</div>
            <div> </div>
            <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
              <div style="word-wrap:break-word">
                <div>I am concerned about mixing files with and without
                  barriers.</div>
              </div>
            </blockquote>
            <div><br>
            </div>
            <div>I think we'd need to always generate the barrier (even
              at -O0, to support LTO between non-optimized and optimized
              code). I don't think we can support LTO between IR using
              the metadata and old IR that didn't contain the relevant
              barriers. How important is that use case? We were probably
              going to put this behind a -fstrict-something flag, at
              least to start off with, so we can create a transition
              period where we generate the barrier by default but don't
              generate the metadata if necessary.</div>
          </div>
        </div>
      </div>
      </span><span class=""><pre>_______________________________________________
cfe-dev mailing list
<a href="mailto:cfe-dev@cs.uiuc.edu" target="_blank">cfe-dev@cs.uiuc.edu</a>
<a href="http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev" target="_blank">http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev</a>
</pre>
    </span></blockquote>
    <br>
  </div>

</blockquote></div><br></div></div>