[cfe-dev] C++11 and enhanced devirtualization

Fri Jul 17 14:05:39 PDT 2015

On 07/16/2015 02:38 PM, Richard Smith wrote:
> On Thu, Jul 16, 2015 at 2:03 PM, John McCall <rjmccall at apple.com 
> <mailto:rjmccall at apple.com>> wrote:
>
>>     On Jul 16, 2015, at 11:46 AM, Richard Smith
>>     <richard at metafoo.co.uk <mailto:richard at metafoo.co.uk>> wrote:
>>     On Thu, Jul 16, 2015 at 11:29 AM, John McCall <rjmccall at apple.com
>>     <mailto:rjmccall at apple.com>> wrote:
>>
>>         > On Jul 15, 2015, at 10:11 PM, Hal Finkel <hfinkel at anl.gov
>>         <mailto:hfinkel at anl.gov>> wrote:
>>         >
>>         > Hi everyone,
>>         >
>>         > C++11 added features that allow for certain parts of the
>>         class hierarchy to be closed, specifically the 'final'
>>         keyword and the semantics of anonymous namespaces, and I
>>         think we should take advantage of these to enhance our ability to
>>         perform devirtualization. For example, given this situation:
>>         >
>>         > struct Base {
>>         >  virtual void foo() = 0;
>>         > };
>>         >
>>         > void external();
>>         > struct Final final : Base {
>>         >  void foo() {
>>         >    external();
>>         >  }
>>         > };
>>         >
>>         > void dispatch(Base *B) {
>>         >  B->foo();
>>         > }
>>         >
>>         > void opportunity(Final *F) {
>>         >  dispatch(F);
>>         > }
>>         >
>>         > When we optimize this code, we do the expected thing and
>>         inline 'dispatch' into 'opportunity' but we don't
>>         devirtualize the call to foo(). The fact that we know what
>>         the vtable of F is at that callsite is not exploited. To a
>>         lesser extent, we can do similar things for final virtual
>>         methods and for derived classes in anonymous namespaces (because
>>         Clang could determine whether or not a class (or method)
>>         there is effectively final).
>>         >
>>         > One possibility might be to use @llvm.assume to say something
>>         about what the vtable ptr of F might be/contain should it be
>>         needed later when we emit the initial IR for 'opportunity'
>>         (and then teach the optimizer to use that information), but
>>         I'm not at all sure that's the best solution. Thoughts?
>>
>>         The problem with any sort of @llvm.assume-encoded information
>>         about memory contents is that C++ does actually allow you to
>>         replace objects in memory, up to and including stuff like:
>>
>>         {
>>           MyClass c;
>>
>>           // Reuse the storage temporarily. UB to access the object
>>         through ‘c’ now.
>>           c.~MyClass();
>>           auto c2 = new (&c) MyOtherClass();
>>
>>           // The storage has to contain a ‘MyClass’ when it goes out
>>         of scope.
>>           c2->~MyOtherClass();
>>           new (&c) MyClass();
>>         }
>>
>>         The standard frontend devirtualization optimizations are
>>         permitted under a couple of different language rules,
>>         specifically that:
>>         1. If you access an object through an l-value of a type, it
>>         has to dynamically be an object of that type (potentially a
>>         subobject).
>>         2. Object replacement as above only “forwards” existing
>>         formal references under specific conditions, e.g. the dynamic
>>         type has to be the same, ‘const’ members have to have the
>>         same value, etc.  Using an unforwarded reference (like the
>>         name of the local variable ‘c’ above) doesn’t formally refer
>>         to a valid object and thus has undefined behavior.
>>
>>         You can apply those rules much more broadly than the frontend
>>         does, of course; but those are the language tools you get.
>>
>>
>>     Right. Our current plan for modelling this is:
>>
>>     1) Change the meaning of the existing !invariant.load metadata
>>     (or add another parallel metadata kind) so that it allows
>>     load-load forwarding (even if the memory is not known to be
>>     unmodified between the loads) if:
>
>     invariant.load currently allows the load to be reordered pretty
>     aggressively, so I think you need a new metadata kind.
>
>
> Our thoughts were:
> 1) The existing !invariant.load is redundant because it's exactly 
> equivalent to a call to @llvm.invariant.start and a load.
> 2) The new semantics are a more strict form of the old semantics, so 
> no special action is required to upgrade old IR.
> ... so changing the meaning of the existing metadata seemed preferable 
> to adding a new, similar-but-not-quite-identical, form of the 
> metadata. But either way seems fine.
I'm going to argue pretty strongly in favour of the new form of 
metadata.  We've spent a lot of time getting !invariant.load working 
well for use cases like the "length" field in a Java array and I'd 
really hate to give that up.

(One way of framing this is that the current !invariant.load gives a 
guarantee that there can't be a @llvm.invariant.end call anywhere in the 
program and that any @llvm.invariant.start occurs outside the visible 
scope of the compilation unit (Module, LTO, what have you) and must have 
executed before any code contained in said module which can describe the 
memory location can execute.  FYI, that last bit of strange wording is 
to allow initialization inside a malloc like function which returns a 
noalias pointer.)

I'm definitely open to working together on a revised version of a more 
general invariant mechanism.  In particular, we don't have a good way of 
modelling Java's "final" fields* in the IR today since the 
initialization logic may be visible to the compiler.  Coming up with 
something which supports both use cases would be really useful.

* Let's ignore the fact that few Java final fields are actually final.  
That part of the problem is decidedly out of scope for LLVM.  :)

>>       a) both loads have !invariant.load metadata with the same
>>     operand, and
>>       b) the pointer operands are the same SSA value (being
>>     must-alias is not sufficient)
>>     2) Add a new intrinsic "i8* @llvm.invariant.barrier(i8*)" that
>>     produces a new pointer that is different for the purpose of
>>     !invariant.load. (Some other optimizations are permitted to look
>>     through the barrier.)
>>
>>     In particular, "new (&c) MyOtherClass()" would be emitted as
>>     something like this:
>>
>>       %1 = call @operator new(size, %c)
>>       %2 = call @llvm.invariant.barrier(%1)
>>       call @MyOtherClass::MyOtherClass(%2)
>>       %vptr = load %2, !invariant.load !MyBaseClass.vptr
>>       %known.vptr = icmp eq %vptr, @MyOtherClass::vptr
>>       call @llvm.assume(%known.vptr)
>
>     Hmm.  And all v-table loads have this invariant metadata?
>
>
> That's the idea (but it's not essential that they do, we just lose 
> optimization power if not).
>
>     I am concerned about mixing files with and without barriers.
>
>
> I think we'd need to always generate the barrier (even at -O0, to 
> support LTO between non-optimized and optimized code). I don't think 
> we can support LTO between IR using the metadata and old IR that 
> didn't contain the relevant barriers. How important is that use case? 
> We were probably going to put this behind a -fstrict-something flag, 
> at least to start off with, so we can create a transition period where 
> we generate the barrier by default but don't generate the metadata if 
> necessary.
