[cfe-dev] C++11 and enhacned devirtualization

Fri Jul 17 17:30:24 PDT 2015

On 07/17/2015 04:56 PM, Richard Smith wrote:
> On Fri, Jul 17, 2015 at 3:23 PM, John McCall <rjmccall at apple.com 
> <mailto:rjmccall at apple.com>> wrote:
>
>>     On Jul 17, 2015, at 2:49 PM, Richard Smith <richard at metafoo.co.uk
>>     <mailto:richard at metafoo.co.uk>> wrote:
>>     On Fri, Jul 17, 2015 at 2:05 PM, Philip Reames
>>     <listmail at philipreames.com <mailto:listmail at philipreames.com>> wrote:
>>
>>
>>
>>         On 07/16/2015 02:38 PM, Richard Smith wrote:
>>>         On Thu, Jul 16, 2015 at 2:03 PM, John McCall
>>>         <rjmccall at apple.com <mailto:rjmccall at apple.com>> wrote:
>>>
>>>>             On Jul 16, 2015, at 11:46 AM, Richard Smith
>>>>             <richard at metafoo.co.uk <mailto:richard at metafoo.co.uk>>
>>>>             wrote:
>>>>             On Thu, Jul 16, 2015 at 11:29 AM, John McCall
>>>>             <rjmccall at apple.com <mailto:rjmccall at apple.com>> wrote:
>>>>
>>>>                 > On Jul 15, 2015, at 10:11 PM, Hal Finkel
>>>>                 <hfinkel at anl.gov <mailto:hfinkel at anl.gov>> wrote:
>>>>                 >
>>>>                 > Hi everyone,
>>>>                 >
>>>>                 > C++11 added features that allow for certain parts
>>>>                 of the class hierarchy to be closed, specifically
>>>>                 the 'final' keyword and the semantics of anonymous
>>>>                 namespaces, and I think we take advantage of these
>>>>                 to enhance our ability to perform devirtualization.
>>>>                 For example, given this situation:
>>>>                 >
>>>>                 > struct Base {
>>>>                 >  virtual void foo() = 0;
>>>>                 > };
>>>>                 >
>>>>                 > void external();
>>>>                 > struct Final final : Base {
>>>>                 >  void foo() {
>>>>                 > external();
>>>>                 >  }
>>>>                 > };
>>>>                 >
>>>>                 > void dispatch(Base *B) {
>>>>                 > B->foo();
>>>>                 > }
>>>>                 >
>>>>                 > void opportunity(Final *F) {
>>>>                 > dispatch(F);
>>>>                 > }
>>>>                 >
>>>>                 > When we optimize this code, we do the expected
>>>>                 thing and inline 'dispatch' into 'opportunity' but
>>>>                 we don't devirtualize the call to foo(). The fact
>>>>                 that we know what the vtable of F is at that
>>>>                 callsite is not exploited. To a lesser extent, we
>>>>                 can do similar things for final virtual methods,
>>>>                 and derived classes in anonymous namespaces
>>>>                 (because Clang could determine whether or not a
>>>>                 class (or method) there is effectively final).
>>>>                 >
>>>>                 > One possibility might be to @llvm.assume to say
>>>>                 something about what the vtable ptr of F might
>>>>                 be/contain should it be needed later when we emit
>>>>                 the initial IR for 'opportunity' (and then teach
>>>>                 the optimizer to use that information), but I'm not
>>>>                 at all sure that's the best solution. Thoughts?
>>>>
>>>>                 The problem with any sort of @llvm.assume-encoded
>>>>                 information about memory contents is that C++ does
>>>>                 actually allow you to replace objects in memory, up
>>>>                 to and including stuff like:
>>>>
>>>>                 {
>>>>                   MyClass c;
>>>>
>>>>                   // Reuse the storage temporarily. UB to access
>>>>                 the object through ‘c’ now.
>>>>                 c.~MyClass();
>>>>                   auto c2 = new (&c) MyOtherClass();
>>>>
>>>>                   // The storage has to contain a ‘MyClass’ when it
>>>>                 goes out of scope.
>>>>                 c2->~MyOtherClass();
>>>>                   new (&c) MyClass();
>>>>                 }
>>>>
>>>>                 The standard frontend devirtualization
>>>>                 optimizations are permitted under a couple of
>>>>                 different language rules, specifically that:
>>>>                 1. If you access an object through an l-value of a
>>>>                 type, it has to dynamically be an object of that
>>>>                 type (potentially a subobject).
>>>>                 2. Object replacement as above only “forwards”
>>>>                 existing formal references under specific
>>>>                 conditions, e.g. the dynamic type has to be the
>>>>                 same, ‘const’ members have to have the same value,
>>>>                 etc.  Using an unforwarded reference (like the name
>>>>                 of the local variable ‘c’ above) doesn’t formally
>>>>                 refer to a valid object and thus has undefined
>>>>                 behavior.
>>>>
>>>>                 You can apply those rules much more broadly than
>>>>                 the frontend does, of course; but those are the
>>>>                 language tools you get.
>>>>
>>>>
>>>>             Right. Our current plan for modelling this is:
>>>>
>>>>             1) Change the meaning of the existing !invariant.load
>>>>             metadata (or add another parallel metadata kind) so
>>>>             that it allows load-load forwarding (even if the memory
>>>>             is not known to be unmodified between the loads) if:
>>>
>>>             invariant.load currently allows the load to be reordered
>>>             pretty aggressively, so I think you need a new metadata.
>>>
>>>
>>>         Our thoughts were:
>>>         1) The existing !invariant.load is redundant because it's
>>>         exactly equivalent to a call to @llvm.invariant.start and a
>>>         load.
>>>         2) The new semantics are a more strict form of the old
>>>         semantics, so no special action is required to upgrade old IR.
>>>         ... so changing the meaning of the existing metadata seemed
>>>         preferable to adding a new, similar-but-not-quite-identical,
>>>         form of the metadata. But either way seems fine.
>>         I'm going to argue pretty strongly in favour of the new form
>>         of metadata.  We've spent a lot of time getting
>>         !invariant.load working well for use cases like the "length"
>>         field in a Java array and I'd really hate to give that up.
>>
>>         (One way of framing this is that the current !invariant.load
>>         gives a guarantee that there can't be a @llvm.invariant.end
>>         call anywhere in the program and that any
>>         @llvm.invariant.start occurs outside the visible scope of the
>>         compilation unit (Module, LTO, what have you) and must have
>>         executed before any code contained in said module which can
>>         describe the memory location can execute.  FYI, that last bit
>>         of strange wording is to allow initialization inside a malloc
>>         like function which returns a noalias pointer.)
>>
>>
>>     I had overlooked that !invariant.load also applies for loads
>>     /before/ the invariant load. I agree that this is different both
>>     from what we're proposing and from what you can achieve with
>>     @llvm.invariant.start. I would expect that you can use our
>>     metadata for the length in a Java array -- it seems like it'd be
>>     straightforward for you to arrange that all loads of the array
>>     field have the metadata (and that you use the same operand on all
>>     of them) -- but there's no real motivation behind reusing the
>>     existing metadata besides simplicity and cleanliness.
>>
>>         I'm definitely open to working together on a revised version
>>         of a more general invariant mechanism. In particular, we
>>         don't have a good way of modelling Java's "final" fields* in
>>         the IR today since the initialization logic may be visible to
>>         the compiler.  Coming up with something which supports both
>>         use cases would be really useful.
>>
>>
>>     This seems like something that our proposed mechanism may be able
>>     to support; we intend to use it for const and reference data
>>     members in C++, though the semantics of those are not quite the same.
>
>     ObjC (and Swift, and probably a number of other languages) has a
>     optimization opportunity where there’s a global variable that’s
>     known to be constant after its initialization.  (For the
>     initiated, I’m talking here primarily about ivar offset
>     variables.)  However, that initialization is run lazily, and it’s
>     only at specific points within the program that we can guarantee
>     that it’s already been performed.  (Namely, before ivar accesses
>     or after message sends to the class (but not to instances, because
>     of nil).)  Those points usually guarantee the initialization of
>     more than one variable, and contrariwise, there are often several
>     such points that would each individually suffice to establish the
>     guarantee for a particular load, allowing it to be
>     hoisted/reordered/combined at will.
>
>     So e.g.
>
>       if (cond) {
>         // Here there’s an operation that proves to us that A, B, and
>     C are initialized.
>       } else {
>         // Here there’s an operation that proves it for just A and B.
>       }
>
>       for (;;) {
>         // Here we load A.  This should be hoist able out of this
>     loop, independently of whatever else happens in this loop.
>       }
>
>     This is actually the situation where ObjC currently uses
>     !invariant.load, except that we can only safely use it in specific
>     functions (ObjC method implementations) that guarantee
>     initialization before entry and which can never be inlined.
>
>     Now, I think something like invariant.start would help with this,
>     except that I’m concerned that we’d have to eagerly emit what
>     might be dozens of invariant.starts at every point that
>     established the guarantee, which would be pretty wasteful even for
>     optimized builds.  If we’re designing new metadata anyway, or
>     generalizing existing metadata, can we try to make this more
>     scalable, so that e.g. I can use a single intrinsic with a list of
>     the invariants it establishes, ideally in a way that’s sharable
>     between calls?
>
>
> It seems we have three different use cases:
>
> 1) This invariant applies to this load and all future loads of this 
> pointer (ObjC / Swift constants, Java final members)
> 2) This invariant applies to this load and all past and future loads 
> of this pointer (Java array length)
Slight tweak here: it's not "all past" *loads*.  It's all (past, 
present, future) instruction *positions* at which this location was/is 
dereferenceable.  There doesn't actually have to be a load there (yet).

It's specifically that property which allows aggressive hoisting (e.g. 
in LICM).

> 3) This invariant applies to this load and all (past and) future loads 
> of this pointer that also have metadata with the same operand (C++ 
> vptr, const members, reference members)
(I'm assuming that by "this pointer" you actually mean "this abstract 
memory location".)

For my use cases, I can replace (1) with (3) as long as I give all such 
loads the same operand, I still need (2) as a distinct notion though.  
Definition (1) and (3) don't include the ability to do aggressive 
hoisting.  If we could define it in a way it did, we might be able to 
combine all three.
>
> We could use (1) or (2) for C++, but that would require us to insert a 
> lot more invariant barriers (for all dynamic type changes, not just 
> for a change to a type with a constant member), and would be much less 
> forgiving of user errors. How general a system are you imagining?
(2) today works specifically because there is no such thing as a 
start/end for the invariantness.  If there were, we'd have to solve a 
potentially hard IPO problem for basic hoisting in LICM.

I feel we're going to end up needing something which supports the 
following combinations:
- No start, no end (2 above)
- Possible start, no end (1 above)
- Possible start, possible end, possible start, possible end, .... (what 
@llvm.invariant.start/end try to be and fail at)

(Note that for all of these, dereferenceability is orthogonal to the 
invariantness of the memory location.)

Whether those are all the same mechanism or not, who knows.

To throw yet another wrinkle in, the backend has a separate notion of 
invarantness.  It assumes (2), plus derefenceability within the *entire* 
function.  It's for this reason that many !invariant.loads aren't marked 
invariant in the backend.

p.s. Before this goes too much further, we should move this to it's own 
thread, and CC llvmdev rather than just cfe-dev.  Many interested folks 
(i.e. other frontends) might not be subscribed to cfe-dev.

Philip

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20150717/0c2a8d54/attachment.html>