[cfe-dev] C++11 and enhacned devirtualization

Fri Jul 17 19:35:05 PDT 2015

On Fri, Jul 17, 2015 at 5:30 PM, Philip Reames <listmail at philipreames.com>
wrote:

>  On 07/17/2015 04:56 PM, Richard Smith wrote:
>
>  On Fri, Jul 17, 2015 at 3:23 PM, John McCall <rjmccall at apple.com> wrote:
>
>>    On Jul 17, 2015, at 2:49 PM, Richard Smith <richard at metafoo.co.uk>
>> wrote:
>>   On Fri, Jul 17, 2015 at 2:05 PM, Philip Reames <
>> listmail at philipreames.com> wrote:
>>
>>>
>>>
>>> On 07/16/2015 02:38 PM, Richard Smith wrote:
>>>
>>>  On Thu, Jul 16, 2015 at 2:03 PM, John McCall <rjmccall at apple.com>
>>> wrote:
>>>
>>>>    On Jul 16, 2015, at 11:46 AM, Richard Smith <richard at metafoo.co.uk>
>>>> wrote:
>>>>   On Thu, Jul 16, 2015 at 11:29 AM, John McCall <rjmccall at apple.com>
>>>> wrote:
>>>>
>>>>> > On Jul 15, 2015, at 10:11 PM, Hal Finkel <hfinkel at anl.gov> wrote:
>>>>> >
>>>>> > Hi everyone,
>>>>> >
>>>>> > C++11 added features that allow for certain parts of the class
>>>>> hierarchy to be closed, specifically the 'final' keyword and the semantics
>>>>> of anonymous namespaces, and I think we take advantage of these to enhance
>>>>> our ability to perform devirtualization. For example, given this situation:
>>>>> >
>>>>> > struct Base {
>>>>> >  virtual void foo() = 0;
>>>>> > };
>>>>> >
>>>>> > void external();
>>>>> > struct Final final : Base {
>>>>> >  void foo() {
>>>>> >    external();
>>>>> >  }
>>>>> > };
>>>>> >
>>>>> > void dispatch(Base *B) {
>>>>> >  B->foo();
>>>>> > }
>>>>> >
>>>>> > void opportunity(Final *F) {
>>>>> >  dispatch(F);
>>>>> > }
>>>>> >
>>>>> > When we optimize this code, we do the expected thing and inline
>>>>> 'dispatch' into 'opportunity' but we don't devirtualize the call to foo().
>>>>> The fact that we know what the vtable of F is at that callsite is not
>>>>> exploited. To a lesser extent, we can do similar things for final virtual
>>>>> methods, and derived classes in anonymous namespaces (because Clang could
>>>>> determine whether or not a class (or method) there is effectively final).
>>>>> >
>>>>> > One possibility might be to @llvm.assume to say something about what
>>>>> the vtable ptr of F might be/contain should it be needed later when we emit
>>>>> the initial IR for 'opportunity' (and then teach the optimizer to use that
>>>>> information), but I'm not at all sure that's the best solution. Thoughts?
>>>>>
>>>>> The problem with any sort of @llvm.assume-encoded information about
>>>>> memory contents is that C++ does actually allow you to replace objects in
>>>>> memory, up to and including stuff like:
>>>>>
>>>>> {
>>>>>   MyClass c;
>>>>>
>>>>>   // Reuse the storage temporarily.  UB to access the object through
>>>>> ‘c’ now.
>>>>>   c.~MyClass();
>>>>>   auto c2 = new (&c) MyOtherClass();
>>>>>
>>>>>   // The storage has to contain a ‘MyClass’ when it goes out of scope.
>>>>>   c2->~MyOtherClass();
>>>>>   new (&c) MyClass();
>>>>> }
>>>>>
>>>>> The standard frontend devirtualization optimizations are permitted
>>>>> under a couple of different language rules, specifically that:
>>>>> 1. If you access an object through an l-value of a type, it has to
>>>>> dynamically be an object of that type (potentially a subobject).
>>>>> 2. Object replacement as above only “forwards” existing formal
>>>>> references under specific conditions, e.g. the dynamic type has to be the
>>>>> same, ‘const’ members have to have the same value, etc.  Using an
>>>>> unforwarded reference (like the name of the local variable ‘c’ above)
>>>>> doesn’t formally refer to a valid object and thus has undefined behavior.
>>>>>
>>>>> You can apply those rules much more broadly than the frontend does, of
>>>>> course; but those are the language tools you get.
>>>>
>>>>
>>>>  Right. Our current plan for modelling this is:
>>>>
>>>>  1) Change the meaning of the existing !invariant.load metadata (or
>>>> add another parallel metadata kind) so that it allows load-load forwarding
>>>> (even if the memory is not known to be unmodified between the loads) if:
>>>>
>>>>
>>>>   invariant.load currently allows the load to be reordered pretty
>>>> aggressively, so I think you need a new metadata.
>>>>
>>>
>>>  Our thoughts were:
>>> 1) The existing !invariant.load is redundant because it's exactly
>>> equivalent to a call to @llvm.invariant.start and a load.
>>> 2) The new semantics are a more strict form of the old semantics, so no
>>> special action is required to upgrade old IR.
>>> ... so changing the meaning of the existing metadata seemed preferable
>>> to adding a new, similar-but-not-quite-identical, form of the metadata. But
>>> either way seems fine.
>>>
>>>  I'm going to argue pretty strongly in favour of the new form of
>>> metadata.  We've spent a lot of time getting !invariant.load working well
>>> for use cases like the "length" field in a Java array and I'd really hate
>>> to give that up.
>>>
>>> (One way of framing this is that the current !invariant.load gives a
>>> guarantee that there can't be a @llvm.invariant.end call anywhere in the
>>> program and that any @llvm.invariant.start occurs outside the visible scope
>>> of the compilation unit (Module, LTO, what have you) and must have executed
>>> before any code contained in said module which can describe the memory
>>> location can execute.  FYI, that last bit of strange wording is to allow
>>> initialization inside a malloc like function which returns a noalias
>>> pointer.)
>>>
>>
>>  I had overlooked that !invariant.load also applies for loads /before/
>> the invariant load. I agree that this is different both from what we're
>> proposing and from what you can achieve with @llvm.invariant.start. I would
>> expect that you can use our metadata for the length in a Java array -- it
>> seems like it'd be straightforward for you to arrange that all loads of the
>> array field have the metadata (and that you use the same operand on all of
>> them) -- but there's no real motivation behind reusing the existing
>> metadata besides simplicity and cleanliness.
>>
>>   I'm definitely open to working together on a revised version of a more
>>> general invariant mechanism.  In particular, we don't have a good way of
>>> modelling Java's "final" fields* in the IR today since the initialization
>>> logic may be visible to the compiler.  Coming up with something which
>>> supports both use cases would be really useful.
>>>
>>
>>  This seems like something that our proposed mechanism may be able to
>> support; we intend to use it for const and reference data members in C++,
>> though the semantics of those are not quite the same.
>>
>>
>>   ObjC (and Swift, and probably a number of other languages) has a
>> optimization opportunity where there’s a global variable that’s known to be
>> constant after its initialization.  (For the initiated, I’m talking here
>> primarily about ivar offset variables.)  However, that initialization is
>> run lazily, and it’s only at specific points within the program that we can
>> guarantee that it’s already been performed.  (Namely, before ivar accesses
>> or after message sends to the class (but not to instances, because of
>> nil).)  Those points usually guarantee the initialization of more than one
>> variable, and contrariwise, there are often several such points that would
>> each individually suffice to establish the guarantee for a particular load,
>> allowing it to be hoisted/reordered/combined at will.
>>
>>  So e.g.
>>
>>    if (cond) {
>>     // Here there’s an operation that proves to us that A, B, and C are
>> initialized.
>>   } else {
>>     // Here there’s an operation that proves it for just A and B.
>>   }
>>
>>    for (;;) {
>>     // Here we load A.  This should be hoist able out of this loop,
>> independently of whatever else happens in this loop.
>>   }
>>
>>  This is actually the situation where ObjC currently uses
>> !invariant.load, except that we can only safely use it in specific
>> functions (ObjC method implementations) that guarantee initialization
>> before entry and which can never be inlined.
>>
>>  Now, I think something like invariant.start would help with this,
>> except that I’m concerned that we’d have to eagerly emit what might be
>> dozens of invariant.starts at every point that established the guarantee,
>> which would be pretty wasteful even for optimized builds.  If we’re
>> designing new metadata anyway, or generalizing existing metadata, can we
>> try to make this more scalable, so that e.g. I can use a single intrinsic
>> with a list of the invariants it establishes, ideally in a way that’s
>> sharable between calls?
>>
>
>  It seems we have three different use cases:
>
>  1) This invariant applies to this load and all future loads of this
> pointer (ObjC / Swift constants, Java final members)
> 2) This invariant applies to this load and all past and future loads of
> this pointer (Java array length)
>
> Slight tweak here: it's not "all past" *loads*.  It's all (past, present,
> future) instruction *positions* at which this location was/is
> dereferenceable.  There doesn't actually have to be a load there (yet).
>
> It's specifically that property which allows aggressive hoisting (e.g. in
> LICM).
>
>   3) This invariant applies to this load and all (past and) future loads
> of this pointer that also have metadata with the same operand (C++ vptr,
> const members, reference members)
>
> (I'm assuming that by "this pointer" you actually mean "this abstract
> memory location".)
>

No; that would remove the ability to have something like
@llvm.invariant.barrier. This is a property of the pointer value, not a
property of the storage location. In this regard, it seems fundamentally
different to the existing !invariant.load.

For my use cases, I can replace (1) with (3) as long as I give all such
> loads the same operand, I still need (2) as a distinct notion though.
> Definition (1) and (3) don't include the ability to do aggressive
> hoisting.  If we could define it in a way it did, we might be able to
> combine all three.
>
>
>  We could use (1) or (2) for C++, but that would require us to insert a
> lot more invariant barriers (for all dynamic type changes, not just for a
> change to a type with a constant member), and would be much less forgiving
> of user errors. How general a system are you imagining?
>
> (2) today works specifically because there is no such thing as a start/end
> for the invariantness.  If there were, we'd have to solve a potentially
> hard IPO problem for basic hoisting in LICM.
>
> I feel we're going to end up needing something which supports the
> following combinations:
> - No start, no end (2 above)
> - Possible start, no end (1 above)
> - Possible start, possible end, possible start, possible end, .... (what
> @llvm.invariant.start/end try to be and fail at)
>
> (Note that for all of these, dereferenceability is orthogonal to the
> invariantness of the memory location.)
>
> Whether those are all the same mechanism or not, who knows.
>
> To throw yet another wrinkle in, the backend has a separate notion of
> invarantness.  It assumes (2), plus derefenceability within the *entire*
> function.  It's for this reason that many !invariant.loads aren't marked
> invariant in the backend.
>
> p.s. Before this goes too much further, we should move this to it's own
> thread, and CC llvmdev rather than just cfe-dev.  Many interested folks
> (i.e. other frontends) might not be subscribed to cfe-dev.
>

A proper proposal is on the way (which will get its own thread on llvmdev);
we were only responding here because Hal happened to bring up the topic
before we were ready with our proposal :-)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20150717/60f09bba/attachment.html>