[cfe-dev] C++11 and enhacned devirtualization

Wed Jul 22 15:06:33 PDT 2015

Just to not lose track, I started new thread with our proposal:
http://lists.cs.uiuc.edu/pipermail/cfe-dev/2015-July/044246.html

Piotr

On Mon, Jul 20, 2015 at 5:22 PM, Richard Smith <richard at metafoo.co.uk>
wrote:

> On Mon, Jul 20, 2015 at 3:50 PM, John McCall <rjmccall at apple.com> wrote:
>
>> On Jul 17, 2015, at 4:56 PM, Richard Smith <richard at metafoo.co.uk> wrote:
>> On Fri, Jul 17, 2015 at 3:23 PM, John McCall <rjmccall at apple.com> wrote:
>>
>>> On Jul 17, 2015, at 2:49 PM, Richard Smith <richard at metafoo.co.uk>
>>> wrote:
>>> On Fri, Jul 17, 2015 at 2:05 PM, Philip Reames <
>>> listmail at philipreames.com> wrote:
>>>
>>>>
>>>>
>>>> On 07/16/2015 02:38 PM, Richard Smith wrote:
>>>>
>>>>  On Thu, Jul 16, 2015 at 2:03 PM, John McCall <rjmccall at apple.com>
>>>> wrote:
>>>>
>>>>>    On Jul 16, 2015, at 11:46 AM, Richard Smith <richard at metafoo.co.uk>
>>>>> wrote:
>>>>>   On Thu, Jul 16, 2015 at 11:29 AM, John McCall <rjmccall at apple.com>
>>>>> wrote:
>>>>>
>>>>>> > On Jul 15, 2015, at 10:11 PM, Hal Finkel <hfinkel at anl.gov> wrote:
>>>>>> >
>>>>>> > Hi everyone,
>>>>>> >
>>>>>> > C++11 added features that allow for certain parts of the class
>>>>>> hierarchy to be closed, specifically the 'final' keyword and the semantics
>>>>>> of anonymous namespaces, and I think we take advantage of these to enhance
>>>>>> our ability to perform devirtualization. For example, given this situation:
>>>>>> >
>>>>>> > struct Base {
>>>>>> >  virtual void foo() = 0;
>>>>>> > };
>>>>>> >
>>>>>> > void external();
>>>>>> > struct Final final : Base {
>>>>>> >  void foo() {
>>>>>> >    external();
>>>>>> >  }
>>>>>> > };
>>>>>> >
>>>>>> > void dispatch(Base *B) {
>>>>>> >  B->foo();
>>>>>> > }
>>>>>> >
>>>>>> > void opportunity(Final *F) {
>>>>>> >  dispatch(F);
>>>>>> > }
>>>>>> >
>>>>>> > When we optimize this code, we do the expected thing and inline
>>>>>> 'dispatch' into 'opportunity' but we don't devirtualize the call to foo().
>>>>>> The fact that we know what the vtable of F is at that callsite is not
>>>>>> exploited. To a lesser extent, we can do similar things for final virtual
>>>>>> methods, and derived classes in anonymous namespaces (because Clang could
>>>>>> determine whether or not a class (or method) there is effectively final).
>>>>>> >
>>>>>> > One possibility might be to @llvm.assume to say something about
>>>>>> what the vtable ptr of F might be/contain should it be needed later when we
>>>>>> emit the initial IR for 'opportunity' (and then teach the optimizer to use
>>>>>> that information), but I'm not at all sure that's the best solution.
>>>>>> Thoughts?
>>>>>>
>>>>>> The problem with any sort of @llvm.assume-encoded information about
>>>>>> memory contents is that C++ does actually allow you to replace objects in
>>>>>> memory, up to and including stuff like:
>>>>>>
>>>>>> {
>>>>>>   MyClass c;
>>>>>>
>>>>>>   // Reuse the storage temporarily.  UB to access the object through
>>>>>> ‘c’ now.
>>>>>>   c.~MyClass();
>>>>>>   auto c2 = new (&c) MyOtherClass();
>>>>>>
>>>>>>   // The storage has to contain a ‘MyClass’ when it goes out of scope.
>>>>>>   c2->~MyOtherClass();
>>>>>>   new (&c) MyClass();
>>>>>> }
>>>>>>
>>>>>> The standard frontend devirtualization optimizations are permitted
>>>>>> under a couple of different language rules, specifically that:
>>>>>> 1. If you access an object through an l-value of a type, it has to
>>>>>> dynamically be an object of that type (potentially a subobject).
>>>>>> 2. Object replacement as above only “forwards” existing formal
>>>>>> references under specific conditions, e.g. the dynamic type has to be the
>>>>>> same, ‘const’ members have to have the same value, etc.  Using an
>>>>>> unforwarded reference (like the name of the local variable ‘c’ above)
>>>>>> doesn’t formally refer to a valid object and thus has undefined behavior.
>>>>>>
>>>>>> You can apply those rules much more broadly than the frontend does,
>>>>>> of course; but those are the language tools you get.
>>>>>
>>>>>
>>>>>  Right. Our current plan for modelling this is:
>>>>>
>>>>>  1) Change the meaning of the existing !invariant.load metadata (or
>>>>> add another parallel metadata kind) so that it allows load-load forwarding
>>>>> (even if the memory is not known to be unmodified between the loads) if:
>>>>>
>>>>>
>>>>>   invariant.load currently allows the load to be reordered pretty
>>>>> aggressively, so I think you need a new metadata.
>>>>>
>>>>
>>>>  Our thoughts were:
>>>> 1) The existing !invariant.load is redundant because it's exactly
>>>> equivalent to a call to @llvm.invariant.start and a load.
>>>> 2) The new semantics are a more strict form of the old semantics, so no
>>>> special action is required to upgrade old IR.
>>>> ... so changing the meaning of the existing metadata seemed preferable
>>>> to adding a new, similar-but-not-quite-identical, form of the metadata. But
>>>> either way seems fine.
>>>>
>>>> I'm going to argue pretty strongly in favour of the new form of
>>>> metadata.  We've spent a lot of time getting !invariant.load working well
>>>> for use cases like the "length" field in a Java array and I'd really hate
>>>> to give that up.
>>>>
>>>> (One way of framing this is that the current !invariant.load gives a
>>>> guarantee that there can't be a @llvm.invariant.end call anywhere in the
>>>> program and that any @llvm.invariant.start occurs outside the visible scope
>>>> of the compilation unit (Module, LTO, what have you) and must have executed
>>>> before any code contained in said module which can describe the memory
>>>> location can execute.  FYI, that last bit of strange wording is to allow
>>>> initialization inside a malloc like function which returns a noalias
>>>> pointer.)
>>>>
>>>
>>> I had overlooked that !invariant.load also applies for loads /before/
>>> the invariant load. I agree that this is different both from what we're
>>> proposing and from what you can achieve with @llvm.invariant.start. I would
>>> expect that you can use our metadata for the length in a Java array -- it
>>> seems like it'd be straightforward for you to arrange that all loads of the
>>> array field have the metadata (and that you use the same operand on all of
>>> them) -- but there's no real motivation behind reusing the existing
>>> metadata besides simplicity and cleanliness.
>>>
>>> I'm definitely open to working together on a revised version of a more
>>>> general invariant mechanism.  In particular, we don't have a good way of
>>>> modelling Java's "final" fields* in the IR today since the initialization
>>>> logic may be visible to the compiler.  Coming up with something which
>>>> supports both use cases would be really useful.
>>>>
>>>
>>> This seems like something that our proposed mechanism may be able to
>>> support; we intend to use it for const and reference data members in C++,
>>> though the semantics of those are not quite the same.
>>>
>>>
>>> ObjC (and Swift, and probably a number of other languages) has a
>>> optimization opportunity where there’s a global variable that’s known to be
>>> constant after its initialization.  (For the initiated, I’m talking here
>>> primarily about ivar offset variables.)  However, that initialization is
>>> run lazily, and it’s only at specific points within the program that we can
>>> guarantee that it’s already been performed.  (Namely, before ivar accesses
>>> or after message sends to the class (but not to instances, because of
>>> nil).)  Those points usually guarantee the initialization of more than one
>>> variable, and contrariwise, there are often several such points that would
>>> each individually suffice to establish the guarantee for a particular load,
>>> allowing it to be hoisted/reordered/combined at will.
>>>
>>> So e.g.
>>>
>>>   if (cond) {
>>>     // Here there’s an operation that proves to us that A, B, and C are
>>> initialized.
>>>   } else {
>>>     // Here there’s an operation that proves it for just A and B.
>>>   }
>>>
>>>   for (;;) {
>>>     // Here we load A.  This should be hoist able out of this loop,
>>> independently of whatever else happens in this loop.
>>>   }
>>>
>>> This is actually the situation where ObjC currently uses
>>> !invariant.load, except that we can only safely use it in specific
>>> functions (ObjC method implementations) that guarantee initialization
>>> before entry and which can never be inlined.
>>>
>>> Now, I think something like invariant.start would help with this, except
>>> that I’m concerned that we’d have to eagerly emit what might be dozens of
>>> invariant.starts at every point that established the guarantee, which would
>>> be pretty wasteful even for optimized builds.  If we’re designing new
>>> metadata anyway, or generalizing existing metadata, can we try to make this
>>> more scalable, so that e.g. I can use a single intrinsic with a list of the
>>> invariants it establishes, ideally in a way that’s sharable between calls?
>>>
>>
>> It seems we have three different use cases:
>>
>> 1) This invariant applies to this load and all future loads of this
>> pointer (ObjC / Swift constants, Java final members)
>> 2) This invariant applies to this load and all past and future loads of
>> this pointer (Java array length)
>>
>>
>> Hmm.  I’m not really seeing what you’re saying about past and future.
>> The difference is that the invariant holds, but only after a certain point;
>> reordering the load earlier across side-effects etc. is fine (great, even),
>> it just can’t cross a particular line.
>>
>> I assume the only reason that bounding the invariant isn’t important for
>> Philip's array-length is that the initialization is done opaquely, so that
>> the optimizer can’t see the pointer until it’s been properly initialized.
>> That is, the bound is enforced by SSA.
>>
>> I think the real difference here is whether SSA value identity can give
>> us sufficient information or not.
>>
>> For C++ vtables and const/reference members, it does, because the
>> semantics are dependent on “blessed” object references (the result of the
>> new-expression, the name of the local variable, etc.) which mostly have
>> undefined behavior if re-written.  Maybe you need to make some effort to
>> not have GEPs down to base classes break the optimization, but that’s
>> probably easy.
>>
>> For the Java cases, it still does, because the object becomes immutable
>> past a certain point (the return from the new operation), which also
>> narrows most/all of the subsequent accesses to a single value.  So you can
>> bless the object at that point; sure, you theoretically give up some
>> optimization opportunities if ‘this’ was stored aside during construction,
>> but it’s not a big deal.
>>
>> For my ObjC/Swift cases, it’s not good enough, because the addresses that
>> become immutable are global.  Each load is going to re-derive the address
>> from the global, so no amount of SSA value propagation is going to help;
>> thus the barrier has to be more like a memory barrier than just an SSA
>> barrier.  (The biggest distinguishing feature from actual atomic memory
>> barriers is that dominating barriers trivialize later barriers, regardless
>> of what happens in between.)
>>
>> (Of course, Swift has situations more like the C++ and Java
>> opportunities, too.)
>>
>> So maybe these are really different problems and there’s no useful shared
>> generalization.
>>
>
> I'm increasingly thinking that's the case; we now intend to propose adding
> new metadata rather than extending / generalizing !invariant.load.
>
> _______________________________________________
> cfe-dev mailing list
> cfe-dev at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20150722/9dc4f98d/attachment.html>