[LLVMdev] [cfe-dev] Inlining and virtualization in Clang/LLVM

Thu Oct 4 10:44:00 PDT 2012

On Thu, Oct 4, 2012 at 7:23 PM, Matthieu Monrocq
<matthieu.monrocq at gmail.com> wrote:
> On Thu, Oct 4, 2012 at 7:37 AM, Nick Lewycky <nlewycky at google.com> wrote:
>> On 3 October 2012 13:30, David Blaikie <dblaikie at gmail.com> wrote:
>>>
>>> On Wed, Oct 3, 2012 at 12:01 PM, Matthieu Monrocq
>>> <matthieu.monrocq at gmail.com> wrote:
>>> > On Sun, Sep 30, 2012 at 5:04 PM, David Blaikie <dblaikie at gmail.com>
>>> > wrote:
>>> >> I'm not sure whether this is the exact problem at hand in your example,
>>> >> but one of the major hurdles llvm suffers when trying to devirtualize
>>> >> is the second point you made: it doesn't see the invariance of the
>>> >> table pointer post construction. In your specific example the
>>> >> constructors are trivial and inlinable so it I'm not sure why llvm
>>> >> would be having trouble proving the value of the vptr &
>>> >> devirtualizing. , perhaps due to them being static so the
>>> >> initialization is contingent on it being the first call (& llvm doesn't
>>> >> know that the vptr is constant after construction until destruction
>>> >> begins) & doesn't see the connection across all calls.
>>> >>
>>> >> So there are a few issues that need to be addressed to help this.
>>> >>
>>> >> One is some kind of constant range where the front end can promise not
>>> >> to modify certain values for a range (this might not be correct though
>>> >> - I remember some argument as to whether it was valid to destroy and
>>> >> then placement new the same type into that memory before the object
>>> >> goes out of scope - if so, then it's not obvious if the vptr is
>>> >> constant in any range) & secondly (at least for the case of non inline
>>> >> constructors) the ability to provide some kind of assert that, post
>>> >> construction, the vptr is equal to some known constant. (this assertion
>>> >> is currently possible with a branch to unreachable, except llvm throws
>>> >> out the unreachable code in SimplyCFG before any optimizations run - so
>>> >> we need to see if we can keep them around longer - there are some PRs
>>> >> filed for this but I haven't made much progress on it
>>> >> From: Matthieu Monrocq
>>> >> Sent: 9/29/2012 4:41 PM
>>> >> To: cfe-dev at cs.uiuc.edu; llvmdev
>>> >> Subject: [cfe-dev] Inlining and virtualization in Clang/LLVM
>>> >> Hello,
>>> >>
>>> >> at the moment the devirtualization of calls is performed in Clang (as
>>> >> far as I understand) whilst inlining and constant propagation are the
>>> >> optimizer's (LLVM) job.
>>> >>
>>> >> It is probably necessary for Clang to perform "some" devirtualization
>>> >> (the meaning of `final` is not known to LLVM), however all the stuff
>>> >> to determine what the dynamic type of a variable is seems redundant
>>> >> with LLVM, and is incomplete (in a way) when it's not perform *after*
>>> >> inlining and constant propagation.
>>> >>
>>> >>
>>> >> It seems to me therefore that because of this we are missing
>>> >> optimization opportunities. Suppose the following example program:
>>> >>
>>> >> #include <cstdio>
>>> >>
>>> >> struct Base { virtual void foo() = 0; };
>>> >>
>>> >> struct NothingDerived: Base { virtual void foo() {} };
>>> >>
>>> >> struct PrintDerived: Base { virtual void foo() { printf("Hello
>>> >> World!"); } };
>>> >>
>>> >> Base& select(int i) {
>>> >>     static NothingDerived nd;
>>> >>     static PrintDerived pd;
>>> >>     if (i % 2 == 0) { return nd; }
>>> >>     return pd;
>>> >> }
>>> >>
>>> >> int main() {
>>> >>     Base& b = select(0);
>>> >>     b.foo();
>>> >> }
>>> >>
>>> >>
>>> >> Which gives the following main function (using Try out LLVM and Clang):
>>> >>
>>> >> define i32 @main() uwtable {
>>> >> [...]
>>> >>
>>> >> _Z6selecti.exit:                                  ; preds = %13, %10,
>>> >> %7
>>> >>   %14 = load void (%struct.Base*)*** bitcast (%struct.NothingDerived*
>>> >> @_ZZ6selectiE2nd to void (%struct.Base*)***), align 8
>>> >>   %15 = load void (%struct.Base*)** %14, align 8
>>> >>   tail call void %15(%struct.Base* getelementptr inbounds
>>> >> (%struct.NothingDerived* @_ZZ6selectiE2nd, i64 0, i32 0))
>>> >>   ret i32 0
>>> >> }
>>> >>
>>> >>
>>> >> LLVM trivially sees through the select call and rightly deduce that we
>>> >> have to do with NothingDerived. However it does not go any step
>>> >> further and directly select NothingDerived::foo's function. Instead it
>>> >> dutifully performs all the bitcasting / pointer arithmetic necessary
>>> >> to access the pointer to function stored in the VTable and calls it
>>> >> through a pointer to function.
>>> >>
>>> >> I understand it would be awkward to have LLVM be aware of the virtual
>>> >> table implementation. Especially since even in C++ it varies from one
>>> >> implementation to another. However it seems to me that LLVM could
>>> >> still perform this optimization:
>>> >>
>>> >>  - LLVM having deduced the exact value to use (select(int)::nd) should
>>> >> be able to get directly to its v-ptr (the first field of Base)
>>> >>
>>> >> %struct.NothingDerived = type { %struct.Base }
>>> >> %struct.Base = type { i32 (...)** }
>>> >>
>>> >>
>>> >>  - the v-ptr (after construction) always point to the same v-table,
>>> >> which is a constant
>>> >>
>>> >> store i32 (...)** bitcast (i8** getelementptr inbounds ([3 x i8*]*
>>> >> @_ZTV14NothingDerived, i64 0, i64 2) to i32 (...)**), i32 (...)***
>>> >> getelementptr inbounds (%struct.NothingDerived* @_ZZ6selectiE2nd, i64
>>> >> 0, i32 0, i32 0), align 8
>>> >>
>>> >>
>>> >> - the offset in the v-table is "static"
>>> >>
>>> >> getelementptr inbounds (%struct.NothingDerived* @_ZZ6selectiE2nd, i64
>>> >> 0, i32 0)
>>> >>
>>> >>
>>> >> - the v-table being constant, what is stored at that offset is
>>> >> perfectly deducible as well
>>> >>
>>> >> @_ZTV14NothingDerived = linkonce_odr unnamed_addr constant [3 x i8*]
>>> >> [i8* null, i8* bitcast ({ i8*, i8*, i8* }* @_ZTI14NothingDerived to
>>> >> i8*), i8* bitcast (void (%struct.NothingDerived*)*
>>> >> @_ZN14NothingDerived3fooEv to i8*)]
>>> >>
>>> >>
>>> >> So the question is, what is lacking for LLVM to perform this
>>> >> optimization ?
>>> >>
>>> >> - Is it because of the loss of information in having the v-table
>>> >> stored as a "blob" of bytes ? (which means that Clang should pass more
>>> >> typed information, without changing the exact layout obviously given
>>> >> the ABI constraints)
>>> >>
>>> >> - Or is it something internal to LLVM (the information is somehow
>>> >> irremediably lost) ?
>>> >>
>>> >>
>>> >> I admit that reducing the virtual call overhead is probably not really
>>> >> worth it (in general), however devirtualizing calls also expose more
>>> >> inlining/context opportunities and it's hard (for me) to quantify what
>>> >> such an optimization could bring here. We should also consider the
>>> >> simplification in Clang (and other frontends) if LLVM could perform
>>> >> the job on its own.
>>> >>
>>> >> -- Matthieu
>>> >> _______________________________________________
>>> >> cfe-dev mailing list
>>> >> cfe-dev at cs.uiuc.edu
>>> >> http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev
>>> >
>>> > Hello David,
>>> >
>>> > I have been thinking a bit more about this and as I find the problem
>>> > quite interesting.
>>> >
>>> > If I get down one level (and revert to implementing objects in C):
>>> >
>>> > #include <stdio.h>
>>> >
>>> > typedef void (*Function)();
>>> >
>>> > void print() { printf("Hello, World!"); }
>>> > void nothing() {}
>>> >
>>> > Function get(int i) {
>>> >   if (i % 2 == 0) { return &print; }
>>> >   return ¬hing;
>>> > }
>>> >
>>> > int main() {
>>> >   Function f = get(2);
>>> >   (*f)();
>>> >   return 0;
>>> > }
>>> >
>>> > Which is admittedly quite similar, generates the following main:
>>> >
>>> > define i32 @main() nounwind uwtable {
>>> >   %1 = tail call i32 (i8*, ...)* @printf(i8* getelementptr inbounds
>>> > ([14 x i8]* @.str, i64 0, i64 0)) nounwind
>>> >   ret i32 0
>>> > }
>>> >
>>> > This falls out naturally from SSA form, I think. No "constification"
>>> > is necessary.
>>> >
>>> > However in our case, it seems that the SSA form (which only references
>>> > the pointer to the structure itself, not the v-table pointer), is not
>>> > sufficient therefore to allow the optimizer to remember which value
>>> > the pointer should have.
>>> >
>>> > Of course, in general I agree that the compiler should be told
>>> > explicitly that the value cannot change (otherwise it would not know
>>> > that opaque calls are not accessing that particular value within the
>>> > structure); at least until the start of the destructor.
>>> >
>>> > I remember a discussion a while ago involving the design and
>>> > implementation of those "const-ranges" within the LLVM IR, do you
>>> > happen to know where it's at ?
>>>
>>> I'm not sure where that is, but I've +nlewycky because he's been
>>> thinking about this recently & there are some wrinkles as I alluded to
>>> in my reply. Saying that the vtable pointer is const from
>>> post-construction to pre-destruction isn't quite correct & would break
>>> certain obscure but valid programs. That being said, there is some
>>> kind of lifetime marker we could create & use to indicate the
>>> semantics of the vtable pointer without breaking those programs. (key
>>> here is that C++ cares about how you derive the pointer to the object,
>>> not just its value - if you explicitly destroy an object then
>>> placement new another object over the same space, the implementation
>>> is allowed to assume that the old pointers you had to the object still
>>> point to the same kind of object - but the one returned from placement
>>> new isn't, even though those pointers may have the same value...
>>> that's my understanding/vague description of the issue)
>>
>>
>> Right. I've got a proposal in PR13940 that Chris has asked me to mail out to
>> llvmdev. I should do that soon, but I wanted to wait until I would have the
>> time to reply to the comments that came up and possibly even implement the
>> result.
>>
>> Nick
>>
>
> @David: Thanks for bringing Nick here! Indeed I had not considered the
> (aweful) placement new issue. Aliasing rules do not really help us
> here, since they only consider the dynamic type of the object, and by
> calling the destructor explicitly then using placement new we end the
> lifetime of the object and then create a new object. I do wonder
> though if keeping a reference/pointer to that alive-dead-alive object
> is allowed.
>
> @Nick: I believe that your proposal could really help indeed.
> Ultimately, it might help trimming the devirtualization method in
> Clang to only take care about the "final" attribute.
>
> -- Matthieu

@David: I asked about this in
http://stackoverflow.com/questions/12732739/when-may-the-dynamic-type-of-a-referred-to-object-change
 Regarding your example I believe that it would be illegal to retain a
reference or pointer to an object that is "killed", because that
reference is stale, and the fact that you recreated another (similar)
object in its stead does not matter: after all the allocation routines
reuse storage quite often.

We'll see what the SO folks come up with, I tried but failed to locate
a full explanation in the Standard.

-- Matthieu