[cfe-dev] Inlining and virtualization in Clang/LLVM

Wed Oct 3 22:37:45 PDT 2012

On 3 October 2012 13:30, David Blaikie <dblaikie at gmail.com> wrote:

> On Wed, Oct 3, 2012 at 12:01 PM, Matthieu Monrocq
> <matthieu.monrocq at gmail.com> wrote:
> > On Sun, Sep 30, 2012 at 5:04 PM, David Blaikie <dblaikie at gmail.com>
> wrote:
> >> I'm not sure whether this is the exact problem at hand in your example,
> >> but one of the major hurdles llvm suffers when trying to devirtualize
> >> is the second point you made: it doesn't see the invariance of the
> >> table pointer post construction. In your specific example the
> >> constructors are trivial and inlinable so it I'm not sure why llvm
> >> would be having trouble proving the value of the vptr &
> >> devirtualizing. , perhaps due to them being static so the
> >> initialization is contingent on it being the first call (& llvm doesn't
> >> know that the vptr is constant after construction until destruction
> >> begins) & doesn't see the connection across all calls.
> >>
> >> So there are a few issues that need to be addressed to help this.
> >>
> >> One is some kind of constant range where the front end can promise not
> >> to modify certain values for a range (this might not be correct though
> >> - I remember some argument as to whether it was valid to destroy and
> >> then placement new the same type into that memory before the object
> >> goes out of scope - if so, then it's not obvious if the vptr is
> >> constant in any range) & secondly (at least for the case of non inline
> >> constructors) the ability to provide some kind of assert that, post
> >> construction, the vptr is equal to some known constant. (this assertion
> >> is currently possible with a branch to unreachable, except llvm throws
> >> out the unreachable code in SimplyCFG before any optimizations run - so
> >> we need to see if we can keep them around longer - there are some PRs
> >> filed for this but I haven't made much progress on it
> >> From: Matthieu Monrocq
> >> Sent: 9/29/2012 4:41 PM
> >> To: cfe-dev at cs.uiuc.edu; llvmdev
> >> Subject: [cfe-dev] Inlining and virtualization in Clang/LLVM
> >> Hello,
> >>
> >> at the moment the devirtualization of calls is performed in Clang (as
> >> far as I understand) whilst inlining and constant propagation are the
> >> optimizer's (LLVM) job.
> >>
> >> It is probably necessary for Clang to perform "some" devirtualization
> >> (the meaning of `final` is not known to LLVM), however all the stuff
> >> to determine what the dynamic type of a variable is seems redundant
> >> with LLVM, and is incomplete (in a way) when it's not perform *after*
> >> inlining and constant propagation.
> >>
> >>
> >> It seems to me therefore that because of this we are missing
> >> optimization opportunities. Suppose the following example program:
> >>
> >> #include <cstdio>
> >>
> >> struct Base { virtual void foo() = 0; };
> >>
> >> struct NothingDerived: Base { virtual void foo() {} };
> >>
> >> struct PrintDerived: Base { virtual void foo() { printf("Hello
> World!"); } };
> >>
> >> Base& select(int i) {
> >>     static NothingDerived nd;
> >>     static PrintDerived pd;
> >>     if (i % 2 == 0) { return nd; }
> >>     return pd;
> >> }
> >>
> >> int main() {
> >>     Base& b = select(0);
> >>     b.foo();
> >> }
> >>
> >>
> >> Which gives the following main function (using Try out LLVM and Clang):
> >>
> >> define i32 @main() uwtable {
> >> [...]
> >>
> >> _Z6selecti.exit:                                  ; preds = %13, %10, %7
> >>   %14 = load void (%struct.Base*)*** bitcast (%struct.NothingDerived*
> >> @_ZZ6selectiE2nd to void (%struct.Base*)***), align 8
> >>   %15 = load void (%struct.Base*)** %14, align 8
> >>   tail call void %15(%struct.Base* getelementptr inbounds
> >> (%struct.NothingDerived* @_ZZ6selectiE2nd, i64 0, i32 0))
> >>   ret i32 0
> >> }
> >>
> >>
> >> LLVM trivially sees through the select call and rightly deduce that we
> >> have to do with NothingDerived. However it does not go any step
> >> further and directly select NothingDerived::foo's function. Instead it
> >> dutifully performs all the bitcasting / pointer arithmetic necessary
> >> to access the pointer to function stored in the VTable and calls it
> >> through a pointer to function.
> >>
> >> I understand it would be awkward to have LLVM be aware of the virtual
> >> table implementation. Especially since even in C++ it varies from one
> >> implementation to another. However it seems to me that LLVM could
> >> still perform this optimization:
> >>
> >>  - LLVM having deduced the exact value to use (select(int)::nd) should
> >> be able to get directly to its v-ptr (the first field of Base)
> >>
> >> %struct.NothingDerived = type { %struct.Base }
> >> %struct.Base = type { i32 (...)** }
> >>
> >>
> >>  - the v-ptr (after construction) always point to the same v-table,
> >> which is a constant
> >>
> >> store i32 (...)** bitcast (i8** getelementptr inbounds ([3 x i8*]*
> >> @_ZTV14NothingDerived, i64 0, i64 2) to i32 (...)**), i32 (...)***
> >> getelementptr inbounds (%struct.NothingDerived* @_ZZ6selectiE2nd, i64
> >> 0, i32 0, i32 0), align 8
> >>
> >>
> >> - the offset in the v-table is "static"
> >>
> >> getelementptr inbounds (%struct.NothingDerived* @_ZZ6selectiE2nd, i64
> 0, i32 0)
> >>
> >>
> >> - the v-table being constant, what is stored at that offset is
> >> perfectly deducible as well
> >>
> >> @_ZTV14NothingDerived = linkonce_odr unnamed_addr constant [3 x i8*]
> >> [i8* null, i8* bitcast ({ i8*, i8*, i8* }* @_ZTI14NothingDerived to
> >> i8*), i8* bitcast (void (%struct.NothingDerived*)*
> >> @_ZN14NothingDerived3fooEv to i8*)]
> >>
> >>
> >> So the question is, what is lacking for LLVM to perform this
> optimization ?
> >>
> >> - Is it because of the loss of information in having the v-table
> >> stored as a "blob" of bytes ? (which means that Clang should pass more
> >> typed information, without changing the exact layout obviously given
> >> the ABI constraints)
> >>
> >> - Or is it something internal to LLVM (the information is somehow
> >> irremediably lost) ?
> >>
> >>
> >> I admit that reducing the virtual call overhead is probably not really
> >> worth it (in general), however devirtualizing calls also expose more
> >> inlining/context opportunities and it's hard (for me) to quantify what
> >> such an optimization could bring here. We should also consider the
> >> simplification in Clang (and other frontends) if LLVM could perform
> >> the job on its own.
> >>
> >> -- Matthieu
> >> _______________________________________________
> >> cfe-dev mailing list
> >> cfe-dev at cs.uiuc.edu
> >> http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev
> >
> > Hello David,
> >
> > I have been thinking a bit more about this and as I find the problem
> > quite interesting.
> >
> > If I get down one level (and revert to implementing objects in C):
> >
> > #include <stdio.h>
> >
> > typedef void (*Function)();
> >
> > void print() { printf("Hello, World!"); }
> > void nothing() {}
> >
> > Function get(int i) {
> >   if (i % 2 == 0) { return &print; }
> >   return ¬hing;
> > }
> >
> > int main() {
> >   Function f = get(2);
> >   (*f)();
> >   return 0;
> > }
> >
> > Which is admittedly quite similar, generates the following main:
> >
> > define i32 @main() nounwind uwtable {
> >   %1 = tail call i32 (i8*, ...)* @printf(i8* getelementptr inbounds
> > ([14 x i8]* @.str, i64 0, i64 0)) nounwind
> >   ret i32 0
> > }
> >
> > This falls out naturally from SSA form, I think. No "constification"
> > is necessary.
> >
> > However in our case, it seems that the SSA form (which only references
> > the pointer to the structure itself, not the v-table pointer), is not
> > sufficient therefore to allow the optimizer to remember which value
> > the pointer should have.
> >
> > Of course, in general I agree that the compiler should be told
> > explicitly that the value cannot change (otherwise it would not know
> > that opaque calls are not accessing that particular value within the
> > structure); at least until the start of the destructor.
> >
> > I remember a discussion a while ago involving the design and
> > implementation of those "const-ranges" within the LLVM IR, do you
> > happen to know where it's at ?
>
> I'm not sure where that is, but I've +nlewycky because he's been
> thinking about this recently & there are some wrinkles as I alluded to
> in my reply. Saying that the vtable pointer is const from
> post-construction to pre-destruction isn't quite correct & would break
> certain obscure but valid programs. That being said, there is some
> kind of lifetime marker we could create & use to indicate the
> semantics of the vtable pointer without breaking those programs. (key
> here is that C++ cares about how you derive the pointer to the object,
> not just its value - if you explicitly destroy an object then
> placement new another object over the same space, the implementation
> is allowed to assume that the old pointers you had to the object still
> point to the same kind of object - but the one returned from placement
> new isn't, even though those pointers may have the same value...
> that's my understanding/vague description of the issue)
>

Right. I've got a proposal in PR13940 that Chris has asked me to mail out
to llvmdev. I should do that soon, but I wanted to wait until I would have
the time to reply to the comments that came up and possibly even implement
the result.

Nick
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20121003/1e242a48/attachment.html>