[cfe-dev] Inlining and virtualization in Clang/LLVM

Wed Oct 3 12:01:54 PDT 2012

On Sun, Sep 30, 2012 at 5:04 PM, David Blaikie <dblaikie at gmail.com> wrote:
> I'm not sure whether this is the exact problem at hand in your example,
> but one of the major hurdles llvm suffers when trying to devirtualize
> is the second point you made: it doesn't see the invariance of the
> table pointer post construction. In your specific example the
> constructors are trivial and inlinable so it I'm not sure why llvm
> would be having trouble proving the value of the vptr &
> devirtualizing. , perhaps due to them being static so the
> initialization is contingent on it being the first call (& llvm doesn't
> know that the vptr is constant after construction until destruction
> begins) & doesn't see the connection across all calls.
>
> So there are a few issues that need to be addressed to help this.
>
> One is some kind of constant range where the front end can promise not
> to modify certain values for a range (this might not be correct though
> - I remember some argument as to whether it was valid to destroy and
> then placement new the same type into that memory before the object
> goes out of scope - if so, then it's not obvious if the vptr is
> constant in any range) & secondly (at least for the case of non inline
> constructors) the ability to provide some kind of assert that, post
> construction, the vptr is equal to some known constant. (this assertion
> is currently possible with a branch to unreachable, except llvm throws
> out the unreachable code in SimplyCFG before any optimizations run - so
> we need to see if we can keep them around longer - there are some PRs
> filed for this but I haven't made much progress on it
> From: Matthieu Monrocq
> Sent: 9/29/2012 4:41 PM
> To: cfe-dev at cs.uiuc.edu; llvmdev
> Subject: [cfe-dev] Inlining and virtualization in Clang/LLVM
> Hello,
>
> at the moment the devirtualization of calls is performed in Clang (as
> far as I understand) whilst inlining and constant propagation are the
> optimizer's (LLVM) job.
>
> It is probably necessary for Clang to perform "some" devirtualization
> (the meaning of `final` is not known to LLVM), however all the stuff
> to determine what the dynamic type of a variable is seems redundant
> with LLVM, and is incomplete (in a way) when it's not perform *after*
> inlining and constant propagation.
>
>
> It seems to me therefore that because of this we are missing
> optimization opportunities. Suppose the following example program:
>
> #include <cstdio>
>
> struct Base { virtual void foo() = 0; };
>
> struct NothingDerived: Base { virtual void foo() {} };
>
> struct PrintDerived: Base { virtual void foo() { printf("Hello World!"); } };
>
> Base& select(int i) {
>     static NothingDerived nd;
>     static PrintDerived pd;
>     if (i % 2 == 0) { return nd; }
>     return pd;
> }
>
> int main() {
>     Base& b = select(0);
>     b.foo();
> }
>
>
> Which gives the following main function (using Try out LLVM and Clang):
>
> define i32 @main() uwtable {
> [...]
>
> _Z6selecti.exit:                                  ; preds = %13, %10, %7
>   %14 = load void (%struct.Base*)*** bitcast (%struct.NothingDerived*
> @_ZZ6selectiE2nd to void (%struct.Base*)***), align 8
>   %15 = load void (%struct.Base*)** %14, align 8
>   tail call void %15(%struct.Base* getelementptr inbounds
> (%struct.NothingDerived* @_ZZ6selectiE2nd, i64 0, i32 0))
>   ret i32 0
> }
>
>
> LLVM trivially sees through the select call and rightly deduce that we
> have to do with NothingDerived. However it does not go any step
> further and directly select NothingDerived::foo's function. Instead it
> dutifully performs all the bitcasting / pointer arithmetic necessary
> to access the pointer to function stored in the VTable and calls it
> through a pointer to function.
>
> I understand it would be awkward to have LLVM be aware of the virtual
> table implementation. Especially since even in C++ it varies from one
> implementation to another. However it seems to me that LLVM could
> still perform this optimization:
>
>  - LLVM having deduced the exact value to use (select(int)::nd) should
> be able to get directly to its v-ptr (the first field of Base)
>
> %struct.NothingDerived = type { %struct.Base }
> %struct.Base = type { i32 (...)** }
>
>
>  - the v-ptr (after construction) always point to the same v-table,
> which is a constant
>
> store i32 (...)** bitcast (i8** getelementptr inbounds ([3 x i8*]*
> @_ZTV14NothingDerived, i64 0, i64 2) to i32 (...)**), i32 (...)***
> getelementptr inbounds (%struct.NothingDerived* @_ZZ6selectiE2nd, i64
> 0, i32 0, i32 0), align 8
>
>
> - the offset in the v-table is "static"
>
> getelementptr inbounds (%struct.NothingDerived* @_ZZ6selectiE2nd, i64 0, i32 0)
>
>
> - the v-table being constant, what is stored at that offset is
> perfectly deducible as well
>
> @_ZTV14NothingDerived = linkonce_odr unnamed_addr constant [3 x i8*]
> [i8* null, i8* bitcast ({ i8*, i8*, i8* }* @_ZTI14NothingDerived to
> i8*), i8* bitcast (void (%struct.NothingDerived*)*
> @_ZN14NothingDerived3fooEv to i8*)]
>
>
> So the question is, what is lacking for LLVM to perform this optimization ?
>
> - Is it because of the loss of information in having the v-table
> stored as a "blob" of bytes ? (which means that Clang should pass more
> typed information, without changing the exact layout obviously given
> the ABI constraints)
>
> - Or is it something internal to LLVM (the information is somehow
> irremediably lost) ?
>
>
> I admit that reducing the virtual call overhead is probably not really
> worth it (in general), however devirtualizing calls also expose more
> inlining/context opportunities and it's hard (for me) to quantify what
> such an optimization could bring here. We should also consider the
> simplification in Clang (and other frontends) if LLVM could perform
> the job on its own.
>
> -- Matthieu
> _______________________________________________
> cfe-dev mailing list
> cfe-dev at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev

Hello David,

I have been thinking a bit more about this and as I find the problem
quite interesting.

If I get down one level (and revert to implementing objects in C):

#include <stdio.h>

typedef void (*Function)();

void print() { printf("Hello, World!"); }
void nothing() {}

Function get(int i) {
  if (i % 2 == 0) { return &print; }
  return ¬hing;
}

int main() {
  Function f = get(2);
  (*f)();
  return 0;
}

Which is admittedly quite similar, generates the following main:

define i32 @main() nounwind uwtable {
  %1 = tail call i32 (i8*, ...)* @printf(i8* getelementptr inbounds
([14 x i8]* @.str, i64 0, i64 0)) nounwind
  ret i32 0
}

This falls out naturally from SSA form, I think. No "constification"
is necessary.

However in our case, it seems that the SSA form (which only references
the pointer to the structure itself, not the v-table pointer), is not
sufficient therefore to allow the optimizer to remember which value
the pointer should have.

Of course, in general I agree that the compiler should be told
explicitly that the value cannot change (otherwise it would not know
that opaque calls are not accessing that particular value within the
structure); at least until the start of the destructor.

I remember a discussion a while ago involving the design and
implementation of those "const-ranges" within the LLVM IR, do you
happen to know where it's at ?

-- Matthieu