[cfe-dev] Inlining and virtualization in Clang/LLVM

Sun Sep 30 08:04:01 PDT 2012

I'm not sure whether this is the exact problem at hand in your example,
but one of the major hurdles llvm suffers when trying to devirtualize
is the second point you made: it doesn't see the invariance of the
table pointer post construction. In your specific example the
constructors are trivial and inlinable so it I'm not sure why llvm
would be having trouble proving the value of the vptr &
devirtualizing. , perhaps due to them being static so the
initialization is contingent on it being the first call (& llvm doesn't
know that the vptr is constant after construction until destruction
begins) & doesn't see the connection across all calls.

So there are a few issues that need to be addressed to help this.

One is some kind of constant range where the front end can promise not
to modify certain values for a range (this might not be correct though
- I remember some argument as to whether it was valid to destroy and
then placement new the same type into that memory before the object
goes out of scope - if so, then it's not obvious if the vptr is
constant in any range) & secondly (at least for the case of non inline
constructors) the ability to provide some kind of assert that, post
construction, the vptr is equal to some known constant. (this assertion
is currently possible with a branch to unreachable, except llvm throws
out the unreachable code in SimplyCFG before any optimizations run - so
we need to see if we can keep them around longer - there are some PRs
filed for this but I haven't made much progress on it
From: Matthieu Monrocq
Sent: 9/29/2012 4:41 PM
To: cfe-dev at cs.uiuc.edu; llvmdev
Subject: [cfe-dev] Inlining and virtualization in Clang/LLVM
Hello,

at the moment the devirtualization of calls is performed in Clang (as
far as I understand) whilst inlining and constant propagation are the
optimizer's (LLVM) job.

It is probably necessary for Clang to perform "some" devirtualization
(the meaning of `final` is not known to LLVM), however all the stuff
to determine what the dynamic type of a variable is seems redundant
with LLVM, and is incomplete (in a way) when it's not perform *after*
inlining and constant propagation.

It seems to me therefore that because of this we are missing
optimization opportunities. Suppose the following example program:

#include <cstdio>

struct Base { virtual void foo() = 0; };

struct NothingDerived: Base { virtual void foo() {} };

struct PrintDerived: Base { virtual void foo() { printf("Hello World!"); } };

Base& select(int i) {
    static NothingDerived nd;
    static PrintDerived pd;
    if (i % 2 == 0) { return nd; }
    return pd;
}

int main() {
    Base& b = select(0);
    b.foo();
}

Which gives the following main function (using Try out LLVM and Clang):

define i32 @main() uwtable {
[...]

_Z6selecti.exit:                                  ; preds = %13, %10, %7
  %14 = load void (%struct.Base*)*** bitcast (%struct.NothingDerived*
@_ZZ6selectiE2nd to void (%struct.Base*)***), align 8
  %15 = load void (%struct.Base*)** %14, align 8
  tail call void %15(%struct.Base* getelementptr inbounds
(%struct.NothingDerived* @_ZZ6selectiE2nd, i64 0, i32 0))
  ret i32 0
}

LLVM trivially sees through the select call and rightly deduce that we
have to do with NothingDerived. However it does not go any step
further and directly select NothingDerived::foo's function. Instead it
dutifully performs all the bitcasting / pointer arithmetic necessary
to access the pointer to function stored in the VTable and calls it
through a pointer to function.

I understand it would be awkward to have LLVM be aware of the virtual
table implementation. Especially since even in C++ it varies from one
implementation to another. However it seems to me that LLVM could
still perform this optimization:

 - LLVM having deduced the exact value to use (select(int)::nd) should
be able to get directly to its v-ptr (the first field of Base)

%struct.NothingDerived = type { %struct.Base }
%struct.Base = type { i32 (...)** }

 - the v-ptr (after construction) always point to the same v-table,
which is a constant

store i32 (...)** bitcast (i8** getelementptr inbounds ([3 x i8*]*
@_ZTV14NothingDerived, i64 0, i64 2) to i32 (...)**), i32 (...)***
getelementptr inbounds (%struct.NothingDerived* @_ZZ6selectiE2nd, i64
0, i32 0, i32 0), align 8

- the offset in the v-table is "static"

getelementptr inbounds (%struct.NothingDerived* @_ZZ6selectiE2nd, i64 0, i32 0)

- the v-table being constant, what is stored at that offset is
perfectly deducible as well

@_ZTV14NothingDerived = linkonce_odr unnamed_addr constant [3 x i8*]
[i8* null, i8* bitcast ({ i8*, i8*, i8* }* @_ZTI14NothingDerived to
i8*), i8* bitcast (void (%struct.NothingDerived*)*
@_ZN14NothingDerived3fooEv to i8*)]

So the question is, what is lacking for LLVM to perform this optimization ?

- Is it because of the loss of information in having the v-table
stored as a "blob" of bytes ? (which means that Clang should pass more
typed information, without changing the exact layout obviously given
the ABI constraints)

- Or is it something internal to LLVM (the information is somehow
irremediably lost) ?

I admit that reducing the virtual call overhead is probably not really
worth it (in general), however devirtualizing calls also expose more
inlining/context opportunities and it's hard (for me) to quantify what
such an optimization could bring here. We should also consider the
simplification in Clang (and other frontends) if LLVM could perform
the job on its own.

-- Matthieu
_______________________________________________
cfe-dev mailing list
cfe-dev at cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev