[cfe-dev] C++ devirtualization
Reid Kleckner
rnk at google.com
Fri Apr 25 11:06:57 PDT 2014
The last time we discussed this, this test case came up:
char alignas(A, B) buffer[max(sizeof(A), sizeof(B))];
A *a = reinterpret_cast<A*>(buffer);
B *b = reinterpret_cast<B*>(buffer);
new(buffer) A;
a->vfn();
new(buffer) B;
b->vfn();
Imagine those placement news are buried in an external function call.
Applying your transformation of adding stores breaks the program, if I'm
not mistaken:
char alignas(A, B) buffer[max(sizeof(A), sizeof(B))];
A *a = reinterpret_cast<A*>(buffer);
B *b = reinterpret_cast<B*>(buffer);
new(buffer) A;
void *vptr = a->vptr;
a->vfn();
new(buffer) B;
a->vptr = vptr; // Nope, we changed it. a is dead, long live b.
b->vfn();
On Wed, Apr 23, 2014 at 9:02 AM, Matthieu Monrocq <
matthieu.monrocq at gmail.com> wrote:
> TL;DR: where I wonder about the viability of a seemingly simple change to
> the code generation in Clang that seem to trick LLVM in doing a lot more
> devirtualization work than it's doing today... without any change to LLVM
> itself.
>
>
> Hello,
>
> I was reading Jan Hubicka's serie of articles on devirtualization in gcc,
> of which I posted the latest opus to Reddit [1] where you can find links to
> the 5 parts.
>
> In Part 2, Jan brings up a discussion on llvm-dev [2] where were debated
> the best ways to bring devirtualizations in the backend, for example in the
> presence of opaque functions:
>
> #include <cstdio>
>
> struct Base { virtual void virtualCall() = 0; };
> struct Derived: Base { virtual void virtualCall() override {
> printf("Hello, World!\n"); } };
>
> void inlinable(Base& b) { b.virtualCall(); }
> void opaque(Base& b);
>
> void func() {
> Derived d;
>
> inlinable(d); // statement A
> opaque(d); // Unknown implementation, LLVM assumes it may change the
> virtual pointer
>
> inlinable(d); // statement B
> }
>
> Now, the difference of treatment between the statement A and the statement
> B is that, after inlining, LLVM will de-virtualize the call in A but not in
> B, as can be seen thanks to Coliru [3]:
>
> define void @_Z4funcv() #0 {
> ; Derived d;
> %d = alloca %struct.Derived, align 8
> %1 = getelementptr inbounds %struct.Derived* %d, i64 0, i32 0, i32 0
> store i32 (...)** bitcast (i8** getelementptr inbounds ([3 x i8*]*
> @_ZTV7Derived, i64 0, i64 2) to i32 (...)**), i32 (...)*** %1, align 8,
> !tbaa !1
> %2 = getelementptr inbounds %struct.Derived* %d, i64 0, i32 0
> %3 = bitcast %struct.Derived* %d to void (%struct.Base*)***
>
> ; inlinable(d); // statement A
> %puts.i = call i32 @puts(i8* getelementptr inbounds ([14 x i8]* @str,
> i64 0, i64 0))
>
> ; opaque(d);
> call void @_Z6opaqueR4Base(%struct.Base* %2)
>
> ; inlinable(d); // statement B
> %4 = load void (%struct.Base*)*** %3, align 8, !tbaa !1
> %5 = load void (%struct.Base*)** %4, align 8
> call void %5(%struct.Base* %2)
>
> ret void
> }
>
>
> This is typical of the rift between the C++-aware front-end (Clang) and
> the C++-unaware middle-end (LLVM):
>
> - Clang knows that C++ does not allow a change of dynamic type, and thus
> of v-ptr, during the lifetime of an object (after construction ended and
> before destruction begins); but cannot do anything because it knows not how
> to inline
>
> - LLVM knows how to inline but is unaware of C++ rules and that those
> obscure bitcasts are all about virtual calls
>
>
> Now, as mentioned in [2] the discussion on how to make LLVM more "aware"
> of what is going on is complicated; I have seen several proposals over
> time, the merit of various intrinsics being discussed and no approach being
> taken: it's a hard problem.
>
>
> In this very special case, however, it seems to me that Clang could reach
> out to LLVM. Suppose that Clang introduced a simple store instruction right
> after the "opaque" call ?
>
> If we simulate the Itanium ABI using a more C-ish approach [4]:
>
> #include <cstdio>
>
> struct Base;
>
> struct BaseVTableLayout {
> void (*virtualCall)(Base*);
> };
>
> struct Base { BaseVTableLayout const* vptr; };
> struct Derived: Base {};
>
> void Derived_virtualCall(Base*) { printf("Hello, World!\n"); }
>
> BaseVTableLayout const BaseVTable = { 0 };
> BaseVTableLayout const DerivedVTable = { &Derived_virtualCall };
>
> void inlinable(Base* b) { b->vptr->virtualCall(b); }
>
> void opaque(Base* b);
>
> void func() {
> Derived d;
> d.vptr = &DerivedVTable;
>
> inlinable(&d); // statement A
>
> opaque(&d); // Unknown implementation, LLVM assumes it may change the
> virtual pointer
>
> inlinable(&d); // statement B
> }
>
>
> We see that the exact same behaviour is exhibited by LLVM: in fact the IR
> is nearly a carbon-copy, except for the name of the globals.
>
> In this C-ish approach, though, introducing the "redundant" store is
> dead-easy. In fact, we can even be completely naive about it:
>
> void func() {
> Derived d;
> d.vptr = &DerivedVTable;
>
> inlinable(&d); // statement A
> d.vptr = &DerivedVTable; // inform LLVM that the function is not
> allowed to change the virtual pointer under our feet
>
> opaque(&d); // Unknown implementation, LLVM assumes it may change the
> virtual pointer
> d.vptr = &DerivedVTable; // inform LLVM that the function is not
> allowed to change the virtual pointer under our feet
>
> inlinable(&d); // statement B
> d.vptr = &DerivedVTable; // inform LLVM that the function is not
> allowed to change the virtual pointer under our feet
> }
>
>
> And let's look at what LLVM generates:
>
> define void @_Z4funcv() #1 {
> ; Derived d;
> %d = alloca %struct.Base, align 8
> %1 = getelementptr inbounds %struct.Base* %d, i64 0, i32 0
>
> ; inlinable(d); // statement A
> %puts.i = call i32 @puts(i8* getelementptr inbounds ([14 x i8]* @str,
> i64 0, i64 0)) #3
>
> ; opaque(d);
> store %struct.BaseVTableLayout* bitcast ({ void (%struct.Base*)* }*
> @_ZL13DerivedVTable to %struct.BaseVTableLayout*),
> %struct.BaseVTableLayout** %1, align 8, !tbaa !1
>
> call void @_Z6opaqueP4Base(%struct.Base* %d)
>
> store %struct.BaseVTableLayout* bitcast ({ void (%struct.Base*)* }*
> @_ZL13DerivedVTable to %struct.BaseVTableLayout*),
> %struct.BaseVTableLayout** %1, align 8, !tbaa !1
>
> ; inlinable(d); // statement B
> %puts.i1 = call i32 @puts(i8* getelementptr inbounds ([14 x i8]* @str,
> i64 0, i64 0)) #3
>
> ret void
>
> }
>
>
> Bingo! The indirect virtual call in statement B has been inlined!
>
>
> But let's not stop there. There are plenty of occasions where Clang does
> not know the dynamic type and yet that can benefit from being
> simple-minded. Suppose that we start from:
>
> void funky(Base* b) {
> inlinable(b); // statement A
>
> opaque(b); // Unknown implementation, LLVM assumes it may change the
> virtual pointer
>
> inlinable(b); // statement B
> }
>
> void func() {
> Derived d;
> d.vptr = &DerivedVTable;
>
> funky(&d);
> }
>
>
> Then let's apply this "the dynamic type will not change" rule [5]:
>
> void funky(Base* b) {
> BaseVTableLayout const* vptr = b->vptr;
> inlinable(b); // statement A
> b->vptr = vptr; // inform LLVM that the function is not allowed to
> change the virtual pointer under our feet
>
> opaque(b); // Unknown implementation, LLVM assumes it may change the
> virtual pointer
> b->vptr = vptr; // inform LLVM that the function is not allowed to
> change the virtual pointer under our feet
>
> inlinable(b); // statement B
> b->vptr = vptr; // inform LLVM that the function is not allowed to
> change the virtual pointer under our feet
> }
>
> void func() {
> Derived d;
> d.vptr = &DerivedVTable;
>
> funky(&d);
> }
>
>
> And behold:
>
> define void @_Z4funcv() #1 {
> ; Derived d;
> %d = alloca %struct.Base, align 8
> %1 = getelementptr inbounds %struct.Base* %d, i64 0, i32 0
>
> ; inlinable(d); // statement A
> %puts.i = call i32 @puts(i8* getelementptr inbounds ([14 x i8]* @str,
> i64 0, i64 0)) #3
>
> ; opaque(d);
> store %struct.BaseVTableLayout* bitcast ({ void (%struct.Base*)* }*
> @_ZL13DerivedVTable to %struct.BaseVTableLayout*),
> %struct.BaseVTableLayout** %1, align 8, !tbaa !1
>
> call void @_Z6opaqueP4Base(%struct.Base* %d)
>
> store %struct.BaseVTableLayout* bitcast ({ void (%struct.Base*)* }*
> @_ZL13DerivedVTable to %struct.BaseVTableLayout*),
> %struct.BaseVTableLayout** %1, align 8, !tbaa !1
>
> ; inlinable(d); // statement B
> %puts.i1 = call i32 @puts(i8* getelementptr inbounds ([14 x i8]* @str,
> i64 0, i64 0)) #3
>
> ret void
> }
>
> For those wondering, in the back row, yes that is the exact same LLVM IR
> than before; it just saw throw our the call to "funky".
>
>
> And for more fun, trying it in a loop:
>
> int add(Base** array, size_t size) {
> int total = 0;
> for (size_t i = 0; i != size; ++i) {
> Base* b = array[i];
>
> opaque(b);
>
> total += b->vptr->virtualCall(b);
> }
> return total;
> }
>
> int func() {
> Derived d0;
> d0.vptr = &DerivedVTable;
> Derived d1;
> d1.vptr = &DerivedVTable;
> Derived d2;
> d2.vptr = &DerivedVTable;
>
> Base* b[3] = { &d0, &d1, &d2 };
>
> return add(b, 3);
> }
>
> The IR looks ugly [6]: "add" is inlined in "func" and the loop is
> unrolled, but LLVM does not know that "opaque" cannot change the v-ptr, and
> thus a lot of virtual calls ensue. Fun fact: I started by forgetting to put
> the call to "opaque" and LLVM directly optimized "func" to "ret 126".
>
> So, let's apply our simple "the dynamic type will not change" rule like
> before [7]:
>
> int add(Base** array, size_t size) {
> int total = 0;
> for (size_t i = 0; i != size; ++i) {
> Base* b = array[i];
> BaseVTableLayout const* vptr = b->vptr;
>
> opaque(b);
> b->vptr = vptr;
>
> total += b->vptr->virtualCall(b);
> b->vptr = vptr;
> }
> return total;
> }
>
>
> And now I dare present the IR:
>
> define i32 @_Z4funcv() #1 {
> .lr.ph.i:
> %d0 = alloca %struct.Base, align 8
> %d1 = alloca %struct.Base, align 8
> %d2 = alloca %struct.Base, align 8
>
> %0 = getelementptr inbounds %struct.Base* %d0, i64 0, i32 0
>
> store %struct.BaseVTableLayout* bitcast ({ i32 (%struct.Base*)* }*
> @_ZL13DerivedVTable to %struct.BaseVTableLayout*),
> %struct.BaseVTableLayout** %0, align 8, !tbaa !5
>
> %1 = getelementptr inbounds %struct.Base* %d1, i64 0, i32 0
>
> store %struct.BaseVTableLayout* bitcast ({ i32 (%struct.Base*)* }*
> @_ZL13DerivedVTable to %struct.BaseVTableLayout*),
> %struct.BaseVTableLayout** %1, align 8, !tbaa !5
>
> %2 = getelementptr inbounds %struct.Base* %d2, i64 0, i32 0
>
> store %struct.BaseVTableLayout* bitcast ({ i32 (%struct.Base*)* }*
> @_ZL13DerivedVTable to %struct.BaseVTableLayout*),
> %struct.BaseVTableLayout** %2, align 8, !tbaa !5
>
> call void @_Z6opaqueP4Base(%struct.Base* %d0)
>
> store %struct.BaseVTableLayout* bitcast ({ i32 (%struct.Base*)* }*
> @_ZL13DerivedVTable to %struct.BaseVTableLayout*),
> %struct.BaseVTableLayout** %0, align 8, !tbaa !5
>
> call void @_Z6opaqueP4Base(%struct.Base* %d1)
>
> store %struct.BaseVTableLayout* bitcast ({ i32 (%struct.Base*)* }*
> @_ZL13DerivedVTable to %struct.BaseVTableLayout*),
> %struct.BaseVTableLayout** %1, align 8, !tbaa !5
>
> call void @_Z6opaqueP4Base(%struct.Base* %d2)
>
> ret i32 126
> }
>
> Where one note that the additions of 42 were folded together.
>
>
>
> Conclusion/Digest:
>
> The attempt to implement devirtualization in
> CodeGenFunction::CanDevirtualizeMemberFunctionCall only get us so far. It
> seems reasonable to take advantage of the "final" keyword and other
> language-level constructs, but Clang does not (and should not) implement
> inlining and constant propagation, and thus does not cross the bridge.
> Meta-data proposals are still pending, and I still have hope for them, in
> the mean time though a stop-gap alternative seems interesting.
>
> A simple addition to the CodeGen process, expressing the rule **the
> dynamic type of an object is not changed by a function call** in LLVM IR
> instructions (caching the vptr of an instance as soon as it comes into
> being and restoring it after each function call involving this instance),
> seems to be sufficient to enable the devirtualization of calls by LLVM
> whenever inlining and constant propagation make it possible.
>
> The only downside that I could fathom are the redundant store
> instructions: they are the witnesses that LLVM did not consider them
> redundant and therefore could not have devirtualized calls without them.
> Proper meta-data would solve this issue I suppose.
>
> Still, in my naivete, I believe those redundant stores to cost less than
> the devirtualization opportunities currently being missed.
>
>
> I would be very interested in hearing the community feedback on this.
>
>
>
> [1]
> http://www.reddit.com/r/cpp/comments/23rjda/honza_hubi%C4%8Dkas_blog_devirtualization_in_c/
> [2] http://lists.cs.uiuc.edu/pipermail/llvmdev/2011-December/046072.html
> [3] http://coliru.stacked-crooked.com/a/0dc6a12cb5f27543
> [4] http://coliru.stacked-crooked.com/a/705f607005017c43
> [5] http://coliru.stacked-crooked.com/a/a34b6674255ad66c
> [6] http://coliru.stacked-crooked.com/a/93a493d80e7bcc3b
> [7] http://coliru.stacked-crooked.com/a/0f0d8470c1b066d0
>
> _______________________________________________
> cfe-dev mailing list
> cfe-dev at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20140425/5676c75a/attachment.html>
More information about the cfe-dev
mailing list