[cfe-dev] C++ devirtualization

Matthieu Monrocq matthieu.monrocq at gmail.com
Wed Apr 23 09:02:34 PDT 2014


TL;DR: where I wonder about the viability of a seemingly simple change to
the code generation in Clang that seem to trick LLVM in doing a lot more
devirtualization work than it's doing today... without any change to LLVM
itself.


Hello,

I was reading Jan Hubicka's serie of articles on devirtualization in gcc,
of which I posted the latest opus to Reddit [1] where you can find links to
the 5 parts.

In Part 2, Jan brings up a discussion on llvm-dev [2] where were debated
the best ways to bring devirtualizations in the backend, for example in the
presence of opaque functions:

#include <cstdio>

struct Base { virtual void virtualCall() = 0; };
struct Derived: Base { virtual void virtualCall() override { printf("Hello,
World!\n"); } };

void inlinable(Base& b) { b.virtualCall(); }
void opaque(Base& b);

void func() {
    Derived d;

    inlinable(d); // statement A
    opaque(d); // Unknown implementation, LLVM assumes it may change the
virtual pointer

    inlinable(d); // statement B
}

Now, the difference of treatment between the statement A and the statement
B is that, after inlining, LLVM will de-virtualize the call in A but not in
B, as can be seen thanks to Coliru [3]:

define void @_Z4funcv() #0 {
  ; Derived d;
  %d = alloca %struct.Derived, align 8
  %1 = getelementptr inbounds %struct.Derived* %d, i64 0, i32 0, i32 0
  store i32 (...)** bitcast (i8** getelementptr inbounds ([3 x i8*]*
@_ZTV7Derived, i64 0, i64 2) to i32 (...)**), i32 (...)*** %1, align 8,
!tbaa !1
  %2 = getelementptr inbounds %struct.Derived* %d, i64 0, i32 0
  %3 = bitcast %struct.Derived* %d to void (%struct.Base*)***

  ; inlinable(d); // statement A
  %puts.i = call i32 @puts(i8* getelementptr inbounds ([14 x i8]* @str, i64
0, i64 0))

  ; opaque(d);
  call void @_Z6opaqueR4Base(%struct.Base* %2)

  ; inlinable(d); // statement B
  %4 = load void (%struct.Base*)*** %3, align 8, !tbaa !1
  %5 = load void (%struct.Base*)** %4, align 8
  call void %5(%struct.Base* %2)

  ret void
}


This is typical of the rift between the C++-aware front-end (Clang) and the
C++-unaware middle-end (LLVM):

- Clang knows that C++ does not allow a change of dynamic type, and thus of
v-ptr, during the lifetime of an object (after construction ended and
before destruction begins); but cannot do anything because it knows not how
to inline

- LLVM knows how to inline but is unaware of C++ rules and that those
obscure bitcasts are all about virtual calls


Now, as mentioned in [2] the discussion on how to make LLVM more "aware" of
what is going on is complicated; I have seen several proposals over time,
the merit of various intrinsics being discussed and no approach being
taken: it's a hard problem.


In this very special case, however, it seems to me that Clang could reach
out to LLVM. Suppose that Clang introduced a simple store instruction right
after the "opaque" call ?

If we simulate the Itanium ABI using a more C-ish approach [4]:

#include <cstdio>

struct Base;

struct BaseVTableLayout {
    void (*virtualCall)(Base*);
};

struct Base { BaseVTableLayout const* vptr; };
struct Derived: Base {};

void Derived_virtualCall(Base*) { printf("Hello, World!\n"); }

BaseVTableLayout const BaseVTable = { 0 };
BaseVTableLayout const DerivedVTable = { &Derived_virtualCall };

void inlinable(Base* b) { b->vptr->virtualCall(b); }

void opaque(Base* b);

void func() {
    Derived d;
    d.vptr = &DerivedVTable;

    inlinable(&d); // statement A

    opaque(&d); // Unknown implementation, LLVM assumes it may change the
virtual pointer

    inlinable(&d); // statement B
}


We see that the exact same behaviour is exhibited by LLVM: in fact the IR
is nearly a carbon-copy, except for the name of the globals.

In this C-ish approach, though, introducing the "redundant" store is
dead-easy. In fact, we can even be completely naive about it:

void func() {
    Derived d;
    d.vptr = &DerivedVTable;

    inlinable(&d); // statement A
    d.vptr = &DerivedVTable; // inform LLVM that the function is not
allowed to change the virtual pointer under our feet

    opaque(&d); // Unknown implementation, LLVM assumes it may change the
virtual pointer
    d.vptr = &DerivedVTable; // inform LLVM that the function is not
allowed to change the virtual pointer under our feet

    inlinable(&d); // statement B
    d.vptr = &DerivedVTable; // inform LLVM that the function is not
allowed to change the virtual pointer under our feet
}


And let's look at what LLVM generates:

define void @_Z4funcv() #1 {
  ; Derived d;
  %d = alloca %struct.Base, align 8
  %1 = getelementptr inbounds %struct.Base* %d, i64 0, i32 0

  ; inlinable(d); // statement A
  %puts.i = call i32 @puts(i8* getelementptr inbounds ([14 x i8]* @str, i64
0, i64 0)) #3

  ; opaque(d);
  store %struct.BaseVTableLayout* bitcast ({ void (%struct.Base*)* }*
@_ZL13DerivedVTable to %struct.BaseVTableLayout*),
%struct.BaseVTableLayout** %1, align 8, !tbaa !1

  call void @_Z6opaqueP4Base(%struct.Base* %d)

  store %struct.BaseVTableLayout* bitcast ({ void (%struct.Base*)* }*
@_ZL13DerivedVTable to %struct.BaseVTableLayout*),
%struct.BaseVTableLayout** %1, align 8, !tbaa !1

  ; inlinable(d); // statement B
  %puts.i1 = call i32 @puts(i8* getelementptr inbounds ([14 x i8]* @str,
i64 0, i64 0)) #3

  ret void

}


Bingo! The indirect virtual call in statement B has been inlined!


But let's not stop there. There are plenty of occasions where Clang does
not know the dynamic type and yet that can benefit from being
simple-minded. Suppose that we start from:

void funky(Base* b) {
    inlinable(b); // statement A

    opaque(b); // Unknown implementation, LLVM assumes it may change the
virtual pointer

    inlinable(b); // statement B
}

void func() {
    Derived d;
    d.vptr = &DerivedVTable;

    funky(&d);
}


Then let's apply this "the dynamic type will not change" rule [5]:

void funky(Base* b) {
    BaseVTableLayout const* vptr = b->vptr;
    inlinable(b); // statement A
    b->vptr = vptr; // inform LLVM that the function is not allowed to
change the virtual pointer under our feet

    opaque(b); // Unknown implementation, LLVM assumes it may change the
virtual pointer
    b->vptr = vptr; // inform LLVM that the function is not allowed to
change the virtual pointer under our feet

    inlinable(b); // statement B
    b->vptr = vptr; // inform LLVM that the function is not allowed to
change the virtual pointer under our feet
}

void func() {
    Derived d;
    d.vptr = &DerivedVTable;

    funky(&d);
}


And behold:

define void @_Z4funcv() #1 {
  ; Derived d;
  %d = alloca %struct.Base, align 8
  %1 = getelementptr inbounds %struct.Base* %d, i64 0, i32 0

  ; inlinable(d); // statement A
  %puts.i = call i32 @puts(i8* getelementptr inbounds ([14 x i8]* @str, i64
0, i64 0)) #3

  ; opaque(d);
  store %struct.BaseVTableLayout* bitcast ({ void (%struct.Base*)* }*
@_ZL13DerivedVTable to %struct.BaseVTableLayout*),
%struct.BaseVTableLayout** %1, align 8, !tbaa !1

  call void @_Z6opaqueP4Base(%struct.Base* %d)

  store %struct.BaseVTableLayout* bitcast ({ void (%struct.Base*)* }*
@_ZL13DerivedVTable to %struct.BaseVTableLayout*),
%struct.BaseVTableLayout** %1, align 8, !tbaa !1

  ; inlinable(d); // statement B
  %puts.i1 = call i32 @puts(i8* getelementptr inbounds ([14 x i8]* @str,
i64 0, i64 0)) #3

  ret void
}

For those wondering, in the back row, yes that is the exact same LLVM IR
than before; it just saw throw our the call to "funky".


And for more fun, trying it in a loop:

int add(Base** array, size_t size) {
    int total = 0;
    for (size_t i = 0; i != size; ++i) {
        Base* b = array[i];

        opaque(b);

        total += b->vptr->virtualCall(b);
    }
    return total;
}

int func() {
    Derived d0;
    d0.vptr = &DerivedVTable;
    Derived d1;
    d1.vptr = &DerivedVTable;
    Derived d2;
    d2.vptr = &DerivedVTable;

    Base* b[3] = { &d0, &d1, &d2 };

    return add(b, 3);
}

The IR looks ugly [6]: "add" is inlined in "func" and the loop is unrolled,
but LLVM does not know that "opaque" cannot change the v-ptr, and thus a
lot of virtual calls ensue. Fun fact: I started by forgetting to put the
call to "opaque" and LLVM directly optimized "func" to "ret 126".

So, let's apply our simple "the dynamic type will not change" rule like
before [7]:

int add(Base** array, size_t size) {
    int total = 0;
    for (size_t i = 0; i != size; ++i) {
        Base* b = array[i];
        BaseVTableLayout const* vptr = b->vptr;

        opaque(b);
        b->vptr = vptr;

        total += b->vptr->virtualCall(b);
        b->vptr = vptr;
    }
    return total;
}


And now I dare present the IR:

define i32 @_Z4funcv() #1 {
.lr.ph.i:
  %d0 = alloca %struct.Base, align 8
  %d1 = alloca %struct.Base, align 8
  %d2 = alloca %struct.Base, align 8

  %0 = getelementptr inbounds %struct.Base* %d0, i64 0, i32 0

  store %struct.BaseVTableLayout* bitcast ({ i32 (%struct.Base*)* }*
@_ZL13DerivedVTable to %struct.BaseVTableLayout*),
%struct.BaseVTableLayout** %0, align 8, !tbaa !5

  %1 = getelementptr inbounds %struct.Base* %d1, i64 0, i32 0

  store %struct.BaseVTableLayout* bitcast ({ i32 (%struct.Base*)* }*
@_ZL13DerivedVTable to %struct.BaseVTableLayout*),
%struct.BaseVTableLayout** %1, align 8, !tbaa !5

  %2 = getelementptr inbounds %struct.Base* %d2, i64 0, i32 0

  store %struct.BaseVTableLayout* bitcast ({ i32 (%struct.Base*)* }*
@_ZL13DerivedVTable to %struct.BaseVTableLayout*),
%struct.BaseVTableLayout** %2, align 8, !tbaa !5

  call void @_Z6opaqueP4Base(%struct.Base* %d0)

  store %struct.BaseVTableLayout* bitcast ({ i32 (%struct.Base*)* }*
@_ZL13DerivedVTable to %struct.BaseVTableLayout*),
%struct.BaseVTableLayout** %0, align 8, !tbaa !5

  call void @_Z6opaqueP4Base(%struct.Base* %d1)

  store %struct.BaseVTableLayout* bitcast ({ i32 (%struct.Base*)* }*
@_ZL13DerivedVTable to %struct.BaseVTableLayout*),
%struct.BaseVTableLayout** %1, align 8, !tbaa !5

  call void @_Z6opaqueP4Base(%struct.Base* %d2)

  ret i32 126
}

Where one note that the additions of 42 were folded together.



Conclusion/Digest:

The attempt to implement devirtualization in
CodeGenFunction::CanDevirtualizeMemberFunctionCall only get us so far. It
seems reasonable to take advantage of the "final" keyword and other
language-level constructs, but Clang does not (and should not) implement
inlining and constant propagation, and thus does not cross the bridge.
Meta-data proposals are still pending, and I still have hope for them, in
the mean time though a stop-gap alternative seems interesting.

A simple addition to the CodeGen process, expressing the rule **the dynamic
type of an object is not changed by a function call** in LLVM IR
instructions (caching the vptr of an instance as soon as it comes into
being and restoring it after each function call involving this instance),
seems to be sufficient to enable the devirtualization of calls by LLVM
whenever inlining and constant propagation make it possible.

The only downside that I could fathom are the redundant store instructions:
they are the witnesses that LLVM did not consider them redundant and
therefore could not have devirtualized calls without them. Proper meta-data
would solve this issue I suppose.

Still, in my naivete, I believe those redundant stores to cost less than
the devirtualization opportunities currently being missed.


I would be very interested in hearing the community feedback on this.



 [1]
http://www.reddit.com/r/cpp/comments/23rjda/honza_hubi%C4%8Dkas_blog_devirtualization_in_c/
 [2] http://lists.cs.uiuc.edu/pipermail/llvmdev/2011-December/046072.html
 [3] http://coliru.stacked-crooked.com/a/0dc6a12cb5f27543
 [4] http://coliru.stacked-crooked.com/a/705f607005017c43
 [5] http://coliru.stacked-crooked.com/a/a34b6674255ad66c
 [6] http://coliru.stacked-crooked.com/a/93a493d80e7bcc3b
 [7] http://coliru.stacked-crooked.com/a/0f0d8470c1b066d0
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20140423/b00341a4/attachment.html>


More information about the cfe-dev mailing list