[cfe-dev] C++ devirtualization

Fri Apr 25 11:06:57 PDT 2014

The last time we discussed this, this test case came up:

  char alignas(A, B) buffer[max(sizeof(A), sizeof(B))];
  A *a = reinterpret_cast<A*>(buffer);
  B *b = reinterpret_cast<B*>(buffer);
  new(buffer) A;
  a->vfn();
  new(buffer) B;
  b->vfn();

Imagine those placement news are buried in an external function call.

Applying your transformation of adding stores breaks the program, if I'm
not mistaken:

  char alignas(A, B) buffer[max(sizeof(A), sizeof(B))];
  A *a = reinterpret_cast<A*>(buffer);
  B *b = reinterpret_cast<B*>(buffer);
  new(buffer) A;
  void *vptr = a->vptr;
  a->vfn();
  new(buffer) B;
  a->vptr = vptr; // Nope, we changed it.  a is dead, long live b.
  b->vfn();

On Wed, Apr 23, 2014 at 9:02 AM, Matthieu Monrocq <
matthieu.monrocq at gmail.com> wrote:

> TL;DR: where I wonder about the viability of a seemingly simple change to
> the code generation in Clang that seem to trick LLVM in doing a lot more
> devirtualization work than it's doing today... without any change to LLVM
> itself.
>
>
> Hello,
>
> I was reading Jan Hubicka's serie of articles on devirtualization in gcc,
> of which I posted the latest opus to Reddit [1] where you can find links to
> the 5 parts.
>
> In Part 2, Jan brings up a discussion on llvm-dev [2] where were debated
> the best ways to bring devirtualizations in the backend, for example in the
> presence of opaque functions:
>
> #include <cstdio>
>
> struct Base { virtual void virtualCall() = 0; };
> struct Derived: Base { virtual void virtualCall() override {
> printf("Hello, World!\n"); } };
>
> void inlinable(Base& b) { b.virtualCall(); }
> void opaque(Base& b);
>
> void func() {
>     Derived d;
>
>     inlinable(d); // statement A
>     opaque(d); // Unknown implementation, LLVM assumes it may change the
> virtual pointer
>
>     inlinable(d); // statement B
> }
>
> Now, the difference of treatment between the statement A and the statement
> B is that, after inlining, LLVM will de-virtualize the call in A but not in
> B, as can be seen thanks to Coliru [3]:
>
> define void @_Z4funcv() #0 {
>   ; Derived d;
>   %d = alloca %struct.Derived, align 8
>   %1 = getelementptr inbounds %struct.Derived* %d, i64 0, i32 0, i32 0
>   store i32 (...)** bitcast (i8** getelementptr inbounds ([3 x i8*]*
> @_ZTV7Derived, i64 0, i64 2) to i32 (...)**), i32 (...)*** %1, align 8,
> !tbaa !1
>   %2 = getelementptr inbounds %struct.Derived* %d, i64 0, i32 0
>   %3 = bitcast %struct.Derived* %d to void (%struct.Base*)***
>
>   ; inlinable(d); // statement A
>   %puts.i = call i32 @puts(i8* getelementptr inbounds ([14 x i8]* @str,
> i64 0, i64 0))
>
>   ; opaque(d);
>   call void @_Z6opaqueR4Base(%struct.Base* %2)
>
>   ; inlinable(d); // statement B
>   %4 = load void (%struct.Base*)*** %3, align 8, !tbaa !1
>   %5 = load void (%struct.Base*)** %4, align 8
>   call void %5(%struct.Base* %2)
>
>   ret void
> }
>
>
> This is typical of the rift between the C++-aware front-end (Clang) and
> the C++-unaware middle-end (LLVM):
>
> - Clang knows that C++ does not allow a change of dynamic type, and thus
> of v-ptr, during the lifetime of an object (after construction ended and
> before destruction begins); but cannot do anything because it knows not how
> to inline
>
> - LLVM knows how to inline but is unaware of C++ rules and that those
> obscure bitcasts are all about virtual calls
>
>
> Now, as mentioned in [2] the discussion on how to make LLVM more "aware"
> of what is going on is complicated; I have seen several proposals over
> time, the merit of various intrinsics being discussed and no approach being
> taken: it's a hard problem.
>
>
> In this very special case, however, it seems to me that Clang could reach
> out to LLVM. Suppose that Clang introduced a simple store instruction right
> after the "opaque" call ?
>
> If we simulate the Itanium ABI using a more C-ish approach [4]:
>
> #include <cstdio>
>
> struct Base;
>
> struct BaseVTableLayout {
>     void (*virtualCall)(Base*);
> };
>
> struct Base { BaseVTableLayout const* vptr; };
> struct Derived: Base {};
>
> void Derived_virtualCall(Base*) { printf("Hello, World!\n"); }
>
> BaseVTableLayout const BaseVTable = { 0 };
> BaseVTableLayout const DerivedVTable = { &Derived_virtualCall };
>
> void inlinable(Base* b) { b->vptr->virtualCall(b); }
>
> void opaque(Base* b);
>
> void func() {
>     Derived d;
>     d.vptr = &DerivedVTable;
>
>     inlinable(&d); // statement A
>
>     opaque(&d); // Unknown implementation, LLVM assumes it may change the
> virtual pointer
>
>     inlinable(&d); // statement B
> }
>
>
> We see that the exact same behaviour is exhibited by LLVM: in fact the IR
> is nearly a carbon-copy, except for the name of the globals.
>
> In this C-ish approach, though, introducing the "redundant" store is
> dead-easy. In fact, we can even be completely naive about it:
>
> void func() {
>     Derived d;
>     d.vptr = &DerivedVTable;
>
>     inlinable(&d); // statement A
>     d.vptr = &DerivedVTable; // inform LLVM that the function is not
> allowed to change the virtual pointer under our feet
>
>     opaque(&d); // Unknown implementation, LLVM assumes it may change the
> virtual pointer
>     d.vptr = &DerivedVTable; // inform LLVM that the function is not
> allowed to change the virtual pointer under our feet
>
>     inlinable(&d); // statement B
>     d.vptr = &DerivedVTable; // inform LLVM that the function is not
> allowed to change the virtual pointer under our feet
> }
>
>
> And let's look at what LLVM generates:
>
> define void @_Z4funcv() #1 {
>   ; Derived d;
>   %d = alloca %struct.Base, align 8
>   %1 = getelementptr inbounds %struct.Base* %d, i64 0, i32 0
>
>   ; inlinable(d); // statement A
>   %puts.i = call i32 @puts(i8* getelementptr inbounds ([14 x i8]* @str,
> i64 0, i64 0)) #3
>
>   ; opaque(d);
>   store %struct.BaseVTableLayout* bitcast ({ void (%struct.Base*)* }*
> @_ZL13DerivedVTable to %struct.BaseVTableLayout*),
> %struct.BaseVTableLayout** %1, align 8, !tbaa !1
>
>   call void @_Z6opaqueP4Base(%struct.Base* %d)
>
>   store %struct.BaseVTableLayout* bitcast ({ void (%struct.Base*)* }*
> @_ZL13DerivedVTable to %struct.BaseVTableLayout*),
> %struct.BaseVTableLayout** %1, align 8, !tbaa !1
>
>   ; inlinable(d); // statement B
>   %puts.i1 = call i32 @puts(i8* getelementptr inbounds ([14 x i8]* @str,
> i64 0, i64 0)) #3
>
>   ret void
>
> }
>
>
> Bingo! The indirect virtual call in statement B has been inlined!
>
>
> But let's not stop there. There are plenty of occasions where Clang does
> not know the dynamic type and yet that can benefit from being
> simple-minded. Suppose that we start from:
>
> void funky(Base* b) {
>     inlinable(b); // statement A
>
>     opaque(b); // Unknown implementation, LLVM assumes it may change the
> virtual pointer
>
>     inlinable(b); // statement B
> }
>
> void func() {
>     Derived d;
>     d.vptr = &DerivedVTable;
>
>     funky(&d);
> }
>
>
> Then let's apply this "the dynamic type will not change" rule [5]:
>
> void funky(Base* b) {
>     BaseVTableLayout const* vptr = b->vptr;
>     inlinable(b); // statement A
>     b->vptr = vptr; // inform LLVM that the function is not allowed to
> change the virtual pointer under our feet
>
>     opaque(b); // Unknown implementation, LLVM assumes it may change the
> virtual pointer
>     b->vptr = vptr; // inform LLVM that the function is not allowed to
> change the virtual pointer under our feet
>
>     inlinable(b); // statement B
>     b->vptr = vptr; // inform LLVM that the function is not allowed to
> change the virtual pointer under our feet
> }
>
> void func() {
>     Derived d;
>     d.vptr = &DerivedVTable;
>
>     funky(&d);
> }
>
>
> And behold:
>
> define void @_Z4funcv() #1 {
>   ; Derived d;
>   %d = alloca %struct.Base, align 8
>   %1 = getelementptr inbounds %struct.Base* %d, i64 0, i32 0
>
>   ; inlinable(d); // statement A
>   %puts.i = call i32 @puts(i8* getelementptr inbounds ([14 x i8]* @str,
> i64 0, i64 0)) #3
>
>   ; opaque(d);
>   store %struct.BaseVTableLayout* bitcast ({ void (%struct.Base*)* }*
> @_ZL13DerivedVTable to %struct.BaseVTableLayout*),
> %struct.BaseVTableLayout** %1, align 8, !tbaa !1
>
>   call void @_Z6opaqueP4Base(%struct.Base* %d)
>
>   store %struct.BaseVTableLayout* bitcast ({ void (%struct.Base*)* }*
> @_ZL13DerivedVTable to %struct.BaseVTableLayout*),
> %struct.BaseVTableLayout** %1, align 8, !tbaa !1
>
>   ; inlinable(d); // statement B
>   %puts.i1 = call i32 @puts(i8* getelementptr inbounds ([14 x i8]* @str,
> i64 0, i64 0)) #3
>
>   ret void
> }
>
> For those wondering, in the back row, yes that is the exact same LLVM IR
> than before; it just saw throw our the call to "funky".
>
>
> And for more fun, trying it in a loop:
>
> int add(Base** array, size_t size) {
>     int total = 0;
>     for (size_t i = 0; i != size; ++i) {
>         Base* b = array[i];
>
>         opaque(b);
>
>         total += b->vptr->virtualCall(b);
>     }
>     return total;
> }
>
> int func() {
>     Derived d0;
>     d0.vptr = &DerivedVTable;
>     Derived d1;
>     d1.vptr = &DerivedVTable;
>     Derived d2;
>     d2.vptr = &DerivedVTable;
>
>     Base* b[3] = { &d0, &d1, &d2 };
>
>     return add(b, 3);
> }
>
> The IR looks ugly [6]: "add" is inlined in "func" and the loop is
> unrolled, but LLVM does not know that "opaque" cannot change the v-ptr, and
> thus a lot of virtual calls ensue. Fun fact: I started by forgetting to put
> the call to "opaque" and LLVM directly optimized "func" to "ret 126".
>
> So, let's apply our simple "the dynamic type will not change" rule like
> before [7]:
>
> int add(Base** array, size_t size) {
>     int total = 0;
>     for (size_t i = 0; i != size; ++i) {
>         Base* b = array[i];
>         BaseVTableLayout const* vptr = b->vptr;
>
>         opaque(b);
>         b->vptr = vptr;
>
>         total += b->vptr->virtualCall(b);
>         b->vptr = vptr;
>     }
>     return total;
> }
>
>
> And now I dare present the IR:
>
> define i32 @_Z4funcv() #1 {
> .lr.ph.i:
>   %d0 = alloca %struct.Base, align 8
>   %d1 = alloca %struct.Base, align 8
>   %d2 = alloca %struct.Base, align 8
>
>   %0 = getelementptr inbounds %struct.Base* %d0, i64 0, i32 0
>
>   store %struct.BaseVTableLayout* bitcast ({ i32 (%struct.Base*)* }*
> @_ZL13DerivedVTable to %struct.BaseVTableLayout*),
> %struct.BaseVTableLayout** %0, align 8, !tbaa !5
>
>   %1 = getelementptr inbounds %struct.Base* %d1, i64 0, i32 0
>
>   store %struct.BaseVTableLayout* bitcast ({ i32 (%struct.Base*)* }*
> @_ZL13DerivedVTable to %struct.BaseVTableLayout*),
> %struct.BaseVTableLayout** %1, align 8, !tbaa !5
>
>   %2 = getelementptr inbounds %struct.Base* %d2, i64 0, i32 0
>
>   store %struct.BaseVTableLayout* bitcast ({ i32 (%struct.Base*)* }*
> @_ZL13DerivedVTable to %struct.BaseVTableLayout*),
> %struct.BaseVTableLayout** %2, align 8, !tbaa !5
>
>   call void @_Z6opaqueP4Base(%struct.Base* %d0)
>
>   store %struct.BaseVTableLayout* bitcast ({ i32 (%struct.Base*)* }*
> @_ZL13DerivedVTable to %struct.BaseVTableLayout*),
> %struct.BaseVTableLayout** %0, align 8, !tbaa !5
>
>   call void @_Z6opaqueP4Base(%struct.Base* %d1)
>
>   store %struct.BaseVTableLayout* bitcast ({ i32 (%struct.Base*)* }*
> @_ZL13DerivedVTable to %struct.BaseVTableLayout*),
> %struct.BaseVTableLayout** %1, align 8, !tbaa !5
>
>   call void @_Z6opaqueP4Base(%struct.Base* %d2)
>
>   ret i32 126
> }
>
> Where one note that the additions of 42 were folded together.
>
>
>
> Conclusion/Digest:
>
> The attempt to implement devirtualization in
> CodeGenFunction::CanDevirtualizeMemberFunctionCall only get us so far. It
> seems reasonable to take advantage of the "final" keyword and other
> language-level constructs, but Clang does not (and should not) implement
> inlining and constant propagation, and thus does not cross the bridge.
> Meta-data proposals are still pending, and I still have hope for them, in
> the mean time though a stop-gap alternative seems interesting.
>
> A simple addition to the CodeGen process, expressing the rule **the
> dynamic type of an object is not changed by a function call** in LLVM IR
> instructions (caching the vptr of an instance as soon as it comes into
> being and restoring it after each function call involving this instance),
> seems to be sufficient to enable the devirtualization of calls by LLVM
> whenever inlining and constant propagation make it possible.
>
> The only downside that I could fathom are the redundant store
> instructions: they are the witnesses that LLVM did not consider them
> redundant and therefore could not have devirtualized calls without them.
> Proper meta-data would solve this issue I suppose.
>
> Still, in my naivete, I believe those redundant stores to cost less than
> the devirtualization opportunities currently being missed.
>
>
> I would be very interested in hearing the community feedback on this.
>
>
>
>  [1]
> http://www.reddit.com/r/cpp/comments/23rjda/honza_hubi%C4%8Dkas_blog_devirtualization_in_c/
>  [2] http://lists.cs.uiuc.edu/pipermail/llvmdev/2011-December/046072.html
>  [3] http://coliru.stacked-crooked.com/a/0dc6a12cb5f27543
>  [4] http://coliru.stacked-crooked.com/a/705f607005017c43
>  [5] http://coliru.stacked-crooked.com/a/a34b6674255ad66c
>  [6] http://coliru.stacked-crooked.com/a/93a493d80e7bcc3b
>  [7] http://coliru.stacked-crooked.com/a/0f0d8470c1b066d0
>
> _______________________________________________
> cfe-dev mailing list
> cfe-dev at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20140425/5676c75a/attachment.html>