[llvm-dev] RFC: A new ABI for virtual calls, and a change to the virtual call representation in the IR

Peter Collingbourne via llvm-dev <llvm-dev at lists.llvm.org>
Mon Feb 29 13:53:31 PST 2016


Hi all,

I'd like to propose implementing the new vtable ABI described in PR26723,
which I'll call the relative ABI. That bug gives more detail on, and
justification for, the ABI.

The user interface for the new ABI would be that -fwhole-program-vtables
takes an optional value indicating which aspects of the program have
whole-program scope. For example, the existing implementation of
whole-program vcall optimization allows external code to call into
translation units compiled with -fwhole-program-vtables, but does not allow
external code to derive from classes defined in such translation units. You
could therefore request the current behaviour with
"-fwhole-program-vtables=derive", meaning that classes defined inside the
program may not be derived from outside it. To request the new ABI, you
would specify "-fwhole-program-vtables=call,derive", meaning that neither
calls nor derivation are allowed from outside the program. Plain
"-fwhole-program-vtables" would be short for
"-fwhole-program-vtables=call,derive,anythingelseweaddinfuture".

I'll also note that the new ABI does not require LTO or whole-program
visibility at compile time: to decide whether to use the new ABI for a
class, we just need to check that neither it nor any of its bases appears
in the whole-program-vtables blacklist.
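
For illustration, the blacklist could use the same special case list syntax
as the existing CFI blacklist (an assumption on my part; the entries below
are hypothetical):

  # Keep the standard ABI for types that external code may derive from
  # or call across the program boundary.
  type:std::*
  type:QObject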

At the same time, I'd like to change how virtual calls are represented in
the IR. This is for a few reasons:

1) It would allow whole-program virtual call optimization to work well with
   the relative ABI. Expressed directly in the IR, that ABI would complicate
   call sites and make matching and rewriting harder.

2) It simplifies the whole-program virtual call optimization pass. Currently
   we need to walk uses in the IR in order to determine the slot and callees
   for each call site; a simpler representation avoids all of that.

3) It would make it easier to implement dead virtual function stripping,
   which involves reshaping vtable initializers and rewriting call sites.
   Implementing this correctly is harder than it needs to be because of the
   current representation.

My proposal is to add the following new intrinsics:

i32 @llvm.vtable.slot.offset(metadata, i32)

This intrinsic takes a bitset name B and an offset I. It returns the byte
offset of the I'th virtual function pointer in each of the vtables in B.
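
For example, a call site might obtain a slot's byte offset like this (a
sketch; !"A" refers to the bitset defined in the example at the end of this
mail):

  ; byte offset of virtual function pointer 1 in the vtables in bitset "A"
  %slot = call i32 @llvm.vtable.slot.offset(metadata !"A", i32 1)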

i8* @llvm.vtable.load(i8*, i32)

This intrinsic takes a virtual table pointer and a byte offset, and loads
a virtual function pointer from the virtual table at the given offset.
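
When lowered, this would become an ordinary load from the vtable; a sketch,
assuming %vtable and %offset hold the intrinsic's arguments:

  %slotp = getelementptr i8, i8* %vtable, i32 %offset
  %fpp = bitcast i8* %slotp to i8**
  %fp = load i8*, i8** %fpp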

i8* @llvm.vtable.load.relative(i8*, i32)

This intrinsic is the same as above, but it uses the relative ABI.
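
Under the relative ABI each vtable entry is a 32-bit offset relative to the
vtable's address point, so a possible lowering (again a sketch) loads the
entry and adds it back to the vtable pointer:

  %slotp = getelementptr i8, i8* %vtable, i32 %offset
  %entryp = bitcast i8* %slotp to i32*
  %entry = load i32, i32* %entryp
  %fp = getelementptr i8, i8* %vtable, i32 %entry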

{i8*, i1} @llvm.vtable.checked.load(metadata %name, i8*, i32)
{i8*, i1} @llvm.vtable.checked.load.relative(metadata %name, i8*, i32)

These intrinsics would be used to implement CFI. They are similar to the
unchecked intrinsics, but if the second element of the result is true, the
program may call the first element of the result as a function pointer
without causing an indirect function call to any function other than one
potentially loaded from one of the constant globals of which %name is a
member.
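
A sketch of what a CFI-checked call site might look like with these
intrinsics, reusing %vtable and %slot from the earlier sketches; branching
to a trap block is just one possible failure handler:

  %pair = call {i8*, i1} @llvm.vtable.checked.load(metadata !"A", i8* %vtable, i32 %slot)
  %ok = extractvalue {i8*, i1} %pair, 1
  br i1 %ok, label %cont, label %trap

cont:
  %fp = extractvalue {i8*, i1} %pair, 0
  %casted_fp = bitcast i8* %fp to void (%A*)*
  call void %casted_fp(%A* %a)
  ...

trap:
  call void @llvm.trap()
  unreachable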

To minimize the impact on existing passes, the intrinsics would be lowered
early during the regular pipeline when LTO is disabled, or early in the LTO
pipeline when LTO is enabled. Clang would not use the llvm.vtable.slot.offset
intrinsic when LTO is disabled, as bitset information would be unavailable.

To give the optimizer permission to reshape vtable initializers for a
particular class, the vtable would be added to a named metadata node,
'llvm.vtable.slots'. The presence of this metadata would guarantee that all
loads beyond a given byte offset (a range that would exclude, for example,
the RTTI pointer) are done using the above intrinsics.

We will also take advantage of the ABI break to split each class's virtual
table group at virtual table boundaries into separate globals, instead of
emitting all virtual tables in the group into a single global (see the
@C_vtable0 and @C_vtable1 globals in the example below). This will not only
simplify the implementation of dead virtual function stripping, but also
reduce code size overhead for CFI. (CFI works best if vtables for a base
class can be laid out near vtables for its derived classes; the current ABI
makes this harder to achieve.)

Example (using the relative ABI):

struct A {
  virtual void f();
  virtual void g();
};

struct B {
  virtual void h();
};

struct C : A, B {
  virtual void f();
  virtual void g();
  virtual void h();
};

void fcall(A *a) {
  a->f();
}

void gcall(A *a) {
  a->g();
}

typedef void (A::*mfp)();

mfp getmfp() {
  return &A::g;
}

void callmfp(A *a, mfp m) {
  (a->*m)();
}

In IR (schematically; for instance, the relative vtable entries would
really be expressed as ptrtoint/sub constant expressions):

@A_vtable = {i8*, i8*, i32, i32} {0, @A::rtti, @A::f - (@A_vtable + 16), @A::g - (@A_vtable + 16)}
@B_vtable = {i8*, i8*, i32} {0, @B::rtti, @B::h - (@B_vtable + 16)}
@C_vtable0 = {i8*, i8*, i32, i32, i32} {0, @C::rtti, @C::f - (@C_vtable0 + 16), @C::g - (@C_vtable0 + 16), @C::h - (@C_vtable0 + 16)}
@C_vtable1 = {i8*, i8*, i32} {-8, @C::rtti, @C::h - (@C_vtable1 + 16)}

define void @fcall(%A* %a) {
  %slot = call i32 @llvm.vtable.slot.offset(metadata !"A", i32 0)
  %vtableptr = bitcast %A* %a to i8**
  %vtable = load i8*, i8** %vtableptr
  %fp = call i8* @llvm.vtable.load.relative(i8* %vtable, i32 %slot)
  %casted_fp = bitcast i8* %fp to void (%A*)*
  call void %casted_fp(%A* %a)
  ret void
}

define void @gcall(%A* %a) {
  %slot = call i32 @llvm.vtable.slot.offset(metadata !"A", i32 1)
  %vtableptr = bitcast %A* %a to i8**
  %vtable = load i8*, i8** %vtableptr
  %fp = call i8* @llvm.vtable.load.relative(i8* %vtable, i32 %slot)
  %casted_fp = bitcast i8* %fp to void (%A*)*
  call void %casted_fp(%A* %a)
  ret void
}

define {i8*, i8*} @getmfp() {
  %slot = call i32 @llvm.vtable.slot.offset(metadata !"A", i32 1)
  ; Itanium-style member function pointer: the first field holds the vtable
  ; byte offset plus 1 to mark the function as virtual; the adjustment is 0.
  %slotp1 = add i32 %slot, 1
  %slotp1.ptr = inttoptr i32 %slotp1 to i8*
  %result = insertvalue {i8*, i8*} zeroinitializer, i8* %slotp1.ptr, 0
  ret {i8*, i8*} %result
}

define void @callmfp(%A* %a, {i8*, i8*} %m) {
  ; assuming the call is virtual and there is no this-adjustment
  %slotptr = extractvalue {i8*, i8*} %m, 0
  %slot = ptrtoint i8* %slotptr to i32
  %slotm1 = sub i32 %slot, 1
  %vtableptr = bitcast %A* %a to i8**
  %vtable = load i8*, i8** %vtableptr
  %fp = call i8* @llvm.vtable.load.relative(i8* %vtable, i32 %slotm1)
  %casted_fp = bitcast i8* %fp to void (%A*)*
  call void %casted_fp(%A* %a)
  ret void
}

!0 = {!"A", @A_vtable, 16}
!1 = {!"B", @B_vtable, 16}
!2 = {!"A", @C_vtable0, 16}
!3 = {!"B", @C_vtable1, 16}
!4 = {!"C", @C_vtable0, 16}
!llvm.bitsets = {!0, !1, !2, !3, !4}

!5 = {@A_vtable, 16}
!6 = {@B_vtable, 16}
!7 = {@C_vtable0, 16}
!8 = {@C_vtable1, 16}
!llvm.vtable.slots = {!5, !6, !7, !8}

Thanks,
-- 
Peter

