[cfe-dev] RFC: A new ABI for virtual calls, and a change to the virtual call representation in the IR
Sean Silva via cfe-dev
cfe-dev at lists.llvm.org
Mon Feb 29 15:38:35 PST 2016
Using relative offsets applies to more than just vtables. It would do
wonders for constant strings too.
-- Sean Silva
On Mon, Feb 29, 2016 at 1:53 PM, Peter Collingbourne via cfe-dev <
cfe-dev at lists.llvm.org> wrote:
> Hi all,
>
> I'd like to make a proposal to implement the new vtable ABI described in
> PR26723, which I'll call the relative ABI. That bug gives more details and
> justification for that ABI.
>
> The user interface for the new ABI would be that -fwhole-program-vtables
> would take an optional value indicating which aspects of the program have
> whole-program scope. For example, the existing implementation of
> whole-program vcall optimization allows external code to call into
> translation units compiled with -fwhole-program-vtables, but does not
> allow external code to derive from classes defined in such translation
> units, so you could request the current behaviour with
> "-fwhole-program-vtables=derive", which means that derived classes are not
> allowed from outside the program. To request
> the new ABI, you can specify "-fwhole-program-vtables=call,derive",
> which means that calls and derived classes are both not allowed from
> outside the program. "-fwhole-program-vtables" would be short for
> "-fwhole-program-vtables=call,derive,anythingelseweaddinfuture".
>
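The value syntax described above can be sketched as a small parser. This is
only an illustration of the proposed semantics, not Clang's option handling:
the names Scope, kCall, kDerive, kAllScopes and parse_scopes are hypothetical,
and it assumes "call" and "derive" are the only values defined so far.

```cpp
#include <string>

// Hypothetical scope bits; the RFC names only "call" and "derive".
enum Scope : unsigned {
  kCall = 1u << 0,    // no virtual calls from outside the program
  kDerive = 1u << 1,  // no derived classes from outside the program
  kAllScopes = kCall | kDerive
};

// Parses the text after '=' in -fwhole-program-vtables=<value>.
// A null value models the bare flag, which the RFC says is short for
// every scope. Returns false on an unknown or empty item.
bool parse_scopes(const char *value, unsigned &mask) {
  if (value == nullptr) {
    mask = kAllScopes;
    return true;
  }
  const std::string text(value);
  unsigned result = 0;
  std::string::size_type pos = 0;
  for (;;) {
    const std::string::size_type comma = text.find(',', pos);
    const std::string item = text.substr(
        pos, comma == std::string::npos ? std::string::npos : comma - pos);
    if (item == "call")
      result |= kCall;
    else if (item == "derive")
      result |= kDerive;
    else
      return false;
    if (comma == std::string::npos)
      break;
    pos = comma + 1;
  }
  mask = result;
  return true;
}
```

Under this model "-fwhole-program-vtables=derive" selects the current
behaviour, and the new ABI would be enabled only when the kCall bit is also
set.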
> I'll also make the observation that the new ABI does not require LTO or
> whole-program visibility at compile time; to decide whether to use the new
> ABI for a class, we just need to check that it and its bases are not in the
> whole-program-vtables blacklist.
>
> At the same time, I'd like to change how virtual calls are represented in
> the IR. This is for a few reasons:
>
> 1) Would allow whole-program virtual call optimization to work well with
>    the relative ABI. This ABI would complicate the IR at call sites and
>    make it harder to do matching and rewriting.
>
> 2) Simplifies the whole-program virtual call optimization pass. Currently
>    we need to walk uses in the IR in order to determine the slot and
>    callees for each call site. This can all be avoided with a simpler
>    representation.
>
> 3) Would make it easier to implement dead virtual function stripping. This
>    would involve reshaping any vtable initializers and rewriting call
>    sites. Implementing this correctly is harder than it needs to be because
>    of the current representation.
>
> My proposal is to add the following new intrinsics:
>
> i32 @llvm.vtable.slot.offset(metadata, i32)
>
> This intrinsic takes a bitset name B and an offset I. It returns the byte
> offset of the I'th virtual function pointer in each of the vtables in B.
>
> i8* @llvm.vtable.load(i8*, i32)
>
> This intrinsic takes a virtual table pointer and a byte offset, and loads
> a virtual function pointer from the virtual table at the given offset.
>
> i8* @llvm.vtable.load.relative(i8*, i32)
>
> This intrinsic is the same as above, but it uses the relative ABI.
>
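The difference between the two load intrinsics can be sketched in C++. This
is a rough model rather than the proposed lowering: the function names are
hypothetical stand-ins, and it assumes (as the vtable initializers in the
example further down suggest) that each relative slot holds a signed 32-bit
displacement from the address the vtable pointer refers to.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for llvm.vtable.load: the slot at
// vtable + byte_offset holds a full-width function address.
inline const void *vtable_load(const void *vtable, std::int32_t byte_offset) {
  const void *fp;
  std::memcpy(&fp, static_cast<const char *>(vtable) + byte_offset, sizeof fp);
  return fp;
}

// Hypothetical stand-in for llvm.vtable.load.relative: the slot holds a
// signed 32-bit displacement, and the function address is the vtable
// pointer plus that displacement (an assumption taken from the example).
inline const void *vtable_load_relative(const void *vtable,
                                        std::int32_t byte_offset) {
  std::int32_t rel;
  std::memcpy(&rel, static_cast<const char *>(vtable) + byte_offset,
              sizeof rel);
  return static_cast<const char *>(vtable) + rel;
}
```

With relative slots the table contains no absolute addresses, so on a typical
64-bit target each slot would take 4 bytes instead of 8.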
> {i8*, i1} @llvm.vtable.checked.load(metadata %name, i8*, i32)
> {i8*, i1} @llvm.vtable.checked.load.relative(metadata %name, i8*, i32)
>
> These intrinsics would be used to implement CFI. They are similar to the
> unchecked intrinsics, but if the second element of the result is non-zero,
> the program may call the first element of the result as a function pointer
> without causing an indirect function call to any function other than one
> potentially loaded from one of the constant globals of which %name is a
> member.
>
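One behaviour that would satisfy the checked-load contract can be sketched as
follows. It is a simplified model under stated assumptions: the bitset is
represented as a plain set of valid vtable addresses, the names Bitset and
vtable_checked_load are hypothetical, and only the non-relative variant is
shown. The intrinsic as proposed would leave the optimizer more freedom than
this.

```cpp
#include <cstdint>
#include <cstring>
#include <unordered_set>
#include <utility>

// Hypothetical model of a bitset: the vtable addresses that are members
// of %name.
using Bitset = std::unordered_set<const void *>;

// Returns {function pointer, ok}. ok is true only when the vtable is a
// member of the bitset, so a caller that tests ok before calling can only
// reach functions loaded from a member vtable.
inline std::pair<const void *, bool> vtable_checked_load(
    const Bitset &members, const void *vtable, std::int32_t byte_offset) {
  if (members.count(vtable) == 0)
    return {nullptr, false};
  const void *fp;
  std::memcpy(&fp, static_cast<const char *>(vtable) + byte_offset, sizeof fp);
  return {fp, true};
}
```

A CFI-instrumented call site would branch on the second element and trap
instead of calling when it is false.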
> To minimize the impact on existing passes, the intrinsics would be lowered
> early during the regular pipeline when LTO is disabled, or early in the LTO
> pipeline when LTO is enabled. Clang would not use the
> llvm.vtable.slot.offset intrinsic when LTO is disabled, as bitset
> information would be unavailable.
>
> To give the optimizer permission to reshape vtable initializers for a
> particular class, the vtable would be added to a special named metadata
> node named 'llvm.vtable.slots'. The presence of this metadata would
> guarantee that all loads beyond a given byte offset (this range would not
> include the RTTI pointer, for example) are done using the above intrinsics.
>
> We will also take advantage of the ABI break to split the class's virtual
> table group at virtual table boundaries into separate globals instead of
> emitting all virtual tables in the group into a single global. This will
> not only simplify the implementation of dead virtual function stripping,
> but also reduce code size overhead for CFI. (CFI works best if vtables for
> a base class can be laid out near vtables for derived classes; the current
> ABI makes this harder to achieve.)
>
> Example (using the relative ABI):
>
> struct A {
>   virtual void f();
>   virtual void g();
> };
>
> struct B {
>   virtual void h();
> };
>
> struct C : A, B {
>   virtual void f();
>   virtual void g();
>   virtual void h();
> };
>
> void fcall(A *a) {
>   a->f();
> }
>
> void gcall(A *a) {
>   a->g();
> }
>
> typedef void (A::*mfp)();
>
> mfp getmfp() {
>   return &A::g;
> }
>
> void callmfp(A *a, mfp m) {
>   (a->*m)();
> }
>
> In IR:
>
> @A_vtable = {i8*, i8*, i32, i32} {0, @A::rtti, @A::f - (@A_vtable + 16), @A::g - (@A_vtable + 16)}
> @B_vtable = {i8*, i8*, i32} {0, @B::rtti, @B::h - (@B_vtable + 16)}
> @C_vtable0 = {i8*, i8*, i32, i32, i32} {0, @C::rtti, @C::f - (@C_vtable0 + 16), @C::g - (@C_vtable0 + 16), @C::h - (@C_vtable0 + 16)}
> @C_vtable1 = {i8*, i8*, i32} {-8, @C::rtti, @C::h - (@C_vtable1 + 16)}
>
> define void @fcall(%A* %a) {
>   %slot = call i32 @llvm.vtable.slot.offset(!"A", i32 0)
>   %vtable = load i8* %a
>   %fp = call i8* @llvm.vtable.load.relative(%vtable, %slot)
>   %casted_fp = bitcast i8* %fp to void (%A*)
>   call void %casted_fp(%a)
> }
>
> define void @gcall(%A* %a) {
>   %slot = call i32 @llvm.vtable.slot.offset(!"A", i32 1)
>   %vtable = load i8* %a
>   %fp = call i8* @llvm.vtable.load.relative(%vtable, %slot)
>   %casted_fp = bitcast i8* %fp to void (%A*)
>   call void %casted_fp(%a)
> }
>
> define {i8*, i8*} @getmfp() {
>   %slot = call i32 @llvm.vtable.slot.offset(!"A", i32 1)
>   %slotp1 = add %slot, 1
>   %result = insertvalue {i8*, i8*} {i8* 0, i8* 0}, 0, %slotp1
>   ret {i8*, i8*} %result
> }
>
> define void @callmfp(%A* %a, {i8*, i8*} %m) {
>   ; assuming the call is virtual and no this adjustment
>   %slot = extractvalue {i8*, i8*} %m, 0
>   %slotm1 = sub %slot, 1
>   %vtable = load i8* %a
>   %fp = call i8* @llvm.vtable.load.relative(%vtable, %slotm1)
>   %casted_fp = bitcast i8* %fp to void (%A*)
>   call void %casted_fp(%a)
> }
>
> !0 = {!"A", @A_vtable, 16}
> !1 = {!"B", @B_vtable, 16}
> !2 = {!"A", @C_vtable0, 16}
> !3 = {!"B", @C_vtable1, 16}
> !4 = {!"C", @C_vtable0, 16}
> !llvm.bitsets = {!0, !1, !2, !3, !4}
>
> !5 = {@A_vtable, 16}
> !6 = {@B_vtable, 16}
> !7 = {@C_vtable0, 16}
> !8 = {@C_vtable1, 16}
> !llvm.vtable.slots = {!5, !6, !7, !8}
>
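The "+ 1" in @getmfp and "- 1" in @callmfp above follow the usual Itanium
convention for pointers to virtual member functions: the first field stores
the slot's byte offset plus one, so a set low bit marks the pointer as
virtual. A minimal sketch of that encoding, with hypothetical names (MemFnPtr,
make_virtual_mfp, is_virtual, slot_offset) and assuming slot offsets are
always even:

```cpp
#include <cstdint>

// Hypothetical two-word member function pointer, mirroring the
// {i8*, i8*} pair used in the IR example.
struct MemFnPtr {
  std::intptr_t ptr;  // function address, or slot byte offset + 1
  std::intptr_t adj;  // this-adjustment (unused in this sketch)
};

// Encode a virtual member function by its vtable slot byte offset.
inline MemFnPtr make_virtual_mfp(std::int32_t slot_byte_offset) {
  return {static_cast<std::intptr_t>(slot_byte_offset) + 1, 0};
}

// Slot offsets are even, so an odd ptr value denotes a virtual function.
inline bool is_virtual(const MemFnPtr &m) { return (m.ptr & 1) != 0; }

// Recover the slot byte offset to pass to the vtable load.
inline std::int32_t slot_offset(const MemFnPtr &m) {
  return static_cast<std::int32_t>(m.ptr - 1);
}
```

For &A::g in the example, the slot offset would be 4 if the vtable keeps the
i32 layout shown in @A_vtable, so the stored value would be 5.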
> Thanks,
> --
> Peter