<div dir="ltr">Relative offsets are useful for more than just vtables. They would do wonders for constant strings too.<div><br></div><div>-- Sean Silva</div></div><div class="gmail_extra"><br><div class="gmail_quote">On Mon, Feb 29, 2016 at 1:53 PM, Peter Collingbourne via cfe-dev <span dir="ltr">&lt;<a href="mailto:cfe-dev@lists.llvm.org" target="_blank">cfe-dev@lists.llvm.org</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Hi all,<br>

<br>

I'd like to propose implementing the new vtable ABI described in<br>

PR26723, which I'll call the relative ABI. That bug gives more details and<br>

justification for the ABI.<br>

<br>

The user interface for the new ABI would be that -fwhole-program-vtables<br>

would take an optional value indicating which aspects of the program have<br>

whole-program scope. For example, the existing implementation of whole-program<br>

vcall optimization allows external code to call into translation units<br>

compiled with -fwhole-program-vtables, but does not allow external code to<br>

derive from classes defined in such translation units, so you could request<br>

the current behaviour with "-fwhole-program-vtables=derive", which means<br>

that derived classes are not allowed from outside the program. To request<br>

the new ABI, you can specify "-fwhole-program-vtables=call,derive",<br>

which means that neither calls nor derived classes are allowed from<br>

outside the program. A bare "-fwhole-program-vtables" would be short for<br>

"-fwhole-program-vtables=call,derive" plus any aspects we add in future.<br>
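A minimal C++ sketch of how a driver might interpret the proposed option value. The names WholeProgramScope, parseWholeProgramVtables and canUseRelativeABI are hypothetical, not Clang's; the sketch assumes the bare flag enables every aspect and that an unknown aspect is an error.<br>

```cpp
#include <cassert>
#include <optional>
#include <sstream>
#include <string>

// Hypothetical model of the flag's value; not Clang's real data structure.
struct WholeProgramScope {
  bool call = false;    // external code may not call into the program
  bool derive = false;  // external code may not derive from its classes
};

// Parses the text after '=' in -fwhole-program-vtables=... A missing value
// (the bare flag) enables every aspect; an unknown aspect is rejected.
std::optional<WholeProgramScope> parseWholeProgramVtables(
    const std::optional<std::string> &value) {
  WholeProgramScope scope;
  if (!value) {
    scope.call = true;
    scope.derive = true;
    return scope;
  }
  std::istringstream in(*value);
  std::string aspect;
  while (std::getline(in, aspect, ',')) {
    if (aspect == "call")
      scope.call = true;
    else if (aspect == "derive")
      scope.derive = true;
    else
      return std::nullopt;  // unknown aspect name
  }
  return scope;
}

// The relative ABI needs both guarantees: no calls into the program and no
// derived classes from outside it.
bool canUseRelativeABI(const WholeProgramScope &scope) {
  return scope.call && scope.derive;
}
```

For example, "derive" alone reproduces the current behaviour and does not permit the relative ABI, while "call,derive" does.<br>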

<br>

I'll also make the observation that the new ABI does not require LTO or<br>

whole-program visibility at compile time; to decide whether to use the new<br>

ABI for a class, we just need to check that it and its bases are not in the<br>

whole-program-vtables blacklist.<br>
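The per-class decision needs only the class hierarchy and the blacklist, which is why no LTO is required. A sketch, with a hypothetical ClassInfo type standing in for Clang's real class representation:<br>

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical, minimal model of a class and its direct bases.
struct ClassInfo {
  std::string name;
  std::vector<const ClassInfo *> bases;
};

// A class may use the relative ABI only if neither it nor any of its
// transitive bases is named in the whole-program-vtables blacklist.
bool usesRelativeABI(const ClassInfo &cls,
                     const std::set<std::string> &blacklist) {
  if (blacklist.count(cls.name))
    return false;
  for (const ClassInfo *base : cls.bases)
    if (!usesRelativeABI(*base, blacklist))
      return false;
  return true;
}
```

With the classes from the example below, blacklisting B disqualifies C (which derives from B) but not A.<br>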

<br>

At the same time, I'd like to change how virtual calls are represented in<br>

the IR. This is for a few reasons:<br>

<br>

1) It would allow whole-program virtual call optimization to work well with<br>

   the relative ABI, which would otherwise complicate the IR at call sites and<br>

   make matching and rewriting harder.<br>

<br>

2) It would simplify the whole-program virtual call optimization pass. Currently<br>

   we need to walk uses in the IR to determine the slot and callees for<br>

   each call site; a simpler representation avoids all of this.<br>

<br>

3) It would make it easier to implement dead virtual function stripping, which<br>

   involves reshaping vtable initializers and rewriting call sites.<br>

   Implementing this correctly is harder than it needs to be with the<br>

   current representation.<br>

<br>

My proposal is to add the following new intrinsics:<br>

<br>

i32 @llvm.vtable.slot.offset(metadata, i32)<br>

<br>

This intrinsic takes a bitset name B and a slot index I. It returns the byte<br>

offset, relative to the vtable address point, of the I-th virtual function pointer in each of the vtables in B.<br>
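Assuming a 64-bit target, where a slot in the existing ABI holds an 8-byte function pointer and a slot in the relative ABI holds a 4-byte offset (as in the i32 fields of the example below), the value the intrinsic would fold to can be sketched as follows. The names are hypothetical, and the sketch assumes every vtable in B uses the same layout.<br>

```cpp
#include <cassert>
#include <cstdint>

enum class VTableABI { Absolute, Relative };

// Assumed slot sizes on a 64-bit target.
constexpr int32_t slotSize(VTableABI abi) {
  return abi == VTableABI::Absolute ? 8 : 4;
}

// Byte offset of slot `index`, measured from the vtable address point
// (the offset recorded in the bitset, 16 in the example).
constexpr int32_t slotOffset(VTableABI abi, int32_t index) {
  assert(index >= 0 && "slot indices are non-negative");
  return index * slotSize(abi);
}
```

Under the relative ABI, @A::g (slot 1) sits 4 bytes past the address point of @A_vtable, matching the example's layout.<br>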

<br>

i8* @llvm.vtable.load(i8*, i32)<br>

<br>

This intrinsic takes a virtual table pointer and a byte offset, and loads<br>

a virtual function pointer from the virtual table at the given offset.<br>
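A C++ model of the unchecked load, for illustration only (vtableLoad and VirtualFn are hypothetical names): it reads a function pointer stored at a byte offset from the vtable address point.<br>

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

using VirtualFn = void (*)();

// Reads the function pointer stored `offset` bytes past the vtable address
// point. memcpy sidesteps alignment and aliasing concerns.
VirtualFn vtableLoad(const void *vtable, int32_t offset) {
  assert(vtable != nullptr);
  VirtualFn fp;
  std::memcpy(&fp, static_cast<const char *>(vtable) + offset, sizeof fp);
  return fp;
}
```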

<br>

i8* @llvm.vtable.load.relative(i8*, i32)<br>

<br>

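This intrinsic is the same as above, but it uses the relative ABI: it loads<br>

a 32-bit offset from the virtual table at the given byte offset and adds it<br>

to the virtual table pointer.<br>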
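Judging from the initializers in the example below (e.g. @A::f - (@A_vtable + 16)), each relative slot holds a signed 32-bit offset from the vtable address point to the target, so the load adds the stored offset back to the vtable pointer. A hypothetical C++ model (vtableLoadRelative is not a real API); it returns a plain address, since portable C++ cannot form a function pointer this way:<br>

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// target = address point + *(int32_t *)(address point + offset)
const void *vtableLoadRelative(const void *vtable, int32_t offset) {
  assert(vtable != nullptr);
  const char *addressPoint = static_cast<const char *>(vtable);
  int32_t rel;
  std::memcpy(&rel, addressPoint + offset, sizeof rel);
  return addressPoint + rel;  // rel may be negative
}
```

Because each slot is 4 bytes instead of 8, and the stored values are position-independent, such vtables need no dynamic relocations.<br>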

<br>

{i8*, i1} @llvm.vtable.checked.load(metadata %name, i8*, i32)<br>

{i8*, i1} @llvm.vtable.checked.load.relative(metadata %name, i8*, i32)<br>

<br>

These intrinsics would be used to implement CFI. They are similar to the<br>

unchecked intrinsics, but if the second element of the result is non-zero,<br>

the program may call the first element of the result as a function pointer<br>

without causing an indirect function call to any function other than one<br>

potentially loaded from one of the constant globals of which %name is a member.<br>
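A simplified C++ model of the checked variant. Here the bitset is a hypothetical list of valid address points (BitSet and vtableCheckedLoad are illustrative names). Unlike the intrinsic, which only promises that the pointer is safe to call when the flag is set, this sketch skips the load entirely when the check fails.<br>

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <cstring>
#include <utility>
#include <vector>

using VirtualFn = void (*)();

// Hypothetical stand-in for the bitset %name: its valid address points.
struct BitSet {
  std::vector<const void *> addressPoints;
  bool contains(const void *p) const {
    return std::find(addressPoints.begin(), addressPoints.end(), p) !=
           addressPoints.end();
  }
};

// Returns {function pointer, is-member}. Only call .first if .second holds.
std::pair<VirtualFn, bool> vtableCheckedLoad(const BitSet &name,
                                             const void *vtable,
                                             int32_t offset) {
  if (!name.contains(vtable))
    return {nullptr, false};
  VirtualFn fp;
  std::memcpy(&fp, static_cast<const char *>(vtable) + offset, sizeof fp);
  return {fp, true};
}
```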

<br>

To minimize the impact on existing passes, the intrinsics would be lowered<br>

early during the regular pipeline when LTO is disabled, or early in the LTO<br>

pipeline when LTO is enabled. Clang would not use the llvm.vtable.slot.offset<br>

intrinsic when LTO is disabled, as bitset information would be unavailable.<br>

<br>

To give the optimizer permission to reshape vtable initializers for a<br>

particular class, the vtable would be added to a special named metadata node,<br>

'llvm.vtable.slots'. The presence of this metadata would guarantee<br>

that all loads beyond a given byte offset (a range that would not include the<br>

RTTI pointer, for example) are done using the above intrinsics.<br>

<br>

We will also take advantage of the ABI break to split the class's virtual<br>

table group at virtual table boundaries into separate globals instead of<br>

emitting all virtual tables in the group into a single global. This will<br>

not only simplify the implementation of dead virtual function stripping,<br>

but also reduce code size overhead for CFI. (CFI works best if the vtables for<br>

a base class can be laid out near the vtables for its derived classes; the current<br>

ABI makes this harder to achieve.)<br>
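Why layout matters: when the members of a bitset lie in one contiguous, evenly aligned region, a CFI membership test reduces to an offset computation, an alignment check, a bounds check and a bit test, and the bit vector shrinks as members sit closer together. A simplified sketch with hypothetical names; LLVM's actual bitset lowering is more elaborate (for example, it folds the alignment and bounds checks into a rotate and a single compare).<br>

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Assumes alignLog2 is smaller than the bit width of uintptr_t.
struct ContiguousBitSet {
  uintptr_t base;             // address of the first member address point
  unsigned alignLog2;         // members are 2^alignLog2 apart from base
  std::vector<bool> members;  // bit i set: base + (i << alignLog2) is valid

  bool contains(uintptr_t p) const {
    uintptr_t diff = p - base;  // wraps to a huge value if p is below base
    uintptr_t mask = (uintptr_t(1) << alignLog2) - 1;
    if (diff & mask)
      return false;  // misaligned pointer
    uintptr_t index = diff >> alignLog2;
    return index < members.size() && members[index];
  }
};
```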

<br>

Example (using the relative ABI):<br>

<br>

struct A {<br>

  virtual void f();<br>

  virtual void g();<br>

};<br>

<br>

struct B {<br>

  virtual void h();<br>

};<br>

<br>

struct C : A, B {<br>

  virtual void f();<br>

  virtual void g();<br>

  virtual void h();<br>

};<br>

<br>

void fcall(A *a) {<br>

  a->f();<br>

}<br>

<br>

void gcall(A *a) {<br>

  a->g();<br>

}<br>

<br>

typedef void (A::*mfp)();<br>

<br>

mfp getmfp() {<br>

  return &A::g;<br>

}<br>

<br>

void callmfp(A *a, mfp m) {<br>

  (a->*m)();<br>

}<br>

<br>

In IR:<br>

<br>

@A_vtable = {i8*, i8*, i32, i32} {0, @A::rtti, @A::f - (@A_vtable + 16), @A::g - (@A_vtable + 16)}<br>

@B_vtable = {i8*, i8*, i32} {0, @B::rtti, @B::h - (@B_vtable + 16)}<br>

@C_vtable0 = {i8*, i8*, i32, i32, i32} {0, @C::rtti, @C::f - (@C_vtable0 + 16), @C::g - (@C_vtable0 + 16), @C::h - (@C_vtable0 + 16)}<br>

@C_vtable1 = {i8*, i8*, i32} {-8, @C::rtti, @C::h - (@C_vtable1 + 16)}<br>

<br>

define void @fcall(%A* %a) {<br>

  %slot = call i32 @llvm.vtable.slot.offset(metadata !"A", i32 0)<br>

  %vtableptr = bitcast %A* %a to i8**<br>

  %vtable = load i8*, i8** %vtableptr<br>

  %fp = call i8* @llvm.vtable.load.relative(i8* %vtable, i32 %slot)<br>

  %casted_fp = bitcast i8* %fp to void (%A*)*<br>

  call void %casted_fp(%A* %a)<br>

  ret void<br>

}<br>

<br>

define void @gcall(%A* %a) {<br>

  %slot = call i32 @llvm.vtable.slot.offset(metadata !"A", i32 1)<br>

  %vtableptr = bitcast %A* %a to i8**<br>

  %vtable = load i8*, i8** %vtableptr<br>

  %fp = call i8* @llvm.vtable.load.relative(i8* %vtable, i32 %slot)<br>

  %casted_fp = bitcast i8* %fp to void (%A*)*<br>

  call void %casted_fp(%A* %a)<br>

  ret void<br>

}<br>

<br>

define {i64, i64} @getmfp() {<br>

  ; a virtual member function pointer stores 1 + the slot's byte offset<br>

  %slot = call i32 @llvm.vtable.slot.offset(metadata !"A", i32 1)<br>

  %slot64 = sext i32 %slot to i64<br>

  %slotp1 = add i64 %slot64, 1<br>

  %result = insertvalue {i64, i64} zeroinitializer, i64 %slotp1, 0<br>

  ret {i64, i64} %result<br>

}<br>

<br>

define void @callmfp(%A* %a, {i64, i64} %m) {<br>

  ; assuming the call is virtual and needs no this adjustment<br>

  %slot = extractvalue {i64, i64} %m, 0<br>

  %slotm1 = sub i64 %slot, 1<br>

  %offset = trunc i64 %slotm1 to i32<br>

  %vtableptr = bitcast %A* %a to i8**<br>

  %vtable = load i8*, i8** %vtableptr<br>

  %fp = call i8* @llvm.vtable.load.relative(i8* %vtable, i32 %offset)<br>

  %casted_fp = bitcast i8* %fp to void (%A*)*<br>

  call void %casted_fp(%A* %a)<br>

  ret void<br>

}<br>

<br>

!0 = {!"A", @A_vtable, 16}<br>

!1 = {!"B", @B_vtable, 16}<br>

!2 = {!"A", @C_vtable0, 16}<br>

!3 = {!"B", @C_vtable1, 16}<br>

!4 = {!"C", @C_vtable0, 16}<br>

!llvm.bitsets = {!0, !1, !2, !3, !4}<br>

<br>

!5 = {@A_vtable, 16}<br>

!6 = {@B_vtable, 16}<br>

!7 = {@C_vtable0, 16}<br>

!8 = {@C_vtable1, 16}<br>

!llvm.vtable.slots = {!5, !6, !7, !8}<br>
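The getmfp/callmfp pair above relies on the Itanium C++ ABI encoding of pointers to virtual member functions: the ptr field holds 1 plus the slot's byte offset from the address point, so callmfp subtracts 1 before loading. A minimal C++ sketch of that encoding, assuming the generic Itanium variant (the ARM variant keeps the virtual flag in adj instead); MemberFnPtr and the helper names are hypothetical:<br>

```cpp
#include <cassert>
#include <cstdint>

// Itanium member function pointer: {ptr, adj}.
struct MemberFnPtr {
  int64_t ptr;  // 1 + slot byte offset for a virtual function
  int64_t adj;  // this-adjustment; zero in the example
};

constexpr MemberFnPtr makeVirtualMfp(int32_t slotOffset) {
  return {int64_t(slotOffset) + 1, 0};
}

// The low bit distinguishes a virtual slot from an aligned function address.
constexpr bool isVirtual(MemberFnPtr m) { return (m.ptr & 1) != 0; }

constexpr int32_t slotOffsetOf(MemberFnPtr m) { return int32_t(m.ptr - 1); }
```

With the relative ABI, slot.offset(!"A", 1) is 4, so &A::g would be encoded with ptr = 5.<br>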

<br>

Thanks,<br>

<span class="HOEnZb"><font color="#888888">--<br>

Peter<br>

_______________________________________________<br>

cfe-dev mailing list<br>

<a href="mailto:cfe-dev@lists.llvm.org">cfe-dev@lists.llvm.org</a><br>

<a href="http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev" rel="noreferrer" target="_blank">http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev</a><br>

</font></span></blockquote></div><br></div>