[LLVMdev] IR extension proposal: bitset constants

Peter Collingbourne peter at pcc.me.uk
Tue Jan 27 13:07:54 PST 2015


Hi all,

I would like to propose a mechanism that allows IR modules to co-operatively
build a pointer set corresponding to addresses within a given set of
globals. The specific use case I have in mind is to provide a mechanism
for a C++ program to efficiently verify (at each call site) that a vtable
pointer is in the set of valid vtable pointers for the class or its derived
classes. One way of doing this is for a toolchain component to build, for each
class, a bit set that maps to the memory region allocated for the vtables,
such that each 1 bit in the bit set maps to a valid vtable for that class,
and lay out the vtables next to each other, to minimize the total size of
the bit sets. Call sites would perform a range check followed by a load of
the appropriate bit from the bit set.

To give a concrete example, suppose we have the following classes:

struct A { virtual void f(); };
struct B : A { virtual void f(), g(); };
struct C : A { virtual void f(), h(); };

with the following vtables:

_ZTV1A = { &A::rtti, &A::f };
_ZTV1B = { &B::rtti, &B::f, &B::g };
_ZTV1C = { &C::rtti, &C::f, &C::h };

The set of valid vtables for static class A is {&_ZTV1A[1], &_ZTV1B[1],
&_ZTV1C[1]}, for B is {&_ZTV1B[1]} and for C is {&_ZTV1C[1]}. The toolchain
would lay out _ZTV1A, _ZTV1B and _ZTV1C consecutively, and generate bit sets
like this:

A = {0,1,0,1,0,0,1,0}
B = {0,0,0,1,0,0,0,0}
C = {0,0,0,0,0,0,1,0}

A call site (say the static type of the call is A) will check whether the
object's vtable falls in the range [_ZTV1A, _ZTV1C], and is sizeof(void
*)-aligned. It would then load the (vtable-_ZTV1A)/sizeof(void *)'th entry
in A's bitset, which will be 1 if the vtable is valid for A.

We are now faced with a number of implementation questions: how do we represent
these (potentially partial) bit sets in LLVM, and how do we arrange to lay
out the globals and build the complete bit sets? Remember that classes B
and C may be in different translation units, so we do not necessarily have
a complete picture of A's bit set at A's compile time.

What I propose is that the parts of the bit sets be represented as arrays
of pointers, which will be resolved through some mechanism (either at
LTO time, or at regular (non-LTO) link time) to bit sets corresponding to
those addresses. This mechanism would also be responsible for laying out
the referenced globals (i.e. the vtables) as close together as possible. We
introduce a new flag on globals, the so-called "bitset" flag, which causes
the global to be transformed using this mechanism. Such a global cannot be
accessed in the normal way. It may only be accessed using an intrinsic that
tests whether a given pointer is in the set.

To return to our concrete example, a translation unit defining the vtable
for A may also define the following global:

@_ZBS1A = appending bitset constant [1 x i8*] [gep(@_ZTV1A, 0, 1)]

A translation unit defining B would define:

@_ZBS1A = appending bitset constant [1 x i8*] [gep(@_ZTV1B, 0, 1)]
@_ZBS1B = appending bitset constant [1 x i8*] [gep(@_ZTV1B, 0, 1)]

And C:

@_ZBS1A = appending bitset constant [1 x i8*] [gep(@_ZTV1C, 0, 1)]
@_ZBS1C = appending bitset constant [1 x i8*] [gep(@_ZTV1C, 0, 1)]

A later mechanism would combine the bitset globals, lay out the referenced
globals and build the bit sets. In the LTO case, the IR linker can already
do the combination step if the globals are given appending linkage. A
later compiler pass (which we will call the "bitset lowering pass") would lay out
the globals (by combining them into a single large global, and RAUWing the
original globals with aliases that gep into the large global) and build the
bit sets. In the non-LTO case, we could also invent an object-file-specific
mechanism for the backend to tell the linker to combine and build the bit sets,
for example by placing the bitset globals in a specifically named section,
and place the referenced globals in individual sections so that the linker
can rearrange them.

As previously mentioned, an intrinsic would be used to test entries in the
bit set at call sites. For example, the relevant IR for the following C++ code:

A *some_object = ...;
some_object->f();

might look like this:

%vtable = load i8** %some_object
%p = call i1 @llvm.bitset.test(i8* %vtable, i8* @_ZBS1A) readnone
br i1 %p, label %continue, label %trap

In the LTO case, such intrinsic calls would be lowered by the bitset lowering pass
into the appropriate range/bit set check code, which might look like this:
(pseudocode for 64-bit platform)

if %vtable & 7 != 0 || %vtable < global_start_addr(_ZBS1A) || %vtable > global_end_addr(_ZBS1A) {
  %p = 0
} else {
  %p = bitset(_ZBS1A)[(%vtable - global_start_addr(_ZBS1A)) >> 3]
}

In the non-LTO case, we would generate similar code, but with appropriate
relocations so that the required constants (global start/end address, bitset
address, mask, shift amount) would receive appropriate linker-determined
values.

The first step I propose to make in this direction is a patch that adds
support for the bitset flag on globals, and introduces the bitset lowering pass. A
later step would involve building backend support, and linker support in lld.

Comments appreciated.

Thanks,
-- 
Peter



More information about the llvm-dev mailing list