[LLVMdev] IR extension proposal: bitset constants

Sean Silva chisophugis at gmail.com
Tue Feb 3 16:03:45 PST 2015


One other thing: if this can be used for control-flow integrity, I assume
it has to have good knowledge of the available indirect call targets. Could
we also use this information for devirtualization?

-- Sean Silva

On Tue, Jan 27, 2015 at 1:07 PM, Peter Collingbourne <peter at pcc.me.uk>
wrote:

> Hi all,
>
> I would like to propose a mechanism that allows IR modules to
> co-operatively
> build a pointer set corresponding to addresses within a given set of
> globals. The specific use case I have in mind is to provide a mechanism
> for a C++ program to efficiently verify (at each call site) that a vtable
> pointer is in the set of valid vtable pointers for the class or its derived
> classes. One way of doing this is for a toolchain component to build, for
> each
> class, a bit set that maps to the memory region allocated for the vtables,
> such that each 1 bit in the bit set maps to a valid vtable for that class,
> and lay out the vtables next to each other, to minimize the total size of
> the bit sets. Call sites would perform a range check followed by a load of
> the appropriate bit from the bit set.
>
> To give a concrete example, suppose we have the following classes:
>
> struct A { virtual void f(); };
> struct B : A { virtual void f(), g(); };
> struct C : A { virtual void f(), h(); };
>
> with the following vtables:
>
> _ZTV1A = { &A::rtti, &A::f };
> _ZTV1B = { &B::rtti, &B::f, &B::g };
> _ZTV1C = { &C::rtti, &C::f, &C::h };
>
> The set of valid vtables for static class A is {&_ZTV1A[1], &_ZTV1B[1],
> &_ZTV1C[1]}, for B is {&_ZTV1B[1]} and for C is {&_ZTV1C[1]}. The toolchain
> would lay out _ZTV1A, _ZTV1B and _ZTV1C consecutively, and generate bit
> sets
> like this:
>
> A = {0,1,0,1,0,0,1,0}
> B = {0,0,0,1,0,0,0,0}
> C = {0,0,0,0,0,0,1,0}
>
> A call site (say the static type of the call is A) will check whether the
> object's vtable falls in the range [_ZTV1A, _ZTV1C], and is sizeof(void
> *)-aligned. It would then load the (vtable-_ZTV1A)/sizeof(void *)'th entry
> in A's bitset, which will be 1 if the vtable is valid for A.
>
> We are now faced with a number of implementation questions: how do we
> represent
> these (potentially partial) bit sets in LLVM, and how do we arrange to lay
> out the globals and build the complete bit sets? Remember that classes B
> and C may be in different translation units, so we do not necessarily have
> a complete picture of A's bit set at A's compile time.
>
> What I propose is that the parts of the bit sets be represented as arrays
> of pointers, which will be resolved through some mechanism (either at
> LTO time, or at regular (non-LTO) link time) to bit sets corresponding to
> those addresses. This mechanism would also be responsible for laying out
> the referenced globals (i.e. the vtables) as close together as possible. We
> introduce a new flag on globals, the so-called "bitset" flag, which causes
> the global to be transformed using this mechanism. Such a global cannot be
> accessed in the normal way. It may only be accessed using an intrinsic that
> tests whether a given pointer is in the set.
>
> To return to our concrete example, a translation unit defining the vtable
> for A may also define the following global:
>
> @_ZBS1A = appending bitset constant [1 x i8*] [gep(@_ZTV1A, 0, 1)]
>
> A translation unit defining B would define:
>
> @_ZBS1A = appending bitset constant [1 x i8*] [gep(@_ZTV1B, 0, 1)]
> @_ZBS1B = appending bitset constant [1 x i8*] [gep(@_ZTV1B, 0, 1)]
>
> And C:
>
> @_ZBS1A = appending bitset constant [1 x i8*] [gep(@_ZTV1C, 0, 1)]
> @_ZBS1C = appending bitset constant [1 x i8*] [gep(@_ZTV1C, 0, 1)]
>
> A later mechanism would combine the bitset globals, lay out the referenced
> globals and build the bit sets. In the LTO case, the IR linker can already
> do the combination step if the globals are given appending linkage. A
> later compiler pass (which we will call the "bitset lowering pass") would
> lay out
> the globals (by combining them into a single large global, and RAUWing the
> original globals with aliases that gep into the large global) and build the
> bit sets. In the non-LTO case, we could also invent an object-file-specific
> mechanism for the backend to tell the linker to combine and build the bit
> sets,
> for example by placing the bitset globals in a specifically named section,
> and place the referenced globals in individual sections so that the linker
> can rearrange them.
>
> As previously mentioned, an intrinsic would be used to test entries in the
> bit set at call sites. For example, the relevant IR for the following C++
> code:
>
> A *some_object = ...;
> some_object->f();
>
> might look like this:
>
> %vtable = load i8** %some_object
> %p = call i1 @llvm.bitset.test(i8* %vtable, i8* @_ZBS1A) readnone
> br i1 %p, label %continue, label %trap
>
> In the LTO case, such intrinsic calls would be lowered by the bitset
> lowering pass
> into the appropriate range/bit set check code, which might look like this:
> (pseudocode for 64-bit platform)
>
> if %vtable & 7 != 0 || %vtable < global_start_addr(_ZBS1A) || %vtable >
> global_end_addr(_ZBS1A) {
>   %p = 0
> } else {
>   %p = bitset(_ZBS1A)[(%vtable - global_start_addr(_ZBS1A)) >> 3]
> }
>
> In the non-LTO case, we would generate similar code, but with appropriate
> relocations so that the required constants (global start/end address,
> bitset
> address, mask, shift amount) would receive appropriate linker-determined
> values.
>
> The first step I propose to make in this direction is a patch that adds
> support for the bitset flag on globals, and introduces the bitset lowering
> pass. A
> later step would involve building backend support, and linker support in
> lld.
>
> Comments appreciated.
>
> Thanks,
> --
> Peter
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150203/8bd51a04/attachment.html>


More information about the llvm-dev mailing list