[LLVMdev] GSoC 2012 proposal : Integrate Baggy Bounds Checking into SAFECode

Baozeng sploving1 at gmail.com
Wed Apr 4 06:04:06 PDT 2012


Dear LLVM developers:
    Here is my another proposal of LLVM. Any suggestion would be welcome!

*   **
*

***                Integrate Baggy Bounds Checking into SAFECode***

* *

*Abstract:** * Baggy Bounds Checking (BBC) is an efficient bounds checking
technique that pad and align objects to powers of two and enable allocation
bounds. It uses a contiguous array as bounds table to enable efficient
bounds lookup and thus has low overhead at runtime. This project is aims to
integrate BBC into SAFECode.   **

* *

*Motivation***

BBC is presented in the paper [1], which uses a binary buddy allocator that
pads and aligns objects to powers of two. Thus it can just use only a
single byte stored in a contiguous array to compute the base and the size
of a pointer. It is an efficient defense against out-of-bounds errors and
the authors claim that it is more than two times faster than the fasted
previous techniques. By integrating BBC into SAFECode, users can use
SAFECode in a more efficient way to check spatial errors.



*Project Detail*

  BBC enforces allocation bounds instead of object bounds. It constrains
allocation sizes and alignment to powers of two. For instance,

 x = malloc (200). BBC allocs the size of x to be 256 instead:

 x = malloc (256).

 Previous techniques that recorded a pointer to the start of the object and
its size in bounds table, which requires at least eight bytes. By enabling
allocation bounds, BBC can use only a single byte to encode bounds
information. This single byte is set to be log2(size). We use ‘e’ to stand
for this byte. As for the above instance, e = 8. The base and the size of a
pointer *p* can be computed as the following:

*size = 1 << e;*

*base = p & ~(size -1);*

  BBC uses a contiguous array to implement bounds table. Each entry of this
table stores the above single byte. Also, application memory is divided
into aligned *slots* with *slot_size* bytes. The bound table has an entry
for each slot. When checking whether a pointer is out of bound or not, BBC
can enable optimized bounds checks. For instance,* p* is a pointer, we want
to check whether the following pointer arithmetic is safe or not: q = p +
i. An explicit bonds check is as the following:

First we need compute the size and the base of *p: *

*Size = 1 << table[p>>slot_size]*

*Base= p & ~(size-1)*

Then we check whether *q *satisfies this condition:

*q >= base && q < base + size*

BBC optimizes it as the following:

*(p^q) >> table[p>>slot_size] ==0 *

It means that if *q *is in bound, it should have the same prefix with only
the e least significant bits modified, where e is the binary logarithm of
the allocation size, stored in the p>>slot_size entry of the bonds table.

  Here I give an example to explain how BBC works. (We set slot_size to be
16)

p =malloc (100); // It returns the address 0x12345600

BCD pads and aligns the size to be 128

p = malloc (128); So the upper bound of p is 0x12345680 and the lower bound
of p is 0x12345600. *table [p>>slot_size] = table[p>>16] = log2128 = 7*

There is pointer arithmetic:

q = p + 144 = 0x12345690.

We check: (p^q) >>table[p>>slot_size] = 0x00000090>>7 != 0 So q is out of
bound.



 Currently some initial work of BBC has been done in SAFECode. But there is
still a lot of work to do. The padding and alignment transform works with
LLVM 2.x and needs to be updated to LLVM 3. There are no run time checks of
BBC, including pointer arithmetic and array indexing, in the trunk of
SAFECode now. Also I will extend BBC to check pointer dereferences and
access to scalar fields in structure. So this project is to complete BBC at
runtime in SAFECode.

* *

*Timeline*

The following plan is scheduled as weeks.

[1-2] make padding and alignment transform work with LLVM 3.

[3-5] Add run-time checks, including each pointer arithmetic and array
indexing operation.

[6-9] Extend BBC to include the following checks: access to scalar fields
in structures and pointer dereferences.

[10-11] Test and improvement, make BBC works well with the test-suite.

[12] Document for BBC support in SAFECode.



*Project experience*

* *

In GSoC2009, I took part in a project: support Scilab language on SWIG [2].
I added a backend module in SWIG, so that it can support all the C features
for Scilab language: variables, functions, constants, enums, structs,
unions, pointers and arrays.



In GSoC2010, I also successfully finished a project called“epfs”[3] , which
means embedding Python from Scilab. This project introduces a mechanism to
load and use Python code from Scilab.

* *

I have about one year’s experience for LLVM. I use it mainly to implement
control flow integrity for Operating Systems and thus improve system
security. I recently submitted a patch for *Target.h* file to improve
compatibility with SWIG, which has been applied on the trunk.



* *

*Biography*

Name:* *Baozeng Ding

University:* *Institute of Software, Chinese Academy of Science

Email: sploving1 at gmail.com

IRC name: sploving



*References***

[1].    P. Akritidis,M. Costa,M. Castro, and S. Hand. Baggy Bounds
Checking: An Efficient and Backwards-Compatible Defenseagainst
Out-of-Bounds Errors. In Proceedings of the 18th USENIX Security Symposium,
August 2009.

[2].
http://code.google.com/p/google-summer-of-code-2009-swig/downloads/list

[3].   http://forge.scilab.org/index.php/p/epfs/

* *


-- 
     Best Regards,
                                                                 Baozeng
Ding

OSTG,NFS,ISCAS
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20120404/f2dfaaa1/attachment.html>


More information about the llvm-dev mailing list