[llvm-commits] Updated SAFECode Patch

Fri Jul 22 11:07:22 PDT 2011

On Jul 22, 2011, at 8:34 AM, John Criswell wrote:

>> * Why do you have an implementation of a SplayTree?  Why is it better than a hash table or other data structure?
> 
> For the run-time checks, the question we are asking is whether a pointer 
> falls within a valid object.  Therefore, what we need is a container 
> that contains a set of ranges (lower and upper bounds on memory objects) 
> and can be searched (i.e., given a value, does it fall within a range 
> within the container).
> 
> Hash tables don't do this (as far as I know).  So, we are left with tree 
> data structures.  The splay tree has a caching property which is useful 
> for memory safety checks because there's usually a small working set of 
> memory objects that are being checked.  The splay tree moves the ranges 
> for these objects to the top of the tree, and so they're found more quickly.
> 
> In short, splay trees are the best data structure we know of for 
> range lookups.

ADT/IntervalMap.h is a fairly recent addition that does almost exactly this.

It uses a B+-tree instead of a splay tree to get a more compact representation and locality of reference.

It doesn't modify the data structure when searching, so you don't get any LRU effects. Instead, it has iterators that are very fast when moving small distances.
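For illustration, the range query discussed in this thread (given an address, find the object that contains it) can be sketched on top of std::map. This is only a sketch and is not code from SAFECode or IntervalMap; the RangeSet class and its method names are hypothetical, and it assumes the stored ranges are closed and never overlap.

```cpp
#include <cstdint>
#include <map>

// Hypothetical container (not SAFECode's splay tree and not IntervalMap):
// maps each object's start address to its inclusive end address.
// Assumes the stored ranges do not overlap.
class RangeSet {
  std::map<std::uintptr_t, std::uintptr_t> Ranges;

public:
  void insert(std::uintptr_t Start, std::uintptr_t End) {
    Ranges[Start] = End;
  }

  // Returns true and fills Start/End if Addr lies inside a stored range.
  bool find(std::uintptr_t Addr, std::uintptr_t &Start,
            std::uintptr_t &End) const {
    // First range that starts strictly after Addr.
    auto It = Ranges.upper_bound(Addr);
    if (It == Ranges.begin())
      return false; // every stored range starts after Addr
    --It;           // last range that starts at or before Addr
    if (Addr > It->second)
      return false; // Addr falls in the gap after that range
    Start = It->first;
    End = It->second;
    return true;
  }
};
```

Unlike a splay tree, this lookup never restructures the tree, so repeated queries for the same object cost the same O(log n) each time.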

If you are storing (pointer x pointer) -> pointer, you get 12-way inner nodes and 8-way leaf nodes, each node using 3 cache lines. This is a very shallow tree, so when you repeat a small number of searches, most of the busy nodes are going to fit in L1 cache.
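The fan-out numbers can be checked with a little arithmetic. The sketch below assumes 64-byte cache lines and 8-byte pointers, and it assumes a leaf entry holds a start key, a stop key and a value while an inner entry holds a child pointer plus one key. Those layout details are assumptions made for the calculation; the message itself only gives the fan-outs and the node size.

```cpp
#include <cstddef>

// Assumptions, not stated in the message: 64-byte cache lines, 8-byte pointers.
constexpr std::size_t CacheLineBytes = 64;
constexpr std::size_t PointerBytes = 8;

// Each node spans three cache lines.
constexpr std::size_t NodeBytes = 3 * CacheLineBytes; // 192 bytes

// Assumed leaf entry: start key, stop key, mapped value (three pointers).
constexpr std::size_t LeafEntryBytes = 3 * PointerBytes; // 24 bytes

// Assumed inner entry: child pointer plus one key (two pointers).
constexpr std::size_t InnerEntryBytes = 2 * PointerBytes; // 16 bytes

constexpr std::size_t LeafFanout = NodeBytes / LeafEntryBytes;   // 192 / 24
constexpr std::size_t InnerFanout = NodeBytes / InnerEntryBytes; // 192 / 16

static_assert(LeafFanout == 8, "8-way leaf nodes");
static_assert(InnerFanout == 12, "12-way inner nodes");
```

Under these assumptions both node kinds fill exactly 192 bytes, which matches the 8-way and 12-way figures.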

I measured it as about twice as fast as std::map when used by the register allocator.

/jakob