[LLVMdev] Forward: Discussion about custom memory allocators for STL

Chris Lattner sabre at nondot.org
Sat May 17 21:38:34 PDT 2008


On May 17, 2008, at 9:48 AM, Roman Levenstein wrote:
> There is a discussion thread on llvm-commits list about a  
> possibility of using custom memory allocators for STL to improve the  
> performance and reduce the memory pressure of STL containers, e.g.  
> std::set.
> I thought that this discussion may be interesting for a wider  
> audience and therefore I re-post the messages on llvm-dev as well.
>
> It would be interesting to hear what others think about
> - custom allocators,

I think custom allocators can be a great thing, as long as they are  
specific to instances of containers, not a general replacement for  
malloc/free.  That way they can be used on demand when profiling shows  
they are important.

Ideally, I would be able to do something like:

   MyRegion R;
   std::set<X> S(R);
   std::map<Y, Z> M(R);

etc., and have the nodes for both containers share the same pool.  We
already have some things like this that we use in clang; check out
llvm/Support/Allocator.h.  It is not the standard STL allocator
interface though, so we need a simple adaptor.
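
Roughly, such an adaptor could look like the sketch below.  Region and
RegionAllocator are made-up names for illustration only (this is not the
interface in llvm/Support/Allocator.h): a bump-pointer region plus a
minimal C++11-style allocator wrapper, so that a std::set and a std::map
carve their nodes out of the same pool and everything is freed at once
when the region dies.

   #include <cstddef>
   #include <cstdint>
   #include <functional>
   #include <map>
   #include <set>
   #include <utility>
   #include <vector>

   // Hypothetical bump-pointer region: hands out memory from large slabs
   // and releases everything at once when the region is destroyed.
   class Region {
     std::vector<char *> Slabs;
     char *Cur = nullptr, *End = nullptr;
     static constexpr std::size_t SlabSize = 4096;

   public:
     void *Allocate(std::size_t Size, std::size_t Align) {
       // Round the current pointer up to the requested alignment.
       char *P =
           (char *)(((uintptr_t)Cur + Align - 1) & ~(uintptr_t)(Align - 1));
       if (Cur == nullptr || P + Size > End) {
         // Start a new slab when the current one is exhausted.
         std::size_t N = Size > SlabSize ? Size : SlabSize;
         P = Cur = new char[N];
         End = Cur + N;
         Slabs.push_back(Cur);
       }
       Cur = P + Size;
       return P;
     }
     ~Region() {
       for (char *S : Slabs)
         delete[] S;
     }
   };

   // Minimal STL allocator adaptor over the region.  deallocate() is a
   // no-op; memory is reclaimed wholesale when the Region goes away.
   template <typename T> class RegionAllocator {
     template <typename U> friend class RegionAllocator;
     Region *R;

   public:
     using value_type = T;
     explicit RegionAllocator(Region &Reg) : R(&Reg) {}
     template <typename U>
     RegionAllocator(const RegionAllocator<U> &Other) : R(Other.R) {}

     T *allocate(std::size_t N) {
       return static_cast<T *>(R->Allocate(N * sizeof(T), alignof(T)));
     }
     void deallocate(T *, std::size_t) {}

     template <typename U>
     bool operator==(const RegionAllocator<U> &O) const { return R == O.R; }
     template <typename U>
     bool operator!=(const RegionAllocator<U> &O) const { return R != O.R; }
   };

   int main() {
     Region R;
     RegionAllocator<int> SA(R);
     RegionAllocator<std::pair<const int, int>> MA(R);

     // Both containers draw their red/black tree nodes from the same pool.
     std::set<int, std::less<int>, RegionAllocator<int>> S(SA);
     std::map<int, int, std::less<int>,
              RegionAllocator<std::pair<const int, int>>> M(MA);

     for (int I = 0; I != 100; ++I) {
       S.insert(I);
       M[I] = I * I;
     }
   }  // everything allocated from R is released here, in one shot

The point is just that both containers funnel every node allocation into
the same region, which keeps the nodes packed together and makes teardown
trivial.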

> - reasons for the current inefficiency of many STL containers on
> Evan's/Chris's configs, and

The reason is pretty obvious when you know what to look for.  On the
Mac, std::allocator<T> just calls malloc/free.  On Linux and
apparently everywhere else, std::allocator<T> is implemented in terms
of a strange pool allocator.  [IMO, the Mac way of doing things is the
right way to go, but we can discuss philosophy elsewhere.  Here we
will focus on the ramifications of this reality.]

The end result of this is that std::set/map on the Mac ends up calling
malloc/free for each node in the red/black tree, and it allocates a
metric butt load of nodes (one for each item inserted).  Further, each
query of the set incurs a traversal of the tree.  In the case of the
Mac, because malloc/free is used, the allocations are spread all
throughout the heap, and traversing the tree touches all sorts of random
pages of memory in very cache-unfriendly ways.  I think the performance
delta of Mac vs. Linux is primarily due to this locality issue, not the
raw performance of malloc on the two systems.
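
To make the one-allocation-per-insertion point concrete, here is a tiny
sketch with a hypothetical CountingAllocator (not anything in LLVM) that
tallies how many times a std::set hits the allocator; when the allocator
maps straight onto malloc, each of those hits lands wherever the heap
happens to have room:

   #include <cstddef>
   #include <cstdio>
   #include <functional>
   #include <memory>
   #include <set>

   // Hypothetical counting allocator: forwards to std::allocator but
   // tallies how many separate allocations the container performs.
   static std::size_t NumAllocs = 0;

   template <typename T> struct CountingAllocator {
     using value_type = T;
     CountingAllocator() = default;
     template <typename U> CountingAllocator(const CountingAllocator<U> &) {}

     T *allocate(std::size_t N) {
       ++NumAllocs;  // one hit per red/black tree node
       return std::allocator<T>().allocate(N);
     }
     void deallocate(T *P, std::size_t N) {
       std::allocator<T>().deallocate(P, N);
     }
     template <typename U>
     bool operator==(const CountingAllocator<U> &) const { return true; }
     template <typename U>
     bool operator!=(const CountingAllocator<U> &) const { return false; }
   };

   int main() {
     std::set<int, std::less<int>, CountingAllocator<int>> S;
     for (int I = 0; I != 1000; ++I)
       S.insert(I);
     // Prints 1000: every insertion allocated its own node, and with a
     // malloc-backed allocator those nodes end up scattered across the heap.
     std::printf("allocations: %zu\n", NumAllocs);
   }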

On Linux (for example), the custom implementation of std::set happens
to get lucky, and generally gives better locality than mapping
directly to malloc/free, just because only a very small subset of the
objects on the heap go through std::allocator (e.g. none of the
SDNodes or LLVM IR objects go through std::allocator).  This implicit
partitioning of the heap leads to accidentally improved performance, but
the performance is still pretty bad if you do have multiple node-based
containers live at once (though that is rare in LLVM).  I did a whole
PhD thesis about partitioning memory objects, so I know that
partitioning data structures into their own regions can be a big
win :).  It just happens that std::allocator on Linux gets lucky on
this for LLVM, since we generally have few STL containers live at once
(and when we beat on them, we beat on them heavily).

> - possibility of using 3rd party libs like Boost (or their parts) in  
> LLVM

We have already imported pieces of Boost.  The general rule is that it
is OK to use specific pieces, but they have to be converted to the
LLVM style and way of doing things.  Also, things with huge chains of
dependencies sometimes have to be simplified.

-Chris



