[LLVMdev] Forward: Discussion about custom memory allocators for STL

Roman Levenstein romixlev at yahoo.com
Sun May 18 01:36:19 PDT 2008


Hi Chris,

Thanks a lot for a detailed opinion and explanation!
It really answers the original question, without going too far
into political discussions about the pros and cons of Boost and
generic allocators.

----- Original Message ----
> From: Chris Lattner <sabre at nondot.org>
> To: LLVM Developers Mailing List <llvmdev at cs.uiuc.edu>
> Sent: Sunday, May 18, 2008, 08:38:34
> Subject: Re: [LLVMdev] Forward: Discussion about custom memory allocators for STL
> 
> On May 17, 2008, at 9:48 AM, Roman Levenstein wrote:
> > There is a discussion thread on llvm-commits list about a  
> > possibility of using custom memory allocators for STL to improve the  
> > performance and reduce the memory pressure of STL containers, e.g.  
> > std::set.
> > I thought that this discussion may be interesting for a wider  
> > audience and therefore I re-post the messages on llvm-dev as well.
> >
> > It would be interesting to hear what others think about
> > - custom allocators,
> 
> I think custom allocators can be a great thing, as long as they are  
> specific to instances of containers, not a general replacement for  
> malloc/free.  That way they can be used on demand when profiling shows  
> they are important.

Absolutely!

> Ideally, I would be able to do something like:
> 
>    MyRegion R;
>    std::set(R);
>    std::map(R);

This is exactly how I intended to use custom STL allocators.
Doing it selectively, on a per-container-instance basis, is very important.

> etc, and have the nodes for both containers share the same pool.  
> We  already have some things like this that we use in clang, check out  
> llvm/support/allocators.h.  It is not the standard STL allocator  
> interface though, we need a simple adaptor.

When it comes to non-STL memory allocators, I'd say that many of the points
mentioned by Barry Kelly are very true. Allocators that allow fast allocation
and easy bulk de-allocation, e.g. of all objects created during the
compilation of a certain scope (a block of statements, a function, etc.),
are very useful. But in those cases, allocators are usually bound not to
containers but to object types. Usually this is done by defining
class-specific new/delete operators that use dedicated, type-specific
pools. I've used this approach very successfully for compiler
implementation, as I mentioned in the original mail on llvm-commits.

 
> > - reasons for current inefficiency of many STL containers on Evan's/ 
> > Chris configs and
> 
> The reason is pretty obvious when you know what to look for.  On the  
> mac, std::allocator just calls malloc/free.  On linux and  
> apparently everywhere else, std::allocator is implemented in terms  
> of a strange pool allocator.  [IMO, the mac way of doing things is the  
> right way to go, but we can discuss philosophy elsewhere.  Here we  
> will focus on the ramifications of this reality]
> 
> The end result of this is that std::set/map on the mac ends up calling  
> malloc/free for each node in the red/black tree, and it allocates a  
> metric butt load of nodes (one per each item inserted).  Further, each  
> query of the set incurs traversals of the tree.  In the case of the  
> mac, because malloc/free is used, the allocations are spread all  
> throughout the heap and traversing a tree touches all sorts of random  
> pages of memory in very cache-unfriendly ways. I think the performance  
> delta of mac vs linux is primarily due to this locality issue, not the  
> raw performance of malloc on the two systems.
> 
> On linux (for example) the custom implementation of std::set happens  
> to get lucky, and generally gives better locality than mapping  
> directly to malloc/free, just because only a very small subset of the  
> objects on the heap go through std::allocator (f.e. none of the  
> SDNodes or LLVM IR objects go through std::allocator).  This implicit  
> partitioning of the heap leads to accidental improved performance, but  
> the performance is still pretty bad if you do have multiple node-based  
> containers live at once (though that is rare in llvm).  I did a whole  
> phd thesis about partitioning memory objects, so I know that  
> partitioning data structures into their own regions can be a big  
> win :).  It just happens that std::allocator on linux gets lucky on  
> this for LLVM, since we generally have few STL containers live at once  
> (and when we beat on them, we beat on them heavily).

In principle, I buy your explanation. I just want to refer again to the figures from
my original mail:

> So, we can see, that performance-wise the difference is not that huge.
> But if we look at the number of new/delete calls, then it is quite different:
> 1) without STL standard allocator - Total of only 847(!!!) mallocs for
> all of the allocators together,  while adding 1000000 nodes for each
> of them.
> 2) with STL standard allocator - Total of 2000628 mallocs for all of
> the allocators together, while adding 1000000 nodes for each of them.

So, it looks like on Linux the allocator used by the STL is still using malloc/free rather
extensively. Even if a pool allocator is used by the STL by default, it does not save many
malloc/free calls, as the figures above show. Or do you think this number of malloc/free
calls is produced by the STL's pool allocator?
 
> > - possibility of using 3rd party libs like Boost (or their parts) in  
> > LLVM
> 
> We have already imported pieces of boost.  

BTW, which ones?

> The general rule is that it  is ok to use specific pieces, 

Totally agree with you.

> but they have to be converted to the  LLVM style and way of doing things.  
> Also, things with huge chains of  dependencies sometimes have to be simplified.

One comment here:
While I agree with this approach for most libraries, I think that Boost is (almost) becoming
the second-most-important library for C++, almost as important as the STL. Therefore,
modifying/re-formatting/converting it is eventually not such a great idea,
since everyone using it assumes that its implementation and semantics are exactly the
same on every platform.

- Roman






