[LLVMdev] Forward: Discussion about custom memory allocators for STL
Roman Levenstein
romixlev at yahoo.com
Sat May 17 09:48:58 PDT 2008
Hi,
There is a discussion thread on the llvm-commits list about the possibility of using custom memory allocators for STL to improve the performance and reduce the memory pressure of STL containers, e.g. std::set.
I thought that this discussion may be interesting for a wider audience and therefore I re-post the messages on llvm-dev as well.
It would be interesting to hear what others think about
- custom allocators,
- reasons for the current inefficiency of many STL containers on Evan's and Chris's configurations, and
- the possibility of using third-party libraries like Boost (or parts of them) in LLVM
Thanks,
- Roman
----- Forwarded Mail ----
> From: Roman Levenstein <romix.llvm at googlemail.com>
> To: CVS Commit Messages for LLVM repository <llvm-commits at cs.uiuc.edu>
> Sent: Friday, 16 May 2008, 19:20:29
> Subject: Re: [llvm-commits] Speeding up RegAllocLinearScan on big test-cases
>
> Hi,
>
> 2008/5/7 Evan Cheng:
> > Can we hook up the llvm pool allocator to std::set and use it for the
> > register allocator? It's simple and it made a huge difference on Mac
> > OS X when we switched all LiveInterval VNInfo allocations to it.
> >
> > Evan
>
> Yes. We can hook up the llvm pool allocator to std::set. I have a working
> implementation.
>
>
> > On May 7, 2008, at 1:24 AM, Roman Levenstein wrote:
> >
> > > Hi,
> > >
> > >> 2008/5/7 Bill Wendling:
> > >>> On Tue, May 6, 2008 at 3:31 PM, Evan Cheng wrote:
> > >>>> On May 6, 2008, at 6:13 AM, Roman Levenstein wrote:
> > >>>> But one thing about std::set that could eventually be interesting
> > >>>> in many places is the following:
> > >>>> - In many situations we know the maximal size of the set in
> > >>>> advance. For example, in this patch the set can contain at most
> > >>>> all live intervals, and in the scheduler the availableQueue can
> > >>>> contain at most all SUnits. It means that if we were able to
> > >>>> allocate the memory for the maximum possible number of elements
> > >>>> in advance, there would be no need for any additional memory
> > >>>> allocation.
> > >>>>
> > >>>> - I think a custom STL allocator could be written that does
> > >>>> exactly this. It would reserve memory for the maximum number of
> > >>>> elements (of equal size?) and maintain a free list of cells. Then
> > >>>> we can have very efficient allocation, and sets that do not
> > >>>> produce too much malloc/free pressure. The same idea can also be
> > >>>> used for some other STL containers; see the sketch below.
> > >>>>
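> > >>>> Roughly, such an allocator could look like the sketch below
> > >>>> (FixedPoolAllocator is just a working name, and the remaining
> > >>>> members a complete STL allocator needs - rebind, construct,
> > >>>> destroy, etc. - are omitted for brevity):
> > >>>>
> > >>>>   #include <cassert>
> > >>>>   #include <cstddef>
> > >>>>   #include <cstdlib>
> > >>>>
> > >>>>   // Reserves memory for MaxElems equally-sized cells up front
> > >>>>   // and recycles them through an intrusive free list, so
> > >>>>   // allocate()/deallocate() never call malloc after construction.
> > >>>>   template <typename T, std::size_t MaxElems>
> > >>>>   class FixedPoolAllocator {
> > >>>>     // A union makes each cell big enough for a T or a link.
> > >>>>     union Cell { Cell *Next; char Data[sizeof(T)]; };
> > >>>>     Cell *Pool;      // one contiguous block for all cells
> > >>>>     Cell *FreeList;  // head of the list of unused cells
> > >>>>   public:
> > >>>>     typedef T value_type;
> > >>>>
> > >>>>     FixedPoolAllocator() {
> > >>>>       Pool = static_cast<Cell*>(std::malloc(MaxElems * sizeof(Cell)));
> > >>>>       FreeList = Pool;
> > >>>>       for (std::size_t i = 0; i + 1 < MaxElems; ++i)
> > >>>>         Pool[i].Next = &Pool[i + 1];
> > >>>>       Pool[MaxElems - 1].Next = 0;
> > >>>>     }
> > >>>>     ~FixedPoolAllocator() { std::free(Pool); }
> > >>>>
> > >>>>     T *allocate(std::size_t n) {
> > >>>>       // Node-based containers allocate one node at a time.
> > >>>>       assert(n == 1 && FreeList && "pool exhausted");
> > >>>>       Cell *C = FreeList;
> > >>>>       FreeList = C->Next;
> > >>>>       return reinterpret_cast<T*>(C);
> > >>>>     }
> > >>>>     void deallocate(T *P, std::size_t) {
> > >>>>       Cell *C = reinterpret_cast<Cell*>(P);
> > >>>>       C->Next = FreeList;
> > >>>>       FreeList = C;
> > >>>>     }
> > >>>>   };
> > >>>>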
> > >>>> What do you think?
> > >>>
> > >>> I am not sure. I have little experience with custom STL allocators.
> > >>> Perhaps others will chime in. For now, using std::set is fine.
> > >>>
> > >> I created custom allocators before. They aren't that bad. You just
> > >> have to get the functionality correct (the allocators I created
> > >> called
> > >> a malloc function that already had this functionality). The major
> > >> caveat is that if you don't have a well-tested memory manager
> > >> available, this can lead to nasty bugs. I would stick with std::set
> > >> unless it becomes a major albatross around our necks.
> > >
> > > I have a lot of experience with custom allocators. I used them
> > > extensively in the optimizing C compiler that I wrote a few years
> > > ago for an embedded target. Initially I was using plain malloc/free
> > > and new/delete. Then I moved to GC, but in the end I switched to
> > > custom memory allocators. I can only say that they had a very
> > > positive impact on the performance of my compiler.
> > >
> > > Custom memory allocators are actually not such a black art as they
> > > may seem at first glance, and there are quite well-proven ones.
> > > Usually they are not that complex, and it is rather easy to see
> > > whether they are correct. Normally they are used for node-based STL
> > > containers or for the most typical nodes created by the compiler
> > > (e.g. SDNode, SDOperand, SUnit, etc.). And they can really speed
> > > things up, e.g. if they use pool allocation or segregated storage,
> > > or if they free all the objects at once. For example, imagine that
> > > we use such an allocator for SUnit nodes. It may reserve memory for
> > > N SUnit objects and allocate it very quickly. Once scheduling is
> > > over, all such objects can be freed at once, just by clearing or
> > > deleting the custom allocator.
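> > >
> > > To make the "free all at once" idea concrete, here is a sketch of
> > > such a bump allocator (my naming; the slab size is arbitrary and
> > > alignment handling is omitted):
> > >
> > >   #include <cstddef>
> > >   #include <cstdlib>
> > >   #include <vector>
> > >
> > >   // Hands out chunks by bumping a pointer within large slabs.
> > >   // Individual objects are never freed; destroying the allocator
> > >   // releases every slab, i.e. all objects at once.
> > >   class BumpAllocator {
> > >     std::vector<void*> Slabs;
> > >     char *Cur, *End;
> > >     enum { SlabSize = 65536 };  // assumes Size <= SlabSize below
> > >   public:
> > >     BumpAllocator() : Cur(0), End(0) {}
> > >     ~BumpAllocator() {
> > >       for (std::size_t i = 0, e = Slabs.size(); i != e; ++i)
> > >         std::free(Slabs[i]);
> > >     }
> > >     void *Allocate(std::size_t Size) {
> > >       if (Size > static_cast<std::size_t>(End - Cur)) {
> > >         // Current slab is full (or none yet): start a new one.
> > >         Cur = static_cast<char*>(std::malloc(SlabSize));
> > >         End = Cur + SlabSize;
> > >         Slabs.push_back(Cur);
> > >       }
> > >       void *Result = Cur;
> > >       Cur += Size;
> > >       return Result;
> > >     }
> > >   };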
> > >
> > > I'm currently experimenting with 4-5 different custom STL
> > > allocators that I have found on the Internet. Once I have
> > > representative figures comparing them against STL's standard
> > > allocator, and after I have cleaned up the code, I'll report about
> > > it to this mailing list.
>
> OK. So, I've tested 6 allocators. One of them is the standard STL
> allocator. The other allocators were found by me on the Internet and
> made STL-compliant via a special templatized adapter class (sketched
> below). I don't want to go into detail at this moment, since the code
> is not quite polished yet, but I'd mention that bump_allocator is an
> STL-compliant version of LLVM's pool allocator.
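>
> The adapter is roughly of the following shape (a sketch with my
> working names; a few members every STL allocator must provide, such
> as max_size() and the comparison operators, are omitted for brevity):
>
>   #include <cstddef>
>   #include <new>
>
>   // Wraps any raw allocator exposing Allocate/Deallocate calls
>   // behind the interface that std::set and std::list expect.
>   template <typename T, typename UnderlyingAlloc>
>   class AllocatorAdapter {
>     UnderlyingAlloc *Underlying;
>   public:
>     typedef T value_type;
>     typedef T *pointer;
>     typedef std::size_t size_type;
>     // Containers rebind the allocator to their internal node type.
>     template <typename U> struct rebind {
>       typedef AllocatorAdapter<U, UnderlyingAlloc> other;
>     };
>
>     explicit AllocatorAdapter(UnderlyingAlloc *A) : Underlying(A) {}
>     template <typename U>
>     AllocatorAdapter(const AllocatorAdapter<U, UnderlyingAlloc> &O)
>         : Underlying(O.getUnderlying()) {}
>     UnderlyingAlloc *getUnderlying() const { return Underlying; }
>
>     pointer allocate(size_type N) {
>       return static_cast<pointer>(Underlying->Allocate(N * sizeof(T)));
>     }
>     void deallocate(pointer P, size_type N) {
>       Underlying->Deallocate(P, N * sizeof(T));
>     }
>     void construct(pointer P, const T &V) { new (P) T(V); }
>     void destroy(pointer P) { P->~T(); }
>   };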
>
> My test checks their performance for allocating node-based
> containers, i.e. std::list and std::set. I insert 1000000 nodes into
> the list and the set using the different allocators. Doing so, I
> observe the following picture:
>
> ***Tests with std::list***
> Sort (ss_allocator):0.517465
> Sort (fsb_allocator):0.693605
> Sort (bump_allocator):0.5398639999
> Sort (fastalloc_allocator):0.5254200001
> Sort (boost_fastalloc_allocator):0.520713
> Sort (default allocator):0.631207
>
> ***Tests with std::set***
> Insertion (ss_allocator):0.8642740001
> Insertion (fsb_allocator):0.932031
> Insertion (bump_allocator):0.9571639999
> Insertion (fast_allocator):0.950616
> Insertion (boost_fast_allocator):0.9666030001
> Insertion (default_allocator):1.210076
>
> So we can see that, performance-wise, the difference is not that
> huge. But if we look at the number of new/delete calls, the picture
> is quite different:
> 1) without the STL standard allocator - a total of only 847(!!!)
> mallocs for all of the allocators together, while adding 1000000
> nodes with each of them;
> 2) with the STL standard allocator - a total of 2000628 mallocs for
> all of the allocators together, while adding 1000000 nodes with each
> of them.
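>
> One simple way to gather such counts (a sketch of just one possible
> approach) is to interpose the global operator new/delete and print
> the counter at the end of the run:
>
>   #include <cstdlib>
>   #include <new>
>
>   // Every allocation made through operator new bumps the counter.
>   static unsigned long NumAllocs = 0;
>
>   void *operator new(std::size_t Size) throw(std::bad_alloc) {
>     ++NumAllocs;
>     if (void *P = std::malloc(Size))
>       return P;
>     throw std::bad_alloc();
>   }
>   void operator delete(void *P) throw() {
>     std::free(P);
>   }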
>
> So the standard STL allocator produces a huge number of new/delete
> calls, and the other allocators reduce it by more than three orders
> of magnitude. But, as mentioned before, this DOES NOT result in a big
> performance difference on my Ubuntu/Linux/x86 machine, which
> indicates that mallocs are very efficient here. But for you it seems
> to be very different...
>
> So the question is: why does the STL allocator perform so poorly on
> the PowerPC systems that you seem to use? Could it be that the malloc
> or STL implementation of your OS/compiler is particularly bad? Would
> it be possible for you to try a custom malloc implementation in place
> of the system malloc? E.g. could you try to link LLVM with Doug Lea's
> malloc, which is the basis of the standard implementation on
> Linux/x86 systems? Or do you have other explanations for this?
>
> BTW, Boost seems to provide a nice pool allocator. And it is
> "production grade" and "bullet-proof" compared to many others. Would
> it be too bad if LLVM used it? This does not mean that the whole of
> Boost would have to be available: there is a special tool (bcp)
> provided with Boost that extracts only those subsets of it that are
> required. And Boost has a BSD-like license.
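>
> For illustration, hooking Boost's pool allocator into std::set is a
> one-line change in the container type (a minimal sketch, assuming
> Boost's headers are available):
>
>   #include <functional>
>   #include <set>
>   #include <boost/pool/pool_alloc.hpp>
>
>   // std::set backed by Boost's pool allocator: only the third
>   // template argument changes; all set operations work as before.
>   typedef std::set<int, std::less<int>,
>                    boost::fast_pool_allocator<int> > PooledIntSet;
>
>   int main() {
>     PooledIntSet S;
>     for (int i = 0; i != 1000000; ++i)
>       S.insert(i);
>     return 0;
>   }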
>
> -Roman
> _______________________________________________
> llvm-commits mailing list
> llvm-commits at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits