[LLVMdev] Forward: Discussion about custom memory allocators for STL
Roman Levenstein
romixlev at yahoo.com
Sat May 17 09:48:58 PDT 2008
Hi,
There is a discussion thread on the llvm-commits list about the possibility of using custom memory allocators for STL to improve the performance and reduce the memory pressure of STL containers, e.g. std::set.
I thought that this discussion may be interesting for a wider audience and therefore I re-post the messages on llvm-dev as well.
It would be interesting to hear what others think about
- custom allocators,
- reasons for the current inefficiency of many STL containers on Evan's and Chris's configurations, and
- the possibility of using third-party libraries like Boost (or parts of them) in LLVM
Thanks,
- Roman
----- Forwarded Mail ----
> From: Roman Levenstein <romix.llvm at googlemail.com>
> To: CVS Commit Messages for LLVM repository <llvm-commits at cs.uiuc.edu>
> Sent: Friday, 16 May 2008, 19:20:29
> Subject: Re: [llvm-commits] Speeding up RegAllocLinearScan on big test-cases
>
> Hi,
>
> 2008/5/7 Evan Cheng:
> > Can we hook up the llvm pool allocator to std::set and use it for the
> > register allocator? It's simple and it made a huge difference on Mac
> > OS X when we switched all LiveInterval VNInfo allocations to it.
> >
> > Evan
>
> Yes. We can hook up the llvm pool allocator to std::set. I have a working
> implementation.
>
>
> > On May 7, 2008, at 1:24 AM, Roman Levenstein wrote:
> >
> > > Hi,
> > >
> > >> 2008/5/7 Bill Wendling:
> > >>> On Tue, May 6, 2008 at 3:31 PM, Evan Cheng wrote:
> > >>>> On May 6, 2008, at 6:13 AM, Roman Levenstein wrote:
> > >>>> But one thing about std::set that could eventually be interesting
> > >>>> in many places is the following:
> > >>>> - In many situations we know the maximal size of the set in
> > >>>> advance. For example, in this patch the set can contain at most
> > >>>> all live intervals, and in the scheduler the availableQueue can
> > >>>> contain at most all SUnits. It means that if we were able to
> > >>>> allocate the memory for the maximum possible number of elements
> > >>>> in advance, there would be no need for any additional memory
> > >>>> allocation.
> > >>>>
> > >>>> - I think a custom STL allocator could be written that does
> > >>>> exactly this. It would reserve memory for the maximum number of
> > >>>> elements (of equal size?) and maintain a free list of cells. Then
> > >>>> we can have very efficient allocation, and sets that do not
> > >>>> produce too much malloc/free pressure. The same idea can also be
> > >>>> used for some other STL containers; see the sketch below.
> > >>>>
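> > >>>> Roughly, such an allocator could look like the sketch below
> > >>>> (FixedPoolAllocator is just a working name, and the remaining
> > >>>> members a complete STL allocator needs - rebind, construct,
> > >>>> destroy, etc. - are omitted for brevity):
> > >>>>
> > >>>>   #include <cassert>
> > >>>>   #include <cstddef>
> > >>>>   #include <cstdlib>
> > >>>>
> > >>>>   // Reserves memory for MaxElems equally-sized cells up front
> > >>>>   // and recycles them through an intrusive free list, so
> > >>>>   // allocate()/deallocate() never call malloc after construction.
> > >>>>   template <typename T, std::size_t MaxElems>
> > >>>>   class FixedPoolAllocator {
> > >>>>     // A union makes each cell big enough for a T or a link.
> > >>>>     union Cell { Cell *Next; char Data[sizeof(T)]; };
> > >>>>     Cell *Pool;      // one contiguous block for all cells
> > >>>>     Cell *FreeList;  // head of the list of unused cells
> > >>>>   public:
> > >>>>     typedef T value_type;
> > >>>>
> > >>>>     FixedPoolAllocator() {
> > >>>>       Pool = static_cast<Cell*>(std::malloc(MaxElems * sizeof(Cell)));
> > >>>>       FreeList = Pool;
> > >>>>       for (std::size_t i = 0; i + 1 < MaxElems; ++i)
> > >>>>         Pool[i].Next = &Pool[i + 1];
> > >>>>       Pool[MaxElems - 1].Next = 0;
> > >>>>     }
> > >>>>     ~FixedPoolAllocator() { std::free(Pool); }
> > >>>>
> > >>>>     T *allocate(std::size_t n) {
> > >>>>       // Node-based containers allocate one node at a time.
> > >>>>       assert(n == 1 && FreeList && "pool exhausted");
> > >>>>       Cell *C = FreeList;
> > >>>>       FreeList = C->Next;
> > >>>>       return reinterpret_cast<T*>(C);
> > >>>>     }
> > >>>>     void deallocate(T *P, std::size_t) {
> > >>>>       Cell *C = reinterpret_cast<Cell*>(P);
> > >>>>       C->Next = FreeList;
> > >>>>       FreeList = C;
> > >>>>     }
> > >>>>   };
> > >>>>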
> > >>>> What do you think?
> > >>>
> > >>> I am not sure. I have little experience with custom STL allocators.
> > >>> Perhaps others will chime in. For now, using std::set is fine.
> > >>>
> > >> I created custom allocators before. They aren't that bad. You just
> > >> have to get the functionality correct (the allocators I created
> > >> called
> > >> a malloc function that already had this functionality). The major
> > >> caveat is that if you don't have a well-tested memory manager
> > >> available, this can lead to nasty bugs. I would stick with std::set
> > >> unless it becomes a major albatross around our necks.
> > >
> > > I have a lot of experience with custom allocators. I used them
> > > extensively in the optimizing C compiler that I wrote a few years
> > > ago for an embedded target. Initially I was using plain malloc/free
> > > and new/delete. Then I moved to GC, but in the end I switched to
> > > custom memory allocators. I can only say that they had a very
> > > positive impact on the performance of my compiler.
> > >
> > > Custom memory allocators are actually not such a black art as they
> > > may seem at first glance, and there are quite well-proven ones.
> > > Usually they are not that complex, and it is rather easy to see
> > > whether they are correct. Normally they are used for node-based STL
> > > containers or for the most typical nodes created by the compiler
> > > (e.g. SDNode, SDOperand, SUnit, etc.). And they can really speed
> > > things up, e.g. if they use pool allocation or segregated storage,
> > > or if they free all the objects at once. For example, imagine that
> > > we use such an allocator for SUnit nodes. It may reserve memory for
> > > N SUnit objects and allocate it very quickly. Once scheduling is
> > > over, all such objects can be freed at once, just by clearing or
> > > deleting the custom allocator.
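> > >
> > > To make the "free all at once" idea concrete, here is a sketch of
> > > such a bump allocator (my naming; the slab size is arbitrary and
> > > alignment handling is omitted):
> > >
> > >   #include <cstddef>
> > >   #include <cstdlib>
> > >   #include <vector>
> > >
> > >   // Hands out chunks by bumping a pointer within large slabs.
> > >   // Individual objects are never freed; destroying the allocator
> > >   // releases every slab, i.e. all objects at once.
> > >   class BumpAllocator {
> > >     std::vector<void*> Slabs;
> > >     char *Cur, *End;
> > >     enum { SlabSize = 65536 };  // assumes Size <= SlabSize below
> > >   public:
> > >     BumpAllocator() : Cur(0), End(0) {}
> > >     ~BumpAllocator() {
> > >       for (std::size_t i = 0, e = Slabs.size(); i != e; ++i)
> > >         std::free(Slabs[i]);
> > >     }
> > >     void *Allocate(std::size_t Size) {
> > >       if (Size > static_cast<std::size_t>(End - Cur)) {
> > >         // Current slab is full (or none yet): start a new one.
> > >         Cur = static_cast<char*>(std::malloc(SlabSize));
> > >         End = Cur + SlabSize;
> > >         Slabs.push_back(Cur);
> > >       }
> > >       void *Result = Cur;
> > >       Cur += Size;
> > >       return Result;
> > >     }
> > >   };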
> > >
> > > I'm currently experimenting with 4-5 different custom STL
> > > allocators that I have found on the Internet. Once I have
> > > representative figures comparing them against STL's standard
> > > allocator, and after I have cleaned up the code, I'll report about
> > > it to this mailing list.
>
> OK. So, I've tested 6 allocators. One of them is the standard STL
> allocator. The other allocators were found by me on the Internet and
> made STL-compliant via a special templatized adapter class (sketched
> below). I don't want to go into detail at this moment, since the code
> is not quite polished yet, but I'd mention that bump_allocator is an
> STL-compliant version of LLVM's pool allocator.
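>
> The adapter is roughly of the following shape (a sketch with my
> working names; a few members every STL allocator must provide, such
> as max_size() and the comparison operators, are omitted for brevity):
>
>   #include <cstddef>
>   #include <new>
>
>   // Wraps any raw allocator exposing Allocate/Deallocate calls
>   // behind the interface that std::set and std::list expect.
>   template <typename T, typename UnderlyingAlloc>
>   class AllocatorAdapter {
>     UnderlyingAlloc *Underlying;
>   public:
>     typedef T value_type;
>     typedef T *pointer;
>     typedef std::size_t size_type;
>     // Containers rebind the allocator to their internal node type.
>     template <typename U> struct rebind {
>       typedef AllocatorAdapter<U, UnderlyingAlloc> other;
>     };
>
>     explicit AllocatorAdapter(UnderlyingAlloc *A) : Underlying(A) {}
>     template <typename U>
>     AllocatorAdapter(const AllocatorAdapter<U, UnderlyingAlloc> &O)
>         : Underlying(O.getUnderlying()) {}
>     UnderlyingAlloc *getUnderlying() const { return Underlying; }
>
>     pointer allocate(size_type N) {
>       return static_cast<pointer>(Underlying->Allocate(N * sizeof(T)));
>     }
>     void deallocate(pointer P, size_type N) {
>       Underlying->Deallocate(P, N * sizeof(T));
>     }
>     void construct(pointer P, const T &V) { new (P) T(V); }
>     void destroy(pointer P) { P->~T(); }
>   };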
>
> My test checks their performance for allocating node-based
> containers, i.e. std::list and std::set. I insert 1000000 nodes into
> the list and the set using the different allocators. Doing so, I
> observe the following picture:
>
> ***Tests with std::list***
> Sort (ss_allocator):0.517465
> Sort (fsb_allocator):0.693605
> Sort (bump_allocator):0.5398639999
> Sort (fastalloc_allocator):0.5254200001
> Sort (boost_fastalloc_allocator):0.520713
> Sort (default allocator):0.631207
>
> ***Tests with std::set***
> Insertion (ss_allocator):0.8642740001
> Insertion (fsb_allocator):0.932031
> Insertion (bump_allocator):0.9571639999
> Insertion (fast_allocator):0.950616
> Insertion (boost_fast_allocator):0.9666030001
> Insertion (default_allocator):1.210076
>
> So we can see that, performance-wise, the difference is not that
> huge. But if we look at the number of new/delete calls, the picture
> is quite different:
> 1) without the STL standard allocator - a total of only 847(!!!)
> mallocs for all of the allocators together, while adding 1000000
> nodes with each of them;
> 2) with the STL standard allocator - a total of 2000628 mallocs for
> all of the allocators together, while adding 1000000 nodes with each
> of them.
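>
> One simple way to gather such counts (a sketch of just one possible
> approach) is to interpose the global operator new/delete and print
> the counter at the end of the run:
>
>   #include <cstdlib>
>   #include <new>
>
>   // Every allocation made through operator new bumps the counter.
>   static unsigned long NumAllocs = 0;
>
>   void *operator new(std::size_t Size) throw(std::bad_alloc) {
>     ++NumAllocs;
>     if (void *P = std::malloc(Size))
>       return P;
>     throw std::bad_alloc();
>   }
>   void operator delete(void *P) throw() {
>     std::free(P);
>   }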
>
> So the standard STL allocator produces a huge number of new/delete
> calls, and the other allocators reduce it by more than three orders
> of magnitude. But, as mentioned before, this DOES NOT result in a big
> performance difference on my Ubuntu/Linux/x86 machine, which
> indicates that mallocs are very efficient here. But for you it seems
> to be very different...
>
> So the question is: why does the STL allocator perform so poorly on
> the PowerPC systems that you seem to use? Could it be that the malloc
> or STL implementation of your OS/compiler is particularly bad? Would
> it be possible for you to try a custom malloc implementation in place
> of the system malloc? E.g. could you try to link LLVM with Doug Lea's
> malloc, which is the basis of the standard implementation on
> Linux/x86 systems? Or do you have other explanations for this?
>
> BTW, Boost seems to provide a nice pool allocator. And it is
> "production grade" and "bullet-proof" compared to many others. Would
> it be too bad if LLVM used it? This does not mean that the whole of
> Boost would have to be available: there is a special tool (bcp)
> provided with Boost that extracts only those subsets of it that are
> required. And Boost has a BSD-like license.
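>
> For illustration, hooking Boost's pool allocator into std::set is a
> one-line change in the container type (a minimal sketch, assuming
> Boost's headers are available):
>
>   #include <functional>
>   #include <set>
>   #include <boost/pool/pool_alloc.hpp>
>
>   // std::set backed by Boost's pool allocator: only the third
>   // template argument changes; all set operations work as before.
>   typedef std::set<int, std::less<int>,
>                    boost::fast_pool_allocator<int> > PooledIntSet;
>
>   int main() {
>     PooledIntSet S;
>     for (int i = 0; i != 1000000; ++i)
>       S.insert(i);
>     return 0;
>   }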
>
> -Roman
> _______________________________________________
> llvm-commits mailing list
> llvm-commits at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits