[llvm-commits] Speeding up RegAllocLinearScan on big test-cases

Roman Levenstein romix.llvm at googlemail.com
Mon May 19 01:34:13 PDT 2008


Hi,

2008/5/18 Chris Lattner <clattner at apple.com>:
>
>
>
> On May 16, 2008, at 8:20 AM, Roman Levenstein wrote:
>
>> So we can see that performance-wise the difference is not that huge.
>> But if we look at the number of new/delete calls, it is quite
>> different:
>> 1) without the STL standard allocator - a total of only 847(!!!) mallocs
>> for all of the allocators together, while adding 1000000 nodes for each
>> of them.
>> 2) with the STL standard allocator - a total of 2000628 mallocs for all
>> of the allocators together, while adding 1000000 nodes for each of them.
>>
>> So the standard allocator of STL produces a huge number of
>> new/delete calls, and the other allocators reduce it by more than
>> three orders of magnitude. But, as mentioned before, it DOES NOT
>> result in a big performance difference on my Ubuntu/Linux/x86 machine,
>> which indicates that mallocs are very efficient here. But for you it
>> seems to be very different...
>
> Hey Roman,

> Also, I don't know how your setup is arranged.

I have written a small dedicated program to test different allocators.
It has test_list() and test_set() functions that exercise std::list and
std::set respectively.
Each of them first creates all the (empty) lists/sets, assigns a
different allocator to each of them, and then inserts 1000000 elements
into each container.
Once that is done, the test function:
- sorts the lists,
- or erases some elements from the set (e.g. every 4th or so), intermixed
with insertions.

> In the case of
> LLVM, the heap is significantly fragmented before a pass starts beating on
> (f.e.) a std::map.  When the nodes go through malloc, this means that the
> nodes for one map end up scattered throughout the heap.  My guess is that
> this makes traversals far more expensive than when using a pool allocator
> that makes them contiguous.

Maybe. In my case there are no preallocated data structures on the
heap, so it is not very fragmented.

> it would be interesting to measure the locality effect of using new/delete.

Well, that eventually goes a bit beyond my small investigation :-) What I
could do is add a small initialization pass to my test program that
allocates a lot of objects and then frees some of them at random. That
should produce a fragmented heap. Then I can run my experiments as they
are currently done. If your assumption holds, the STL standard allocator
should perform much worse than before.

I can also post my sources to the mailing list later, so that you or
Evan could try it on OS X and see whether we get a similar picture on
the same inputs.
Does that sound reasonable?

Anyway, I think it is clear that using a custom allocator for a
concrete instance of std::set is most likely a win, at least in the
number of mallocs involved and eventually in performance. I'll clean up
the code for the allocators before I post them to the mailing list.

BTW, the reason for mentioning Boost was not its performance but its
cross-platform support. The problem with some of the allocators I've
found so far is that I'm not sure they support features like:
 - different alignments
 - exception safety
 - 64-bit clean implementations
 - different compilers
and many others. I mean, they are fairly simple and basically work for
the most typical cases, but they are not quite "bullet-proof" compared
to, e.g., the Boost pool allocators. On the other hand, the current
BumpPtrAllocator in LLVM is also not quite safe in this regard, so we
don't lose anything ;-)

> Relevant reading :)
> http://llvm.org/pubs/2005-05-21-PLDI-PoolAlloc.html

Yep, I read it right after it appeared ;-) Very good reading!

-Roman


