[llvm-commits] Speeding up RegAllocLinearScan on big test-cases

Roman Levenstein romix.llvm at googlemail.com
Mon May 19 03:13:00 PDT 2008


Hi,

2008/5/19 Roman Levenstein <romix.llvm at googlemail.com>:
> Hi,
>
>  2008/5/18 Chris Lattner <clattner at apple.com>:
>
>  >
>  > On May 16, 2008, at 8:20 AM, Roman Levenstein wrote:
>  >
>  > So we can see that, performance-wise, the difference is not that huge.
>  > But if we look at the number of new/delete calls, the picture is quite
>  > different:
>  > 1) without the STL standard allocator - a total of only 847(!!!) mallocs
>  > for all of the allocators together, while adding 1000000 nodes for each
>  > of them.
>  > 2) with the STL standard allocator - a total of 2000628 mallocs for all
>  > of the allocators together, while adding 1000000 nodes for each of them.
>  >
>  > So the standard allocator of STL produces a huge number of new/delete
>  > calls, while the other allocators reduce that number by almost 4
>  > orders of magnitude. But, as mentioned before, this DOES NOT result
>  > in a big performance difference on my Ubuntu/Linux/x86 machine, which
>  > indicates that mallocs are very efficient here. But for you it seems
>  > to be very different...
>  >
>  > Hey Roman,
>
>
> > Also, I don't know how your setup is arranged.
>
>  I have written a small dedicated program to test different allocators.
>  It has test_list() and test_set() methods, which test std::list and
>  std::set respectively.
>  Each of them creates all the empty lists/sets first, assigns a
>  different allocator to each list/set, and then adds 1000000 elements
>  to each of them.
>  Once that is done, the test function:
>  - sorts the lists,
>  - or deletes some elements from the set (e.g. every 4th or so),
>  intermixed with insertions.
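
For illustration, here is a simplified sketch of such a driver (this is
not the actual test program; the allocator parameter, the signatures
and the element counts are stand-ins):

  #include <cstddef>
  #include <cstdlib>
  #include <list>
  #include <set>

  template <typename Alloc>
  void test_list(std::size_t N) {
    std::list<int, Alloc> L;          // create the empty list first
    for (std::size_t i = 0; i != N; ++i)
      L.push_back(std::rand());       // then add N elements through Alloc
    L.sort();                         // once done, sort the list
  }

  template <typename Alloc>
  void test_set(std::size_t N) {
    std::set<int, std::less<int>, Alloc> S;
    for (std::size_t i = 0; i != N; ++i) {
      S.insert(static_cast<int>(i));
      if (i % 4 == 3)                 // erase roughly every 4th element,
        S.erase(static_cast<int>(i)); // intermixed with the insertions
    }
  }

  int main() {
    test_list<std::allocator<int> >(1000000);
    test_set<std::allocator<int> >(1000000);
    return 0;
  }
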
>
>
>  > In the case of
>  > LLVM, the heap is significantly fragmented before a pass starts beating on
>  > (f.e.) a std::map.  When the nodes go through malloc, this means that the
>  > nodes for one map end up scattered throughout the heap.  My guess is that
>  > this makes traversals far more expensive than when using a pool allocator
>  > that makes them contiguous.
>
>  Maybe. In my case, there are no preallocated data structures on the
>  heap, and therefore it is not very fragmented.
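
As an aside, the kind of pool allocator Chris describes can be sketched
as a bump-pointer arena behind a minimal C++11 allocator interface.
This is purely illustrative (it is not an allocator from LLVM or from
my test program, and all the names are made up):

  #include <cstddef>
  #include <cstdint>
  #include <new>
  #include <vector>

  // Shared arena: grabs big slabs via operator new and bumps a pointer.
  // Nodes are never freed individually; all memory is released at once
  // when the arena dies.  (For brevity, requests larger than one slab
  // are not handled.)
  class Arena {
    std::vector<char*> Slabs;
    char *Cur, *End;
    static const std::size_t SlabSize = 1 << 20; // 1 MiB slabs
  public:
    Arena() : Cur(0), End(0) {}
    ~Arena() {
      for (std::size_t i = 0; i != Slabs.size(); ++i)
        ::operator delete(Slabs[i]);
    }
    void *allocate(std::size_t Size, std::size_t Align) {
      std::size_t Pad =
          (Align - reinterpret_cast<std::uintptr_t>(Cur) % Align) % Align;
      if (!Cur || Cur + Pad + Size > End) {
        Slabs.push_back(static_cast<char*>(::operator new(SlabSize)));
        Cur = Slabs.back();
        End = Cur + SlabSize;
        Pad = 0; // a fresh slab is already suitably aligned
      }
      void *P = Cur + Pad;
      Cur += Pad + Size;
      return P;
    }
  };

  // Minimal allocator adaptor; rebound copies share the same arena, so
  // all nodes of one container end up close together in the slabs.
  template <typename T>
  struct ArenaAllocator {
    typedef T value_type;
    Arena *A;
    explicit ArenaAllocator(Arena &Ar) : A(&Ar) {}
    template <typename U>
    ArenaAllocator(const ArenaAllocator<U> &O) : A(O.A) {}
    T *allocate(std::size_t N) {
      return static_cast<T*>(A->allocate(N * sizeof(T), alignof(T)));
    }
    void deallocate(T *, std::size_t) {} // no per-node free
  };

  template <typename T, typename U>
  bool operator==(const ArenaAllocator<T> &X, const ArenaAllocator<U> &Y) {
    return X.A == Y.A;
  }
  template <typename T, typename U>
  bool operator!=(const ArenaAllocator<T> &X, const ArenaAllocator<U> &Y) {
    return !(X == Y);
  }

A container would then be declared along the lines of:

  Arena A;
  std::set<int, std::less<int>, ArenaAllocator<int> >
      S((std::less<int>()), ArenaAllocator<int>(A));

Since deallocate() is a no-op, erasing nodes produces no free() traffic
at all, which fits the very low malloc counts quoted above.
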
>
>
>  > it would be interesting to measure the locality effect of using new/delete.
>
>  Well, that eventually goes too far for my small investigation :-) What
>  I could do is add a small initialization pass to my test program that
>  allocates a lot of objects and then frees some of them randomly. That
>  should produce a fragmented heap. And then I can start my experiments
>  as they are currently done.

I implemented this proposal.
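
Concretely, the prepass does roughly the following (a sketch; the block
sizes, the block count and the free ratio are placeholders):

  #include <cstddef>
  #include <cstdlib>
  #include <vector>

  // Allocate many small blocks of varying size, then free about half
  // of them at random.  The survivors pin the holes open, so subsequent
  // allocations are served from a fragmented free list.
  std::vector<void*> fragment_heap(std::size_t NumBlocks) {
    std::vector<void*> Blocks;
    Blocks.reserve(NumBlocks);
    for (std::size_t i = 0; i != NumBlocks; ++i)
      Blocks.push_back(std::malloc(16 + std::rand() % 256));
    std::vector<void*> Survivors;
    for (std::size_t i = 0; i != Blocks.size(); ++i) {
      if (std::rand() & 1)
        std::free(Blocks[i]);           // punch a random hole
      else
        Survivors.push_back(Blocks[i]); // keep this block live
    }
    return Survivors; // must stay allocated while the experiments run
  }
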

> If your assumption holds, I should see that STL's standard allocator would perform much worse than before.

Chris, you're absolutely right!!!

The standard STL allocator is now much slower after a special prepass
artificially creates a lot of heap fragmentation. When I redo the same
tests that I did before, the performance figures for STL containers
using the standard allocator are about 2 times worse than for all the
other allocators (whose performance is unaffected by the heap
fragmentation). I can also clearly see that the performance of
containers using the standard STL allocator degrades steadily as memory
fragmentation increases.

-Roman


