[llvm-commits] Speeding up RegAllocLinearScan on big test-cases
Roman Levenstein
romix.llvm at googlemail.com
Mon May 19 03:13:00 PDT 2008
Hi,
2008/5/19 Roman Levenstein <romix.llvm at googlemail.com>:
> Hi,
>
> 2008/5/18 Chris Lattner <clattner at apple.com>:
>
> >
> >
> >
> > On May 16, 2008, at 8:20 AM, Roman Levenstein wrote:
> >
> > So we can see that, performance-wise, the difference is not that huge.
> > But if we look at the number of new/delete calls, it is quite
> > different:
> > 1) without the STL standard allocator - a total of only 847 (!!!)
> > mallocs for all of the allocators together, while adding 1000000 nodes
> > for each of them.
> > 2) with the STL standard allocator - a total of 2000628 mallocs for all
> > of the allocators together, while adding 1000000 nodes for each of them.
> >
> > So, the standard allocator of STL produces a huge number of
> > new/delete calls, and the other allocators reduce that
> > by more than three orders of magnitude. But, as mentioned before, it
> > DOES NOT result in a big performance difference on
> > my Ubuntu/Linux/x86 machine, which indicates that mallocs are very
> > efficient here. But for you it seems to be very different...
> >
> > Hey Roman,
>
>
> > Also, I don't know how your setup is arranged.
>
> I have written a small test program to compare different allocators.
> It has test_list() and test_set() functions, which test std::list and
> std::set respectively.
> Each of them first creates all the empty lists/sets, assigns a
> different allocator to each list/set, and then tries to add
> 1000000 elements to each.
> Once that is done, the test function:
> - sorts the lists, or
> - deletes some elements from the sets (e.g. every 4th or so),
> intermixed with insertions.
>
>
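In outline, each test looks roughly like this (a simplified sketch of
my program, not the exact code; the element type, loop details and
allocator names are illustrative):

  #include <cstddef>
  #include <functional>
  #include <list>
  #include <memory>
  #include <set>

  // List test: fill a list through the given allocator, then sort it.
  template <typename Alloc>
  void test_list(std::size_t N) {
    std::list<int, Alloc> L;
    for (std::size_t i = 0; i != N; ++i)
      L.push_back(int(i));
    L.sort();
  }

  // Set test: insertions intermixed with deleting roughly every 4th
  // element inserted so far.
  template <typename Alloc>
  void test_set(std::size_t N) {
    std::set<int, std::less<int>, Alloc> S;
    for (std::size_t i = 0; i != N; ++i) {
      S.insert(int(i));
      if (i % 4 == 3)
        S.erase(int(i - 2));
    }
  }

  int main() {
    test_list<std::allocator<int> >(1000000); // or a custom allocator
    test_set<std::allocator<int> >(1000000);
  }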
> > In the case of
> > LLVM, the heap is significantly fragmented before a pass starts beating on
> > (e.g.) a std::map. When the nodes go through malloc, this means that the
> > nodes for one map end up scattered throughout the heap. My guess is that
> > this makes traversals far more expensive than when using a pool allocator
> > that makes them contiguous.
>
> Maybe. In my case, there are no preallocated data structures on the
> heap, and therefore it is not very fragmented.
>
>
> > it would be interesting to measure the locality effect of using new/delete.
>
> Well, that eventually goes too far for my small investigation :-) What
> I could do is add a small initialization pass to my test program that
> allocates a lot of objects and then frees some of them randomly. That
> should produce a fragmented heap. Then I can run my experiments as
> they are currently done.
I implemented this proposal.
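In outline, the prepass does something like this (a sketch; the block
counts and sizes are illustrative, not the exact values I used):

  #include <cstddef>
  #include <cstdlib>
  #include <vector>

  // Fragmentation prepass: allocate many small blocks of varying
  // size, then free a random subset. The holes left behind fragment
  // the heap; the caller keeps the survivors alive while the
  // container tests run, so the holes cannot coalesce.
  std::vector<char*> fragment_heap(std::size_t NumBlocks) {
    std::vector<char*> Blocks;
    Blocks.reserve(NumBlocks);
    for (std::size_t i = 0; i != NumBlocks; ++i)
      Blocks.push_back(new char[16 + std::rand() % 256]);
    for (std::size_t i = 0; i != NumBlocks; ++i)
      if (std::rand() % 2) { // free about every other block, randomly
        delete [] Blocks[i];
        Blocks[i] = 0;
      }
    return Blocks;
  }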
> If your assumption holds, I should see the STL standard allocator
> perform much worse than before.
Chris, you're absolutely right!!!
The standard STL allocator is now much slower after a special prepass
artificially creates a lot of heap fragmentation. When I redo the same
tests that I did before, the performance figures for STL containers
using the standard allocator are about 2 times worse than for all the
other allocators (whose performance is unaffected by the heap
fragmentation). I can also clearly see that the performance of
containers using the standard STL allocator steadily degrades as
memory fragmentation increases.
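To make the pool-allocator idea from Chris's mail above concrete: a
bump-pointer pool hands out nodes contiguously from large slabs and
frees them all at once, so it is immune to heap fragmentation. A rough
C++03 sketch (illustrative only, not any of the actual allocators I
measured):

  #include <cstddef>
  #include <new>
  #include <vector>

  // Memory is carved sequentially out of large slabs, so successive
  // nodes end up adjacent; individual frees are no-ops, and all slabs
  // are released when the pool is destroyed.
  class BumpPool {
    std::vector<char*> Slabs;
    char *Cur, *End;
  public:
    BumpPool() : Cur(0), End(0) {}
    ~BumpPool() {
      for (std::size_t i = 0, e = Slabs.size(); i != e; ++i)
        delete [] Slabs[i];
    }
    void *Allocate(std::size_t Size) {
      Size = (Size + 15) & ~std::size_t(15); // keep 16-byte alignment
      if (Cur == End || std::size_t(End - Cur) < Size) {
        std::size_t SlabSize = Size > 65536 ? Size : 65536;
        Cur = new char[SlabSize];
        End = Cur + SlabSize;
        Slabs.push_back(Cur);
      }
      void *Result = Cur;
      Cur += Size;
      return Result;
    }
  };

  BumpPool ThePool; // shared by all PoolAllocator instances below

  // C++03 STL allocator interface on top of the pool, usable as
  // e.g. std::list<int, PoolAllocator<int> >.
  template <typename T>
  class PoolAllocator {
  public:
    typedef T value_type;
    typedef T *pointer;
    typedef const T *const_pointer;
    typedef T &reference;
    typedef const T &const_reference;
    typedef std::size_t size_type;
    typedef std::ptrdiff_t difference_type;
    template <typename U> struct rebind { typedef PoolAllocator<U> other; };

    PoolAllocator() {}
    template <typename U> PoolAllocator(const PoolAllocator<U> &) {}

    pointer allocate(size_type N, const void * = 0) {
      return static_cast<pointer>(ThePool.Allocate(N * sizeof(T)));
    }
    void deallocate(pointer, size_type) {} // freed en masse by the pool
    void construct(pointer P, const T &V) { new (P) T(V); }
    void destroy(pointer P) { P->~T(); }
    size_type max_size() const { return size_type(-1) / sizeof(T); }
  };

  // All instances share one pool, so any two allocators compare equal.
  template <typename T, typename U>
  bool operator==(const PoolAllocator<T> &, const PoolAllocator<U> &) {
    return true;
  }
  template <typename T, typename U>
  bool operator!=(const PoolAllocator<T> &, const PoolAllocator<U> &) {
    return false;
  }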
-Roman