[LLVMdev] MmapAllocator

Mon Aug 9 13:42:24 PDT 2010

On Sun, Aug 8, 2010 at 9:20 PM, Reid Kleckner <reid.kleckner at gmail.com> wrote:

> On Sun, Aug 8, 2010 at 8:20 PM, Jakob Stoklund Olesen <stoklund at 2pi.dk>
> wrote:
> >
> > On Aug 7, 2010, at 7:05 PM, Steven Noonan wrote:
> >> I've been doing work on memory reduction in Unladen Swallow, and
> >> during testing, LiveRanges seemed to be consuming one of the largest
> >> chunks of memory.
> >
> > That's interesting. How did you measure this? I'd love to see your data.
> >
> > Note that the LiveRange struct is allocated by a plain std::vector, and
> > your patch doesn't change that. I assume you are talking about the VNInfo
> > structs?
>
> Steven has been using Instruments, and sending us screenshots.  Does
> anyone else know a better way of exporting that data?
>

So, just so you're aware, direct calls to mmap are not intercepted and
reported by Instruments.  So using mmap instead of malloc will make your
_reported_ numbers go down, but that doesn't necessarily mean you have
better performance.

This is a problem for people doing performance measurements on Mac OS X and
iOS, because exotic memory allocation schemes seem to be becoming more
common (I hope not because they dodge reporting!).  In particular, many image
buffers are allocated directly from mmap and vm_allocate, within
CoreGraphics and elsewhere.

-Ken
Cocoa Frameworks

>
> I thought I dug into the register allocation code, and found the
> VNInfo::Allocator typedef.  I assumed that was getting the traffic we
> saw in Instruments, but I don't have the data to back that up.
>
> >> I wrote a replacement allocator for use by
> >> BumpPtrAllocator which uses mmap()/munmap() in place of
> >> malloc()/free().
> >
> > It's a bit more complicated than that. Modern malloc implementations use
> > a whole bag of tricks to avoid lock contention in multiprocessor systems,
> > and they know which allocation size the kernel likes, and which system
> > calls to use.
> >
> > By calling mmap directly, you are throwing all that system specific
> > knowledge away.
>
> So the goal of this particular modification was to find ways to return
> large, one-time allocations that happen during compilation back to the
> OS.  For unladen-swallow, we have a long-lived Python process where we
> JIT code every so often.  We happen to generate an ungodly amount of
> code, which we're trying to reduce.  However, this means that LLVM
> allocates a lot of memory for us, and it grows our heap by several MB
> over what it would normally be.  The breakdown was roughly 8 MB gets
> allocated for this one compilation in the spam_bayes benchmark, with 2
> MB coming from register allocation and 2 MB from SDNodes.
>
> We are looking at using mmap/munmap to avoid permanently growing the heap.
>
> This patch switches all allocators over to mmap, so you can see a lot
> of "stitches" in the graphs, where an allocator is created and thrown
> away quickly.  Those allocations are probably better served by malloc.
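
As a rough illustration of the mmap()/munmap() approach discussed above, here is a minimal sketch of a slab source that hands page-rounded anonymous mappings back to the OS on release. It is not the patch from this thread and not LLVM's allocator interface; the names roundUpToPage, mmapSlabAllocate and mmapSlabFree are hypothetical, and it assumes a POSIX system that defines MAP_ANON.

```cpp
#include <sys/mman.h>
#include <unistd.h>
#include <cassert>
#include <cstddef>

// Hypothetical helper (not an LLVM API): round a request up to whole pages,
// since mmap() works at page granularity.
static std::size_t roundUpToPage(std::size_t size) {
  const std::size_t page = static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
  return (size + page - 1) / page * page;
}

// Hypothetical slab source: map an anonymous, private, read/write region of
// at least `size` bytes. Returns nullptr on failure.
static void *mmapSlabAllocate(std::size_t size) {
  assert(size > 0 && "mmap() rejects zero-length mappings");
  void *p = mmap(nullptr, roundUpToPage(size), PROT_READ | PROT_WRITE,
                 MAP_PRIVATE | MAP_ANON, -1, 0);
  return p == MAP_FAILED ? nullptr : p;
}

// Unmap a slab obtained from mmapSlabAllocate(). The pages go straight back
// to the kernel instead of staying in the process heap.
static bool mmapSlabFree(void *p, std::size_t size) {
  return munmap(p, roundUpToPage(size)) == 0;
}
```

A caller would pair each mmapSlabAllocate(n) with mmapSlabFree(p, n) using the same n. For short-lived allocators this costs two system calls per slab, which is one reason malloc may still be the better fit there.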
>
> > It's great that you provide measurements, but it's not clear what you are
> > measuring. Does 'mem max' include the overhead of asking the kernel for tiny
> > 4K allocations, if any? Also, what is your operating system and
> > architecture? That could make a big difference.
>
> The memory size measurements in this data are all taken using
> /proc/smaps data in Linux to find the number of dirty pages.
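
For readers who want to reproduce this kind of measurement, the following is a sketch of summing one field across all mappings in smaps-format text. The thread does not say which smaps fields were used, so reading Private_Dirty is an assumption, and sumSmapsFieldKb is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <sstream>
#include <string>

// Hypothetical helper: add up every "<field>: N kB" line in an smaps-format
// stream and return the total in kB. Lines for other fields and the
// per-mapping address lines are skipped.
static std::size_t sumSmapsFieldKb(std::istream &in, const std::string &field) {
  const std::string prefix = field + ":";
  std::size_t total = 0;
  std::string line;
  while (std::getline(in, line)) {
    if (line.compare(0, prefix.size(), prefix) != 0)
      continue;
    std::istringstream rest(line.substr(prefix.size()));
    std::size_t kb = 0;
    if (rest >> kb)
      total += kb;
  }
  return total;
}

// Assumption: Private_Dirty is the field of interest. Linux only.
static std::size_t privateDirtyKbOfSelf() {
  std::ifstream smaps("/proc/self/smaps");
  return sumSmapsFieldKb(smaps, "Private_Dirty");
}
```

Calling privateDirtyKbOfSelf() before and after a compilation gives a crude delta; on systems without /proc/self/smaps the stream fails to open and the function returns 0.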
>
> > Have you looked at the effect of twiddling the default 4K slab size in
> > BumpPtrAllocator? I suspect you could get more dramatic results that way.
>
> If we did that, one thing that might happen is malloc might start
> forwarding to mmap, but I think you have to allocate ~128K at a time
> to hit that threshold.
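
The threshold mentioned above can also be moved rather than reached. The sketch below is glibc-specific and only an assumption about how one might experiment: glibc documents a default M_MMAP_THRESHOLD of 128 KiB, and mallopt() lets a process lower it so that smaller malloc() requests are served by mmap. Other C libraries may not provide mallopt at all, and setMmapThreshold is a hypothetical wrapper name.

```cpp
#include <malloc.h>  // glibc-specific: mallopt, M_MMAP_THRESHOLD
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical wrapper: ask glibc's malloc to use mmap() for requests of at
// least `bytes`. mallopt() returns 1 on success and 0 on error.
static bool setMmapThreshold(std::size_t bytes) {
  return mallopt(M_MMAP_THRESHOLD, static_cast<int>(bytes)) == 1;
}
```

With the threshold lowered to, say, 64 KiB, a slab of that size obtained through plain malloc() should be unmapped on free() rather than left in the heap, without the allocator calling mmap itself.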
>
> Reid
>