[LLVMdev] MmapAllocator

Sun Aug 8 21:20:17 PDT 2010

On Sun, Aug 8, 2010 at 8:20 PM, Jakob Stoklund Olesen <stoklund at 2pi.dk> wrote:
>
> On Aug 7, 2010, at 7:05 PM, Steven Noonan wrote:
>> I've been doing work on memory reduction in Unladen Swallow, and
>> during testing, LiveRanges seemed to be consuming one of the largest
>> chunks of memory.
>
> That's interesting. How did you measure this? I'd love to see your data.
>
> Note that the LiveRange struct is allocated by a plain std::vector, and your patch doesn't change that. I assume you are talking about the VNInfo structs?

Steven has been using Instruments, and sending us screenshots.  Does
anyone else know a better way of exporting that data?

I dug into the register allocation code and found the
VNInfo::Allocator typedef.  I assumed that was getting the traffic we
saw in Instruments, but I don't have the data to back that up.

>> I wrote a replacement allocator for use by
>> BumpPtrAllocator which uses mmap()/munmap() in place of
>> malloc()/free().
>
> It's a bit more complicated than that. Modern malloc's use a whole bag of tricks to avoid lock contention in multiprocessor systems, and they know which allocation size the kernel likes, and which system calls to use.
>
> By calling mmap directly, you are throwing all that system specific knowledge away.

So the goal of this particular modification was to find ways to return
large, one-time allocations that happen during compilation back to the
OS.  For unladen-swallow, we have a long-lived Python process where we
JIT code every so often.  We happen to generate an ungodly amount of
code, which we're trying to reduce.  However, this means that LLVM
allocates a lot of memory for us, and it grows our heap by several MB
over what it would normally be.  Roughly 8 MB gets allocated for this
one compilation in the spam_bayes benchmark, with 2 MB coming from
register allocation and 2 MB from SDNodes.

We are looking at using mmap/munmap to avoid permanently growing the heap.
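
To make the idea concrete, here is a rough sketch of what backing a
slab with mmap/munmap looks like on a POSIX system.  This is not the
code from the patch; the helper names mmapSlab and munmapSlab are made
up for illustration, and error handling is kept to the bare minimum.

```cpp
#include <sys/mman.h>
#include <cassert>
#include <cstddef>

// Some systems (older BSDs, older Mac OS X) only spell this MAP_ANON.
#ifndef MAP_ANONYMOUS
#define MAP_ANONYMOUS MAP_ANON
#endif

// Hypothetical helper: ask the kernel directly for `size` bytes of
// zero-filled, private, anonymous memory.  Returns nullptr on failure.
inline void *mmapSlab(std::size_t size) {
  void *p = mmap(nullptr, size, PROT_READ | PROT_WRITE,
                 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  return p == MAP_FAILED ? nullptr : p;
}

// Hypothetical helper: unmap the slab.  Unlike free(), this hands the
// pages back to the OS right away instead of leaving them in the heap.
inline bool munmapSlab(void *p, std::size_t size) {
  return munmap(p, size) == 0;
}
```

A caller would pair mmapSlab(1 << 20) with munmapSlab(p, 1 << 20) once
the compilation is done, so the 1 MB never becomes part of the
long-lived heap.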

This patch switches all allocators over to mmap, so you can see a lot
of "stitches" in the graphs, where an allocator is created and thrown
away quickly.  Those allocations are probably better served by malloc.

> It's great that you provide measurements, but it's not clear what you are measuring. Does 'mem max' include the overhead of asking the kernel for tiny 4K allocations, if any? Also, what is your operating system and architecture? That could make a big difference.

The memory size measurements in this data are all taken using
/proc/smaps data in Linux to find the number of dirty pages.
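
For anyone who wants to reproduce that kind of measurement, the sketch
below sums the dirty fields out of smaps-formatted text.  It is only an
assumption that Private_Dirty plus Shared_Dirty is the right total; the
function name dirtyKilobytes is made up, and this is not the script
used to produce the graphs.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: add up every "Private_Dirty:" and
// "Shared_Dirty:" entry (the kernel reports them in kB) across all
// mappings in an smaps-formatted stream.
inline std::size_t dirtyKilobytes(std::istream &smaps) {
  std::size_t total = 0;
  std::string line;
  while (std::getline(smaps, line)) {
    // rfind(prefix, 0) == 0 is a "starts with" check.
    if (line.rfind("Private_Dirty:", 0) == 0 ||
        line.rfind("Shared_Dirty:", 0) == 0) {
      std::istringstream fields(line.substr(line.find(':') + 1));
      std::size_t kb = 0;
      if (fields >> kb)
        total += kb;
    }
  }
  return total;
}
```

On Linux you would open /proc/self/smaps (or /proc/<pid>/smaps) with a
std::ifstream and pass it in.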

> Have you looked at the effect of twiddling the default 4K slab size in BumpPtrAllocator? I suspect you could get more dramatic results that way.

If we did that, one thing that might happen is malloc might start
forwarding to mmap, but I think you have to allocate ~128K at a time
to hit that threshold.

Reid