<br><br><div class="gmail_quote">On Sun, Aug 8, 2010 at 9:20 PM, Reid Kleckner <span dir="ltr"><<a href="mailto:reid.kleckner@gmail.com">reid.kleckner@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">
<div class="im">On Sun, Aug 8, 2010 at 8:20 PM, Jakob Stoklund Olesen <<a href="mailto:stoklund@2pi.dk">stoklund@2pi.dk</a>> wrote:<br>
><br>
> On Aug 7, 2010, at 7:05 PM, Steven Noonan wrote:<br>
>> I've been doing work on memory reduction in Unladen Swallow, and<br>
>> during testing, LiveRanges seemed to be consuming one of the largest<br>
>> chunks of memory.<br>
><br>
> That's interesting. How did you measure this? I'd love to see your data.<br>
><br>
> Note that the LiveRange struct is allocated by a plain std::vector, and your patch doesn't change that. I assume you are talking about the VNInfo structs?<br>
<br>
</div>Steven has been using Instruments, and sending us screenshots. Does<br>
anyone else know a better way of exporting that data?<br></blockquote><div><br></div><div>Just so you're aware, direct calls to mmap are not intercepted or reported by Instruments. Using mmap instead of malloc will therefore make your _reported_ numbers go down, but that doesn't necessarily mean you have better performance. </div>
<div><br></div><div>This is a problem for people doing performance measurements on Mac OS X and iOS, because exotic memory allocation schemes seem to be growing more common (I hope not because they dodge reporting!). In particular, many image buffers are allocated directly with mmap and vm_allocate, within CoreGraphics and elsewhere. </div>
<div><br></div><div>-Ken</div><div>Cocoa Frameworks</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">
<br>
I dug into the register allocation code and found the<br>
VNInfo::Allocator typedef. I assumed that was responsible for the traffic we<br>
saw in Instruments, but I don't have the data to back that up.<br>
<div class="im"><br>
>> I wrote a replacement allocator for use by<br>
>> BumpPtrAllocator which uses mmap()/munmap() in place of<br>
>> malloc()/free().<br>
><br>
> It's a bit more complicated than that. Modern mallocs use a whole bag of tricks to avoid lock contention on multiprocessor systems, and they know which allocation sizes the kernel likes and which system calls to use.<br>
><br>
> By calling mmap directly, you are throwing all that system-specific knowledge away.<br>
<br>
</div>So the goal of this particular modification was to find ways to return<br>
large, one-time allocations made during compilation back to the<br>
OS. For Unladen Swallow, we have a long-lived Python process where we<br>
JIT code every so often. We happen to generate an ungodly amount of<br>
code, which we're trying to reduce. However, this means that LLVM<br>
allocates a lot of memory for us, and it grows our heap by several MB<br>
over what it would otherwise be. In the spam_bayes benchmark, roughly 8 MB<br>
is allocated for this one compilation, with 2 MB coming from register<br>
allocation and 2 MB from SDNodes.<br>
<br>
We are looking at using mmap/munmap to avoid permanently growing the heap.<br>
<br>
This patch switches all allocators over to mmap, so you can see a lot<br>
of "stitches" in the graphs, where an allocator is created and thrown<br>
away quickly. Those allocations are probably better served by malloc.<br>
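A minimal sketch of the approach described above, assuming a POSIX system with anonymous mappings: a bump-pointer allocator whose slabs come straight from mmap() and are handed back with munmap() when the allocator is destroyed. This is not Steven's patch and not LLVM's BumpPtrAllocator interface; the class and method names (MmapBumpAllocator, Allocate, NumSlabs, BytesMapped) are hypothetical, and the 4K default slab size only mirrors the default mentioned in this thread.<br>

```cpp
// Hypothetical sketch, not LLVM's BumpPtrAllocator or the patch discussed:
// a bump-pointer allocator that takes its slabs directly from anonymous
// mmap() and returns them with munmap() when the allocator is destroyed.
#include <sys/mman.h>  // mmap, munmap
#include <unistd.h>    // sysconf
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>         // std::bad_alloc
#include <vector>

class MmapBumpAllocator {
public:
  // The requested slab size is rounded up to a whole number of pages.
  explicit MmapBumpAllocator(size_t Requested = 4096)
      : PageSize(static_cast<size_t>(sysconf(_SC_PAGESIZE))),
        SlabSize(RoundUp(Requested, PageSize)) {}
  MmapBumpAllocator(const MmapBumpAllocator &) = delete;
  MmapBumpAllocator &operator=(const MmapBumpAllocator &) = delete;

  ~MmapBumpAllocator() {
    // Unmap every slab so its pages go straight back to the OS.
    for (const Slab &S : Slabs)
      munmap(S.Base, S.Size);
  }

  // Returns Size bytes aligned to Align (a power of two no larger than the
  // page size). Nothing is freed individually; all memory is released at once
  // by the destructor.
  void *Allocate(size_t Size, size_t Align = alignof(std::max_align_t)) {
    assert(Align != 0 && (Align & (Align - 1)) == 0 && Align <= PageSize);
    std::uintptr_t P = RoundUp(Cur, Align);
    if (Cur == 0 || P + Size > End) {
      // Start a new slab; oversized requests get a mapping of their own.
      NewSlab(Size > SlabSize ? RoundUp(Size, PageSize) : SlabSize);
      P = Cur;  // mmap() results are page-aligned, hence Align-aligned.
    }
    Cur = P + Size;
    return reinterpret_cast<void *>(P);
  }

  size_t NumSlabs() const { return Slabs.size(); }
  size_t BytesMapped() const {
    size_t Total = 0;
    for (const Slab &S : Slabs)
      Total += S.Size;
    return Total;
  }

private:
  struct Slab {
    void *Base;
    size_t Size;
  };

  static std::uintptr_t RoundUp(std::uintptr_t V, std::uintptr_t A) {
    return (V + A - 1) / A * A;
  }

  void NewSlab(size_t Len) {
    // Grow the bookkeeping first so push_back below cannot throw and leak
    // a fresh mapping.
    if (Slabs.size() == Slabs.capacity())
      Slabs.reserve(Slabs.empty() ? 8 : 2 * Slabs.size());
    void *Mem = mmap(nullptr, Len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (Mem == MAP_FAILED)
      throw std::bad_alloc();
    Slabs.push_back({Mem, Len});
    Cur = reinterpret_cast<std::uintptr_t>(Mem);
    End = Cur + Len;
  }

  size_t PageSize;
  size_t SlabSize;
  std::uintptr_t Cur = 0, End = 0;
  std::vector<Slab> Slabs;
};
```

Because the pages are only released when the allocator is destroyed, this fits a per-compilation allocator in a long-lived process: the memory goes back to the OS when the compilation finishes. The trade-off is that every allocator pays at least one mmap/munmap pair, which is one reason short-lived allocators may be better left on malloc.<br>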
<div class="im"><br>
> It's great that you provide measurements, but it's not clear what you are measuring. Does 'mem max' include the overhead of asking the kernel for tiny 4K allocations, if any? Also, what is your operating system and architecture? That could make a big difference.<br>
<br>
</div>All of the memory measurements in this data were taken on Linux, using<br>
/proc/smaps to count the number of dirty pages.<br>
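A minimal sketch of this kind of measurement, assuming the Linux smaps format in which each mapping reports Shared_Dirty: and Private_Dirty: lines in kB. It is not the script that produced the data above, and the function names are hypothetical. Parsing from a std::istream keeps the parser testable on sample text as well as on /proc/self/smaps.<br>

```cpp
// Hypothetical helpers (not the measurement script behind the data above):
// total the Shared_Dirty and Private_Dirty fields of a Linux smaps file.
#include <cassert>
#include <fstream>
#include <sstream>
#include <string>

// Sums the dirty-page fields (reported in kB per mapping) of smaps text.
long DirtyKiBFromSmaps(std::istream &In) {
  long Total = 0;
  std::string Line;
  while (std::getline(In, Line)) {
    std::istringstream Fields(Line);
    std::string Key, Unit;
    long Value = 0;
    // Mapping headers ("7f00...-7f00... rw-p ...") and VmFlags lines do not
    // have the "Key: <number> kB" shape and are skipped.
    if (!(Fields >> Key >> Value >> Unit) || Unit != "kB")
      continue;
    if (Key == "Shared_Dirty:" || Key == "Private_Dirty:")
      Total += Value;
  }
  return Total;
}

// Dirty memory of the calling process in kB, or -1 if /proc/self/smaps
// cannot be opened (for example, when not running on Linux).
long DirtyKiBOfSelf() {
  std::ifstream In("/proc/self/smaps");
  if (!In)
    return -1;
  return DirtyKiBFromSmaps(In);
}
```

Calling DirtyKiBOfSelf() before and after a JIT compilation would give the kind of before/after dirty-page comparison described here.<br>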
<div class="im"><br>
> Have you looked at the effect of twiddling the default 4K slab size in BumpPtrAllocator? I suspect you could get more dramatic results that way.<br>
<br>
</div>If we did that, malloc might start forwarding the slab allocations<br>
to mmap, but I think you have to allocate ~128K at a time<br>
to hit that threshold.<br>
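A glibc-only sketch of that idea, under stated assumptions: instead of calling mmap directly, lower malloc's mmap threshold with mallopt(M_MMAP_THRESHOLD, ...) so that slab-sized requests are usually served by mmap while still going through malloc. In glibc the threshold starts at 128*1024 bytes, in line with the ~128K estimate, and glibc raises it dynamically when larger mmapped blocks are freed; setting it explicitly disables that adjustment. The 64K slab size and the helper names are hypothetical, and this is not an LLVM API.<br>

```cpp
// glibc-specific sketch: lower malloc's mmap threshold so slab-sized
// requests are normally served by mmap() and returned to the OS by free().
#include <malloc.h>  // mallopt, M_MMAP_THRESHOLD (glibc extension)
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical slab size, chosen below glibc's initial 128 KiB threshold.
constexpr std::size_t kSlabSize = 64 * 1024;

// Asks glibc to use mmap() for requests of at least Threshold bytes that
// cannot be satisfied from the free list. Returns true on success. Setting
// the threshold explicitly also disables glibc's dynamic adjustment of it.
bool UseMmapAbove(std::size_t Threshold) {
  return mallopt(M_MMAP_THRESHOLD, static_cast<int>(Threshold)) == 1;
}

// Slabs still come from plain malloc()/free(); with the threshold at or
// below kSlabSize, glibc will typically back each one with its own mapping,
// which free() unmaps.
void *AllocateSlab() { return std::malloc(kSlabSize); }
void FreeSlab(void *Slab) { std::free(Slab); }
```

Compared with calling mmap directly, this keeps malloc's own bookkeeping in the loop, at the cost of being tied to glibc.<br>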
<font color="#888888"><br>
Reid<br>
</font><div><div></div><div class="h5"><br>
_______________________________________________<br>
LLVM Developers mailing list<br>
<a href="mailto:LLVMdev@cs.uiuc.edu">LLVMdev@cs.uiuc.edu</a> <a href="http://llvm.cs.uiuc.edu" target="_blank">http://llvm.cs.uiuc.edu</a><br>
<a href="http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev" target="_blank">http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev</a><br>
</div></div></blockquote></div><br>