[LLVMdev] MmapAllocator

Mon Aug 9 11:00:03 PDT 2010

2010/8/9 Török Edwin <edwintorok at gmail.com>:
> On Mon, 9 Aug 2010 10:17:27 -0700
> Jakob Stoklund Olesen <stoklund at 2pi.dk> wrote:
>
>>
>> On Aug 9, 2010, at 9:54 AM, Török Edwin wrote:
>>
>> > With mmap() it is always possible to fully release the memory once
>> > you are done using it.
>>
>> Sure. Is that the goal, though?
>
> If the goal is to reduce fragmentation, possibly. You
> don't know if you have fragmentation or not, the JITed app may fragment
> memory for example.

Yes, the goal is to fully release the memory back to the OS.
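
To make "fully release" concrete, here is a minimal sketch of the
mmap/munmap pair on a POSIX system. The names slab_alloc and slab_free
are made up for illustration and are not taken from the MmapAllocator
patch; the point is only that munmap hands the whole region back to the
OS no matter what else is still live elsewhere in the process.

```cpp
#include <sys/mman.h>
#include <cassert>
#include <cstddef>

// Hypothetical helper (not LLVM API): map an anonymous, private,
// read/write region of `size` bytes. Returns nullptr on failure.
void *slab_alloc(std::size_t size) {
  assert(size > 0 && "mmap rejects zero-length mappings");
  void *p = mmap(nullptr, size, PROT_READ | PROT_WRITE,
                 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  return p == MAP_FAILED ? nullptr : p;
}

// Hypothetical helper (not LLVM API): unmap the region. The pages and
// the address range go back to the OS immediately, independent of any
// other allocation. Returns true on success.
bool slab_free(void *p, std::size_t size) {
  return munmap(p, size) == 0;
}
```

Note that each call here is a system call, which is exactly the cost
discussed further down.
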

>> Why isn't malloc doing it already?
>
> Because it can't. sbrk() can only increase/decrease memory usage at the
> end (like a stack), you can't release something in the middle.
> That's one of the reasons why we wrote a pool-based memory allocator for
> ClamAV.

Another thing malloc could do is use madvise() with MADV_DONTNEED to
free the pages in the middle of the heap, but malloc can't read your
mind, so it doesn't know that you aren't about to reallocate that
region of the heap.
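
For reference, a rough sketch of what the madvise route looks like when
the caller does know a range is dead. release_pages is a hypothetical
helper name, and the zero-fill behaviour described in the comment is
what Linux does for private anonymous mappings; other systems may treat
the advice differently.

```cpp
#include <sys/mman.h>
#include <unistd.h>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: tell the kernel that [addr, addr + len) is no
// longer needed. The mapping stays valid, so neighbouring pages are
// untouched. On Linux, for private anonymous memory, the physical pages
// are dropped and the next access sees zero-filled pages.
// `addr` must be page-aligned. Returns true on success.
bool release_pages(void *addr, std::size_t len) {
  const long page = sysconf(_SC_PAGESIZE);
  assert(page > 0);
  assert(reinterpret_cast<std::uintptr_t>(addr) %
             static_cast<std::uintptr_t>(page) == 0);
  return madvise(addr, len, MADV_DONTNEED) == 0;
}
```

The address range stays reserved, so this lowers resident memory but
does nothing for address space on 32-bit systems.
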

>>
>> > With malloc() no, it takes just 1 allocation at the end of the heap
>> > to keep all the rest allocated. That wouldn't be a problem if libc
>> > would use mmap() as the low-level allocator for malloc but it
>> > doesn't. It uses sbrk() mostly for small (<128k) allocations, and
>> > even with mmaps it caches them for a while.
>>
>> Recommended reading:
>> http://people.freebsd.org/~jasone/jemalloc/bsdcan2006/jemalloc.pdf
>
> If jemalloc provides same or better memory usage than
> MMapAllocator, I think it'd be better to have a JEMallocAllocator
> instead.
> I think jemalloc is fairly portable (firefox uses it), isn't it?

Reading the abstract, jemalloc seems like it has nothing to do with
keeping the total heap usage low and everything to do with performance
in a multithreaded app.

>> > I think that is because mmap() is slow in multithreaded apps, since
>> > it needs to take a process level lock, which also contends with the
>> > lock taken by pagefaults from other existing mmaps (in fact that
>> > lock is held during disk I/O!).
>>
>> Sounds awesome, let's do that ;-)
>
> Multithreaded performance should probably be benchmarked on a real app
> with MMapAllocator, and with the MallocAllocator.

I predict that mmap will be slower than malloc, since every mmap is a
system call that takes the process lock. The only way in which mmap
could be better is that it reduces your steady state heap usage.

>> You are also leaving a bunch of 4K holes in your address space. On
>> 32-bit systems, address space is a scarce resource.
>
> Doesn't BumpPtrAllocator use a larger chunk size?

Nope, it defaults to 4K.  IMO that should be bumped up (pun wasn't
intended, but then I left it in...).  Especially if we want to use
mmap as the allocator, increasing the slab size will reduce the number
of expensive system calls that grab the process lock.
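
Back-of-the-envelope version of that argument, assuming one mmap call
per slab and ignoring both padding at slab boundaries and oversized
allocations that get their own mapping. slabs_needed is just an
illustrative helper, not anything in BumpPtrAllocator.

```cpp
#include <cassert>
#include <cstddef>

// Illustrative helper: number of slabs (and so mmap calls, under the
// one-call-per-slab assumption) needed to hold total_bytes, rounded up.
std::size_t slabs_needed(std::size_t total_bytes, std::size_t slab_size) {
  assert(slab_size > 0);
  return (total_bytes + slab_size - 1) / slab_size;
}
```

For 64 MiB of allocations that is 16384 calls with 4K slabs versus 64
calls with 1 MiB slabs.
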

Reid