[PATCH] D42030: [WebAssembly] Define __heap_base global

Nicholas Wilson via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Wed Jan 17 14:55:56 PST 2018


ncw added a comment.

In https://reviews.llvm.org/D42030#979107, @sbc100 wrote:

> Also, I don't really understand your logic regarding the size of your bitmap.  Surely the larger the pages, the fewer pages exist, the smaller the bitmap required to track them?   Perhaps I'm missing something?


That's spot on: smaller (more granular) pages mean less wastage during allocations, but make the bitmap correspondingly bigger.
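
To put rough numbers on that trade-off (illustrative arithmetic only - the constants aren't taken from my actual allocator): tracking 1GiB of address space at one bit per page gives

  #include <stdio.h>

  int main(void) {
      const unsigned long long space = 1ULL << 30;   /* 1GiB tracked */
      unsigned long page_sizes[] = { 4096, 65536 };  /* 4KiB vs 64KiB */
      for (int i = 0; i < 2; i++) {
          unsigned long long pages = space / page_sizes[i];
          /* one bit per page, eight bits per byte */
          printf("%2luKiB pages: %6llu pages, %2lluKiB bitmap\n",
                 page_sizes[i] / 1024, pages, pages / 8 / 1024);
      }
      return 0;
  }

which prints a 32KiB bitmap for 4KiB pages versus 2KiB for 64KiB pages - so yes, larger pages shrink the bitmap, at the cost of coarser allocation granularity.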

In https://reviews.llvm.org/D42030#979122, @sbc100 wrote:

> The musl malloc implementation seems to choose to use brk() directly over mmap, which means it will always be working with a contiguous region, so there shouldn't be any fragmentation in this case.
>
> Even in the mmap case fragmentation shouldn't be a problem, according to the comment in the code: `Expand the heap in-place if brk can be used, or otherwise via mmap, using an exponential lower bound on growth by mmap to make fragmentation asymptotically irrelevant.`


Musl uses brk for "small" allocations (<120KiB), and uses mmap both for all large allocations and as a fallback when brk fails. Hence, Musl's malloc //requires// the "kernel" to provide mmap support, while brk is entirely optional - my Musl port's brk simply returns ENOSYS.
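
Concretely, the stub amounts to something like this (a sketch with a hypothetical shim name - the real file in my port lives under `arch/wasm`):

  #include <errno.h>

  /* Hypothetical wasm "kernel" shim: brk is simply not implemented,
     so Musl's malloc always takes its mmap fallback path. */
  long __wasm_syscall_brk(unsigned long addr)
  {
      (void)addr;
      return -ENOSYS;
  }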

The comment is about fragmentation as the allocation size tends to infinity. My arithmetic, though, was about wastage, and it's for "medium" allocations of 100-200KiB that I'd be concerned about 30% space wastage.
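
To make that concrete (illustrative arithmetic, assuming mmap rounds requests up to a 64KiB granularity):

  #include <stdio.h>

  int main(void) {
      const unsigned long page = 64 * 1024;
      unsigned long request = 136 * 1024;  /* a "medium" allocation */
      unsigned long granted = (request + page - 1) / page * page;
      /* 136KiB rounds up to 192KiB: 56KiB, or ~29%, is wasted */
      printf("wasted %lu KiB of %lu KiB (%.0f%%)\n",
             (granted - request) / 1024, granted / 1024,
             100.0 * (granted - request) / granted);
      return 0;
  }

Asymptotically that overhead vanishes, but in the 100-200KiB range it's very much alive.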

In https://reviews.llvm.org/D42030#979303, @dschuff wrote:

> It seems like the real issue is that because it's meant for Linux, musl assumes the existence of Linux's mmap. On traditional architectures, there's a machine page size, which is also the granularity of mmap on Linux (technically you can have some kinds of mappings that don't end on page boundaries, but those pages will just be fragmented). There's really no reason to do anything not on page boundaries, and 4k is a pretty small page. The problem with wasm is that there's really no such thing as `mmap` for wasm (or I should say, for any wasm embedding, since mmap is an OS feature rather than an architecture feature). mmap is a strange swiss army knife, and currently the only functionality is to grow the memory at the end; if and when additional functionality is added (file mapping, protection, unmapped space, etc.) that would be as separate calls. So mmap must always be emulated, and if it is emulated then it's really just another layer of allocation. To me it makes much more sense to have one allocator (malloc) which knows directly about how wasm memory is managed (instead of assuming mmap's behavior, which will always be a fiction, at least in JS-wasm). Since the real behavior is more or less `brk()`, it makes sense to use that logic in existing allocators.


If you're willing to write your own malloc, that's great! But my Musl port has //zero// changes to the Musl core code (after a few patches were accepted upstream). The entirety of the Wasm support lives in architecture-specific directories like `arch/wasm` and `src/internal/wasm`. So, for better or worse, using Musl's built-in allocator does force the Wasm port to provide mmap support. As you say, the alternative would mean cobbling together a new malloc that works only on brk.

(I did actually have a look into doing that: the problem is that malloc uses "bins" to hold free lists. In Musl's malloc, the bins stop at 120KiB, at which point you transition to direct mmap. If there's no mmap, then the bins need to be unbounded, and that means quite a lot more bins to support allocations going up to 1GiB, since the bin sizes are not quite logarithmic - see the sketch below. Basically, what scales well for small allocations won't scale for big ones, and you'd end up with a two-tier allocator no matter what, I think.)
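
To illustrate the scaling problem (a rough model of bin spacing, //not// Musl's actual `bin_index()`): suppose linear 32-byte bins up to 512 bytes, then four bins per doubling of size:

  #include <stdio.h>

  static int bins_needed(unsigned long long cap)
  {
      int n = 512 / 32;                  /* linear region: 16 bins */
      for (unsigned long long s = 512; s < cap; s *= 2)
          n += 4;                        /* geometric region */
      return n;
  }

  int main(void)
  {
      printf("bins to 128KiB: %d\n", bins_needed(128 * 1024ULL)); /* 48 */
      printf("bins to 1GiB:   %d\n", bins_needed(1ULL << 30));    /* 100 */
      return 0;
  }

Coarser (fully logarithmic) spacing would keep the count down but waste up to half of each block; finer spacing keeps wastage down but multiplies the bin count.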

> Likewise, because memory can never be allocated from the underlying system in increments of less than 64k, it doesn't really make much sense to pretend otherwise by implementing an mmap for malloc to use.

I don't see that. How the underlying system allocates memory, and how it's split up for application use, don't have to relate at all. Exactly the same reasoning would say, "well x86 uses 4K pages so it doesn't make sense for malloc to subdivide that for a 10-byte allocation". I do agree though that it's odd/unusual and also awkward for mmap //not// to reflect the kernel's page size - but odd doesn't make it wrong, or inefficient.

> More practically, what advantage would your 2-level (malloc+mmap) allocation have vs just using malloc's usual non-mmap mechanism for all allocations (or at least, up to a larger allocation size compared to systems with 4k pages)?

The advantage is that Musl's malloc simply doesn't //have// a non-mmap mechanism for all allocations - as above, its bins stop at 120KiB, and everything larger goes straight to mmap.

I really wouldn't mind putting in 64KiB pages for Musl's mmap implementation - but it wouldn't get rid of the second allocation layer (and its bitvector), nor would it make the code any simpler, since you'd need exactly the same machinery for mapping blocks on top of Wasm's "brk" mechanism no matter what page size is chosen.
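
For the record, that machinery is nothing exotic. A minimal sketch (hypothetical names, heavily simplified relative to what my port does) of a bitvector page-mapper over a grow-only heap - note that nothing in it cares what `PAGE` is:

  #include <stddef.h>

  #define PAGE      65536u                  /* could equally be 4096u */
  #define MAX_PAGES 16384u                  /* enough to track 1GiB */

  static unsigned char used[MAX_PAGES / 8]; /* one bit per page */
  static char *heap_base;                   /* e.g. set from __heap_base */

  static int  test_bit(unsigned i) { return used[i / 8] & (1u << (i % 8)); }
  static void set_bit(unsigned i)  { used[i / 8] |= (1u << (i % 8)); }

  /* First-fit scan for a run of free pages; a real version would grow
     linear memory (Wasm's "brk") when the run extends past the heap end. */
  void *page_map(unsigned npages)
  {
      for (unsigned i = 0, run = 0; i < MAX_PAGES; i++) {
          if (test_bit(i)) { run = 0; continue; }
          if (++run == npages) {
              unsigned start = i + 1 - npages;
              for (unsigned j = start; j <= i; j++) set_bit(j);
              return heap_base + (size_t)start * PAGE;
          }
      }
      return NULL; /* tracked address space exhausted */
  }

  void page_unmap(void *p, unsigned npages)
  {
      unsigned start = (unsigned)(((char *)p - heap_base) / PAGE);
      for (unsigned j = start; j < start + npages; j++)
          used[j / 8] &= ~(1u << (j % 8));
  }

Swap `PAGE` between 4096u and 65536u and the code is identical - which is exactly why fixing the page size at 64KiB wouldn't simplify anything.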

Hope that helps. What my Musl port does shouldn't really matter at this stage - no-one else is using it yet!


Repository:
  rLLD LLVM Linker

https://reviews.llvm.org/D42030
