Benchmarking file output strategies

Sean Silva chisophugis at gmail.com
Sat Dec 13 04:51:53 PST 2014


On Fri, Dec 12, 2014 at 3:30 PM, Rafael Ávila de Espíndola <
rafael.espindola at gmail.com> wrote:
>
> It seems that the common wisdom on the fastest way to create a file is
>
> * create the file
> * resize it to the final size
> * mmap it rw
> * write the data to the mapping
>
> I benchmarked that against doing 1 MB writes to create a 1GB file with
> pseudo random data.
>
> The test program is attached. The results I got were (in seconds, mmap is
> the first):
>
> btrfs
> 1.752698e+00
> 1.112864e+00
>
> tmpfs
> 1.484731e+00
> 1.113772e+00
>
> hfs+ (laptop)
> 4.015817e+00
> 2.240137e+00
>
> windows 7 (vm)
> 1.609375e+00
> 3.875000e+00
>
> ext2 on ARM (old Google Chromebook):
> 5.910171e+01
> 6.566929e+01
>
> So on Windows it is true: mmap seems to be faster than write. On Linux
> and OS X x86_64 the situation is inverted. On ARM, mmap is a bit faster.
>
> It would be interesting to see if someone else can reproduce these
> numbers. It would be particularly nice to try a newer arm system and
> windows outside a vm.
>
> Also, does anyone have a theory of where the difference comes from?
>

On my Mac Pro (./test-file-output 1),
mmap: 2.8s
write: 1.6s

I DTrace'd the program to see what was going on. The mmap version actually
spends less time doing IO; the problem is that the IO doesn't start until
later. In fact, it doesn't start until the close(), which occurs 1.8
seconds in.
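(For anyone who wants to repeat this: one way to see the syscall activity is
a DTrace one-liner along these lines. It needs root, and assumes the attached
program is built as ./test-file-output; this is a sketch, not the exact
script I used.)

```shell
# Count write() calls made by the benchmark while it runs
# (macOS/illumos DTrace syntax; requires root).
sudo dtrace -n 'syscall::write:entry /pid == $target/ { @calls = count(); }' \
    -c "./test-file-output 1"
```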

From poking around with DTrace on the Mac and seeing what is actually going
on in both cases, I'm guessing that on Windows the write() code path in the
kernel is atrociously bad (or there is a lot of syscall overhead), so each
call is expensive, making the mmap version better. The Chromebook is
probably taking long enough (or is under sufficient memory pressure) that
the kernel starts paging out before the close (I think the Linux default
writeback expiry is 30s?), giving most of the benefit of the write version
but with naturally less overhead (especially if there is a spare core to
page out on the side). The tmpfs case is probably slowed down in the mmap
version by setting up and tearing down the larger address space; you might
want to try enabling/disabling huge pages (2MB and 1GB flavors on x86) to
see if that affects things. I would be amazed if an mmap of a 1GB page in
tmpfs were not the fastest option (on fast CPUs).

The virtual memory performance seems to be really bad on the Mac: replacing
the mapped_file_region with plain malloc makes it barely any faster (with or
without the free). Probably this is because my Mac wasn't using huge pages
for some reason (doesn't support huge pages?), as DTrace showed various
kernel functions being called (# bytes / 4096) times. Majnemer, do you know
if the Mac supports huge pages? Googling around doesn't turn up anything
(and I can scarcely imagine that VMware wouldn't be using them... maybe they
implement them in their kernel extension?).

-- Sean Silva


>
> Cheers,
> Rafael
>
>
>
>
> Sent from my iPhone
>