Benchmarking file output strategies

Mon Dec 15 16:07:24 PST 2014

> One more thing: on my particular machine (new Mac Pro), Rafael's test
> program is actually CPU-bottlenecked; the new Mac Pros have insanely fast
> SSDs connected over PCI-e. Just doing the CPU work of generating the random
> numbers (1GiB version) takes 1.6s, which is basically the same time that the
> write version takes; even just generating all 1GiB of random numbers in
> place (no large memory allocation involved, no file creation involved) takes
> 1.1s. Just writing 1GiB to disk sequentially takes 1.0s.
>
> Also, it takes about .45 seconds to just memset a 1GiB malloc'd region with
> 0's; most of this is virtual memory overhead, since if you reuse the same
> 1GiB region, then after the first run it takes <0.1s.
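The first-touch effect described above can be reproduced with a small
sketch like the one below. It is only an approximation of the quoted
experiment: it assumes that an anonymous mmap is a fair stand-in for a
large malloc (glibc commonly serves big allocations that way), and the
function name time_memset_twice is made up for illustration.

```python
import ctypes
import mmap
import time


def time_memset_twice(size):
    """Zero the same fresh anonymous region twice; return both durations."""
    # Anonymous mapping: address space is reserved, pages are not backed yet.
    region = mmap.mmap(-1, size)
    view = ctypes.c_char.from_buffer(region)
    addr = ctypes.addressof(view)
    durations = []
    for _ in range(2):
        start = time.perf_counter()
        # The first pass also pays for page faults and zeroing fresh pages;
        # the second pass reuses pages that are already mapped in.
        ctypes.memset(addr, 0, size)
        durations.append(time.perf_counter() - start)
    del view  # drop the exported buffer so the mapping can be closed
    region.close()
    return durations[0], durations[1]
```

Calling time_memset_twice(1 << 30) needs 1GiB of free memory; on most
systems the first number should be noticeably larger than the second,
but the exact ratio depends on the OS and machine.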

Interesting!

I tried writing the constant 0xabcdabcdabcdabcd instead of random
numbers. The most interesting case is probably

tmpfs (seconds):
4.930983e-01
4.236851e-01

The time difference dropped from 0.37s to just 0.069s. Spending less
CPU time on number generation benefits the mmap run more than the
write one.
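For anyone repeating the experiment, the two payloads differ only in how
much CPU work goes into each 64-bit word. A minimal sketch, assuming
little-endian 64-bit words and a seeded Mersenne Twister for the random
case (the original test program's generator is not specified here; the
helper names are hypothetical):

```python
import random
import struct

PATTERN = 0xABCDABCDABCDABCD  # the constant used instead of random numbers


def constant_payload(num_words):
    """num_words copies of the 64-bit pattern; almost no CPU per word."""
    return struct.pack("<Q", PATTERN) * num_words


def random_payload(num_words, seed=0):
    """num_words pseudo-random 64-bit words from a seeded generator."""
    if num_words == 0:
        return b""
    rng = random.Random(seed)
    # One big integer of 64 * num_words random bits, serialized as bytes.
    return rng.getrandbits(64 * num_words).to_bytes(8 * num_words, "little")
```

Python's relative costs will not match a C++ test program, so treat
this only as a way to produce identically sized inputs for both runs.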

On Windows, mmap is still faster, by 3 to 5x depending on the run (with
mmap taking about 1s). Using WriteFile instead of write helps, but
not by much.

I guess the reasonably solid findings so far are:

* On Windows, mmap is faster.
* On POSIX, write can be faster, but by how much depends on what else
the program is doing.
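The two strategies being compared can be sketched as follows. This is
an illustration under stated assumptions, not the original benchmark:
it copies an in-memory buffer to a file either with a loop of write
calls (1 MiB per call, an arbitrary choice) or by sizing the file with
ftruncate and copying into an mmap. It does not fsync, so it measures
the page-cache path; pointing the hypothetical compare() helper at a
tmpfs directory such as /dev/shm roughly mirrors the tmpfs case.

```python
import mmap
import os
import tempfile
import time

# O_BINARY only exists on Windows; without it os.write translates "\n".
BINARY = getattr(os, "O_BINARY", 0)
CHUNK = 1 << 20  # assumption: 1 MiB per write call


def write_with_write(path, data):
    """Copy `data` into a new file using a loop of os.write calls."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC | BINARY, 0o644)
    try:
        view = memoryview(data)
        while view:
            written = os.write(fd, view[:CHUNK])  # may be a short write
            view = view[written:]
    finally:
        os.close(fd)


def write_with_mmap(path, data):
    """Size the file with ftruncate, map it, and copy `data` into the map."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT | os.O_TRUNC | BINARY, 0o644)
    try:
        os.ftruncate(fd, len(data))
        if data:  # mapping a zero-length file is an error
            with mmap.mmap(fd, len(data)) as mapping:
                mapping[:] = data
    finally:
        os.close(fd)


def compare(data, directory=None):
    """Time both strategies on `data`; return {"write": s, "mmap": s}."""
    timings = {}
    with tempfile.TemporaryDirectory(dir=directory) as tmp:
        for name, fn in (("write", write_with_write), ("mmap", write_with_mmap)):
            path = os.path.join(tmp, name + ".bin")
            start = time.perf_counter()
            fn(path, data)
            timings[name] = time.perf_counter() - start
            with open(path, "rb") as f:
                if f.read() != data:
                    raise RuntimeError(name + " produced a wrong file")
    return timings
```

Because Python adds its own overhead, absolute numbers from this sketch
should not be compared with the C++ results above; only the relative
shape on one machine is meaningful.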

OK. That is enough to convince me to try porting llvm-ar to
FileOutputBuffer and see what the impact is. Someone really motivated
might want to check whether lld would get any faster using write on
POSIX systems (while keeping mmap on Windows).
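The "write on POSIX, mmap on Windows" split could look roughly like the
sketch below. OutputBuffer is a hypothetical class, not LLVM's
FileOutputBuffer API: callers fill a fixed-size buffer, and commit()
either unmaps a file mapping or writes an in-memory bytearray out with
os.write. Choosing the strategy from os.name is an assumption based on
the findings above.

```python
import mmap
import os


class OutputBuffer:
    """Hypothetical fixed-size output buffer that commits to `path`."""

    def __init__(self, path, size, use_mmap=None):
        if use_mmap is None:
            use_mmap = os.name == "nt"  # assumption: mmap only on Windows
        self.use_mmap = use_mmap and size > 0  # cannot map zero bytes
        flags = os.O_RDWR | os.O_CREAT | os.O_TRUNC | getattr(os, "O_BINARY", 0)
        self.fd = os.open(path, flags, 0o644)
        if self.use_mmap:
            os.ftruncate(self.fd, size)
            self.buffer = mmap.mmap(self.fd, size)  # stores go to the file
        else:
            self.buffer = bytearray(size)  # stores go to anonymous memory

    def commit(self):
        """Make the contents visible in the file and release resources."""
        try:
            if self.use_mmap:
                self.buffer.close()  # unmap; dirty pages stay in page cache
            else:
                view = memoryview(self.buffer)
                while view:
                    view = view[os.write(self.fd, view):]  # handle short writes
        finally:
            os.close(self.fd)
```

A caller would do something like ob = OutputBuffer(path, 8),
ob.buffer[0:8] = b"llvm-ar\n", ob.commit(). The write variant costs an
extra copy but avoids page-fault and mapping overhead, which is the
trade-off the measurements above are probing.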

Cheers,
Rafael