Benchmarking file output strategies

Mon Dec 15 21:10:31 PST 2014

Do we know if writes that large are typical for the C++ to object file or
bitcode workloads?
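
For anyone who wants to repeat the write-versus-mmap comparison from the quoted thread below with other output sizes, here is a minimal sketch. It assumes a POSIX system and is not Rafael's actual test program: the helper names (outputWithWrite, outputWithMmap, fileSize, seconds) are hypothetical, and only the 0xabcdabcdabcdabcd fill value is taken from the thread.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <vector>

#include <fcntl.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

// Fill value mentioned in the thread; everything else is illustrative.
static const uint64_t kPattern = 0xabcdabcdabcdabcdULL;

// Strategy 1: build the output in a heap buffer, then write() it out.
// The size is rounded down to a multiple of 8 bytes. EINTR is not retried.
bool outputWithWrite(const char *path, size_t bytes) {
  std::vector<uint64_t> buf(bytes / sizeof(uint64_t), kPattern);
  int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
  if (fd < 0)
    return false;
  const char *p = reinterpret_cast<const char *>(buf.data());
  size_t left = buf.size() * sizeof(uint64_t);
  while (left > 0) {
    ssize_t n = write(fd, p, left); // may write fewer bytes than requested
    if (n <= 0) {
      close(fd);
      return false;
    }
    p += n;
    left -= static_cast<size_t>(n);
  }
  return close(fd) == 0;
}

// Strategy 2: size the file up front, map it, and generate the output
// directly into the mapping.
bool outputWithMmap(const char *path, size_t bytes) {
  size_t words = bytes / sizeof(uint64_t);
  size_t len = words * sizeof(uint64_t);
  if (len == 0)
    return false; // mmap rejects zero-length mappings
  int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0644);
  if (fd < 0)
    return false;
  if (ftruncate(fd, static_cast<off_t>(len)) != 0) {
    close(fd);
    return false;
  }
  void *m = mmap(nullptr, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
  if (m == MAP_FAILED) {
    close(fd);
    return false;
  }
  uint64_t *out = static_cast<uint64_t *>(m);
  for (size_t i = 0; i < words; ++i)
    out[i] = kPattern;
  bool ok = munmap(m, len) == 0;
  return close(fd) == 0 && ok;
}

// Size of a file in bytes, or -1 if it cannot be opened.
off_t fileSize(const char *path) {
  int fd = open(path, O_RDONLY);
  if (fd < 0)
    return -1;
  off_t size = lseek(fd, 0, SEEK_END);
  close(fd);
  return size;
}

// Wall-clock seconds spent inside f().
template <typename F> double seconds(F &&f) {
  auto start = std::chrono::steady_clock::now();
  f();
  std::chrono::duration<double> elapsed =
      std::chrono::steady_clock::now() - start;
  return elapsed.count();
}
```

Timing each function through seconds() with a size such as 1 << 30 gives numbers comparable in spirit to the ones quoted below. Note that this only measures time spent inside the process; as Sean points out, the OS may still be doing IO after the program exits, so the in-process figure can understate the real cost.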

On Mon, Dec 15, 2014 at 7:20 PM, Sean Silva <chisophugis at gmail.com> wrote:
>
> Michael and I just did some experiments on his machine. It looks like
> Windows is doing a huge amount of IO *after* the program exits (with both
> mmap and write). Which reminds me: in your OP, the Windows 7 VM mmap
> version (1.6s) is faster than the native Mac HFS+ version; those were on
> the same machine, right? If so, it is odd that the VM outperformed the
> native OS, so something like this IO-after-program-exit is probably at work.
>
> We did find on Michael's machine (Win 8) that the mmap version was
> generally slower, roughly similar to what I was seeing on my Mac (although
> on the Mac the data was being committed to disk before the program exited).
>
> -- Sean Silva
>
> On Mon, Dec 15, 2014 at 4:07 PM, Rafael Espíndola <rafael.espindola at gmail.com> wrote:
>>
>> > One more thing: on my particular machine (new Mac Pro), Rafael's test
>> > program is actually CPU-bottlenecked; the new Mac Pros have insanely
>> > fast SSDs connected over PCI-e. Just doing the CPU work of generating
>> > the random numbers (1GiB version) takes 1.6s, which is basically the
>> > same time that the write version takes; even just generating all 1GiB
>> > of random numbers in place (no large memory allocation involved, no
>> > file creation involved) takes 1.1s. Just writing 1GiB to disk
>> > sequentially takes 1.0s.
>> >
>> > Also, it takes about 0.45 seconds to just memset a 1GiB malloc'd
>> > region with 0s; most of this is virtual memory overhead, since if you
>> > reuse the same 1GiB region, then after the first run it takes <0.1s.
>>
>> Interesting!
>>
>> I tried writing 0xabcdabcdabcdabcd instead of random numbers. The most
>> interesting cases are probably
>>
>> tmpfs:
>> 4.930983e-01
>> 4.236851e-01
>>
>> The time difference dropped from 0.37s to just 0.069s. Using less CPU
>> for the number generation is benefiting the mmap run more than the
>> write one.
>>
>> On Windows, mmap is still faster by 3 to 5x depending on the run (with
>> mmap taking about 1s). Using FileWrite instead of write helps, but
>> not by much.
>>
>> I guess the somewhat reasonable findings so far are
>>
>> * On Windows, mmap is faster.
>> * On POSIX, write can be faster, but by how much depends on what else
>> the program is doing.
>>
>> OK. That is sufficient to convince me to just try porting llvm-ar to
>> FileOutputBuffer and see what the impact is. Someone really motivated
>> might want to check whether lld would get any faster with writes on
>> POSIX systems (while still using mmap on Windows).
>>
>> Cheers,
>> Rafael
>>