<div dir="ltr">Do we know whether writes that large are typical for the C++-to-object-file or bitcode workloads?</div><div class="gmail_extra"><br><div class="gmail_quote">On Mon, Dec 15, 2014 at 7:20 PM, Sean Silva <span dir="ltr">&lt;<a href="mailto:chisophugis@gmail.com" target="_blank">chisophugis@gmail.com</a>&gt;</span> wrote:<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div class="gmail_extra">Michael and I just did some experiments on his machine. It looks like Windows is doing a huge amount of IO *after* the program exits (with both mmap and write). Which reminds me: in your OP, the Windows 7 VM mmap version (1.6s) is faster than the native Mac HFS+ version; those were on the same machine, right? If so, then it is weird that the VM was outperforming the native OS, so something like this IO-after-program-exit is probably at work.</div><div class="gmail_extra"><br></div><div class="gmail_extra">We did find on Michael's machine (Win 8) that the mmap version was generally slower, roughly similar to what I was seeing on my Mac (although on the Mac the data was being committed to disk before the program exited).</div><div class="gmail_extra"><br></div><div class="gmail_extra">-- Sean Silva</div><div class="gmail_extra"><br><div class="gmail_quote"><span class="">On Mon, Dec 15, 2014 at 4:07 PM, Rafael Espíndola <span dir="ltr">&lt;<a href="mailto:rafael.espindola@gmail.com" target="_blank">rafael.espindola@gmail.com</a>&gt;</span> wrote:</span><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class=""><span>> One more thing: on my particular machine (new Mac Pro), Rafael's test<br>

> program is actually CPU-bottlenecked; the new Mac Pros have insanely fast<br>

> SSDs connected over PCIe. Just doing the CPU work of generating the random<br>

> numbers (1GiB version) takes 1.6s, which is basically the same time that the<br>

> write version takes; even just generating all 1GiB of random numbers in<br>

> place (no large memory allocation involved, no file creation involved) takes<br>

> 1.1s. Just writing 1GiB to disk sequentially takes 1.0s.<br>

><br>

> Also, it takes about 0.45 seconds to just memset a 1GiB malloc'd region with<br>

> zeros; most of this is virtual memory overhead, since if you reuse the same<br>

> 1GiB region, then after the first run it takes <0.1s.<br>

<br>

</span></span>Interesting!<br>

<br>

I tried writing 0xabcdabcdabcdabcd instead of random numbers. The most<br>

interesting cases are probably<br>

<br>

tmpfs:<br>

4.930983e-01<br>

4.236851e-01<br>

<br>

The time difference dropped from 0.37s to just 0.069s.  Using less CPU<br>

for the number generation is benefiting the mmap run more than the<br>

write one.<br>

<br>

On Windows, mmap is still faster by 3 to 5x depending on the run (with<br>

mmap taking about 1s).  Using WriteFile instead of write helps, but<br>

not by much.<br>

<br>

I guess the somewhat reasonable findings so far are:<br>

<br>

* On Windows, mmap is faster.<br>

* On POSIX, write can be faster, but by how much depends on what else<br>

the program is doing.<br>

<br>

OK. That is sufficient to convince me to just try porting llvm-ar to<br>

FileOutputBuffer and see what the impact is. Someone really motivated<br>

might want to check whether lld would get any faster with writes on POSIX<br>

systems (but keep using mmap on Windows).<br>

<br>

Cheers,<br>

Rafael<br>

</blockquote></div></div></div>


<br></blockquote></div></div>