[PATCH] D86905: Flush bitcode incrementally for LTO output

stephan.yichao.zhao via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Tue Sep 1 14:53:44 PDT 2020


stephan.yichao.zhao marked 2 inline comments as done.
stephan.yichao.zhao added inline comments.


================
Comment at: llvm/include/llvm/Bitstream/BitstreamWriter.h:157
+    {
+      FS->seek(ByteNo);
+      ssize_t BytesRead = FS->read(Bytes, BytesFromDisk);
----------------
evgeny777 wrote:
> Can we use memory mapped I/O and avoid backpatching on disk?
Our use case is likely not what mmap is good at.

I assume mmap on Linux loads pages on demand. If code reads or writes data on pages that are already loaded, the access has no I/O cost. This is the case, for example, when code randomly accesses a chunk of contiguous addresses, or addresses within the same page. The memory copy and page fault costs are still paid the first time a page is loaded, but that cost is negligible asymptotically.

Our case is a bit different. With a 512M incremental flush threshold, I tested an LTO build that outputs a 5G bitcode file. BackpatchWord is called 16,613,927 times, of which only 12 need a disk seek. In addition, each access touches only 4-8 bytes on a page, and the visited pages are far away from each other. The pages are likely not cached and need to be loaded anyway, and after a load our code does not access enough data on the page to 'cancel' the page fault cost. So the cost could be very similar to that of a seek.
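
To make the numbers above concrete, here is a minimal sketch of how a backpatch could be classified as in-memory or on-disk. The names FlushState, needsDiskSeek and diskSeekRatio are hypothetical, and the sketch assumes the flushed data is a simple prefix of the output and ignores a word that straddles the flush boundary; it is not the code under review.

```cpp
#include <cstdint>

// Hypothetical bookkeeping: how many bytes have already been flushed
// from the in-memory buffer to the output file.
struct FlushState {
  uint64_t FlushedBytes = 0;
};

// Assumption: a backpatch has to touch the disk only when its target
// byte offset lies inside the already-flushed prefix of the output.
bool needsDiskSeek(const FlushState &S, uint64_t ByteNo) {
  return ByteNo < S.FlushedBytes;
}

// Fraction of backpatches that had to touch the disk.
double diskSeekRatio(uint64_t DiskPatches, uint64_t TotalPatches) {
  return TotalPatches == 0
             ? 0.0
             : static_cast<double>(DiskPatches) /
                   static_cast<double>(TotalPatches);
}
```

With the counts reported above, diskSeekRatio(12, 16613927) is below one in a million.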

Note that if a BackpatchWord needs to access the disk, we need 1 seek to load the existing data, 1 seek to overwrite the data, and 1 seek to jump back. The first 2 seek addresses are very close, so hopefully the disk cache can handle them. Although the final jump-back seek is a very long jump, if the page cache is based on time or frequency, the page it jumps back to may not have been evicted yet. Overall, the ratio of disk accesses introduced is very small, so hopefully the additional cost is small. I also did a perf profile, and no observable latency shows up (because LTO itself takes so much time).
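
The three seeks can be sketched as follows. This is a simplified illustration using std::fstream rather than the stream class in the patch: backpatchWordOnDisk is a hypothetical helper, and it assumes a byte-aligned, 32-bit little-endian word that is overwritten outright.

```cpp
#include <cassert>
#include <cstdint>
#include <fstream>

// Hypothetical helper (not the code under review): overwrite a 32-bit
// little-endian word at byte offset ByteNo in an already-flushed file and
// return the word that was stored there before.
uint32_t backpatchWordOnDisk(std::fstream &FS, uint64_t ByteNo,
                             uint32_t NewWord) {
  // Remember where sequential output has to resume afterwards.
  const std::streampos Resume = FS.tellp();

  // Seek 1: load the existing bytes.
  unsigned char Bytes[4] = {0, 0, 0, 0};
  FS.seekg(static_cast<std::streamoff>(ByteNo), std::ios::beg);
  FS.read(reinterpret_cast<char *>(Bytes), sizeof(Bytes));
  assert(FS.gcount() == 4 && "short read while backpatching");
  uint32_t OldWord = 0;
  for (int I = 0; I < 4; ++I)
    OldWord |= static_cast<uint32_t>(Bytes[I]) << (8 * I);

  // Seek 2: overwrite the data at the same (nearby) address.
  for (int I = 0; I < 4; ++I)
    Bytes[I] = static_cast<unsigned char>((NewWord >> (8 * I)) & 0xFF);
  FS.seekp(static_cast<std::streamoff>(ByteNo), std::ios::beg);
  FS.write(reinterpret_cast<const char *>(Bytes), sizeof(Bytes));

  // Seek 3: the long jump back to where appending continues.
  FS.seekp(Resume);
  return OldWord;
}
```

Seeks 1 and 2 target the same offset, which is why they are expected to be cheap; only seek 3 moves far away.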

Given the above, and given that mmap support differs across systems, the seek-based approach seems fine.


================
Comment at: llvm/include/llvm/Support/raw_ostream.h:570
+
+/// A raw_ostream that reads from, writes to and seeks a file descriptor.
+///
----------------
evgeny777 wrote:
> Comment is misleading
I will update this in https://reviews.llvm.org/D86913.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D86905/new/

https://reviews.llvm.org/D86905


