[PATCH] D86905: Flush bitcode incrementally for LTO output

stephan.yichao.zhao via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Wed Sep 2 13:10:34 PDT 2020


stephan.yichao.zhao added a comment.

In D86905#2251511 <https://reviews.llvm.org/D86905#2251511>, @evgeny777 wrote:

>> Our case is a bit different. Given a 512M incremental flush threshold, I tested an LTO build that outputs a 5G bitcode file. BackpatchWord is called 16,613,927 times, of which only 12 need a disk seek. Also, each access touches 4-8 bytes on a page, and the visited pages are far apart from each other. It is likely that the pages are not cached and need to be loaded anyway, and after a load our code does not access enough data on the page to amortize the page-fault cost. So its cost could be very similar to a seek.
>
> It seems that you're trying to implement your own I/O caching. I don't understand why you're not letting the OS do this for you. For instance, on systems with a large amount of memory (I have 64 GB on my home PC; a typical build server may have even more), mmap will buffer your whole 5G bc file in memory and then write it back to disk without any seek operations (which are costly on traditional HDDs).

My local machine also has enough memory to make this work. :) The problem is that when LTO-ing thousands of such targets, the build server I am using throttles memory usage.
ThinLTO (https://www.youtube.com/watch?v=9OIEZAj243g) has a similar motivation: build services do not allow memory consumption above some GB threshold.

Although a disk seek has a cost, it happens in fewer than 1 out of a million BackpatchWord calls, and only when generating the large bitcode files merged by LTO. The bitcode generated from each individual compilation unit is much smaller and needs no disk seeks at all.
So in practice the disk-seek overhead occurs very rarely, in exchange for saving enough memory to let a build service build thousands of large targets.
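
To make the trade-off concrete, here is a minimal self-contained sketch of the scheme (not the actual patch; FlushingWriter and all other names are hypothetical): data is flushed to disk once the in-memory buffer crosses a threshold, and a backpatch that lands in the already-flushed prefix falls back to fseek/fwrite/fseek, while everything else patches the buffer in place.

  #include <cstdint>
  #include <cstdio>
  #include <cstring>
  #include <vector>

  class FlushingWriter {
    FILE *File;                 // already-flushed prefix
    std::vector<uint8_t> Buf;   // unflushed tail
    uint64_t FlushedSize = 0;   // bytes already written to File
    static constexpr size_t FlushThreshold = 512 << 20; // e.g. 512M

  public:
    explicit FlushingWriter(FILE *F) : File(F) {}

    void writeByte(uint8_t B) {
      Buf.push_back(B);
      if (Buf.size() >= FlushThreshold)
        flush();
    }

    // Overwrite 4 bytes at an absolute stream offset. Assumes the word
    // does not straddle the flush boundary.
    void backpatchWord(uint64_t Offset, uint32_t Value) {
      if (Offset >= FlushedSize) {
        // Common case: the word is still in memory.
        std::memcpy(&Buf[Offset - FlushedSize], &Value, sizeof(Value));
        return;
      }
      // Rare case (12 out of 16,613,927 in the measurement above):
      // seek back into the flushed prefix, patch, restore the position.
      long Cur = std::ftell(File);
      std::fseek(File, static_cast<long>(Offset), SEEK_SET);
      std::fwrite(&Value, sizeof(Value), 1, File);
      std::fseek(File, Cur, SEEK_SET);
    }

    void flush() {
      std::fwrite(Buf.data(), 1, Buf.size(), File);
      FlushedSize += Buf.size();
      Buf.clear();
    }
  };

With the numbers above, 16,613,915 of the 16,613,927 backpatches take the in-memory branch.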

>> Given the above, and that mmap support differs across systems, the seek-based approach seems fine.
>
> LLVM has a `FileOutputBuffer` class which abstracts the underlying OS differences. The lld linker uses it for output file generation.

Thank you for sharing FileOutputBuffer. It is a useful platform-independent mmap abstraction.
If it uses mmap, it may not be necessary to buffer the entire file contents in memory; we would still be leveraging the OS page management.
But then it still needs to reload evicted pages, which costs about as much as a seek. And if it buffers all 5G in memory, the memory issue remains.
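
For reference, here is roughly how FileOutputBuffer is used (a sketch; writeWithFileOutputBuffer is a made-up helper name). One caveat for us: FileOutputBuffer::create takes the final size up front, which the bitcode writer only knows after it has finished.

  #include "llvm/Support/FileOutputBuffer.h"
  #include <cstring>
  using namespace llvm;

  // Write Size bytes of Data to Path through a (possibly mmap'ed)
  // FileOutputBuffer. The buffer must be sized at creation time.
  static Error writeWithFileOutputBuffer(StringRef Path, const uint8_t *Data,
                                         size_t Size) {
    Expected<std::unique_ptr<FileOutputBuffer>> BufOrErr =
        FileOutputBuffer::create(Path, Size);
    if (!BufOrErr)
      return BufOrErr.takeError();
    std::memcpy((*BufOrErr)->getBufferStart(), Data, Size);
    return (*BufOrErr)->commit(); // flush pages back to disk
  }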

The current WriteBitcodeToFile API takes a raw_ostream argument. If we used FileOutputBuffer, we would want to wrap it in a raw_ostream subclass, analogous to raw_fd_stream.
That way raw_fd_stream could also be extended to use FileOutputBuffer internally when necessary in the future.
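
A hypothetical sketch of such a wrapper (raw_mmap_ostream is a made-up name; I am assuming a raw_pwrite_stream base so that backpatches become in-memory writes into the mapped buffer rather than disk seeks):

  #include "llvm/Support/FileOutputBuffer.h"
  #include "llvm/Support/raw_ostream.h"
  #include <cstring>
  using namespace llvm;

  // Hypothetical raw_pwrite_stream that writes into a FileOutputBuffer,
  // so backpatching touches mmap'ed pages instead of seeking on disk.
  class raw_mmap_ostream : public raw_pwrite_stream {
    FileOutputBuffer &Buf;
    uint64_t Pos = 0;

    void write_impl(const char *Ptr, size_t Size) override {
      std::memcpy(Buf.getBufferStart() + Pos, Ptr, Size);
      Pos += Size;
    }
    void pwrite_impl(const char *Ptr, size_t Size, uint64_t Offset) override {
      std::memcpy(Buf.getBufferStart() + Offset, Ptr, Size);
    }
    uint64_t current_pos() const override { return Pos; }

  public:
    explicit raw_mmap_ostream(FileOutputBuffer &B)
        : raw_pwrite_stream(/*Unbuffered=*/true), Buf(B) {}
  };

The OS then decides when dirty pages go to disk, and a final FileOutputBuffer::commit() makes the contents durable.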


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D86905/new/

https://reviews.llvm.org/D86905


