[PATCH] D78058: option to write files to memory instead of disk

Fri Dec 18 19:59:31 PST 2020

dexonsmith added subscribers: erik.pilkington, arphaman, Bigcheese.
dexonsmith added a comment.

I think this is important; thanks for working on it!

I'm sorry I missed it before... coincidentally, I'm working in a similar problem space right now (likely I'll post a patch with an RFC next week) but my design is a bit different.

The main abstraction I came up with is in two parts:

- An `OutputManager`, which can support opening / closing outputs with configurable destinations (kind of like a chain of consumers, but my initial design is monolithic to simplify optional optimizations like using a `WriteThroughMemoryBuffer` for mirroring a just-written on-disk file in memory). The expectation is that various `CompilerInstances` (and other things) could share one of these. This is not a file system; it just has very simple "open" / "close" / "erase" APIs for output paths (initially it has three possible destinations: on-disk, in-memory (store in an `InMemoryFileSystem`), and a request-provided `raw_pwrite_ostream`; it can do any or all of these).
- An `OutputContext`, which can be instantiated to track multiple outputs in a specific context. Each `CompilerInstance` would have one of these, effectively replacing the `std::list<OutputFile>` it currently has (`clearOutputFiles` would call `OutputContext::closeAllOutputs`, which then calls `OutputManager::closeOutput` for each output its tracking). This can have partial configuration (partially overriding the manager's default configuration).

(Initially, I was thinking of a virtual base class `OutputManager` with subclasses `OnDiskOutputManager` and `InMemoryOutputManager`, and I didn't have `OutputContext` as a separate abstraction, but I don't think that gives us the flexibility we need for various use cases; see below.)

I also have:

- `OutputConfig`, configuration for a specific file; the default configuration lives on `OutputManager` (and is configurable).
- `PartialOutputConfig`, a partial configuration that overrides only some of a configuration; can be put on an `OutputContext` ("mirror all of my outputs in memory" or "don't write to disk, only to memory, for these outputs"), or put on an individual "open file" request ("the client needs seeking support for this output").

These are the use cases I think we should support:

- A "normal" clang invocation without implicit modules should just write to disk.
- If invoking clang from the command-line and implicit modules is turned on, the modules should be mirrored in memory (but also written to disk). My idea here is that VFS is constructed with an `OverlayFileSystem` at the top level and an initially-empty `InMemoryFileSystem` underneath in front of the "usual" one. The `OutputContext`s for the modules builds would be configured to mirror all outputs into the `InMemoryFileSystem`. (This is a step toward dropping the rather awkward `InMemoryModuleCache`; I think that will be easy once this lands.)
- Index services often don't want outputs written to disk. If they're happy passing in an `InMemoryFileSystem` to the `OutputManager`, they can use the same approach.
- When clang-scan-deps is used to do a modules pre-scan, its source-minimized modules should never hit the disk. The `OutputManager` can be configured to only fill up the `InMemoryFileSystem`.
- For unit tests or a `-cc1` compile-in-memory API where we care about a single output file, it might be nice to be able to ignore most files (write to `raw_null_ostream`), but get back specific file content and/or configure certain files in some special way.
- There are lots of other outputs in clang (e.g., index-while-building files from `-index-store-path`, which is still being upstreamed), which all do their own temp-file dance (... or worse, they don't), and they should be able to delete code and use this as well; I think even references to `errs()` should go through so this could be captured in unit tests (or an in-memory compile API, or...).
- It should be easy to cleanly add more output destinations (e.g., CAS) as we find the need, and `OutputContext` users would not need to know about specific destinations unless they wanted some special configuration.

I'll reply here once I've posted the RFC and patch (as I said, I'm hoping next week) so you can take a look.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D78058/new/

https://reviews.llvm.org/D78058