[PATCH] D78058: option to write files to memory instead of disk

Tue Jan 5 13:09:38 PST 2021

dexonsmith added a comment.

In D78058#2471735 <https://reviews.llvm.org/D78058#2471735>, @sammccall wrote:

> @dexonsmith thanks for sharing!
> Some initial thoughts since abstracting outputs is something we're starting to care about too...

Thanks for looking.

> This doesn't appear to be an extension point - you can write to an InMemoryFS or to real disk, but not to anything else. If we're going to add abstractions around output, I think they should support flexible use in libraries (which doesn't need to cost complexity here: FS vs MemoryFS vs custom storage can implement the same interface).
> With the right interface, I think this obviates the need for RequestedOutput and generalizes it - a caller can install an intercepting output backend that hooks certain paths and delegates the rest to the old backend.
> (Of course such a backend could be provided, but it gets the complexity out of the way of the core abstractions)

I agree with your concerns with the monolithic design. My motivation for it is to get the memory optimization of using a write-through memory buffer when the output is needed both on-disk and in-memory. I imagine similar concerns if/when we add a CAS-based storage option, where it's likely more efficient to ask the CAS to materialize the on-disk file than to write it ourselves.

I'll think about whether there's a clean way to model these kinds of interactions between storage implementations without a monolithic design; let me know if you have any concrete ideas.

The option I see is to have an abstract extension point that allows libraries to plug in custom storage, but keep the monolithic design for on-disk and in-memory (and eventually CAS).

> I think the lifecycle and allowed operations on outputs would be clearer if an `Output` class was returned, with methods, rather than a stream + handle and various "free" functions (well, methods on OutputManager) than manipulate the handle.
> This would again make it easier to reason about what operations are part of a storage backend: there'd be some `OutputBackend` interface to implement, and it would hand out `Output` objects which again must be implemented. (This would be reminiscent of FileManager + vfs::FileSystem + vfs::File).

I'll think some more about the options here. The current design is motivated by how outputs are currently used / referenced in `CompilerInstance` (and how the streams are passed around separately to APIs that don't know anything about outputs); i.e., the current design seems simple to land as an incremental change. But it's probably not hard to make it a bit cleaner.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D78058/new/

https://reviews.llvm.org/D78058