<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class="">Hi Sam:<br class=""><div><br class=""><blockquote type="cite" class=""><div class="">On 2021 Mar  15, at 14:18, Duncan P. N. Exon Smith via cfe-dev <<a href="mailto:cfe-dev@lists.llvm.org" class="">cfe-dev@lists.llvm.org</a>> wrote:</div><br class="Apple-interchange-newline"><div class=""><meta http-equiv="Content-Type" content="text/html; charset=utf-8" class=""><div style="word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class="">Thanks for the replies Sam! I've just back after being out for a bit; I'll try to give a substantive reply later this week.<br class=""><div class=""><br class=""></div></div></div></blockquote><div><br class=""></div><div>(Took longer than I'd intended / expected for me to circle back, but here I am.)</div><div><br class=""></div><div><blockquote type="cite" class=""><blockquote type="cite" class=""><div dir="ltr" class=""><div class="gmail_quote"><div class="">I may have missed it (apologies if so), but is there somewhere a breakdown of which performance-critical configurations this is aimed at?</div><div class="">(e.g. many small outputs that don't benefit from streaming, mirrored to two output streams, but one is null)</div><div class="">Performance is certainly going to cut against simplicity sometimes, but it's hard to evaluate without knowing what we're optimizing for.</div></div></div></blockquote></blockquote><div class=""><br class=""></div><div class="">The main use case for this optimization — and the trigger for most of the complexity in this proposal — is to support the module cache, both by</div><div class="">- using output backend as a new implementation strategy for InMemoryModuleCache/etc. and</div><div class="">- (optionally?) capturing just-written implicit modules in a client-defined output backend.</div><div class="">The specific optimization I wanted to enable was for the "normal" case where the client is "normal" compilation from `/usr/bin/clang`, and we're writing to disk. In that case, it seems optimal to:</div><div class="">- use something akin to llvm::FileOutputBuffer to efficiently write to disk (using mmap) and</div><div class="">- save that just-written file-backed memory buffer to FileManager / InMemoryModuleCache / vfs::InMemoryFileSystem / somewhere.</div><div class=""><br class=""></div><div class="">Having had some more time to reflect (and getting some experience building on top of this patch for some tangentially-related experiments), I'm leaning strongly toward *not* trying to solve the in-memory module cache as an inherent part of OutputBackend, and virtualizing the module cache separately from (or on top of) this abstraction.</div><div class=""><br class=""></div><div class="">At a high-level, here's what I'm planning to do:</div><div class=""><br class=""></div><div class="">1. Minimal OutputBackend with a slimmed down OutputFile class, and two implementations: OnDiskOutputBackend and NullOutputBackend.</div><div class="">- We can use this to discuss tradeoffs around OutputConfig (enum+bitset wrapper for named-based parameters vs. struct for full flexibility).</div><div class="">- Doesn't really need output directories: clients for now can be expected to provide absolute paths, or call `sys::fs::change_directory`.</div><div class=""><br class=""></div><div class="">2. OutputDirectory / CWD / etc.</div><div class=""><br class=""></div><div class="">3. InMemoryOutputBackend, which saves files to a vfs::InMemoryFileSystem</div><div class="">- Maybe can't quite land before OutputDirectory is worked out, but most of the review can happen in parallel.</div><div class=""><br class=""></div><div class="">4. Temp files and temp directories</div><div class="">- Adding shared infrastructure that InMemoryOutputBackend and other virtual backends can use.</div><div class="">- Enables threading more outputs through the outputbackend</div></div><div><br class=""></div><div>5. Look more concretely at how best to virtualize the module cache using this.</div><div>- Or maybe this thinking will / should be front-loaded...</div><div><br class=""></div><div>WDYT?</div><div><br class=""></div><div>Thanks,</div><div>Duncan</div><br class=""><blockquote type="cite" class=""><div class=""><div style="word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class=""><div class=""><blockquote type="cite" class=""><blockquote type="cite" class=""><div dir="ltr" class=""><div class="gmail_quote"><div class="">Only other thing is the scope of the patch - it's useful that as an RFC it's comprehensive and shows things fitting together.</div></div></div></blockquote></blockquote><div class=""><br class=""></div>Absolutely; agreed it needs to be split up for review.</div></div></div></blockquote><blockquote type="cite" class=""><br class=""></blockquote><blockquote type="cite" class=""><div class=""><div style="word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class=""><div class=""><blockquote type="cite" class=""><div class="">On 2021 Feb  22, at 06:11, Sam McCall <<a href="mailto:sammccall@google.com" class="">sammccall@google.com</a>> wrote:</div><br class="Apple-interchange-newline"><div class=""><div dir="ltr" class=""><div dir="ltr" class="">Sorry about the long delay!</div><div dir="ltr" class=""><div class=""><br class=""></div><div class="">This looks <b class="">much</b> better to me, thank you for the simplifications!</div><div class="">I've made a bunch of notes on the patch itself which I'll send now too. High-level things inline: TL;DR: stream/buffer confusion in outputfile, scope.</div></div><br class=""><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Wed, Feb 10, 2021 at 3:12 AM Duncan P. N. Exon Smith <<a href="mailto:dexonsmith@apple.com" class="">dexonsmith@apple.com</a>> wrote:<br class=""></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div style="overflow-wrap: break-word;" class="">Sam, any thoughts here?<br class=""><div class=""><br class=""><blockquote type="cite" class=""><div class="">On 2021 Feb  2, at 19:36, Duncan P. N. Exon Smith via llvm-dev <<a href="mailto:llvm-dev@lists.llvm.org" target="_blank" class="">llvm-dev@lists.llvm.org</a>> wrote:</div><br class=""><div class=""><span style="font-family:Helvetica;font-size:14px;font-style:normal;font-variant-caps:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;text-decoration:none;float:none;display:inline" class="">Update: I've incorporated much of Sam's feedback into the main patch (</span><a href="https://reviews.llvm.org/D95501" style="font-family:Helvetica;font-size:14px;font-style:normal;font-variant-caps:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px" target="_blank" class="">https://reviews.llvm.org/D95501</a><span style="font-family:Helvetica;font-size:14px;font-style:normal;font-variant-caps:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;text-decoration:none;float:none;display:inline" class="">).</span><div style="font-family:Helvetica;font-size:14px;font-style:normal;font-variant-caps:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;text-decoration:none" class=""><br class=""><div class=""><div class="">- Simplify OutputConfig, restricting it to semantic information about a specific output. Sink all backend configuration to flags on the backends themselves.<br class=""></div></div></div></div></blockquote></div></div></blockquote><div class="">The remaining options look great, and agree that we can't expect everything to be configured globally.</div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div style="overflow-wrap: break-word;" class=""><div class=""><blockquote type="cite" class=""><div class=""><div style="font-family:Helvetica;font-size:14px;font-style:normal;font-variant-caps:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;text-decoration:none" class=""><div class=""><div class="">- Remove OutputManager, instead exposing OutputBackend directly.<br class=""></div></div></div></div></blockquote></div></div></blockquote><div class="">Hooray! </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div style="overflow-wrap: break-word;" class=""><div class=""><blockquote type="cite" class=""><div class=""><div style="font-family:Helvetica;font-size:14px;font-style:normal;font-variant-caps:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;text-decoration:none" class=""><div class=""><div class=""></div><div class="">- Merge Output and OutputDestination into a single class called OutputFile, and rename the API for creating them to OutputBackend::createFile().<br class=""></div><div class="">- Add support for working directories via OutputDirectory, and add OutputBackend::getDirectory() and OutputBackend::createDirectory().<br class=""></div><div class="">- Add support for createUniqueFile() and createUniqueDirectory(), both heavily used in clang. Backends without read access can use StableUniqueEntityAdaptor to implement these.<br class=""></div></div></div></div></blockquote></div></div></blockquote><div class="">This is definitely needed, but there are a few design options and I can't say I precisely understand either the requirements or the solution in the patch.</div><div class="">I think it would be useful to review this separately. </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div style="overflow-wrap: break-word;" class=""><div class=""><blockquote type="cite" class=""><div class=""><div style="font-family:Helvetica;font-size:14px;font-style:normal;font-variant-caps:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;text-decoration:none" class=""><div class=""><div class=""></div><div class=""><br class=""></div><div class="">The main thing not settled is the threading guarantees. Restating the straw man I proposed:</div><div class=""><br class=""></div><div class="">- All OutputBackend APIs are safe to call concurrently. Since OutputDirectory *is* an OutputBackend, it can be used concurrently as well.</div><div class="">- An OutputFile cannot be used concurrently, but two files from the same backend can be.</div></div></div></div></blockquote></div></div></blockquote><div class="">LGTM </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div style="overflow-wrap: break-word;" class=""><div class=""><blockquote type="cite" class=""><div class=""><div style="font-family:Helvetica;font-size:14px;font-style:normal;font-variant-caps:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;text-decoration:none" class=""><div class=""><div class=""><br class=""></div><div class="">Interested in everyone's thoughts on that; if that sounds reasonable I can update the patch to make it so.</div><div class=""><br class=""></div><div class=""><div style="overflow-wrap: break-word;" class=""><div class=""><div class=""><div class="">Two other points:</div><div class=""><br class=""></div><div class="">- Sam proposed dropping OutputConfig. I don't think we can, as the API client needs to communicate semantic information about specific outputs, not about the backends generally.</div><div class="">- Sam suggested OutputDestination (now OutputFile) seemed bloated. I talked through some of the details in my previous response; most of the complexity is there to make MirroringOutputBackend work efficiently and avoid duplication in subclasses. As a counterpoint, the only API a concrete subclass needs to override is storeContentBuffer().</div></div></div></div></div></div></div></div></blockquote></div></div></blockquote><div class="">Hmm, OutputFile still feels caught between two concepts: a buffer-based one and a stream based one.</div><div class="">I understand the goal to make this easy to implement and easy to use, but having a lot of conceptual distance between the two, and optional methods/implementation strategies is easy to misunderstand.</div><div class=""><br class=""></div><div class="">I may have missed it (apologies if so), but is there somewhere a breakdown of which performance-critical configurations this is aimed at?</div><div class="">(e.g. many small outputs that don't benefit from streaming, mirrored to two output streams, but one is null)</div><div class="">Performance is certainly going to cut against simplicity sometimes, but it's hard to evaluate without knowing what we're optimizing for.</div><div class=""><br class=""></div><div class=""> > MirroringOutputBackend work efficiently<br class=""></div><div class="">One thing I don't understand about MirroringOutputBackend - why does it buffer the content? It seems possible to create a raw_pwrite_stream that broadcasts to raw_pwrite_streams of the underlying outputs.</div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div style="overflow-wrap: break-word;" class=""><div class=""><blockquote type="cite" class=""><div class=""><div style="font-family:Helvetica;font-size:14px;font-style:normal;font-variant-caps:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;text-decoration:none" class=""><div class=""><div class=""><br class=""></div></div><div class="">Sam, let please take another look and let me know if you have more high-level comments.</div></div></div></blockquote></div></div></blockquote><div class="">Only other thing is the scope of the patch - it's useful that as an RFC it's comprehensive and shows things fitting together.</div><div class="">But the amount of detail is a bit overwhelming :-)</div><div class="">I think the parts regarding directories, unique files, mirroring/filtering backends, many of the output options would be easier to review in detail in isolated patches.</div><div class=""> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div style="overflow-wrap: break-word;" class=""><div class=""><blockquote type="cite" class=""><div class=""><div style="font-family:Helvetica;font-size:14px;font-style:normal;font-variant-caps:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;text-decoration:none" class=""><div class="">Others, if you have feedback, please let me know!</div><div class=""><br class=""></div><div class="">Thanks!</div><div class="">Duncan</div><div class=""><br class=""></div><div class=""><div class=""><div class=""><blockquote type="cite" class=""><div class="">On 2021 Jan  28, at 11:24, Duncan P. N. Exon Smith via llvm-dev <<a href="mailto:llvm-dev@lists.llvm.org" target="_blank" class="">llvm-dev@lists.llvm.org</a>> wrote:</div><br class=""><div class=""><div style="overflow-wrap: break-word;" class=""><div class=""><div class="">Thanks for the detailed response Sam!</div><div class=""><br class=""><blockquote type="cite" class=""><div dir="ltr" class=""><div class="gmail_quote"><div class="">This roughly corresponds to OutputBackend + OutputDestination in the patch, except:</div><div class=""> - the OutputConfig seems like it belongs to particular backends, not the overall backend abstraction</div></div></div></blockquote><div class=""><br class=""></div><div class="">Ideally the OnDiskOutputConfig would almost entirely be settings on OnDiskOutputBackend since no one else will care. It isn't that way in the initial patch because Clang decides most of this stuff on a per-output basis. Maybe there's a refactoring that could fix it.</div><div class=""><br class=""></div><div class="">Here are the problems I hit that led to this design:</div><div class=""><br class=""></div><div class="">1. Some properties need to be associated with specific outputs, not backends, because they relate to properties of the outputs themselves. Here are two:</div><div class="">- ClientIntentOutputConfig::NeedsSeeking</div><div class="">- OnDiskOutputConfig::OpenFlagText</div><div class=""><br class=""></div><div class="">Maybe also something like:</div><div class=""><div class="">- OnDiskOutputConfig::CreateDirectories</div><div class="">(and maybe others)</div></div><div class=""><br class=""></div><div class="">I couldn't think of a good way to solve this besides passing in configuration at output creation time.</div><div class=""><br class=""></div><div class="">2. Some "configurers" may be handed a pre-constructed opaque OutputManager / OutputBackend and need to configure internal OnDiskOutputBackend. In particular, I found that some tooling wants to turn off:</div><div class="">- OnDiskOutputConfig::RemoveFileOnSignal</div><div class="">(others flags might benefit)</div><div class=""><br class=""></div><div class="">This is "documented" by the call chains of clang::CompilerInstance::createOutputFile.</div><div class=""><br class=""></div><div class="">I reused the solution from #1 since it needed (?) to exist anyway. Another option would be to add an OutputBackend visitor, to find and reconfigure any "contained" OnDiskOutputBackend. This seems pretty heavyweight though.</div><div class=""><br class=""></div><blockquote type="cite" class=""><div dir="ltr" class=""><div class="gmail_quote"><div class=""> - OutputDestination has a lot of stuff in it, I'll need to dig further into the patch to try to understand why</div></div></div></blockquote><div class=""><br class=""></div><div class="">Firstly, `Output` has the user-facing abstraction. `OutputDestination` has low-level bits. Another reasonable name might be `OutputImpl` but `OutputDescription` seemed more descriptive.</div><div class=""><br class=""></div><div class="">Secondly, most of the low-level bits avoid unnecessary copies of content buffers. Maybe there are simpler ideas for how to do this, but here are the design goals I was working around:</div><div class="">- Avoid buffering content unless necessary.</div><div class="">- Avoid duplicating content buffers unless necessary.</div><div class="">- Support multiple destinations (for mirrored backends) without breaking the above rules. The obvious "interesting" case is in-memory + on-disk (in either order).</div><div class="">- Make sub-classes of `OutputDestination` as small as possible given the above.</div><div class=""><br class=""></div><div class="">Thirdly, there's some boilerplate related to lifetimes of multiple destinations. Probably having an explicit `MirroringOutputDestination` would be better.</div><div class=""><br class=""></div><blockquote type="cite" class=""><div dir="ltr" class=""><div class="gmail_quote"><div class="">As for OutputManager itself, I think this belongs in clang, if it's needed. Its main job seems to be to store a set of default options and manage the lifecycle of backends, and it's not obvious that those sorts of concerns will generalize across tools or that there's much to be gained from sharing code here.</div></div></div></blockquote><div class=""><br class=""></div>In the end, its main job is to wrap an OutputDestination (low-level abstraction) in an Output (high-level abstraction). Output does a bunch of work for OutputDestination, such as manage the intermediate content buffer.</div><div class=""><br class=""></div><div class="">Probably it's better to have the OutputBackend return an Output directly (and get rid of the OutputManager).</div><div class=""><br class=""></div><div class="">(At one point OutputManager was managing multiple backends directly and was involved in the store operation(s); but since I factored that logic out to MirroringOutputBackend and OutputDestination it probably doesn't need to exist.)</div><div class=""><br class=""><blockquote type="cite" class=""><div dir="ltr" class=""><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div class=""><div class="">- Any other major concerns / suggestions?</div></div></blockquote><div class="">Thread-safety of the core plug-in interface is something that would be nice to explicitly address, as this has been a pain-point with vfs::FileSystem.</div><div class="">It's tempting to say "not threadsafe, you should lock", but this results in throwing an unnecessary global lock around all FS accesses in multithreaded programs, in the common case that the real FS is being used.</div></div></div></blockquote><div class=""><br class=""></div><div class="">Right, I hit multi-threading limitations myself when prototyping a follow-up patch (didn't get around to posting it until this AM):</div><div class=""><a href="https://reviews.llvm.org/D95628" target="_blank" class="">https://reviews.llvm.org/D95628</a></div><div class=""><br class=""></div><div class="">My intuition is:</div><div class=""><br class=""></div><div class="">Thread-safe:</div><div class="">- OutputBackend::createDestination</div><div class="">- Concurrently touching more than one Output/OutputDestination</div><div class=""><br class=""></div><div class="">Thread-unsafe:</div><div class="">- Using a single Output/OutputDestination concurrently</div><div class=""><br class=""></div><div class="">This all seems cheap and easy to maintain because of the limited interface. The problem I hit in the above patch is that for the InMemoryOutputBackend you also need any readers of the InMemoryFileSystem to be thread-safe.</div><div class=""><br class=""></div><div class="">Relatedly, <a href="https://reviews.llvm.org/D95583" target="_blank" class="">https://reviews.llvm.org/D95583</a> (a prep patch for the above) allows the llvm::LockManager to be skipped. This is not really about outputs — it's inter-process coordination to avoid doing redundant work in competing Clang command-line invocations (at one point it was needed for correctness, but the main benefit now is to avoid taxing the on-disk filesystem) — but it does touch on the idea of exclusive write access.</div><div class=""><br class=""></div><blockquote type="cite" class=""><div dir="ltr" class=""><div class="gmail_quote"><div class="">Relatedly, working-directory/relative-path handling should be considered.</div></div></div></blockquote><div class=""><br class=""></div><div class="">Yeah, you're probably right. Any specific thoughts on what to do here? It seems like llvm::vfs::FileSystem gets them pretty wrong for Windows; see (e.g.) the discussion at the end of <a href="https://reviews.llvm.org/D70701" target="_blank" class="">https://reviews.llvm.org/D70701</a>.</div><div class=""><br class=""></div><div class="">Even for POSIX, working directories could complicate concurrency guarantees. A simple solution is to not have an API for<span class=""> </span><i class="">changing</i><span class=""> </span>working directories, and instead model that by creating a proxy backend (ChangeDirectoryOutputBackend) that prepends the (new) working directory to new outputs; since each backend has an immutable view of the CWD concurrent access should be fine.</div><div class=""><br class=""></div><div class="">Two other thoughts related to paths:</div><div class=""><br class=""></div><div class="">1. I wonder if this abstraction treats the "path" as too central a property of the output. Can this evolve to allow a compiler to build a directory structure bottom-up without having to know the destination<span class=""> </span><i class="">a priori</i>? (E.g., an inner layer makes a blob, a middle layer makes a tree out of a few of those, and an outer layer decides where to put the tree.)</div><div class=""><br class=""></div><div class="">I think it can. One approach is to use proxy backends:</div><div class="">- Inner layer writes to '-' / stdout. (Why not pass a pre-constructed Output/raw_pwrite_stream? See note below.)</div><div class="">- Middle layer passes the instances of the inner layer a proxy backend that maps stdout to the various blob names. (E.g., `/name1` and `/name2`.)</div><div class="">- Outer layer passes the middle layer a proxy backend that "does the right thing" with the tree. (E.g., writes to `/Users/dexonsmith/name{1,2}`.)</div><div class=""><br class=""></div><div class="">=> If writing on-disk, "the right thing" is to prepend the correct output directory for the tree and pass each file to a regular OnDiskOutputBackend.</div><div class="">=> If writing to (e.g.) Git's CAS, "the right thing" is to call git-hash-object on Output::close and track the name-to-hash mapping as outputs come in, and then call "git-mktree" when the middle layer is "done" (maybe a callback in the backend-passed-to-the-middle-layer's destructor).</div><div class=""><br class=""></div><div class="">(IOW, a refactoring where instead of passing absolute paths / directories / filenames down through all the layers, proxy output backends build up the path/destination piece-by-piece.)</div><div class=""><br class=""></div><div class="">I think it's doable with the abstraction as-proposed. But let me know if anyone has concerns. For example:</div><div class="">- Is `Output::getPath()` an abstraction leak?</div><div class="">- Should we have a `createOutput` that doesn't take a path?</div><div class="">- ...</div><div class=""><br class=""></div><div class="">Why not pass a pre-constructed Output/raw_pwrite_stream to the inner layer?</div><div class="">- The inner layer needs an output backend if it (sometimes) dumps "side" files (such as AST record layouts into ".ast-dump" or textual IR into ".ll"). This avoids needing to know the on-disk file path ("/path/to/output" => "/path/to/output.ll"), or to even know whether there's a disk.</div><div class=""><br class=""></div><div class="">2. How should we virtualize stdout / stderr?</div><div class="">- "'-' means stdout" is probably good enough since LLVM makes that assumption all over. Unless someone disagrees?</div><div class="">- I'm not sure what to do with stderr. No one ever "closes" this stream.</div><div class="">- Are there other outputs that don't have path names?</div><div class=""><br class=""></div><div class="">3. Do we need to virtualize llvm::sys::fs::create_directories?</div><div class="">- If so, why?</div><div class=""><br class=""></div><blockquote type="cite" class=""><div dir="ltr" class=""><div class="gmail_quote"><div class="">(And a question/concern about the relationship between input and output virtualization, elaborated at the bottom)</div></div></div></blockquote><br class=""><blockquote type="cite" class=""><div dir="ltr" class=""><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div class=""><div class=""><b class="">Why doesn't this inherit from llvm::vfs::FileSystem?</b></div></div></blockquote><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div class="">Separating the input and output abstractions seems a bit cleaner. It's easy enough to join them, when useful: e.g., write to an `InMemoryFileSystem` (via an `InMemoryOutputBackend`) and install this same FS in the `FileManager` (maybe indirectly via an `OverlayFileSystem`).</div></blockquote><div class="">I agree with separating the interfaces. In hosted environments your input VFS is often read-only and outputs go somewhere else. </div><div class=""><br class=""></div><div class="">But I wonder, is there an implicit assumption that data written to OutputManager is readable via the (purportedly independent) vfs::FileSystem? This seems like a helpful assumption for module caching, but is extremely constraining and eliminates many of the simplest and most interesting possibilities.</div><div class=""><br class=""></div></div></div></blockquote><blockquote type="cite" class=""><div dir="ltr" class=""><div class="gmail_quote"><div class="">If you're going to require the FileSystem and OutputBackend to be linked, then I think they *should* be the same object.</div></div></div></blockquote><div class=""><br class=""></div><div class="">No, I don't think that should be a requirement / expectation. It's a specific requirement for Clang's implicitly built modules, and I think Clang should be responsible for hooking them together when necessary.</div><br class=""><blockquote type="cite" class=""><div dir="ltr" class=""><div class="gmail_quote"><div class="">But if it's mostly module caching that requires that, then it seems simpler and less invasive to virtualize module caching directly (put(module, key) + get(key)) rather than file access.</div></div></div></blockquote><div class=""><br class=""></div><div class="">Agreed, explicitly virtualizing module caching might be a good thing to do. Either way this is Clang's job to coordinate; I just think the output manager should efficiently support mirroring outputs to an additional/custom backend that Clang installs.</div><div class=""><br class=""></div><div class="">Note: implicit modules doesn't currently rely on reading the modules it has just built from disk. It uses InMemoryModuleCache to avoid having to read anything it has written to disk and to ensure consistency between CompilerInstances across an implicit build. It's pretty awkward though.</div></div></div><div class=""><br class=""><blockquote type="cite" class=""><div class="">On 2021 Jan  28, at 03:19, Sam McCall <<a href="mailto:sammccall@google.com" target="_blank" class="">sammccall@google.com</a>> wrote:</div><br class=""><div class=""><div dir="ltr" style="font-family:Helvetica;font-size:14px;font-style:normal;font-variant-caps:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;text-decoration:none" class=""><div class="">Really glad to see this work, virtualizing module cache is something we've wanted to experiment with for tooling, but never got to. I want to get into the patches in more detail, but some high-level thoughts...</div><div class=""><br class=""></div><div dir="ltr" class="">On Wed, Jan 27, 2021 at 6:23 AM Duncan P. N. Exon Smith via cfe-dev <<a href="mailto:cfe-dev@lists.llvm.org" target="_blank" class="">cfe-dev@lists.llvm.org</a>> wrote:<br class=""></div><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div class=""><div class=""><b class="">TL;DR</b>: Let's virtualize compiler outputs in Clang. These patches would get us started:</div><div class="">- <a href="https://reviews.llvm.org/D95501" target="_blank" class="">https://reviews.llvm.org/D95501</a> Add llvm::vfs::OutputManager</div><div class="">- <a href="https://reviews.llvm.org/D95502" target="_blank" class="">https://reviews.llvm.org/D95502</a> Initial adoption of llvm::vfs::OutputManager in Clang.</div><div class=""><br class=""></div><div class=""><b class="">Questions for the reader</b></div><div class="">- Should we virtualize compiler outputs in Clang? (Hint:<span class=""> </span><i class="">yes</i>.)</div></div></blockquote><div class="">Definitely agree.</div><div class=""> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div class=""><div class=""><span style="white-space:pre-wrap" class="">       </span>- Does this support belong in LLVM? (I think it does, so that non-Clang tools can easily reuse it.)</div></div></blockquote><div class="">Ideally, the core abstraction (path -> pwrite_stream) certainly belongs in LLVM, as well as the most common implementations of it.</div><div class="">Based on experience with VirtualFileSystem, I'd like this interface to be as narrow as possible to make it feasible to implement/wrap correctly, and to reason about how the wider system interacts with it.</div><div class=""><br class=""></div><div class="">This roughly corresponds to OutputBackend + OutputDestination in the patch, except:</div><div class=""> - the OutputConfig seems like it belongs to particular backends, not the overall backend abstraction</div><div class=""> - OutputDestination has a lot of stuff in it, I'll need to dig further into the patch to try to understand why</div><div class=""><br class=""></div><div class="">As for OutputManager itself, I think this belongs in clang, if it's needed. Its main job seems to be to store a set of default options and manage the lifecycle of backends, and it's not obvious that those sorts of concerns will generalize across tools or that there's much to be gained from sharing code here.</div><div class=""> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div class=""><div class=""><span style="white-space:pre-wrap" class="">  </span>- Is `llvm::vfs::` a reasonable namespace? (If not, suggestions? I think `llvm::` itself is too broad.)</div></div></blockquote><div class="">llvm::vfs:: is definitely the right namespace for the core writing stuff IMO.</div><div class="">If more ancillary bits (parts some but not all tools might use) need to go in llvm, llvm:: seems to be the best namespace we have (like e.g. SourceMgr) but maybe we should add a new one. But as mentioned, I'd prefer those to live in clang:: at least for now.</div><div class=""> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div class=""><div class="">- Do you have a use case that this won't address well?</div><div class=""><span style="white-space:pre-wrap" class="">    </span>- Should that be fixed in the initial patch, or could this be evolved in-tree to address that?</div><div class="">- Any other major concerns / suggestions?</div></div></blockquote><div class="">Thread-safety of the core plug-in interface is something that would be nice to explicitly address, as this has been a pain-point with vfs::FileSystem.</div><div class="">It's tempting to say "not threadsafe, you should lock", but this results in throwing an unnecessary global lock around all FS accesses in multithreaded programs, in the common case that the real FS is being used.</div><div class=""><br class=""></div><div class="">Relatedly, working-directory/relative-path handling should be considered.</div><div class=""><br class=""></div><div class="">(And a question/concern about the relationship between input and output virtualization, elaborated at the bottom)</div><div class=""></div><div class=""> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div class=""><div class="">- If you think the above patches should be split up for initial review / commit, how?</div></div></blockquote><div class="">Obviously my favorite would be to see a minimal core writable VFS interface extracted and land that first. What's built on top of it is less critical, and I'm not concerned about it landing in larger chunks.</div><div class=""> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div class=""><div class=""><br class=""></div><div class="">(Other feedback welcome too!)</div><div class=""><br class=""></div><div class=""><b class="">Longer version</b></div><div class="">There are a number of use cases for capturing compiler outputs, which I'm hoping this proposal is a step toward addressing.</div><div class=""><br class=""></div><div class="">- Tooling wants to capture outputs directly, without going through the filesystem.</div><div class=""><span style="white-space:pre-wrap" class="">     </span>- Sometimes, tertiary outputs can be ignored, or still need to be written to disk.</div><div class="">- clang-scan-deps is using a form of stripped down "implicit" modules to determine which modules need to be built explicitly. It doesn't really want to be using the on-disk module cache—in-memory would be better.</div><div class="">- clang's ModuleManager manually maintains an in-memory modules cache for implicit modules. This involves copying the PCM outputs into memory. It'd be better for these modules to be file-backed, instead of copies of the stream.</div><div class=""><br class=""></div><div class="">The patch has a bunch of details written / tested (<a href="https://reviews.llvm.org/D95501" target="_blank" class="">https://reviews.llvm.org/D95501</a>). Here are the high-level structures in the design:</div><div class=""><br class=""></div><div class="">- OutputManager—a shared manager for creating outputs without knowing about the storage.</div><div class="">- OutputConfig—configuration set on the OutputManager that can be (partially) overridden for specific outputs.</div><div class="">- Output—opaque object with a raw_pwrite_stream, output path, and `erase`/`close` functionality. Internally, it has a linked list of output destinations.</div><div class="">- OutputBackend—abstract interface installed in an OutputManager to create the "guts" of an output. While an OutputManager only has one installed, these can be layered / forked / mirrored.</div><div class="">- OutputDestination—abstract interface paired with an OutputBackend, whose lifetime is managed by an Output.</div><div class="">- ContentBuffer—actual content to allow efficient use of data by multiple backends. For example, the installed backend is a mirror between an on-disk and in-memory backend, the in-memory backend will either get the content moved directly into an llvm::MemoryBuffer, or a just-written mmap'ed file.</div><div class=""><br class=""></div><div class="">The patch includes a few backends:</div><div class=""><br class=""></div>- NullOutputBackend, for ignoring all backends.<br class="">- OnDiskOutputBackend, for writing to disk (the default), initially based on the logic in `clang::CompilerInstance`.<br class="">- InMemoryOutputBackend, for writing to an `InMemoryFileSystem`.<br class="">- MirroringOutputBackend, for writing to multiple backends. OutputDestination's API is designed around supporting this.<br class="">- FilteringOutputBackend, for filtering which outputs get written to the underlying backend.<br class=""><br class=""><b class="">Why doesn't this inherit from llvm::vfs::FileSystem?</b><br class="">Separating the input and output abstractions seems a bit cleaner. It's easy enough to join them, when useful: e.g., write to an `InMemoryFileSystem` (via an `InMemoryOutputBackend`) and install this same FS in the `FileManager` (maybe indirectly via an `OverlayFileSystem`).</div></blockquote><div class="">I agree with separating the interfaces. In hosted environments your input VFS is often read-only and outputs go somewhere else. </div><div class=""><br class=""></div><div class="">But I wonder, is there an implicit assumption that data written to OutputManager is readable via the (purportedly independent) vfs::FileSystem? This seems like a helpful assumption for module caching, but is extremely constraining and eliminates many of the simplest and most interesting possibilities.</div><div class=""><br class=""></div><div class="">If you're going to require the FileSystem and OutputBackend to be linked, then I think they *should* be the same object.</div><div class="">But if it's mostly module caching that requires that, then it seems simpler and less invasive to virtualize module caching directly (put(module, key) + get(key)) rather than file access.</div><div class=""><br class=""></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div class=""><div class=""><b class="">Other work in the area</b></div><div class="">See also: <a href="https://reviews.llvm.org/D78058" target="_blank" class="">https://reviews.llvm.org/D78058</a> (thanks to Marc Rasi for posting that patch, and to Sam McCall for some feedback on an earlier version of this proposal).<br class=""><br class="">Thanks for reading!<div class="">Duncan</div></div></div>_______________________________________________<br class="">cfe-dev mailing list<br class=""><a href="mailto:cfe-dev@lists.llvm.org" target="_blank" class="">cfe-dev@lists.llvm.org</a><br class=""><a href="https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev" rel="noreferrer" target="_blank" class="">https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev</a></blockquote></div></div></div></blockquote></div><br class=""></div>_______________________________________________<br class="">LLVM Developers mailing list<br class=""><a href="mailto:llvm-dev@lists.llvm.org" target="_blank" class="">llvm-dev@lists.llvm.org</a><br class=""><a href="https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev" target="_blank" class="">https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev</a><br class=""></div></blockquote></div><br class=""></div></div></div><span style="font-family:Helvetica;font-size:14px;font-style:normal;font-variant-caps:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;text-decoration:none;float:none;display:inline" class="">_______________________________________________</span><br style="font-family:Helvetica;font-size:14px;font-style:normal;font-variant-caps:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;text-decoration:none" class=""><span style="font-family:Helvetica;font-size:14px;font-style:normal;font-variant-caps:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;text-decoration:none;float:none;display:inline" class="">LLVM Developers mailing list</span><br style="font-family:Helvetica;font-size:14px;font-style:normal;font-variant-caps:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;text-decoration:none" class=""><a href="mailto:llvm-dev@lists.llvm.org" style="font-family:Helvetica;font-size:14px;font-style:normal;font-variant-caps:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px" target="_blank" class="">llvm-dev@lists.llvm.org</a><br style="font-family:Helvetica;font-size:14px;font-style:normal;font-variant-caps:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;text-decoration:none" class=""><a href="https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev" style="font-family:Helvetica;font-size:14px;font-style:normal;font-variant-caps:normal;font-weight:normal;letter-spacing:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px" target="_blank" class="">https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev</a></div></blockquote></div><br class=""></div></blockquote></div></div>

</div></blockquote></div><br class=""></div>_______________________________________________<br class="">cfe-dev mailing list<br class=""><a href="mailto:cfe-dev@lists.llvm.org" class="">cfe-dev@lists.llvm.org</a><br class="">https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev<br class=""></div></blockquote></div><br class=""></body></html>