[PATCH] D95501: WIP: Support: Add vfs::OutputBackend to virtualize compiler outputs

Mon Feb 22 07:49:22 PST 2021

sammccall added a comment.

Some detailed comments to go with the RFC thread!

================
Comment at: llvm/include/llvm/Support/OutputBackend.h:64
+/// configuration flag is either \c true or \c false.
+class OutputConfig {
+  /// Don't use std::bitset since it's mostly not constexpr. If primitive types
----------------
I think a struct in the style of FrontendOptions or so might have better ergonomics:
 - access is a bit more direct
 - can easily provide sensible defaults rather than having to negate half the option names

================
Comment at: llvm/include/llvm/Support/OutputBackend.h:133
+/// Opaque description of a compiler output that has been created.
+class OutputFile {
+  virtual void anchor();
----------------
It would be helpful for a client to have few "valid" bits to reason about:
 - open/closed
 - erased/not-erased
 - stream taken/not taken

I think erased is gone now, and just remains in the comments, which is nice!
Is it necessary to support both getting and taking the stream, or could we get away with one?

================
Comment at: llvm/include/llvm/Support/OutputBackend.h:152
+  /// called yet.
+  Error close();
+
----------------
Could consider calling this `commit`, which hints at the behavior when calling vs not calling it.

================
Comment at: llvm/include/llvm/Support/OutputBackend.h:172
+
+  StringRef getPath() const { return Path; }
+
----------------
As you mentioned on the mail, this seems a little wrong to expose here, I'd agree with removing it.

I think the only question it really answers is "what string was passed to OutputBackend::createFileImpl" - this may be useful, but it's easily conflated with ideas like "where can I read this file from". So I like the idea that path strings only appear in close proximity to OutputBackends.

================
Comment at: llvm/include/llvm/Support/OutputBackend.h:177
+  /// storeContentBuffer().
+  std::unique_ptr<raw_pwrite_stream> createStreamForContentBuffer();
+
----------------
ContentBuffer seems handy for implementing output-files that buffer in memory, but does support it really need to be baked into the base class?
It causes a confusing structure in OutputFile where half of the interface is about dealing with streams and the other half is dealing with buffers, and it's hard to tell which is fundamental. 

My best understanding is that ContentBuffer exists to avoid copies in some cases where the output is buffered into memory. If a copy was OK, we'd just use string/stringref, and if the output was being written to disk, we'd use a stream. It seems surprising if this really is a win overall. Is it possible to start with something simpler and evolve to something fast, with benchmarks?

(If it were gone from the main interface, it seems that it would be simple enough for e.g. InMemoryOutputDestination to implement takeOSImpl() and close() on top of a ContentBuffer member, for MirroringOutput to simply write its buffer to the output stream, etc.)

================
Comment at: llvm/include/llvm/Support/OutputBackend.h:204
+  /// Allow subclasses to call \a OutputFile::isNull() on other files.
+  static bool isNull(const OutputFile &File) { return File.isNull(); }
+
----------------
this seems like a surprising place to strike a balance, it implies:
 - it's more important to optimize away null in the mirroring file than to encapsulate nullness as an implementation detail
 - but it's more important to encapsulate nullness than to optimize in other places (e.g. actually writing output!)

If nullness enables important optimizations (seems plausible) I'd prefer just making this public.

================
Comment at: llvm/include/llvm/Support/OutputBackend.h:251
+/// OutputFileImpl given an \a OutputPath and \c OutputConfig.
+class OutputBackend : public RefCountedBase<OutputBackend> {
+  virtual void anchor();
----------------
ThreadSafeRefCountedBase?

Or even better, can we just use std::shared_ptr?

(I think the nominal reason for still having IntrusiveRefCntPtr is that it's cheaper when thread-safety is not required, but it is here)

================
Comment at: llvm/include/llvm/Support/OutputBackend.h:262
+  Expected<IntrusiveRefCntPtr<OutputDirectory>>
+  getDirectory(const Twine &Path) const;
+
----------------
My understanding is the use of Twine in the VFS interfaces is a (hard-to-fix) mistake - in practice we do not compose paths using techniques that Twines support well, we mostly end up rendering twines to strings to pass them to other APIs anyway, and the potential performance benefits of Twine are unlikely to be meaningful in the face of IO anyway.

Can we use StringRef?

================
Comment at: llvm/include/llvm/Support/OutputBackend.h:270
+  Expected<IntrusiveRefCntPtr<OutputDirectory>>
+  createDirectory(const Twine &Path, OutputConfig Config = None);
+
----------------
This is pretty fuzzy to me - it sounds like this is the API both for creating a directory, and creating an output that encapsulates the notion of working directory.

What do you do if you want only one of these? In particular, if you want to track the output directory but only create if you write any outputs. I guess: you can't use this API, and need to track working directory externally instead.

For backends that *do* have a concept of directory existence, is it *required* to call this function or can you just write to a path and have parent directories created? I guess: it's optional by default but the NoImplyCreateDirectories controls this.

---

These aren't showstoppers, but if something simpler would work equally well, that would be nice. e.g.
 - providing "createDirectory" to do the IO, and making OutputDirectory a helper class instead of part of the interface
 - only dealing in absolute paths - the platform bundles working-directory handling with IO operations, but it's not like we can actually use the OS's working directory...

================
Comment at: llvm/include/llvm/Support/OutputBackend.h:437
+/// A helper class for creating unique files and directories.
+class StableUniqueEntityHelper {
+public:
----------------
The patterns here (inheritance, state and templates) seem quite complicated and I'm having a tough time following it.
Is this modeled on existing code somewhere? Maybe we can provide a simpler version of this initially, or leave the unique-entity stuff to separate patches?

================
Comment at: llvm/include/llvm/Support/OutputBackend.h:437
+/// A helper class for creating unique files and directories.
+class StableUniqueEntityHelper {
+public:
----------------
sammccall wrote:
> The patterns here (inheritance, state and templates) seem quite complicated and I'm having a tough time following it.
> Is this modeled on existing code somewhere? Maybe we can provide a simpler version of this initially, or leave the unique-entity stuff to separate patches?
Stable seems to mean something significant here, but I don't know what it is.

================
Comment at: llvm/include/llvm/Support/OutputBackend.h:450
+private:
+  StringMap<unsigned> Models;
+};
----------------
I'm not 100% sure of the structure of the stable-entity stuff, but it looks like this state needs to be synchronized for multithreaded access.

================
Comment at: llvm/include/llvm/Support/OutputBackend.h:601
+/// The output already exists in filesystem and it cannot be overwritten.
+class CannotOverwriteExistingOutputError final
+    : public ErrorInfo<CannotOverwriteExistingOutputError, OutputError> {
----------------
Can we use errorCodeToError(std::errc::file_exists) for this?

(with createFileError if it's critical to copy the filename in here)

================
Comment at: llvm/include/llvm/Support/OutputBackend.h:627
+/// A temporary file could not be renamed to the final output on disk.
+class OnDiskOutputRenameTempError final
+    : public ErrorInfo<OnDiskOutputRenameTempError, OutputError> {
----------------
this doesn't seem like a useful public API, because temp files and renaming are low-level implementation details of particular backends. Even if you know the backend is in use, it's hard to imaging handling "rename errors" differently than other IO errors, apart from the error message.

Can this be createStringError("Could not rename...", EC) - we do actually have an underlying error code from the OS

================
Comment at: llvm/include/llvm/Support/OutputBackend.h:703
+  };
+  OutputSettings &getSettings() { return Settings; }
+  const OutputSettings &getSettings() const { return Settings; }
----------------
why is it necessary to allow externally accessing/modifying the settings after construction?

================
Comment at: llvm/lib/Support/OutputBackend.cpp:865
+      return E;
+    return File2->storeContentBufferIn(Content, *File2);
+  }
----------------
doesn't the contract say File1 may have taken the content?

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D95501/new/

https://reviews.llvm.org/D95501