[PATCH] D52419: [clangd] Cache FS stat() calls when building preamble.

Tue Sep 25 14:11:31 PDT 2018

sammccall added a comment.

Generally LG. A few things I'm unsure about.

================
Comment at: clangd/CodeComplete.cpp:1028

+  IntrusiveRefCntPtr<vfs::FileSystem> VFS = Input.VFS;
+  if (Input.PreambleStatCache)
----------------
ioeric wrote:
> ilya-biryukov wrote:
> > It feels like we could apply this caching one level higher, on the TUScheduler level when creating the filesystem. It makes sense not only for code completion, but also for all other features, including rebuilds of ASTs that were washed out of the cache.
> > WDYT?
> It sounds like a reasonable thing to do. I took a quick look, and it seemed that this would probably require some refactoring/redesign on TUScheduler - it doesn't know about the VFS?
> 
> I think it shouldn't be hard to do this case by case. For example, code completion and signature help will be covered in this patch. And it should be pretty easy to make AST rebuild use the cache as well?
> It sounds like a reasonable thing to do. I took a quick look, and it seemed that this would probably require some refactoring/redesign on TUScheduler - it doesn't know about the VFS?

Right, this is a bit of a mess...
Currently there are features that access the FS "directly". This includes CodeComplete which runs its own compile action, as well as things like switchSourceHeader or format. These invoke FSProvider.
Then there are features that build an AST (which may need file accesses), and then may implicitly read files by querying the AST (the FS is attached to the FileManager or something). These use the FS obtained from the FSProvider **in the relevant addDocument call**.
There's no fundamental reason we can't have both (access FS directly and use the AST) in which case we'll be using both filesystems together...

In the near term, making the TUScheduler-managed AST rebuild use the cache stored in the preamble seems like a nice win. (As you alluded to I don't think this change is in TUScheduler itself, rather buildAST?)

================
Comment at: clangd/FS.cpp:1
+//===--- FS.cpp - File system related utils ----------------------*- C++-*-===//
+//
----------------
Can we have unit tests for these?
It should be pretty straightforward, as you can inspect/seed the cache state directly.

================
Comment at: clangd/FS.cpp:29
+PreambleFileStatusCache::lookup(llvm::StringRef File) const {
+  auto I = StatCache.find(File);
+  if (I != StatCache.end())
----------------
lock

================
Comment at: clangd/FS.cpp:48
+        return File;
+      if (auto S = File->get()->status())
+        StatCache.update(getUnderlyingFS(), std::move(*S));
----------------
I'm not sure I get this: AFAICT (at least on linux) the status is never available on a newly opened file, so this will always be a stat() call, so we're just doing the work eagerly and caching it for later. Isn't this just moving the work around?

I'm sure you've verified this is important somehow, but a comment explaining how would be much appreciated :-)

================
Comment at: clangd/FS.h:20
+/// Caches `stat()` calls during preamble build, which can be used to avoid
+/// re-`stat`s header files when preamble is re-used (e.g. in code completion).
+/// Note that the cache is only valid whene reusing preamble.
----------------
can you give a little more colour on *why* code completion tends to stat a bunch of the same files over again?
(One might assume it's to validate the preamble, but it shouldn't be, because we don't do that for other reasons)

================
Comment at: clangd/FS.h:21
+/// re-`stat`s header files when preamble is re-used (e.g. in code completion).
+/// Note that the cache is only valid whene reusing preamble.
+///
----------------
whene -> when

================
Comment at: unittests/clangd/ClangdTests.cpp:966

+TEST(ClangdTests, PreambleVFSStatCache) {
+  class ListenStatsFSProvider : public FileSystemProvider {
----------------
it's not obvious what this test does (without the rest of the patch as context).
Maybe a high-level comment: `Check that running code completion doesn't stat() a bunch of files from the preamble again. (They should be using the preamble's stat-cache)`

================
Comment at: unittests/clangd/ClangdTests.cpp:980
+        llvm::ErrorOr<vfs::Status> status(const Twine &Path) override {
+          auto I =
+              CountStats.try_emplace(llvm::sys::path::filename(Path.str()), 1);
----------------
just ++CountStats[...]?
life is too short :-)

================
Comment at: unittests/clangd/ClangdTests.cpp:984
+            I.first->second += 1;
+          return getUnderlyingFS().status(Path);
+        }
----------------
or just `return ProxyFileSystem::status(Path);` which is *slightly* less coupled

================
Comment at: unittests/clangd/ClangdTests.cpp:1018
+
+  unsigned Before = CountStats["foo.h"];
+  auto Completions = cantFail(runCodeComplete(Server, SourcePath, Code.point(),
----------------
I think this would be much clearer if we asserted the actual count here. It should be 1, right? Do you think it'd be too brittle?

As it stands, it seems entirely possible that the test is not seeing any stats for whatever reason and is spuriously passing.

Repository:
  rCTE Clang Tools Extra

https://reviews.llvm.org/D52419