[PATCH] D93873: [clangd] Cache preambles of closed files

Tue Jan 12 00:43:44 PST 2021

kadircet added a comment.

In D93873#2490314 <https://reviews.llvm.org/D93873#2490314>, @sammccall wrote:

> 1. a *persistent* cache so closing+reopening clangd loses less state. (This is complicated because only the PCH is easily serializable, the rest of the PreambleData struct isn't)
> 2. building caches of preambles while background-indexing (this would be great for modules but is probably way too big for whole preambles)
> 3. reusing the "wrong" preamble initially when you open a new file, to give some basic functionality (using existing preamble patching logic, just in a more aggressive scenario)
> 4. having the disk-based storage unlink the file preemptively, to eliminate any chance of leaking the *.pch

It feels like a mixture of 1 and 3 is going to provide the most value for decreasing the time until semantic features are available (but I might be a little biased :D, also we might hit a nice sweet spot with pseudoparsing too).
I don't think a cache of previously built preambles will ever be enough. As Sam pointed out, scaling is one of the biggest problems: I don't think it would be feasible to keep tens of preambles lying around on disk, especially when they are costly to build (as that implies increased size).
Surely it optimizes the case of users who work on a small set of files but frequently close and re-open them. But that's just one use case. It is also quite common to open tens of library headers while investigating an issue, or while trying to understand details of some code through chains of go-to-definitions.
Users won't have any preambles for a while on those files, and even after the preambles are built they'll just sit in the cache, probably only to be evicted.

So I think having a cache of preambles while optimizing for reusability (by keeping a small set of preambles that cover different sets of files, as we can't use a preamble for a source file if it covers the source file in question) and then patching those to be applicable to the current file at hand sounds like a better compromise. Surely it won't be as effective for the frequent close/re-open use case, but I think the cost of such a cache isn't justified if it is only applicable to a single workflow.
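
To make the selection rule concrete, here is a rough sketch of how a candidate preamble could be picked before patching. Everything in it is hypothetical (`CachedPreamble`, `pickReusablePreamble`, and the "most shared includes" heuristic are made up for illustration and are not clangd's actual API); it only shows the constraint that a preamble covering the file in question is skipped.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for a cached preamble; not clangd's PreambleData.
struct CachedPreamble {
  std::string MainFile;           // file the preamble was originally built for
  std::set<std::string> Covered;  // headers that the preamble includes
};

// Returns the cached preamble that shares the most includes with File, or
// nullptr if none is usable. A preamble that covers File itself is skipped,
// since it cannot be used for that file. Ties keep the earliest entry.
const CachedPreamble *
pickReusablePreamble(const std::vector<CachedPreamble> &Cache,
                     const std::string &File,
                     const std::set<std::string> &Includes) {
  const CachedPreamble *Best = nullptr;
  std::size_t BestOverlap = 0;
  for (const CachedPreamble &P : Cache) {
    if (P.Covered.count(File))
      continue; // preamble already contains File, so it is not applicable
    std::size_t Overlap = 0;
    for (const std::string &Inc : Includes)
      Overlap += P.Covered.count(Inc);
    if (!Best || Overlap > BestOverlap) {
      Best = &P;
      BestOverlap = Overlap;
    }
  }
  return Best;
}
```

The overlap count is just one possible heuristic. The point is that a small number of entries with diverse coverage can serve many newly opened files, with the existing patching logic making up the difference.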

As for mixing idea 1 into the equation: all of these will require clangd to do the work from scratch per instance. If we can have some sort of persistent on-disk cache, we can both share the work (and associated storage costs) across clangd instances and ensure clangd is responsive even at startup, without requiring the user to build a bunch of preambles with every new clangd instance first.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D93873/new/

https://reviews.llvm.org/D93873