[PATCH] D80900: [clangd] Use different FS in PreambleThread

Kadir Cetinkaya via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Tue Jun 2 13:43:22 PDT 2020


kadircet added a comment.

In D80900#2066327 <https://reviews.llvm.org/D80900#2066327>, @sammccall wrote:

> TL;DR: I think there are three viable paths that we should consider:
>  A) minimal change: put the FSProvider in ParseInputs
>  B) this patch, but with more documentation and safety
>  C) fight to clean up VFS multithreading semantics
>
> Assuming we don't want to block on C, I'm not sure whether A or B is conceptually better. But A is a smaller change and brings side benefits, so wondering if you see advantages in B.


I didn't want to go for A) as I was afraid of satisfying the contract on `FileSystemProvider::getFileSystem`

  /// Context::current() will be the context passed to the clang entrypoint,
  /// such as addDocument(), and will also be propagated to result callbacks.

As this patch keep the `FileSystemProvider` inside TUScheduler.cpp and we need to ensure context is always the one we received in the entrypoint. Whereas `ParseInputs` is widely
passed around, and ensuring that assumption holds might be hard.

> You're right about the race condition of course and I'm sure this patch is correct.
>  But this rubs me the wrong way a bit - the API change doesn't seem clearly motivated either from a high level (we conceptually *want* a single FS to build a single version of code) or from a low level (what are we actually doing in parallel that isn't threadsafe?).
> 
> Looking at the low level perspective first. VFS has three classes threadsafety issues:
> 
> - it contains the "working directory" as mutable state, which isn't multi-threading friendly. We do set working directory concurrently with other work, but always to the same thing. We could factor our way around this by setting the working directory prior to any concurrency (to CDB.Directory) and ensuring we never set it again.
> - the RealFilesystem in real-working-directory mode is a singleton so its use is thread-hostile. This isn't really relevant to us because we use emulated-working-directory mode and have distinct instances. But it probably informed VFS's lack of useful threading semantics.
> - read operations aren't `const` or documented as threadsafe even though the canonical `RealFilesystem` is. In-tree implementations seem to be threadsafe, some out-of-tree ones are not :-( This definitely bites us here.
> 
>   I think it would be fundamentally possible to use a single VFS here. We need to modify the VFS interface to make concurrent reads safe (and const, while here) and fix any implementations. This isn't trivial. I think it's worthwhile, but experience tells me different projects want different things from the VFS interface...

I totally agree, this was actually my initial motivation. But I wasn't sure if it is worth the hassle, as this might receive some push-back from downstream users (VFS interface is not just in clang, but in llvm) and coming to a consensus is likely to take time.
It would be nice to clear that technical debt, but I don't think it provides any other practical benefits.

> The high-level view: IMO FSProvider is basically a hack around VFS's thread-safety issues, rather than a fundamental part of the application - it's a filesystem that doesn't have an attached working directory and whose read operations (e.g. `getFileSystem()->status()`) are threadsafe. We could rename FSProvider -> ThreadsafeFS and getFileSystem() -> newCursor(WD) or something, and maybe we should. 
>  From this perspective, this patch is changing two things that have little to do with each other: it's replacing FS -> ThreadsafeFS and it's moving the FS ownership from one revision of the file to the TUScheduler itself.
> 
> I'm not really sure whether the latter is a good or a bad idea. If we want to do it, the use of ParseInputs as the interface to TUScheduler is a bit confusing - we should at least document that FS isn't used (as we do for CompileCommand) and probably assert that it's null, too.

Right, the latter was a consequence of not storing FSProvider in ParseInputs, rather than a deliberate one. We can also push it around as a parameter to update, rather than a member if that looks more clear.
I didn't do that as it required more plumbing and change was actually local to TUScheduler.

> But maybe simpler would be to "just" change the VFS member in ParseInputs to a `const FSProvider*`. This keeps the design approximately what it is now, but simplifies the ParseInputs concept: it's now truly readonly data, and functions that consume it won't have effects that escape.

Isn't this just what you proposed in A) ? My concern above applies to this one too. It is not a practical concern as of today, as most uses of ParseInputs is through callbacks in ASTWorker, hence the context should be implicitly set. Unless, someone attaches  another async callback.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D80900/new/

https://reviews.llvm.org/D80900





More information about the cfe-commits mailing list