[PATCH] D42174: [clangd] Refactored threading in ClangdServer

Thu Jan 25 08:23:57 PST 2018

sammccall added a comment.

Agreed we can defer lots of stuff in order to keep this patch compact.
Generally the things I think we can get right before landing:

- names and file organization for new things
- documentation including where we want to get to

================
Comment at: clangd/ClangdServer.cpp:246
+
+  ParseInputs Inputs = getInputs(File);
+  std::shared_ptr<const PreambleData> Preamble =
----------------
I think you want to take a reference here, and then capture by value

================
Comment at: clangd/ClangdServer.cpp:323
+      [](Context Ctx, std::promise<Context> DonePromise, llvm::Error Err) {
+        // FIXME(ibiryukov): allow to report error to the caller.
+        // Discard error, if any.
----------------
this fixme doesn't make sense if we're removing the callback

================
Comment at: clangd/ClangdServer.cpp:345
+  return scheduleReparseAndDiags(std::move(Ctx), File, std::move(FileContents),
+                                 TaggedFS);
 }
----------------
nit: you dropped a move here

================
Comment at: clangd/ClangdServer.cpp:400
+                  CallbackType Callback,
+                  llvm::Expected<InputsAndPreamble> InpPreamble) {
+    assert(InpPreamble &&
----------------
nit: "Inp" in InpPreamble is pretty awkward. In this context, Read might work? or IP.

================
Comment at: clangd/ClangdServer.cpp:403
+           "error when trying to read preamble for codeComplete");
+    auto Preamble = InpPreamble->Preamble;
+    auto &Command = InpPreamble->Inputs.CompileCommand;
----------------
InpPreamble->Preamble->Preamble is pretty hard to understand. Can we find better names for these things? Or at least unpack it on one line?

================
Comment at: clangd/ClangdServer.h:164

+/// Handles running tasks for ClangdServer and managing the resources (e.g.,
+/// preambles and ASTs) for opened files.
----------------
ilya-biryukov wrote:
> sammccall wrote:
> > This is a nice abstraction, so much better than dealing with Cppfile! A couple of observations:
> > 
> > 1) all methods refer to a single file, neither the contracts nor the implementation have any interactions between files. An API that reflects this seems more natural. e.g. it naturally provides operations like "are we tracking this file", and makes it natural to be able to e.g. lock at the per-file level. e.g.
> > 
> >   class WorkingSet {
> >      shared_ptr<TranslationUnit> get(Path);
> >      shared_ptr<TranslationUnit> getOrCreate(Path)
> >    }
> >    class TranslationUnit {
> >      void update(ParseInputs);
> >      void read(Callback);
> >    }
> > 
> > This seems like it would make it easier to explain what the semantics of scheduleRemove are, too.
> > 
> > 2) The callbacks from individual methods seem more powerful than needed, and encourage deeper coupling. Basically, the inputs (changes) and outputs (diagnostics, reads) don't seem to want to interact with the same code. This suggests decoupling them: the changes are a sequence of input events, diagnostics are a sequence of output events, reads look much as they do now.
> > 
> > One benefit here is that the versioning relationship between inputs and outputs no longer show up in the function signatures (by coupling an input to a matching output). Expressing them as data makes it easier to tweak them.
> > 
> > 3) It's not spelled out how this interacts with drafts: whether text<->version is maintained here or externally, and what the contracts around versions are. There are no options offered, so I would guess that scheduleUpdate delivers new-version-or-nothing, and scheduleASTRead delivers... current-or-newer? current-or-nothing?
> > 
> > I think it would *probably* be clearer to have versions minted by the external DraftStore, that way we can decouple "we know about the contents of this file" from "we're building this file". e.g. we probably want wildly different policies for discarding resources of old versions, when "resources" = source code vs "resources" = ASTs and preambles.
> > 
> > 4) Scheduler (or anything it decomposes into) is important and isolated enough that it deserves its own header.
> 1. Given the nature of the LSP, the users of the interface will probably always call only a single function of `TranslationUnit`, so we won't win much in terms of the code clarity.
> ```
> scheduleUpdate(File, Inputs) --> get(File).update(Inputs)
> ```
> That adds some complexity to the interface, though. I'd opt for not doing that in the initial version.
> 
> 2. One place where I find these callbacks useful are tests where we could wait for latest `addDocument` to complete. A more pressing concern is that the interface of `ClangdServer` does not allow to remove the callbacks (`addDocument` returns a future) and I would really love to keep the public interface of `ClangdServer` the same for the first iteration.
> 
> 3. I would err on the side of saying that `scheduleASTRead` delivers current version. This gives correct results for the current callers of the interface. If we do current-or-newer I'd start with adding the versions to the interface, so that the callers of the API would have enough context to know whether the results correspond to the latest version or not. I'd really love to keep version out of the initial patch, happy to chat about them for follow-up patches.
> 
> 4.  Totally agree. I planned to move it into a separate header after the changes land. Happy to do that with a follow-up patch.
1. splitting the functionality up into per-file stuff can definitely be deferred until after this patch. Can you add a FIXME?
It should probably happen before the new implementation though, so it pushes that impl in the right direction.

2. The right abstraction for tests is probably "wait until the scheduler is idle", I think, and that can be implemented without each of these individual async methods being observable.
Agreed that we should keep ClangdServer's interface the same for this patch, so we can't change this yet. FIXME? (If you agree with the testing strategy, that'd be worth mentioning)

3. Always delivering exactly the sequenced version is fine, but please spell this out.

4. I'd like this newly added class to have the right name and file structure when it first lands, rather than trying to move it later.

================
Comment at: clangd/ClangdServer.h:184
+  /// resources. \p Action will be called when resources are freed.
+  /// If an error occurs during processing, it is forwarded to the \p Action
+  /// callback.
----------------
ilya-biryukov wrote:
> sammccall wrote:
> > The callback here is a bit confusing and YAGNI (both clangd and our known embedders discard the returned future).
> > It seems enough to return synchronously and know that subsequent reads that ask for = or >= version are going to get nothing back.
> I'm fine with removing the callback, but I'd like this change to focus on internals of `ClangdServer` without changing its interface, happy to remove the callback afterwards.
> Added a FIXME for that.
Sounds good. ScheduleUpdate -> update, though? Since this callback will go away, it's not really observable that anything is being scheduled.

================
Comment at: clangd/ClangdServer.h:200
+  /// callback.
+  void schedulePreambleRead(
+      PathRef File,
----------------
ilya-biryukov wrote:
> sammccall wrote:
> > can we pass in a consistency policy here? even if we only support a subset of {ast, preamble} x policy for now, a richer API would enable experiments/tradeoffs later. e.g.
> > 
> >   enum class VersionConstraint {
> >     ANY,     // returns the first data that's available, typically immediately
> >     EQUAL, // returns the data for the previous update
> >     GREATER_EQUAL, // Returns the data for the previous update, or some subsequent update.
> >   }
> > 
> > (I guess the semantics you've described for AST read are EQUAL, and ANY for preamble read)
> I would rather not extend the API beyond what it can do at the moment, but happy to explore this approach after threading gets rid of the legacy stuff that we have now.
I'm happy for this not to land in this patch, but with some of the other API changes it should go in before/with the new implementation

================
Comment at: clangd/ClangdServer.h:206
+  CppFileCollection Units;
+  SimpleThreadPool Executor;
+};
----------------
ilya-biryukov wrote:
> sammccall wrote:
> > these two names seem a bit confusing - might be easier as just CppFiles, Threads?
> > 
> > The names seem to be ~synonyms for the types, which I don't think is better than echoing the types. (Often there's a completely separate good name, but I don't think so here)
> - I'd go with `Files` instead of `CppFiles`. WDYT?
> - `Executor` sounds better to me than `Threads`. Could you elaborate why you find it confusing?
executor often names an abstraction that's related to but not the same as a threadpool (c.f. java or google3). Not a big deal, but it doesn't seem to add any extra semantics over "threads" which would be less ambiguous.

================
Comment at: clangd/ClangdServer.h:165
+
+struct InputsAndAST {
+  const ParseInputs &Inputs;
----------------
I'm not sure about so many little structs, vs providing one with some fields being optional.
That would make a unified read API easier. WDYT? we can defer this.

================
Comment at: clangd/ClangdServer.h:177
+/// preambles and ASTs) for opened files.
+class Scheduler {
+public:
----------------
This class has important responsibilities beyond threading itself, which "Scheduler" suggests.

I can't think of a perfectly coherent name, options that seem reasonable:
 - TUManager - pretty bland/vague, but gets what this class is mostly about
 - Workshop - kind of a silly metaphor, but unlikely to be confused with something else
 - Clearinghouse - another silly metaphor, maybe more accurate but more obscure

================
Comment at: clangd/ClangdServer.h:177
+/// preambles and ASTs) for opened files.
+class Scheduler {
+public:
----------------
sammccall wrote:
> This class has important responsibilities beyond threading itself, which "Scheduler" suggests.
> 
> I can't think of a perfectly coherent name, options that seem reasonable:
>  - TUManager - pretty bland/vague, but gets what this class is mostly about
>  - Workshop - kind of a silly metaphor, but unlikely to be confused with something else
>  - Clearinghouse - another silly metaphor, maybe more accurate but more obscure
Worth saying something abouth the threading properties here:

- Scheduler is not threadsafe, only the main thread should be providing updates and scheduling tasks.
 - callbacks are run on a large threadpool, and it's appropriate to do slow, blocking work in them

================
Comment at: clangd/ClangdServer.h:195
+  /// should remove it.
+  void scheduleRemove(PathRef File, UniqueFunction<void(llvm::Error)> Action);
+
----------------
similarly scheduleRemove -> remove

================
Comment at: clangd/ClangdServer.h:202
+  void
+  scheduleASTRead(PathRef File,
+                  UniqueFunction<void(llvm::Expected<InputsAndAST>)> Action);
----------------
I like "schedule" in the description, but it seems to in-the-weeds for a function name - ideally the verb would be what the caller is trying to do, not how we implement it.

runWithAST? or if you want to emphasize the async nature, runSoonWithAST?

================
Comment at: clangd/ClangdServer.h:216
+
+private:
+  llvm::StringMap<ParseInputs> CachedInputs;
----------------
duplicate private

Repository:
  rCTE Clang Tools Extra

https://reviews.llvm.org/D42174