[PATCH] D148088: [RFC][clangd] Move preamble index out of document open critical path

Kadir Cetinkaya via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Wed May 17 03:19:17 PDT 2023


kadircet added inline comments.


================
Comment at: clang-tools-extra/clangd/ClangdServer.cpp:90
+    if (Tasks) {
+      std::unique_lock<Semaphore> Lock(*Barrier, std::try_to_lock);
+      Tasks->runAsync("task:" + Path + Version, std::move(Task));
----------------
what's the point of this semaphore, if we're only going to `try_to_lock`? this should be done in the `Task` that we're passing to the `Tasks`.

but more importantly, this is going to be a change in behavior, before this patch we had one preamble thread per open file and they'll run indexing (in addition to preamble builds) without any throttling.

if we limit number of indexig tasks to 1, we'll need a queue to ensure progress for every file. semaphore won't be fair especially in the face of contention, we'll also likely handle requests out-of-order (which is redundant but OK correctness-wise as our file-index already accounts for that).

hence I don't think there's any reason to change behaviour here until we see some issues with contention (which I believe is unlikely as we can still only issue preamble indexing requests as quick as we can build preambles, and the latter is slower than the former).


================
Comment at: clang-tools-extra/clangd/ClangdServer.cpp:91
+      std::unique_lock<Semaphore> Lock(*Barrier, std::try_to_lock);
+      Tasks->runAsync("task:" + Path + Version, std::move(Task));
+    } else
----------------
s/task:/Preamble indexing for:/


================
Comment at: clang-tools-extra/clangd/ClangdServer.cpp:114
+    if (Tasks) {
+      std::unique_lock<Semaphore> Lock(*Barrier, std::try_to_lock);
       Tasks->runAsync("IndexStdlib", std::move(Task));
----------------
same as above


================
Comment at: clang-tools-extra/clangd/Preamble.cpp:101
 
+  CapturedASTCtx takeLife() { return std::move(*CapturedCtx); }
+
----------------
unlike other `takeXXX` methods, this will actually leave the optional inside still in a valid state.
can you instead return a `std::optional<CapturedASTCtx>` here?


================
Comment at: clang-tools-extra/clangd/Preamble.cpp:698-700
+      // While extending the life of FileMgr and VFS, StatCache should also be
+      // extended.
+      Ctx.setStatCache(Result->StatCache);
----------------
we use a "producingfs" when building the preamble, hence there shouldn't be any references to statcache inside the FM&VFS stored in the compiler instance here. have you noticed something to the contrary ?


================
Comment at: clang-tools-extra/clangd/Preamble.cpp:106-113
+    CI.setSema(nullptr);
+    CI.setASTConsumer(nullptr);
+    if (CI.getASTReader()) {
+      CI.getASTReader()->setDeserializationListener(nullptr);
+      /* This just sets consumer to null when DeserializationListener is null */
+      CI.getASTReader()->StartTranslationUnit(nullptr);
     }
----------------
kuganv wrote:
> kadircet wrote:
> > why are we triggering destructors for all of these objects eagerly? if this is deliberate to "fix" some issue, could you mention that in comments?
> > why are we triggering destructors for all of these objects eagerly? if this is deliberate to "fix" some issue, could you mention that in comments?
> 
> Thanks a lot for the review.
> If we don't destruct and set it to null, CapturedASTCtx will also have to keep instances such as ASTConsumer including other related callbacks such PreambleCallbacks. This was making the CapturedASTCtx interface and implementation complex.
> If we don't destruct and set it to null, CapturedASTCtx will also have to keep instances such as ASTConsumer 

Sorry if I am being dense but I can't actually see any references to those objects in the structs we preseve. AFAICT, Sema, ASTConsumer and ASTReader are members of CompilerInstance, which we don't keep around. Can you point me towards any references to the rest of the objects you're setting to null in case I am missing any?

`ASTMutationListener` seems to be the only one that's relevant. It's indeed exposed through `ASTContext` and our indexing operations can trigger various callbacks to fire. Can you set only that one to nullptr and leave a comment explaining the situation?


================
Comment at: clang-tools-extra/clangd/Preamble.cpp:694
     Result->MainIsIncludeGuarded = CapturedInfo.isMainFileIncludeGuarded();
-    return Result;
+    CapturedCtx.emplace(CapturedInfo.takeLife());
+    return std::make_pair(Result, CapturedCtx);
----------------
DmitryPolukhin wrote:
> kuganv wrote:
> > kadircet wrote:
> > > kuganv wrote:
> > > > kadircet wrote:
> > > > > what about just keeping the callback (with a different signature) and calling it here? e.g.:
> > > > > ```
> > > > > PreambleCallback(CapturedInfo.takeLife());
> > > > > ```
> > > > > 
> > > > > that way we don't need to change the return type and also control if we received a valid object on every call site (since callback will only be invoked on success)
> > > > > what about just keeping the callback (with a different signature) and calling it here? e.g.:
> > > > > ```
> > > > > PreambleCallback(CapturedInfo.takeLife());
> > > > > ```
> > > > > 
> > > > > that way we don't need to change the return type and also control if we received a valid object on every call site (since callback will only be invoked on success)
> > > > 
> > > > Apologies for the misunderstanding.  Just to be clear, you prefer indexing in UpdateIndexCallbacks using the thread Tasks that also indexes in indexStdlib? In this implementation, I am calling the index action in preamble thread. I will revise it.
> > > > Apologies for the misunderstanding. Just to be clear, you prefer indexing in UpdateIndexCallbacks using the thread Tasks that also indexes in indexStdlib? In this implementation, I am calling the index action in preamble thread. I will revise it.
> > > 
> > > Yes, i guess that's the reason why I was confused while looking at the code. Sorry if I give the impression that suggests doing the indexing on `PreambleThread`, but I think both in my initial comment:
> > > 
> > > >> As for AsyncTaskRunner to use, since this indexing task only depends on the file-index, which is owned by ClangdServer, I don't think there's any need to introduce a new taskrunner into TUScheduler and block its destruction. We can just re-use the existing TaskRunner inside parsingcallbacks, in which we run stdlib indexing tasks.
> > > 
> > > and in the follow up;
> > > 
> > > >> I think we can just change the signature for PreambleParsedCallback to pass along refcounted objects. forgot to mention in the first comment, but we should also change the CanonicalIncludes to be a shared_ptr so that it can outlive the PreambleData. We should invoke the callback inside buildPreamble after a successful build. Afterwards we should also change the signature for onPreambleAST to take AST, PP and CanonicalIncludes as ref-counted objects again and PreambleThread::build should just forward objects received from PreambleParsedCallback. Afterwards inside the UpdateIndexCallbacks::onPreambleAST we can just invoke indexing on Tasks if it's present or synchronously in the absence of it.
> > > 
> > > I was pointing towards running this inside the `Tasks` in `UpdateIndexCallbacks`.
> > > 
> > > ---
> > > 
> > > There's definitely some upsides to running that indexing on the preamble thread as well (which is what we do today) but I think the extra sequencing requirements (make sure to first notify the ASTPeer and then issue preamble callbacks) we put into TUScheduler (which is already quite complex) is probably not worth it.
> > > > Apologies for the misunderstanding. Just to be clear, you prefer indexing in UpdateIndexCallbacks using the thread Tasks that also indexes in indexStdlib? In this implementation, I am calling the index action in preamble thread. I will revise it.
> > > 
> > > Yes, i guess that's the reason why I was confused while looking at the code. Sorry if I give the impression that suggests doing the indexing on `PreambleThread`, but I think both in my initial comment:
> > > 
> > > >> As for AsyncTaskRunner to use, since this indexing task only depends on the file-index, which is owned by ClangdServer, I don't think there's any need to introduce a new taskrunner into TUScheduler and block its destruction. We can just re-use the existing TaskRunner inside parsingcallbacks, in which we run stdlib indexing tasks.
> > > 
> > > and in the follow up;
> > > 
> > > >> I think we can just change the signature for PreambleParsedCallback to pass along refcounted objects. forgot to mention in the first comment, but we should also change the CanonicalIncludes to be a shared_ptr so that it can outlive the PreambleData. We should invoke the callback inside buildPreamble after a successful build. Afterwards we should also change the signature for onPreambleAST to take AST, PP and CanonicalIncludes as ref-counted objects again and PreambleThread::build should just forward objects received from PreambleParsedCallback. Afterwards inside the UpdateIndexCallbacks::onPreambleAST we can just invoke indexing on Tasks if it's present or synchronously in the absence of it.
> > > 
> > > I was pointing towards running this inside the `Tasks` in `UpdateIndexCallbacks`.
> > > 
> > > ---
> > > 
> > > There's definitely some upsides to running that indexing on the preamble thread as well (which is what we do today) but I think the extra sequencing requirements (make sure to first notify the ASTPeer and then issue preamble callbacks) we put into TUScheduler (which is already quite complex) is probably not worth it.
> > 
> > Thanks again for the review. Updated the diff such that indexing is now in IndexTasks. AFIK, there will be a single thread that will now manage indexing of preamble for all the opened TUs.  Please let me know if you see any issues there.
> > There's definitely some upsides to running that indexing on the preamble thread as well (which is what we do today) but I think the extra sequencing requirements (make sure to first notify the ASTPeer and then issue preamble callbacks) we put into TUScheduler (which is already quite complex) is probably not worth it.
> 
> @kadircet I'm sorry for interfering with the review but could you please elaborate what are the downsides of current approach when preamble indexing happening off critical path but on the same thread? The benefits of such approach that preamble marked available earlier and compilation of the main file starts earlier in parallel with indexing preamble. Also such approach eliminates potential race condition between preamble thread and indexing thread. Code in [PrecompiledPreamble::Build](https://github.com/llvm/llvm-project/blob/main/clang/lib/Frontend/PrecompiledPreamble.cpp#L551-L577) accesses AST for preamble and if indexing is running in separate thread it will be race condition. If we schedule indexing after PrecompiledPreamble::Build e still will need to be very careful to don't access anything in preamble AST from PreambleThread after the task is scheduled. The only downside of running indexing on PreambleThread is that we cannot start building new preamble while previous one is indexing but I think it might be positive thing to throttle preamble building.
> but could you please elaborate what are the downsides of current approach when preamble indexing happening off critical path but on the same thread?

As I mentioned above this makes TUScheduler more complicated now. We need to make sure we first send the preamble to astpeer and then invoke parsed callbacks, the sequencing is subtle and ties our hands if we want to refactor this code in the future as it needs to be preserved.

> the benefits of such approach that preamble marked available earlier and compilation of the main file starts earlier in parallel with indexing preamble. 

that's the case with both approaches, right? surely directly sending the preamble to ASTPeer might be couple milliseconds faster, but we're still indexing preamble in parallel. We might loose couple milliseconds while putting the new task into the queue, but considering the preamble/ast build times i don't think that really matters.


> Also such approach eliminates potential race condition between preamble thread and indexing thread. Code in PrecompiledPreamble::Build accesses AST for preamble and if indexing is running in separate thread it will be race condition. If we schedule indexing after PrecompiledPreamble::Build e still will need to be very careful to don't access anything in preamble AST from PreambleThread after the task is scheduled. 

sorry I don't fully understand the argument here. in both alternatives we're running indexing after `PrecompiledPreamble::Build` finishes hence they cannot access "live" ast concurrently.
Maybe you mean `ParsedAST::build`, i.e. the main ast build? If that's the case, it uses the serialized preamble, not "live" astcontext (and serialized preamble is already thread-safe to read concurrently).


================
Comment at: clang-tools-extra/clangd/Preamble.h:49-50
+///
+/// As we index the preamble in a different thread, we extend the life of
+/// associated data by capturing and storing references here.
+struct CapturedASTCtx {
----------------
Having the use case mentioned here is nice, but I think we should also document "what/how" this struct operates.

maybe something like:
```
Keeps necessary structs for an ASTContext and Preprocessor alive.

This enables consuming them after context that produced the AST is gone. (e.g. indexing a preamble ast on a separate thread).
ASTContext stored inside is still not thread-safe.
```


================
Comment at: clang-tools-extra/clangd/Preamble.h:51
+/// associated data by capturing and storing references here.
+struct CapturedASTCtx {
+public:
----------------
can you also make this struct move-only by deleting copy constructor/assignment? as we don't want to have multiple references to astcontext by mistake


================
Comment at: clang-tools-extra/clangd/Preamble.h:105
+  std::shared_ptr<PreambleFileStatusCache> StatCache;
+  std::shared_ptr<CanonicalIncludes> CanonIncludes;
   // Whether there was a (possibly-incomplete) include-guard on the main file.
----------------
s/CanonicalIncludes/const CanonicalIncludes/ (to make sure we don't accidentally modify it concurrently) this would also imply changes on the callback signatures


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D148088/new/

https://reviews.llvm.org/D148088



More information about the cfe-commits mailing list