[PATCH] D148088: [RFC][clangd] Move preamble index out of document open critical path

Dmitry Polukhin via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Wed May 17 05:33:48 PDT 2023


DmitryPolukhin added inline comments.


================
Comment at: clang-tools-extra/clangd/ClangdServer.cpp:90
+    if (Tasks) {
+      std::unique_lock<Semaphore> Lock(*Barrier, std::try_to_lock);
+      Tasks->runAsync("task:" + Path + Version, std::move(Task));
----------------
kadircet wrote:
> what's the point of this semaphore, if we're only going to `try_to_lock`? this should be done in the `Task` that we're passing to the `Tasks`.
> 
> but more importantly, this is going to be a change in behavior, before this patch we had one preamble thread per open file and they'll run indexing (in addition to preamble builds) without any throttling.
> 
> if we limit number of indexig tasks to 1, we'll need a queue to ensure progress for every file. semaphore won't be fair especially in the face of contention, we'll also likely handle requests out-of-order (which is redundant but OK correctness-wise as our file-index already accounts for that).
> 
> hence I don't think there's any reason to change behaviour here until we see some issues with contention (which I believe is unlikely as we can still only issue preamble indexing requests as quick as we can build preambles, and the latter is slower than the former).
> we can still only issue preamble indexing requests as quick as we can build preambles, and the latter is slower than the former

Preamble compliation is not always slower than indexing. For example, if modules are actively used, compilation itself can be very fast but indexing is slow. Said that, I think we can eliminate the semaphore if there is no good reason to keep it.


================
Comment at: clang-tools-extra/clangd/Preamble.cpp:694
     Result->MainIsIncludeGuarded = CapturedInfo.isMainFileIncludeGuarded();
-    return Result;
+    CapturedCtx.emplace(CapturedInfo.takeLife());
+    return std::make_pair(Result, CapturedCtx);
----------------
kadircet wrote:
> DmitryPolukhin wrote:
> > kuganv wrote:
> > > kadircet wrote:
> > > > kuganv wrote:
> > > > > kadircet wrote:
> > > > > > what about just keeping the callback (with a different signature) and calling it here? e.g.:
> > > > > > ```
> > > > > > PreambleCallback(CapturedInfo.takeLife());
> > > > > > ```
> > > > > > 
> > > > > > that way we don't need to change the return type and also control if we received a valid object on every call site (since callback will only be invoked on success)
> > > > > > what about just keeping the callback (with a different signature) and calling it here? e.g.:
> > > > > > ```
> > > > > > PreambleCallback(CapturedInfo.takeLife());
> > > > > > ```
> > > > > > 
> > > > > > that way we don't need to change the return type and also control if we received a valid object on every call site (since callback will only be invoked on success)
> > > > > 
> > > > > Apologies for the misunderstanding.  Just to be clear, you prefer indexing in UpdateIndexCallbacks using the thread Tasks that also indexes in indexStdlib? In this implementation, I am calling the index action in preamble thread. I will revise it.
> > > > > Apologies for the misunderstanding. Just to be clear, you prefer indexing in UpdateIndexCallbacks using the thread Tasks that also indexes in indexStdlib? In this implementation, I am calling the index action in preamble thread. I will revise it.
> > > > 
> > > > Yes, i guess that's the reason why I was confused while looking at the code. Sorry if I give the impression that suggests doing the indexing on `PreambleThread`, but I think both in my initial comment:
> > > > 
> > > > >> As for AsyncTaskRunner to use, since this indexing task only depends on the file-index, which is owned by ClangdServer, I don't think there's any need to introduce a new taskrunner into TUScheduler and block its destruction. We can just re-use the existing TaskRunner inside parsingcallbacks, in which we run stdlib indexing tasks.
> > > > 
> > > > and in the follow up;
> > > > 
> > > > >> I think we can just change the signature for PreambleParsedCallback to pass along refcounted objects. forgot to mention in the first comment, but we should also change the CanonicalIncludes to be a shared_ptr so that it can outlive the PreambleData. We should invoke the callback inside buildPreamble after a successful build. Afterwards we should also change the signature for onPreambleAST to take AST, PP and CanonicalIncludes as ref-counted objects again and PreambleThread::build should just forward objects received from PreambleParsedCallback. Afterwards inside the UpdateIndexCallbacks::onPreambleAST we can just invoke indexing on Tasks if it's present or synchronously in the absence of it.
> > > > 
> > > > I was pointing towards running this inside the `Tasks` in `UpdateIndexCallbacks`.
> > > > 
> > > > ---
> > > > 
> > > > There's definitely some upsides to running that indexing on the preamble thread as well (which is what we do today) but I think the extra sequencing requirements (make sure to first notify the ASTPeer and then issue preamble callbacks) we put into TUScheduler (which is already quite complex) is probably not worth it.
> > > > > Apologies for the misunderstanding. Just to be clear, you prefer indexing in UpdateIndexCallbacks using the thread Tasks that also indexes in indexStdlib? In this implementation, I am calling the index action in preamble thread. I will revise it.
> > > > 
> > > > Yes, i guess that's the reason why I was confused while looking at the code. Sorry if I give the impression that suggests doing the indexing on `PreambleThread`, but I think both in my initial comment:
> > > > 
> > > > >> As for AsyncTaskRunner to use, since this indexing task only depends on the file-index, which is owned by ClangdServer, I don't think there's any need to introduce a new taskrunner into TUScheduler and block its destruction. We can just re-use the existing TaskRunner inside parsingcallbacks, in which we run stdlib indexing tasks.
> > > > 
> > > > and in the follow up;
> > > > 
> > > > >> I think we can just change the signature for PreambleParsedCallback to pass along refcounted objects. forgot to mention in the first comment, but we should also change the CanonicalIncludes to be a shared_ptr so that it can outlive the PreambleData. We should invoke the callback inside buildPreamble after a successful build. Afterwards we should also change the signature for onPreambleAST to take AST, PP and CanonicalIncludes as ref-counted objects again and PreambleThread::build should just forward objects received from PreambleParsedCallback. Afterwards inside the UpdateIndexCallbacks::onPreambleAST we can just invoke indexing on Tasks if it's present or synchronously in the absence of it.
> > > > 
> > > > I was pointing towards running this inside the `Tasks` in `UpdateIndexCallbacks`.
> > > > 
> > > > ---
> > > > 
> > > > There's definitely some upsides to running that indexing on the preamble thread as well (which is what we do today) but I think the extra sequencing requirements (make sure to first notify the ASTPeer and then issue preamble callbacks) we put into TUScheduler (which is already quite complex) is probably not worth it.
> > > 
> > > Thanks again for the review. Updated the diff such that indexing is now in IndexTasks. AFIK, there will be a single thread that will now manage indexing of preamble for all the opened TUs.  Please let me know if you see any issues there.
> > > There's definitely some upsides to running that indexing on the preamble thread as well (which is what we do today) but I think the extra sequencing requirements (make sure to first notify the ASTPeer and then issue preamble callbacks) we put into TUScheduler (which is already quite complex) is probably not worth it.
> > 
> > @kadircet I'm sorry for interfering with the review but could you please elaborate what are the downsides of current approach when preamble indexing happening off critical path but on the same thread? The benefits of such approach that preamble marked available earlier and compilation of the main file starts earlier in parallel with indexing preamble. Also such approach eliminates potential race condition between preamble thread and indexing thread. Code in [PrecompiledPreamble::Build](https://github.com/llvm/llvm-project/blob/main/clang/lib/Frontend/PrecompiledPreamble.cpp#L551-L577) accesses AST for preamble and if indexing is running in separate thread it will be race condition. If we schedule indexing after PrecompiledPreamble::Build e still will need to be very careful to don't access anything in preamble AST from PreambleThread after the task is scheduled. The only downside of running indexing on PreambleThread is that we cannot start building new preamble while previous one is indexing but I think it might be positive thing to throttle preamble building.
> > but could you please elaborate what are the downsides of current approach when preamble indexing happening off critical path but on the same thread?
> 
> As I mentioned above this makes TUScheduler more complicated now. We need to make sure we first send the preamble to astpeer and then invoke parsed callbacks, the sequencing is subtle and ties our hands if we want to refactor this code in the future as it needs to be preserved.
> 
> > the benefits of such approach that preamble marked available earlier and compilation of the main file starts earlier in parallel with indexing preamble. 
> 
> that's the case with both approaches, right? surely directly sending the preamble to ASTPeer might be couple milliseconds faster, but we're still indexing preamble in parallel. We might loose couple milliseconds while putting the new task into the queue, but considering the preamble/ast build times i don't think that really matters.
> 
> 
> > Also such approach eliminates potential race condition between preamble thread and indexing thread. Code in PrecompiledPreamble::Build accesses AST for preamble and if indexing is running in separate thread it will be race condition. If we schedule indexing after PrecompiledPreamble::Build e still will need to be very careful to don't access anything in preamble AST from PreambleThread after the task is scheduled. 
> 
> sorry I don't fully understand the argument here. in both alternatives we're running indexing after `PrecompiledPreamble::Build` finishes hence they cannot access "live" ast concurrently.
> Maybe you mean `ParsedAST::build`, i.e. the main ast build? If that's the case, it uses the serialized preamble, not "live" astcontext (and serialized preamble is already thread-safe to read concurrently).
> We need to make sure we first send the preamble to astpeer and then invoke parsed callbacks, the sequencing is subtle and ties our hands if we want to refactor this code in the future as it needs to be preserved.

I completely agree that clangd threading has lots of subtle things that makes it easy to violate expected sequence of events. My comment about potential race condition was about very similar thing, it is very easy to keep pointer to ASTContext/FileManager/etc. in a callback and call it after the task was scheduled and it will be race condition. That is why I thought that indexing on the same thread that was used for building preamble AST is safer. I'm not aware of any existing race conditions with this diff. My example was hypothetical what would happen if indexing task was scheduled from AfterExecute, for example, as it was from one of the version of this diff. Just to make it clear, I'm fine with current approach too.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D148088/new/

https://reviews.llvm.org/D148088



More information about the cfe-commits mailing list