[PATCH] D55224: [clangd] Introduce loading of shards within auto-index

Wed Jan 9 09:05:46 PST 2019

kadircet added inline comments.

================
Comment at: clangd/index/Background.cpp:259
+      // if this thread sees the older version but finishes later. This should
+      // be rare in practice.
+      DigestIt.first->second = Hash;
----------------
ilya-biryukov wrote:
> kadircet wrote:
> > ilya-biryukov wrote:
> > > kadircet wrote:
> > > > ilya-biryukov wrote:
> > > > > kadircet wrote:
> > > > > > ilya-biryukov wrote:
> > > > > > > > "should be rare in practice"
> > > > > > > And yet, can we avoid this altogether?
> > > > > > > 
> > > > > > > Also, I believe it won't be rare. When processing multiple different TUs, we can easily run into the same header multiple times, possibly with the different contents.
> > > > > > > E.g. imagine the index for the whole of clang is being built and the user is editing `Sema.h` at the same time. We'll definitely be running into this case over and over again.
> > > > > > Well I am open to ideas, but currently there is no way of knowing whether this is the newer version for the file. Because only information we have is the digest is different than what we currently have and this doesn't yield any chronological information.
> > > > > Do we know which request is "newer" when scheduling it?
> > > > > If so, we could keep the map of the **latest** hashes of the files we've seen (in addition to the `IndexedFileDigests`, which correspond to the digest for which we built the index IIUC).
> > > > > 
> > > > > This would give us a simple invariant of the final state we want to bring the index to: `IndexedFileDigests` should correspond to the latest hashes seen so far. If not, we have to rebuild the index for the corresponding files. That, in turn, gives us a way to resolve conflicts: we should never replace with symbols built for the latest version (hash) of the file we've seen.
> > > > > 
> > > > > Would that work?
> > > > I am not sure if it would work for non-main files. We update the Hash for the main files at each indexing action, so we can safely keep track of the latest versions. But we collect hashes for dependencies while performing the indexing which happens in parallel. Therefore an indexing action triggered earlier might get an up-to-date version of a dependency than an action triggered later-on.
> > > If updates for each TU are atomic (i.e. all files included in the TU are updated in a single go) we would get the up-to-date index eventually, assuming we rebuild the index when TU dependencies change (we don't schedule rebuilds on changes to included header now, but we're planning to do so at some point).
> > Sure, that assumption seems more relaxed than the previous one, just wanna make sure I got it right:
> > Your suggested solution assumes: Dependencies of a TU won't change during indexing of that TU
> > Whereas the previous assumption was: Files won't change between close indexing actions
> Exactly. The idea is that eventually we'll start tracking changes and will be able to detect even those cases and rebuild the index accordingly. Just trying to keep us from drifting away from that model too much.
> 
> > files won't change between close indexing actions
> This assumption worked fine for indexing actions running right away, but I think the situation with loading shards is different: the shards we are loading might be old and if we don't want a stale shard (which might be arbitrarily old) to overwrite results of the fresh indexing action. Let me know if I'm missing something here.
Nope, your point seems to be correct

================
Comment at: clangd/index/Background.cpp:312
+      // Skip if file is already up to date.
+      auto DigestIt = IndexedFileDigests.try_emplace(Path);
+      if (!DigestIt.second && DigestIt.first->second == Hash)
----------------
ilya-biryukov wrote:
> We still update digests and symbols non-atomically in that case. Could we do both under the same lock, similar to how it's done in `loadShards`?
Moved update to the end of the loop, but now we might perform unnecessary work for building {symbol,ref}slabs for files that are already up-to-date. It shouldn't be too much of an issue, as a solution we can lock at the entrance and only check if the file was up-to-date, than we update at the exit. Or we can keep a snapshot.

Repository:
  rCTE Clang Tools Extra

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D55224/new/

https://reviews.llvm.org/D55224