[PATCH] D55224: [clangd] Introduce loading of shards within auto-index
Ilya Biryukov via Phabricator via cfe-commits
cfe-commits at lists.llvm.org
Wed Jan 9 06:44:19 PST 2019
ilya-biryukov added inline comments.
================
Comment at: clangd/SourceCode.h:21
#include "clang/Tooling/Core/Replacement.h"
+#include "llvm/Support/Path.h"
#include "llvm/Support/SHA1.h"
----------------
Looks redundant. Remove?
================
Comment at: clangd/index/Background.cpp:197
+ Cmd.CommandLine.push_back("-resource-dir=" + ResourceDir);
+ const std::string FileName = Cmd.Filename;
+ if (auto Error = index(std::move(Cmd), Storage))
----------------
kadircet wrote:
> ilya-biryukov wrote:
> > use `llvm::StringRef`
> If the command fails we want to show the filename but we are moving the Cmd, so we need to keep a copy of the filename.
Ah, we're moving out of `Cmd` here... I'd say `index` should accept a `const CompileCommand&`? If the goal was to optimize for less copies, then we failed - now we need to copy the filename string. `index()` is not part of this patch, though, so let's not change it.
Could you add a comment that we can't use `StringRef` because we move from Cmd? It's easy to miss.
================
Comment at: clangd/index/Background.cpp:312
+ // Skip if file is already up to date.
+ auto DigestIt = IndexedFileDigests.try_emplace(Path);
+ if (!DigestIt.second && DigestIt.first->second == Hash)
----------------
We still update digests and symbols non-atomically in that case. Could we do both under the same lock, similar to how it's done in `loadShards`?
================
Comment at: clangd/index/Background.cpp:259
+ // if this thread sees the older version but finishes later. This should
+ // be rare in practice.
+ DigestIt.first->second = Hash;
----------------
kadircet wrote:
> ilya-biryukov wrote:
> > kadircet wrote:
> > > ilya-biryukov wrote:
> > > > kadircet wrote:
> > > > > ilya-biryukov wrote:
> > > > > > > "should be rare in practice"
> > > > > > And yet, can we avoid this altogether?
> > > > > >
> > > > > > Also, I believe it won't be rare. When processing multiple different TUs, we can easily run into the same header multiple times, possibly with the different contents.
> > > > > > E.g. imagine the index for the whole of clang is being built and the user is editing `Sema.h` at the same time. We'll definitely be running into this case over and over again.
> > > > > Well I am open to ideas, but currently there is no way of knowing whether this is the newer version for the file. Because only information we have is the digest is different than what we currently have and this doesn't yield any chronological information.
> > > > Do we know which request is "newer" when scheduling it?
> > > > If so, we could keep the map of the **latest** hashes of the files we've seen (in addition to the `IndexedFileDigests`, which correspond to the digest for which we built the index IIUC).
> > > >
> > > > This would give us a simple invariant of the final state we want to bring the index to: `IndexedFileDigests` should correspond to the latest hashes seen so far. If not, we have to rebuild the index for the corresponding files. That, in turn, gives us a way to resolve conflicts: we should never replace with symbols built for the latest version (hash) of the file we've seen.
> > > >
> > > > Would that work?
> > > I am not sure if it would work for non-main files. We update the Hash for the main files at each indexing action, so we can safely keep track of the latest versions. But we collect hashes for dependencies while performing the indexing which happens in parallel. Therefore an indexing action triggered earlier might get an up-to-date version of a dependency than an action triggered later-on.
> > If updates for each TU are atomic (i.e. all files included in the TU are updated in a single go) we would get the up-to-date index eventually, assuming we rebuild the index when TU dependencies change (we don't schedule rebuilds on changes to included header now, but we're planning to do so at some point).
> Sure, that assumption seems more relaxed than the previous one, just wanna make sure I got it right:
> Your suggested solution assumes: Dependencies of a TU won't change during indexing of that TU
> Whereas the previous assumption was: Files won't change between close indexing actions
Exactly. The idea is that eventually we'll start tracking changes and will be able to detect even those cases and rebuild the index accordingly. Just trying to keep us from drifting away from that model too much.
> files won't change between close indexing actions
This assumption worked fine for indexing actions running right away, but I think the situation with loading shards is different: the shards we are loading might be old and if we don't want a stale shard (which might be arbitrarily old) to overwrite results of the fresh indexing action. Let me know if I'm missing something here.
================
Comment at: clangd/index/Background.h:111
+ bool NeedsReIndexing;
+ Source(llvm::StringRef Path, bool NeedsReIndexing = true)
+ : Path(Path), NeedsReIndexing(NeedsReIndexing) {}
----------------
kadircet wrote:
> ilya-biryukov wrote:
> > NIT: maybe remove the constructor? plain structs can easily be initialized with init lists and adding a constructor forbids per-field assignment, which is a bit nicer in some cases.
> > Not very important, since it's a private API, but still.
> Once I assign a default value for NeedsReIndexing, I cannot construct Source objects with init lists, therefore I've added a constructor instead.
Right, makes sense. You could initialize explicitly, but I can see how constructor might be simpler to use in that particular use-case.
Since we now have a constructor, could you remove the default arg? It's not possible to get an undefined value for `NeedsReindexing` with a constructor.
Repository:
rCTE Clang Tools Extra
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D55224/new/
https://reviews.llvm.org/D55224
More information about the cfe-commits
mailing list