[PATCH] D77732: [clangd] Shard preamble symbols in dynamic index

Sam McCall via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Mon Apr 13 14:43:03 PDT 2020


sammccall accepted this revision.
sammccall added a comment.
This revision is now accepted and ready to land.

LG, API quibbles.

It would be nice to test this but the good things about it (memory usage) aren't easy to test, and the bad things (sensitivity to header parsing context) seem odd to be the primary test.
Actually I guess one nice thing we get from this is erasure of stale symbols: if we index the preamble {foo.cpp=`#include "x.h"`, x.h=`int a;`}, and then {bar.cpp=`#include "x.h"`, x.h=`int b;`}, then I think before this patch we see both a and b in the index, and after we see only b.
That would be a reasonable test.

Or I guess now that the sharding function is public we can test it directly.



================
Comment at: clang-tools-extra/clangd/index/FileIndex.h:129
   //
-  // FIXME: Because the preambles for different TUs have large overlap and
-  // FileIndex doesn't deduplicate, this uses lots of extra RAM.
-  // The biggest obstacle in fixing this: the obvious approach of partitioning
-  // by declaring file (rather than main file) fails if headers provide
-  // different symbols based on preprocessor state.
+  // Note that we store only one version of a heder, hence symbols appearing in
+  // different PP states will be missing.
----------------
nit: header


================
Comment at: clang-tools-extra/clangd/index/FileIndex.h:157
 
+/// Takes slabs coming from a TU(multiple files) and shards them per declaration
+/// location.
----------------
nit: space before parenthetical


================
Comment at: clang-tools-extra/clangd/index/FileIndex.h:159
+/// location.
+struct ShardIndexToFiles {
+  /// \p HintPath is used to convert file URIs stored in symbols into absolute
----------------
Hmm, this is a good name for a function, but we want a noun for a class. FileShardedIndex?


================
Comment at: clang-tools-extra/clangd/index/FileIndex.h:165
+  /// Returns absolute paths for all files that has a shard.
+  std::vector<PathRef> getAllFiles() const;
+
----------------
I do find it a little weird not to expose the map-structure of the vast majority of the data here.

What steers you away from just making this a function
`StringMap<X> shardIndexToFile(IndexFileIn, PathRef)`
where X could be `File` with a method to obtain the data, or the more anonymous `unique_function<IndexFileIn()>`?


================
Comment at: clang-tools-extra/clangd/index/FileIndex.h:172
+  /// Generates slabs for the \p File using the sharded information. Note that
+  /// these functions results in a copy of all the relevant data.
+  SymbolSlab buildSymbolSlab(PathRef File) const;
----------------
Can we bundle these parallel functions into one that returns IndexFileIn?
I don't think there's much efficiency in building only a subset, is there? (If there's a lot of data in e.g. refs we're going to ignore, we can strip them from the input)

That would also mean no need to special-case mainfilecmd in the API - the only caller of that function does it in the file loop, so it would be provided for the main-file shard only.


================
Comment at: clang-tools-extra/clangd/index/SymbolCollector.cpp:286
 
+  // ND is the canonical (i.e. first) declaration. If it's in the main file
+  // (which is not a header), then no public declaration was visible, so assume
----------------
kadircet wrote:
> put this before `processRelations` to ensure `Subject` of a relation is a symbol we are going to collect.
The code around this seems unchanged - how does this relate to the rest of the patch?
(Other than I guess you've been inspecting the data and finding bogus shards :-)

This is bit subtle, and hard to reason about since we only have one example. Indeed dangling relations are pointless. However:
 - the direction of the relation is fairly arbitrary. Should we be checking shouldCollectSymbol on the object of the relation too?
 - shouldCollectSymbols depends somewhat on the file context, and also the indexer options. In principle the subject could be indexable elsewhere but not here. Or this index could see the relation and another could see the symbol itself. Can't think of compelling examples though, just hypotheticals.

I think probably we should give this a comment (about not wanting relations or refs either), add filtering on the relation objects, and add a test (I think that should be pretty easy with inheriting from a main-file-only class?).
Maybe it's a separate patch?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D77732/new/

https://reviews.llvm.org/D77732





More information about the cfe-commits mailing list