[PATCH] D50385: [clangd] Collect symbol occurrences from AST.

Thu Aug 23 05:58:19 PDT 2018

ilya-biryukov added inline comments.

================
Comment at: clangd/index/Index.h:46
+
+inline bool operator==(const SymbolLocation::Position &L,
+                       const SymbolLocation::Position &R) {
----------------
hokein wrote:
> ilya-biryukov wrote:
> > NIT: having friend decls inside the classes themselves might prove to be more readable.
> > Not opposed to the current one too, feel free to ignore.
> These operator implementations seem not as much interesting as members in the structure, putting them to the structure probably adds some noise to readers.
> 
> 
Ok, LG outside too

================
Comment at: clangd/index/Index.h:350
+
+    llvm::DenseMap<SymbolID, std::vector<SymbolOccurrence>> SymbolOccurrences;
   };
----------------
hokein wrote:
> ilya-biryukov wrote:
> > Any store occurences in a file-centric manner?
> > E.g. 
> > ```
> > /// Occurences inside a single file.
> > class FileOccurences {
> >   StringRef File;
> >   vector<pair<Point, OccurenceKind>> Locations;
> > };
> > 
> > // ....
> > DenseMap<SymbolID, vector<FileOccurences>>  SymbolOccurences;
> > ```
> > 
> > As discussed previously, this representation is better suited for both merging and serialization.
> The file-centric manner doesn't seem to suite our current model:  whenever we update the index for the main AST, we just replace the symbol slab with the new one; and for index merging, we only use the index `findOccurrences` interfaces.
> 
> It would save some memory usage of `StringRef` File, but AFAIK, the memory usage of current model is relatively small (comparing with the SymbolSlab for code completion) since we only store occurrences in main file (~50KB for `CodeComplete.cpp`).
> 
> I'd leave it as it is now, and we could revisit it later.
Isn't the merging model different for the occurrences?
We would actually have to drop **all** references from the older index when merging if the new one contains locations in the same file.
If the merge if file-centric, the file-based representation makes more sense in the first place. 

Apart from simpler merging the code, the file-based representation also buys us more efficient serialization for the static index, arguably efficient enough to stash all the occurrences even into our YAML index.

Postponing till later is also fine, but I'm not sure it buys us much now. These arguments only apply if we think the file-centric approach is a the right final design, though.

================
Comment at: clangd/index/SymbolCollector.h:59
+      // If none, all symbol occurrences will be collected.
+      llvm::Optional<llvm::DenseSet<SymbolID>> IDs = llvm::None;
+    };
----------------
hokein wrote:
> ilya-biryukov wrote:
> > Could you elaborate on what this option will be used for? How do we know in advance which symbols we're interested in?
> This is used for finding references in the AST as a part of the xref implementation, basically the workflow would be:
> 
> 1. find SymbolIDs of the symbol under the cursor, using `DeclarationAndMacrosFinder`
> 2. run symbol collector to find all occurrences in the main AST with all SymbolIDs in #1
> 3. query the index, to get more occurrences
> 4. merge them  
Can we instead find all the occurences in  `DeclarationAndMacrosFinder` directly?
Extra run of `SymbolCollector` means another AST traversal, which is slow by itself, and SymbolCollector s designed for a much more hairy problem, its interface is just not nicely suited for things like only occurrences. The latter seems to be a simpler problem, and we can have a simpler interface to solve it (possibly shared between SymbolCollector and DeclarationAndMacrosFinder). WDYT?

Repository:
  rCTE Clang Tools Extra

https://reviews.llvm.org/D50385