[PATCH] D39050: Add index-while-building support to Clang

Thu Dec 7 09:53:19 PST 2017

malaperle added inline comments.

================
Comment at: include/indexstore/IndexStoreCXX.h:84
+
+  std::pair<unsigned, unsigned> getLineCol() {
+    unsigned line, col;
----------------
ioeric wrote:
> nathawes wrote:
> > malaperle wrote:
> > > From what I understand, this returns the beginning of the occurrence. It would be useful to also have the end of the occurrence. From what I tested in Xcode, when you do "Find Selected Symbol in Workspace", it highlights the symbol name in yellow in the list of matches, so it mush use that LineCol then highlight the matching name. This is works in many situations but others occurrences won't have the name of the symbol. For example:
> > > "MyClass o1, o2;"
> > > If I use "Find Selected Symbol in Workspace" on MyClass constructor, if won't be able to highlight o1 and o2
> > > Do you think it would be possible to add that (EndLineCol)? If not, how would one go about extending libindexstore in order to add additional information per occurrence? It is not obvious to me so far.
> > > We also need other things, for example in the case of definitions, we need the full range of the definition so that users can "peek" at definitions in a pop-up. Without storing the end of the definition in the index, we would need to reparse the file.
> > > 
> > Our approach to related locations (e.g. name, signature, body, and doc comment start/end locs) has been to not include them in the index and derive them from the start location later. There's less data to collect and write out during the build that way, and deriving the other locations isn't that costly usually, as in most cases you 1) don't need to type check or even preprocess to get the related locations, as with finding the end of o1 and o2 in your example, and 2) usually only need to derive locations for a single or small number of occurrences, like when 'peeking' at a definition. 
> > 
> > Are there cases where you think this approach won't work/perform well enough for the indexer-backed queries clangd needs to support?
> > 
> I agree that we should try to keep the serialized symbol size minimal, but I think it's worthwhile to also store end locations for occurrences because 1) it's cheap, 2) it's not necessary easy to compute without parsing for occurrences like `a::b::c` or `a::b<X>`, 3) it would be useful for many LSP use cases.
There's a few reason I think it's better to store the end loc.
- When doing "find references", computing the end loc of each occurrence will be costly. Imagine having thousands of occurrences and for each of them having to run logic to find the end of the occurrence.
- The AST and preprocessor are the best tools I know to figure out the proper end loc. Not using them means having to write a mini-preprocessor with some knowledge about the language semantics to cover some cases.
```
MyClass |o1, o2;
```
Here, I have to stop at the comma. So it's basically take any alpha-numerical character, right?
```
bool operator<(const Foo&, const Foo&)
Ret operator()(Params ...params)
```
No, in those cases, we have to take < and the first (). 
- In the case of body start/end locations, similarly, it can be non-trivial.
```
void foo() {
  if (0) {
  }
}
```
We have to count the balanced { } until we finish the body.

```
#define FUNC_BODY {\
\
}
void foo() FUNC_BODY
```
Oops, where's the body? We need another special logic for this, etc.

I think overall, it puts a lot of burden on the client of libIndexStore, burden that would be much more work and more inaccurate than using the AST/Preprocessor while indexing.

================
Comment at: include/indexstore/IndexStoreCXX.h:374
+
+  enum class DependencyKind {
+    Unit,
----------------
nathawes wrote:
> malaperle wrote:
> > As part of this dependency tracking mechanism, I haven't found that it could provide information about about the files including a specific header. So given a header (or source file in odd cases), I do not see a way to get all the files that would need to be reindexed if that header changed. Is that something you achieve outside the index? Or perhaps this is something I missed in the code.
> The unit files store the path of the header/source files they depend on as 'File' dependencies. So any unit file with 'File' dependency on header/source file that was modified may need to be re-indexed.
> 
> To support finding which specific files include or are included by a given header (rather than which units somehow transitively include it) we also store the file-to-file inclusions in the unit file (retrieved via IndexUnitReader's foreachInclude method below).
Thanks! I'll play around with this a bit more with this new information.

================
Comment at: include/indexstore/IndexStoreCXX.h:377
+    Record,
+    File,
+  };
----------------
nathawes wrote:
> malaperle wrote:
> > Could there be a bit of explanation about what's a File dependency versus record and unit? All units and records are file dependencies, right? Are there any files that are neither records or units?
> I'll rename this to SourceFile and add some comments to explain. Unit file dependencies separate the source dependencies into 'File' dependencies and 'Record' dependencies. The 'File' dependencies track the paths of the header/source files seen in the translation unit, while the 'Record' dependencies track which record files have the symbolic content seen in those source files – the header/source file path doesn't appear anywhere in the record file. This separation lets us depend on a single record file corresponding to multiple source files (e.g. when two source files have the same symbolic content), and on a single source file corresponding multiple record files (e.g. when a single header is included multiple times with different preprocessor contexts changing its symbolic content).
It's more clear now, thanks!

https://reviews.llvm.org/D39050