[PATCH] D39050: Add index-while-building support to Clang

Eric Liu via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Tue Nov 28 15:16:48 PST 2017


ioeric added a comment.

First round of comments. Mostly around indexing actions and file records; I haven't started reviewing the data writing and storing code. I think it might make sense to split the index writing and storing logics into a separate patch, which should be possible if `writeUnitData` is abstracted into an interface.



================
Comment at: include/clang/Frontend/CompilerInstance.h:188
+  /// produced to generate imported modules before they are executed.
+  std::function<std::unique_ptr<FrontendAction>
+    (const FrontendOptions &opts, std::unique_ptr<FrontendAction> action)>
----------------
It might make sense to define an alias for `std::function<std::unique_ptr<FrontendAction>(const FrontendOptions &opts, std::unique_ptr<FrontendAction> action)>`, which is used multiple times.


================
Comment at: include/clang/Frontend/FrontendOptions.h:262
 
+  std::string IndexStorePath;
+  unsigned IndexIgnoreSystemSymbols : 1;
----------------
It might make sense to also have documentations for these options here. 


================
Comment at: include/indexstore/IndexStoreCXX.h:84
+
+  std::pair<unsigned, unsigned> getLineCol() {
+    unsigned line, col;
----------------
nathawes wrote:
> malaperle wrote:
> > From what I understand, this returns the beginning of the occurrence. It would be useful to also have the end of the occurrence. From what I tested in Xcode, when you do "Find Selected Symbol in Workspace", it highlights the symbol name in yellow in the list of matches, so it mush use that LineCol then highlight the matching name. This is works in many situations but others occurrences won't have the name of the symbol. For example:
> > "MyClass o1, o2;"
> > If I use "Find Selected Symbol in Workspace" on MyClass constructor, if won't be able to highlight o1 and o2
> > Do you think it would be possible to add that (EndLineCol)? If not, how would one go about extending libindexstore in order to add additional information per occurrence? It is not obvious to me so far.
> > We also need other things, for example in the case of definitions, we need the full range of the definition so that users can "peek" at definitions in a pop-up. Without storing the end of the definition in the index, we would need to reparse the file.
> > 
> Our approach to related locations (e.g. name, signature, body, and doc comment start/end locs) has been to not include them in the index and derive them from the start location later. There's less data to collect and write out during the build that way, and deriving the other locations isn't that costly usually, as in most cases you 1) don't need to type check or even preprocess to get the related locations, as with finding the end of o1 and o2 in your example, and 2) usually only need to derive locations for a single or small number of occurrences, like when 'peeking' at a definition. 
> 
> Are there cases where you think this approach won't work/perform well enough for the indexer-backed queries clangd needs to support?
> 
I agree that we should try to keep the serialized symbol size minimal, but I think it's worthwhile to also store end locations for occurrences because 1) it's cheap, 2) it's not necessary easy to compute without parsing for occurrences like `a::b::c` or `a::b<X>`, 3) it would be useful for many LSP use cases.


================
Comment at: lib/Frontend/CompilerInstance.cpp:1176
+            new GenerateModuleFromModuleMapAction);
+        if (WrapGenModuleAction) {
+          Action = WrapGenModuleAction(FrontendOpts, std::move(Action));
----------------
nit: no braces around one liners.


================
Comment at: lib/FrontendTool/ExecuteCompilerInvocation.cpp:170
+    Act = index::createIndexDataRecordingAction(FEOpts, std::move(Act));
+    CI.setGenModuleActionWrapper(&index::createIndexDataRecordingAction);
+  }
----------------
Could you comment on what this does? The `Act` above is already wrapped. Why do we need `setGenModuleActionWrapper` to `createIndexDataRecordingAction` again? Also, `createIndexDataRecordingAction` doesn't seem related to `GenModule`.


================
Comment at: lib/Index/FileIndexRecord.cpp:23
+                                       ArrayRef<SymbolRelation> Relations) {
+  assert(D->isCanonicalDecl());
+
----------------
Why?


================
Comment at: lib/Index/FileIndexRecord.cpp:37
+
+  DeclOccurrence NewInfo(Roles, Offset, D, Relations);
+  auto It = std::upper_bound(Decls.begin(), Decls.end(), NewInfo);
----------------
Please comment when this would happen.


================
Comment at: lib/Index/FileIndexRecord.cpp:39
+  auto It = std::upper_bound(Decls.begin(), Decls.end(), NewInfo);
+  Decls.insert(It, std::move(NewInfo));
+}
----------------
Why do we need `Decls` to be sorted by offset? If we want this for printing, it might make sense to just do a sort there.


================
Comment at: lib/Index/FileIndexRecord.h:24
+
+class FileIndexRecord {
+public:
----------------
Please add documentation.


================
Comment at: lib/Index/FileIndexRecord.h:41
+
+    friend bool operator <(const DeclOccurrence &LHS,
+                           const DeclOccurrence &RHS) {
----------------
Is this clang-formatted? You might want to run git-clang-format on the whole patch.


================
Comment at: lib/Index/IndexingAction.cpp:284
+                     SourceManager &SourceMgr) :
+    IndexCtx(indexCtx), RecordOpts(recordOpts),
+    Includes(IncludesForFile), SourceMgr(SourceMgr) {}
----------------
Again, you don't need the full `IndexingContext` and `RecordOptions` here.


================
Comment at: lib/Index/IndexingAction.cpp:294
+
+    std::pair<FileID, unsigned> LocInfo =
+      SourceMgr.getDecomposedExpansionLoc(From);
----------------
Note that `getDecomposedExpansionLoc ` can also return invalid decomposed loc. 


================
Comment at: lib/Index/IndexingAction.cpp:308
+    if (!FE)
+      return;
+    auto lineNo = SourceMgr.getLineNumber(LocInfo.first, LocInfo.second);
----------------
Do we want better error handling here?


================
Comment at: lib/Index/IndexingAction.cpp:327
+
+class IndexDependencyProvider {
+public:
----------------
Please provide documentation.


================
Comment at: lib/Index/IndexingAction.cpp:504
+
+    CreatedASTConsumer = true;
+    std::vector<std::unique_ptr<ASTConsumer>> Consumers;
----------------
 Can we get this state from the base class instead of maintaining a another state, which seems to be identical?


================
Comment at: lib/Index/IndexingAction.cpp:524
+  std::string RepositoryPath = getClangRepositoryPath();
+  StringRef BuildNumber = StringRef(RepositoryPath);
+  size_t DashOffset = BuildNumber.find('-');
----------------
Just `StringRef BuildNumber = RepositoryPath;`



================
Comment at: lib/Index/IndexingAction.cpp:701
+namespace {
+class ModuleFileIndexDependencyCollector : public IndexDependencyProvider {
+  serialization::ModuleFile &ModFile;
----------------
Please provide a brief documentation for this class.


================
Comment at: lib/Index/IndexingAction.cpp:703
+  serialization::ModuleFile &ModFile;
+  RecordingOptions RecordOpts;
+
----------------
Again, it doesn't seem necessary for this class to have information about all record options. It seems that you only need `RecordSystemDependencies` here.


================
Comment at: lib/Index/IndexingAction.cpp:713
+      override {
+    auto Reader = CI.getModuleManager();
+    Reader->visitInputFiles(ModFile, RecordOpts.RecordSystemDependencies,
----------------
readability nit: avoid using `auto` if the return type is short to spell but hard to infer from the value expression.  Same else where.


================
Comment at: lib/Index/IndexingAction.cpp:762
+  HeaderSearch &HS = CI.getPreprocessor().getHeaderSearchInfo();
+  Module *UnitMod = HS.lookupModule(Mod.ModuleName, /*AllowSearch=*/false);
+
----------------
Could you add a comment explaining why we are not allowing searching.


================
Comment at: lib/Index/IndexingAction.cpp:769
+  IndexCtx.setSysrootPath(SysrootPath);
+  Recorder.init(&IndexCtx, CI);
+
----------------
It's a bit worrying that `IndexDataRecorder` and `IndexContext` reference each other. If you only need some information from the `IndexingContext`, simply pass it into `Recorder`. In this case, I think you only need the `SourceManager`  from the `ASTContext` in the recorder to calculate whether a file is a system header. I see you also cache result of `IndexingContext::isSystemFile` in the indexing context, but I think it would be more sensible for the callers to handle caching for this call.


================
Comment at: lib/Index/IndexingAction.cpp:771
+
+  for (const Decl *D : CI.getModuleManager()->getModuleFileLevelDecls(Mod)) {
+    IndexCtx.indexTopLevelDecl(D);
----------------
nit: no braces around one liners.


================
Comment at: lib/Index/IndexingAction.cpp:779
+                Mod.FileName, /*RootFile=*/nullptr, UnitMod, SysrootPath);
+
+}
----------------
nit: redundant empty line


================
Comment at: lib/Index/IndexingAction.cpp:837
+  index::RecordingOptions RecordOpts;
+  std::tie(IndexOpts, RecordOpts) = getIndexOptionsFromFrontendOptions(FEOpts);
+  return ::createIndexDataRecordingAction(IndexOpts, RecordOpts,
----------------
Just `auto pair = getIndexOptionsFromFrontendOptions(FEOpts);` and then use `pair.first` and `pair.second`?

Same below.


================
Comment at: lib/Index/IndexingContext.h:39
 
 class IndexingContext {
   IndexingOptions IndexOpts;
----------------
Please define the scope of this class to avoid throwing random states into it, which usually happens to a "context" class. 


https://reviews.llvm.org/D39050





More information about the cfe-commits mailing list