[PATCH] D39050: Add index-while-building support to Clang

Eric Liu via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Tue Dec 19 15:01:28 PST 2017


ioeric added a comment.

Thanks a lot for further cleaning up the patch! It is now much easier to review. I really appreciate it!

Some more comments on the public APIs and the layering of classes. There are a lot of helper classes in the implementation, so I think it's important to get a clear layering so that they could be easily understood by future contributors :)

Also, with the `IndexUnitDataConsumer` abstraction, it seems to be feasible now to add some unit tests for `createUnitIndexingAction`. With all the recent major changes,  I think it's important that we have some degree of testing to make sure components actually work together.



================
Comment at: include/clang/Frontend/CompilerInstance.h:187
+  typedef std::function<std::unique_ptr<FrontendAction>(
+      const FrontendOptions &opts, std::unique_ptr<FrontendAction> action)>
+      ActionWrapperTy;
----------------
ioeric wrote:
> nit: LLVM variable names start with upper-case letters.
`opts` and `action` are still lower-case.


================
Comment at: include/clang/Index/DeclOccurrence.h:38
+    }
+  };
+
----------------
Nit: indentation.

Tip: `git-clang-format` against the diff base can format all changed lines in your patch.


================
Comment at: include/clang/Index/IndexUnitDataConsumer.h:1
+//===--- IndexUnitDataConsumer.h - Abstract index unit data consumer ---------------===//
+//
----------------
IIUC, this is the index data for a translation unit, as opposed to an AST. If so, consider calling this `UnitIndexDataConsumer` to match `(AST)IndexDataConsumer` which is parallel to this. We might want to rename them to be either `index::UnitDataConsumer` vs `index::ASTDataConsumer` or `index::UnitIndexDataConsumer` vs `index::ASTIndexDataConsumer` . I am inclined to the first pair as `index` is already implied in the namespace.


================
Comment at: include/clang/Index/IndexUnitDataConsumer.h:67
+private:
+  virtual void _anchor();
+};
----------------
Comment? Why do we actually need this?


================
Comment at: include/clang/Index/IndexingAction.h:49
 
+struct UnitIndexingOptions : IndexingOptions {
+  enum class FileIncludeFilterKind {
----------------
We are now mixing functionalities for Unit indexing and AST indexing actions in the same file. We might want to separate these into two headers e..g `UnitIndexingAction.h` and `ASTIndexingAction.h`. This would make it easier for users to find the right functions :)


================
Comment at: include/clang/Index/IndexingAction.h:66
+/// Information on the translation unit
+struct UnitDetails {
+  Module *UnitModule;
----------------
Please add documentation for each field. It's not trivial what each field is for, especially some fields seem to be optional and some seem to be mutually exclusive.


================
Comment at: include/clang/Index/IndexingAction.h:67
+struct UnitDetails {
+  Module *UnitModule;
+  std::string ModuleName;
----------------
These pointers suggest the life time of this struct is tied to some other struct, which makes the struct look a bit dangerous to use. Should we also carry a reference or a smart pointer to the underlying object that keeps these pointers valid? Would it be a `CompilerInstance ` (guessing from `IndexUnitDataConsumerFactory `)? 


================
Comment at: include/clang/Index/IndexingAction.h:114
+std::unique_ptr<FrontendAction>
+createUnitIndexingAction(const UnitIndexingOptions &IndexOpts,
+                         IndexUnitDataConsumerFactory ConsumerFactory,
----------------
 What is the intended user of this function? It's unclear how users could obtain a `ConsumerFactory ` (i.e. `UnitDetails`) without the functionalities in `UnitDataConsumerActionImpl `.

(Also see comment in the implementation of `createIndexDataRecordingAction`.)


================
Comment at: include/clang/Index/IndexingAction.h:128
+RecordingOptions
+getRecordingOptionsFromFrontendOptions(const FrontendOptions &FEOpts);
+
----------------
This is likely only useful for compiler invocation. I would put it in the compiler invocation code.


================
Comment at: lib/Driver/Job.cpp:211
 
+/// The leftover modules from the crash are stored in
+///  <name>.cache/vfs/modules
----------------
nit: Comment should start with an overview of what the function does.

```
Returns a directory path that is ...
```

Also, consider calling this `getDirAdjacentToModCache`. `buildDir` can be ambiguous.


================
Comment at: lib/Driver/Job.cpp:220
+  llvm::SmallString<128> RelModCacheDir = llvm::sys::path::parent_path(
+  llvm::sys::path::parent_path(CrashInfo->VFSPath));
+  llvm::sys::path::append(RelModCacheDir, DirName);
----------------
Please clang-format the code. Without indentation, this looks like an no-op statement.


================
Comment at: lib/Index/IndexingAction.cpp:88
+/// \c WrappingIndexAction frontend actions.
+struct IndexActionImpl {
+  virtual ~IndexActionImpl() = default;
----------------
Use `class` for interfaces.


================
Comment at: lib/Index/IndexingAction.cpp:99
+  /// Callback at the end of processing a single input.
+  virtual void finish(CompilerInstance &CI) = 0;
 };
----------------
Does `CI` here have to be the same instance as the one in `createIndexASTConsumer `? Might worth documenting.


================
Comment at: lib/Index/IndexingAction.cpp:137
+
+    CreatedASTConsumer = true;
+    std::vector<std::unique_ptr<ASTConsumer>> Consumers;
----------------
nit: Move this after `Impl->createIndexASTConsumer(CI)`.

Do we need to reset this flag? Calling `CreateASTConsumer ` multiple times on the same instance seems to be allowed? 


================
Comment at: lib/Index/IndexingAction.cpp:243
+/// Collects and groups consumed index data by \c FileID.
+class IndexDataCollector : public IndexDataConsumer {
+  Preprocessor *PP = nullptr;
----------------
This seems to be related to files. Maybe `FileIndexDataCollector`?


================
Comment at: lib/Index/IndexingAction.cpp:250
+public:
+  void setPreprocessor(Preprocessor &PreProc) {
+    PP = &PreProc;
----------------
`override`


================
Comment at: lib/Index/IndexingAction.cpp:254
+
+  IndexDataByFileTy::const_iterator by_file_begin() const {
+    return IndexDataByFile.begin();
----------------
Simply `begin`, if the class is called `FileIndexDataCollector `.

Similar below to match iterator naming convention.


================
Comment at: lib/Index/IndexingAction.cpp:265
+private:
+  bool handleDeclOccurence(const Decl *D, SymbolRoleSet Roles,
+                           ArrayRef<SymbolRelation> Relations, FileID FID,
----------------
I think this should be `public` as this is still implementing `IndexDataConsumer`.


================
Comment at: lib/Index/IndexingAction.cpp:323
+      llvm_unreachable("should have already checked in the beginning");
+    case UnitIndexingOptions::FileIncludeFilterKind::UserOnly:
+      if (SystemCache.isSystem(LocInfo.first, SourceMgr))
----------------
I'd simply do: 
```
if FileIncludeFilter  == UnitIndexingOptions::FileIncludeFilterKind::UserOnly)
  if (isSystem...)
    return;
```


================
Comment at: lib/Index/IndexingAction.cpp:337
+
+  virtual void InclusionDirective(SourceLocation HashLoc,
+                                  const Token &IncludeTok, StringRef FileName,
----------------
Same here. This should be `public`


================
Comment at: lib/Index/IndexingAction.cpp:354
+
+  virtual void visitFileDependencies(
+      const CompilerInstance &CI,
----------------
The naming convention for the callback interfaces is `forEach*` e.g. `forEachFileDependency`.

s/visitor/Callback/
(same below).


================
Comment at: lib/Index/IndexingAction.cpp:358
+  virtual void
+  visitIncludes(llvm::function_ref<void(const FileEntry *Source, unsigned Line,
+                                        const FileEntry *Target)>
----------------
`forEachInclude`


================
Comment at: lib/Index/IndexingAction.cpp:361
+                    visitor) const = 0;
+  virtual void visitModuleImports(
+      const CompilerInstance &CI,
----------------
`forEachModuleImport`


================
Comment at: lib/Index/IndexingAction.cpp:369
+/// file to file inclusions, for the source files in a translation unit
+class SourceFilesIndexDependencyCollector : public DependencyCollector,
+                                            public IndexDependencyProvider {
----------------
This is two classes in one, which is difficult to understand. Could you split it into `FileIndexDependencyCollector ` and `FileIndexDependencyProvider` and have `FileIndexDependencyCollector` returns a provider on finish (e.g. `Provider consume();`; you might want to copy/move the collected data into the provider). It would be easier to justify the behavior (e.g. what happens when you access the provider while collector is still working?)


================
Comment at: lib/Index/IndexingAction.cpp:373
+  UnitIndexingOptions IndexOpts;
+  llvm::SetVector<const FileEntry *> Entries;
+  llvm::BitVector IsSystemByUID;
----------------
What does `Entries` contain? What files are added?


================
Comment at: lib/Index/IndexingAction.cpp:501
+  /// IndexUnitDataConsumer constructed from the \c UnitConsumerFactory if the
+  /// \c ParentUnitConsumer indicates \c Mod should be indexed.
+  ///
----------------
Instead of passing `ParentUnitConsumer`, consider checking the `Mod` before calling the function.


================
Comment at: lib/Index/IndexingAction.cpp:504
+  /// \returns true if \c Mod was indexed
+  static bool indexModule(
+      const CompilerInstance &CI, serialization::ModuleFile &Mod,
----------------
Non-factory static method is often a code smell. Any reason not to make these static methods private members? With that, you wouldn't need to pass along so many parameters. You could make them `const` if you don't want members to be modified.


================
Comment at: lib/Index/IndexingAction.cpp:511
+  /// Get unit details for the given module file
+  static UnitDetails getUnitDetails(serialization::ModuleFile &Mod,
+                                    const CompilerInstance &CI,
----------------
Why is this overload public while others are private? Aren't they all used only in this class?


================
Comment at: lib/Index/IndexingAction.cpp:531
+};
+} // anonymous namespace
+
----------------
Any reason to close the anonymous namespace here? Shouldn't outlined definitions of `UnitDataConsumerActionImpl`'s methods also in the anonymous namespace? 


================
Comment at: lib/Index/IndexingAction.cpp:758
+
+  class IndexUnitDataRecorder : public IndexUnitDataConsumer {
+  public:
----------------
I think the inheritance of `IndexUnitDataConsumer`  and the creation of factory should be in user code (e.g. implementation for on-disk persist-index-data should come from the compiler invocation code `ExecuteCompilerInvocation.cpp` or at least a separate file in the library that compiler invocation can use), and the user should only use `createUnitIndexingAction` by providing a factory.  Currently, `createUnitIndexingAction` and `createIndexDataRecordingAction` are mostly identical except for the code that implements `IndexUnitDataConsumer ` and creates the factory.

The current `createIndexDataRecordingAction` would probably only used by the compiler invocation, and we can keep the generalized `createUnitIndexingAction` in the public APIs.


================
Comment at: lib/Index/IndexingAction.cpp:765
+  auto ConsumerFactory =
+      [&](const CompilerInstance &CI, UnitDetails UnitInfo) ->
+          std::unique_ptr<IndexUnitDataConsumer> {
----------------
The `UnitInfo` is ignored? What do we actually need it for?


================
Comment at: lib/Index/IndexingAction.cpp:769
+      };
+  auto Base = llvm::make_unique<UnitDataConsumerActionImpl>(
+      std::move(RecordOpts), ConsumerFactory);
----------------
`Base` doesn't seem to be a very meaningful name here.


https://reviews.llvm.org/D39050





More information about the cfe-commits mailing list