[PATCH] D152900: [clangd] Update symbol collector to use include-cleaner.

Kadir Cetinkaya via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Mon Jul 10 07:29:39 PDT 2023


kadircet added inline comments.


================
Comment at: clang-tools-extra/clangd/ClangdServer.cpp:85
     auto Task = [FIndex(FIndex), Path(Path.str()), Version(Version.str()),
-                 ASTCtx(std::move(ASTCtx)),
-                 CanonIncludes(CanonIncludes)]() mutable {
+                 ASTCtx(std::move(ASTCtx)), PI]() mutable {
       trace::Span Tracer("PreambleIndexing");
----------------
nit: `PI(std::move(PI))`


================
Comment at: clang-tools-extra/clangd/TUScheduler.cpp:1086
         Callbacks.onPreambleAST(FileName, Inputs.Version, std::move(ASTCtx),
-                                CanonIncludes);
+                                PI);
       },
----------------
nit: s/PI/std::move(PI)


================
Comment at: clang-tools-extra/clangd/index/CanonicalIncludes.cpp:672
 
 llvm::StringRef CanonicalIncludes::mapHeader(FileEntryRef Header) const {
   if (!StdSuffixHeaderMapping)
----------------
we're no longer using any `FileEntry` pieces of this parameter you can take in just a `llvm::StringRef HeaderPath` instead.


================
Comment at: clang-tools-extra/clangd/index/CanonicalIncludes.h:40
   /// Adds a file-to-string mapping from \p ID to \p CanonicalPath.
   void addMapping(FileEntryRef Header, llvm::StringRef CanonicalPath);
 
----------------
VitaNuo wrote:
> kadircet wrote:
> > AFAICT, only users of this endpoint were in `IWYUCommentHandler` (and some tests). we can also get rid of this one now.
> > 
> > More importantly now this turns into a mapper used only by symbolcollector, that can be cheaply created at use time (we just copy around a pointer every time we want to create it). so you can drop `CanonicalIncludes` from all of the interfaces, including `SymbolCollector::Options` and create one on demand in `HeaderFileURICache`. all we need is langopts, and it's available through preprocessor (not necessarily on construction, but at the time when we want to do the mappings).
> > 
> > As you're already touching all of the interfaces that propagate `CanonicalIncludes` around, hopefully this change should only make things simpler (and you already need to touch them when you rebase), but if that feels like too much churn in this patch feel free to do that in a follow up as well.
> > ... and create one on demand in HeaderFileURICache. all we need is langopts, and it's available through preprocessor (not necessarily on construction, but at the time when we want to do the mappings
> 
> This time sounds a little late, since we don't want to rebuild the system header mapping every time we request a mapping for a particular header. 
> Cache construction time doesn't work, since it happens before the preprocessor is set in the symbol collector (and we need the preprocessor to retrieve language options).
> So far the safe place I've found is right after the preprocessor is set in the symbol collector. Another option is to collect the system header options in the finish() method, but I can't see why we would need to delay until then. 
> This time sounds a little late, since we don't want to rebuild the system header mapping every time we request a mapping for a particular header.

System header mapping is a singleton, we build it statically once on first request throughout the whole process and just use that for the rest of the execution.


================
Comment at: clang-tools-extra/clangd/index/FileIndex.cpp:49
+    const MainFileMacros *MacroRefsToIndex,
+    std::shared_ptr<const include_cleaner::PragmaIncludes> PI,
+    bool IsIndexMainAST, llvm::StringRef Version, bool CollectMainFileRefs) {
----------------
again you can have a `const include_cleaner::PragmaIncludes &` without `shared_ptr` here.


================
Comment at: clang-tools-extra/clangd/index/FileIndex.h:118
+                 Preprocessor &PP,
+                 std::shared_ptr<const include_cleaner::PragmaIncludes> PI);
   void updatePreamble(IndexFileIn);
----------------
you can have a `const include_cleaner::PragmaIncludes& PI` directly both in here and in `indexHeaderSymbols`.

after building a preamble we always have a non-null PragmaIncludes and these functions are always executed synchronously, so no need to extend ownership and make the code indirect.


================
Comment at: clang-tools-extra/clangd/index/IndexAction.cpp:230
   }
-  auto Includes = std::make_unique<CanonicalIncludes>();
-  Opts.Includes = Includes.get();
-  return std::make_unique<IndexAction>(
-      std::make_shared<SymbolCollector>(std::move(Opts)), std::move(Includes),
-      IndexOpts, SymbolsCallback, RefsCallback, RelationsCallback,
-      IncludeGraphCallback);
+  auto PragmaIncludes = std::make_shared<include_cleaner::PragmaIncludes>();
+  Opts.PragmaIncludes = PragmaIncludes;
----------------
you can have a unique_ptr here instead (once we store a non-owning pointer in `Opts.PragmaIncludes`)


================
Comment at: clang-tools-extra/clangd/index/SymbolCollector.cpp:407
+
+    if (auto Verbatim = PI->getPublic(SM.getFileEntryForID(FID));
+        !Verbatim.empty())
----------------
s/SM.getFileEntryForID(FID)/FE


================
Comment at: clang-tools-extra/clangd/index/SymbolCollector.cpp:793
+      S, DefLoc,
+      include_cleaner::Macro{const_cast<IdentifierInfo *>(Name), DefLoc});
   Symbols.insert(S);
----------------
you can actually change `include_cleaner::Macro` to have a `const IdentifierInfo*` instead of the `const_cast` here. it's mostly an oversight to store a non-const `IdentifierInfo` in the first place.


================
Comment at: clang-tools-extra/clangd/index/SymbolCollector.cpp:839
+  if (!Headers.empty())
+    SymbolProviders.insert({S.ID, Headers[0]});
 }
----------------
once we have the optional<Header> as the value type you can also rewrite this as:
```
auto [It, Inserted] = SymbolProviders.try_emplace(S.ID);
if (Inserted) {
  auto Providers = include_cleaner::headersForSymbol(Sym, SM, Opts.PragmaIncludes.get());
  if(!Providers.empty()) It->second = Providers.front();
} 
```

that way we can get rid of some redundant calls to `headersForSymbol`


================
Comment at: clang-tools-extra/clangd/index/SymbolCollector.cpp:895
+          !IncludeHeader.empty()) {
+        NewSym.IncludeHeaders.push_back({IncludeHeader, 1, Directives});
+        Symbols.insert(NewSym);
----------------
you can move the copy into this branch to make sure we're only doing that when necessary, e.g:
```
if(auto IncludeHeader = ...) {
  Symbol NewSym = *S;
  NewSym.IncludeHeaders...;
  Symbols.insert(NewSym);
}
```


================
Comment at: clang-tools-extra/clangd/index/SymbolCollector.cpp:907
+    auto It = SymbolProviders.find(SID);
+    if (It == SymbolProviders.end())
+      continue;
----------------
FWIW, i think we should `assert(It != SymbolProviders.end())` (or `IncludeFiles.end()` when you change the outer loop). as we're inserting into these maps simultaneously.


================
Comment at: clang-tools-extra/clangd/index/SymbolCollector.cpp:910
+    const auto &H = It->second;
+    llvm::DenseMap<include_cleaner::Header, std::string> HeaderSpelling;
+    const auto [SpellingIt, Inserted] = HeaderSpelling.try_emplace(H);
----------------
we're not really getting any caching benefits as this map is inside the loop. can you put it outside the loop body?


================
Comment at: clang-tools-extra/clangd/index/SymbolCollector.cpp:913
+    if (!Inserted)
+      continue;
+    switch (H.kind()) {
----------------
we shouldn't `continue` here, it just means we can directly use `SpellingIt` to figure out what to put into `NewSym.IncludeHeaders`.

maybe something like:

```
auto [SpellingIt, Inserted] = HeaderSpelling.try_emplace(H);
if (Inserted)
  SpellingIt-> second = getSpelling(H, *PP);
if(!SpellingIt->second.empty()) {
  Symbol NewSym = *S;
  NewSym.IncludeHeaders.push_back({SpellingIt->second, 1, Directives});
  Symbols.insert(NewSym);
}
```

```
std::string getSpelling(const include_cleaner::Header &H, const Preprocessor &PP) {
  if(H.kind() == Physical) {
   // Check if we have a mapping for the header in system includes.
   // FIXME: get rid of this once stdlib mapping covers these system headers too.
   CanonicalIncludes CI;
   CI.addSystemHeadersMapping(PP.getLangOpts());
    if (auto Canonical = CI.mapHeader(H.physical()->getLastRef()); !Canonical.empty())
        return Canonical;
    if (!tooling::isSelfContainedHeader(...))
        return "";
  }
  // Otherwise use default spelling strategies in include_cleaner.
  return include_cleaner::spellHeader(H);
}
```


================
Comment at: clang-tools-extra/clangd/index/SymbolCollector.cpp:250
+  // of whether it's an otherwise-good header (header guards etc).
+  llvm::StringRef mapCanonical(FileID FID) {
+    if (SysHeaderMapping) {
----------------
VitaNuo wrote:
> kadircet wrote:
> > let's change the interface here to take in a `FileEntry` instead.
> I believe it has to be `FileEntryRef` instead. This is what the `CanonicalIncludes::mapHeader` expects, and this is also seems right (somewhere in the Clang docs I've read "everything that needs a name should use `FileEntryRef`", and name is exactly what `CanonicalIncludes::mapHeader` needs). 
> 
> For the case of a physical header (i.e., `FileEntry` instance) I've switched to using `getLastRef` now. There's also another API `FileManager::getFileRef` where I could try passing the result of `tryGetRealPathName`. I am not sure I have enough information to judge if any of these options is distinctly better.
yeah `getLastRef` is fine, i was mostly trying to avoid dodgy call to `getOrCreateFileID`.

you can also pass in just a `llvm::StringRef HeaderPath` now, as we're not using anything but file path. which you can retreive with `getName` from both FileEntry and FileEntryRef


================
Comment at: clang-tools-extra/clangd/index/SymbolCollector.cpp:849
+      shouldCollectIncludePath(S.SymInfo.Kind) != Symbol::Invalid)
+    SymbolProviders[S.ID] = std::move(Headers);
+}
----------------
VitaNuo wrote:
> kadircet wrote:
> > because you're taking in `Headers` as `const`, this move is not actually moving.
> Ok, this is not relevant anymore, but I am not sure I understand why. Am I not allowed to say that I don't need this variable/memory anymore, even though it's `const`?
`std::move` is unfortunately a little bit misleading at times. it is just a `cast` that returns an rvalue reference. hence on its own it doesn't actually `move` anything.

but it enables picking rvalue versions of constructors/assignment operators. (e.g. `Foo(Foo&&)` or `operator=(Foo&&)`) which can be implemented efficiently with the assumption that RHS object is being "moved from" and will no longer be used. that way certain types implement a move semantics by just consuming the RHS object. unfortunately that is not possbile when RHS is `const`, and expressions use regular copy semantics instead.


================
Comment at: clang-tools-extra/clangd/index/SymbolCollector.cpp:877
   // We delay this until end of TU so header guards are all resolved.
   for (const auto &[SID, FID] : IncludeFiles) {
     if (const Symbol *S = Symbols.find(SID)) {
----------------
VitaNuo wrote:
> kadircet wrote:
> > let's iterate over `SymbolProviders` instead, as `IncludeFiles` is a legacy struct now.
> Not sure I'm following. Iterating over `SymbolProviders` means retrieving `include_cleaner::Header`s. How would I handle the entire Obj-C part then, without the FID of the include header? 
we're joining `IncludeFiles` and `SymbolProviders`, as they have the same keys.

right now you're iterating over the keys in `IncludeFiles` and doing a lookup into `SymbolProviders` using that key to get providers.
we can also do the iteration over `SymbolProviders` and do the lookup into `IncludeFiles` instead to find associated `FID`.


================
Comment at: clang-tools-extra/clangd/index/SymbolCollector.h:64
+    /// different #include path specified by IWYU pragmas.
+    std::shared_ptr<const include_cleaner::PragmaIncludes> PragmaIncludes =
+        nullptr;
----------------
no need to have ownership here, we can have just a `const include_cleaner::PragmaIncludes* PI`. SymbolCollector shouldn't need to extend lifetimes


================
Comment at: clang-tools-extra/clangd/index/SymbolCollector.h:130
     this->PP = PP.get();
+    SysHeaderMapping.addSystemHeadersMapping(PP->getLangOpts());
+  }
----------------
as mentioned above `addSystemHeadersMapping` is cheap after the first call in the process, we initialize the mappings only once. so you can just have it as a member in `HeaderFileURICache`, and drop it completely from the public interface of `SymbolCollector`


================
Comment at: clang-tools-extra/clangd/index/SymbolCollector.h:186
+  // The final spelling is calculated in finish().
+  llvm::DenseMap<SymbolID, include_cleaner::Header> SymbolProviders;
+
----------------
can we have a `llvm::DenseMap<SymbolID, std::optional<include_cleaner::Header>>`, to indicate there are no providers for a symbol, rather than missing the symbol completely from the mapping?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D152900/new/

https://reviews.llvm.org/D152900



More information about the cfe-commits mailing list