[PATCH] D152900: [clangd] Update symbol collector to use include-cleaner.

Kadir Cetinkaya via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Thu Jun 22 05:47:07 PDT 2023


kadircet added inline comments.


================
Comment at: clang-tools-extra/clangd/ParsedAST.cpp:628
     CanonIncludes.addSystemHeadersMapping(Clang->getLangOpts());
-  std::unique_ptr<CommentHandler> IWYUHandler =
-      collectIWYUHeaderMaps(&CanonIncludes);
----------------
can you also add a FIXME here saying that we should attach PragmaInclude callbacks to main file ast build too?

this is not a recent regression as we were not doing it properly in previous version either. IWYU handler we attach only cares about private pragmas, which don't mean much inside the main file. but we actually need it for other pragmas like keep/export/no_include etc. (no need to do it in this patch as it's already getting quite big, let's do that as a follow up)


================
Comment at: clang-tools-extra/clangd/Preamble.h:83
 
-using PreambleParsedCallback = std::function<void(ASTContext &, Preprocessor &,
-                                                  const CanonicalIncludes &)>;
+using PreambleParsedCallback =
+    std::function<void(ASTContext &, Preprocessor &, const CanonicalIncludes &,
----------------
you need to rebase as we had some big changes around this logic recently. we should pass the `PragmaIncludes` also as a `shared_ptr` (and store as one inside the preambledata)


================
Comment at: clang-tools-extra/clangd/index/CanonicalIncludes.h:40
   /// Adds a file-to-string mapping from \p ID to \p CanonicalPath.
   void addMapping(FileEntryRef Header, llvm::StringRef CanonicalPath);
 
----------------
AFAICT, only users of this endpoint were in `IWYUCommentHandler` (and some tests). we can also get rid of this one now.

More importantly now this turns into a mapper used only by symbolcollector, that can be cheaply created at use time (we just copy around a pointer every time we want to create it). so you can drop `CanonicalIncludes` from all of the interfaces, including `SymbolCollector::Options` and create one on demand in `HeaderFileURICache`. all we need is langopts, and it's available through preprocessor (not necessarily on construction, but at the time when we want to do the mappings).

As you're already touching all of the interfaces that propagate `CanonicalIncludes` around, hopefully this change should only make things simpler (and you already need to touch them when you rebase), but if that feels like too much churn in this patch feel free to do that in a follow up as well.


================
Comment at: clang-tools-extra/clangd/index/CanonicalIncludes.h:43
   /// Returns the overridden include for files in \p Header, or "".
   llvm::StringRef mapHeader(FileEntryRef Header) const;
 
----------------
let's mention that this will always return a verbatim spelling, e.g. with quotes or angles.


================
Comment at: clang-tools-extra/clangd/index/CanonicalIncludes.h:57
   /// A map from the private header to a canonical include path.
   llvm::DenseMap<llvm::sys::fs::UniqueID, std::string> FullPathMapping;
   /// A map from a suffix (one or components of a path) to a canonical path.
----------------
you can drop this one too, only `addMapping` would introduce mappings here, but that's going away too.


================
Comment at: clang-tools-extra/clangd/index/SymbolCollector.cpp:250
+  // of whether it's an otherwise-good header (header guards etc).
+  llvm::StringRef mapCanonical(FileID FID) {
+    if (SysHeaderMapping) {
----------------
let's change the interface here to take in a `FileEntry` instead.


================
Comment at: clang-tools-extra/clangd/index/SymbolCollector.cpp:251
+  llvm::StringRef mapCanonical(FileID FID) {
+    if (SysHeaderMapping) {
+      llvm::StringRef Canonical =
----------------
nit: early exits

```
if (!SysHeaderMapping)
  return "";
auto Canonical = ...;
if(Canonical.empty())
  return "";
...
```


================
Comment at: clang-tools-extra/clangd/index/SymbolCollector.cpp:258
+          return Canonical;
+        return toURI(Canonical);
+      }
----------------
we should never hit this code path anymore. so feel free to convert the `if` above to an `assert`


================
Comment at: clang-tools-extra/clangd/index/SymbolCollector.cpp:412-413
+
+    if (auto Verbatim = PI->getPublic(FileEntry); !Verbatim.empty())
+      return Verbatim;
+
----------------
let's move this logic into `mapCanonical` too.


================
Comment at: clang-tools-extra/clangd/index/SymbolCollector.cpp:415
+
+    auto Canonical = mapCanonical(FID);
+    if (!Canonical.empty())
----------------
nit: `if (auto Canonical = ...; !Canonical.empty()) return Canonical;`


================
Comment at: clang-tools-extra/clangd/index/SymbolCollector.cpp:846
+void SymbolCollector::setSymbolProviders(
+    const Symbol &S, const llvm::SmallVector<include_cleaner::Header> Headers) {
+  if (Opts.CollectIncludePath &&
----------------
what about merging this one and `setIncludeLocation` ?

e.g:
```
void SymbolCollector::setIncludeLocation(const Symbol &IdxS, SourceLocation DefLoc, const include_cleaner::Symbol &Sym) {
if (!Opts.CollectIncludePath ||
      shouldCollectIncludePath(S.SymInfo.Kind) == Symbol::Invalid)
  return;
IncludeFiles[SID] = ...;
SymbolProviders[SID] = headersForSymbol(...);
}
```


================
Comment at: clang-tools-extra/clangd/index/SymbolCollector.cpp:849
+      shouldCollectIncludePath(S.SymInfo.Kind) != Symbol::Invalid)
+    SymbolProviders[S.ID] = std::move(Headers);
+}
----------------
because you're taking in `Headers` as `const`, this move is not actually moving.


================
Comment at: clang-tools-extra/clangd/index/SymbolCollector.cpp:877
   // We delay this until end of TU so header guards are all resolved.
   for (const auto &[SID, FID] : IncludeFiles) {
     if (const Symbol *S = Symbols.find(SID)) {
----------------
let's iterate over `SymbolProviders` instead, as `IncludeFiles` is a legacy struct now.


================
Comment at: clang-tools-extra/clangd/index/SymbolCollector.cpp:878
   for (const auto &[SID, FID] : IncludeFiles) {
     if (const Symbol *S = Symbols.find(SID)) {
+      // Determine if the FID is #include'd or #import'ed.
----------------
inside of this branch is getting crowded, maybe early exit instead?
```
const Symbol *S = Symbols.find(SID);
if(!S) continue;
...
```


================
Comment at: clang-tools-extra/clangd/index/SymbolCollector.cpp:908
+              {HeaderFileURIs->getIncludeHeader(FID), 1, Directives});
+        Symbols.insert(NewSym);
+        continue;
----------------
no need to call insert if we didn't modify `IncludeHeaders`, so maybe reflow to something like:
```
if (Directives & Import) {
  if(auto IncHeader = ..; !IncHeader.empty()) {
     auto NewSym = *S;
     NewSym.IncludeHeaders....;
     Symbols.insert(NewSym);
  }
  // FIXME: Use providers from include-cleaner once it's polished for ObjC too.
  continue;
}
```


================
Comment at: clang-tools-extra/clangd/index/SymbolCollector.cpp:914
+      // library.
+      if (Directives == Symbol::Include) {
+        auto It = SymbolProviders.find(SID);
----------------
s/if/assert


================
Comment at: clang-tools-extra/clangd/index/SymbolCollector.cpp:930
+                auto Canonical =
+                    HeaderFileURIs->mapCanonical(SM.getOrCreateFileID(
+                        H.physical(), SrcMgr::CharacteristicKind::C_User));
----------------
no need for the `getOrCreateFileID` after you change parameter for `mapCanonical` to be a file entry.


================
Comment at: clang-tools-extra/clangd/index/SymbolCollector.cpp:932
+                        H.physical(), SrcMgr::CharacteristicKind::C_User));
+                if (!Canonical.empty())
+                  SpellingIt->second = Canonical;
----------------
nit:
```
if(auto Canonical = mapCanonical(H.physical()); !Canonical.empty())
  SpellingIt->second = Canonical;
```


================
Comment at: clang-tools-extra/clangd/index/SymbolCollector.cpp:934
+                  SpellingIt->second = Canonical;
+                if (!tooling::isSelfContainedHeader(H.physical(),
+                                                    PP->getSourceManager(),
----------------
this should be `else if`. if we performed a mapping, we no longer care about self-containedness of original header.


================
Comment at: clang-tools-extra/clangd/index/SymbolCollector.cpp:935
+                if (!tooling::isSelfContainedHeader(H.physical(),
+                                                    PP->getSourceManager(),
+                                                    PP->getHeaderSearchInfo()))
----------------
s/PP->getSourceManager()/SM


================
Comment at: clang-tools-extra/clangd/index/SymbolCollector.cpp:938
+                  break;
+                if (SpellingIt->second.empty())
+                  SpellingIt->second =
----------------
s/if (SpellingIt->second.empty())/else


================
Comment at: clang-tools-extra/clangd/index/SymbolCollector.cpp:393-395
+    const auto *FileEntry = SM.getFileEntryForID(FID);
+    for (const auto *Export : PI.getExporters(FileEntry, SM.getFileManager()))
+      return toURI(Export->tryGetRealPathName());
----------------
VitaNuo wrote:
> kadircet wrote:
> > sorry i don't understand what this logic is trying to achieve.
> > 
> > we seem to be prefering "first" exporting header, over the headers that directly provide the symbol. Also comment says it's done for objc, but there's nothing limiting this logic to ObjC. any chance this is unintended?
> No, this was intended, but possibly I didn't get it right.
> 
> >  comment says it's done for objc, but there's nothing limiting this logic to objc
> 
> AFAIU this whole code path will now only be reachable for objc symbols.  I have removed pragma includes from the canonical include mapping now, leaving the pragma handling to include cleaner. However, that only triggers for C/C++, so in case we need the `export` and `private` pragmas for objc, we need to re-insert the handling for them somewhere. 
> There is no pre-existing test case for pragmas in objc and there is no usage of these pragmas for objc code in the code base either,  so I have no way of knowing if this is actually needed. But I believe you've mentioned in some discussion we should still handle the pragmas for objc.
> 
> > we seem to be preferring "first" exporting header, over the headers that directly provide the symbol.
> 
> Isn't that correct if there's an `IWYU export` pragma involved? The snippet comes from `include_cleaner::findHeaders`, with the only difference that it stops at the first exporter (since the canonical include mapping also just stored one mapping for the header).
> 
> Let me know how to do it better, or maybe if this is necessary at all. Honestly, I am not sure about this, since, as mentioned, there are no `export` or `private/public` pragmas in objc files in the codebase atm.
> 
existing `IWYUCommentHandler` only handled private pragmas, so export pragmas were already dropped on the floor, we shouldn't introduce handling for them in here now. the mappings you perform for private/public pragmas and system headers below is enough.

(P.S. using exporters as "alternatives" vs "sole provider" are quite different, policy in include_cleaner does the former whereas logic in here does the latter)


================
Comment at: clang-tools-extra/clangd/index/SymbolCollector.h:175
   SymbolSlab::Builder Symbols;
-  // File IDs for Symbol.IncludeHeaders.
-  // The final spelling is calculated in finish().
+  // File IDs to distinguish imports from includes.
   llvm::DenseMap<SymbolID, FileID> IncludeFiles;
----------------
we seem to be populating this map for all kinds of symbols, as long as `shouldCollectIncludePath` returns non-invalid. what's up with distinguishing `imports from includes`?


================
Comment at: clang-tools-extra/clangd/index/SymbolCollector.h:186
+  // (i.e., header spellings in this case).
+  llvm::DenseMap<include_cleaner::Header, std::string> HeaderSpelling;
   // Files which contain ObjC symbols.
----------------
no need to have this in the class state, you can just inline this into `::finish` method instead.


================
Comment at: clang-tools-extra/clangd/index/SymbolCollector.h:192
+  void
+  setSymbolProviders(const Symbol &S,
+                     const llvm::SmallVector<include_cleaner::Header> Headers);
----------------
let's move this closer to the `SymbolProviders` map


================
Comment at: clang-tools-extra/clangd/index/SymbolCollector.h:205
   Preprocessor *PP = nullptr;
+  const include_cleaner::PragmaIncludes *PI;
   std::shared_ptr<GlobalCodeCompletionAllocator> CompletionAllocator;
----------------
we already store this in `Opts`, can we just use it from there instead of creating an extra name for it?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D152900/new/

https://reviews.llvm.org/D152900



More information about the cfe-commits mailing list