[PATCH] D143496: [clangd] Add support for missing includes analysis.

Kadir Cetinkaya via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Tue Feb 21 06:48:41 PST 2023


kadircet added inline comments.


================
Comment at: clang-tools-extra/clangd/IncludeCleaner.cpp:359
+    TransformedInc.Angled = WrittenRef.starts_with("<");
+    auto FE = SM.getFileManager().getFile(Inc.Resolved);
+    if (!FE)
----------------
unfortunately `getFile` returns an `llvm::Expected` which requires explicit error handling (or it'll trigger a crash). you can simply `elog` the issue:
```
if (!FE) {
  elog("IncludeCleaner: Failed to get an entry for resolved path {0}: {1}", Inc.Resolved, FE.takeError());
  continue;
}
```


================
Comment at: clang-tools-extra/clangd/IncludeCleaner.cpp:519
+std::vector<include_cleaner::SymbolReference>
+collectMacroReferences(ParsedAST &AST) {
+  const auto &SM = AST.getSourceManager();
----------------
this should also be either static or put into anon namespace


================
Comment at: clang-tools-extra/clangd/IncludeCleaner.cpp:554
+
+std::vector<Diag> generateMissingIncludeDiagnostics(
+    ParsedAST &AST, llvm::SmallVector<MissingIncludeDiagInfo> MissingIncludes,
----------------
`generateMissingIncludeDiagnostics` should also be either static or put into anon namespace


================
Comment at: clang-tools-extra/clangd/IncludeCleaner.cpp:555
+std::vector<Diag> generateMissingIncludeDiagnostics(
+    ParsedAST &AST, llvm::SmallVector<MissingIncludeDiagInfo> MissingIncludes,
+    llvm::StringRef Code) {
----------------
i think it's better to use an `llvm::ArrayRef` instead of `llvm::SmallVector` for diag infos here, we don't need to copy.


================
Comment at: clang-tools-extra/clangd/IncludeCleaner.cpp:563
+  for (const auto &SymbolWithMissingInclude : MissingIncludes) {
+    std::string Spelling =
+        spellHeader(AST, MainFile, SymbolWithMissingInclude.Providers.front());
----------------
we've got `Config::Diagnostics::Includes::IgnoreHeader` that disables include-cleaner analysis on headers that match a pattern. we should be respecting those config options here too.


================
Comment at: clang-tools-extra/clangd/IncludeCleaner.cpp:582
+        HeaderRef.trim("\"<>"), Angled, tooling::IncludeDirective::Include);
+    if (!Replacement.has_value())
       continue;
----------------
could you add a comment here, as this is subtle, something like `We might suggest insertion of an existing include in edge cases, e.g. include is present in a PP-disabled region, or spelling of the header turns out to be the same as one of the unresolved includes in the main file`


================
Comment at: clang-tools-extra/clangd/IncludeCleaner.cpp:589
+    D.InsideMainFile = true;
+    Result.push_back(D);
+  }
----------------
nit: `Result.push_back(std::move(D))`, as `D` has lots of strings in it, it might be expensive to copy it around. Even better if you construct the Diag in place via `Diag &D = Result.emplace_back()`, you can achieve this by moving all the logic that might bail out (e.g. replacement generation) to the top of the loop, and start forming the diagnostic only after we're sure that there'll be one.


================
Comment at: clang-tools-extra/clangd/IncludeCleaner.cpp:595
+std::vector<Diag>
+generateUnusedIncludeDiagnostics(PathRef FileName,
+                                 std::vector<const Inclusion *> UnusedIncludes,
----------------
this also needs to be static or put into anon namespace


================
Comment at: clang-tools-extra/clangd/IncludeCleaner.cpp:596
+generateUnusedIncludeDiagnostics(PathRef FileName,
+                                 std::vector<const Inclusion *> UnusedIncludes,
+                                 llvm::StringRef Code) {
----------------
no need to copy the vector by taking a `std::vector` here, you can take an `llvm::ArrayRef` instead.


================
Comment at: clang-tools-extra/clangd/IncludeCleaner.cpp:623
     D.InsideMainFile = true;
-    Result.push_back(std::move(D));
+    Result.push_back(D);
+  }
----------------
can you restore `std::move`?


================
Comment at: clang-tools-extra/clangd/IncludeCleaner.cpp:632
+  include_cleaner::Includes ConvertedIncludes =
+      convertIncludes(AST.getSourceManager(), Includes.MainFileIncludes);
+  const FileEntry *MainFile = SM.getFileEntryForID(SM.getMainFileID());
----------------
s/AST.getSourceManager()/SM


================
Comment at: clang-tools-extra/clangd/IncludeCleaner.cpp:639
+  llvm::DenseSet<IncludeStructure::HeaderID> Used;
+  include_cleaner::walkUsed(
+      AST.getLocalTopLevelDecls(), /*MacroRefs=*/Macros,
----------------
can you introduce a `trace::Span` wrapping the call to `walkUsed` with name `IncludeCleanerAnalysis` so that we can collect some stats about latency here?


================
Comment at: clang-tools-extra/clangd/IncludeCleaner.cpp:651
+          }
+          for (auto *&Inc : ConvertedIncludes.match(H)) {
+            Satisfied = true;
----------------
nit: drop either `*` or `&` (preferably `&`), having a reference vs a pointer doesn't make any differences performance wise, but creates a confusion (as we don't realy need a reference to a pointer here)


================
Comment at: clang-tools-extra/clangd/IncludeCleaner.cpp:661
+        const syntax::TokenBuffer &Tokens = AST.getTokens();
+        if (!Satisfied && !Providers.empty() &&
+            Ref.RT == include_cleaner::RefType::Explicit) {
----------------
nit: we prefer `early exits` to extra `nesting`, e.g. rewriting this as:
```
if (Satisfied || Providers.empty() || Ref.RT != Explicit)
  continue;
const auto &TB = AST.getTokens();
auto SpelledTokens = TB.spelledForExpanded(...);
if (!SpelledTokens)
  continue;
...
```

increases readability by:
- reducing the nesting
- making it more explicit about under what assumptions the rest of the code is working


================
Comment at: clang-tools-extra/clangd/IncludeCleaner.cpp:665
+                  Tokens.expandedTokens(Ref.RefLocation))) {
+            syntax::FileRange Range = syntax::Token::range(
+                AST.getSourceManager(), (*SpelledForExpanded)[0],
----------------
nit: `auto Range = syntax::Token::range(SM, SpelledForExpanded->front(), SpelledForExpanded->back());`


================
Comment at: clang-tools-extra/clangd/IncludeCleaner.cpp:668-671
+            llvm::SmallVector<include_cleaner::Header> ProviderHeaders;
+            for (const auto H : Providers) {
+              ProviderHeaders.push_back(H);
+            }
----------------
you don't need to explicitly copy `Providers` into `ProviderHeaders`, you can pass it directly to `DiagInfo` below.


================
Comment at: clang-tools-extra/clangd/IncludeCleaner.cpp:680
+              SymbolName =
+                  static_cast<const NamedDecl &>(Ref.Target.declaration())
+                      .getQualifiedNameAsString();
----------------
we use llvm casts, specifically `llvm::dyn_cast<NamedDecl*>(&Ref.Target.declaration())->getQualifiedNameAsString()`


================
Comment at: clang-tools-extra/clangd/IncludeCleaner.cpp:681
+                  static_cast<const NamedDecl &>(Ref.Target.declaration())
+                      .getQualifiedNameAsString();
+              break;
----------------
`getQualifiedNameAsString` is going to print names that are really ugly at certain times, but unfortunately that's a problem we don't have a great solution to. so no action needed ATM, but we might want to switch between qualified and unqualified name depending on the length at the very least (e.g. symbol is coming from a templated class, which has a nasty nested instantiation).


================
Comment at: clang-tools-extra/clangd/IncludeCleaner.cpp:685
+            MissingIncludeDiagInfo DiagInfo{SymbolName, Range, ProviderHeaders};
+            MissingIncludes.push_back(std::move(DiagInfo));
+          }
----------------
nit: `MissingIncludes.emplace_back(std::move(SymbolName), Range, Providers);`


================
Comment at: clang-tools-extra/clangd/IncludeCleaner.cpp:691
+      getUnused(AST, Used, /*ReferencedPublicHeaders*/ {});
+  return {UnusedIncludes, MissingIncludes};
+}
----------------
nit: you'd want `std::move`s here, around both of them


================
Comment at: clang-tools-extra/clangd/IncludeCleaner.cpp:718
+
+  if (!Cfg.Diagnostics.Suppress.contains("unused-includes")) {
+    switch (Cfg.Diagnostics.UnusedIncludes) {
----------------
nit: it might be worth re-writing the following section as:
```
std::vector<Diag> Result = generateUnusedIncludeDiagnostics(AST.tuPath(),
                Cfg.Diagnostics.UnusedIncludes == Strict ? computeUnusedIncludes(AST) : Findings.UnusedIncludes, Code);
llvm::move(generateMissingIncludeDiagnostics(AST, MissingIncludes, Code),
             std::back_inserter(Result));
```

and move the checks like `if (Cfg.Diagnostics.MissingIncludes == Config::IncludesPolicy::Strict && !Cfg.Diagnostics.Suppress.contains("missing-includes"))` into the specific function, e.g. `generateUnusedIncludeDiagnostics`, as they already do some of the diagnostic filtering logic.


================
Comment at: clang-tools-extra/clangd/IncludeCleaner.cpp:523
+    TransformedInc.Angled = WrittenRef.starts_with("<");
+    if (auto FE = SM.getFileManager().getFile(Inc.Resolved))
+      TransformedInc.Resolved = *FE;
----------------
VitaNuo wrote:
> kadircet wrote:
> > i don't think we should convert any unresolved includes. it usually means header wasn't found, so we can't perform any reliable analysis on them. anything i am missing?
> Ok I will skip unresolved includes. But I am not sure I fully understand. We do the following:
> 1. Convert clangd includes to include-cleaner includes.
> 2. Match include-cleaner includes with symbol providers.
> 3. If match found, symbol reference is satisfied.
> 
> How does it matter in this scenario if the include is resolved? AFAIU as long as the header is spelled in the main file + it's matched with a symbol provider, we should say that the symbol reference is satisfied.
> 
> Otherwise, it seems like we'll say that the header is missing, although it's there in the main file and unresolved.
> 
> I don't know if this is in any way a realistic scenario. I am just approaching it with general logic, and in this sense having more "satisfied" symbols seems better than having less => leads to less false positives. It can lead to false negatives, too, but AFAIU false negatives are much less of a risk for missing include management. 
> How does it matter in this scenario if the include is resolved? AFAIU as long as the header is spelled in the main file + it's matched with a symbol provider, we should say that the symbol reference is satisfied.

It doesn't matter at a high-level view. But since the implementation recognizes headers based on the HeaderID and it's only defined for resolved includes, if we were to have any unresolved include matches somehow (e.g. it has spelling "foo/bar.h", but is unresolved, and we do a match based on spelling because some IWYU pragma pointed at this header), we would hit the assertion around HeaderID always having value.


================
Comment at: clang-tools-extra/clangd/IncludeCleaner.cpp:535
+  Diag D;
+  D.Message = llvm::formatv("header {0} is missing", Spelling);
+  D.Name = "missing-includes";
----------------
VitaNuo wrote:
> kadircet wrote:
> > i think the symbol name should also be part of the diagnostic. as editors can show these diagnostics without context (e.g. you've got a big file open, there's a diagnostic panel displaying messages/fixes for all of the file. you should be able to reason about the diagnostics and fixes without jumping all over the file). so maybe something like:
> > 
> > `formatv("{0} providing '{1}' is not directly included", Header, Symbol)`
> Sure. The new design does this, as well as skipping the header name.
i feel like there's actually value in keeping the header name around, i.e. the user will have some idea about the action, without triggering an extra interaction. this helps especially in cases where the finding is wrong, they'll discover this sooner, hence we'll be more likely to receive bug reports. but don't really have a strong preference, so feel free to keep it that way.


================
Comment at: clang-tools-extra/clangd/IncludeCleaner.cpp:549
+  bool Angled = HeaderRef.starts_with("<");
+  std::optional<tooling::Replacement> Replacement = HeaderIncludes.insert(
+      HeaderRef.trim("\"<>"), Angled, tooling::IncludeDirective::Include);
----------------
VitaNuo wrote:
> kadircet wrote:
> > this returns nullopt only if Header is already included in the main file. our analysis should never suggest such a header for include, unless the include is coming from a PP-disabled region.
> > 
> > so i think if Replacement generation fails, we should drop the diagnostic completely rather than just dropping the fix. WDYT?
> Ok, sure. What's a PP-disabled region? Are you talking about #ifdef's and such?
> What's a PP-disabled region

Yes, I was trying to say "preprocessor disabled region". e.g. in a piece of code like:
```
#if 0
#include "foo.h"
#endif
```

preprocessor won't actually trigger inclusion of "foo.h", but most of the heuristic parsers (most importantly the logic in `HeaderIncludes`) will treat this include as usual.


================
Comment at: clang-tools-extra/clangd/IncludeCleaner.h:41
+struct MissingIncludeDiagInfo {
+  std::string SymbolName;
+  syntax::FileRange SymRefRange;
----------------
it seems unfortunate that we're duplicating these strings for each diag we want to emit. it might be better to just store a Symbol here (similar to Header) and delay spelling until needed.


================
Comment at: clang-tools-extra/clangd/IncludeCleaner.h:46
+  bool operator==(const MissingIncludeDiagInfo &Other) const {
+    return SymbolName == Other.SymbolName && SymRefRange == Other.SymRefRange &&
+           Providers == Other.Providers;
----------------
nit: `std::tie(SymbolName, SymRefRange, Providers) == std::tie(Other.SymbolName, ...);`

I'd also put the SymbolName match to be the last, as it's a string match and might be more costly (if we can bail out early)


================
Comment at: clang-tools-extra/clangd/IncludeCleaner.h:53
+  std::vector<const Inclusion *> UnusedIncludes;
+  llvm::SmallVector<MissingIncludeDiagInfo> MissingIncludes;
+};
----------------
I don't think we've much to gain by using SmallVector here, instead of std::vector


================
Comment at: clang-tools-extra/clangd/unittests/IncludeCleanerTests.cpp:427
+
+  IncludeCleanerFindings Findings = computeIncludeCleanerFindings(AST);
+  const SourceManager &SM = AST.getSourceManager();
----------------
i don't think there's much value in testing out analysis here, we should rather focus on diagnostics generation, which isn't part of `computeIncludeCleanerFindings`. existing tests were focused on analysis, because legacy implementation for include-cleaner was actually performing these analysis itself.

so I'd rather suggest having trivial test cases (from include-cleaner analysis perspective, no need for complicated directory/file layouts) and rather test things out through calls to `generateMissingIncludeDiagnostics` to make sure diagnostics has the right ranges, text and fix contents.

right now we're not testing:
- header spelling
- symbol name generation
- ranges these diagnostics correspond to

and these are the main functionality we're adding on top of include-cleaner analysis.

you can take a look at the tests in llvm/llvm-project/clang-tools-extra/clangd/unittests/DiagnosticsTests.cpp to see how we're testing out diagnostics ranges, messages, fixes and what kind of helpers/matchers we have for them.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D143496/new/

https://reviews.llvm.org/D143496



More information about the cfe-commits mailing list