[PATCH] D143496: [clangd] Add support for missing includes analysis.

Viktoriia Bakalova via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Wed Feb 22 09:29:10 PST 2023


VitaNuo added a comment.

Thanks for the comments! I should have addressed everything apart from testing.



================
Comment at: clang-tools-extra/clangd/IncludeCleaner.cpp:523
+    TransformedInc.Angled = WrittenRef.starts_with("<");
+    if (auto FE = SM.getFileManager().getFile(Inc.Resolved))
+      TransformedInc.Resolved = *FE;
----------------
kadircet wrote:
> VitaNuo wrote:
> > kadircet wrote:
> > > i don't think we should convert any unresolved includes. it usually means header wasn't found, so we can't perform any reliable analysis on them. anything i am missing?
> > Ok I will skip unresolved includes. But I am not sure I fully understand. We do the following:
> > 1. Convert clangd includes to include-cleaner includes.
> > 2. Match include-cleaner includes with symbol providers.
> > 3. If match found, symbol reference is satisfied.
> > 
> > How does it matter in this scenario if the include is resolved? AFAIU as long as the header is spelled in the main file + it's matched with a symbol provider, we should say that the symbol reference is satisfied.
> > 
> > Otherwise, it seems like we'll say that the header is missing, although it's there in the main file and unresolved.
> > 
> > I don't know if this is in any way a realistic scenario. I am just approaching it with general logic, and in this sense having more "satisfied" symbols seems better than having less => leads to less false positives. It can lead to false negatives, too, but AFAIU false negatives are much less of a risk for missing include management. 
> > How does it matter in this scenario if the include is resolved? AFAIU as long as the header is spelled in the main file + it's matched with a symbol provider, we should say that the symbol reference is satisfied.
> 
> It doesn't matter at a high-level view. But since the implementation recognizes headers based on the HeaderID and it's only defined for resolved includes, if we were to have any unresolved include matches somehow (e.g. it has spelling "foo/bar.h", but is unresolved, and we do a match based on spelling because some IWYU pragma pointed at this header), we would hit the assertion around HeaderID always having value.
Ah ok, makes sense.


================
Comment at: clang-tools-extra/clangd/IncludeCleaner.cpp:535
+  Diag D;
+  D.Message = llvm::formatv("header {0} is missing", Spelling);
+  D.Name = "missing-includes";
----------------
kadircet wrote:
> VitaNuo wrote:
> > kadircet wrote:
> > > i think the symbol name should also be part of the diagnostic. as editors can show these diagnostics without context (e.g. you've got a big file open, there's a diagnostic panel displaying messages/fixes for all of the file. you should be able to reason about the diagnostics and fixes without jumping all over the file). so maybe something like:
> > > 
> > > `formatv("{0} providing '{1}' is not directly included", Header, Symbol)`
> > Sure. The new design does this, as well as skipping the header name.
> i feel like there's actually value in keeping the header name around, i.e. the user will have some idea about the action, without triggering an extra interaction. this helps especially in cases where the finding is wrong, they'll discover this sooner, hence we'll be more likely to receive bug reports. but don't really have a strong preference, so feel free to keep it that way.
Hm I'd expect we'll actually have a higher probability to receive a bug report if the user clicks on the "Quick fix" and gets a wrong header included, because that's annoying :)
Having a full message on hover, seeing that it's wrong and just ignoring it might not be annoying enough to file a bug ;-)

But this is a pure speculation ofc.
I think the point discussed on the design document makes sense: without mentioning the header name it will be a bit easier to extend this to suggesting multiple fixes. So if that's the general direction, I'd prefer to keep it the current way.



================
Comment at: clang-tools-extra/clangd/IncludeCleaner.cpp:549
+  bool Angled = HeaderRef.starts_with("<");
+  std::optional<tooling::Replacement> Replacement = HeaderIncludes.insert(
+      HeaderRef.trim("\"<>"), Angled, tooling::IncludeDirective::Include);
----------------
kadircet wrote:
> VitaNuo wrote:
> > kadircet wrote:
> > > this returns nullopt only if Header is already included in the main file. our analysis should never suggest such a header for include, unless the include is coming from a PP-disabled region.
> > > 
> > > so i think if Replacement generation fails, we should drop the diagnostic completely rather than just dropping the fix. WDYT?
> > Ok, sure. What's a PP-disabled region? Are you talking about #ifdef's and such?
> > What's a PP-disabled region
> 
> Yes, I was trying to say "preprocessor disabled region". e.g. in a piece of code like:
> ```
> #if 0
> #include "foo.h"
> #endif
> ```
> 
> preprocessor won't actually trigger inclusion of "foo.h", but most of the heuristic parsers (most importantly the logic in `HeaderIncludes`) will treat this include as usual.
thanks.


================
Comment at: clang-tools-extra/clangd/IncludeCleaner.cpp:359
+    TransformedInc.Angled = WrittenRef.starts_with("<");
+    auto FE = SM.getFileManager().getFile(Inc.Resolved);
+    if (!FE)
----------------
kadircet wrote:
> unfortunately `getFile` returns an `llvm::Expected` which requires explicit error handling (or it'll trigger a crash). you can simply `elog` the issue:
> ```
> if (!FE) {
>   elog("IncludeCleaner: Failed to get an entry for resolved path {0}: {1}", Inc.Resolved, FE.takeError());
>   continue;
> }
> ```
It returns `llvm::ErrorOr`, if I am not mistaken. There was explicit error handling already (`if (!FE) continue` below), just without the `elog`. Are you trying to say it will crash without the logging? Not sure that's feasible :)
Added the logging.


================
Comment at: clang-tools-extra/clangd/IncludeCleaner.cpp:519
+std::vector<include_cleaner::SymbolReference>
+collectMacroReferences(ParsedAST &AST) {
+  const auto &SM = AST.getSourceManager();
----------------
kadircet wrote:
> this should also be either static or put into anon namespace
sure, thanks.


================
Comment at: clang-tools-extra/clangd/IncludeCleaner.cpp:554
+
+std::vector<Diag> generateMissingIncludeDiagnostics(
+    ParsedAST &AST, llvm::SmallVector<MissingIncludeDiagInfo> MissingIncludes,
----------------
kadircet wrote:
> `generateMissingIncludeDiagnostics` should also be either static or put into anon namespace
Well then `generateUnusedIncludeDiagnostics` too, I guess.


================
Comment at: clang-tools-extra/clangd/IncludeCleaner.cpp:563
+  for (const auto &SymbolWithMissingInclude : MissingIncludes) {
+    std::string Spelling =
+        spellHeader(AST, MainFile, SymbolWithMissingInclude.Providers.front());
----------------
kadircet wrote:
> we've got `Config::Diagnostics::Includes::IgnoreHeader` that disables include-cleaner analysis on headers that match a pattern. we should be respecting those config options here too.
Ok, added that. Please have a look. At the moment it's trying to run filters on the output of the `spellHeader` method.


================
Comment at: clang-tools-extra/clangd/IncludeCleaner.cpp:582
+        HeaderRef.trim("\"<>"), Angled, tooling::IncludeDirective::Include);
+    if (!Replacement.has_value())
       continue;
----------------
kadircet wrote:
> could you add a comment here, as this is subtle, something like `We might suggest insertion of an existing include in edge cases, e.g. include is present in a PP-disabled region, or spelling of the header turns out to be the same as one of the unresolved includes in the main file`
Ok sure. 


================
Comment at: clang-tools-extra/clangd/IncludeCleaner.cpp:589
+    D.InsideMainFile = true;
+    Result.push_back(D);
+  }
----------------
kadircet wrote:
> nit: `Result.push_back(std::move(D))`, as `D` has lots of strings in it, it might be expensive to copy it around. Even better if you construct the Diag in place via `Diag &D = Result.emplace_back()`, you can achieve this by moving all the logic that might bail out (e.g. replacement generation) to the top of the loop, and start forming the diagnostic only after we're sure that there'll be one.
Sure. Doing the in-place construction now.


================
Comment at: clang-tools-extra/clangd/IncludeCleaner.cpp:595
+std::vector<Diag>
+generateUnusedIncludeDiagnostics(PathRef FileName,
+                                 std::vector<const Inclusion *> UnusedIncludes,
----------------
kadircet wrote:
> this also needs to be static or put into anon namespace
Ah didn't realize that you also left a comment here when replying to the identical comment on `generateMissingIncludeDiagnostics`. Should be done.


================
Comment at: clang-tools-extra/clangd/IncludeCleaner.cpp:596
+generateUnusedIncludeDiagnostics(PathRef FileName,
+                                 std::vector<const Inclusion *> UnusedIncludes,
+                                 llvm::StringRef Code) {
----------------
kadircet wrote:
> no need to copy the vector by taking a `std::vector` here, you can take an `llvm::ArrayRef` instead.
Oh this isn't even my code, but as long as it's a small change, sure :)


================
Comment at: clang-tools-extra/clangd/IncludeCleaner.cpp:623
     D.InsideMainFile = true;
-    Result.push_back(std::move(D));
+    Result.push_back(D);
+  }
----------------
kadircet wrote:
> can you restore `std::move`?
I'd rather do the emplacing, so that it's the same as in the `generateMissingIncludeDiagnostics`.


================
Comment at: clang-tools-extra/clangd/IncludeCleaner.cpp:639
+  llvm::DenseSet<IncludeStructure::HeaderID> Used;
+  include_cleaner::walkUsed(
+      AST.getLocalTopLevelDecls(), /*MacroRefs=*/Macros,
----------------
kadircet wrote:
> can you introduce a `trace::Span` wrapping the call to `walkUsed` with name `IncludeCleanerAnalysis` so that we can collect some stats about latency here?
Sure.


================
Comment at: clang-tools-extra/clangd/IncludeCleaner.cpp:681
+                  static_cast<const NamedDecl &>(Ref.Target.declaration())
+                      .getQualifiedNameAsString();
+              break;
----------------
kadircet wrote:
> `getQualifiedNameAsString` is going to print names that are really ugly at certain times, but unfortunately that's a problem we don't have a great solution to. so no action needed ATM, but we might want to switch between qualified and unqualified name depending on the length at the very least (e.g. symbol is coming from a templated class, which has a nasty nested instantiation).
Ok so for now no action, IIUC.


================
Comment at: clang-tools-extra/clangd/IncludeCleaner.cpp:685
+            MissingIncludeDiagInfo DiagInfo{SymbolName, Range, ProviderHeaders};
+            MissingIncludes.push_back(std::move(DiagInfo));
+          }
----------------
kadircet wrote:
> nit: `MissingIncludes.emplace_back(std::move(SymbolName), Range, Providers);`
This does not compile. Seems like it needs a certain type of constructor to be present in `MissingIncludeDiagInfo`, whereas atm it's just a struct.


================
Comment at: clang-tools-extra/clangd/IncludeCleaner.cpp:718
+
+  if (!Cfg.Diagnostics.Suppress.contains("unused-includes")) {
+    switch (Cfg.Diagnostics.UnusedIncludes) {
----------------
kadircet wrote:
> nit: it might be worth re-writing the following section as:
> ```
> std::vector<Diag> Result = generateUnusedIncludeDiagnostics(AST.tuPath(),
>                 Cfg.Diagnostics.UnusedIncludes == Strict ? computeUnusedIncludes(AST) : Findings.UnusedIncludes, Code);
> llvm::move(generateMissingIncludeDiagnostics(AST, MissingIncludes, Code),
>              std::back_inserter(Result));
> ```
> 
> and move the checks like `if (Cfg.Diagnostics.MissingIncludes == Config::IncludesPolicy::Strict && !Cfg.Diagnostics.Suppress.contains("missing-includes"))` into the specific function, e.g. `generateUnusedIncludeDiagnostics`, as they already do some of the diagnostic filtering logic.
> nit: it might be worth re-writing the following section as

This code seems to ignore the option `Config::IncludesPolicy::None`. It's saying to either return the old-style clangd results in case of `Strict` or `include-cleaner` results otherwise (incl. in case of `None`). Am I missing something?

> and move the checks like ...

Ok, moved into `generateMissingIncludeDiagnostics`.


================
Comment at: clang-tools-extra/clangd/IncludeCleaner.h:41
+struct MissingIncludeDiagInfo {
+  std::string SymbolName;
+  syntax::FileRange SymRefRange;
----------------
kadircet wrote:
> it seems unfortunate that we're duplicating these strings for each diag we want to emit. it might be better to just store a Symbol here (similar to Header) and delay spelling until needed.
I'm not sure I can see your point re `delaying spelling until needed`.  
Each `MissingIncludeDiagInfo` corresponds to one diagnostic.

Whether we resolve the `Symbol` to the `SymbolName` during analysis (i.e., `walkUsed`) or in diagnostic generation (i.e., `generateMissingIncludeDiagnostics`), does not change the fact that the same symbol will be resolved to its name multiple times.

As discussed in the doc, this results in simpler code than the version that maps a single `Symbol` object to multiple  `Range`s and `Provider`s. 
AFAICS, the implementation of `Symbol` to `SymbolName` resolution does not seem to be very expensive either.


================
Comment at: clang-tools-extra/clangd/IncludeCleaner.h:46
+  bool operator==(const MissingIncludeDiagInfo &Other) const {
+    return SymbolName == Other.SymbolName && SymRefRange == Other.SymRefRange &&
+           Providers == Other.Providers;
----------------
kadircet wrote:
> nit: `std::tie(SymbolName, SymRefRange, Providers) == std::tie(Other.SymbolName, ...);`
> 
> I'd also put the SymbolName match to be the last, as it's a string match and might be more costly (if we can bail out early)
Ok, sure. I can see your point with the string comparisons. But I don't fully see the point of using `std::tie` here. What is so great about creating an extra object, even if it only stores references? Is it a purely stylistic suggestion?


================
Comment at: clang-tools-extra/clangd/IncludeCleaner.h:53
+  std::vector<const Inclusion *> UnusedIncludes;
+  llvm::SmallVector<MissingIncludeDiagInfo> MissingIncludes;
+};
----------------
kadircet wrote:
> I don't think we've much to gain by using SmallVector here, instead of std::vector
Sure. I'm not so clear on the preferences yet. AFAIU your point, stdlib is to be preferred unless the llvm alternative is clearly beneficial. Is that the case?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D143496/new/

https://reviews.llvm.org/D143496



More information about the cfe-commits mailing list