[PATCH] D143496: [clangd] Add support for missing includes analysis.

Thu Feb 23 00:59:25 PST 2023

kadircet added inline comments.

================
Comment at: clang-tools-extra/clangd/IncludeCleaner.cpp:291
   }
   for (auto &Filter : Cfg.Diagnostics.Includes.IgnoreHeader) {
     // Convert the path to Unix slashes and try to match against the filter.
----------------
can you also change the logic here to use `isFilteredByConfig` (we need the `native` call inside `isFilteredByConfig` as well to make sure it works on windows)

================
Comment at: clang-tools-extra/clangd/IncludeCleaner.cpp:373
+                        include_cleaner::Header Provider) {
+  std::string SpelledHeader;
+  if (Provider.kind() == include_cleaner::Header::Physical) {
----------------
nit: you can directly define `SpelledHeader` at line 377

================
Comment at: clang-tools-extra/clangd/IncludeCleaner.cpp:408
+
+bool isFilteredByConfig(const Config &Cfg, llvm::StringRef HeaderSpelling) {
+  for (auto &Filter : Cfg.Diagnostics.Includes.IgnoreHeader) {
----------------
this shouldn't be spelling, it should be the resolved path of the include.

================
Comment at: clang-tools-extra/clangd/IncludeCleaner.cpp:433
+    if (isFilteredByConfig(Cfg, Spelling)) {
+      dlog("{0} header is filtered out by the configuration", Spelling);
+      continue;
----------------
can you prefix this with `IncludeCleaner: ` and rather say `not diagnosing missing include {0}, filtered by config` to add a bit more context about what specific interaction the log is coming from

================
Comment at: clang-tools-extra/clangd/IncludeCleaner.cpp:461
+                         SymbolWithMissingInclude.SymRefRange.endOffset())};
+    D.Fixes.emplace_back();
+    D.Fixes.back().Message = "#include " + Spelling;
----------------
nit:
```
auto &F = D.Fixes.emplace_back();
F.Message = ...;
F.Edits.push_back(replacementToEdit(...));
```

================
Comment at: clang-tools-extra/clangd/IncludeCleaner.cpp:465
+    D.Fixes.back().Edits.emplace_back(std::move(Edit));
+    D.InsideMainFile = true;
+  }
----------------
nit: it'd put this next to `D.File` above

================
Comment at: clang-tools-extra/clangd/IncludeCleaner.cpp:695
+        auto Range = syntax::Token::range(
+            AST.getSourceManager(), (*SpelledForExpanded)[0],
+            (*SpelledForExpanded)[(*SpelledForExpanded).size() - 1]);
----------------
`auto Range = syntax::Token::range(SM, SpelledForExpanded->front(), SpelledForExpanded->back());`

================
Comment at: clang-tools-extra/clangd/IncludeCleaner.cpp:697
+            (*SpelledForExpanded)[(*SpelledForExpanded).size() - 1]);
+        std::string SymbolName;
+        switch (Ref.Target.kind()) {
----------------
as mentioned elsewhere, i think we should delay this symbol name spelling to diagnostic generation. to make sure core analysis we perform don't do work that might not get re-used (e.g. if we're not going to diagnose missing includes, or in the future when we don't care about spelling of all the symbols)

================
Comment at: clang-tools-extra/clangd/IncludeCleaner.cpp:712
+      getUnused(AST, Used, /*ReferencedPublicHeaders*/ {});
+  return {std::move(UnusedIncludes), std::move(MissingIncludes)};
 }
----------------
nit: I think logically it makes more sense for us to return set of `Used` includes here, and let the interaction that issues unused include diagnostics to derive this information from the set of used includes, and change the the missingincludes to a `vector< tuple<Symbol, Ref, Providers> >` (not only the unsatisfied ones) would represent the analysis better and make it more usable in the future (i.e. when we want to augment Hover responses, we can't re-use all the logic in here, we really need to implement another call to `walkUsed` because the analysis we get out of this call won't contain information for `satisfied` symbols.

no need to do it now though, we can perform that kind of refactoring as we're adding the features too (or maybe it'll actually look neater to just have another call in those features rather than try and re-use the logic here)

================
Comment at: clang-tools-extra/clangd/IncludeCleaner.cpp:359
+    TransformedInc.Angled = WrittenRef.starts_with("<");
+    auto FE = SM.getFileManager().getFile(Inc.Resolved);
+    if (!FE)
----------------
VitaNuo wrote:
> kadircet wrote:
> > unfortunately `getFile` returns an `llvm::Expected` which requires explicit error handling (or it'll trigger a crash). you can simply `elog` the issue:
> > ```
> > if (!FE) {
> >   elog("IncludeCleaner: Failed to get an entry for resolved path {0}: {1}", Inc.Resolved, FE.takeError());
> >   continue;
> > }
> > ```
> It returns `llvm::ErrorOr`, if I am not mistaken. There was explicit error handling already (`if (!FE) continue` below), just without the `elog`. Are you trying to say it will crash without the logging? Not sure that's feasible :)
> Added the logging.
> It returns llvm::ErrorOr, if I am not mistaken. 

Ah you're right, I confused it with `getFileRef`. So in theory `ErrorOr` doesn't require explicit checking hence it won't trigger a crash if destroyed while containing an error.

> Are you trying to say it will crash without the logging? Not sure that's feasible :)

Right, it isn't the logging that'll prevent the crash but rather a combination of the call to `takeError` and the way `elog` consumes `Error` objects. But it isn't relevant here, as `ErrorOr` doesn't require mandatory handling.

================
Comment at: clang-tools-extra/clangd/IncludeCleaner.cpp:596
+generateUnusedIncludeDiagnostics(PathRef FileName,
+                                 std::vector<const Inclusion *> UnusedIncludes,
+                                 llvm::StringRef Code) {
----------------
VitaNuo wrote:
> kadircet wrote:
> > no need to copy the vector by taking a `std::vector` here, you can take an `llvm::ArrayRef` instead.
> Oh this isn't even my code, but as long as it's a small change, sure :)
well, this is `our` code in the end :D

================
Comment at: clang-tools-extra/clangd/IncludeCleaner.cpp:691
+      getUnused(AST, Used, /*ReferencedPublicHeaders*/ {});
+  return {UnusedIncludes, MissingIncludes};
+}
----------------
kadircet wrote:
> nit: you'd want `std::move`s here, around both of them
oops, i forgot to put the surrounding `{}` it should've been `MissingIncludes.emplace_back({...});`

================
Comment at: clang-tools-extra/clangd/IncludeCleaner.cpp:718
+
+  if (!Cfg.Diagnostics.Suppress.contains("unused-includes")) {
+    switch (Cfg.Diagnostics.UnusedIncludes) {
----------------
VitaNuo wrote:
> kadircet wrote:
> > nit: it might be worth re-writing the following section as:
> > ```
> > std::vector<Diag> Result = generateUnusedIncludeDiagnostics(AST.tuPath(),
> >                 Cfg.Diagnostics.UnusedIncludes == Strict ? computeUnusedIncludes(AST) : Findings.UnusedIncludes, Code);
> > llvm::move(generateMissingIncludeDiagnostics(AST, MissingIncludes, Code),
> >              std::back_inserter(Result));
> > ```
> > 
> > and move the checks like `if (Cfg.Diagnostics.MissingIncludes == Config::IncludesPolicy::Strict && !Cfg.Diagnostics.Suppress.contains("missing-includes"))` into the specific function, e.g. `generateUnusedIncludeDiagnostics`, as they already do some of the diagnostic filtering logic.
> > nit: it might be worth re-writing the following section as
> 
> This code seems to ignore the option `Config::IncludesPolicy::None`. It's saying to either return the old-style clangd results in case of `Strict` or `include-cleaner` results otherwise (incl. in case of `None`). Am I missing something?
> 
> > and move the checks like ...
> 
> Ok, moved into `generateMissingIncludeDiagnostics`.
> This code seems to ignore the option Config::IncludesPolicy::None. It's saying to either return the old-style clangd results in case of Strict or include-cleaner results otherwise (incl. in case of None). Am I missing something?

Well that was to be addressed by second part of the comment `And move the checks like if (Cfg.Diagnostics.MissingIncludes == Config::IncludesPolicy::Strict && !Cfg.Diagnostics.Suppress.contains("missing-includes")) into the specific function, e.g. generateUnusedIncludeDiagnostics, as they already do some of the diagnostic filtering logic.`
I was talking about both missing and unused include diagnostics generation (hence `e.g.`), similar to the early exit in `generateMissingIncludeDiagnostics`, we should have one that returns an empty set of diagnostics, when it's suppressed or not enabled.

================
Comment at: clang-tools-extra/clangd/IncludeCleaner.cpp:535
+  Diag D;
+  D.Message = llvm::formatv("header {0} is missing", Spelling);
+  D.Name = "missing-includes";
----------------
VitaNuo wrote:
> kadircet wrote:
> > VitaNuo wrote:
> > > kadircet wrote:
> > > > i think the symbol name should also be part of the diagnostic. as editors can show these diagnostics without context (e.g. you've got a big file open, there's a diagnostic panel displaying messages/fixes for all of the file. you should be able to reason about the diagnostics and fixes without jumping all over the file). so maybe something like:
> > > > 
> > > > `formatv("{0} providing '{1}' is not directly included", Header, Symbol)`
> > > Sure. The new design does this, as well as skipping the header name.
> > i feel like there's actually value in keeping the header name around, i.e. the user will have some idea about the action, without triggering an extra interaction. this helps especially in cases where the finding is wrong, they'll discover this sooner, hence we'll be more likely to receive bug reports. but don't really have a strong preference, so feel free to keep it that way.
> Hm I'd expect we'll actually have a higher probability to receive a bug report if the user clicks on the "Quick fix" and gets a wrong header included, because that's annoying :)
> Having a full message on hover, seeing that it's wrong and just ignoring it might not be annoying enough to file a bug ;-)
> 
> But this is a pure speculation ofc.
> I think the point discussed on the design document makes sense: without mentioning the header name it will be a bit easier to extend this to suggesting multiple fixes. So if that's the general direction, I'd prefer to keep it the current way.
> 
> I think the point discussed on the design document makes sense: without mentioning the header name it will be a bit easier to extend this to suggesting multiple fixes. So if that's the general direction, I'd prefer to keep it the current way.

Well, it's unclear when we'll get there, and moreover my suggestion was actually to still mention the header name when there's only a single provider (which will be the case most of the time).

> But this is a pure speculation ofc.

But yeah, my point of view is also mostly speculation. So feel free to keep it this way, I'll just be grumpy :P

================
Comment at: clang-tools-extra/clangd/IncludeCleaner.h:41
+struct MissingIncludeDiagInfo {
+  std::string SymbolName;
+  syntax::FileRange SymRefRange;
----------------
VitaNuo wrote:
> kadircet wrote:
> > it seems unfortunate that we're duplicating these strings for each diag we want to emit. it might be better to just store a Symbol here (similar to Header) and delay spelling until needed.
> I'm not sure I can see your point re `delaying spelling until needed`.  
> Each `MissingIncludeDiagInfo` corresponds to one diagnostic.
> 
> Whether we resolve the `Symbol` to the `SymbolName` during analysis (i.e., `walkUsed`) or in diagnostic generation (i.e., `generateMissingIncludeDiagnostics`), does not change the fact that the same symbol will be resolved to its name multiple times.
> 
> As discussed in the doc, this results in simpler code than the version that maps a single `Symbol` object to multiple  `Range`s and `Provider`s. 
> AFAICS, the implementation of `Symbol` to `SymbolName` resolution does not seem to be very expensive either.
> Whether we resolve the Symbol to the SymbolName during analysis (i.e., walkUsed) or in diagnostic generation (i.e., generateMissingIncludeDiagnostics), does not change the fact that the same symbol will be resolved to its name multiple times.

we might not generate those diagnostics always, e.g. missing-includes is disabled, but unused-includes is on, or maybe we're going to use these results for something else like Hover responses.

> As discussed in the doc, this results in simpler code than the version that maps a single Symbol object to multiple Ranges and Providers.

I wasn't trying to suggest having a map here, I was suggesting just storing a `Symbol S` instead of `string Name`.

> AFAICS, the implementation of Symbol to SymbolName resolution does not seem to be very expensive either.

Well, generating strings are usually expensive (not asymptotically but in practice, as they tend to require lots of memory allocations).

================
Comment at: clang-tools-extra/clangd/IncludeCleaner.h:46
+  bool operator==(const MissingIncludeDiagInfo &Other) const {
+    return SymbolName == Other.SymbolName && SymRefRange == Other.SymRefRange &&
+           Providers == Other.Providers;
----------------
VitaNuo wrote:
> kadircet wrote:
> > nit: `std::tie(SymbolName, SymRefRange, Providers) == std::tie(Other.SymbolName, ...);`
> > 
> > I'd also put the SymbolName match to be the last, as it's a string match and might be more costly (if we can bail out early)
> Ok, sure. I can see your point with the string comparisons. But I don't fully see the point of using `std::tie` here. What is so great about creating an extra object, even if it only stores references? Is it a purely stylistic suggestion?
right, the comments with `nit:` prefix are usually things that won't matter much in practice, but reflects my (well, in the general case the reviewer's) or codebase's preference.

================
Comment at: clang-tools-extra/clangd/IncludeCleaner.h:53
+  std::vector<const Inclusion *> UnusedIncludes;
+  llvm::SmallVector<MissingIncludeDiagInfo> MissingIncludes;
+};
----------------
VitaNuo wrote:
> kadircet wrote:
> > I don't think we've much to gain by using SmallVector here, instead of std::vector
> Sure. I'm not so clear on the preferences yet. AFAIU your point, stdlib is to be preferred unless the llvm alternative is clearly beneficial. Is that the case?
> stdlib is to be preferred unless the llvm alternative is clearly beneficial

Right. `llvm::SmallVector` and `std::vector` have different use cases, we usually go for the former if we're sure that number of elements we want to store are going to be handful (literally less than 10) most of the time and the size of the objects themselves is not too big.

As `SmallVector` chooses to store objects internally (until it grows too much), rather than allocating a bunch of memory elsewhere and just storing a pointer (as std::vector does). This has benefits when your vector isn't going to grow beyond smallvector's limits (you don't need to pay for memory allocations, which are expensive), but has other costs (e.g. moving a std::vector is trivial as it's just assignment of a pointer and size, but moving a smallvector might not be as trivial (it at least needs to move all the elements) also the initial memory cost for smallvector is higher than std::vector (as it takes up the same space independent of it's fullness).

So in this example, we're unlikely to have a small number of `MissingIncludes`, `MissingIncludeDiagInfo` is a big enough struct (vector and filerange add up to more than 30 bytes). Hence std::vector feels like the better choice.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D143496/new/

https://reviews.llvm.org/D143496