[PATCH] D143496: [clangd] Add support for missing includes analysis.

Thu Feb 23 08:46:25 PST 2023

VitaNuo added a comment.

Thanks for the comments!

================
Comment at: clang-tools-extra/clangd/IncludeCleaner.cpp:408
+
+bool isFilteredByConfig(const Config &Cfg, llvm::StringRef HeaderSpelling) {
+  for (auto &Filter : Cfg.Diagnostics.Includes.IgnoreHeader) {
----------------
kadircet wrote:
> this shouldn't be spelling, it should be the resolved path of the include.
Ok thanks.

================
Comment at: clang-tools-extra/clangd/IncludeCleaner.cpp:697
+            (*SpelledForExpanded)[(*SpelledForExpanded).size() - 1]);
+        std::string SymbolName;
+        switch (Ref.Target.kind()) {
----------------
kadircet wrote:
> as mentioned elsewhere, i think we should delay this symbol name spelling to diagnostic generation. to make sure core analysis we perform don't do work that might not get re-used (e.g. if we're not going to diagnose missing includes, or in the future when we don't care about spelling of all the symbols)
Ok should be done now.

================
Comment at: clang-tools-extra/clangd/IncludeCleaner.cpp:712
+      getUnused(AST, Used, /*ReferencedPublicHeaders*/ {});
+  return {std::move(UnusedIncludes), std::move(MissingIncludes)};
 }
----------------
kadircet wrote:
> nit: I think logically it makes more sense for us to return set of `Used` includes here, and let the interaction that issues unused include diagnostics to derive this information from the set of used includes, and change the the missingincludes to a `vector< tuple<Symbol, Ref, Providers> >` (not only the unsatisfied ones) would represent the analysis better and make it more usable in the future (i.e. when we want to augment Hover responses, we can't re-use all the logic in here, we really need to implement another call to `walkUsed` because the analysis we get out of this call won't contain information for `satisfied` symbols.
> 
> no need to do it now though, we can perform that kind of refactoring as we're adding the features too (or maybe it'll actually look neater to just have another call in those features rather than try and re-use the logic here)
Thanks. This might very well be the case, but this comment also seems to suggest some premature optimization (in a way). It totally makes sense to re-use what's re-usable, but this sort of refactoring really only makes sense once we have a clear use case (and get there :)

================
Comment at: clang-tools-extra/clangd/IncludeCleaner.cpp:359
+    TransformedInc.Angled = WrittenRef.starts_with("<");
+    auto FE = SM.getFileManager().getFile(Inc.Resolved);
+    if (!FE)
----------------
kadircet wrote:
> VitaNuo wrote:
> > kadircet wrote:
> > > unfortunately `getFile` returns an `llvm::Expected` which requires explicit error handling (or it'll trigger a crash). you can simply `elog` the issue:
> > > ```
> > > if (!FE) {
> > >   elog("IncludeCleaner: Failed to get an entry for resolved path {0}: {1}", Inc.Resolved, FE.takeError());
> > >   continue;
> > > }
> > > ```
> > It returns `llvm::ErrorOr`, if I am not mistaken. There was explicit error handling already (`if (!FE) continue` below), just without the `elog`. Are you trying to say it will crash without the logging? Not sure that's feasible :)
> > Added the logging.
> > It returns llvm::ErrorOr, if I am not mistaken. 
> 
> Ah you're right, I confused it with `getFileRef`. So in theory `ErrorOr` doesn't require explicit checking hence it won't trigger a crash if destroyed while containing an error.
> 
> > Are you trying to say it will crash without the logging? Not sure that's feasible :)
> 
> Right, it isn't the logging that'll prevent the crash but rather a combination of the call to `takeError` and the way `elog` consumes `Error` objects. But it isn't relevant here, as `ErrorOr` doesn't require mandatory handling.
thanks for the explanation.

================
Comment at: clang-tools-extra/clangd/IncludeCleaner.cpp:596
+generateUnusedIncludeDiagnostics(PathRef FileName,
+                                 std::vector<const Inclusion *> UnusedIncludes,
+                                 llvm::StringRef Code) {
----------------
kadircet wrote:
> VitaNuo wrote:
> > kadircet wrote:
> > > no need to copy the vector by taking a `std::vector` here, you can take an `llvm::ArrayRef` instead.
> > Oh this isn't even my code, but as long as it's a small change, sure :)
> well, this is `our` code in the end :D
Sorry, wrong wording. I meant to say that this is not the code that has been touched in this patch. It might sometimes get annoying when comments on the patch dig too deep into code that's not in the diff.

================
Comment at: clang-tools-extra/clangd/IncludeCleaner.cpp:691
+      getUnused(AST, Used, /*ReferencedPublicHeaders*/ {});
+  return {UnusedIncludes, MissingIncludes};
+}
----------------
kadircet wrote:
> kadircet wrote:
> > nit: you'd want `std::move`s here, around both of them
> oops, i forgot to put the surrounding `{}` it should've been `MissingIncludes.emplace_back({...});`
No this seems to be even more wrong.
`stl_vector.h(1303, 2): Candidate template ignored: substitution failure: deduced incomplete pack <(no value)> for template parameter '_Args'`.

This is for the version with braces.

And this is for no braces:

```
/usr/bin/../lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/bits/new_allocator.h:175:23: error: no matching constructor for initialization of 'clang::clangd::MissingIncludeDiagInfo'
        { ::new((void *)__p) _Up(std::forward<_Args>(__args)...); }
```

It seems that it just doesn't cooperate with structs.

================
Comment at: clang-tools-extra/clangd/IncludeCleaner.cpp:718
+
+  if (!Cfg.Diagnostics.Suppress.contains("unused-includes")) {
+    switch (Cfg.Diagnostics.UnusedIncludes) {
----------------
kadircet wrote:
> VitaNuo wrote:
> > kadircet wrote:
> > > nit: it might be worth re-writing the following section as:
> > > ```
> > > std::vector<Diag> Result = generateUnusedIncludeDiagnostics(AST.tuPath(),
> > >                 Cfg.Diagnostics.UnusedIncludes == Strict ? computeUnusedIncludes(AST) : Findings.UnusedIncludes, Code);
> > > llvm::move(generateMissingIncludeDiagnostics(AST, MissingIncludes, Code),
> > >              std::back_inserter(Result));
> > > ```
> > > 
> > > and move the checks like `if (Cfg.Diagnostics.MissingIncludes == Config::IncludesPolicy::Strict && !Cfg.Diagnostics.Suppress.contains("missing-includes"))` into the specific function, e.g. `generateUnusedIncludeDiagnostics`, as they already do some of the diagnostic filtering logic.
> > > nit: it might be worth re-writing the following section as
> > 
> > This code seems to ignore the option `Config::IncludesPolicy::None`. It's saying to either return the old-style clangd results in case of `Strict` or `include-cleaner` results otherwise (incl. in case of `None`). Am I missing something?
> > 
> > > and move the checks like ...
> > 
> > Ok, moved into `generateMissingIncludeDiagnostics`.
> > This code seems to ignore the option Config::IncludesPolicy::None. It's saying to either return the old-style clangd results in case of Strict or include-cleaner results otherwise (incl. in case of None). Am I missing something?
> 
> Well that was to be addressed by second part of the comment `And move the checks like if (Cfg.Diagnostics.MissingIncludes == Config::IncludesPolicy::Strict && !Cfg.Diagnostics.Suppress.contains("missing-includes")) into the specific function, e.g. generateUnusedIncludeDiagnostics, as they already do some of the diagnostic filtering logic.`
> I was talking about both missing and unused include diagnostics generation (hence `e.g.`), similar to the early exit in `generateMissingIncludeDiagnostics`, we should have one that returns an empty set of diagnostics, when it's suppressed or not enabled.
Ok, I've refactored more of the config checking logic inside `generate..` functions.

================
Comment at: clang-tools-extra/clangd/IncludeCleaner.h:41
+struct MissingIncludeDiagInfo {
+  std::string SymbolName;
+  syntax::FileRange SymRefRange;
----------------
kadircet wrote:
> VitaNuo wrote:
> > kadircet wrote:
> > > it seems unfortunate that we're duplicating these strings for each diag we want to emit. it might be better to just store a Symbol here (similar to Header) and delay spelling until needed.
> > I'm not sure I can see your point re `delaying spelling until needed`.  
> > Each `MissingIncludeDiagInfo` corresponds to one diagnostic.
> > 
> > Whether we resolve the `Symbol` to the `SymbolName` during analysis (i.e., `walkUsed`) or in diagnostic generation (i.e., `generateMissingIncludeDiagnostics`), does not change the fact that the same symbol will be resolved to its name multiple times.
> > 
> > As discussed in the doc, this results in simpler code than the version that maps a single `Symbol` object to multiple  `Range`s and `Provider`s. 
> > AFAICS, the implementation of `Symbol` to `SymbolName` resolution does not seem to be very expensive either.
> > Whether we resolve the Symbol to the SymbolName during analysis (i.e., walkUsed) or in diagnostic generation (i.e., generateMissingIncludeDiagnostics), does not change the fact that the same symbol will be resolved to its name multiple times.
> 
> we might not generate those diagnostics always, e.g. missing-includes is disabled, but unused-includes is on, or maybe we're going to use these results for something else like Hover responses.
> 
> > As discussed in the doc, this results in simpler code than the version that maps a single Symbol object to multiple Ranges and Providers.
> 
> I wasn't trying to suggest having a map here, I was suggesting just storing a `Symbol S` instead of `string Name`.
> 
> > AFAICS, the implementation of Symbol to SymbolName resolution does not seem to be very expensive either.
> 
> Well, generating strings are usually expensive (not asymptotically but in practice, as they tend to require lots of memory allocations).
Ok, agreed. Storing symbols now. Having only unused include analysis on is a convincing use case.

================
Comment at: clang-tools-extra/clangd/IncludeCleaner.h:53
+  std::vector<const Inclusion *> UnusedIncludes;
+  llvm::SmallVector<MissingIncludeDiagInfo> MissingIncludes;
+};
----------------
kadircet wrote:
> VitaNuo wrote:
> > kadircet wrote:
> > > I don't think we've much to gain by using SmallVector here, instead of std::vector
> > Sure. I'm not so clear on the preferences yet. AFAIU your point, stdlib is to be preferred unless the llvm alternative is clearly beneficial. Is that the case?
> > stdlib is to be preferred unless the llvm alternative is clearly beneficial
> 
> Right. `llvm::SmallVector` and `std::vector` have different use cases, we usually go for the former if we're sure that number of elements we want to store are going to be handful (literally less than 10) most of the time and the size of the objects themselves is not too big.
> 
> As `SmallVector` chooses to store objects internally (until it grows too much), rather than allocating a bunch of memory elsewhere and just storing a pointer (as std::vector does). This has benefits when your vector isn't going to grow beyond smallvector's limits (you don't need to pay for memory allocations, which are expensive), but has other costs (e.g. moving a std::vector is trivial as it's just assignment of a pointer and size, but moving a smallvector might not be as trivial (it at least needs to move all the elements) also the initial memory cost for smallvector is higher than std::vector (as it takes up the same space independent of it's fullness).
> 
> So in this example, we're unlikely to have a small number of `MissingIncludes`, `MissingIncludeDiagInfo` is a big enough struct (vector and filerange add up to more than 30 bytes). Hence std::vector feels like the better choice.
thanks!

================
Comment at: clang-tools-extra/clangd/unittests/IncludeCleanerTests.cpp:427
+
+  IncludeCleanerFindings Findings = computeIncludeCleanerFindings(AST);
+  const SourceManager &SM = AST.getSourceManager();
----------------
kadircet wrote:
> i don't think there's much value in testing out analysis here, we should rather focus on diagnostics generation, which isn't part of `computeIncludeCleanerFindings`. existing tests were focused on analysis, because legacy implementation for include-cleaner was actually performing these analysis itself.
> 
> so I'd rather suggest having trivial test cases (from include-cleaner analysis perspective, no need for complicated directory/file layouts) and rather test things out through calls to `generateMissingIncludeDiagnostics` to make sure diagnostics has the right ranges, text and fix contents.
> 
> right now we're not testing:
> - header spelling
> - symbol name generation
> - ranges these diagnostics correspond to
> 
> and these are the main functionality we're adding on top of include-cleaner analysis.
> 
> you can take a look at the tests in llvm/llvm-project/clang-tools-extra/clangd/unittests/DiagnosticsTests.cpp to see how we're testing out diagnostics ranges, messages, fixes and what kind of helpers/matchers we have for them.
Thank you, this makes sense. However, I believe we need to use `issueIncludeCleanerDiagnostics`rather than `generateMissingIncludeDiagnostics`, since the latter is private.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D143496/new/

https://reviews.llvm.org/D143496