[PATCH] D143496: [clangd] Add support for missing includes analysis.

Tue Feb 14 08:52:54 PST 2023

VitaNuo added a comment.

Thank you for the comments! I've addressed (almost) all of them. In some places, I'm not so happy about how the code has become very nested, but I don't have ideas on how to make it better atm.

================
Comment at: clang-tools-extra/clangd/Config.h:91

+  enum class MissingIncludesPolicy {
+    /// Diagnose missing includes.
----------------
kadircet wrote:
> rather than duplicating, what about renaming `UnusedIncludesPolicy` to `IncludesPolicy` and use it for both `UnusedIncludes` and `MissingIncludes` options below?
Sure.

================
Comment at: clang-tools-extra/clangd/IncludeCleaner.cpp:516
+
+  std::vector<include_cleaner::SymbolReference> Macros =
+      collectMacroReferences(AST);
----------------
kadircet wrote:
> 
Why would you use `auto` here? The return type is not obvious from the function call.

The style guide says: "types that you and your reviewer experience as unnecessary clutter will very often provide useful information to others. For example, you can assume that the return type of `make_unique<Foo>()` is obvious, but the return type of `MyWidgetFactory()` probably isn't." 
(http://go/cstyle#Type_deduction)

================
Comment at: clang-tools-extra/clangd/IncludeCleaner.cpp:547
+convertIncludes(const SourceManager &SM,
+                std::vector<Inclusion> MainFileIncludes) {
+  include_cleaner::Includes Includes;
----------------
kadircet wrote:
> you can just pass an `llvm::ArrayRef<Inclusion>` to prevent a copy
By preventing a copy, do you mean that the construction of `llvm::ArrayRef<Inclusion>` will only copy a pointer to the data rather than the whole vector? AFAIU `const std::vector<Inclusion>&` should be even better then, no copies involved. CMIIW.

================
Comment at: clang-tools-extra/clangd/IncludeCleaner.cpp:550
+  for (const Inclusion &Inc : MainFileIncludes) {
+    llvm::ErrorOr<const FileEntry *> ResolvedOrError =
+        SM.getFileManager().getFile(Inc.Resolved);
----------------
kadircet wrote:
> you can re-write this as:
> ```
> include_cleaner::Include TransformedInc;
> TransformedInc.Spelled = Inc.Written.trim("\"<>");
> TransformedInc.HashLocation = SM.getComposedLoc(SM.getMainFileID(), Inc.HashOffset); // we should actually convert this from a SourceLocation to offset in include_cleaner::Include as well
> TransformedInc.Line = Inc.HashLine;
> TransformedInc.Angled = WrittenRef.starts_with("<");
> if(auto FE = SM.getFileManager().getFile(Inc.Resolved))
>   TransformedInc.Resolved = *FE;
> Includes.add(std::move(TransformedInc));
> ```
Thanks.
It seems like `std::string` does not have a `trim` or a `starts_with` method, so AFAIU I still need to call the `llvm::StringRef` constructor.

================
Comment at: clang-tools-extra/clangd/IncludeCleaner.cpp:570
+computeMissingIncludes(ParsedAST &AST) {
+  std::vector<include_cleaner::SymbolReference> Macros =
+      collectMacroReferences(AST);
----------------
kadircet wrote:
> nit: `auto Macros = ..`
Same as above. I am not sure about the benefit of `auto` here..

================
Comment at: clang-tools-extra/clangd/IncludeCleaner.cpp:577
+      convertIncludes(AST.getSourceManager(), MainFileIncludes);
+  std::string FileName =
+      AST.getSourceManager()
----------------
kadircet wrote:
> you can use `AST.tuPath()`
Thanks, but this actually seems unused. It was some debugging artefact too.. :(

================
Comment at: clang-tools-extra/clangd/IncludeCleaner.cpp:582
+          .str();
+  if (FileName.find("foo") != std::string::npos) {
+    vlog("Include cleaner includes: {0}", IncludeCleanerIncludes.all().size());
----------------
kadircet wrote:
> looks like debugging artifact?
sorry.

================
Comment at: clang-tools-extra/clangd/IncludeCleaner.cpp:593
+          llvm::ArrayRef<include_cleaner::Header> Providers) {
+        bool Satisfied = false;
+        for (const include_cleaner::Header &H : Providers) {
----------------
kadircet wrote:
> nit: you can check whether `Ref.RT` is `Explicit` at the top, and bail out early.
This seems obsolete after merging the missing and unused includes analyses together. There is no obvious place to insert the check.

================
Comment at: clang-tools-extra/clangd/IncludeCleaner.cpp:597
+              H.physical() == MainFile) {
+            Satisfied = true;
+          }
----------------
kadircet wrote:
> nit: you can just `break` after satisfying the include (same below)
Not if the unused and missing include analyses are merged together.

================
Comment at: clang-tools-extra/clangd/IncludeCleaner.cpp:605
+            Ref.RT == include_cleaner::RefType::Explicit) {
+          std::string SpelledHeader = include_cleaner::spellHeader(
+              Providers.front(), AST.getPreprocessor().getHeaderSearchInfo(),
----------------
kadircet wrote:
> clangd has some header spelling customizations. so we should actually be doing this through `URI::includeSpelling(URI::create(getCanonicalPath(Providers.front().physical(), SM)))` first, and fall back to `spellHeader` if it fails for physical header providers to make sure we're consistent.
> 
> this is used by $EMPLOYER$'s integration to always spell includes relative to depot root, rather than certain include search paths.
Thank you. There seem to be a couple of indirection levels involved, so I hope I got it (somewhat) right with all the checking.

================
Comment at: clang-tools-extra/clangd/IncludeCleaner.cpp:609
+          std::pair<FileID, unsigned int> FileIDAndOffset =
+              AST.getSourceManager().getDecomposedLoc(Ref.RefLocation);
+          Missing.push_back({SpelledHeader, FileIDAndOffset.second});
----------------
kadircet wrote:
> `RefLocation` is not necessarily spelled token location, e.g. it might be pointing at a macro expansion.
> 
> you can make use of `TokenBuffer` inside `ParsedAST` to go from this expanded location to a token spelled inside the main file, e.g. `Tokens.spelledForExpanded(Tokens.expandedTokens(Ref.RefLocation));` and use the full spelled token range afterwards for diagnostic location.
Ok, thank you. Please have a look at the current version, hopefully it makes sense.

================
Comment at: clang-tools-extra/clangd/IncludeCleaner.cpp:661
+    D.Fixes.back().Message = "#include " + Missing.first;
+    D.Fixes.back().Edits.emplace_back();
+    D.Fixes.back().Edits.back().newText = "#include " + Missing.first + "\n";
----------------
kadircet wrote:
> we've got some tooling library to generate these edits, see https://github.com/llvm/llvm-project/blob/main/clang/include/clang/Tooling/Inclusions/HeaderIncludes.h#L76. that way they'll be placed in "correct" position among the existing includes. you can then use [replacementToEdit](https://github.com/llvm/llvm-project/blob/main/clang-tools-extra/clangd/SourceCode.h#L148) to convert into a clangd edit.
Thanks for the tips!

================
Comment at: clang-tools-extra/clangd/ParsedAST.cpp:695
+    auto MissingHeadersDiags =
+        issueMissingIncludesDiagnostics(Result, Inputs.Contents);
     Result.Diags->insert(Result.Diags->end(),
----------------
kadircet wrote:
> instead of introducing a new function here and duplicating logic what about something like:
> ```
> auto IncludeDiags = issueIncludeDiagnostics(...);
> ```
> 
> in IncludeCleaner.cpp:
> ```
> struct IncludeCleanerFindings {
>    std::vector<Inclusion*> Unused;
>    llvm::StringMap<std::vector<Range>> MissingIncludes; // Map from missing header spelling to ranges of references.
> };
> 
> IncludeCleanerFindings computeIncludeCleanerFindings(AST) {
>    auto Incs = convertIncludes();
>    walkUsed(...., {
>         // Update Missing includes, mark relevant main file includes as used.
>    });
>    // Create the unused includes set
>    return Findings;
> }
> 
> 
> std::vector<Diag> issueIncludeDiagnostics(...) {
>   std::vector<Inclusion*> UnusedIncludes;
>   llvm::StringMap<std::vector</*Location of reference*/clangd::Range>> MissingIncludes;
>   if (UnusedIncludePolicy is Strict) {
>      UnusedIncludes = computeUnusedIncludes(...);
>   } else if(UnusedIncludesPolicy is Experiment || MissingIncludesPolicy is Strict) {
>      auto Findings = computeIncludeCleanerFindings(AST);
>      if(UnusedIncludesPolicy is Experiment) {
>            UnusedIncludes = std::move(Findings.Unused);
>      }
>      if(MissingIncludesPolicy is Strict) {
>             MissingIncludes = std::move(Findings.Missing);
>      }
>   }
>   for(auto &Inc: UnusedIncludes) {
>        ... // create unused include diag.
>   }
>   for(auto &Missing: MissingIncludes) {
>        // create missing include diag.
>   }
>   return Diags;
> }
> ```
All right, it's probably a good idea to turn these two analyses into one.
The resulting code is somewhat tangled, though, and I am not sure how to make it simpler. Let's hope you have more ideas or are happy with the current state.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D143496/new/

https://reviews.llvm.org/D143496