[PATCH] D108194: [clangd] IncludeCleaner: Mark used headers

Wed Aug 18 03:23:38 PDT 2021

sammccall added a comment.

I realize many of the things I'll object to came from my own prototype, sorry about that :-\
I think/hope I gave you some forewarning about this!

================
Comment at: clang-tools-extra/clangd/Headers.h:137
+  // Maps including files (from) to included files (to).
+  using AbstractIncludeGraph = llvm::DenseMap<unsigned, SmallVector<unsigned>>;
+  // Decides usage for each file included by EntryPoint based on the set of
----------------
Hey, I recognize this code :-)

I think the key ideas here are that:
 - we're using opaque identifiers for the files, nothing is interesting about a file other than its (include) edges and (directly-referenced) color
 - the identity of headers is an impl detail of Headers.h rather than being something like a FileID, this allows us to hide messy details of files not having stable identity across preamble->main file

I think the second idea is important, but the first one might be a bit naive. I worry it's going to lead to certain rules being hard to implement, or being bundled into Headers.cpp instead of IncludeCleaner.

For example:
 - if a file is not self-contained, how does this affect the algorithm? (There's a FIXME for this, but it's in the wrong file!)
 - if a file is a standard library entrypoint?
 - if a file is a standard library impl detail?

I think the facts (is a file self-contained) should be part of IncludeStructure, but that we should expose them for IncludeCleaner to deal with, rather than trying to hide them in `markUsed`.
This means giving IncludeStructure a wider interface, which is we need to be careful about. It makes it harder to substitute algorithms by swapping out the UsedFunc, but I think this is only a cute trick and not actually important.

---

Concretely, I think I'd suggest just extending the public API to expose the "file index" concept:
 - expose the type `using IncludeStructure::File = unsigned` or so
 - add the file id to `Inclusion`
 - add File getFile(const FileEntry*)
 - add `ArrayRef<File> getIncludedFiles(File)`
 - in future, we can add e.g. `const char* isStandardLibraryEntrypoint(File)` or whatever

And implement all of markUsed in IncludeCleaner.
(Not 100% sure if we actually still need the `Used` ivar in Inclusion, come to think of it, maybe we just run this code in the diagnostic cycle)

================
Comment at: clang-tools-extra/clangd/IncludeCleaner.cpp:144
+  std::vector<SourceLocation> Sorted{Locs.begin(), Locs.end()};
+  // Group by FileID.
+  llvm::sort(Sorted);
----------------
nit: move this to be a line comment on the sort() call?

I think it's sufficiently nonobvious that sorting groups by file ID that it becomes nonobvious exactly what code the comment refers to!

================
Comment at: clang-tools-extra/clangd/IncludeCleaner.h:50
+/// Retrieves IDs of all files containing SourceLocations from \p Locs. Those
+/// locations could be within macro expansions and are not resolved to their
+/// spelling locations.
----------------
so when *do* we perform this expansion?

Seems like you've wired this up end-to-end in this patch and we're just going to hit the elog case.

I think it's reasonable to put this expansion into findReferencedFiles fwiw, it's a fairly simple second pass and in practice combining them won't interfere with reasonable tests

================
Comment at: clang-tools-extra/clangd/IncludeCleaner.h:51
+/// locations could be within macro expansions and are not resolved to their
+/// spelling locations.
+llvm::DenseSet<FileID> findReferencedFiles(const ReferencedLocations &Locs,
----------------
comment just says spelling, not spelling/expansion, not sure if this is significant.

Expansion is actually the more obvious, but I do think we need both.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D108194/new/

https://reviews.llvm.org/D108194