[PATCH] D108194: [clangd] IncludeCleaner: Mark used headers
Sam McCall via Phabricator via cfe-commits
cfe-commits at lists.llvm.org
Wed Aug 18 03:23:38 PDT 2021
sammccall added a comment.
I realize many of the things I'll object to came from my own prototype, sorry about that :-\
I think/hope I gave you some forewarning about this!
================
Comment at: clang-tools-extra/clangd/Headers.h:137
+ // Maps including files (from) to included files (to).
+ using AbstractIncludeGraph = llvm::DenseMap<unsigned, SmallVector<unsigned>>;
+ // Decides usage for each file included by EntryPoint based on the set of
----------------
Hey, I recognize this code :-)
I think the key ideas here are that:
- we're using opaque identifiers for the files, nothing is interesting about a file other than its (include) edges and (directly-referenced) color
- the identity of headers is an impl detail of Headers.h rather than being something like a FileID, this allows us to hide messy details of files not having stable identity across preamble->main file
I think the second idea is important, but the first one might be a bit naive. I worry it's going to lead to certain rules being hard to implement, or being bundled into Headers.cpp instead of IncludeCleaner.
For example:
- if a file is not self-contained, how does this affect the algorithm? (There's a FIXME for this, but it's in the wrong file!)
- if a file is a standard library entrypoint?
- if a file is a standard library impl detail?
I think the facts (is a file self-contained) should be part of IncludeStructure, but that we should expose them for IncludeCleaner to deal with, rather than trying to hide them in `markUsed`.
This means giving IncludeStructure a wider interface, which is we need to be careful about. It makes it harder to substitute algorithms by swapping out the UsedFunc, but I think this is only a cute trick and not actually important.
---
Concretely, I think I'd suggest just extending the public API to expose the "file index" concept:
- expose the type `using IncludeStructure::File = unsigned` or so
- add the file id to `Inclusion`
- add File getFile(const FileEntry*)
- add `ArrayRef<File> getIncludedFiles(File)`
- in future, we can add e.g. `const char* isStandardLibraryEntrypoint(File)` or whatever
And implement all of markUsed in IncludeCleaner.
(Not 100% sure if we actually still need the `Used` ivar in Inclusion, come to think of it, maybe we just run this code in the diagnostic cycle)
================
Comment at: clang-tools-extra/clangd/IncludeCleaner.cpp:144
+ std::vector<SourceLocation> Sorted{Locs.begin(), Locs.end()};
+ // Group by FileID.
+ llvm::sort(Sorted);
----------------
nit: move this to be a line comment on the sort() call?
I think it's sufficiently nonobvious that sorting groups by file ID that it becomes nonobvious exactly what code the comment refers to!
================
Comment at: clang-tools-extra/clangd/IncludeCleaner.h:50
+/// Retrieves IDs of all files containing SourceLocations from \p Locs. Those
+/// locations could be within macro expansions and are not resolved to their
+/// spelling locations.
----------------
so when *do* we perform this expansion?
Seems like you've wired this up end-to-end in this patch and we're just going to hit the elog case.
I think it's reasonable to put this expansion into findReferencedFiles fwiw, it's a fairly simple second pass and in practice combining them won't interfere with reasonable tests
================
Comment at: clang-tools-extra/clangd/IncludeCleaner.h:51
+/// locations could be within macro expansions and are not resolved to their
+/// spelling locations.
+llvm::DenseSet<FileID> findReferencedFiles(const ReferencedLocations &Locs,
----------------
comment just says spelling, not spelling/expansion, not sure if this is significant.
Expansion is actually the more obvious, but I do think we need both.
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D108194/new/
https://reviews.llvm.org/D108194
More information about the cfe-commits
mailing list