[PATCH] D136723: [include-cleaner] Record main-file macro occurences and includes

Kadir Cetinkaya via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Thu Oct 27 02:14:57 PDT 2022


kadircet added inline comments.


================
Comment at: clang-tools-extra/include-cleaner/include/clang-include-cleaner/Record.h:54
+struct RecordedPP {
+  // The callback (when installed into clang) tracks macros/includes in this.
+  std::unique_ptr<PPCallbacks> record(const Preprocessor &PP);
----------------
s/this/the main file/


================
Comment at: clang-tools-extra/include-cleaner/include/clang-include-cleaner/Record.h:75
+    //  - for a logical file like <vector>, we check Spelled
+    llvm::SmallVector<const Include *> match(Header H) const;
+
----------------
hokein wrote:
> sammccall wrote:
> > in the prototype I reimplemented this function in clangd, but I expect we can just reuse the RecordedIncludes class instead, copying from clangd's includes list is cheap.
> > 
> > (That's one argument for putting this in a different header, which I can do already if you prefer)
> that's an interesting idea, but we need to do it carefully because of `FileEntry*`, the `Include` structure has a filed of `FileEntry*`, it is not feasible for clangd to propagate it (clangd's `Inclusion` doesn't have it, instead it uses a `HeaderID` which is based on the `fs::UniqueID` to identify a physical file).
> 
> But I think this is not something we should worry about at the moment, and the current interfaces look quite good.
right, let's keep it to the minimum now. we can extend/move-around later on


================
Comment at: clang-tools-extra/include-cleaner/include/clang-include-cleaner/Record.h:81
+    llvm::StringMap<llvm::SmallVector<unsigned>> BySpelling;
+    llvm::DenseMap<const FileEntry *, llvm::SmallVector<unsigned>> ByFile;
+  } Includes;
----------------
it feels like having a set of includes corresponding to the same fileentry, rather than a single one, feels subtle enough to deserve a comment.

the only case i can think of is, two logically different files resolving to same physical file (i.e. symlinks). these will actually have completely different spellings, and probably it'll make things hard when deciding which include to "keep" or "insert". are there other cases where we can have multiple includes corresponding to the same fileentry?


================
Comment at: clang-tools-extra/include-cleaner/include/clang-include-cleaner/Types.h:80
+// Indicates that a piece of code refers to a symbol.
+struct SymbolReference {
+  // The symbol referred to.
----------------
hokein wrote:
> thinking more about it (no action required) -- this structure seems a natural good fit to be used in `UsedSymbolCB` in `Analysis.h` rather than using ` SourceLocation RefLoc, Symbol Target` separately.
> 
agreed. i'll adjust that with the refkind patch


================
Comment at: clang-tools-extra/include-cleaner/include/clang-include-cleaner/Types.h:117
+  llvm::StringRef Spelled;             // e.g. vector
+  const FileEntry *Resolved = nullptr; // e.g. /path/to/c++/v1/vector
+  SourceLocation HashLocation;         // of hash in #include <vector>
----------------
i think it's worth leaving a comment here mentioning that this can be null (couldn't resolve, or maybe we shouldn't be caring, e.g. logical stl include).

even better, maybe we should actually have a `Header` here? that way this would be conceptual equivalent of SymbolReference, but for includes?


================
Comment at: clang-tools-extra/include-cleaner/include/clang-include-cleaner/Types.h:41
+struct Macro {
+  IdentifierInfo *Name;
+  // The location of the Name where the macro is defined.
----------------
what about just a stringref to name?


================
Comment at: clang-tools-extra/include-cleaner/include/clang-include-cleaner/Types.h:42
+  IdentifierInfo *Name;
+  // The location of the Name where the macro is defined.
+  SourceLocation Definition;
----------------
triple slashes


================
Comment at: clang-tools-extra/include-cleaner/include/clang-include-cleaner/Types.h:46
+  bool operator==(const Macro &S) const {
+    return Name == S.Name && Definition == S.Definition;
+  }
----------------
OOC: can we really have multiple macro identifiers defined at the same source location? I guess it's cheap to compare another pointer, but might be easier if we actually didn't make the identifierinfo part of the identity.


================
Comment at: clang-tools-extra/include-cleaner/lib/Record.cpp:139
+  case Header::Physical:
+    for (unsigned I : ByFile.lookup(H.physical()))
+      Result.push_back(&All[I]);
----------------
we might have different handles (fileentry*) to the same file (if we end up having multiple filemanagers in action, which is unlikely as things stand today, but still). so what about using uniqueids for comparison here?

AFAICT, filemanager won't have multiple `fileenrty`s with the same `uniqueid` anyway. so we wouldn't really lose any generality.


================
Comment at: clang-tools-extra/include-cleaner/lib/Record.cpp:28
+                           FileID PrevFID) override {
+    Active = SM.isWrittenInMainFile(Loc);
+  }
----------------
we can keep a include depth based on the `Reason` and bail out on other operations when depth is not `1` (getFileID on Loc here probably hits the SourceManager cache anyways, so might not be important, but i feel like it's better to not rely on that).


================
Comment at: clang-tools-extra/include-cleaner/unittests/RecordTest.cpp:122
+// Matches an Include with a particular spelling.
+MATCHER_P(spelled, S, "") { return arg.Spelled == S; }
+
----------------
i know these look more like functions, but convention is to use `PascalCase` for matchers, so `Spelled` instead?


================
Comment at: clang-tools-extra/include-cleaner/unittests/RecordTest.cpp:186
+      Header.point("def"));
+  ASSERT_NE(Recorded.MacroReferences.size(), 0u);
+  Symbol OrigX = Recorded.MacroReferences.front().Symbol;
----------------
nit: `ASSERT_FALSE(Recorded.MacroReferences.empty())`


================
Comment at: clang-tools-extra/include-cleaner/unittests/RecordTest.cpp:213
+// Matches an Include* on the specified line;
+MATCHER_P(line, N, "") { return arg->Line == (unsigned)N; }
+
----------------
same as above, `Line`?


================
Comment at: clang-tools-extra/include-cleaner/unittests/RecordTest.cpp:223
+
+  RecordedPP::RecordedIncludes Includes;
+  Includes.add(Include{"a", A, SourceLocation(), 1});
----------------
can you also add an include with duplicated spelling to test spelling to set of includes mapping?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D136723/new/

https://reviews.llvm.org/D136723



More information about the cfe-commits mailing list