[PATCH] D139921: [include-cleaner] Ranking of providers based on hints

Sam McCall via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Thu Jan 19 07:27:46 PST 2023


sammccall accepted this revision.
sammccall added a comment.
This revision is now accepted and ready to land.
Herald added a subscriber: ChuanqiXu.

LG, just nits really!



================
Comment at: clang-tools-extra/include-cleaner/include/clang-include-cleaner/Types.h:133
+
+  Header(std::in_place_t, decltype(Storage) Sentinel)
+      : Storage(std::move(Sentinel)) {}
----------------
this is a bit confusing... kind of the opposite of what `std::in_place` means?
(All of the other constructors build the state in-place in some sense, but this one doesn't)

Maybe just a private `struct Sentinel{};` instead of in_place_t?

Symbol does this without any sentinel at all, if that's causing ambiguity then it might be worth fixing both to be the same.


================
Comment at: clang-tools-extra/include-cleaner/lib/AnalysisInternal.h:86
+///
+/// Hints represents the properties of the edges touched when finding headers
+/// that satisfy an AST node (AST node => symbols => locations => headers).
----------------
nit: touched => traversed? Since you haven't explicitly mentioned a graph here (which is fine), hint at it more specifically


================
Comment at: clang-tools-extra/include-cleaner/lib/AnalysisInternal.h:91
+/// to merge hints. These hints are merged by taking the union of all the
+/// properties along all the paths, hence these are all expressed positively.
+///
----------------
"positive" makes intuitive sense, but not *quite* the right concept. (existential vs universal quantification)

Maybe just give an example:
```
We choose the boolean sense accordingly, e.g. "Public" rather than "Private",
because a header is good if it provides any public definition, even if it also provides private ones.
```


================
Comment at: clang-tools-extra/include-cleaner/lib/AnalysisInternal.h:94
+/// Hints are sorted in ascending order of relevance.
+enum class Hint : uint8_t {
+  None = 0x00,
----------------
nit: maybe `Hints` over `Hint` to indicate it's a bitset


================
Comment at: clang-tools-extra/include-cleaner/lib/AnalysisInternal.h:110
+LLVM_ENABLE_BITMASK_ENUMS_IN_NAMESPACE();
+/// A wrapper to augment types with hints.
+template <typename T> struct Hinted : public T {
----------------
nit: augment **values** with hints? (or "augment types with `Hint`", but I think values is clearer)


================
Comment at: clang-tools-extra/include-cleaner/lib/AnalysisInternal.h:84
 
+/// Represents properties of a symbol provider.
+enum class Hint : uint8_t {
----------------
kadircet wrote:
> sammccall wrote:
> > along with `SymbolLocation`, I think mixing our fundamental types and analysis functions in this header makes the structure harder to understand and think they both belong in `Types.h`
> i still don't feel great about exposing these types as public API. would it help if I moved these into a `TypesInternal.h` ?
`AnalysisInternal` < `TypesInternal` < `Types` I think, so yes

(I think separating the concepts/data structures from the algorithm/implementation is the important concern here. I don't think defensively minimizing surface area is important as the library simply won't have many users, and that exposing these is a net win because it makes the structure of the system easier for human readers to understand. Anyway, up to you)


================
Comment at: clang-tools-extra/include-cleaner/lib/AnalysisInternal.h:95
+  /// Declaration is explicitly marked as canonical, e.g. with a IWYU private
+  /// pragma that points at this provider.
+  Canonical = 0x04,
----------------
sammccall wrote:
> hokein wrote:
> > This comment is hard to follow, it doesn't explain the `Canonical`. From the design doc, it means `NameMatch` and `Annotated` headers, I think it would be nice to document them in the comment.
> > 
> > I'd suggest avoid using the `Canonical` name, it remains me too much about the canonical decl in clang. Maybe `Preferred`?
> +1 to Preferred and also documenting the conditions when a signal is set here in some detail (also for Public and Complete).
> 
> I think we really have to understand what the signal is quite precisely to make correct use of it.
nit: the conditions seem well-documented now, but I think we should drop the `e.g.` these should be the actual definitions, not just examples.


================
Comment at: clang-tools-extra/include-cleaner/lib/FindHeaders.cpp:38
+llvm::SmallVector<Header> ranked(llvm::SmallVector<Hinted<Header>> Headers) {
+  llvm::stable_sort(Headers,
+                    [](const Hinted<Header> &LHS, const Hinted<Header> &RHS) {
----------------
just stable_sort(reverse(Headers))?

(reverse() returns an rbegin/rend view which should DTRT)


================
Comment at: clang-tools-extra/include-cleaner/lib/FindHeaders.cpp:46
+llvm::SmallVector<Hinted<Header>>
+nameMatch(llvm::StringRef DeclName, llvm::SmallVector<Hinted<Header>> Headers) {
+  for (auto &H : Headers) {
----------------
this signature seems unneccesarily complicated/hard to understand

what about `nameMatch(DeclName, Header) -> Hint` (or `->bool`) and write the trivial loop in the caller?


================
Comment at: clang-tools-extra/include-cleaner/lib/FindHeaders.cpp:60
+    }
+    llvm::errs() << "Checking name match for: " << SpelledH;
+    SpelledH = SpelledH.trim("<>\"");
----------------
debug stuff, and below


================
Comment at: clang-tools-extra/include-cleaner/lib/FindHeaders.cpp:61
+    llvm::errs() << "Checking name match for: " << SpelledH;
+    SpelledH = SpelledH.trim("<>\"");
+    if (auto LastSlash = SpelledH.rfind('/'); LastSlash != SpelledH.npos)
----------------
even though it's a few lines, if you pull out `StringRef basename(StringRef Header)` this code looks a lot clearer (and if returning `bool`, directly `return basename(H.physical()->getName()).equals_insensitive(DeclName)` etc inline above)


================
Comment at: clang-tools-extra/include-cleaner/lib/FindHeaders.cpp:118
     for (const auto &H : Loc.standard().headers())
-      Results.push_back(H);
+      Results.emplace_back(H, Hint::PreferredHeader | Hint::PublicHeader);
     return Results;
----------------
headers() returns the alternatives in preferred order, so you only want to set PreferredHeader on the first.


================
Comment at: clang-tools-extra/include-cleaner/lib/FindHeaders.cpp:130
+  for (auto &Loc : locateSymbol(S))
+    Headers.append(applyHints(findHeaders(Loc, SM, PI), Loc.Hints));
+  // If two Headers probably refer to the same file (e.g. Verbatim(foo.h) and
----------------
Somewhere say what you're doing: deduplicating headers and merging their hints


================
Comment at: clang-tools-extra/include-cleaner/lib/Types.cpp:114
+  case Header::Physical:
+    return physical() < RHS.physical();
+  case Header::Standard:
----------------
I think comparing pointers is going to give inconsistent results, and would suggest comparing Name (without trying to normalize, just for at least reproducibility across identical runs)


================
Comment at: clang-tools-extra/include-cleaner/unittests/FindHeadersTest.cpp:256
+
+TEST_F(HeadersForSymbolTest, RankingPreservesDiscoveryOrder) {
+  Inputs.Code = R"cpp(
----------------
I don't think it does, it looks like you're sorting by the `FileEntry` pointers - this may happen to be discovery order a lot of the time but it's of course not guaranteed.


================
Comment at: clang-tools-extra/include-cleaner/unittests/FindHeadersTest.cpp:161
 
+class HeadersForSymbolTest : public FindHeadersTest {
+protected:
----------------
kadircet wrote:
> sammccall wrote:
> > sammccall wrote:
> > > hokein wrote:
> > > > I think we should have some tests for the hints bit (verify the hint bits are set correctly).
> > > these tests seem to be in the wrong file
> > yes, I think this will be easier to maintain & make complete if you test the hints extracted, test the sort order, and only smoke-test the interaction between the two
> > these tests seem to be in the wrong file
> 
> I've chosen this one as the implementation lives in `FindHeaders.cpp`, do you think it belongs to AnalysisTests instead?
Nope, I think I was just confused, sorry


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D139921/new/

https://reviews.llvm.org/D139921



More information about the cfe-commits mailing list