[PATCH] D139921: [include-cleaner] Ranking of providers based on hints

Kadir Cetinkaya via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Tue Jan 10 01:58:44 PST 2023


kadircet marked 26 inline comments as done.
kadircet added inline comments.


================
Comment at: clang-tools-extra/include-cleaner/lib/AnalysisInternal.h:84
 
+/// Represents properties of a symbol provider.
+enum class Hint : uint8_t {
----------------
sammccall wrote:
> along with `SymbolLocation`, I think mixing our fundamental types and analysis functions in this header makes the structure harder to understand and think they both belong in `Types.h`
i still don't feel great about exposing these types as public API. would it help if I moved these into a `TypesInternal.h` ?


================
Comment at: clang-tools-extra/include-cleaner/lib/AnalysisInternal.h:85
+/// Represents properties of a symbol provider.
+enum class Hint : uint8_t {
+  None = 0x00,
----------------
sammccall wrote:
> hokein wrote:
> > IIUC, we're using a union bit for symbols and headers: some properties are for both (e.g. Complete); some properties are just for headers (e.g. Public, Canonical).
> > 
> > I found that this model is hard to follow (e.g. for each property, I need to keep thinking about whether this property is for just symbols/headers or both, the hint conversion step from Symbol to Header is not obvious).
> > 
> > What do you think about the following alternative ?
> > 
> > - define two hints (`SymbolHint`, `HeaderHint`)
> > - an explicit conversion from `SymbolHint` to `HeaderHint`
> > - `SymbolLocation` class has a `SymbolHint` member (for `Header` as well)
> > 
> > 
> > 
> > 
> I agree the model isn't obvious, but I think it is the right one and **should be explained**.
> Breaking into a bunch of different types would be a mistake IMO. It would make it look less like algebra and more like domain logic. The problem is the algebra is important, e.g. it tells you how to merge hints to rank providers.
> 
> I think the model is:
>  - we have a "provides" DAG, headers provide locations which provide symbols which satisfy AST nodes
>  - these hints are "capabilities" that can be provided by edges (or subpaths).
>  - if **any** path from Header => AST Node sees a capability, the header provides that capability.
>  - this explains why most hints are expressed positively, and we take the union. (if a header provides a complete definition and an incomplete one, we want to call that "complete")
Yes, as discussed offline and Sam mentions in the comment idea is to have all hints expressed positively and take union of all the capabilities on the path from an AST node to a provider (also union in case there are multiple paths ending at the same provider). Each hint is to be provided by at most a single stage of the traversal (not sure why you think Complete is for both), I'll make this more concrete by renaming hint types to encode that in the name. I'll also add some docs explaining the model.


================
Comment at: clang-tools-extra/include-cleaner/lib/AnalysisInternal.h:90
+  Complete = 0x01,
+  /// Provides a non-private declaration for the symbol. Only absent in the
+  /// cases where file is explicitly marked as such, e.g. non self-contained or
----------------
sammccall wrote:
> hokein wrote:
> > I think this is no non-private declaration. The `Public` property is for headers, it indicates the symbol is provided by a non-private header.
> I agree with this one: because hints aggregate as union, it would be a mistake to flag a Symbol as Public - that would mean any header that provided it would end up with the `Public` hint, even if the header file is private!
> 
> I wonder if we should use a naming-convention for this, e.g. `CompleteSymbol`, `PublicHeader` etc, indicating the layer at which the signal is expected to be set.
i agree with renaming these to indicate the stage that produced them.


================
Comment at: clang-tools-extra/include-cleaner/lib/AnalysisInternal.h:93
+  /// IWYU private pragmas.
+  Public = 0x02,
+  /// Declaration is explicitly marked as canonical, e.g. with a IWYU private
----------------
hokein wrote:
> In most cases, this bit is set. If we use a flipped bit (`Private`), we don't need to set it in many places.
as discussed, these need to be positively expressed (like others).


================
Comment at: clang-tools-extra/include-cleaner/lib/FindHeaders.cpp:89
+    if (!PI)
+      return {{FE, Hint::Public}};
     while (FE) {
----------------
sammccall wrote:
> why never canonical?
> 
> In general, handling `!PI` as a special case up front seems like a bad idea: the purpose of supporting it being nullable is to allow the code below to used/tested without one. Seems we can get away with guarding PI-bits with `if (PI)` below.
> (Avoiding the ternary initializer for `CurrentHints` would be a readability side-benefit!)
in the absence of `PI`, we can't really do much in the loop below (as we would assume each file is self-contained by default) and would bail out in the first run. Hence i don't see much difference between handling it here vs right after the calculation of CurrentHints down below (I am slightly inclined towards this version as it prevents thinking about both cases in the loop for no reason). I can move it into the body if you feel strongly about it though.

regarding canonicalness, right now it's inferred either by namematch (which is calculated in headersForSymbol) or via IWYU pragmas, which isn't available in absence of PI.


================
Comment at: clang-tools-extra/include-cleaner/lib/FindHeaders.cpp:97
       for (const auto *Export : PI->getExporters(FE, SM.getFileManager()))
-        Results.push_back(Header(Export));
+        Results.emplace_back(Export, CurrentHints);
 
----------------
sammccall wrote:
> hokein wrote:
> > I think `Exporter` is the case we need to handle -- the exporter header should have a higher value (we probably need to add another hint bit).
> I think we discussed and thought that was a bit unsafe. `export` can simply mean "downstream code can assumes this symbol is available", not "i'm the real owner of these symbols". 
> 
> Prominent example: https://github.com/protocolbuffers/protobuf/blob/14e579436fb53d0e8271c4d26d3d88c394d96454/src/google/protobuf/stubs/port.h#L90
> (it's not relevant that the exported header in question is a standardish one)
agreed, `export`s shouldn't really be boosted.


================
Comment at: clang-tools-extra/include-cleaner/lib/FindHeaders.cpp:100
+      if (auto Verbatim = PI->getPublic(FE); !Verbatim.empty()) {
+        Results.emplace_back(Verbatim, CurrentHints | Hint::Canonical);
         break;
----------------
sammccall wrote:
> hokein wrote:
> > 
> > if FE is `private`, do we want  the `Verbatim` set the public bit? My inclination is yes (the current code behavior is no), it feels counterintuitive that  headers from `getPublic()` is not marked as `Public`.
> +1, conceptually this is a reference to a sibling edge in the DAG, not an edge chaining off this one, so it should not inherit hints
oops you're right. this was an oversight. same applies to exporters.


================
Comment at: clang-tools-extra/include-cleaner/lib/FindHeaders.cpp:157
+  llvm::StringRef DeclName(SymbolName);
+  if (auto LastQualifier = DeclName.rfind("::"); LastQualifier != DeclName.npos)
+    DeclName = DeclName.drop_front(LastQualifier);
----------------
sammccall wrote:
> BTW This stringify-and-parse isn't quite correct, e.g. for `operator std::string` it yields `operator string` but for `operator typename T::foo()` it yields `foo`!
> 
> (I realize we do this in clangd, which isn't great...)
i think using the identifier name should be fine here, operators will be unnamed and won't get any name match.


================
Comment at: clang-tools-extra/include-cleaner/unittests/FindHeadersTest.cpp:161
 
+class HeadersForSymbolTest : public FindHeadersTest {
+protected:
----------------
sammccall wrote:
> sammccall wrote:
> > hokein wrote:
> > > I think we should have some tests for the hints bit (verify the hint bits are set correctly).
> > these tests seem to be in the wrong file
> yes, I think this will be easier to maintain & make complete if you test the hints extracted, test the sort order, and only smoke-test the interaction between the two
> these tests seem to be in the wrong file

I've chosen this one as the implementation lives in `FindHeaders.cpp`, do you think it belongs to AnalysisTests instead?


================
Comment at: clang-tools-extra/include-cleaner/unittests/FindHeadersTest.cpp:266
+  EXPECT_THAT(headersForFoo(),
+              ElementsAre(physicalHeader("test/foo.proto.h"),
+                          physicalHeader("public_complete.h")));
----------------
sammccall wrote:
> hokein wrote:
> > The result is correct according to the `(canonical, public, complete)` triple order.
> > 
> > But we can come up with a counterexample, e.g. 
> >  
> > ```
> > // transitive.h
> > #include "public_complete.h"
> > 
> > // main.cpp
> > #include "transitive.h"
> > Foo foo;
> > ```
> > 
> > We will remove the `transitive.h` and insert the `foo.proto.h`, then we cause a compiling error (incomplete type of `Foo`). Not sure what we should do about it.
> yeah, this is one of the tradeoffs of not examining completeness requirements at use sites.
> 
> However it's also a reason to believe that `operator<` on `Hints` may not be a simple lexicographic compare, several "lesser" hints may be able to overpower a "greater" one ==> another reason to make the comparison public & test it.
in addition to what Sam said, note that we'll preserve `public_complete.h` if it's already included directly.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D139921/new/

https://reviews.llvm.org/D139921



More information about the cfe-commits mailing list