[PATCH] D138200: [include-cleaner] Make use of locateSymbol in WalkUsed

Tue Dec 6 02:00:03 PST 2022

sammccall added inline comments.

================
Comment at: clang-tools-extra/include-cleaner/lib/Analysis.cpp:44
+
+  // Cache for decl to header mappings, as the same decl might be referenced in
+  // multiple locations and finding providers for each location is expensive.
----------------
kadircet wrote:
> hokein wrote:
> > I agree the place here might be on a performance-critical path, but I think the idea of using cache probably needs some bits of design, the current implementation seems half baked (there is also a FIXME in the findAllHeaders saying we should implement another cache for it, so it is unclear to me what's the whole picture).
> > 
> > Can we just leave out all cache in this patch with a FIXME? And having a follow-up patch for that?
> i agree that the cache for `findHeaders` is a bit more nuanced, as it'll need to have assumptions about a given `SymbolLocation`, no matter how/where it was acquired will return the same results.
> but i am not sure what's complicated/unclear about the cache for decl to headers. The benefit is clear as explained in the comment around saving computations every time a `foo` is referenced inside the file.
> Since both the benefit is clear and the assumption around "a `decl` having the same set of providers" sounds simple and sane enough, i'd actually keep this one in.
I also think this needs a bit more examination. It seems cheap, but there are several overlapping concerns and addressing the first is always cheap but makes the others more expensive.

e.g. here, it's not obvious which of these is most desirable and therefore should probably be done first:
 - caching symbol => header
 - caching (part of) location => header
 - caching/optimizing location => fileid
 - propagating signals across these steps
 - exposing location from the API
 - keeping the implementation simple

I don't personally have a strong intuition of which of these steps should be cached for performance, if any of them will really matter, and whether we're better off caching end-to-end (cache a longer computation, lower hit-rate) vs individual steps (small computation, higher hit-rate).

To me this seems premature without *at least* measuring that it improves performance, but ideally showing that it's better than caching at a different level.

================
Comment at: clang-tools-extra/include-cleaner/test/html.cpp:7
 int n = foo();
+// CHECK: Symbol{{.*}}foo
+// CHECK-NEXT: int foo()
----------------
I'd like to keep a very simple smoke test of the HTML report.

If you're aiming to test something specific here about the HTML report, I'd suggest creating a new test describing what it is. (Though generally speaking, the HTML test has poor test coverage of its details, which I think is unfortunately the right tradeoff)

If you're aiming to provide an integration test (these locations are taken into account in making overall decisions), I think this should be a test using `-print=changes` rather than -html.

================
Comment at: clang-tools-extra/include-cleaner/unittests/AnalysisTest.cpp:61
+    llvm::SmallVector<Decl *> TopLevelDecls;
+    for (Decl *D : AST.context().getTranslationUnitDecl()->decls())
+      TopLevelDecls.emplace_back(D);
----------------
shouldn't we have a location filter for the decls, to simulate RecordAST?

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D138200/new/

https://reviews.llvm.org/D138200