[PATCH] D120306: [clangd] IncludeCleaner: Add support for IWYU pragma private

Kirill Bobyrev via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Tue Feb 22 05:22:41 PST 2022


kbobyrev added a comment.

In D120306#3337374 <https://reviews.llvm.org/D120306#3337374>, @sammccall wrote:

> In D120306#3337288 <https://reviews.llvm.org/D120306#3337288>, @kbobyrev wrote:
>
>> In D120306#3337212 <https://reviews.llvm.org/D120306#3337212>, @sammccall wrote:
>>
>>> What's the policy this patch intends to implement?
>>>
>>> I'm a little concerned by building up maps of filenames with segment limits - it looks like some kind of heuristic/partial matching.
>>
>> Basically, handling the mappings that. IWYU pragmas provide.
>
> Yes, I'm asking you to describe that handling in a couple of sentences so I can understand what you intend it to be.
>
> e.g. for includecleaner in general, the approach is: walk the AST, map each node to decls it uses, map decls to locations, map locations to files, then check which includes point to files that were not identified.

I'm trying to reuse the information about `IWYU pragma private` that is collected in `Canoncalncludes` to find the header responsible for including the private ones. Canonicalncludes stores information about the headers with `IWYU pragma: prviate, include <XYZ>`. When we're figuring out the responsible header in IncludeCleaner, I'm checking if the header has this pragma and then I'm trying to find the `<XYZ>` header that is mentioned in the pragma comment. For that I'm using a heuristic that the include stack of the main file should have a header whose suffix is <XYZ>. This because we never really have the full header name in the pragma comment and it might not be visible from the private header (and hence can not be resolved from there). I'm not sure if this heuristic will work well in practice, but I couldn't figure out a better one.

An alternative would probably be:

- Walk #include directives and try to find the "umbrella headers"
- For each include directive, query the included file and figure out if it has "IWYU pragma: private, include <XYZ>" through `CanonicalIncludes`
- If it does, figure out if the supposed umbrella header name has the `<XYZ>` suffix, in that case attribute the private header to this public one

This all would need to be done after `Canonicalncludes` are complete, meaning this would probably belong to `IncludeStructure` instead. Maybe this is more precise, WDYT?

>> Are you concerned about the limit itself or the way I'm trying to find these headers in general? I was afraid of consuming too much memory, hence the limit; it's not crucial to the implementation, though, I can remove it if your concern is the limit rather than the approach.
>
> Mostly I'm unclear on:
>
> - whether we're comparing filenames among a small (e.g. the include stack - heuristics OK) or a large set (need to be very precise)

Yes, I'm using just the include stack of the main file and the preamble. When you say "large set", do you mean the whole set of project headers?

The problem here is that for the include stack we would miss the right diagnostic if the public header responsible for the private one is not accessible. Maybe we should throw an "unused" warning either way here, I don't know what would be right.

> - whether the partial matching is semantically important or a performance optimitzation

The matching itself is an optimisation, right: what I'm dealing with is that `gtest.h` header is called `/home/user/llvm-project/llvm/util/googletest/.../gtest/gtest.h` but the private headers will only say `include "gtest/gtest.h"`. What I'm trying to do is find the header whose real path name ends with `gtest/gtest.h`.

> - whether the concept of "real name" is significant or likely to cause problems

Hm, I don't know; what kind of problems do you think might appear?

> - whether storing another copy of the names of all the transitive headers is actually necessary for what you're trying to do

The problem here is that I'll need `FileID` in `headerResponsbile` but I can't reuse `FileID`s between preamble and main file, right? Logically, I'm afraid I don't have an option to store anything other than the name (or an index to it); I could uplift this logic to the `IncludeStructure` and loop over all visible headers in there, I wouldn't need another copy since I'm already storing it there. But that might be quite slow.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D120306/new/

https://reviews.llvm.org/D120306



More information about the cfe-commits mailing list