[PATCH] D73413: [clang-tidy] Add check to detect external definitions with no header declaration

Tony Lewis via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Mon Feb 3 02:36:13 PST 2020


tonyelewis added inline comments.


================
Comment at: clang-tools-extra/clang-tidy/misc/MissingHeaderFileDeclarationCheck.cpp:75
+      continue;
+    if (AnyHeader || File->NamePart.equals_lower(ThisFile->NamePart))
+      return; // Found a good candidate for matching decl
----------------
njames93 wrote:
> tonyelewis wrote:
> > njames93 wrote:
> > > tonyelewis wrote:
> > > > njames93 wrote:
> > > > > gribozavr2 wrote:
> > > > > > njames93 wrote:
> > > > > > > gribozavr2 wrote:
> > > > > > > > njames93 wrote:
> > > > > > > > > gribozavr2 wrote:
> > > > > > > > > > This heuristic should be a lot more complex. In practice people have more complex naming conventions (for example, in Clang, Sema.h corresponds to many files named SemaSomethingSomething.cpp).
> > > > > > > > > > 
> > > > > > > > > > I think there is a high chance that checking only a header with a matching name will satisfy too few projects to be worth implementing.
> > > > > > > > > > 
> > > > > > > > > > On the other hand, if you could use C++ or Clang modules to narrow down the list of headers where the declaration should appear, that would be a much stronger signal.
> > > > > > > > > That is the reason I added the CheckAnyHeader option. For small projects the matching name would be usually be enough, but for big complex projects there is no feasible check that would work. Falling back to making sure every external definition has a declaration in at least one header will always be a good warning
> > > > > > > > That's the thing -- due to the lack of consistency in projects and style guides for C++, I think most projects will have to turn on CheckAnyHeader. We get implementation complexity in ClangTidy, false positives in high-profile projects, force users to configure ClangTidy to work well for their projects (unless we set CheckAnyHeader=1 by default... which then nobody would flip back to zero), and little benefit for end users.
> > > > > > > Would you say the best way would be check if the source file name starts with the header file name by default. Or is that very specific to Clang?
> > > > > > > 
> > > > > > > ```
> > > > > > > /// <SomeHeaderImpl.cpp>
> > > > > > > #include "SomeHeader.h"
> > > > > > > ```
> > > > > > > This file would check for declarations in `SomeHeader.h`
> > > > > > > 
> > > > > > > Or maybe check if the header file name starts with the source file?
> > > > > > > 
> > > > > > > ```
> > > > > > > /// <SomeHeader.cpp>
> > > > > > > #include "SomeHeaderImpl.h"
> > > > > > > ```
> > > > > > > This file would check for declarations in `SomeHeaderImpl.h`.
> > > > > > > Or even check both?
> > > > > > > Would you say the best way would be check if the source file name starts with the header file name by default. Or is that very specific to Clang?
> > > > > > 
> > > > > > I don't think it is too specific, it should cover many projects I think.
> > > > > > 
> > > > > > > Or maybe check if the header file name starts with the source file?
> > > > > > 
> > > > > > Seems like an unusual case to me, but I'm willing to be convinced otherwise.
> > > > > I thought it was an unusual case and wasn't thinking it would be a good idea to add. I'll just set it up so that it will always look in header files whose names appear at the start of the main source file.
> > > > FWIW, my instinct is that there are two separate questions:
> > > > 
> > > >  1. What sort of files should have their external-linkage definitions checked?
> > > >  2. What sort of included files should be part of the search for adequate declarations that match those definitions?
> > > > 
> > > > ...and that my answers, in reverse order, are:
> > > > 
> > > >  2. All included files should be included (ie a check for a given external-linkage definition should be appeased by _any_ matching declaration in _any_ included file)
> > > >  1. Only main files. And (to prevent lots of false-positives when the tool is run over headers), only "c", "cc", "cxx", "cpp" files by default but with an option to check in _all_ main files.
> > > > 
> > > > I think this has the merits that it:
> > > >  * allows users to put their valid declaration in files with any extension they like
> > > >  * defaults to only firing when run against c/cc/cxx/cpp files but provides the option to fire when run against a file with _any_ extension
> > > > 
> > > > 
> > > > So I propose that there be one option and it be as follows:
> > > > 
> > > > 
> > > > > .. option:: CheckExtDefnsInAllMainFiles
> > > > > 
> > > > > If set to `0` only check external-linkage definitions in main files with typical source-file extensions ("c", "cc", "cxx", "cpp").
> > > > > If set to `1` check external linkage definitions in all main files.
> > > > > Default value is `0`.
> > > So current behaviour is kind of explained that you can either search in headers that are a starting substring of the source file or just search for all headers, I tried to explain it in the documentation. By default it only checks for definitions in files with the extension `c`, `cc`, `cpp` or `cxx`. I could add an option to check all definitions expanded in the main file, regardless of the extension.
> > Yep - I //think// I understand the current behaviour (but could well be wrong about that of course :) ).
> > 
> > But IIUC, I share @gribozavr2's concerns that the current behaviour's default of only accepting matching declarations in included files //that match a specific pattern// is going to trigger a lot of false-positives in a lot of projects. And I don't think this check should care about //where// the matching declaration is found, so long as there is such a declaration in a file included within the TU.
> > 
> > To my thinking, the implication of having a default policy that restricts the names of files in which we'll accept matching declarations is that we're policing the way that users name and organise their files (by spamming them with lots of false-positives if they don't do it in a way that satisfies our rule). I'd argue that file naming/organisation doesn't belong as part of this check (or even as part of any other clang-tidy check).
> > 
> > The other half of my previous comment was about a `CheckExtDefnsInAllMainFiles` option but I think that that's less important and that having a restriction to `c`, `cc`, `cpp` and `cxx` seems pretty reasonable to me—it could always be broadened in the future.
> > 
> > Overall, I think it's better to err on the side of //not// triggering the warning so as to avoid excessive false-positives; the check can always be made more sensitive in the future. I see the implications of that stance as being:
> >  * default to only checking the definitions in files that look like source files (so I think restricting to `c`, `cc`, `cpp` and `cxx` is good)
> >  * default to accepting matching declarations in all included files.
> > 
> The reason for the unrelated header check is to prevent potential bugs. If a declaration is in a completely unrelated header that happens to get included, it more than likely could be a collision which could then break ODR rules during linking. Perhaps a better thing to do would be if the decl isn't in a "corresponding" header, emit a warning, but say there was a decl found in `example.h`
Thanks.

When you say "warning", do you mean something different from a check violation? Does that mean the check isn't fired but a warning message is printed?

Either way, my feeling is that it would be better to avoid diagnosing this situation (of a definition in a source-file matching a declaration in a source file with a filename stem that isn't a leading substring of the source file's name) at all.

One minor aspect: if this ODR-motivated diagnostic message is tied to the `misc-missing-header-file-declaration check`, that has the potential to be confusing to users. The existing check is about one specific issue ("you said you want X to be accessible in multiple TUs but you haven't provided a declaration for other TUs to share") which IMHO would be muddied by being conflated with a quite different issue ("the names of the files containing the definition and declaration of X are quite different, which we think smells a bit like an ODR violation").

Another issue: it feels like an inappropriate thing to be enforcing. C++ gives users complete freedom about the way they name their source-code files (indeed I don't think the standard even assumes source-code comes in files) and I don't think there's anything like universally agreed upon best-practice that deviating from my_stuff.cpp / my_stuff.hpp pairs is always bad. So it feels wrong to start making life difficult for users who deviate from that pattern for any reason.

And I think the rest of my concern relates to two empirical questions:
 * what fraction of the all such instances in the world's source code are actually OK (not an ODR violation)?
 * what fraction of projects contain such instances?

If I was convinced these numbers were both really low, I'd feel much more comfortable with clang-tidy firing warnings/check-violations. In that world, we could say: even though it's not //wrong// to deviate from matching my_stuff.cpp / my_stuff.hpp pairs, it's worth //complete// adherence to that pattern for the resultant ODR-violation-detection gains. (Side note: another empirical question is how many ODR-violations this check would detect that aren't already caught by other ODR-violation detecting tools, eg linkers' and [[ https://github.com/google/sanitizers/wiki/AddressSanitizerOneDefinitionRuleViolation | ASan's ]])

But my instinct is that those fractions are both pretty high: that lots of projects involve some file-naming structures that would fall afoul of the rule and cause lots of false positives.

In other words, my instinct is that this pattern isn't very strong evidence of an ODR violation and that the line-noise/inconvenience cost would far outweigh the ODR-detecting benefit.

So I think that the check should always accept matching declarations in //any// header (or at least that that should be the default policy and that users explicitly opt in to the stricter policy).

Again: thanks for all the work on this check - it's really great and I really appreciate it.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D73413/new/

https://reviews.llvm.org/D73413





More information about the cfe-commits mailing list