[PATCH] D27621: [clang-tidy] check to find declarations declaring more than one name

Thu Jan 5 09:40:49 PST 2017

aaron.ballman added inline comments.

================
Comment at: clang-tidy/readability/OneNamePerDeclarationCheck.cpp:34
+  // a tag declaration (e.g. struct, class etc.):
+  // class A { } Object1, Object2;  <-- won't be matched
+  Finder->addMatcher(
----------------
firolino wrote:
> aaron.ballman wrote:
> > firolino wrote:
> > > aaron.ballman wrote:
> > > > firolino wrote:
> > > > > aaron.ballman wrote:
> > > > > > firolino wrote:
> > > > > > > firolino wrote:
> > > > > > > > firolino wrote:
> > > > > > > > > firolino wrote:
> > > > > > > > > > aaron.ballman wrote:
> > > > > > > > > > > firolino wrote:
> > > > > > > > > > > > aaron.ballman wrote:
> > > > > > > > > > > > > Why do we not want to match this?
> > > > > > > > > > > > If we decide, whether we transform 
> > > > > > > > > > > > ```
> > > > > > > > > > > > class A { 
> > > > > > > > > > > > } Object1, Object2;
> > > > > > > > > > > > ``` 
> > > > > > > > > > > > to
> > > > > > > > > > > > ```
> > > > > > > > > > > > class A { 
> > > > > > > > > > > > } Object1, 
> > > > > > > > > > > > Object2;
> > > > > > > > > > > > ``` 
> > > > > > > > > > > > or
> > > > > > > > > > > > ```
> > > > > > > > > > > > class A { 
> > > > > > > > > > > > } 
> > > > > > > > > > > > Object1, 
> > > > > > > > > > > > Object2;
> > > > > > > > > > > > ``` 
> > > > > > > > > > > > I might consider adding support for it. Moreover, this kind of definition is usually seen globally and I don't know how to handle globals yet. See http://lists.llvm.org/pipermail/cfe-dev/2015-November/046262.html
> > > > > > > > > > > I think this should be handled. It can be handled in either of the forms you show, or by saying:
> > > > > > > > > > > ```
> > > > > > > > > > > A Object1;
> > > > > > > > > > > A Object2;
> > > > > > > > > > > ```
> > > > > > > > > > > If all of these turn out to be a problem, we can still diagnose without providing a fixit.
> > > > > > > > > > > 
> > > > > > > > > > > As for globals in general, they can be handled in a separate patch once we figure out the declaration grouping.
> > > > > > > > > > OK. I will try to split the object definition from the class definition, as you have suggested. Thus, I can kick out the tagDecl-matcher as well. If there is no easy way to do this, it will be reported anyway but without a fixit.
> > > > > > > > > > 
> > > > > > > > > > Note for me: Update documentation!
> > > > > > > > > What about
> > > > > > > > > ```
> > > > > > > > > struct S {
> > > > > > > > > } S1;
> > > > > > > > > ```
> > > > > > > > > I would like to report this too, since two names are being declared here. `S` and `S1`. What do you think?
> > > > > > > > ```
> > > > > > > > struct {
> > > > > > > > } nn1, nn2;
> > > > > > > > ```
> > > > > > > > Shall we ignore anonymous definitions?
> > > > > > > To be more precise: Warn and provide a fixit for `struct S {} S1`. Only warn for `struct {} nn1, nn2`.
> > > > > > > What about
> > > > > > > 
> > > > > > > ```
> > > > > > > struct S {
> > > > > > > } S1;
> > > > > > > ```
> > > > > > > I would like to report this too, since two names are being declared here. S and S1. What do you think?
> > > > > > 
> > > > > > I don't think that this should be diagnosed. For one, this is a relatively common pattern (arguably more common than `struct S { } S1, S2;`), but also, it names two very distinct entities (a type and a variable).
> > > > > > 
> > > > > > > ```
> > > > > > > struct {
> > > > > > > } nn1, nn2;
> > > > > > >```
> > > > > > > Shall we ignore anonymous definitions?
> > > > > > 
> > > > > > Yes, we basically have to.
> > > > > > 
> > > > > Transforming
> > > > > ```
> > > > > typedef struct X { int t; } X, Y;
> > > > > ```
> > > > > to
> > > > > ```
> > > > > typedef struct X { int t; };
> > > > > typedef X X;
> > > > > typedef X Y;
> > > > > ```
> > > > > will be valid, but looks odd.
> > > > I am on the fence about this transformation -- it does not declare new objects, but instead declares new types. A very common pattern in some libraries (such as the Win32 SDK) is to declare:
> > > > ```
> > > > typedef struct Blah {
> > > > 
> > > > } Blah, *PBlah, **PPBlah;
> > > > ```
> > > > Because of that pattern, diagnosing the above typedef is likely to be very chatty in some projects, and I don't know that the fixit provides much real benefit for types.
> > > > 
> > > > At least for the CERT DCL04-C version of this rule, that code should not be flagged. Do you think any of the variations of this check should flag it?
> > > Maybe we should reconsider where to split and where not. The original idea was to minimize confusion on code like
> > > ```
> > > int* a,b;
> > > ```
> > > whereas 
> > > ```
> > > struct S {} *s1, s2;
> > > ```
> > > seems not so confusing as above. However, 
> > > ```
> > > struct S {}* s1, s2;
> > > ```
> > > looks dangerous again.
> > > 
> > > Even following code looks unfamiliar to me.
> > > ```
> > > typedef struct Blah {} Blah, *PBlah;
> > > ```
> > > 
> > > It really depends on the beholder. So, how about letting this check split **everything** and provide options for maximum flexibility? So specific rulesets or the user may configure it as they wish. We could add for example split-after-tag-def, split-tag-typedef in addition to cppcore and cert and use a default setting (split-after-tag-def=true, split-tag-typedef=false).
> > I've typically found that users go with whatever the default options are, rather than specify a slew of options, and if the default options produce an onerous check, they simply disable the entire check. I'd rather not push the design off to the users unless there's really no better alternative.
> > 
> > I think that the only "dangerous" operations this sort of check is meant to catch is when the declaration group uses different specifiers for the declarator or when there's an initializer on one declarator but none of the rest. I think the other operations are more likely to be a consistency thing than a bug-prone thing. e.g.,
> > ```
> > // Differing specifiers
> > int i, j, k; // Not at all likely to be mistaken code.
> > int i, *j, &k; // Not really likely to be mistaken code.
> > int *i, j; // More likely to be mistaken code.
> > int i, *j; // Not really likely to be mistaken code.
> > typedef int *one, two, three; // More likely to be mistaken code.
> > typedef int one, *two, **three; // Not really likely to be mistaken code.
> > 
> > // Initialization
> > int i = 10, j = 10; // Not at all likely to be mistaken code.
> > int i = 10, j; // Not really likely to be mistaken code.
> > int i, j = 10; // More likely to be mistaken code.
> > ```
> > I think declarations that declare both types and variables should ignore the type name and focus only on the variable name following the same heuristics. e.g.,
> > ```
> > struct S {} s; // Not at all likely to be mistaken code.
> > struct S {} s, *t; // Not really likely to be mistaken code.
> > struct S {} *s, t; // More likely to be mistaken code.
> > ```
> > I could imagine an option like `FlagPossiblyDangerous` or something else; if the flag is set, the check diagnoses based on some heuristic of likely dangerous operations and if the flag is not set, the check flags all declarators in the same declaration.
> > 
> > What do you think?
> We are judging from our perspective what is "likely dangerous" and this might not agree with the users view. This is why I don't wanna force any limitations based on our decisions. This check might not be used only to reduce "readability error" but even for a simple style guide enforcement that wants one variable declaration per line.
> 
> I know that `int i, j, k;` is not bad at all or likely to be mistaken, but style guides from [[ https://www.openssl.org/policies/codingstyle.txt | openssl]] or the [[ https://www.kernel.org/doc/Documentation/process/coding-style.rst | linux kernel ]] are enforcing one declaration per line.
> 
> > I've typically found that users go with whatever the default options are
> So, why not simple use options and set the default whatever we think is right. We are not loosing anything with this way, just getting more flexible. Anyone else can reset the options to their needs.
> 
> I would think of following options:
> `Style = Default, CppCore, Cert`
> `SplitTagDef = True, False`
> `SplitTagTypeDef = True, False`
> 
> and with default:
> ```
> Style = Default     # splits anything, regardless of tokens and initializers
> SplitTagDef = True
> SplitTagTypeDef = False
> ```
> 
> The check will be registered under cert with `Style = Cert` and `CppCore` for cppcore respectively. 
> 
> And since it is topic related, there are still two things on my todo list:
>   - attributes
>   - members
> 
> ```
> struct alignas(8) S {} s1, s2;
> alignas(8) char a, b;
> [[maybe_unused]] bool x, y;
> 
> struct S {
>   int a, b;
> };
> ```
> Should we include this as well, or leave it for future work?
> 
> So, why not simple use options and set the default whatever we think is right. We are not loosing anything with this way, just getting more flexible. Anyone else can reset the options to their needs.

I think we are describing the same solution, but perhaps from different angles. I am proposing that we have this checker flag everything, but that we have a sensible set of default options that reduce the number of diagnostics issued for existing code bases. I fear that if the default behavior of this check is to point out *every* declaration group with more than one name being declared, the output will be way too chatty and so the entire check is disabled. I think that if we had a default behavior that only pointed out the suspicious declarations, that would reduce the "noise" by default. One way to test this would be to run the check over some large code bases of various style guides to see how many results there are. I would recommend running it over the LLVM and Clang code bases (which do not have a restriction on declarations) and OpenSSL (which does have this restriction) and report back the diagnostics you get from each (as well as totals). Feel free to pick other/additional code bases to test again.

> Should we include this as well, or leave it for future work?

It's up to you, but I think it's reasonable to leave for future work. Attributes are likely to be somewhat harder than members in many regards due to where attributes can be written. I suspect that members can be handled like variable declarations in general.

https://reviews.llvm.org/D27621