[PATCH] D44602: [clang-tidy] readability-function-size: add VariableThreshold param.

Tue Apr 10 14:19:16 PDT 2018

lebedev.ri added inline comments.

================
Comment at: clang-tidy/readability/FunctionSizeCheck.cpp:31
+        !(isa<ParmVarDecl>(VD) || isa<DecompositionDecl>(VD)) &&
+        !VD->getLocation().isMacroID())
+      Info.Variables++;
----------------
aaron.ballman wrote:
> This isn't the restriction I was envisioning. I was thinking more along the lines of:
> ```
> bool VisitStmtExpr(StmtExpr *SE) override {
>   ++StructNesting;
>   Base::VisitStmtExpr(SE);
>   --StructNesting;
>   return true;
> }
> ```
> Basically -- treat a statement expression the same as an inner struct or lambda because the variables declared within it are scoped *only* to the statement expression.
> 
> You could argue that we should also add: `if (!SE->getLocation.isMacroID()) { Base::VisitStmtExpr(SE); }` to the top of the function so that you only treat statement expressions that are themselves in a macro expansion get this special treatment. e.g.,
> ```
> void one_var() {
>   (void)({int a = 12; a;});
> }
> 
> #define M ({int a = 12; a;})
> void zero_var() {
>   (void)M;
> }
> ```
> I don't have strong opinions on this, but weakly lean towards treating all statement expressions the same way.
Ah, i did not really knew about `GNU Statement Expression`, so i did not ignore them.
Test added, and ignored.

================
Comment at: test/clang-tidy/readability-function-size.cpp:207-212
+void variables_8() {
+  int a, b;
+  struct A {
+    A(int c, int d);
+  };
+}
----------------
aaron.ballman wrote:
> JonasToth wrote:
> > lebedev.ri wrote:
> > > aaron.ballman wrote:
> > > > lebedev.ri wrote:
> > > > > aaron.ballman wrote:
> > > > > > JonasToth wrote:
> > > > > > > lebedev.ri wrote:
> > > > > > > > aaron.ballman wrote:
> > > > > > > > > lebedev.ri wrote:
> > > > > > > > > > aaron.ballman wrote:
> > > > > > > > > > > lebedev.ri wrote:
> > > > > > > > > > > > aaron.ballman wrote:
> > > > > > > > > > > > > lebedev.ri wrote:
> > > > > > > > > > > > > > aaron.ballman wrote:
> > > > > > > > > > > > > > > I think the current behavior here is correct and the previous behavior was incorrect. However, it brings up an interesting question about what to do here:
> > > > > > > > > > > > > > > ```
> > > > > > > > > > > > > > > void f() {
> > > > > > > > > > > > > > >   struct S {
> > > > > > > > > > > > > > >     void bar() {
> > > > > > > > > > > > > > >       int a, b;
> > > > > > > > > > > > > > >     }
> > > > > > > > > > > > > > >   };
> > > > > > > > > > > > > > > }
> > > > > > > > > > > > > > > ```
> > > > > > > > > > > > > > > Does `f()` contain zero variables or two? I would contend that it has no variables because S::bar() is a different scope than f(). But I can see a case being made about the complexity of f() being increased by the presence of the local class definition. Perhaps this is a different facet of the test about number of types?
> > > > > > > > > > > > > > As previously briefly discussed in IRC, i **strongly** believe that the current behavior is correct, and `readability-function-size`
> > > > > > > > > > > > > > should analyze/diagnose the function as a whole, including all sub-classes/sub-functions.
> > > > > > > > > > > > > Do you know of any coding standards related to this check that weigh in on this?
> > > > > > > > > > > > > 
> > > > > > > > > > > > > What do you think about this:
> > > > > > > > > > > > > ```
> > > > > > > > > > > > > #define SWAP(x, y) ({__typeof__(x) temp = x; x = y; y = x;})
> > > > > > > > > > > > > 
> > > > > > > > > > > > > void f() {
> > > > > > > > > > > > >   int a = 10, b = 12;
> > > > > > > > > > > > >   SWAP(a, b);
> > > > > > > > > > > > > }
> > > > > > > > > > > > > ```
> > > > > > > > > > > > > Does f() have two variables or three? Should presence of the `SWAP` macro cause this code to be more complex due to having too many variables?
> > > > > > > > > > > > Datapoint: the doc (`docs/clang-tidy/checks/readability-function-size.rst`) actually already states that macros *are* counted.
> > > > > > > > > > > > 
> > > > > > > > > > > > ```
> > > > > > > > > > > > .. option:: StatementThreshold
> > > > > > > > > > > > 
> > > > > > > > > > > >    Flag functions exceeding this number of statements. This may differ
> > > > > > > > > > > >    significantly from the number of lines for macro-heavy code. The default is
> > > > > > > > > > > >    `800`.
> > > > > > > > > > > > ```
> > > > > > > > > > > > ```
> > > > > > > > > > > > .. option:: NestingThreshold
> > > > > > > > > > > > 
> > > > > > > > > > > >     Flag compound statements which create next nesting level after
> > > > > > > > > > > >     `NestingThreshold`. This may differ significantly from the expected value
> > > > > > > > > > > >     for macro-heavy code. The default is `-1` (ignore the nesting level).
> > > > > > > > > > > > ```
> > > > > > > > > > > My concerns relate to what's considered a "variable declared in the body" (per the documentation) in relation to function complexity. To me, if the variable is not accessible lexically within the body of the function, it's not adding to the function's complexity *for local variables*. It may certainly be adding other complexity, of course.
> > > > > > > > > > > 
> > > > > > > > > > > I would have a very hard time explaining to a user that variables they cannot see or change (assuming the macro is in a header file out of their control) contribute to their function's complexity. Similarly, I would have difficulty explaining that variables in an locally declared class member function contribute to the number of variables in the outer function body, but the class data members somehow do not.
> > > > > > > > > > > 
> > > > > > > > > > > (per the documentation) 
> > > > > > > > > > 
> > > > > > > > > > Please note that the word `complexity` is not used in the **documentation**, only `size` is.
> > > > > > > > > > 
> > > > > > > > > > There also is the other side of the coin:
> > > > > > > > > > 
> > > > > > > > > > ```
> > > > > > > > > > #define simple_macro_please_ignore \
> > > > > > > > > >   the; \
> > > > > > > > > >   actual; \
> > > > > > > > > >   content; \
> > > > > > > > > >   of; \
> > > > > > > > > >   the; \
> > > > > > > > > >   foo();
> > > > > > > > > > 
> > > > > > > > > > // Very simple function, nothing to see.
> > > > > > > > > > void foo() {
> > > > > > > > > >   simple_macro_please_ignore();
> > > > > > > > > > }
> > > > > > > > > > 
> > > > > > > > > > #undef simple_macro_please_ignore
> > > > > > > > > > ```
> > > > > > > > > > 
> > > > > > > > > > In other words, if we ignore macros, it would be possible to abuse them to artificially reduce complexity, by hiding it in the macros.
> > > > > > > > > > I agree that it's total abuse of macros, but macros are in general not nice, and it would not be good to give such things a pass.
> > > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > > > My concerns relate to what's considered a "variable declared in the body" (per the documentation) in relation to function complexity.
> > > > > > > > > > 
> > > > > > > > > > Could you please clarify, at this point, your concerns are only about this new part of the check (variables), or for the entire check?
> > > > > > > > > > In other words, if we ignore macros, it would be possible to abuse them to artificially reduce complexity, by hiding it in the macros.
> > > > > > > > > 
> > > > > > > > > I don't disagree, that's why I'm trying to explore the boundaries. Your example does artificially reduce complexity. My example using swap does not -- it's an idiomatic swap macro where the inner variable declaration adds no complexity to the calling function as it's not exposed to the calling function.
> > > > > > > > > 
> > > > > > > > > > Could you please clarify, at this point, your concerns are only about this new part of the check (variables), or for the entire check?
> > > > > > > > > 
> > > > > > > > > Only the new part of the check involving variables.
> > > > > > > > > > Could you please clarify, at this point, your concerns are only about this new part of the check (variables), or for the entire check?
> > > > > > > > 
> > > > > > > > > Only the new part of the check involving variables.
> > > > > > > > 
> > > > > > > > OK.
> > > > > > > > 
> > > > > > > > This should be split into two boundaries:
> > > > > > > > * macros
> > > > > > > > * the nested functions/classes/methods in classes.
> > > > > > > > 
> > > > > > > > I *think* it may make sense to give the latter a pass, no strong opinion here.
> > > > > > > > But not macros.
> > > > > > > > (Also, i think it would be good to treat macros consistently within the check.)
> > > > > > > > 
> > > > > > > > Does anyone else has an opinion on how that should be handled?
> > > > > > > what is the current behaviour for aarons nested function?
> > > > > > > i checked cppcoreguidelines and hicpp and they did not mention such a case and i do not recall any rule that might relate to it.
> > > > > > > 
> > > > > > > I think aaron has a good point with:
> > > > > > > > I would have a very hard time explaining to a user that variables they cannot see or change (assuming the macro is in a header file out of their control) contribute to their function's complexity. Similarly, I would have difficulty explaining that variables in an locally declared class member function contribute to the number of variables in the outer function body, but the class data members somehow do not.
> > > > > > > 
> > > > > > > But I see no way to distinguish between "good" and "bad" macros, so macro expansions should add to the variable count, even though your swap macro is a valid counter example.
> > > > > > > But I see no way to distinguish between "good" and "bad" macros, so macro expansions should add to the variable count, even though your swap macro is a valid counter example.
> > > > > > 
> > > > > > I would constrain it this way: variables declared in local class member function definitions and expression statements within a macro expansion do not contribute to the variable count, all other local variables do. e.g.,
> > > > > > ```
> > > > > > #define SWAP(x, y) ({__typeof__(x) temp = x; x = y; y = x;})
> > > > > > 
> > > > > > void two_variables() {
> > > > > >   int a = 10, b = 12;
> > > > > >   SWAP(a, b);
> > > > > > }
> > > > > > 
> > > > > > void three_variables() {
> > > > > >   int a = 10, b = 12;
> > > > > >   ({__typeof__(x) temp = x; x = y; y = x;})
> > > > > > }
> > > > > > 
> > > > > > void one_variable() {
> > > > > >   int i = 12;
> > > > > >   class C {
> > > > > >     void four_variables() {
> > > > > >       int a, b, c, d;
> > > > > >     }
> > > > > >   };
> > > > > > }
> > > > > > 
> > > > > > #define FOO(x) (x + ({int i = 12; i;}))
> > > > > > 
> > > > > > void five_variables() {
> > > > > >   int a, b, c, d = FOO(100);
> > > > > >   float f;
> > > > > > }
> > > > > > ```
> > > > > > I would constrain it this way: variables declared in local class member function definitions and expression statements within a macro expansion do not contribute to the variable count, all other local variables do.
> > > > > 
> > > > > But we do already count statements, branches and compound statements in all those cases in this check.
> > > > > Why should variables be an exception?
> > > > > But we do already count statements, branches and compound statements in all those cases in this check.
> > > > Why should variables be an exception?
> > > > 
> > > > Why should variables that are entirely inaccessible to the function count towards the function's variable complexity?
> > > > 
> > > > Things like macros count towards a function's line count because the macros are expanded into the function. I don't agree with this choice, but I can at least explain it to someone I'm teaching. In the case of variable declarations, I have no justification for those variables adding complexity because they cannot be named within the function even though the macro is expanded in the function. Yet the check doesn't count global variables which do add to function complexity when used within the function.
> > > > 
> > > > For those design reasons, I'd also be opposed to diagnosing this (assume it requires 2 variables to trigger the diagnostic):
> > > > ```
> > > > void one_variable() {
> > > >   auto lambda = []() { int a = 12, b = 100; return a + b; };
> > > > }
> > > > ```
> > > > which is functionally equivalent to:
> > > > ```
> > > > void one_variable() {
> > > >   struct S {
> > > >     int operator()() { int a = 12, b = 100; return a + b; }
> > > >   } lambda;
> > > > }
> > > > ```
> > > Ok, done. But this raises another question:
> > > ```
> > > #define vardecl(type, name) type name;
> > > void variables_15() {
> > >   // FIXME: surely we should still warn here?
> > >   vardecl(int, a);
> > >   vardecl(int, b);
> > > }
> > > ```
> > > I'm guessing we want to still warn in cases like this? 
> > how would you differentiate? I am against trying to get all macro cases right, either warn for everything in macros or nothing.
> > I'm guessing we want to still warn in cases like this?
> 
> That would be nice, yes. That's why the cut-point I was recommending were situations where the declared variables are not accessible within the function.
Ok. Any hint on how to actually do that?
I guess i could look at `DeclRefExpr`, but that would only count those macro variables,
that are actually used, while the check is trying not to differentiate.

Repository:
  rCTE Clang Tools Extra

https://reviews.llvm.org/D44602