[PATCH] D124500: [clang-tidy] Support expressions of literals in modernize-macro-to-enum

Mon May 16 04:33:36 PDT 2022

aaron.ballman added inline comments.

================
Comment at: clang-tools-extra/clang-tidy/modernize/IntegralLiteralExpressionMatcher.cpp:204
+  return true;
+}
+
----------------
LegalizeAdulthood wrote:
> LegalizeAdulthood wrote:
> > aaron.ballman wrote:
> > > LegalizeAdulthood wrote:
> > > > LegalizeAdulthood wrote:
> > > > > aaron.ballman wrote:
> > > > > > LegalizeAdulthood wrote:
> > > > > > > aaron.ballman wrote:
> > > > > > > > Comma operator?
> > > > > > > Remember that the use case here is identifying expressions that are initializers for an enum.  If you were doing a code review and you saw this:
> > > > > > > ```
> > > > > > > enum {
> > > > > > >     FOO = (2, 3)
> > > > > > > };
> > > > > > > ```
> > > > > > > Would you be OK with that?  I wouldn't.  Clang even warns about it: https://godbolt.org/z/Y641cb8Y9
> > > > > > > 
> > > > > > > Therefore I deliberately left comma operator out of the grammar.
> > > > > > This is another case where I think you're predicting that users won't be using the full expressivity of the language and we'll get bug reports later. Again, in insolation, I tend to agree that I wouldn't be happy seeing that code. However, users write some very creative code and there's no technical reason why we can't or shouldn't handle comma operators.
> > > > > "Don't let the perfect be the enemy of the good."
> > > > > 
> > > > > My inclination is to simply explicitly state that comma operator is not recognized in the documentation.  It's already implicit by it's absence from the list of recognized operators.
> > > > > 
> > > > > Again, the worst that happens is that your macro isn't converted.
> > > > > 
> > > > > I'm open to being convinced that it's important, but you haven't convinced me yet `:)`
> > > > It wasn't much extra work/code to add comma operator support so I've done that.
> > > > "Don't let the perfect be the enemy of the good."
> > > 
> > > This is a production compiler toolchain. Correctness is important and that sometimes means caring more about perfection than you otherwise would like to.
> > > 
> > > > I'm open to being convinced that it's important, but you haven't convinced me yet :)
> > > 
> > > It's less about importance and more about maintainability coupled with correctness. With your approach, we get something that will have a long tail of bugs. If you used Clang's parser, you don't get the same issue -- maintenance largely comes along for free, and the bugs are far less likely.
> > > 
> > > About the only reason I like your current approach over using clang's parsing is that it quite likely performs much better than doing an actual token parsing of the expression. But as you pointed out, about the worst thing for a check can do is take correct code and make it incorrect -- doing that right requires some amount of semantic evaluation of the expressions (which you're not doing). For example:
> > > ```
> > > #define FINE 1LL << 30LL;
> > > #define BAD 1LL << 31LL;
> > > #define ALSO_BAD 1LL << 32L;
> > > ```
> > > We'll convert this into an enumeration and break `-pedantic-errors` builds in C. If we had a `ConstantExpr` object, we could see what it's value is and note that it's greater than what fits into an `int` and decide to do something smarter.
> > > 
> > > So I continue to see the current approach as being somewhat reasonable (especially for experimentation), but incorrect in the long run. Not sufficiently incorrect for me to block this patch on, but incorrect enough that the first time this check becomes a maintenance burden, I'll be asking more strongly to do this the correct way.
> > > > "Don't let the perfect be the enemy of the good."
> > > 
> > > This is a production compiler toolchain. Correctness is important and that sometimes means caring more about perfection than you otherwise would like to.
> > 
> > That's fair.
> > 
> > > For example:
> > > ```
> > > #define FINE 1LL << 30LL;
> > > #define BAD 1LL << 31LL;
> > > #define ALSO_BAD 1LL << 32L;
> > > ```
> > 
> > Oh this brings up the pesky "semicolons disappear from the AST" issue.  I wonder what happens when we're just processing tokens, though.  I will add a test to see.  This could be a case where my approach results in more correctness than `clangParse`!
> > 
> > > Not sufficiently incorrect for me to block this patch on, but incorrect enough that the first time this check becomes a maintenance burden, I'll be asking more strongly to do this the correct way.
> > 
> > I agree.
> So I was research the C standard for what it says are acceptable initializer values for an enum and it //disallows// the comma operator:
> 
> https://en.cppreference.com/w/c/language/enum
> 
> > integer constant expression whose value is representable as a value of type int
> 
> https://en.cppreference.com/w/c/language/constant_expression
> 
> > An integer constant expression is an expression that consists only of
> > - operators other than assignment, increment, decrement, function-call, or comma, except that cast operators can only cast arithmetic types to integer types
> 
> So I'll have to reject initializing expressions that use the comma operator when processing C files.
> So I'll have to reject initializing expressions that use the comma operator when processing C files.

Be careful when you do so: https://godbolt.org/z/dTMKv3a4v

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D124500/new/

https://reviews.llvm.org/D124500