[PATCH] D117522: [clang-tidy] Add modernize-macro-to-enum check

Aaron Ballman via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Tue Mar 22 09:20:37 PDT 2022


aaron.ballman added inline comments.


================
Comment at: clang-tools-extra/clang-tidy/modernize/MacroToEnumCheck.cpp:47
+    CRLF,
+    CRLFCR,
+  };
----------------
LegalizeAdulthood wrote:
> aaron.ballman wrote:
> > LegalizeAdulthood wrote:
> > > aaron.ballman wrote:
> > > > I'm a bit confused by this one as this is not a valid line ending (it's either three valid line endings or two valid line endings, depending on how you look at it). Can you explain why this is needed?
> > > It's a state machine, where the states are named for what we've seen so far and we're looking for //two// consecutive line endings, not just one.  Does it make sense now?
> > Thanks, I understood it was a state machine, but it's a confused one to me. `\r` was the line ending on Mac Classic, I've not seen it used outside of that platform (and I've not seen anyone write code for that platform in a long time). So, to me, the only valid combinations of line endings to worry about are: `LF LF`; `CRLF CRLF`; `CRLF LF`; `LF CRLF`.
> > 
> > `LF LF` returns false (Nothing -> LF -> return false)
> > `CRLF CRLF` returns false (Nothing -> CR -> CRLF -> CRLFCR -> return false)
> > `CRLF LF` returns true (Nothing -> CR -> CRLF -> LF -> finish loop)
> > `LF CRLF` returns true (Nothing -> LF -> CR -> CRLF -> finish loop)
> > 
> > (If you also intend to support Mac Classic line endings for some reason, this gets even more complicated.)
> I was trying to follow "be liberal in what you accept as input and conservative in what you generate as output" maxim.  I can remove the `CR` as a line ending case if you think it's too obscure.
If Clang supports it as a line ending, we probably should too, but... how do we handle CRLF vs "I mixed a CR with an LF by accident" kind of inputs? (Maybe we just treat that as CRLF and if the behavior is bad, the user shouldn't mix their line endings that way; I think that's defensible.) That seems to be similar to the scenario that's confusing me above where the user mixed an LF and CRLF by accident.


================
Comment at: clang-tools-extra/clang-tidy/modernize/MacroToEnumCheck.cpp:318
+
+  // It can be +Lit, -Lit or just Lit.
+  Token Tok = Info->tokens().front();
----------------
LegalizeAdulthood wrote:
> LegalizeAdulthood wrote:
> > aaron.ballman wrote:
> > > LegalizeAdulthood wrote:
> > > > aaron.ballman wrote:
> > > > > It's worth pointing out that both of these are expressions that operate on a literal, and if we're going to allow expressions that operator on a literal, why only these two? e.g. why not allow `#define FOO ~0U` or `#define BAR FOO + 1`? Someday (not today), it would be nice for this to work on any expression that's a valid constant expression.
> > > > A later enhancement can generalize to literal expressions (which are valid initializers for an enumeration), but I wanted to cover the most common case of simple negative integers in this first pass.
> > > I'm less worried about the arbitrary constant expressions than I am about not supporting `~` because `~0U` is not uncommon in macros as a way to set all bits to 1. It's certainly more common than seeing a unary `+`, at least in my experience. However, an even more important use case that I should have thought of first is surrounding the value in parens (which is another common idiom with macros). e.g, `#define ONE (1)`
> > > 
> > > Some examples of this in the wild (search the files for `~0U`):
> > > 
> > > https://codesearch.isocpp.org/actcd19/main/l/linux/linux_4.15.17-1/drivers/gpu/drm/i915/gvt/handlers.c
> > > https://codesearch.isocpp.org/actcd19/main/w/wine/wine_4.0-1/dlls/d3d8/d3d8_private.h
> > > https://codesearch.isocpp.org/actcd19/main/q/qtwebengine-opensource-src/qtwebengine-opensource-src_5.11.3+dfsg-2/src/3rdparty/chromium/third_party/vulkan/include/vulkan/vulkan.h
> > > 
> > > (There's plenty of other examples to be found on the same site.)
> > > 
> > > I'm fine not completing the set in the initial patch, but the current behavior is a bit confusing (`+` is almost of negligible importance). I think `~` and parens need to be supported; I'd prefer in this patch, but I'm fine if it comes in a subsequent patch so long as those two are supported before we release.
> > The difficulty in supporting more complex expressions is that we have **NO** AST support here and it involves manually matching tokens in the macro definition.
> > 
> > However, your point about `~` is well taken and that's easy to add based on what I've got here.  I thought it important to handle negative literals, so I added support for unary `-`.  I added support for unary `+` out of symmetry.
> I've added support for bitwise negated integers.  I didn't go further and try to recognize parenthesized literals (this just seems dumb, anyway... the extra parentheses do nothing and aren't ever needed).
Thanks for adding the `~` support! I think we'll want parens supported as well because there's very common guidance to parenthesize EVERYTHING in the macro expansion list, but at the same time, I think that can be done once we get a bug report from a user.


================
Comment at: clang-tools-extra/test/clang-tidy/checkers/modernize-macro-to-enum.cpp:67-68
+
+// Undefining a macro invalidates adjacent macros
+// from being considered as an enum.
+#define REMOVED1 1
----------------
LegalizeAdulthood wrote:
> aaron.ballman wrote:
> > LegalizeAdulthood wrote:
> > > LegalizeAdulthood wrote:
> > > > aaron.ballman wrote:
> > > > > What about an #undef that's not adjacent to any macros? e.g.,
> > > > > ```
> > > > > #define FOO 1
> > > > > #define BAR 2
> > > > > #define BAZ 3
> > > > > 
> > > > > int i = 12;
> > > > > 
> > > > > #if defined(FROBBLE)
> > > > > #undef FOO
> > > > > #endif
> > > > > ```
> > > > > I'm worried that perhaps other code elsewhere will be checking `defined(FOO)` perhaps in cases conditionally compiled away, and switching `FOO` to be an enum constant will break other configurations. To be honest, I'm a bit worried about that for all of the transformations here... and I don't know a good way to address that aside from "don't use the check". It'd be interesting to know what kind of false positive rate we have for the fixes if we ran it over a large corpus of code.
> > > > Yeah, the problem arises whenever you make any changes to a header file.  Did you also change all translation units that include the header?  What about conditionally compiled code that was "off" in the translation unit for the automated change?  Currently, we don't have a way of analyzing a group of translation units together for a cohesive change, nor do we have any way of inspecting more deeply into conditionally compiled code.  Addressing those concerns is beyond the scope of this check (or any clang-tidy check) as it involves improvements to the entire infrastructure.
> > > > 
> > > > However, I think it is worth noting in the documentation about possible caveats.  I think the way clang-tidy avoids this problem now is that you have to request fixes and the default mode is to issue warnings and leave it up to the reader as to whether or not they should apply the fixes.
> > > > 
> > > > I believe I already have logic to disqualify any cluster of macros where any one of them are used in a preprocessor condition (that was the last functional change I made to this check).  Looks like I need to extend that slightly to include checking for macros that are `#undef`'ed.
> > > OK, looks like I was already handling this, LOL.  See line 135
> > > 
> > > ```
> > > // Undefining an enum-like macro results in the enum set being dropped.
> > > ```
> > Yeah, you already have the code for handling this somewhat (that's one of the reasons why I brought this particular use case up). My greater concern is: how many false positives does this check generate on real world code? Documentation may help alleviate those concerns well enough, but if the false positive rate is sufficiently high that you basically have to disable this check for real world code, we need to do better. I don't fully trust my intuition on this one because preprocessor code in the real world has 40+ years worth of accumulated oddities, so having some actual measurements against real world code would be very informative.
> In the latest diff, I added a test case to make it clear that even if you `#undef` a macro later in the file, it invalidates all the surrounding macros in the cluster containing the undef'ed macro.
> 
> I'm open to suggestions for code bases on which to run this.  Since this was motivated by my modernization of the fractint code base, I will run it on an old version of the code from my repo before I had manually converted the defines to enums.
> 
> LLVM isn't a good test case here, because LLVM doesn't use macros as enums `:)`
> 
> For this scenario (a macro that is later undef'ed), I don't believe I will emit any false positives.
> In the latest diff, I added a test case to make it clear that even if you #undef a macro later in the file, it invalidates all the surrounding macros in the cluster containing the undef'ed macro.

Oh, thank you! I had the impression we only cared if it was undefed within the same "run" we used to determine the macro.

> I'm open to suggestions for code bases on which to run this.

I'd recommend some FreeBSD or Linux packages (or the whole distro if that's easy enough), as those tend to have a fair number of surprises. If that's harder and you'd rather just do one big project, I'd recommend something openssl, sqlite3, maybe postgres for C projects likely to make a fair amount of use of macros (from what I recall of them anyway).


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D117522/new/

https://reviews.llvm.org/D117522



More information about the cfe-commits mailing list