[PATCH] clang-tidy checker that enforce proper parentheses in macros

Aaron Ballman aaron.ballman at gmail.com
Mon May 11 09:36:53 PDT 2015


On Wed, May 6, 2015 at 10:16 AM, Daniel Marjamäki
<daniel.marjamaki at evidente.se> wrote:
> Hi aaron.ballman, silvas,
>
> This clang-tidy checker will make sure macros have proper parentheses.
>
> I previously made a compiler-check for this but that only warned when bad usage was seen. This clang-tidy checker warns when dangerous macros are seen.
>
> The rules are:
>   - Expressions must be surrounded with parentheses.
>   - Arguments must be surrounded with parentheses.
>
> I have tested this on 193 debian projects so far. there was 47668 warnings. I have not looked at every warning in detail but overall it looks very good. The majority of the warnings point out dangerous macros that can be misused, and I therefore classify them as true positives. Maybe there are some 1-5% that are "false positives", in most of these cases it would not hurt to add parentheses.
>
> Example false positives:
>
> ```
> #define   A     (INT)&a
> ```
>
> Is the & a bitand or a address-of operator. It's not trivial to know when looking at the tokens. Putting parentheses around the macro doesn't hurt.
>
> ```
> #define   A     INT*
> ```
>
> ... I warn about this. Can be fixed by using typedef instead. But maybe user wants to have it this way.
>
>
> and then sometimes there are warnings inside ({..}) that I think ideally would not be shown. I could bailout for instance when a { is seen in the macro, but most warnings are true positives so I think such bailout would mostly cause false negatives.

I have some fairly minor nits below, but my biggest concern is the
number of diagnostics you are seeing. ~48k new diagnostics issues is
*really* chatty!

> Index: clang-tidy/misc/MiscTidyModule.cpp
> ===================================================================
> --- clang-tidy/misc/MiscTidyModule.cpp
> +++ clang-tidy/misc/MiscTidyModule.cpp
> @@ -16,6 +16,7 @@
>  #include "BoolPointerImplicitConversionCheck.h"
>  #include "InaccurateEraseCheck.h"
>  #include "InefficientAlgorithmCheck.h"
> +#include "MacroParenthesesCheck.h"
>  #include "StaticAssertCheck.h"
>  #include "SwappedArgumentsCheck.h"
>  #include "UndelegatedConstructor.h"
> @@ -41,6 +42,8 @@
>          "misc-inaccurate-erase");
>      CheckFactories.registerCheck<InefficientAlgorithmCheck>(
>          "misc-inefficient-algorithm");
> +    CheckFactories.registerCheck<MacroParenthesesCheck>(
> +        "misc-macro-parentheses");
>      CheckFactories.registerCheck<StaticAssertCheck>(
>          "misc-static-assert");
>      CheckFactories.registerCheck<SwappedArgumentsCheck>(
> Index: clang-tidy/misc/CMakeLists.txt
> ===================================================================
> --- clang-tidy/misc/CMakeLists.txt
> +++ clang-tidy/misc/CMakeLists.txt
> @@ -7,6 +7,7 @@
>    BoolPointerImplicitConversionCheck.cpp
>    InaccurateEraseCheck.cpp
>    InefficientAlgorithmCheck.cpp
> +  MacroParenthesesCheck.cpp
>    MiscTidyModule.cpp
>    StaticAssertCheck.cpp
>    SwappedArgumentsCheck.cpp
> Index: clang-tidy/misc/MacroParenthesesCheck.cpp
> ===================================================================
> --- clang-tidy/misc/MacroParenthesesCheck.cpp
> +++ clang-tidy/misc/MacroParenthesesCheck.cpp
> @@ -0,0 +1,166 @@
> +//===--- MacroParenthesesCheck.cpp - clang-tidy----------------------------===//
> +//
> +//                     The LLVM Compiler Infrastructure
> +//
> +// This file is distributed under the University of Illinois Open Source
> +// License. See LICENSE.TXT for details.
> +//
> +//===----------------------------------------------------------------------===//
> +
> +#include "MacroParenthesesCheck.h"
> +#include "clang/Frontend/CompilerInstance.h"
> +#include "clang/Lex/PPCallbacks.h"
> +#include "clang/Lex/Preprocessor.h"
> +
> +namespace clang {
> +namespace tidy {
> +
> +namespace {
> +class MacroParenthesesPPCallbacks : public PPCallbacks {
> +public:
> +  explicit MacroParenthesesPPCallbacks(Preprocessor *PP,
> +                                       MacroParenthesesCheck *Check)
> +      : PP(PP), Check(Check) {}
> +
> +  void MacroDefined(const Token &MacroNameTok,
> +                    const MacroDirective *MD) override {
> +    replacementList(MacroNameTok, MD);
> +    argument(MacroNameTok, MD);
> +  }
> +
> +private:
> +  /** replacement list with calculations should be enclosed in parentheses */

Here and elsewhere, comments should be complete sentences
(capitalization and punctuation included).


> +  void replacementList(const Token &MacroNameTok, const MacroDirective *MD);
> +
> +  /** arguments should be enclosed in parentheses */
> +  void argument(const Token &MacroNameTok, const MacroDirective *MD);
> +
> +  Preprocessor *PP;
> +  MacroParenthesesCheck *Check;
> +};
> +}
> +
> +void MacroParenthesesPPCallbacks::replacementList(const Token &MacroNameTok,
> +                                                  const MacroDirective *MD) {

Why are you passing in a MacroDirective object when you only use it to
get the MacroInfo object?

> +  const MacroInfo *MI = MD->getMacroInfo();
> +
> +  // bigger than 0 => we are inside parentheses / braces / squares
> +  int Indent = 0;

I'm not certain I understand why this is called Indent?

> +
> +  // Is there a previous comma?
> +  bool Comma = false;
> +
> +  // Have we seen some calculation outside parentheses/braces/squares
> +  bool Err = false;
> +
> +  for (auto TI = MI->tokens_begin(), TE = MI->tokens_end(); TI != TE; ++TI) {
> +    const Token &Tok = *TI;
> +    if (Tok.is(tok::l_paren) || Tok.is(tok::l_brace) || Tok.is(tok::l_square)) {
> +      ++Indent;
> +    } else if (Tok.is(tok::r_paren) || Tok.is(tok::r_brace) ||
> +               Tok.is(tok::r_square)) {
> +      --Indent;

Do we also have to worry about angle brackets for C++ code?

> +    } else if (Indent <= 0 && Tok.is(tok::comma)) {
> +      if (Comma)
> +        Err = false;
> +      Comma = true;
> +    } else if (Indent <= 0 &&
> +               (Tok.is(tok::plus) || Tok.is(tok::minus) || Tok.is(tok::star) ||
> +                Tok.is(tok::slash) || Tok.is(tok::percent) ||
> +                Tok.is(tok::amp) || Tok.is(tok::pipe) || Tok.is(tok::caret) ||
> +                Tok.is(tok::question) || Tok.is(tok::colon))) {

I would prefer to see this list split out into a separate helper
function with more explanation as to why something would qualify to be
in the list.

> +      // heuristic for macros that are clearly not intended to be enclosed in
> +      // parentheses, macro contains just an operator and a number
> +      // #define X     *10
> +      if (TI == MI->tokens_begin() && !Tok.is(tok::plus) &&
> +          !Tok.is(tok::minus) && (TI + 1) != TE &&
> +          (TI + 1)->is(tok::numeric_constant) && (TI + 2) == TE) {

This expression is kind of obtuse.

> +        return;
> +      }
> +      Err = true;
> +      if (!Comma)
> +        break;
> +    }
> +  }
> +  if (Err) {
> +    SourceLocation Loc = MI->tokens_begin()->getLocation();
> +    Check->diag(Loc,
> +                "macro replacement list should be enclosed in parentheses");
> +  }
> +}
> +
> +void MacroParenthesesPPCallbacks::argument(const Token &MacroNameTok,
> +                                           const MacroDirective *MD) {
> +  const MacroInfo *MI = MD->getMacroInfo();

Why are you passing in a MacroDirective object when you only use it to
get the MacroInfo object?

> +  enum { START, AFTER_LPAR, EXPECT_RPAR, AFTER_EQ, EXPECT_SEMI } State = START;

Please do not use all caps enumerators.

> +  for (auto TI = MI->tokens_begin(), TE = MI->tokens_end(); TI != TE; ++TI) {
> +    const Token &Tok = *TI;
> +    if (Tok.is(tok::hash) || Tok.is(tok::hashhash)) {
> +      // Skip next token
> +      ++TI;
> +      if (TI == TE)
> +        break;

I don't think that you need the if check; the continue will take care
of that for you.

> +      continue;
> +    } else if (Tok.is(tok::l_paren) || Tok.is(tok::l_square)) {
> +      State = AFTER_LPAR;
> +    } else if (Tok.is(tok::r_paren) || Tok.is(tok::r_square)) {
> +      State = START;
> +    } else if (State == EXPECT_SEMI && Tok.is(tok::semi)) {
> +      State = START;
> +    } else if ((Tok.is(tok::identifier) || Tok.is(tok::raw_identifier)) &&
> +               MI->getArgumentNum(Tok.getIdentifierInfo()) >= 0) {
> +      // Ignore this if it's the first token
> +      if (TI == MI->tokens_begin())
> +        continue;
> +      // Ignore this if previous token is a string literal, identifier, or a
> +      // period/arrow
> +      if (TI != MI->tokens_begin() &&
> +          ((TI - 1)->is(tok::string_literal) || (TI - 1)->is(tok::arrow) ||
> +           (TI - 1)->is(tok::period) || (TI - 1)->is(tok::identifier) ||
> +           (TI - 1)->is(tok::raw_identifier)))
> +        continue;
> +      // Ignore this if next token is ##, a string literal, or a identifier
> +      if ((TI + 1) != TE &&
> +          ((TI + 1)->is(tok::hashhash) || (TI + 1)->is(tok::string_literal) ||
> +           (TI + 1)->is(tok::identifier) || (TI + 1)->is(tok::raw_identifier)))
> +        continue;

The prev and next expressions are pretty obtuse and repetetive.

> +      if (State == AFTER_LPAR) {
> +        State = EXPECT_RPAR;
> +      } else if (State == AFTER_EQ) {
> +        State = EXPECT_SEMI;
> +      } else {
> +        Check->diag(Tok.getLocation(), "macro argument should be enclosed in "
> +                                       "parentheses");
> +        return;
> +      }
> +    } else if (Tok.is(tok::comma)) {
> +      State = AFTER_LPAR;
> +    } else if (Tok.is(tok::period)) {
> +      // Skip next token
> +      ++TI;
> +      if (TI == TE)
> +        break;

I don't think the if check is required here either.

> +      continue;
> +    } else if (Tok.is(tok::equal) && State == START) {
> +      State = AFTER_EQ;
> +    } else if (Tok.is(tok::star) && State == EXPECT_RPAR) {
> +      /* cast - stay in AFTER_ID */
> +    } else if (State == AFTER_LPAR) {
> +      State = START;
> +    } else if (State == EXPECT_RPAR || State == EXPECT_SEMI) {
> +      auto Prev = TI - 1;

I think you should assert that we're not at the start of the token
list, to be safe.

> +      Check->diag(Prev->getLocation(), "macro argument should be enclosed in "
> +                                       "parentheses");
> +      return;
> +    }
> +  }
> +}
> +
> +void MacroParenthesesCheck::registerPPCallbacks(CompilerInstance &Compiler) {
> +  Compiler.getPreprocessor().addPPCallbacks(
> +      llvm::make_unique<MacroParenthesesPPCallbacks>(
> +          &Compiler.getPreprocessor(), this));
> +}
> +
> +} // namespace tidy
> +} // namespace clang
> Index: clang-tidy/misc/MacroParenthesesCheck.h
> ===================================================================
> --- clang-tidy/misc/MacroParenthesesCheck.h
> +++ clang-tidy/misc/MacroParenthesesCheck.h
> @@ -0,0 +1,29 @@
> +//===--- MacroParenthesesCheck.h - clang-tidy--------------------*- C++ -*-===//
> +//
> +//                     The LLVM Compiler Infrastructure
> +//
> +// This file is distributed under the University of Illinois Open Source
> +// License. See LICENSE.TXT for details.
> +//
> +//===----------------------------------------------------------------------===//
> +
> +#ifndef LLVM_CLANG_TOOLS_EXTRA_CLANG_TIDY_MISC_MACRO_PARENTHESES_H
> +#define LLVM_CLANG_TOOLS_EXTRA_CLANG_TIDY_MISC_MACRO_PARENTHESES_H
> +
> +#include "../ClangTidy.h"
> +
> +namespace clang {
> +namespace tidy {
> +
> +class MacroParenthesesCheck : public ClangTidyCheck {
> +public:
> +  MacroParenthesesCheck(StringRef Name, ClangTidyContext *Context)
> +      : ClangTidyCheck(Name, Context) {}
> +  void registerPPCallbacks(CompilerInstance &Compiler) override;
> +};
> +
> +} // namespace tidy
> +} // namespace clang
> +
> +#endif // LLVM_CLANG_TOOLS_EXTRA_CLANG_TIDY_MISC_MACRO_PARENTHESES_H
> +
> Index: test/clang-tidy/misc-macro-parentheses.cpp
> ===================================================================
> --- test/clang-tidy/misc-macro-parentheses.cpp
> +++ test/clang-tidy/misc-macro-parentheses.cpp
> @@ -0,0 +1,32 @@
> +// RUN: $(dirname %s)/check_clang_tidy.sh %s misc-macro-parentheses %t
> +// REQUIRES: shell
> +
> +#define BAD1              -1
> +// CHECK-MESSAGES: :[[@LINE-1]]:27: warning: macro replacement list should be enclosed in parentheses [misc-macro-parentheses]
> +#define BAD2              1+2
> +// CHECK-MESSAGES: :[[@LINE-1]]:27: warning: macro replacement list should be enclosed in parentheses [misc-macro-parentheses]
> +#define BAD3(A)           (A+1)
> +// CHECK-MESSAGES: :[[@LINE-1]]:28: warning: macro argument should be enclosed in parentheses [misc-macro-parentheses]
> +#define BAD4(x)           ((unsigned char)(x & 0xff))
> +// CHECK-MESSAGES: :[[@LINE-1]]:44: warning: macro argument should be enclosed in parentheses [misc-macro-parentheses]
> +
> +#define GOOD1             1
> +#define GOOD2             (1+2)
> +#define GOOD3(A)          #A
> +#define GOOD4(A,B)        A ## B
> +#define GOOD5(T)          ((T*)0)
> +#define GOOD6(B)          "A" B "C"
> +#define GOOD7(b)          A b
> +#define GOOD8(a)          a B
> +#define GOOD9(type)       (type(123))
> +#define GOOD10(car, ...)  car
> +#define GOOD11            a[b+c]
> +#define GOOD12(x)         a[x]
> +#define GOOD13(x)         a.x
> +#define GOOD14(x)         a->x
> +#define GOOD14(x)         ({ int a = x; a+4; })
> +#define GOOD15(x)         a_ ## x, b_ ## x = c_ ## x - 1,
> +
> +// These are allowed for now..
> +#define MAYBE1            *12.34
> +#define MAYBE2            <<3
>

~Aaron




More information about the cfe-commits mailing list