[PATCH] clang-tidy checker that enforce proper parentheses in macros
Aaron Ballman
aaron.ballman at gmail.com
Mon May 11 09:36:53 PDT 2015
On Wed, May 6, 2015 at 10:16 AM, Daniel Marjamäki
<daniel.marjamaki at evidente.se> wrote:
> Hi aaron.ballman, silvas,
>
> This clang-tidy checker will make sure macros have proper parentheses.
>
> I previously made a compiler-check for this but that only warned when bad usage was seen. This clang-tidy checker warns when dangerous macros are seen.
>
> The rules are:
> - Expressions must be surrounded with parentheses.
> - Arguments must be surrounded with parentheses.
>
> I have tested this on 193 debian projects so far. there was 47668 warnings. I have not looked at every warning in detail but overall it looks very good. The majority of the warnings point out dangerous macros that can be misused, and I therefore classify them as true positives. Maybe there are some 1-5% that are "false positives", in most of these cases it would not hurt to add parentheses.
>
> Example false positives:
>
> ```
> #define A (INT)&a
> ```
>
> Is the & a bitand or a address-of operator. It's not trivial to know when looking at the tokens. Putting parentheses around the macro doesn't hurt.
>
> ```
> #define A INT*
> ```
>
> ... I warn about this. Can be fixed by using typedef instead. But maybe user wants to have it this way.
>
>
> and then sometimes there are warnings inside ({..}) that I think ideally would not be shown. I could bailout for instance when a { is seen in the macro, but most warnings are true positives so I think such bailout would mostly cause false negatives.
I have some fairly minor nits below, but my biggest concern is the
number of diagnostics you are seeing. ~48k new diagnostics issues is
*really* chatty!
> Index: clang-tidy/misc/MiscTidyModule.cpp
> ===================================================================
> --- clang-tidy/misc/MiscTidyModule.cpp
> +++ clang-tidy/misc/MiscTidyModule.cpp
> @@ -16,6 +16,7 @@
> #include "BoolPointerImplicitConversionCheck.h"
> #include "InaccurateEraseCheck.h"
> #include "InefficientAlgorithmCheck.h"
> +#include "MacroParenthesesCheck.h"
> #include "StaticAssertCheck.h"
> #include "SwappedArgumentsCheck.h"
> #include "UndelegatedConstructor.h"
> @@ -41,6 +42,8 @@
> "misc-inaccurate-erase");
> CheckFactories.registerCheck<InefficientAlgorithmCheck>(
> "misc-inefficient-algorithm");
> + CheckFactories.registerCheck<MacroParenthesesCheck>(
> + "misc-macro-parentheses");
> CheckFactories.registerCheck<StaticAssertCheck>(
> "misc-static-assert");
> CheckFactories.registerCheck<SwappedArgumentsCheck>(
> Index: clang-tidy/misc/CMakeLists.txt
> ===================================================================
> --- clang-tidy/misc/CMakeLists.txt
> +++ clang-tidy/misc/CMakeLists.txt
> @@ -7,6 +7,7 @@
> BoolPointerImplicitConversionCheck.cpp
> InaccurateEraseCheck.cpp
> InefficientAlgorithmCheck.cpp
> + MacroParenthesesCheck.cpp
> MiscTidyModule.cpp
> StaticAssertCheck.cpp
> SwappedArgumentsCheck.cpp
> Index: clang-tidy/misc/MacroParenthesesCheck.cpp
> ===================================================================
> --- clang-tidy/misc/MacroParenthesesCheck.cpp
> +++ clang-tidy/misc/MacroParenthesesCheck.cpp
> @@ -0,0 +1,166 @@
> +//===--- MacroParenthesesCheck.cpp - clang-tidy----------------------------===//
> +//
> +// The LLVM Compiler Infrastructure
> +//
> +// This file is distributed under the University of Illinois Open Source
> +// License. See LICENSE.TXT for details.
> +//
> +//===----------------------------------------------------------------------===//
> +
> +#include "MacroParenthesesCheck.h"
> +#include "clang/Frontend/CompilerInstance.h"
> +#include "clang/Lex/PPCallbacks.h"
> +#include "clang/Lex/Preprocessor.h"
> +
> +namespace clang {
> +namespace tidy {
> +
> +namespace {
> +class MacroParenthesesPPCallbacks : public PPCallbacks {
> +public:
> + explicit MacroParenthesesPPCallbacks(Preprocessor *PP,
> + MacroParenthesesCheck *Check)
> + : PP(PP), Check(Check) {}
> +
> + void MacroDefined(const Token &MacroNameTok,
> + const MacroDirective *MD) override {
> + replacementList(MacroNameTok, MD);
> + argument(MacroNameTok, MD);
> + }
> +
> +private:
> + /** replacement list with calculations should be enclosed in parentheses */
Here and elsewhere, comments should be complete sentences
(capitalization and punctuation included).
> + void replacementList(const Token &MacroNameTok, const MacroDirective *MD);
> +
> + /** arguments should be enclosed in parentheses */
> + void argument(const Token &MacroNameTok, const MacroDirective *MD);
> +
> + Preprocessor *PP;
> + MacroParenthesesCheck *Check;
> +};
> +}
> +
> +void MacroParenthesesPPCallbacks::replacementList(const Token &MacroNameTok,
> + const MacroDirective *MD) {
Why are you passing in a MacroDirective object when you only use it to
get the MacroInfo object?
> + const MacroInfo *MI = MD->getMacroInfo();
> +
> + // bigger than 0 => we are inside parentheses / braces / squares
> + int Indent = 0;
I'm not certain I understand why this is called Indent?
> +
> + // Is there a previous comma?
> + bool Comma = false;
> +
> + // Have we seen some calculation outside parentheses/braces/squares
> + bool Err = false;
> +
> + for (auto TI = MI->tokens_begin(), TE = MI->tokens_end(); TI != TE; ++TI) {
> + const Token &Tok = *TI;
> + if (Tok.is(tok::l_paren) || Tok.is(tok::l_brace) || Tok.is(tok::l_square)) {
> + ++Indent;
> + } else if (Tok.is(tok::r_paren) || Tok.is(tok::r_brace) ||
> + Tok.is(tok::r_square)) {
> + --Indent;
Do we also have to worry about angle brackets for C++ code?
> + } else if (Indent <= 0 && Tok.is(tok::comma)) {
> + if (Comma)
> + Err = false;
> + Comma = true;
> + } else if (Indent <= 0 &&
> + (Tok.is(tok::plus) || Tok.is(tok::minus) || Tok.is(tok::star) ||
> + Tok.is(tok::slash) || Tok.is(tok::percent) ||
> + Tok.is(tok::amp) || Tok.is(tok::pipe) || Tok.is(tok::caret) ||
> + Tok.is(tok::question) || Tok.is(tok::colon))) {
I would prefer to see this list split out into a separate helper
function with more explanation as to why something would qualify to be
in the list.
> + // heuristic for macros that are clearly not intended to be enclosed in
> + // parentheses, macro contains just an operator and a number
> + // #define X *10
> + if (TI == MI->tokens_begin() && !Tok.is(tok::plus) &&
> + !Tok.is(tok::minus) && (TI + 1) != TE &&
> + (TI + 1)->is(tok::numeric_constant) && (TI + 2) == TE) {
This expression is kind of obtuse.
> + return;
> + }
> + Err = true;
> + if (!Comma)
> + break;
> + }
> + }
> + if (Err) {
> + SourceLocation Loc = MI->tokens_begin()->getLocation();
> + Check->diag(Loc,
> + "macro replacement list should be enclosed in parentheses");
> + }
> +}
> +
> +void MacroParenthesesPPCallbacks::argument(const Token &MacroNameTok,
> + const MacroDirective *MD) {
> + const MacroInfo *MI = MD->getMacroInfo();
Why are you passing in a MacroDirective object when you only use it to
get the MacroInfo object?
> + enum { START, AFTER_LPAR, EXPECT_RPAR, AFTER_EQ, EXPECT_SEMI } State = START;
Please do not use all caps enumerators.
> + for (auto TI = MI->tokens_begin(), TE = MI->tokens_end(); TI != TE; ++TI) {
> + const Token &Tok = *TI;
> + if (Tok.is(tok::hash) || Tok.is(tok::hashhash)) {
> + // Skip next token
> + ++TI;
> + if (TI == TE)
> + break;
I don't think that you need the if check; the continue will take care
of that for you.
> + continue;
> + } else if (Tok.is(tok::l_paren) || Tok.is(tok::l_square)) {
> + State = AFTER_LPAR;
> + } else if (Tok.is(tok::r_paren) || Tok.is(tok::r_square)) {
> + State = START;
> + } else if (State == EXPECT_SEMI && Tok.is(tok::semi)) {
> + State = START;
> + } else if ((Tok.is(tok::identifier) || Tok.is(tok::raw_identifier)) &&
> + MI->getArgumentNum(Tok.getIdentifierInfo()) >= 0) {
> + // Ignore this if it's the first token
> + if (TI == MI->tokens_begin())
> + continue;
> + // Ignore this if previous token is a string literal, identifier, or a
> + // period/arrow
> + if (TI != MI->tokens_begin() &&
> + ((TI - 1)->is(tok::string_literal) || (TI - 1)->is(tok::arrow) ||
> + (TI - 1)->is(tok::period) || (TI - 1)->is(tok::identifier) ||
> + (TI - 1)->is(tok::raw_identifier)))
> + continue;
> + // Ignore this if next token is ##, a string literal, or a identifier
> + if ((TI + 1) != TE &&
> + ((TI + 1)->is(tok::hashhash) || (TI + 1)->is(tok::string_literal) ||
> + (TI + 1)->is(tok::identifier) || (TI + 1)->is(tok::raw_identifier)))
> + continue;
The prev and next expressions are pretty obtuse and repetetive.
> + if (State == AFTER_LPAR) {
> + State = EXPECT_RPAR;
> + } else if (State == AFTER_EQ) {
> + State = EXPECT_SEMI;
> + } else {
> + Check->diag(Tok.getLocation(), "macro argument should be enclosed in "
> + "parentheses");
> + return;
> + }
> + } else if (Tok.is(tok::comma)) {
> + State = AFTER_LPAR;
> + } else if (Tok.is(tok::period)) {
> + // Skip next token
> + ++TI;
> + if (TI == TE)
> + break;
I don't think the if check is required here either.
> + continue;
> + } else if (Tok.is(tok::equal) && State == START) {
> + State = AFTER_EQ;
> + } else if (Tok.is(tok::star) && State == EXPECT_RPAR) {
> + /* cast - stay in AFTER_ID */
> + } else if (State == AFTER_LPAR) {
> + State = START;
> + } else if (State == EXPECT_RPAR || State == EXPECT_SEMI) {
> + auto Prev = TI - 1;
I think you should assert that we're not at the start of the token
list, to be safe.
> + Check->diag(Prev->getLocation(), "macro argument should be enclosed in "
> + "parentheses");
> + return;
> + }
> + }
> +}
> +
> +void MacroParenthesesCheck::registerPPCallbacks(CompilerInstance &Compiler) {
> + Compiler.getPreprocessor().addPPCallbacks(
> + llvm::make_unique<MacroParenthesesPPCallbacks>(
> + &Compiler.getPreprocessor(), this));
> +}
> +
> +} // namespace tidy
> +} // namespace clang
> Index: clang-tidy/misc/MacroParenthesesCheck.h
> ===================================================================
> --- clang-tidy/misc/MacroParenthesesCheck.h
> +++ clang-tidy/misc/MacroParenthesesCheck.h
> @@ -0,0 +1,29 @@
> +//===--- MacroParenthesesCheck.h - clang-tidy--------------------*- C++ -*-===//
> +//
> +// The LLVM Compiler Infrastructure
> +//
> +// This file is distributed under the University of Illinois Open Source
> +// License. See LICENSE.TXT for details.
> +//
> +//===----------------------------------------------------------------------===//
> +
> +#ifndef LLVM_CLANG_TOOLS_EXTRA_CLANG_TIDY_MISC_MACRO_PARENTHESES_H
> +#define LLVM_CLANG_TOOLS_EXTRA_CLANG_TIDY_MISC_MACRO_PARENTHESES_H
> +
> +#include "../ClangTidy.h"
> +
> +namespace clang {
> +namespace tidy {
> +
> +class MacroParenthesesCheck : public ClangTidyCheck {
> +public:
> + MacroParenthesesCheck(StringRef Name, ClangTidyContext *Context)
> + : ClangTidyCheck(Name, Context) {}
> + void registerPPCallbacks(CompilerInstance &Compiler) override;
> +};
> +
> +} // namespace tidy
> +} // namespace clang
> +
> +#endif // LLVM_CLANG_TOOLS_EXTRA_CLANG_TIDY_MISC_MACRO_PARENTHESES_H
> +
> Index: test/clang-tidy/misc-macro-parentheses.cpp
> ===================================================================
> --- test/clang-tidy/misc-macro-parentheses.cpp
> +++ test/clang-tidy/misc-macro-parentheses.cpp
> @@ -0,0 +1,32 @@
> +// RUN: $(dirname %s)/check_clang_tidy.sh %s misc-macro-parentheses %t
> +// REQUIRES: shell
> +
> +#define BAD1 -1
> +// CHECK-MESSAGES: :[[@LINE-1]]:27: warning: macro replacement list should be enclosed in parentheses [misc-macro-parentheses]
> +#define BAD2 1+2
> +// CHECK-MESSAGES: :[[@LINE-1]]:27: warning: macro replacement list should be enclosed in parentheses [misc-macro-parentheses]
> +#define BAD3(A) (A+1)
> +// CHECK-MESSAGES: :[[@LINE-1]]:28: warning: macro argument should be enclosed in parentheses [misc-macro-parentheses]
> +#define BAD4(x) ((unsigned char)(x & 0xff))
> +// CHECK-MESSAGES: :[[@LINE-1]]:44: warning: macro argument should be enclosed in parentheses [misc-macro-parentheses]
> +
> +#define GOOD1 1
> +#define GOOD2 (1+2)
> +#define GOOD3(A) #A
> +#define GOOD4(A,B) A ## B
> +#define GOOD5(T) ((T*)0)
> +#define GOOD6(B) "A" B "C"
> +#define GOOD7(b) A b
> +#define GOOD8(a) a B
> +#define GOOD9(type) (type(123))
> +#define GOOD10(car, ...) car
> +#define GOOD11 a[b+c]
> +#define GOOD12(x) a[x]
> +#define GOOD13(x) a.x
> +#define GOOD14(x) a->x
> +#define GOOD14(x) ({ int a = x; a+4; })
> +#define GOOD15(x) a_ ## x, b_ ## x = c_ ## x - 1,
> +
> +// These are allowed for now..
> +#define MAYBE1 *12.34
> +#define MAYBE2 <<3
>
~Aaron
More information about the cfe-commits
mailing list