[clang-tools-extra] [clang-tidy] Add new check bugprone-tagged-union-member-count (PR #89925)

via cfe-commits cfe-commits at lists.llvm.org
Wed Apr 24 06:31:45 PDT 2024


https://github.com/tigbr created https://github.com/llvm/llvm-project/pull/89925

This patch introduces a new check to find mismatches between the number of data members in a union and the number enum values present in variant-like structures.

Variant-like types can look something like this:

```c++
struct variant {
    enum {
        tag1,
        tag2,
    } kind;
    union {
        int i;
        char c;
  } data;
};
```

The kind data member of the variant is supposed to tell which data member of the union is valid, however if there are fewer enum values than union members, then it is likely a mistake.

The opposite is not that obvious, because it might be fine to have more enum values than union data members, but for the time being I am curious how many real bugs can be caught if we give a warning regardless.

This patch also contains a heuristic where we try to guess whether the last enum constant is actually supposed to be a tag value for the variant or whether it is just holding how many enum constants have been created.

I am still collecting data on how many bugs this check can catch on open source software and I will update the pull-request with the results.

>From 0dff511a0f63480dea527b6e823dcf230755f312 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?G=C3=A1bor=20T=C3=B3thv=C3=A1ri?= <tigbrcode at protonmail.com>
Date: Sun, 3 Mar 2024 22:30:32 +0100
Subject: [PATCH] [clang-tidy] Add new check bugprone-tagged-union-member-count

This patch introduces a new check to find mismatches between the number
of data members in a union and the number enum values present in
variant-like structures.

Variant-like types can look something like this:

```c++
struct variant {
    enum {
        tag1,
        tag2,
    } kind;
    union {
        int i;
        char c;
  } data;
};
```

The kind data member of the variant is supposed to tell which data
member of the union is valid, however if there are fewer enum values
than union members, then it is likely a mistake.

The opposite is not that obvious, because it might be fine to have more
enum values than union data members, but for the time being I am curious
how many real bugs can be caught if we give a warning regardless.

This patch also contains a heuristic where we try to guess whether the
last enum constant is actually supposed to be a tag value for the
variant or whether it is just holding how many enum constants have been
created.

I am still collecting data on how many bugs this check can catch on open
source software and I will update the pull-request with the results.
---
 .../bugprone/BugproneTidyModule.cpp           |   3 +
 .../clang-tidy/bugprone/CMakeLists.txt        |   1 +
 .../bugprone/TaggedUnionMemberCountCheck.cpp  | 125 ++++++++++++
 .../bugprone/TaggedUnionMemberCountCheck.h    |  31 +++
 clang-tools-extra/docs/ReleaseNotes.rst       |   6 +
 .../bugprone/tagged-union-member-count.rst    |  66 +++++++
 .../docs/clang-tidy/checks/list.rst           |   1 +
 .../bugprone/tagged-union-member-count.cpp    | 181 ++++++++++++++++++
 8 files changed, 414 insertions(+)
 create mode 100644 clang-tools-extra/clang-tidy/bugprone/TaggedUnionMemberCountCheck.cpp
 create mode 100644 clang-tools-extra/clang-tidy/bugprone/TaggedUnionMemberCountCheck.h
 create mode 100644 clang-tools-extra/docs/clang-tidy/checks/bugprone/tagged-union-member-count.rst
 create mode 100644 clang-tools-extra/test/clang-tidy/checkers/bugprone/tagged-union-member-count.cpp

diff --git a/clang-tools-extra/clang-tidy/bugprone/BugproneTidyModule.cpp b/clang-tools-extra/clang-tidy/bugprone/BugproneTidyModule.cpp
index 2931325d8b5798..d9e094e11bdf09 100644
--- a/clang-tools-extra/clang-tidy/bugprone/BugproneTidyModule.cpp
+++ b/clang-tools-extra/clang-tidy/bugprone/BugproneTidyModule.cpp
@@ -76,6 +76,7 @@
 #include "SuspiciousStringviewDataUsageCheck.h"
 #include "SwappedArgumentsCheck.h"
 #include "SwitchMissingDefaultCaseCheck.h"
+#include "TaggedUnionMemberCountCheck.h"
 #include "TerminatingContinueCheck.h"
 #include "ThrowKeywordMissingCheck.h"
 #include "TooSmallLoopVariableCheck.h"
@@ -223,6 +224,8 @@ class BugproneModule : public ClangTidyModule {
         "bugprone-suspicious-stringview-data-usage");
     CheckFactories.registerCheck<SwappedArgumentsCheck>(
         "bugprone-swapped-arguments");
+    CheckFactories.registerCheck<TaggedUnionMemberCountCheck>(
+        "bugprone-tagged-union-member-count");
     CheckFactories.registerCheck<TerminatingContinueCheck>(
         "bugprone-terminating-continue");
     CheckFactories.registerCheck<ThrowKeywordMissingCheck>(
diff --git a/clang-tools-extra/clang-tidy/bugprone/CMakeLists.txt b/clang-tools-extra/clang-tidy/bugprone/CMakeLists.txt
index 081ba67efe1538..5e7fa08aece02e 100644
--- a/clang-tools-extra/clang-tidy/bugprone/CMakeLists.txt
+++ b/clang-tools-extra/clang-tidy/bugprone/CMakeLists.txt
@@ -71,6 +71,7 @@ add_clang_library(clangTidyBugproneModule
   SuspiciousSemicolonCheck.cpp
   SuspiciousStringCompareCheck.cpp
   SwappedArgumentsCheck.cpp
+  TaggedUnionMemberCountCheck.cpp
   TerminatingContinueCheck.cpp
   ThrowKeywordMissingCheck.cpp
   TooSmallLoopVariableCheck.cpp
diff --git a/clang-tools-extra/clang-tidy/bugprone/TaggedUnionMemberCountCheck.cpp b/clang-tools-extra/clang-tidy/bugprone/TaggedUnionMemberCountCheck.cpp
new file mode 100644
index 00000000000000..6a7b5a92902c86
--- /dev/null
+++ b/clang-tools-extra/clang-tidy/bugprone/TaggedUnionMemberCountCheck.cpp
@@ -0,0 +1,125 @@
+//===--- TaggedUnionMemberCountCheck.cpp - clang-tidy ---------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#include "TaggedUnionMemberCountCheck.h"
+#include "clang/AST/ASTContext.h"
+#include "clang/AST/Expr.h"
+#include "clang/ASTMatchers/ASTMatchFinder.h"
+
+#include "clang/AST/PrettyPrinter.h"
+#include "llvm/ADT/STLExtras.h"
+#include "llvm/Support/Casting.h"
+#include <limits>
+
+using namespace clang::ast_matchers;
+
+namespace clang::tidy::bugprone {
+
+void TaggedUnionMemberCountCheck::registerMatchers(MatchFinder *Finder) {
+  Finder->addMatcher(
+      recordDecl(
+          allOf(isStruct(),
+                has(fieldDecl(hasType(recordDecl(isUnion()).bind("union")))),
+                has(fieldDecl(hasType(enumDecl().bind("tags"))))))
+          .bind("root"),
+      this);
+}
+
+static bool hasMultipleUnionsOrEnums(const RecordDecl *rec) {
+  int tags = 0;
+  int unions = 0;
+  for (const FieldDecl *r : rec->fields()) {
+    TypeSourceInfo *info = r->getTypeSourceInfo();
+    QualType qualtype = info->getType();
+    const Type *type = qualtype.getTypePtr();
+    if (type->isUnionType())
+      unions += 1;
+    else if (type->isEnumeralType())
+      tags += 1;
+    if (tags > 1 || unions > 1)
+      return true;
+  }
+  return false;
+}
+
+static int64_t getNumberOfValidEnumValues(const EnumDecl *ed) {
+  int64_t maxTagValue = std::numeric_limits<int64_t>::min();
+  int64_t minTagValue = std::numeric_limits<int64_t>::max();
+
+  // Heuristic for counter enum constants.
+  //
+  //   enum tag_with_counter {
+  //     tag1,
+  //     tag2,
+  //     tag_count, <-- Searching for these enum constants
+  //   };
+  //
+  // The 'ce' prefix is used to abbreviate counterEnum.
+  // The final tag count is decreased by 1 if and only if:
+  // 1. The number of counting enum constants = 1,
+  int ceCount = 0;
+  // 2. The counting enum constant is the last enum constant that is defined,
+  int ceFirstIndex = 0;
+  // 3. The value of the counting enum constant is the largest out of every enum constant.
+  int64_t ceValue = 0;
+
+  int64_t enumConstantsCount = 0;
+  for (auto En : llvm::enumerate(ed->enumerators())) {
+    enumConstantsCount += 1;
+
+    int64_t enumValue = En.value()->getInitVal().getExtValue();
+    StringRef enumName = En.value()->getName();
+
+    if (enumValue > maxTagValue)
+      maxTagValue = enumValue;
+    if (enumValue < minTagValue)
+      minTagValue = enumValue;
+
+    if (enumName.ends_with_insensitive("count")) {
+      if (ceCount == 0) {
+        ceFirstIndex = En.index();
+      }
+      ceValue = enumValue;
+      ceCount += 1;
+    }
+  }
+
+  int64_t validValuesCount = maxTagValue - minTagValue + 1;
+  if (ceCount == 1 &&
+      ceFirstIndex == enumConstantsCount - 1 &&
+      ceValue == maxTagValue) {
+    validValuesCount -= 1;
+  }
+  return validValuesCount;
+}
+
+void TaggedUnionMemberCountCheck::check(
+    const MatchFinder::MatchResult &Result) {
+  const auto *root = Result.Nodes.getNodeAs<RecordDecl>("root");
+  const auto *unionMatch = Result.Nodes.getNodeAs<RecordDecl>("union");
+  const auto *tagMatch = Result.Nodes.getNodeAs<EnumDecl>("tags");
+
+  if (hasMultipleUnionsOrEnums(root))
+    return;
+
+  int64_t unionMemberCount = llvm::range_size(unionMatch->fields());
+  int64_t tagCount = getNumberOfValidEnumValues(tagMatch);
+
+  // FIXME: Maybe a emit a note when a counter enum constant was found.
+  if (unionMemberCount > tagCount) {
+    diag(root->getLocation(), "Tagged union has more data members than tags! "
+                              "Data members: %0 Tags: %1")
+        << unionMemberCount << tagCount;
+  } else if (unionMemberCount < tagCount) {
+    diag(root->getLocation(), "Tagged union has fewer data members than tags! "
+                              "Data members: %0 Tags: %1")
+        << unionMemberCount << tagCount;
+  }
+}
+
+} // namespace clang::tidy::bugprone
diff --git a/clang-tools-extra/clang-tidy/bugprone/TaggedUnionMemberCountCheck.h b/clang-tools-extra/clang-tidy/bugprone/TaggedUnionMemberCountCheck.h
new file mode 100644
index 00000000000000..bde42da23ff38f
--- /dev/null
+++ b/clang-tools-extra/clang-tidy/bugprone/TaggedUnionMemberCountCheck.h
@@ -0,0 +1,31 @@
+//===--- TaggedUnionMemberCountCheck.h - clang-tidy -------------*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef LLVM_CLANG_TOOLS_EXTRA_CLANG_TIDY_BUGPRONE_TAGGEDUNIONMEMBERCOUNTCHECK_H
+#define LLVM_CLANG_TOOLS_EXTRA_CLANG_TIDY_BUGPRONE_TAGGEDUNIONMEMBERCOUNTCHECK_H
+
+#include "../ClangTidyCheck.h"
+
+namespace clang::tidy::bugprone {
+
+// Gives warnings for tagged unions, where the number of tags is
+// different from the number of data members inside the union.
+//
+// For the user-facing documentation see:
+// http://clang.llvm.org/extra/clang-tidy/checks/bugprone/tagged-union-member-count.html
+class TaggedUnionMemberCountCheck : public ClangTidyCheck {
+public:
+  TaggedUnionMemberCountCheck(StringRef Name, ClangTidyContext *Context)
+      : ClangTidyCheck(Name, Context) {}
+  void registerMatchers(ast_matchers::MatchFinder *Finder) override;
+  void check(const ast_matchers::MatchFinder::MatchResult &Result) override;
+};
+
+} // namespace clang::tidy::bugprone
+
+#endif // LLVM_CLANG_TOOLS_EXTRA_CLANG_TIDY_BUGPRONE_TAGGEDUNIONMEMBERCOUNTCHECK_H
diff --git a/clang-tools-extra/docs/ReleaseNotes.rst b/clang-tools-extra/docs/ReleaseNotes.rst
index 78b09d23d4427f..9f9069a8d08923 100644
--- a/clang-tools-extra/docs/ReleaseNotes.rst
+++ b/clang-tools-extra/docs/ReleaseNotes.rst
@@ -129,6 +129,12 @@ New checks
   Replaces certain conditional statements with equivalent calls to
   ``std::min`` or ``std::max``.
 
+- New :doc:`bugprone-tagged-union-member-count
+  <clang-tidy/checks/bugprone/tagged-union-member-count>` check.
+
+  Warns about suspicious looking tagged unions where the number of enum
+  constants and the number of union data members are not equal.
+
 New check aliases
 ^^^^^^^^^^^^^^^^^
 
diff --git a/clang-tools-extra/docs/clang-tidy/checks/bugprone/tagged-union-member-count.rst b/clang-tools-extra/docs/clang-tidy/checks/bugprone/tagged-union-member-count.rst
new file mode 100644
index 00000000000000..7eac93c6c96021
--- /dev/null
+++ b/clang-tools-extra/docs/clang-tidy/checks/bugprone/tagged-union-member-count.rst
@@ -0,0 +1,66 @@
+.. title:: clang-tidy - bugprone-tagged-union-member-count
+
+bugprone-tagged-union-member-count
+==================================
+
+Gives warnings for tagged unions, where the number of tags is
+different from the number of data members inside the union.
+
+A struct or a class is considered to be a tagged union if it has
+exactly one union data member and exactly one enum data member and
+any number of other data members that are neither unions or enums.
+
+Example
+-------
+
+.. code-block:: c++
+
+  enum tags2 {
+    tag1,
+    tag2,
+  };
+
+  struct taggedunion { // warning: Tagged union has more data members than tags! Data members: 3 Tags: 2 [bugprone-tagged-union-member-count]
+    enum tags2 kind;
+    union {
+      int i;
+      float f;
+      char *str;
+    } data;
+  };
+
+  enum tags4 {
+    tag1,
+    tag2,
+    tag3,
+    tag4,
+  };
+  
+  struct taggedunion { // warning: Tagged union has more fewer members than tags! Data members: 3 Tags: 4 [bugprone-tagged-union-member-count]
+    enum tags4 kind;
+    union {
+      int i;
+      float f;
+      char *str;
+    } data;
+  };
+
+Counting enum constant heuristic
+--------------------------------
+
+Some programmers use enums in such a way, where the value of the last enum 
+constant is used to keep track of how many enum constants have been declared.
+
+.. code-block:: c++
+
+  enum tags_with_counter {
+    tag1, // is 0
+    tag2, // is 1
+    tag3, // is 2
+    tags_count, // is 3
+  };
+
+The checker detects this usage pattern heuristically and does not include
+the counter enum constant in the final tag count, since the counter is not
+meant to indicate the valid variant member.
+
diff --git a/clang-tools-extra/docs/clang-tidy/checks/list.rst b/clang-tools-extra/docs/clang-tidy/checks/list.rst
index 79e81dd174e4f3..3058bee1357fc9 100644
--- a/clang-tools-extra/docs/clang-tidy/checks/list.rst
+++ b/clang-tools-extra/docs/clang-tidy/checks/list.rst
@@ -142,6 +142,7 @@ Clang-Tidy Checks
    :doc:`bugprone-suspicious-stringview-data-usage <bugprone/suspicious-stringview-data-usage>`,
    :doc:`bugprone-swapped-arguments <bugprone/swapped-arguments>`, "Yes"
    :doc:`bugprone-switch-missing-default-case <bugprone/switch-missing-default-case>`,
+   :doc:`bugprone-tagged-union-member-count <bugprone/tagged-union-member-count>`, "Yes"
    :doc:`bugprone-terminating-continue <bugprone/terminating-continue>`, "Yes"
    :doc:`bugprone-throw-keyword-missing <bugprone/throw-keyword-missing>`,
    :doc:`bugprone-too-small-loop-variable <bugprone/too-small-loop-variable>`,
diff --git a/clang-tools-extra/test/clang-tidy/checkers/bugprone/tagged-union-member-count.cpp b/clang-tools-extra/test/clang-tidy/checkers/bugprone/tagged-union-member-count.cpp
new file mode 100644
index 00000000000000..91165f1bd64e08
--- /dev/null
+++ b/clang-tools-extra/test/clang-tidy/checkers/bugprone/tagged-union-member-count.cpp
@@ -0,0 +1,181 @@
+// RUN: %check_clang_tidy %s bugprone-tagged-union-member-count %t
+
+enum tags3 {
+	tags3_1,
+	tags3_2,
+	tags3_3,
+};
+
+enum tags4 {
+	tags4_1,
+	tags4_2,
+	tags4_3,
+	tags4_4,
+};
+
+enum tags5 {
+	tags5_1,
+	tags5_2,
+	tags5_3,
+	tags5_4,
+	tags5_5,
+};
+
+union union3 {
+	short *shorts;
+	int *ints;
+	float *floats;
+};
+
+union union4 {
+	short *shorts;
+	double *doubles;
+	int *ints;
+	float *floats;
+};
+
+// It is not obvious which enum is the tag for the union.
+struct taggedunion2 { // No warnings expected.
+	enum tags3 tagA;
+	enum tags4 tagB;
+	union union4 data;
+};
+
+// It is not obvious which union does the tag belong to.
+struct taggedunion4 { // No warnings expected.
+	enum tags3 tag;
+	union union3 dataB;
+	union union3 dataA;
+};
+
+struct taggedunion1 { // CHECK-MESSAGES: :[[@LINE]]:8: warning: Tagged union has more data members than tags! Data members: 4 Tags: 3 [bugprone-tagged-union-member-count]
+	enum tags3 tag;
+    union union4 data;
+};
+
+struct taggedunion5 { // CHECK-MESSAGES: :[[@LINE]]:8: warning: Tagged union has more data members than tags! Data members: 4 Tags: 3 [bugprone-tagged-union-member-count]
+	enum tags3 tag;
+    union {
+		int *ints;
+		char characters[13];
+		struct {
+			double re;
+			double im;
+		} complex;
+		long l;
+    } data;
+};
+
+struct taggedunion7 { // CHECK-MESSAGES: :[[@LINE]]:8: warning: Tagged union has more data members than tags! Data members: 4 Tags: 3 [bugprone-tagged-union-member-count]
+	enum {
+		tag1,
+		tag2,
+		tag3,
+	} tag;
+	union union4 data;
+};
+
+struct taggedunion8 { // CHECK-MESSAGES: :[[@LINE]]:8: warning: Tagged union has more data members than tags! Data members: 4 Tags: 3 [bugprone-tagged-union-member-count]
+	enum {
+		tag1,
+		tag2,
+		tag3,
+	} tag;
+	union {
+		int *ints;
+		char characters[13];
+		struct {
+			double re;
+			double im;
+		} complex;
+		long l;
+	} data;
+};
+
+struct nested1 { // CHECK-MESSAGES: :[[@LINE]]:8: warning: Tagged union has more data members than tags! Data members: 4 Tags: 3 [bugprone-tagged-union-member-count]
+	enum tags3 tag;
+	union {
+		char c;
+		short s;
+		int i;
+		struct { // CHECK-MESSAGES: :[[@LINE]]:3: warning: Tagged union has fewer data members than tags! Data members: 4 Tags: 5 [bugprone-tagged-union-member-count]
+			enum tags5 tag;
+			union union4 data;
+		} inner;
+	} data;
+};
+
+struct nested2 {
+	enum tags3 tag;
+	union {
+		float f;
+		int i;
+		struct { // CHECK-MESSAGES: :[[@LINE]]:3: warning: Tagged union has more data members than tags! Data members: 4 Tags: 3 [bugprone-tagged-union-member-count]
+			enum tags3 tag;
+			union union4 data;
+		} inner;
+	} data;
+};
+
+struct nested3 { // CHECK-MESSAGES: :[[@LINE]]:8: warning: Tagged union has fewer data members than tags! Data members: 2 Tags: 3 [bugprone-tagged-union-member-count]
+	enum tags3 tag;
+	union {
+		float f;
+		int i;
+		struct innerdecl { // CHECK-MESSAGES: :[[@LINE]]:10: warning: Tagged union has more data members than tags! Data members: 4 Tags: 3 [bugprone-tagged-union-member-count]
+			enum tags3 tag;
+			union union4 data;
+		}; 
+	} data;
+};
+
+enum tag_with_counter_lowercase {
+	node_type_loop,
+	node_type_branch,
+	node_type_function,
+	node_type_count,
+};
+
+enum tag_with_counter_uppercase {
+	NODE_TYPE_LOOP,
+	NODE_TYPE_BRANCH,
+	NODE_TYPE_FUNCTION,
+	NODE_TYPE_COUNT,
+};
+
+struct taggedunion10 { // CHECK-MESSAGES: :[[@LINE]]:8: warning: Tagged union has more data members than tags! Data members: 4 Tags: 3 [bugprone-tagged-union-member-count]
+	enum tag_with_counter_lowercase tag;
+	union union4 data;
+};
+
+struct taggedunion11 { // CHECK-MESSAGES: :[[@LINE]]:8: warning: Tagged union has more data members than tags! Data members: 4 Tags: 3 [bugprone-tagged-union-member-count]
+	enum tag_with_counter_uppercase tag;
+	union union4 data;
+};
+
+// Without the counter enum constant the number of tags
+// and the number data members are equal.
+struct taggedunion12 { // No warnings expected.
+	enum tag_with_counter_uppercase tag;
+	union union3 data;
+};
+
+// Technically this means that every enum value is defined from 0-256 and therefore a warning is given.
+enum mycolor {
+	mycolor_black = 0x00,
+	mycolor_gray  = 0xcc,
+	mycolor_white = 0xff,
+};
+
+struct taggedunion9 { // CHECK-MESSAGES: :[[@LINE]]:8: warning: Tagged union has fewer data members than tags! Data members: 3 Tags: 256 [bugprone-tagged-union-member-count]
+	enum mycolor tag;
+	union {
+		int a;
+		float b;
+		struct {
+			double re;
+			double im;
+		} complex;
+	} data;
+};
+



More information about the cfe-commits mailing list