[clang] [analyzer] Make recognition of hardened __FOO_chk functions explicit (PR #86536)

via cfe-commits cfe-commits at lists.llvm.org
Wed Mar 27 05:03:42 PDT 2024


https://github.com/NagyDonat updated https://github.com/llvm/llvm-project/pull/86536

>From fdde1056e8a34ad642f50eef120dbc8ee08f8825 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Don=C3=A1t=20Nagy?= <donat.nagy at ericsson.com>
Date: Mon, 18 Mar 2024 14:47:57 +0100
Subject: [PATCH 1/2] [analyzer] Make recognition of hardened __FOO_chk
 functions explicit

In builds that use source hardening (-D_FORTIFY_SOURCE), many standard
functions are implemented as macros that expand to calls of hardened
functions that take one additional argument compared to the "usual"
variant and perform additional input validation. For example, a `memcpy`
call may expand to `__memcpy_chk()` or `__builtin___memcpy_chk()`.

Before this commit, `CallDescription`s created with the matching mode
`CDM::CLibrary` automatically matched these hardened variants (in a
addition to the "usual" function) with a fairly lenient heuristic.

Unfortunately this heuristic meant that the `CLibrary` matching mode was
only usable by checkers that were prepared to handle matches with an
unusual number of arguments.

This commit limits the recognition of the hardened functions to a
separate matching mode `CDM::CLibraryMaybeHardened` and applies this
mode for functions that have hardened variants and were previously
recognized with `CDM::CLibrary`.

This way checkers that are prepared to handle the hardened variants will
be able to detect them easily; while other checkers can simply use
`CDM::CLibrary` for matching C library functions (and they won't
encounter surprising argument counts).

The initial motivation for refactoring this area was that previously
`CDM::CLibrary` accepted calls with more arguments/parameters than the
expected number, so I wasn't able to use it for `malloc` without
accidentally matching calls to the 3-argument BSD kernel malloc.

After this commit this "may have more args/params" logic will only
activate when we're actually matching a hardened variant function (in
`CDM::CLibraryMaybeHardened` mode). The recognition of "sprintf()" and
"snprintf()" in CStringChecker was refactored, because previously it was
abusing the behavior that extra arguments are accepted even if the
matched function is not a hardened variant.

This commit also fixes the oversight that the old code would've
recognized e.g. `__wmemcpy_chk` as a hardened variant of `memcpy`.

After this commit I'm planning to create several follow-up commits that
ensure that checkers looking for C library functions use `CDM::CLibrary`
as a "sane default" matching mode.

This commit is not truly NFC (it eliminates some buggy corner cases),
but it does not intentionally modify the behavior of CSA on real-world
non-crazy code.

As a minor unrelated change I'm eliminating the argument/variable
"IsBuiltin" from the evalSprintf function family in CStringChecker,
because it was completely unused.
---
 .../Core/PathSensitive/CallDescription.h      |  26 ++--
 .../Core/PathSensitive/CheckerContext.h       |  24 +++-
 .../Checkers/CStringChecker.cpp               |  76 +++++++----
 .../Checkers/GenericTaintChecker.cpp          |  27 ++--
 .../StaticAnalyzer/Core/CallDescription.cpp   | 122 +++++++++---------
 .../StaticAnalyzer/Core/CheckerContext.cpp    |  19 ++-
 .../StaticAnalyzer/CallDescriptionTest.cpp    |  54 ++++++--
 7 files changed, 217 insertions(+), 131 deletions(-)

diff --git a/clang/include/clang/StaticAnalyzer/Core/PathSensitive/CallDescription.h b/clang/include/clang/StaticAnalyzer/Core/PathSensitive/CallDescription.h
index b4e1636130ca7c..ccfe8d47c290bc 100644
--- a/clang/include/clang/StaticAnalyzer/Core/PathSensitive/CallDescription.h
+++ b/clang/include/clang/StaticAnalyzer/Core/PathSensitive/CallDescription.h
@@ -32,19 +32,22 @@ namespace ento {
 class CallDescription {
 public:
   enum class Mode {
-    /// Match calls to functions from the C standard library. On some platforms
-    /// some functions may be implemented as macros that expand to calls to
-    /// built-in variants of the given functions, so in this mode we use some
-    /// heuristics to recognize these implementation-defined variants:
-    ///  - We also accept calls where the name is derived from the specified
-    ///    name by adding "__builtin" or similar prefixes/suffixes.
-    ///  - We also accept calls where the number of arguments or parameters is
-    ///    greater than the specified value.
+    /// Match calls to functions from the C standard library. This also
+    /// recognizes builtin variants whose name is derived by adding
+    /// "__builtin", "__inline" or similar prefixes or suffixes; but only
+    /// matches functions than are externally visible and are declared either
+    /// directly within a TU or in the namespace 'std'.
     /// For the exact heuristics, see CheckerContext::isCLibraryFunction().
-    /// (This mode only matches functions that are declared either directly
-    /// within a TU or in the namespace `std`.)
     CLibrary,
 
+    /// An extended version of the `CLibrary` mode that also matches the
+    /// hardened variants like __FOO_chk() and __builtin__FOO_chk() that take
+    /// additional arguments compared to the "regular" function FOO().
+    /// This is not the default behavior of `CLibrary` because in this case the
+    /// checker code must be prepared to handle the different parametrization.
+    /// For the exact heuristics, see CheckerContext::isHardenedVariantOf().
+    CLibraryMaybeHardened,
+
     /// Matches "simple" functions that are not methods. (Static methods are
     /// methods.)
     SimpleFunc,
@@ -187,6 +190,9 @@ class CallDescription {
 private:
   bool matchesImpl(const FunctionDecl *Callee, size_t ArgCount,
                    size_t ParamCount) const;
+
+  bool matchNameOnly(const NamedDecl *ND) const;
+  bool matchQualifiedNameParts(const Decl *D) const;
 };
 
 /// An immutable map from CallDescriptions to arbitrary data. Provides a unified
diff --git a/clang/include/clang/StaticAnalyzer/Core/PathSensitive/CheckerContext.h b/clang/include/clang/StaticAnalyzer/Core/PathSensitive/CheckerContext.h
index 9923c41e6ad2d1..0365f9e41312df 100644
--- a/clang/include/clang/StaticAnalyzer/Core/PathSensitive/CheckerContext.h
+++ b/clang/include/clang/StaticAnalyzer/Core/PathSensitive/CheckerContext.h
@@ -366,19 +366,31 @@ class CheckerContext {
     return getCalleeName(FunDecl);
   }
 
-  /// Returns true if the callee is an externally-visible function in the
-  /// top-level namespace, such as \c malloc.
+  /// Returns true if the given function is an externally-visible function in
+  /// the top-level namespace, such as \c malloc.
   ///
   /// If a name is provided, the function must additionally match the given
   /// name.
   ///
-  /// Note that this deliberately excludes C++ library functions in the \c std
-  /// namespace, but will include C library functions accessed through the
-  /// \c std namespace. This also does not check if the function is declared
-  /// as 'extern "C"', or if it uses C++ name mangling.
+  /// Note that this also accepts functions from the \c std namespace (because
+  /// headers like <cstdlib> declare them there) and does not check if the
+  /// function is declared as 'extern "C"' or if it uses C++ name mangling.
   static bool isCLibraryFunction(const FunctionDecl *FD,
                                  StringRef Name = StringRef());
 
+  /// In builds that use source hardening (-D_FORTIFY_SOURCE), many standard
+  /// functions are implemented as macros that expand to calls of hardened
+  /// functions that take additional arguments compared to the "usual"
+  /// variant and perform additional input validation. For example, a `memcpy`
+  /// call may expand to `__memcpy_chk()` or `__builtin___memcpy_chk()`.
+  ///
+  /// This method returns true if `FD` declares a fortified variant of the
+  /// standard library function `Name`.
+  ///
+  /// NOTE: This method relies on heuristics; extend it if you need to handle a
+  /// hardened variant that's not yet covered by it.
+  static bool isHardenedVariantOf(const FunctionDecl *FD, StringRef Name);
+
   /// Depending on wither the location corresponds to a macro, return
   /// either the macro name or the token spelling.
   ///
diff --git a/clang/lib/StaticAnalyzer/Checkers/CStringChecker.cpp b/clang/lib/StaticAnalyzer/Checkers/CStringChecker.cpp
index 59be236ca1c769..7e18072caf76e0 100644
--- a/clang/lib/StaticAnalyzer/Checkers/CStringChecker.cpp
+++ b/clang/lib/StaticAnalyzer/Checkers/CStringChecker.cpp
@@ -124,34 +124,45 @@ class CStringChecker : public Checker< eval::Call,
                                      const CallEvent &)>;
 
   CallDescriptionMap<FnCheck> Callbacks = {
-      {{CDM::CLibrary, {"memcpy"}, 3},
+      {{CDM::CLibraryMaybeHardened, {"memcpy"}, 3},
        std::bind(&CStringChecker::evalMemcpy, _1, _2, _3, CK_Regular)},
-      {{CDM::CLibrary, {"wmemcpy"}, 3},
+      {{CDM::CLibraryMaybeHardened, {"wmemcpy"}, 3},
        std::bind(&CStringChecker::evalMemcpy, _1, _2, _3, CK_Wide)},
-      {{CDM::CLibrary, {"mempcpy"}, 3},
+      {{CDM::CLibraryMaybeHardened, {"mempcpy"}, 3},
        std::bind(&CStringChecker::evalMempcpy, _1, _2, _3, CK_Regular)},
-      {{CDM::Unspecified, {"wmempcpy"}, 3},
+      {{CDM::CLibraryMaybeHardened, {"wmempcpy"}, 3},
        std::bind(&CStringChecker::evalMempcpy, _1, _2, _3, CK_Wide)},
       {{CDM::CLibrary, {"memcmp"}, 3},
        std::bind(&CStringChecker::evalMemcmp, _1, _2, _3, CK_Regular)},
       {{CDM::CLibrary, {"wmemcmp"}, 3},
        std::bind(&CStringChecker::evalMemcmp, _1, _2, _3, CK_Wide)},
-      {{CDM::CLibrary, {"memmove"}, 3},
+      {{CDM::CLibraryMaybeHardened, {"memmove"}, 3},
        std::bind(&CStringChecker::evalMemmove, _1, _2, _3, CK_Regular)},
-      {{CDM::CLibrary, {"wmemmove"}, 3},
+      {{CDM::CLibraryMaybeHardened, {"wmemmove"}, 3},
        std::bind(&CStringChecker::evalMemmove, _1, _2, _3, CK_Wide)},
-      {{CDM::CLibrary, {"memset"}, 3}, &CStringChecker::evalMemset},
+      {{CDM::CLibraryMaybeHardened, {"memset"}, 3},
+       &CStringChecker::evalMemset},
       {{CDM::CLibrary, {"explicit_memset"}, 3}, &CStringChecker::evalMemset},
-      {{CDM::CLibrary, {"strcpy"}, 2}, &CStringChecker::evalStrcpy},
-      {{CDM::CLibrary, {"strncpy"}, 3}, &CStringChecker::evalStrncpy},
-      {{CDM::CLibrary, {"stpcpy"}, 2}, &CStringChecker::evalStpcpy},
-      {{CDM::CLibrary, {"strlcpy"}, 3}, &CStringChecker::evalStrlcpy},
-      {{CDM::CLibrary, {"strcat"}, 2}, &CStringChecker::evalStrcat},
-      {{CDM::CLibrary, {"strncat"}, 3}, &CStringChecker::evalStrncat},
-      {{CDM::CLibrary, {"strlcat"}, 3}, &CStringChecker::evalStrlcat},
-      {{CDM::CLibrary, {"strlen"}, 1}, &CStringChecker::evalstrLength},
+      /* FIXME: C23 introduces 'memset_explicit', maybe also model that */
+      {{CDM::CLibraryMaybeHardened, {"strcpy"}, 2},
+       &CStringChecker::evalStrcpy},
+      {{CDM::CLibraryMaybeHardened, {"strncpy"}, 3},
+       &CStringChecker::evalStrncpy},
+      {{CDM::CLibraryMaybeHardened, {"stpcpy"}, 2},
+       &CStringChecker::evalStpcpy},
+      {{CDM::CLibraryMaybeHardened, {"strlcpy"}, 3},
+       &CStringChecker::evalStrlcpy},
+      {{CDM::CLibraryMaybeHardened, {"strcat"}, 2},
+       &CStringChecker::evalStrcat},
+      {{CDM::CLibraryMaybeHardened, {"strncat"}, 3},
+       &CStringChecker::evalStrncat},
+      {{CDM::CLibraryMaybeHardened, {"strlcat"}, 3},
+       &CStringChecker::evalStrlcat},
+      {{CDM::CLibraryMaybeHardened, {"strlen"}, 1},
+       &CStringChecker::evalstrLength},
       {{CDM::CLibrary, {"wcslen"}, 1}, &CStringChecker::evalstrLength},
-      {{CDM::CLibrary, {"strnlen"}, 2}, &CStringChecker::evalstrnLength},
+      {{CDM::CLibraryMaybeHardened, {"strnlen"}, 2},
+       &CStringChecker::evalstrnLength},
       {{CDM::CLibrary, {"wcsnlen"}, 2}, &CStringChecker::evalstrnLength},
       {{CDM::CLibrary, {"strcmp"}, 2}, &CStringChecker::evalStrcmp},
       {{CDM::CLibrary, {"strncmp"}, 3}, &CStringChecker::evalStrncmp},
@@ -162,9 +173,19 @@ class CStringChecker : public Checker< eval::Call,
       {{CDM::CLibrary, {"bcmp"}, 3},
        std::bind(&CStringChecker::evalMemcmp, _1, _2, _3, CK_Regular)},
       {{CDM::CLibrary, {"bzero"}, 2}, &CStringChecker::evalBzero},
-      {{CDM::CLibrary, {"explicit_bzero"}, 2}, &CStringChecker::evalBzero},
-      {{CDM::CLibrary, {"sprintf"}, 2}, &CStringChecker::evalSprintf},
-      {{CDM::CLibrary, {"snprintf"}, 2}, &CStringChecker::evalSnprintf},
+      {{CDM::CLibraryMaybeHardened, {"explicit_bzero"}, 2},
+       &CStringChecker::evalBzero},
+
+      // When recognizing calls to the following variadic functions, we accept
+      // any number of arguments in the call (std::nullopt = accept any
+      // number), but check that in the declaration there are 2 and 3
+      // parameters respectively. (Note that the parameter count does not
+      // include the "...". Calls where the number of arguments is too small
+      // will be discarded by the callback.)
+      {{CDM::CLibraryMaybeHardened, {"sprintf"}, std::nullopt, 2},
+       &CStringChecker::evalSprintf},
+      {{CDM::CLibraryMaybeHardened, {"snprintf"}, std::nullopt, 3},
+       &CStringChecker::evalSnprintf},
   };
 
   // These require a bit of special handling.
@@ -218,7 +239,7 @@ class CStringChecker : public Checker< eval::Call,
   void evalSprintf(CheckerContext &C, const CallEvent &Call) const;
   void evalSnprintf(CheckerContext &C, const CallEvent &Call) const;
   void evalSprintfCommon(CheckerContext &C, const CallEvent &Call,
-                         bool IsBounded, bool IsBuiltin) const;
+                         bool IsBounded) const;
 
   // Utility methods
   std::pair<ProgramStateRef , ProgramStateRef >
@@ -2467,27 +2488,26 @@ void CStringChecker::evalBzero(CheckerContext &C, const CallEvent &Call) const {
 void CStringChecker::evalSprintf(CheckerContext &C,
                                  const CallEvent &Call) const {
   CurrentFunctionDescription = "'sprintf'";
-  const auto *CE = cast<CallExpr>(Call.getOriginExpr());
-  bool IsBI = CE->getBuiltinCallee() == Builtin::BI__builtin___sprintf_chk;
-  evalSprintfCommon(C, Call, /* IsBounded */ false, IsBI);
+  evalSprintfCommon(C, Call, /* IsBounded = */ false);
 }
 
 void CStringChecker::evalSnprintf(CheckerContext &C,
                                   const CallEvent &Call) const {
   CurrentFunctionDescription = "'snprintf'";
-  const auto *CE = cast<CallExpr>(Call.getOriginExpr());
-  bool IsBI = CE->getBuiltinCallee() == Builtin::BI__builtin___snprintf_chk;
-  evalSprintfCommon(C, Call, /* IsBounded */ true, IsBI);
+  evalSprintfCommon(C, Call, /* IsBounded = */ true);
 }
 
 void CStringChecker::evalSprintfCommon(CheckerContext &C, const CallEvent &Call,
-                                       bool IsBounded, bool IsBuiltin) const {
+                                       bool IsBounded) const {
   ProgramStateRef State = C.getState();
   const auto *CE = cast<CallExpr>(Call.getOriginExpr());
   DestinationArgExpr Dest = {{Call.getArgExpr(0), 0}};
 
   const auto NumParams = Call.parameters().size();
-  assert(CE->getNumArgs() >= NumParams);
+  if (CE->getNumArgs() < NumParams) {
+    // This is an invalid call, let's just ignore it.
+    return;
+  }
 
   const auto AllArguments =
       llvm::make_range(CE->getArgs(), CE->getArgs() + CE->getNumArgs());
diff --git a/clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp b/clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp
index 4c48fdf498f7fc..89054512d65ad6 100644
--- a/clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp
+++ b/clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp
@@ -718,20 +718,23 @@ void GenericTaintChecker::initTaintRules(CheckerContext &C) const {
       {{{"isupper"}}, TR::Prop({{0}}, {{ReturnValueIndex}})},
       {{{"isxdigit"}}, TR::Prop({{0}}, {{ReturnValueIndex}})},
 
-      {{CDM::CLibrary, {BI.getName(Builtin::BIstrncat)}},
+      {{CDM::CLibraryMaybeHardened, {BI.getName(Builtin::BIstrncat)}},
        TR::Prop({{1, 2}}, {{0, ReturnValueIndex}})},
-      {{CDM::CLibrary, {BI.getName(Builtin::BIstrlcpy)}},
+      {{CDM::CLibraryMaybeHardened, {BI.getName(Builtin::BIstrlcpy)}},
        TR::Prop({{1, 2}}, {{0}})},
-      {{CDM::CLibrary, {BI.getName(Builtin::BIstrlcat)}},
+      {{CDM::CLibraryMaybeHardened, {BI.getName(Builtin::BIstrlcat)}},
        TR::Prop({{1, 2}}, {{0}})},
-      {{CDM::CLibrary, {{"snprintf"}}},
+      {{CDM::CLibraryMaybeHardened, {{"snprintf"}}},
        TR::Prop({{1}, 3}, {{0, ReturnValueIndex}})},
-      {{CDM::CLibrary, {{"sprintf"}}},
+      {{CDM::CLibraryMaybeHardened, {{"sprintf"}}},
        TR::Prop({{1}, 2}, {{0, ReturnValueIndex}})},
-      {{CDM::CLibrary, {{"strcpy"}}}, TR::Prop({{1}}, {{0, ReturnValueIndex}})},
-      {{CDM::CLibrary, {{"stpcpy"}}}, TR::Prop({{1}}, {{0, ReturnValueIndex}})},
-      {{CDM::CLibrary, {{"strcat"}}}, TR::Prop({{1}}, {{0, ReturnValueIndex}})},
-      {{CDM::CLibrary, {{"wcsncat"}}},
+      {{CDM::CLibraryMaybeHardened, {{"strcpy"}}},
+       TR::Prop({{1}}, {{0, ReturnValueIndex}})},
+      {{CDM::CLibraryMaybeHardened, {{"stpcpy"}}},
+       TR::Prop({{1}}, {{0, ReturnValueIndex}})},
+      {{CDM::CLibraryMaybeHardened, {{"strcat"}}},
+       TR::Prop({{1}}, {{0, ReturnValueIndex}})},
+      {{CDM::CLibraryMaybeHardened, {{"wcsncat"}}},
        TR::Prop({{1}}, {{0, ReturnValueIndex}})},
       {{CDM::CLibrary, {{"strdup"}}}, TR::Prop({{0}}, {{ReturnValueIndex}})},
       {{CDM::CLibrary, {{"strdupa"}}}, TR::Prop({{0}}, {{ReturnValueIndex}})},
@@ -759,13 +762,13 @@ void GenericTaintChecker::initTaintRules(CheckerContext &C) const {
        TR::Sink({{0}, 1}, MsgUncontrolledFormatString)},
 
       // SinkProps
-      {{CDM::CLibrary, BI.getName(Builtin::BImemcpy)},
+      {{CDM::CLibraryMaybeHardened, BI.getName(Builtin::BImemcpy)},
        TR::SinkProp({{2}}, {{1, 2}}, {{0, ReturnValueIndex}},
                     MsgTaintedBufferSize)},
-      {{CDM::CLibrary, {BI.getName(Builtin::BImemmove)}},
+      {{CDM::CLibraryMaybeHardened, {BI.getName(Builtin::BImemmove)}},
        TR::SinkProp({{2}}, {{1, 2}}, {{0, ReturnValueIndex}},
                     MsgTaintedBufferSize)},
-      {{CDM::CLibrary, {BI.getName(Builtin::BIstrncpy)}},
+      {{CDM::CLibraryMaybeHardened, {BI.getName(Builtin::BIstrncpy)}},
        TR::SinkProp({{2}}, {{1, 2}}, {{0, ReturnValueIndex}},
                     MsgTaintedBufferSize)},
       {{CDM::CLibrary, {BI.getName(Builtin::BIstrndup)}},
diff --git a/clang/lib/StaticAnalyzer/Core/CallDescription.cpp b/clang/lib/StaticAnalyzer/Core/CallDescription.cpp
index 459e854cd44d9a..dcf6a2625b66f4 100644
--- a/clang/lib/StaticAnalyzer/Core/CallDescription.cpp
+++ b/clang/lib/StaticAnalyzer/Core/CallDescription.cpp
@@ -75,6 +75,48 @@ bool ento::CallDescription::matchesAsWritten(const CallExpr &CE) const {
   return matchesImpl(FD, CE.getNumArgs(), FD->param_size());
 }
 
+bool ento::CallDescription::matchNameOnly(const NamedDecl *ND) const {
+  DeclarationName Name = ND->getDeclName();
+  if (const auto *NameII = Name.getAsIdentifierInfo()) {
+    if (!II)
+      II = &ND->getASTContext().Idents.get(getFunctionName());
+
+    return NameII == *II; // Fast case.
+  }
+
+  // Fallback to the slow stringification and comparison for:
+  // C++ overloaded operators, constructors, destructors, etc.
+  // FIXME This comparison is way SLOWER than comparing pointers.
+  // At some point in the future, we should compare FunctionDecl pointers.
+  return Name.getAsString() == getFunctionName();
+};
+
+bool ento::CallDescription::matchQualifiedNameParts(const Decl *D) const {
+  const auto FindNextNamespaceOrRecord =
+      [](const DeclContext *Ctx) -> const DeclContext * {
+    while (Ctx && !isa<NamespaceDecl, RecordDecl>(Ctx))
+      Ctx = Ctx->getParent();
+    return Ctx;
+  };
+
+  auto QualifierPartsIt = begin_qualified_name_parts();
+  const auto QualifierPartsEndIt = end_qualified_name_parts();
+
+  // Match namespace and record names. Skip unrelated names if they don't
+  // match.
+  const DeclContext *Ctx = FindNextNamespaceOrRecord(D->getDeclContext());
+  for (; Ctx && QualifierPartsIt != QualifierPartsEndIt;
+       Ctx = FindNextNamespaceOrRecord(Ctx->getParent())) {
+    // If not matched just continue and try matching for the next one.
+    if (cast<NamedDecl>(Ctx)->getName() != *QualifierPartsIt)
+      continue;
+    ++QualifierPartsIt;
+  }
+
+  // We matched if we consumed all expected qualifier segments.
+  return QualifierPartsIt == QualifierPartsEndIt;
+};
+
 bool ento::CallDescription::matchesImpl(const FunctionDecl *FD, size_t ArgCount,
                                         size_t ParamCount) const {
   if (!FD)
@@ -88,76 +130,34 @@ bool ento::CallDescription::matchesImpl(const FunctionDecl *FD, size_t ArgCount,
   if (MatchAs == Mode::CXXMethod && !isMethod)
     return false;
 
-  if (MatchAs == Mode::CLibrary) {
-    return CheckerContext::isCLibraryFunction(FD, getFunctionName()) &&
-           (!RequiredArgs || *RequiredArgs <= ArgCount) &&
-           (!RequiredParams || *RequiredParams <= ParamCount);
-  }
-
-  if (!II) {
-    II = &FD->getASTContext().Idents.get(getFunctionName());
-  }
-
-  const auto MatchNameOnly = [](const CallDescription &CD,
-                                const NamedDecl *ND) -> bool {
-    DeclarationName Name = ND->getDeclName();
-    if (const auto *II = Name.getAsIdentifierInfo())
-      return II == *CD.II; // Fast case.
-
-    // Fallback to the slow stringification and comparison for:
-    // C++ overloaded operators, constructors, destructors, etc.
-    // FIXME This comparison is way SLOWER than comparing pointers.
-    // At some point in the future, we should compare FunctionDecl pointers.
-    return Name.getAsString() == CD.getFunctionName();
-  };
-
-  const auto ExactMatchArgAndParamCounts =
-      [](size_t ArgCount, size_t ParamCount,
-         const CallDescription &CD) -> bool {
-    const bool ArgsMatch = !CD.RequiredArgs || *CD.RequiredArgs == ArgCount;
-    const bool ParamsMatch =
-        !CD.RequiredParams || *CD.RequiredParams == ParamCount;
-    return ArgsMatch && ParamsMatch;
-  };
-
-  const auto MatchQualifiedNameParts = [](const CallDescription &CD,
-                                          const Decl *D) -> bool {
-    const auto FindNextNamespaceOrRecord =
-        [](const DeclContext *Ctx) -> const DeclContext * {
-      while (Ctx && !isa<NamespaceDecl, RecordDecl>(Ctx))
-        Ctx = Ctx->getParent();
-      return Ctx;
-    };
-
-    auto QualifierPartsIt = CD.begin_qualified_name_parts();
-    const auto QualifierPartsEndIt = CD.end_qualified_name_parts();
-
-    // Match namespace and record names. Skip unrelated names if they don't
-    // match.
-    const DeclContext *Ctx = FindNextNamespaceOrRecord(D->getDeclContext());
-    for (; Ctx && QualifierPartsIt != QualifierPartsEndIt;
-         Ctx = FindNextNamespaceOrRecord(Ctx->getParent())) {
-      // If not matched just continue and try matching for the next one.
-      if (cast<NamedDecl>(Ctx)->getName() != *QualifierPartsIt)
-        continue;
-      ++QualifierPartsIt;
+  if (MatchAs == Mode::CLibraryMaybeHardened) {
+    // In addition to accepting FOO() with CLibrary rules, we also want to
+    // accept calls to __FOO_chk() and __builtin___FOO_chk().
+    if (CheckerContext::isCLibraryFunction(FD) &&
+        CheckerContext::isHardenedVariantOf(FD, getFunctionName())) {
+      // Check that the actual argument/parameter counts are greater or equal
+      // to the required counts. (Setting a requirement to std::nullopt matches
+      // anything, so in that case value_or ensures that the value is compared
+      // with itself.)
+      return (RequiredArgs.value_or(ArgCount) <= ArgCount &&
+              RequiredParams.value_or(ParamCount) <= ParamCount);
     }
+  }
 
-    // We matched if we consumed all expected qualifier segments.
-    return QualifierPartsIt == QualifierPartsEndIt;
-  };
-
-  // Let's start matching...
-  if (!ExactMatchArgAndParamCounts(ArgCount, ParamCount, *this))
+  if (RequiredArgs.value_or(ArgCount) != ArgCount ||
+      RequiredParams.value_or(ParamCount) != ParamCount)
     return false;
 
-  if (!MatchNameOnly(*this, FD))
+  if (MatchAs == Mode::CLibrary || MatchAs == Mode::CLibraryMaybeHardened)
+    return CheckerContext::isCLibraryFunction(FD, getFunctionName());
+
+  if (!matchNameOnly(FD))
     return false;
 
   if (!hasQualifiedNameParts())
     return true;
 
-  return MatchQualifiedNameParts(*this, FD);
+  return matchQualifiedNameParts(FD);
 }
 
 ento::CallDescriptionSet::CallDescriptionSet(
diff --git a/clang/lib/StaticAnalyzer/Core/CheckerContext.cpp b/clang/lib/StaticAnalyzer/Core/CheckerContext.cpp
index 1a9bff529e9bb1..3af92452eaa290 100644
--- a/clang/lib/StaticAnalyzer/Core/CheckerContext.cpp
+++ b/clang/lib/StaticAnalyzer/Core/CheckerContext.cpp
@@ -110,13 +110,24 @@ bool CheckerContext::isCLibraryFunction(const FunctionDecl *FD,
   if (FName.starts_with("__inline") && FName.contains(Name))
     return true;
 
-  if (FName.starts_with("__") && FName.ends_with("_chk") &&
-      FName.contains(Name))
-    return true;
-
   return false;
 }
 
+bool CheckerContext::isHardenedVariantOf(const FunctionDecl *FD,
+                                         StringRef Name) {
+  const IdentifierInfo *II = FD->getIdentifier();
+  if (!II)
+    return false;
+
+  StringRef FName = II->getName();
+  std::string ChkName = "__" + std::string(Name) + "_chk";
+
+  // This is using `equals()` instead of more lenient prefix/suffix/substring
+  // checks because we don't want to say that e.g. `__builtin___vsprintf_chk()`
+  // is a hardened variant of `sprintf()`.
+  return FName.equals(ChkName) || FName.equals("__builtin_" + ChkName);
+}
+
 StringRef CheckerContext::getMacroNameOrSpelling(SourceLocation &Loc) {
   if (Loc.isMacroID())
     return Lexer::getImmediateMacroName(Loc, getSourceManager(),
diff --git a/clang/unittests/StaticAnalyzer/CallDescriptionTest.cpp b/clang/unittests/StaticAnalyzer/CallDescriptionTest.cpp
index 3aac1f81f523db..238f954d713310 100644
--- a/clang/unittests/StaticAnalyzer/CallDescriptionTest.cpp
+++ b/clang/unittests/StaticAnalyzer/CallDescriptionTest.cpp
@@ -488,16 +488,21 @@ TEST(CallDescription, NegativeMatchQualifiedNames) {
 }
 
 TEST(CallDescription, MatchBuiltins) {
-  // Test CDM::CLibrary - a flag that allows matching weird builtins.
-  EXPECT_TRUE(tooling::runToolOnCode(
-      std::unique_ptr<FrontendAction>(new CallDescriptionAction<>(
-          {{{{"memset"}, 3}, false}, {{CDM::CLibrary, {"memset"}, 3}, true}})),
-      "void foo() {"
-      "  int x;"
-      "  __builtin___memset_chk(&x, 0, sizeof(x),"
-      "                         __builtin_object_size(&x, 0));"
-      "}"));
-
+  // Test the matching modes CDM::CLibrary and CDM::CLibraryMaybeHardened,
+  // which can recognize builtin variants of C library functions.
+  {
+    SCOPED_TRACE("hardened variants of functions");
+    EXPECT_TRUE(tooling::runToolOnCode(
+        std::unique_ptr<FrontendAction>(new CallDescriptionAction<>(
+            {{{CDM::Unspecified, {"memset"}, 3}, false},
+             {{CDM::CLibrary, {"memset"}, 3}, false},
+             {{CDM::CLibraryMaybeHardened, {"memset"}, 3}, true}})),
+        "void foo() {"
+        "  int x;"
+        "  __builtin___memset_chk(&x, 0, sizeof(x),"
+        "                         __builtin_object_size(&x, 0));"
+        "}"));
+  }
   {
     SCOPED_TRACE("multiple similar builtins");
     EXPECT_TRUE(tooling::runToolOnCode(
@@ -518,6 +523,35 @@ TEST(CallDescription, MatchBuiltins) {
             __builtin_wmemcpy(x, y, sizeof(wchar_t));
           })"));
   }
+  {
+    SCOPED_TRACE("multiple similar builtins with hardened variant");
+    EXPECT_TRUE(tooling::runToolOnCode(
+        std::unique_ptr<FrontendAction>(new CallDescriptionAction<>(
+            {{{CDM::CLibraryMaybeHardened, {"memcpy"}, 3}, false},
+             {{CDM::CLibraryMaybeHardened, {"wmemcpy"}, 3}, true}})),
+        R"(typedef __typeof(sizeof(int)) size_t;
+          extern wchar_t *__wmemcpy_chk (wchar_t *__restrict __s1,
+                                          const wchar_t *__restrict __s2,
+                                          size_t __n, size_t __ns1);
+          void foo(wchar_t *x, wchar_t *y) {
+            __wmemcpy_chk(x, y, sizeof(wchar_t), 1234);
+          })"));
+  }
+  {
+    SCOPED_TRACE(
+        "multiple similar builtins with hardened variant reversed order");
+    EXPECT_TRUE(tooling::runToolOnCode(
+        std::unique_ptr<FrontendAction>(new CallDescriptionAction<>(
+            {{{CDM::CLibraryMaybeHardened, {"wmemcpy"}, 3}, true},
+             {{CDM::CLibraryMaybeHardened, {"memcpy"}, 3}, false}})),
+        R"(typedef __typeof(sizeof(int)) size_t;
+          extern wchar_t *__wmemcpy_chk (wchar_t *__restrict __s1,
+                                          const wchar_t *__restrict __s2,
+                                          size_t __n, size_t __ns1);
+          void foo(wchar_t *x, wchar_t *y) {
+            __wmemcpy_chk(x, y, sizeof(wchar_t), 1234);
+          })"));
+  }
   {
     SCOPED_TRACE("lookbehind and lookahead mismatches");
     EXPECT_TRUE(tooling::runToolOnCode(

>From 8a5696ef0b4eec13345280db70b0298683d23dc2 Mon Sep 17 00:00:00 2001
From: NagyDonat <donat.nagy at ericsson.com>
Date: Wed, 27 Mar 2024 13:03:36 +0100
Subject: [PATCH 2/2] Apply suggestions from code review

Co-authored-by: Balazs Benics <benicsbalazs at gmail.com>
---
 clang/lib/StaticAnalyzer/Checkers/CStringChecker.cpp |  2 +-
 clang/lib/StaticAnalyzer/Core/CheckerContext.cpp     | 12 ++++++------
 2 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/clang/lib/StaticAnalyzer/Checkers/CStringChecker.cpp b/clang/lib/StaticAnalyzer/Checkers/CStringChecker.cpp
index 7e18072caf76e0..63844563de44f1 100644
--- a/clang/lib/StaticAnalyzer/Checkers/CStringChecker.cpp
+++ b/clang/lib/StaticAnalyzer/Checkers/CStringChecker.cpp
@@ -143,7 +143,7 @@ class CStringChecker : public Checker< eval::Call,
       {{CDM::CLibraryMaybeHardened, {"memset"}, 3},
        &CStringChecker::evalMemset},
       {{CDM::CLibrary, {"explicit_memset"}, 3}, &CStringChecker::evalMemset},
-      /* FIXME: C23 introduces 'memset_explicit', maybe also model that */
+      // FIXME: C23 introduces 'memset_explicit', maybe also model that
       {{CDM::CLibraryMaybeHardened, {"strcpy"}, 2},
        &CStringChecker::evalStrcpy},
       {{CDM::CLibraryMaybeHardened, {"strncpy"}, 3},
diff --git a/clang/lib/StaticAnalyzer/Core/CheckerContext.cpp b/clang/lib/StaticAnalyzer/Core/CheckerContext.cpp
index 3af92452eaa290..113abcd4c2ab0b 100644
--- a/clang/lib/StaticAnalyzer/Core/CheckerContext.cpp
+++ b/clang/lib/StaticAnalyzer/Core/CheckerContext.cpp
@@ -119,13 +119,13 @@ bool CheckerContext::isHardenedVariantOf(const FunctionDecl *FD,
   if (!II)
     return false;
 
-  StringRef FName = II->getName();
-  std::string ChkName = "__" + std::string(Name) + "_chk";
+  auto CompletelyMatchesParts = [II](auto... Parts) -> bool {
+    StringRef FName = II->getName();
+    return (FName.consume_front(Parts) && ...) && FName.empty();
+  };
 
-  // This is using `equals()` instead of more lenient prefix/suffix/substring
-  // checks because we don't want to say that e.g. `__builtin___vsprintf_chk()`
-  // is a hardened variant of `sprintf()`.
-  return FName.equals(ChkName) || FName.equals("__builtin_" + ChkName);
+  return CompletelyMatchesParts("__", Name, "_chk") ||
+         CompletelyMatchesParts("__builtin_", "__", Name, "_chk");
 }
 
 StringRef CheckerContext::getMacroNameOrSpelling(SourceLocation &Loc) {



More information about the cfe-commits mailing list