[clang] [analyzer] Refactor recognition of the errno getter functions (PR #91531)

Donát Nagy via cfe-commits cfe-commits at lists.llvm.org
Wed May 8 13:32:34 PDT 2024


https://github.com/NagyDonat created https://github.com/llvm/llvm-project/pull/91531

There are many environments where `errno` is a macro that expands to something like `(*__errno())` (different standard library implementations use different names instead of "__errno").
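
As a minimal sketch of this pattern, the macro makes a call to the getter function look like an ordinary variable. The names `my_errno_location` and `my_errno` are hypothetical stand-ins for illustration and are not taken from any real C library:

```cpp
#include <cassert>

// Hypothetical stand-in for a libc "errno location" getter such as
// __errno_location(); real implementations differ in name and details.
inline int *my_errno_location() {
  // One slot per thread, as is typical for errno.
  thread_local int Slot = 0;
  return &Slot;
}

// Hypothetical stand-in for the `errno` macro itself.
#define my_errno (*my_errno_location())

// Writes through the macro, then reads the value back through the getter.
inline int writeThenRead(int Value) {
  my_errno = Value;            // expands to (*my_errno_location()) = Value
  return *my_errno_location(); // same storage, accessed without the macro
}
```

Because every use of the macro is really a function call, an analyzer that wants to track `errno` has to model the return value of that call instead of a variable declaration.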

In these environments the ErrnoModeling checker creates a symbolic region which will be used to represent the return value of this "get the location of errno" function.

Previously, this symbol was created only when the checker could find the declaration of the "get the location of errno" function. This commit removes the complex logic responsible for that lookup and always creates the symbolic region when `errno` is not available as a "regular" global variable.

This significantly simplifies the code and introduces only a minimal performance cost (one extra symbol) in the unlikely case where `errno` is not declared (neither as a variable nor as a function) but the `ErrnoModeling` checker is enabled.
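
The change in decision logic can be summarized with a toy model. This is only a sketch: the enums and function names below are hypothetical and are not analyzer types, and the "before" branch reflects the old `checkBeginFunction`, which set no errno region when neither declaration was found:

```cpp
#include <cassert>

// Toy model only: these enums are hypothetical, not analyzer types.
enum class ErrnoDeclKind { Variable, LocationFunction, NotDeclared };
enum class RegionChoice { VariableRegion, SymbolicRegion, NoRegion };

// Before the patch: a symbolic region was created only when the
// declaration of the getter function had been found.
inline RegionChoice chooseRegionBefore(ErrnoDeclKind Kind) {
  switch (Kind) {
  case ErrnoDeclKind::Variable:
    return RegionChoice::VariableRegion;
  case ErrnoDeclKind::LocationFunction:
    return RegionChoice::SymbolicRegion;
  case ErrnoDeclKind::NotDeclared:
    return RegionChoice::NoRegion;
  }
  return RegionChoice::NoRegion;
}

// After the patch: anything that is not a plain variable gets the
// symbolic region, so the getter declaration no longer needs a lookup.
inline RegionChoice chooseRegionAfter(ErrnoDeclKind Kind) {
  return Kind == ErrnoDeclKind::Variable ? RegionChoice::VariableRegion
                                         : RegionChoice::SymbolicRegion;
}
```

The two functions differ only for `NotDeclared`, which is the "one extra symbol" case mentioned above.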

In addition to this simplification, this commit specifies that the `CallDescription`s for the "get the location of errno" functions are matched in `CDM::CLibrary` mode. (This was my original goal, but I was sidetracked by resolving a FIXME above the `CallDescriptionSet` in `ErrnoModeling.cpp`.)

This change is very close to being NFC, but it fixes weird corner cases like the handling of a C++ method that happens to be named "__errno()" (previously it could've been recognized as an errno location getter function).
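
To illustrate the corner case, here is a toy model of name-only matching versus a stricter mode that accepts only plain functions. `FakeCall`, `MatchMode`, and `matchesErrnoGetter` are hypothetical names invented for this sketch; they do not reproduce the analyzer's `CallDescription` API:

```cpp
#include <cassert>
#include <string>

// Toy model only: hypothetical types, not the analyzer's real classes.
struct FakeCall {
  std::string Name;
  bool IsCXXMethod;
  unsigned NumArgs;
};

enum class MatchMode { NameOnly, PlainFunctionOnly };

// Returns true if the call looks like a zero-argument "__errno" getter
// under the given matching mode.
inline bool matchesErrnoGetter(const FakeCall &Call, MatchMode Mode) {
  if (Call.Name != "__errno" || Call.NumArgs != 0)
    return false;
  // The stricter mode rejects methods that merely share the name.
  return Mode == MatchMode::NameOnly || !Call.IsCXXMethod;
}
```

In this model, a C++ method called `__errno` is accepted under `NameOnly` but rejected under `PlainFunctionOnly`, while a free function with that name is accepted in both modes.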

From 07dc4dd5c60c8a04637cce686b379e195deb5b67 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Don=C3=A1t=20Nagy?= <donat.nagy at ericsson.com>
Date: Wed, 8 May 2024 20:01:57 +0200
Subject: [PATCH] [analyzer] Refactor recognition of the errno getter functions

There are many environments where `errno` is a macro that expands to
something like `(*__errno())` (different standard library
implementations use different names instead of "__errno").

In these environments the ErrnoModeling checker creates a symbolic
region which will be used to represent the return value of this "get the
location of errno" function.

Previously this symbol was only created when the checker was able to
find the declaration of the "get the location of errno" function; but
this commit eliminates the complex logic that was responsible for this
and always creates the symbolic region when `errno` is not available as
a "regular" global variable.

This significantly simplifies the code and introduces only a minimal
performance cost (one extra symbol) in the unlikely case where
`errno` is not declared (neither as a variable nor as a function) but
the `ErrnoModeling` checker is enabled.

In addition to this simplification, this commit specifies that the
`CallDescription`s for the "get the location of errno" functions are
matched in `CDM::CLibrary` mode. (This was my original goal, but I was
sidetracked by resolving a FIXME above the `CallDescriptionSet` in
`ErrnoModeling.cpp`.)

This change is very close to being NFC, but it fixes weird corner
cases like the handling of a C++ method that happens to be named
"__errno()" (previously it could've been recognized as an errno
location getter function).
---
 .../StaticAnalyzer/Checkers/ErrnoChecker.cpp  |   2 +-
 .../StaticAnalyzer/Checkers/ErrnoModeling.cpp | 127 ++++++------------
 .../StaticAnalyzer/Checkers/ErrnoModeling.h   |   9 +-
 clang/test/Analysis/memory-model.cpp          |  18 +--
 4 files changed, 53 insertions(+), 103 deletions(-)

diff --git a/clang/lib/StaticAnalyzer/Checkers/ErrnoChecker.cpp b/clang/lib/StaticAnalyzer/Checkers/ErrnoChecker.cpp
index 18e718e085536..72fd6781a7561 100644
--- a/clang/lib/StaticAnalyzer/Checkers/ErrnoChecker.cpp
+++ b/clang/lib/StaticAnalyzer/Checkers/ErrnoChecker.cpp
@@ -205,7 +205,7 @@ void ErrnoChecker::checkPreCall(const CallEvent &Call,
   // Probably 'strerror'?
   if (CallF->isExternC() && CallF->isGlobal() &&
       C.getSourceManager().isInSystemHeader(CallF->getLocation()) &&
-      !isErrno(CallF)) {
+      !isErrnoLocationCall(Call)) {
     if (getErrnoState(C.getState()) == MustBeChecked) {
       std::optional<ento::Loc> ErrnoLoc = getErrnoLoc(C.getState());
       assert(ErrnoLoc && "ErrnoLoc should exist if an errno state is set.");
diff --git a/clang/lib/StaticAnalyzer/Checkers/ErrnoModeling.cpp b/clang/lib/StaticAnalyzer/Checkers/ErrnoModeling.cpp
index 1b34ea0e056e5..0612cd4c87248 100644
--- a/clang/lib/StaticAnalyzer/Checkers/ErrnoModeling.cpp
+++ b/clang/lib/StaticAnalyzer/Checkers/ErrnoModeling.cpp
@@ -39,10 +39,15 @@ namespace {
 // Name of the "errno" variable.
 // FIXME: Is there a system where it is not called "errno" but is a variable?
 const char *ErrnoVarName = "errno";
+
 // Names of functions that return a location of the "errno" value.
 // FIXME: Are there other similar function names?
-const char *ErrnoLocationFuncNames[] = {"__errno_location", "___errno",
-                                        "__errno", "_errno", "__error"};
+CallDescriptionSet ErrnoLocationCalls{
+    {CDM::SimpleFunc, {"__errno_location"}, 0, 0},
+    {CDM::SimpleFunc, {"___errno"}, 0, 0},
+    {CDM::SimpleFunc, {"__errno"}, 0, 0},
+    {CDM::SimpleFunc, {"_errno"}, 0, 0},
+    {CDM::SimpleFunc, {"__error"}, 0, 0}};
 
 class ErrnoModeling
     : public Checker<check::ASTDecl<TranslationUnitDecl>, check::BeginFunction,
@@ -54,16 +59,10 @@ class ErrnoModeling
   void checkLiveSymbols(ProgramStateRef State, SymbolReaper &SR) const;
   bool evalCall(const CallEvent &Call, CheckerContext &C) const;
 
-  // The declaration of an "errno" variable or "errno location" function.
-  mutable const Decl *ErrnoDecl = nullptr;
-
 private:
-  // FIXME: Names from `ErrnoLocationFuncNames` are used to build this set.
-  CallDescriptionSet ErrnoLocationCalls{{{"__errno_location"}, 0, 0},
-                                        {{"___errno"}, 0, 0},
-                                        {{"__errno"}, 0, 0},
-                                        {{"_errno"}, 0, 0},
-                                        {{"__error"}, 0, 0}};
+  // The declaration of an "errno" variable on systems where errno is
+  // represented by a variable (and not a function that queries its location).
+  mutable const Decl *ErrnoDecl = nullptr;
 };
 
 } // namespace
@@ -74,9 +73,13 @@ REGISTER_TRAIT_WITH_PROGRAMSTATE(ErrnoRegion, const MemRegion *)
 
 REGISTER_TRAIT_WITH_PROGRAMSTATE(ErrnoState, errno_modeling::ErrnoCheckState)
 
-/// Search for a variable called "errno" in the AST.
-/// Return nullptr if not found.
-static const VarDecl *getErrnoVar(ASTContext &ACtx) {
+void ErrnoModeling::checkASTDecl(const TranslationUnitDecl *D,
+                                 AnalysisManager &Mgr, BugReporter &BR) const {
+  // Try to find the declaration of the external variable `int errno;`.
+  // There are also C library implementations, where the `errno` location is
+  // accessed via a function that returns its address; in those environments
+  // this callback does nothing.
+  ASTContext &ACtx = Mgr.getASTContext();
   IdentifierInfo &II = ACtx.Idents.get(ErrnoVarName);
   auto LookupRes = ACtx.getTranslationUnitDecl()->lookup(&II);
   auto Found = llvm::find_if(LookupRes, [&ACtx](const Decl *D) {
@@ -86,47 +89,8 @@ static const VarDecl *getErrnoVar(ASTContext &ACtx) {
              VD->getType().getCanonicalType() == ACtx.IntTy;
     return false;
   });
-  if (Found == LookupRes.end())
-    return nullptr;
-
-  return cast<VarDecl>(*Found);
-}
-
-/// Search for a function with a specific name that is used to return a pointer
-/// to "errno".
-/// Return nullptr if no such function was found.
-static const FunctionDecl *getErrnoFunc(ASTContext &ACtx) {
-  SmallVector<const Decl *> LookupRes;
-  for (StringRef ErrnoName : ErrnoLocationFuncNames) {
-    IdentifierInfo &II = ACtx.Idents.get(ErrnoName);
-    llvm::append_range(LookupRes, ACtx.getTranslationUnitDecl()->lookup(&II));
-  }
-
-  auto Found = llvm::find_if(LookupRes, [&ACtx](const Decl *D) {
-    if (auto *FD = dyn_cast<FunctionDecl>(D))
-      return ACtx.getSourceManager().isInSystemHeader(FD->getLocation()) &&
-             FD->isExternC() && FD->getNumParams() == 0 &&
-             FD->getReturnType().getCanonicalType() ==
-                 ACtx.getPointerType(ACtx.IntTy);
-    return false;
-  });
-  if (Found == LookupRes.end())
-    return nullptr;
-
-  return cast<FunctionDecl>(*Found);
-}
-
-void ErrnoModeling::checkASTDecl(const TranslationUnitDecl *D,
-                                 AnalysisManager &Mgr, BugReporter &BR) const {
-  // Try to find an usable `errno` value.
-  // It can be an external variable called "errno" or a function that returns a
-  // pointer to the "errno" value. This function can have different names.
-  // The actual case is dependent on the C library implementation, we
-  // can only search for a match in one of these variations.
-  // We assume that exactly one of these cases might be true.
-  ErrnoDecl = getErrnoVar(Mgr.getASTContext());
-  if (!ErrnoDecl)
-    ErrnoDecl = getErrnoFunc(Mgr.getASTContext());
+  if (Found != LookupRes.end())
+    ErrnoDecl = cast<VarDecl>(*Found);
 }
 
 void ErrnoModeling::checkBeginFunction(CheckerContext &C) const {
@@ -136,25 +100,17 @@ void ErrnoModeling::checkBeginFunction(CheckerContext &C) const {
   ASTContext &ACtx = C.getASTContext();
   ProgramStateRef State = C.getState();
 
+  const MemRegion *ErrnoR;
+
   if (const auto *ErrnoVar = dyn_cast_or_null<VarDecl>(ErrnoDecl)) {
-    // There is an external 'errno' variable.
-    // Use its memory region.
-    // The memory region for an 'errno'-like variable is allocated in system
-    // space by MemRegionManager.
-    const MemRegion *ErrnoR =
-        State->getRegion(ErrnoVar, C.getLocationContext());
+    // There is an external 'errno' variable, so we can simply use the memory
+    // region that's associated with it.
+    ErrnoR = State->getRegion(ErrnoVar, C.getLocationContext());
     assert(ErrnoR && "Memory region should exist for the 'errno' variable.");
-    State = State->set<ErrnoRegion>(ErrnoR);
-    State =
-        errno_modeling::setErrnoValue(State, C, 0, errno_modeling::Irrelevant);
-    C.addTransition(State);
-  } else if (ErrnoDecl) {
-    assert(isa<FunctionDecl>(ErrnoDecl) && "Invalid errno location function.");
-    // There is a function that returns the location of 'errno'.
-    // We must create a memory region for it in system space.
-    // Currently a symbolic region is used with an artifical symbol.
-    // FIXME: It is better to have a custom (new) kind of MemRegion for such
-    // cases.
+  } else {
+    // The 'errno' location is accessed via a "magical" getter function, so
+    // create a new symbolic memory region that can be used as the return value
+    // of that function.
     SValBuilder &SVB = C.getSValBuilder();
     MemRegionManager &RMgr = C.getStateManager().getRegionManager();
 
@@ -162,27 +118,30 @@ void ErrnoModeling::checkBeginFunction(CheckerContext &C) const {
         RMgr.getGlobalsRegion(MemRegion::GlobalSystemSpaceRegionKind);
 
     // Create an artifical symbol for the region.
-    // It is not possible to associate a statement or expression in this case.
+    // Note that it is not possible to associate a statement or expression in
+    // this case and the `symbolTag` (opaque pointer tag) is just the address
+    // of the data member `ErrnoDecl` of the singleton `ErrnoModeling` checker
+    // object.
     const SymbolConjured *Sym = SVB.conjureSymbol(
         nullptr, C.getLocationContext(),
         ACtx.getLValueReferenceType(ACtx.IntTy), C.blockCount(), &ErrnoDecl);
 
     // The symbolic region is untyped, create a typed sub-region in it.
     // The ElementRegion is used to make the errno region a typed region.
-    const MemRegion *ErrnoR = RMgr.getElementRegion(
+    ErrnoR = RMgr.getElementRegion(
         ACtx.IntTy, SVB.makeZeroArrayIndex(),
         RMgr.getSymbolicRegion(Sym, GlobalSystemSpace), C.getASTContext());
-    State = State->set<ErrnoRegion>(ErrnoR);
-    State =
-        errno_modeling::setErrnoValue(State, C, 0, errno_modeling::Irrelevant);
-    C.addTransition(State);
   }
+  State = State->set<ErrnoRegion>(ErrnoR);
+  State =
+      errno_modeling::setErrnoValue(State, C, 0, errno_modeling::Irrelevant);
+  C.addTransition(State);
 }
 
 bool ErrnoModeling::evalCall(const CallEvent &Call, CheckerContext &C) const {
   // Return location of "errno" at a call to an "errno address returning"
   // function.
-  if (ErrnoLocationCalls.contains(Call)) {
+  if (errno_modeling::isErrnoLocationCall(Call)) {
     ProgramStateRef State = C.getState();
 
     const MemRegion *ErrnoR = State->get<ErrnoRegion>();
@@ -260,14 +219,8 @@ ProgramStateRef clearErrnoState(ProgramStateRef State) {
   return setErrnoState(State, Irrelevant);
 }
 
-bool isErrno(const Decl *D) {
-  if (const auto *VD = dyn_cast_or_null<VarDecl>(D))
-    if (const IdentifierInfo *II = VD->getIdentifier())
-      return II->getName() == ErrnoVarName;
-  if (const auto *FD = dyn_cast_or_null<FunctionDecl>(D))
-    if (const IdentifierInfo *II = FD->getIdentifier())
-      return llvm::is_contained(ErrnoLocationFuncNames, II->getName());
-  return false;
+bool isErrnoLocationCall(const CallEvent &CE) {
+  return ErrnoLocationCalls.contains(CE);
 }
 
 const NoteTag *getErrnoNoteTag(CheckerContext &C, const std::string &Message) {
diff --git a/clang/lib/StaticAnalyzer/Checkers/ErrnoModeling.h b/clang/lib/StaticAnalyzer/Checkers/ErrnoModeling.h
index 6b53572fe5e2d..3b033f26285cc 100644
--- a/clang/lib/StaticAnalyzer/Checkers/ErrnoModeling.h
+++ b/clang/lib/StaticAnalyzer/Checkers/ErrnoModeling.h
@@ -71,12 +71,9 @@ ProgramStateRef setErrnoState(ProgramStateRef State, ErrnoCheckState EState);
 /// Clear state of errno (make it irrelevant).
 ProgramStateRef clearErrnoState(ProgramStateRef State);
 
-/// Determine if a `Decl` node related to 'errno'.
-/// This is true if the declaration is the errno variable or a function
-/// that returns a pointer to the 'errno' value (usually the 'errno' macro is
-/// defined with this function). \p D is not required to be a canonical
-/// declaration.
-bool isErrno(const Decl *D);
+/// Determine if `Call` is a call to a "magical" function that returns the
+/// location of `errno` (in environments where errno is accessed this way).
+bool isErrnoLocationCall(const CallEvent &Call);
 
 /// Create a NoteTag that displays the message if the 'errno' memory region is
 /// marked as interesting, and resets the interestingness.
diff --git a/clang/test/Analysis/memory-model.cpp b/clang/test/Analysis/memory-model.cpp
index fd5a286acb60c..cd42e8c72b8bd 100644
--- a/clang/test/Analysis/memory-model.cpp
+++ b/clang/test/Analysis/memory-model.cpp
@@ -34,9 +34,9 @@ void var_simple_ref() {
 }
 
 void var_simple_ptr(int *a) {
-  clang_analyzer_dump(a);             // expected-warning {{SymRegion{reg_$0<int * a>}}}
-  clang_analyzer_dumpExtent(a);       // expected-warning {{extent_$1{SymRegion{reg_$0<int * a>}}}}
-  clang_analyzer_dumpElementCount(a); // expected-warning {{(extent_$1{SymRegion{reg_$0<int * a>}}) / 4}}
+  clang_analyzer_dump(a);             // expected-warning {{SymRegion{reg_$1<int * a>}}}
+  clang_analyzer_dumpExtent(a);       // expected-warning {{extent_$2{SymRegion{reg_$1<int * a>}}}}
+  clang_analyzer_dumpElementCount(a); // expected-warning {{(extent_$2{SymRegion{reg_$1<int * a>}}) / 4}}
 }
 
 void var_array() {
@@ -53,9 +53,9 @@ void string() {
 }
 
 void struct_simple_ptr(S *a) {
-  clang_analyzer_dump(a);             // expected-warning {{SymRegion{reg_$0<S * a>}}}
-  clang_analyzer_dumpExtent(a);       // expected-warning {{extent_$1{SymRegion{reg_$0<S * a>}}}}
-  clang_analyzer_dumpElementCount(a); // expected-warning {{(extent_$1{SymRegion{reg_$0<S * a>}}) / 4}}
+  clang_analyzer_dump(a);             // expected-warning {{SymRegion{reg_$1<S * a>}}}
+  clang_analyzer_dumpExtent(a);       // expected-warning {{extent_$2{SymRegion{reg_$1<S * a>}}}}
+  clang_analyzer_dumpElementCount(a); // expected-warning {{(extent_$2{SymRegion{reg_$1<S * a>}}) / 4}}
 }
 
 void field_ref(S a) {
@@ -65,9 +65,9 @@ void field_ref(S a) {
 }
 
 void field_ptr(S *a) {
-  clang_analyzer_dump(&a->f);             // expected-warning {{Element{SymRegion{reg_$0<S * a>},0 S64b,struct S}.f}}
-  clang_analyzer_dumpExtent(&a->f);       // expected-warning {{extent_$1{SymRegion{reg_$0<S * a>}}}}
-  clang_analyzer_dumpElementCount(&a->f); // expected-warning {{(extent_$1{SymRegion{reg_$0<S * a>}}) / 4U}}
+  clang_analyzer_dump(&a->f);             // expected-warning {{Element{SymRegion{reg_$1<S * a>},0 S64b,struct S}.f}}
+  clang_analyzer_dumpExtent(&a->f);       // expected-warning {{extent_$2{SymRegion{reg_$1<S * a>}}}}
+  clang_analyzer_dumpElementCount(&a->f); // expected-warning {{(extent_$2{SymRegion{reg_$1<S * a>}}) / 4U}}
 }
 
 void symbolic_array() {


