[clang] [analyzer] Wrap SymbolicRegions by ElementRegions before getting a FieldRegion (PR #85211)

Balazs Benics via cfe-commits cfe-commits at lists.llvm.org
Thu Mar 14 04:55:15 PDT 2024


https://github.com/steakhal created https://github.com/llvm/llvm-project/pull/85211

Inside the ExprEngine when we process the initializers, we create a PostInitializer program-point, which will refer to the field being initialized, see `FieldLoc` inside `ExprEngine::ProcessInitializer`.

When a constructor (of which we evaluate the initializer-list) is analyzed in top-level context, then the `this` pointer will be represented by a `SymbolicRegion`, (as it should be).

This means that we will form a `FieldRegion{SymbolicRegion{.}}` as the initialized region.

```c++
class Bear {
public:
  void brum() const;
};
class Door {
public:
  // PostInitializer would refer to "FieldRegion{SymRegion{this}}"
  // whereas in the store and everywhere else it would be:
  // "FieldRegion{ELementRegion{SymRegion{Ty*, this}, 0, Ty}".
  Door() : ptr(nullptr) {
    ptr->brum(); // Bug
  }
private:
  Bear* ptr;
};
```

We (as CSA folks) decided to avoid the creation of FieldRegions directly of symbolic regions in the past:
https://github.com/llvm/llvm-project/commit/f8643a9b31c4029942f67d4534c9139b45173504

---

In this patch, I propose to also canonicalize it as in the mentioned patch, into this: `FieldRegion{ElementRegion{SymbolicRegion{Ty*, .}, 0, Ty}`

This would mean that FieldRegions will/should never simply wrap a SymbolicRegion directly, but rather an ElementRegion that is sitting in between.

This patch should have practically no observable effects, as the store (due to the mentioned patch) was made resilient to this issue, but we use `PostInitializer::getLocationValue()` for an alternative reporting, where we faced this issue.

Note that in really rare cases it suppresses now dereference bugs, as demonstrated in the test. It is because in the past we failed to follow the region of the PostInitializer inside the StoreSiteFinder visitor - because it was using this code:
```c++
// If this is a post initializer expression, initializing the region, we
// should track the initializer expression.
if (std::optional<PostInitializer> PIP =
        Pred->getLocationAs<PostInitializer>()) {
  const MemRegion *FieldReg = (const MemRegion *)PIP->getLocationValue();
  if (FieldReg == R) {
    StoreSite = Pred;
    InitE = PIP->getInitializer()->getInit();
  }
}
```
Notice that the equality check didn't pass for the regions I'm canonicalizing in this patch.

Given the nature of this change, we would rather upstream this patch.

CPP-4954

>From bde85e0d145049a6661afba6f4585865c5630792 Mon Sep 17 00:00:00 2001
From: Balazs Benics <benicsbalazs at gmail.com>
Date: Thu, 14 Mar 2024 12:38:12 +0100
Subject: [PATCH] [analyzer] Wrap SymbolicRegions by ElementRegions before
 getting a FieldRegion

Inside the ExprEngine when we process the initializers, we create a
PostInitializer program-point, which will refer to the field being
initialized, see `FieldLoc` inside `ExprEngine::ProcessInitializer`.

When a constructor (of which we evaluate the initializer-list) is analyzed
in top-level context, then the `this` pointer will be represented by a
`SymbolicRegion`, (as it should be).

This means that we will form a `FieldRegion{SymbolicRegion{.}}` as the
initialized region.

```c++
class Bear {
public:
  void brum() const;
};
class Door {
public:
  // PostInitializer would refer to "FieldRegion{SymRegion{this}}"
  // whereas in the store and everywhere else it would be:
  // "FieldRegion{ELementRegion{SymRegion{Ty*, this}, 0, Ty}".
  Door() : ptr(nullptr) {
    ptr->brum(); // Bug
  }
private:
  Bear* ptr;
};
```

We (as CSA folks) decided to avoid the creation of FieldRegions directly of
symbolic regions in the past:
https://github.com/llvm/llvm-project/commit/f8643a9b31c4029942f67d4534c9139b45173504

---

In this patch, I propose to also canonicalize it as in the mentioned patch,
into this: `FieldRegion{ElementRegion{SymbolicRegion{Ty*, .}, 0, Ty}`

This would mean that FieldRegions will/should never simply wrap a
SymbolicRegion directly, but rather an ElementRegion that is sitting
in between.

This patch should have practically no observable effects, as the store
(due to the mentioned patch) was made resilient to this issue, but we use
`PostInitializer::getLocationValue()` for an alternative reporting,
where we faced this issue.

Note that in really rare cases it suppresses now dereference bugs, as
demonstrated in the test. It is because in the past we failed to
follow the region of the PostInitializer inside
the StoreSiteFinder visitor - because it was using this code:
```c++
// If this is a post initializer expression, initializing the region, we
// should track the initializer expression.
if (std::optional<PostInitializer> PIP =
        Pred->getLocationAs<PostInitializer>()) {
  const MemRegion *FieldReg = (const MemRegion *)PIP->getLocationValue();
  if (FieldReg == R) {
    StoreSite = Pred;
    InitE = PIP->getInitializer()->getInit();
  }
}
```
And the equality check didn't pass for the regions I'm canonicalizing in this patch.

Given the nature of this change, we would rather upstream this patch.

CPP-4954
---
 .../Core/PathSensitive/ProgramState.h         | 14 --------
 .../lib/StaticAnalyzer/Core/ProgramState.cpp  | 32 +++++++++++++++++++
 .../inlining/false-positive-suppression.cpp   | 17 ++++++++++
 3 files changed, 49 insertions(+), 14 deletions(-)

diff --git a/clang/include/clang/StaticAnalyzer/Core/PathSensitive/ProgramState.h b/clang/include/clang/StaticAnalyzer/Core/PathSensitive/ProgramState.h
index ca75c2a756a4a5..47fc4ad88d3161 100644
--- a/clang/include/clang/StaticAnalyzer/Core/PathSensitive/ProgramState.h
+++ b/clang/include/clang/StaticAnalyzer/Core/PathSensitive/ProgramState.h
@@ -782,20 +782,6 @@ inline SVal ProgramState::getLValue(const ObjCIvarDecl *D, SVal Base) const {
   return getStateManager().StoreMgr->getLValueIvar(D, Base);
 }
 
-inline SVal ProgramState::getLValue(const FieldDecl *D, SVal Base) const {
-  return getStateManager().StoreMgr->getLValueField(D, Base);
-}
-
-inline SVal ProgramState::getLValue(const IndirectFieldDecl *D,
-                                    SVal Base) const {
-  StoreManager &SM = *getStateManager().StoreMgr;
-  for (const auto *I : D->chain()) {
-    Base = SM.getLValueField(cast<FieldDecl>(I), Base);
-  }
-
-  return Base;
-}
-
 inline SVal ProgramState::getLValue(QualType ElementType, SVal Idx, SVal Base) const{
   if (std::optional<NonLoc> N = Idx.getAs<NonLoc>())
     return getStateManager().StoreMgr->getLValueElement(ElementType, *N, Base);
diff --git a/clang/lib/StaticAnalyzer/Core/ProgramState.cpp b/clang/lib/StaticAnalyzer/Core/ProgramState.cpp
index f12f1a5ac970dd..604728cdf1dddd 100644
--- a/clang/lib/StaticAnalyzer/Core/ProgramState.cpp
+++ b/clang/lib/StaticAnalyzer/Core/ProgramState.cpp
@@ -451,6 +451,38 @@ void ProgramState::setStore(const StoreRef &newStore) {
   store = newStoreStore;
 }
 
+/// FieldRegions are expected to be wrapped by an ElementRegion as a canonical
+/// representation. See f8643a9b31c4029942f67d4534c9139b45173504 why.
+static SVal wrapSymbolicRegion(const ProgramState &State, SVal Base) {
+  const auto *SymbolicBase =
+      dyn_cast_or_null<SymbolicRegion>(Base.getAsRegion());
+
+  if (!SymbolicBase)
+    return Base;
+
+  StoreManager &SM = State.getStateManager().getStoreManager();
+  QualType ElemTy = SymbolicBase->getPointeeStaticType();
+  return loc::MemRegionVal{SM.GetElementZeroRegion(SymbolicBase, ElemTy)};
+}
+
+SVal ProgramState::getLValue(const FieldDecl *D, SVal Base) const {
+  Base = wrapSymbolicRegion(*this, Base);
+  return getStateManager().StoreMgr->getLValueField(D, Base);
+}
+
+SVal ProgramState::getLValue(const IndirectFieldDecl *D, SVal Base) const {
+  StoreManager &SM = *getStateManager().StoreMgr;
+  Base = wrapSymbolicRegion(*this, Base);
+
+  // FIXME: This should work with `SM.getLValueField(D->getAnonField(), Base)`,
+  // but that would break some tests. There is probably a bug somewhere that it
+  // would expose.
+  for (const auto *I : D->chain()) {
+    Base = SM.getLValueField(cast<FieldDecl>(I), Base);
+  }
+  return Base;
+}
+
 //===----------------------------------------------------------------------===//
 //  State pretty-printing.
 //===----------------------------------------------------------------------===//
diff --git a/clang/test/Analysis/inlining/false-positive-suppression.cpp b/clang/test/Analysis/inlining/false-positive-suppression.cpp
index 56659b4a1941cc..2f9ed7f78b3f07 100644
--- a/clang/test/Analysis/inlining/false-positive-suppression.cpp
+++ b/clang/test/Analysis/inlining/false-positive-suppression.cpp
@@ -210,3 +210,20 @@ namespace Cleanups {
     testArgumentHelper(NonTrivial().getNull());
   }
 }
+
+class Bear *getNullBear() { return nullptr; }
+class Bear {
+public:
+  void brum() const;
+};
+class Door {
+public:
+  Door() : ptr(getNullBear()) {
+    ptr->brum();
+#ifndef SUPPRESSED
+    // expected-warning at -2 {{Called C++ object pointer is null}}
+#endif
+  }
+private:
+  Bear* ptr;
+};



More information about the cfe-commits mailing list