[PATCH] D93595: [analyzer] Fix extraction of punned and known scalar SVals

Artem Dergachev via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Tue Dec 22 22:59:43 PST 2020


NoQ added a comment.

I think you've found a very nice and compact 50% solution to the problem. I didn't think of this while i was looking for a proper fix. Very nice.



================
Comment at: clang/lib/StaticAnalyzer/Core/RegionStore.cpp:1629-1631
+static SVal getSValAsConcreteInt(SValBuilder &SVB, SVal V, QualType baseT,
+                                 QualType elemT,
+                                 const RegionRawOffset &ORegionRawOffs) {
----------------
Let's write down the contract of this method. Do i understand correctly that for a given region **R** and offset **O**, you're trying to lookup an element of type `elemT` at offset **O** while knowing that **R** has binding `V` //at offset zero//?


================
Comment at: clang/lib/StaticAnalyzer/Core/RegionStore.cpp:1634
+
+  const llvm::APSInt *ConcreteValue = [&V](SVal Val) {
+    if (auto Int = V.getAs<nonloc::ConcreteInt>())
----------------
Please feel free to add this as a method on `SVal`, with a comment that this method does not take constraints in the current program state into account.


================
Comment at: clang/lib/StaticAnalyzer/Core/RegionStore.cpp:1648
+    else
+      bitPos = ORegionRawOffs.getOffset().getQuantity();
+    return bitPos * Ctx.getCharWidth();
----------------
This assignment can overflow. Both because the raw offset can be very large and because it can be negative.

Say, the following code compiles just fine (and may even run) so we need to be able to analyze it but you dodge that by always checking that the base type is a scalar type (if you intend to keep it that way i'd rather assert that; the other reason to assert that would be because endianness logic is screwed without it):
```lang=c
int main() {
  int x[5000000000]; // Over 2³² elements.
  x[0] = 1;
  char *y = &x;
  char c = y[19999999999];
}
```

The following code is valid but you seem to be protected from it because you dodge `SymbolicRegion`s by checking that **R** is a `TypedValueRegion`:
```lang=c
void foo(int *x) {
  long *y = (long *)(x - 1);
  y = 1;
  char c = *(char *)y;
}
```

The following code has an obvious UB but it still compiles so you have to make sure the analyzer behaves sanely:

```lang=c
void foo() {
  int x;
  long *y = (long *)(x - 1);
  y = 1;
  char c = *(char *)y;
}
```


================
Comment at: clang/lib/StaticAnalyzer/Core/RegionStore.cpp:1653-1654
+      Ctx.getCharWidth() * Ctx.getTypeSizeInChars(elemT).getQuantity();
+  if (bitPosition < ConcreteValue->getBitWidth() &&
+      (numBits + bitPosition) <= ConcreteValue->getBitWidth()) {
+    llvm::APInt bits = ConcreteValue->extractBits(numBits, bitPosition);
----------------
The `ConcreteValue->getBitWidth()` part is the only thing that you can't generalize to arbitrary values for now. Fundamentally, any symbolic value has an intrinsic width (and, moreover, almost an intrinsic //type//) that you should be able to compute by just looking at the value. For now we can't rely on that though, because our values for //integral casts// over symbols aren't represented correctly, and we can't represent them correctly without a proper constraint solver to solve the resulting constraints.

So a good temporary solution would be to proactively store widths together with bindings. I.e., instead of "In region **R** at offset **O** we have value **X**" we could write down "In region **R** from offset **O** to offset **O'** we have value **X**". That'd bulk up the `RegionStore` and make some operations more difficult but it will solve the problem entirely.


================
Comment at: clang/test/Analysis/concrete-endian.cpp:49
+#elif defined(__BIG_ENDIAN__)
+  clang_analyzer_eval(pps[3] == 0x8877);      // expected-warning{{TRUE}}
+  clang_analyzer_eval(pps[2] == 0xaa99);      // expected-warning{{TRUE}}
----------------
I suspect that exactly one of `pps[0]` in the little endian case or `pps[3]` in the big endian case should be `0x7788` instead. Like, they're in the opposite order, right?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D93595/new/

https://reviews.llvm.org/D93595



More information about the cfe-commits mailing list