[PATCH] D38797: [analyzer] CStringChecker: pr34460: Admit that some casts are hard to model.

Artem Dergachev via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Wed Oct 11 06:20:34 PDT 2017


NoQ created this revision.
Herald added a subscriber: szepet.

In https://bugs.llvm.org/show_bug.cgi?id=34460 CStringChecker tries to `evalCast()` a memory region from `void *` to `char *` for the purposes of modeling `mempcpy()`. The memory region turned out to be an element region of type `unsigned int` with a symbolic index. For such regions, even such trivial casts have turned out to be unsupported (modeled as `UnknownVal`), and the checker crashed upon asserting them to be supported.

I tried to get such trivial cast working for a few days and ended up believing that it not worth the effort because of:

1. How we represent casts in the SVal hierarchy.
  - In our case, the region is an ElementRegion with element type `unsigned int`, which //essentially// represents a pointer of type `unsigned int *`, not `void *`. It means that the cast is definitely not a no-op; the region needs to be modified.
  - Now, if the offset was concrete, we already try to replace ElementRegion with another ElementRegion of the other type, and recompute element index accordingly. It is only possible if byte offset of the region is divisible by size of the target type (in our case it is because the target type is `char`). If it's not divisible, we make two ElementRegions on top of each other, first with element-type `char` to represent a raw offset, other of the target type to represent the cast. It means that for symbolic index case, we need to account for being able to receive such two-element-regions construct in the first place, and then investigate divisibility of a linear expression of form `a*N + b`, where `N` is concrete but `a` and `b` are possibly symbolic, to  a concrete type size, which our range constraint manager would never handle.
  - In fact, now we not only unpack the top one or two ElementRegions while computing the offset (that needs to be divisible), but also we unpack all other top-level element regions (i.e. if we look into a multi-dimensional array; see `ElementRegion::getAsArrayOffset()`). This was never tested; i added relevant tests into `casts.c` in this patch to demonstrate the problems caused by that (which are unrelated to the rest of the patch). I believe that unpacking multi-dimensional arrays in this way is not ideal; however, i also added tests that would fail if this behavior is disabled. This needs much more work. Of course, this work would be even harder if we intend to support symbolic offsets.
2. How we split the responsibilities between entities. When trying to implement the plan described in 1., i ended up moving a lot of code around and passing ProgramState through dozens of APIs. We need the state to perform binary operations on `SVal` objects through `SValBuilder`. Originally, `castRegion()` resides in `RegionStore` which is not intended to access the program state. It is trivial to move `castRegion` to `SimpleSValBuilder`, but passing `State` into it results in a huge cascade of changes in `SValBuilder` method arguments. I have given up when i had to pass `State` to `SValBuilder::getConstantVal(const Expr *);`, which seemed ridiculous (it's constant, why does it depend on the state?). Introducing all this complexity only to compute an `SVal` that nobody would be able to understand was not worth the effort, in my opinion.

Hence the quick fix. Code slightly simplified to use a more high-level API.


https://reviews.llvm.org/D38797

Files:
  lib/StaticAnalyzer/Checkers/CStringChecker.cpp
  test/Analysis/bstring.cpp
  test/Analysis/casts.c

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D38797.118597.patch
Type: text/x-patch
Size: 5689 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/cfe-commits/attachments/20171011/5f528c3d/attachment.bin>


More information about the cfe-commits mailing list