[PATCH] D86874: [analyzer] Fix ArrayBoundCheckerV2 false positive regarding size_t indexer

Gabor Marton via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Fri Sep 4 04:41:46 PDT 2020


martong added a comment.

I am coping my comments that I've sent in mail, just in case.

> However, in the checker, we don't check `byte offset < 0` directly.
> We //simplify// the left part of the `byte offset < 0` inequality by substracting/dividing both sides with the constant `C`, if the `byte offset` is a `SymintExpr` of `SymExpr OP C` over the plus or multiplication operator (OP).
> We do this //simplification// recursively.
> In the end, we will have a //symbolic root// (call it `RootSym`) expression and a `C'` constant over the right-hand side of the original relational operator.
> So, after the //simplification// process we get the `RootSym < C'` inequality, which we check.

Just for the record, this is in `getSimplifiedOffsets`.

> //Note: I don't know what `RegionRawOffset` is, or what that does.//

Seems like that is just a pair of a `MemRegion` plus a concrete int `Offset`. And this is the return type for `ElementRegion::getAsArrayOffset()` where we handle only `nonloc::ConcreteInt`s. So, this is similar to RegionRawOffsetV2, but without the possibility of the symbolic index value.

> Calculation of the RegionRawOffsetV2
> ------------------------------------
>
> Let's see how does these subscript expressions used during the calculation of the `RegionRawOffsetV2`:
> For multi-dimension arrays we //fold// the index expressions of the nested `ElementRegion`s.
> We are doing that by simply calculating the byte-offset of the nested region, and adding the current level's //index*sizeof(Type)//.
> So, the `ByteOffset` that we build up will contain mostly multiplications by a constant OR additions of symbolic expressions (like the `x+1` in the example).

We have these lines in the case of handling the ElementRegion:

  if (!index.getAs<NonLoc>())
    return RegionRawOffsetV2();

So, if the index is symbolic we give up, don't we? So, how could this work with `x+1`? What am I missing here?

>   The resulting `RegionRawOffsetV2` for `p`, `q` will be:
>   p: {BaseRegion: `buf`, ByteOffset: `20 S64b`}
>   q: {BaseRegion: `buf`, ByteOffset: `(((reg_$0<int x>) + 1) * 12) + 24`}
>                                         ^^^^^^^^^^^^^^^^^^^  ^^^^^^^^^^
>                                                      |           |
>   `@a` This is an //object-language// expression. --/            |
>                                                                 /
>   `@b` The rest should be //meta-language// expression. -------/
>
> SPOILER: This distinction is really important here.
>
> So, we made an expression, we should not evaluate in either of the worlds.
> We should not evaluate it using the semantics of the //object-language// since only the `@a` is of that world.
> We should model overflow in `@a` if that expression is of //unsigned// type (OR signed but `-fwrapv`...) etc. according to the //abstract machine//.
> BUT modeling the possibility of an overflow of the rest of the expression (`@b`) would be a flaw.

Why? I'd expect that the value of the whole expression `@a at b` could overflow.

> Simplify step, again
> --------------------
>
>   Simplification of the `(((reg_$0<int x>) + 1) * 12) + 24` < `0` inequality...
>                                                 ^^^^^^^^^^
>                                                    `@b`
>   Result:  `reg_$0<int x> < -3 S64b`
>             ^^^^^^^^^^^^^   ^^^^^^^
>   `RootSym` --/                |
>                               /
>   `C'` ----------------------/
>
> `#1`: This step supposed to fold **ONLY** the //meta-language// expression parts (`@b`) of the previously aquired `ByteOffset`.
> `#2`: This step requires the expression under simplified to be of //meta-language// to be able to reorder the constants. (aka. to fold into the right hand side's constant).
> `#3`: This also requires the right-hand side to be of the //meta-language//.

Do I understand this correctly, that none of the binary operations can have a symbolic RHS, because that would mean we have a VLA?

> We treat the complete expression under //simplification// as an expression of the //meta-language//.
> I'm not changing this, but I probably should though.

Again, I don't understand why you are sure that the value of //whole// expression cannot overflow.

> Ideally, after //simplification// we should have this inequality: `x+1 < -2`
> That way we would not fold any subexpression of the //object-language//, so the `#1` requirement would be preserved.
> The right-hand side is of an expression of the //meta-language//.
> It makes sense, to evaluate the `operator<` as the semantics of the //meta-language//.
> But the left-hand side is an expression of the //object-language//.
> We need some sort of //logical// conversion here.

What if the second index is also symbolic? E.g `buf[x+1][y+1]`?
This would result in `(((reg_$0<int x>) + 1) * 12) + (reg_$1<int y>) + 1)` < `0` . And after simplification, the RHS cannot be interpreted as //meta//.

> Check if the resulting //folded constant// (`C'`) is negative or not.

What happens if the symbolic index is not the most inner index? E.g. `buf[6][x+1]`. Then `C` is no longer constant, and no longer //meta// ...


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D86874/new/

https://reviews.llvm.org/D86874



More information about the cfe-commits mailing list