[PATCH] D86874: [analyzer] Fix ArrayBoundCheckerV2 false positive regarding size_t indexer
Gabor Marton via Phabricator via cfe-commits
cfe-commits at lists.llvm.org
Fri Sep 4 04:41:46 PDT 2020
martong added a comment.
I am coping my comments that I've sent in mail, just in case.
> However, in the checker, we don't check `byte offset < 0` directly.
> We //simplify// the left part of the `byte offset < 0` inequality by substracting/dividing both sides with the constant `C`, if the `byte offset` is a `SymintExpr` of `SymExpr OP C` over the plus or multiplication operator (OP).
> We do this //simplification// recursively.
> In the end, we will have a //symbolic root// (call it `RootSym`) expression and a `C'` constant over the right-hand side of the original relational operator.
> So, after the //simplification// process we get the `RootSym < C'` inequality, which we check.
Just for the record, this is in `getSimplifiedOffsets`.
> //Note: I don't know what `RegionRawOffset` is, or what that does.//
Seems like that is just a pair of a `MemRegion` plus a concrete int `Offset`. And this is the return type for `ElementRegion::getAsArrayOffset()` where we handle only `nonloc::ConcreteInt`s. So, this is similar to RegionRawOffsetV2, but without the possibility of the symbolic index value.
> Calculation of the RegionRawOffsetV2
> ------------------------------------
>
> Let's see how does these subscript expressions used during the calculation of the `RegionRawOffsetV2`:
> For multi-dimension arrays we //fold// the index expressions of the nested `ElementRegion`s.
> We are doing that by simply calculating the byte-offset of the nested region, and adding the current level's //index*sizeof(Type)//.
> So, the `ByteOffset` that we build up will contain mostly multiplications by a constant OR additions of symbolic expressions (like the `x+1` in the example).
We have these lines in the case of handling the ElementRegion:
if (!index.getAs<NonLoc>())
return RegionRawOffsetV2();
So, if the index is symbolic we give up, don't we? So, how could this work with `x+1`? What am I missing here?
> The resulting `RegionRawOffsetV2` for `p`, `q` will be:
> p: {BaseRegion: `buf`, ByteOffset: `20 S64b`}
> q: {BaseRegion: `buf`, ByteOffset: `(((reg_$0<int x>) + 1) * 12) + 24`}
> ^^^^^^^^^^^^^^^^^^^ ^^^^^^^^^^
> | |
> `@a` This is an //object-language// expression. --/ |
> /
> `@b` The rest should be //meta-language// expression. -------/
>
> SPOILER: This distinction is really important here.
>
> So, we made an expression, we should not evaluate in either of the worlds.
> We should not evaluate it using the semantics of the //object-language// since only the `@a` is of that world.
> We should model overflow in `@a` if that expression is of //unsigned// type (OR signed but `-fwrapv`...) etc. according to the //abstract machine//.
> BUT modeling the possibility of an overflow of the rest of the expression (`@b`) would be a flaw.
Why? I'd expect that the value of the whole expression `@a at b` could overflow.
> Simplify step, again
> --------------------
>
> Simplification of the `(((reg_$0<int x>) + 1) * 12) + 24` < `0` inequality...
> ^^^^^^^^^^
> `@b`
> Result: `reg_$0<int x> < -3 S64b`
> ^^^^^^^^^^^^^ ^^^^^^^
> `RootSym` --/ |
> /
> `C'` ----------------------/
>
> `#1`: This step supposed to fold **ONLY** the //meta-language// expression parts (`@b`) of the previously aquired `ByteOffset`.
> `#2`: This step requires the expression under simplified to be of //meta-language// to be able to reorder the constants. (aka. to fold into the right hand side's constant).
> `#3`: This also requires the right-hand side to be of the //meta-language//.
Do I understand this correctly, that none of the binary operations can have a symbolic RHS, because that would mean we have a VLA?
> We treat the complete expression under //simplification// as an expression of the //meta-language//.
> I'm not changing this, but I probably should though.
Again, I don't understand why you are sure that the value of //whole// expression cannot overflow.
> Ideally, after //simplification// we should have this inequality: `x+1 < -2`
> That way we would not fold any subexpression of the //object-language//, so the `#1` requirement would be preserved.
> The right-hand side is of an expression of the //meta-language//.
> It makes sense, to evaluate the `operator<` as the semantics of the //meta-language//.
> But the left-hand side is an expression of the //object-language//.
> We need some sort of //logical// conversion here.
What if the second index is also symbolic? E.g `buf[x+1][y+1]`?
This would result in `(((reg_$0<int x>) + 1) * 12) + (reg_$1<int y>) + 1)` < `0` . And after simplification, the RHS cannot be interpreted as //meta//.
> Check if the resulting //folded constant// (`C'`) is negative or not.
What happens if the symbolic index is not the most inner index? E.g. `buf[6][x+1]`. Then `C` is no longer constant, and no longer //meta// ...
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D86874/new/
https://reviews.llvm.org/D86874
More information about the cfe-commits
mailing list