[PATCH] D86874: [analyzer] Fix ArrayBoundCheckerV2 false positive regarding size_t indexer

Mon Sep 7 07:28:36 PDT 2020

steakhal added a comment.

In D86874#2259105 <https://reviews.llvm.org/D86874#2259105>, @Szelethus wrote:

> I can only imagine how long it took for you to write all this, because reading it wasn't that short either. Thank you so much! It really gave me a perspective on how you see this problem, as well as what is actually happening (and should happen) in the checker.

Thank you very much. I tied to make it as short as possible while keeping all the necessary details to make my reasoning perfectly clean. Now I think I was on a bad track, but @martong helped me to pinpoint some issues.

> In D86874#2254638 <https://reviews.llvm.org/D86874#2254638>, @steakhal wrote:
>
>>   void foo(int x) {
>>     int buf[10];
>>   
>>     char *r = (char*)(buf + x) + 3;
>>     *r = 66; // Might be safe, if x has a value to make it so.
>>   }
>
> I suppose you mean that it might be unsafe on the grounds of out-of-bounds addressing, but alignment-wise its okay?

You can safely access any valid memory region via `char`, `unsigned char` and `std::byte` pointers. And as integers are using all bits in their representation, I suspect that this code snippet is safe - given that the pointer points to a valid region.
(aka. no out-of-bound access happens). Alignment does not play any role here.

> In D86874#2256249 <https://reviews.llvm.org/D86874#2256249>, @martong wrote:
>
>>>   The resulting `RegionRawOffsetV2` for `p`, `q` will be:
>>>   p: {BaseRegion: `buf`, ByteOffset: `20 S64b`}
>>>   q: {BaseRegion: `buf`, ByteOffset: `(((reg_$0<int x>) + 1) * 12) + 24`}
>>>                                         ^^^^^^^^^^^^^^^^^^^  ^^^^^^^^^^
>>>                                                      |           |
>>>   `@a` This is an //object-language// expression. --/            |
>>>                                                                 /
>>>   `@b` The rest should be //meta-language// expression. -------/
>>>
>>> SPOILER: This distinction is really important here.
>>>
>>> So, we made an expression, we should not evaluate in either of the worlds.
>>> We should not evaluate it using the semantics of the //object-language// since only the `@a` is of that world.
>>> We should model overflow in `@a` if that expression is of //unsigned// type (OR signed but `-fwrapv`...) etc. according to the //abstract machine//.
>>> BUT modeling the possibility of an overflow of the rest of the expression (`@b`) would be a flaw.
>>
>> Why? I'd expect that the value of the whole expression `@a at b` could overflow.
>
> That wasn't all too clear to me either. It seems like there is extra work to do if we do the calculation according to C++ rules, but not the other way around, so we could just regard everything as if it was under the object-language ruleset.

Let's forget it. My statements in this regard are going in a bad direction. Sorry for the dead-end, I thought I was on the right track.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D86874/new/

https://reviews.llvm.org/D86874