[clang] [analyzer] Enhance array bound checking for `ConstantArrayType` (PR #159357)

Donát Nagy via cfe-commits cfe-commits at lists.llvm.org
Mon Sep 22 04:52:57 PDT 2025


NagyDonat wrote:

@alejandro-alvarez-sonarsource Thanks for the detailed answers and clarifications! :smile:

> I tend to be conservative about existing behavior because
> 
>     1. It is working!
> 
>     2. We have been relying on it for long, and even a small divergence could turn into hundreds, if not thousands, of new positives and negatives coming and going on existing projects (changes that unfortunately are not caught by lit tests)
> 
>     3. Not to mention a slight change in assumptions can cascade to other CSA checkers (both up and downstream) adding and/or removing another big set of raised issues.
> 
> 
> Of course this is not news for you!, but this is why I have preferred to keep the surface of the changes small (at the cost of duplication).

Understood, and I agree in general. However, note that `ArrayBound` is a new checker (I brought it out of alpha this year), so I feel that it may be a bit more malleable than the "ancient" checkers like e.g. `CallAndMessage`.

-----

> > * Perhaps we don't need the complex old logic that handles multidimensional arrays by calculating the offset from the beginning of the whole array (by combining multiple `ElementRegion`s) -- IIUC that logic was introduced when this checker was still using `check::Location` and `check::Bind` callbacks that provided less information than the current callbacks.
> > * Perhaps we could switch to using indices instead of byte offsets by default -- because this would produce simpler symbolic expressions that would be "more useful" for the rest of the analyzer (e.g. instead of assuming `4 * x < 40` we could assume `x < 10` which can also occur as e.g. an `if` condition).
> 
> Can we? How about cases such as this lit test?
> 
> ```c++
> struct two_bytes convertedArray2(void) {
>   // We report this with byte offsets because the offset is not divisible by the element size.
>   struct two_bytes a = {0, 0};
>   char *p = (char*)&a;
>   return *((struct two_bytes*)(p + 7));
>   // expected-warning at -1 {{Out of bound access to memory after the end of 'a'}}
>   // expected-note at -2 {{Access of 'a' at byte offset 7, while it holds only 2 bytes}}
> }
> ```

Hmm, we will probably need byte offsets as a fallback option if we need to cover cases like this. (Which is not an absolute requirement IMO -- we can accept not reporting a few rare corner cases if it is compensated by better behavior in the common case.)

> [...] It's quite a coincidence (or perhaps not!). I am also working in parallel in `PointerArithChecker` which is turning out to be very similar too (except it raises the moment the pointer is formed, and that one-after-the-end is still ok, as long as it is not derefenced). So indeed a "core bound checker" library sounds very interesting.

It is nice to hear that you are progressing with `PointerArith` :smile:

As my next goal in this area I will try to design the interface of the "core bound checker" library, which should be general enough to cover all the use cases. I'll try to post a discourse post about this within a few days.

> > In fact, I started to work on this change two days ago, but I'll probably need to restart that work because this PR affects lots of code within `ArrayBoundChecker.cpp`.
> 
> Sorry 😅 I am thinking, maybe you go ahead with your work on those changes, since you have already progressed. [...]

No problem :smile: collisions are natural in open source development.

Now that this discussion highlighted the question of introducing a library-like interface for bounds checking, I will probably work on that and discard my unfinished change (which would complicate the `performCheck` and make the separation of the library interface more difficult).

https://github.com/llvm/llvm-project/pull/159357


More information about the cfe-commits mailing list