# [cfe-dev] Buffer bounds checking for C in static analysis

Zhongxing Xu xuzhongxing at gmail.com
Tue Jan 19 16:57:17 PST 2010

```2010/1/19 Chris Hacking <chacking at cs.washington.edu>:
> Hi Zhongxing,
>
> Thanks for the quick response. If I’m understanding you correctly, here are couple of cases that the bounds checker would miss:
>        void setfive (int* x) { x[5]=5; }
>        int a[4]; setfive(a);
> (due to lack of inter-procedural analysis), and
>        int b[5]; int x;
>        for (x=0; x<=5; x++) { b[x]=x; }
> (due to lack of loop analysis).

Yes, clang would miss these cases. The reason is exactly what you pointed out.

>
> If I’ve misunderstood, please let me know. Considering the problem briefly, a few other areas that are potentially problematic are alias analysis:
>        int c[5]; int* y = c; y[5]=50;
> pointer arithmetic:
>        char s[10]; s++; s[9] = ‘\0’;
> dynamically sized arrays:
>        int len = atoi(“5”); char* str = malloc(len); str[len] = ‘\0’;
> although all could be interesting extensions to current checks.

Clang can process and track the information for these cases well. The
first two is pointer assignment and arithmetic. For the third one we
need to model atoi() (we haven't for now). If 'len is replaced by 5,
clang can detect it.

>
> While the exact implementation for handling the string manipulation functions would require some thought (and examination of the current analysis tool), they do seem important to test. The most dangerous buffer overflows typically seem to involve them in some way. To extend that idea somewhat, the string.h functions aren’t the only ones that can trigger buffer overflows; in particular many system calls write into buffers:
>        char e[255];
>        int s = socket(<blah>);
>        connect(s, <blah>);
>        recv(s, e, 256, 0);
>
> Anyhow, it sounds like there’s just enough groundwork here that we’d be able to make very meaningful contributions within the time frame available. I’m curious whether your comment on inter-procedural analysis means it’s not currently implemented in the Clang analyzer at all, or that it’s limited to specific checks right now. It seems like it would be near-critical to some checks.

Inter-procedural analysis is not used in any checks. There is an
experimental CallInliner.cpp, which implements inlining the callee
into the caller. But it's very preliminary. It's implementation is
hampered by some causal changes to the core analysis engine. For
example, we need to modify the RemoveDeadBindings algorithm to
accommodate the multiple call frame situation.

>
> In any case, thanks again. We need to create a specific project proposal (in the next few days), which is why I’m asking for more specifics on what is or is not checked currently. If there’s any documentation you could point me at that would help too. I looked at the source file but it doesn’t describe any limitations of the environment in which it operates.

I'm sorry that there is no document on the current status of checkers.
They are in the development stage.

>
> Chris Hacking
>
>
> From: Zhongxing Xu [mailto:xuzhongxing at gmail.com]
> Sent: Monday, 18 January 2010 11:35 PM
> To: Chris Hacking
> Cc: cfe-dev at cs.uiuc.edu
> Subject: Re: [cfe-dev] Buffer bounds checking for C in static analysis
>
> Hi Chris,
>
> The current clang static analyzer only has a basic array bounds checker (ArrayBoundChecker.cpp). Its function is very simple: whenever a location is visited, if it's an array element, compare its index with the size of the array. Some errors it can detect:
>
> int a[4];
> a[4] = 3;
>
> and
> int *p = malloc(12);
> p[3] = 3;
>
> That is, the current analyzer has the basic infrastructure for detecting out-of-bound array access.
>
> The complexity of bounds checking arises from two facts: (may be there are others)
>
> a. Many out-of-bound accesses occur in loops. Depending on the length of the buffer, it may require looping many times before triggering the out-of-bound access. But currently we only unwrap the loop for 2 or 3 times.
>
> b. There are several string manipulation functions, such as strcpy(), strcat(). Currently we don't handle them. One simple way is to provide a fake implementation of them and inline them into the call site, and treat them as normal loops. Another way is to create some linear constraints over the arguments of them, and solve it. But this has complexity that we need to model the actual length of the string and the size of the array.
>
> Some other projects includes integer overflow checking. This has not been implemented at all in clang.
>
> Also inter-procedural analysis is required to enhance all other checks.
> 2010/1/19 Chris Hacking <chacking at cs.washington.edu>
> Hi all,
>
> I'm a student (part of a group of 3) looking for a project involving
> software engineering tools, preferably static analysis. I do a lot of work
> in C and have long felt that a static analysis tool for bounds checking on
> memory buffers (arrays/strings) would be very helpful. I saw that this was a
> requested feature for the Clang analysis tool, but there's very little info
> and it's apparently not fully developed yet. Therefore I have two questions:
>
> What is the state of bounds checking for C in the Clang analyzer, in terms
> of how far it has gotten and how much work is progressing on it?
>
> Is there another static analysis area that the Clang static analyzer needs
> implemented that would be a reasonable project for a few CS grad students?
>
> Thanks,
> Chris Hacking
>
> There's no place I can be,
> Since I found Serenity.
> But you can't take the sky from me.
>
>
> _______________________________________________
> cfe-dev mailing list
> cfe-dev at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev
>
>
>

```