[cfe-dev] [analyzer] Constrain the size of unknown memory regions

Balázs Benics via cfe-dev cfe-dev at lists.llvm.org
Mon Mar 16 07:06:07 PDT 2020


Hi Artem and Csaba,
Thank you for your response.

Sorry for the long description of the problem in my original email, but I
wanted to provide as much context as I could.
Provided that we have to analyze code like in that example, the analyzer
(correctly) assumes that the pointer points to an unknown memory region.
However, the user knows that the function will be called with a valid
buffer, which would be capable of holding at least/most n elements. For
now, we can not tell this assumption to the analyzer. There is no way in
standard C++ to express such notion precisely. Asserts would not be
powerful enough, as I earlier stated.
So we have to tell this kind of assumption to the analyzer in a different
way. But in what way?

Some sort of annotation or special analyzer assert? I'm not sure, but none
of these look promising, really.

I think we should find a way to properly analyze the following two cases:

   - `int f_user_function(int *points_to_at_least_5_elements);`
   The user knows that this function must be called with a pointer pointing
   to an array capable of holding at least 5 elements.
   We should be able to tell this assumption to the analyzer, to analyze
   its body according to this assumption.


   - `int g_user_function(int *buff, int size);`
   There is a connection between the `buff` and `size`, which denotes
   similar properties that were described in the previous bullet point, but
   the analyzer would not know.

Regards Balazs.

Artem Dergachev <noqnoqneo at gmail.com> ezt írta (időpont: 2020. márc. 16.,
H, 3:08):

> I think you're looking for SymbolExtent. It is *the* symbol that denotes
> the [otherwise completely] unknown size of a region. The helper function
> for obtaining either a known extent or a SymbolExtent is currently known
> as getDynamicSize() (but i'd rather rename it back to "extent" - see
> also https://reviews.llvm.org/D69726).
>
> ArrayBoundChecker already has the code that you're looking for.
>
> On 3/12/20 3:20 PM, Balázs Benics via cfe-dev wrote:
> > Hi, checker devs
> >
> > TLDR:
> > How to constrain the size of unknown memory regions, eg. pointed by
> > 'raw' char pointers?
> >
> > longer version:
> > Working on taint analysis I'm facing with the following problem:
> >
> >     void strncpy_bounded_tainted_buffer(char *src, char *dst) {
> >       // assert(strlen(src) >= 10 && "src must have at leas 10
> elements");
> >       int n;
> >       scanf("%d", &n);
> >       if (0 < n && n < 10) {
> >         strncpy(dst, src, n); // Should we warn or not?
> >       }
> >     }
> >
> >
> >
> > In this example we analyze a function taking two raw pointers, so we
> > don't know how big those arrays are.
> > We will check the `strncpy` call, whether it will access /(read and
> > write)/ only valid memory.
> > We will check the pointers (src and dst) separately by checking if
> > /`/&src[0]` and `&src[n-1]` would be in bound of the memory region
> > pointed by the pointer. Since the analyzer don't know (both states are
> > non-null), we should check if the `length` parameter is tainted, and
> > if so, we should still issue a warning telling that "String copy
> > function might overflow the given buffer since untrusted data is used
> > to specify the buffer size."
> > Obviously, if the `length` parameter is not tainted, we will assume
> > (conservatively) that the access would be valid.
> >
> >
> > How should tell the analyzer that the array which is pointed by the
> > pointer holds at least/most N elements?
> > For example in the code above, express something similar via an
> > assertion, like saying that `src` points to a c-string, which has at
> > least 10 + 1 element underlying storage.
> > Although this assertion using `strlen` seems like a solution,
> > unfortunately not applicable for example to the `dst` buffer, which is
> > most likely uninitialized - so not a c-string, in other words calling
> > `strlen` would be undefined behavior.
> >
> > The only (hacky) option which came in my mind was to abuse the
> > standard regarding pointer arithmetic.
> >
> >     assert(&src[n] - &src[-1]);
> >
> > The standard is clear about that pointer arithmetic is only applicable
> > for pointers pointing to elements of the same array OR to a
> > hypothetical ONE past/before element of that array.
> > http://eel.is/c++draft/expr.add#4.2
> >
> > This assertion would be undefined behavior if the size of the array
> > pointed by `src` would be smaller than `n`.
> >
> > IMO this looks really ugly.
> > I think that no '/annotations/' should introduce UB even if that
> > assumption expressed via an annotation is turned out to be _invalid_.
> >
> >
> > What would be the right approach to guide (to constrain the size of a
> > memory region) the analyzer?
> > How can the analyzer inference such constraint?
> >
> > Thanks Balazs.
> >
> > _______________________________________________
> > cfe-dev mailing list
> > cfe-dev at lists.llvm.org
> > https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20200316/ab27b4a0/attachment-0001.html>


More information about the cfe-dev mailing list