[cfe-dev] [analyzer] Constrain the size of unknown memory regions

Artem Dergachev via cfe-dev cfe-dev at lists.llvm.org
Mon Mar 16 07:41:32 PDT 2020


Wait, so you're not trying to do this in a checker, but trying to 
introduce, like, a mechanism for the users to use or something? Ok, got 
it. Interesting, yeah.

Well, you could always add a magic function like 
__clang_analyzer_getExtent() that would work exactly like 
clang_analyzer_getExtent() from the debug.ExprInspection checker but 
will be on by default and double-underscored so that to avoid 
conflicting with user code (note that defining double-underscored 
declarations outside of the standard library is UB). Then give them a 
header:

   // analyzer_helper.h
   #ifdef __clang_analyzer__
     size_t __clang_analyzer_getExtent(void *x);
     #define ANALYZER_ASSERT assert
     #define ANALYZER_EXTENT(x) __clang_analyzer_getExtent(x)
   #else
     #define ANALYZER_ASSERT
     #define ANALYZER_EXTENT(x)
   #endif

And then tell them to write code like this:

   ANALYZER_ASSERT(ANALYZER_EXTENT(buff) == size);

I don't know whether it's a good idea. Not only we have too little 
positive experience with array bound checking for now, but also there 
are other powerful tools to deal with buffer overflows, like ASan/MSan 
and, well, C++ containers. It's unclear to me how often would it be 
impossible

Another possible approach is to introduce annotations and teach the 
analyzer to understand them, eg.:

   int g_user_function(
     __attribute__((buffer_ptr("buff"))) int *buff,
     __attribute__((buffer_size("buff"))) int size);

Or like this:

   struct BoundedBuff {
     __attribute__((buffer_ptr("buff"))) int *buff;
     __attribute__((buffer_size("buff"))) int size);
   };

That's slightly less flexible but also less ugly (when you hide those 
behind macros as well).

 > I think that no '/annotations/' should introduce UB even if that
 > assumption expressed via an annotation is turned out to be _invalid_.

Well, that's a fairly common thing to happen when annotations are 
introduced for optimizations rather than for hardening. Say, if you 
violate __builtin_assume it's a UB. If you violate a `restrict` 
qualifier contract it's a UB. Even if you violate a `const` qualifier 
contract it's a UB. Interestingly, iirc in-language assertions in C++23 
contracts are currently expected to introduce UB when violated. That 
said, i definitely agree with you in your case.


On 3/16/20 5:06 PM, Balázs Benics wrote:
> Hi Artem and Csaba,
> Thank you for your response.
>
> Sorry for the long description of the problem in my original email, 
> but I wanted to provide as much context as I could.
> Provided that we have to analyze code like in that example, the 
> analyzer (correctly) assumes that the pointer points to an unknown 
> memory region.
> However, the user knows that the function will be called with a valid 
> buffer, which would be capable of holding at least/most n elements. 
> For now, we can not tell this assumption to the analyzer. There is no 
> way in standard C++ to express such notion precisely. Asserts would 
> not be powerful enough, as I earlier stated.
> So we have to tell this kind of assumption to the analyzer in a 
> different way. But in what way?
>
> Some sort of annotation or special analyzer assert? I'm not sure, but 
> none of these look promising, really.
>
> I think we should find a way to properly analyze the following two cases:
>
>   * `int f_user_function(int *points_to_at_least_5_elements);`
>     The user knows that this function must be called with a pointer
>     pointing to an array capable of holding at least 5 elements.
>     We should be able to tell this assumption to the analyzer, to
>     analyze its body according to this assumption.
>
>   * `int g_user_function(int *buff, int size);`
>     There is a connection between the `buff`and `size`, which denotes
>     similar properties that were described in the previous bullet
>     point, but the analyzer would not know.
>
> Regards Balazs.
>
> Artem Dergachev <noqnoqneo at gmail.com <mailto:noqnoqneo at gmail.com>> ezt 
> írta (időpont: 2020. márc. 16., H, 3:08):
>
>     I think you're looking for SymbolExtent. It is *the* symbol that
>     denotes
>     the [otherwise completely] unknown size of a region. The helper
>     function
>     for obtaining either a known extent or a SymbolExtent is currently
>     known
>     as getDynamicSize() (but i'd rather rename it back to "extent" - see
>     also https://reviews.llvm.org/D69726).
>
>     ArrayBoundChecker already has the code that you're looking for.
>
>     On 3/12/20 3:20 PM, Balázs Benics via cfe-dev wrote:
>     > Hi, checker devs
>     >
>     > TLDR:
>     > How to constrain the size of unknown memory regions, eg. pointed by
>     > 'raw' char pointers?
>     >
>     > longer version:
>     > Working on taint analysis I'm facing with the following problem:
>     >
>     >     void strncpy_bounded_tainted_buffer(char *src, char *dst) {
>     >       // assert(strlen(src) >= 10 && "src must have at leas 10
>     elements");
>     >       int n;
>     >       scanf("%d", &n);
>     >       if (0 < n && n < 10) {
>     >         strncpy(dst, src, n); // Should we warn or not?
>     >       }
>     >     }
>     >
>     >
>     >
>     > In this example we analyze a function taking two raw pointers,
>     so we
>     > don't know how big those arrays are.
>     > We will check the `strncpy` call, whether it will access /(read and
>     > write)/ only valid memory.
>     > We will check the pointers (src and dst) separately by checking if
>     > /`/&src[0]` and `&src[n-1]` would be in bound of the memory region
>     > pointed by the pointer. Since the analyzer don't know (both
>     states are
>     > non-null), we should check if the `length` parameter is tainted,
>     and
>     > if so, we should still issue a warning telling that "String copy
>     > function might overflow the given buffer since untrusted data is
>     used
>     > to specify the buffer size."
>     > Obviously, if the `length` parameter is not tainted, we will assume
>     > (conservatively) that the access would be valid.
>     >
>     >
>     > How should tell the analyzer that the array which is pointed by the
>     > pointer holds at least/most N elements?
>     > For example in the code above, express something similar via an
>     > assertion, like saying that `src` points to a c-string, which
>     has at
>     > least 10 + 1 element underlying storage.
>     > Although this assertion using `strlen` seems like a solution,
>     > unfortunately not applicable for example to the `dst` buffer,
>     which is
>     > most likely uninitialized - so not a c-string, in other words
>     calling
>     > `strlen` would be undefined behavior.
>     >
>     > The only (hacky) option which came in my mind was to abuse the
>     > standard regarding pointer arithmetic.
>     >
>     >     assert(&src[n] - &src[-1]);
>     >
>     > The standard is clear about that pointer arithmetic is only
>     applicable
>     > for pointers pointing to elements of the same array OR to a
>     > hypothetical ONE past/before element of that array.
>     > http://eel.is/c++draft/expr.add#4.2
>     >
>     > This assertion would be undefined behavior if the size of the array
>     > pointed by `src` would be smaller than `n`.
>     >
>     > IMO this looks really ugly.
>     > I think that no '/annotations/' should introduce UB even if that
>     > assumption expressed via an annotation is turned out to be
>     _invalid_.
>     >
>     >
>     > What would be the right approach to guide (to constrain the size
>     of a
>     > memory region) the analyzer?
>     > How can the analyzer inference such constraint?
>     >
>     > Thanks Balazs.
>     >
>     > _______________________________________________
>     > cfe-dev mailing list
>     > cfe-dev at lists.llvm.org <mailto:cfe-dev at lists.llvm.org>
>     > https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>



More information about the cfe-dev mailing list