[cfe-dev] [analyzer] Constrain the size of unknown memory regions
Artem Dergachev via cfe-dev
cfe-dev at lists.llvm.org
Mon Mar 16 07:41:32 PDT 2020
Wait, so you're not trying to do this in a checker, but trying to
introduce, like, a mechanism for the users to use or something? Ok, got
it. Interesting, yeah.
Well, you could always add a magic function like
__clang_analyzer_getExtent() that would work exactly like
clang_analyzer_getExtent() from the debug.ExprInspection checker but
will be on by default and double-underscored so that to avoid
conflicting with user code (note that defining double-underscored
declarations outside of the standard library is UB). Then give them a
header:
// analyzer_helper.h
#ifdef __clang_analyzer__
size_t __clang_analyzer_getExtent(void *x);
#define ANALYZER_ASSERT assert
#define ANALYZER_EXTENT(x) __clang_analyzer_getExtent(x)
#else
#define ANALYZER_ASSERT
#define ANALYZER_EXTENT(x)
#endif
And then tell them to write code like this:
ANALYZER_ASSERT(ANALYZER_EXTENT(buff) == size);
I don't know whether it's a good idea. Not only we have too little
positive experience with array bound checking for now, but also there
are other powerful tools to deal with buffer overflows, like ASan/MSan
and, well, C++ containers. It's unclear to me how often would it be
impossible
Another possible approach is to introduce annotations and teach the
analyzer to understand them, eg.:
int g_user_function(
__attribute__((buffer_ptr("buff"))) int *buff,
__attribute__((buffer_size("buff"))) int size);
Or like this:
struct BoundedBuff {
__attribute__((buffer_ptr("buff"))) int *buff;
__attribute__((buffer_size("buff"))) int size);
};
That's slightly less flexible but also less ugly (when you hide those
behind macros as well).
> I think that no '/annotations/' should introduce UB even if that
> assumption expressed via an annotation is turned out to be _invalid_.
Well, that's a fairly common thing to happen when annotations are
introduced for optimizations rather than for hardening. Say, if you
violate __builtin_assume it's a UB. If you violate a `restrict`
qualifier contract it's a UB. Even if you violate a `const` qualifier
contract it's a UB. Interestingly, iirc in-language assertions in C++23
contracts are currently expected to introduce UB when violated. That
said, i definitely agree with you in your case.
On 3/16/20 5:06 PM, Balázs Benics wrote:
> Hi Artem and Csaba,
> Thank you for your response.
>
> Sorry for the long description of the problem in my original email,
> but I wanted to provide as much context as I could.
> Provided that we have to analyze code like in that example, the
> analyzer (correctly) assumes that the pointer points to an unknown
> memory region.
> However, the user knows that the function will be called with a valid
> buffer, which would be capable of holding at least/most n elements.
> For now, we can not tell this assumption to the analyzer. There is no
> way in standard C++ to express such notion precisely. Asserts would
> not be powerful enough, as I earlier stated.
> So we have to tell this kind of assumption to the analyzer in a
> different way. But in what way?
>
> Some sort of annotation or special analyzer assert? I'm not sure, but
> none of these look promising, really.
>
> I think we should find a way to properly analyze the following two cases:
>
> * `int f_user_function(int *points_to_at_least_5_elements);`
> The user knows that this function must be called with a pointer
> pointing to an array capable of holding at least 5 elements.
> We should be able to tell this assumption to the analyzer, to
> analyze its body according to this assumption.
>
> * `int g_user_function(int *buff, int size);`
> There is a connection between the `buff`and `size`, which denotes
> similar properties that were described in the previous bullet
> point, but the analyzer would not know.
>
> Regards Balazs.
>
> Artem Dergachev <noqnoqneo at gmail.com <mailto:noqnoqneo at gmail.com>> ezt
> írta (időpont: 2020. márc. 16., H, 3:08):
>
> I think you're looking for SymbolExtent. It is *the* symbol that
> denotes
> the [otherwise completely] unknown size of a region. The helper
> function
> for obtaining either a known extent or a SymbolExtent is currently
> known
> as getDynamicSize() (but i'd rather rename it back to "extent" - see
> also https://reviews.llvm.org/D69726).
>
> ArrayBoundChecker already has the code that you're looking for.
>
> On 3/12/20 3:20 PM, Balázs Benics via cfe-dev wrote:
> > Hi, checker devs
> >
> > TLDR:
> > How to constrain the size of unknown memory regions, eg. pointed by
> > 'raw' char pointers?
> >
> > longer version:
> > Working on taint analysis I'm facing with the following problem:
> >
> > void strncpy_bounded_tainted_buffer(char *src, char *dst) {
> > // assert(strlen(src) >= 10 && "src must have at leas 10
> elements");
> > int n;
> > scanf("%d", &n);
> > if (0 < n && n < 10) {
> > strncpy(dst, src, n); // Should we warn or not?
> > }
> > }
> >
> >
> >
> > In this example we analyze a function taking two raw pointers,
> so we
> > don't know how big those arrays are.
> > We will check the `strncpy` call, whether it will access /(read and
> > write)/ only valid memory.
> > We will check the pointers (src and dst) separately by checking if
> > /`/&src[0]` and `&src[n-1]` would be in bound of the memory region
> > pointed by the pointer. Since the analyzer don't know (both
> states are
> > non-null), we should check if the `length` parameter is tainted,
> and
> > if so, we should still issue a warning telling that "String copy
> > function might overflow the given buffer since untrusted data is
> used
> > to specify the buffer size."
> > Obviously, if the `length` parameter is not tainted, we will assume
> > (conservatively) that the access would be valid.
> >
> >
> > How should tell the analyzer that the array which is pointed by the
> > pointer holds at least/most N elements?
> > For example in the code above, express something similar via an
> > assertion, like saying that `src` points to a c-string, which
> has at
> > least 10 + 1 element underlying storage.
> > Although this assertion using `strlen` seems like a solution,
> > unfortunately not applicable for example to the `dst` buffer,
> which is
> > most likely uninitialized - so not a c-string, in other words
> calling
> > `strlen` would be undefined behavior.
> >
> > The only (hacky) option which came in my mind was to abuse the
> > standard regarding pointer arithmetic.
> >
> > assert(&src[n] - &src[-1]);
> >
> > The standard is clear about that pointer arithmetic is only
> applicable
> > for pointers pointing to elements of the same array OR to a
> > hypothetical ONE past/before element of that array.
> > http://eel.is/c++draft/expr.add#4.2
> >
> > This assertion would be undefined behavior if the size of the array
> > pointed by `src` would be smaller than `n`.
> >
> > IMO this looks really ugly.
> > I think that no '/annotations/' should introduce UB even if that
> > assumption expressed via an annotation is turned out to be
> _invalid_.
> >
> >
> > What would be the right approach to guide (to constrain the size
> of a
> > memory region) the analyzer?
> > How can the analyzer inference such constraint?
> >
> > Thanks Balazs.
> >
> > _______________________________________________
> > cfe-dev mailing list
> > cfe-dev at lists.llvm.org <mailto:cfe-dev at lists.llvm.org>
> > https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>
More information about the cfe-dev
mailing list