[cfe-dev] Detecting undefined pointer arithmetic

Paweł Bylica via cfe-dev cfe-dev at lists.llvm.org
Mon Mar 8 02:34:19 PST 2021


Hi All,

There are two instrumentations in the ASan that are related to the subject:
- pointer-subtract,
- pointer-compare.

They check if a pointer arithmetic operation of a given kind (subtract or
compare) is not UB. The pointers must be from the same memory buffer
allocation, including the one-element-after end pointer.

> The option must be combined with [...] -fsanitize=address [...] By
default the check is disabled at run time. To enable it, add
detect_invalid_pointer_pairs=2 to the environment variable ASAN_OPTIONS.
Using detect_invalid_pointer_pairs=1 detects invalid operation only when
both pointers are non-null.

https://gcc.gnu.org/onlinedocs/gcc-10.2.0/gcc/Instrumentation-Options.html

These were good news. Bad news are:
- The compiler flags and runtime options "work" with clang but these checks
are not officially supported. E.g. see
https://bugs.llvm.org/show_bug.cgi?id=47626.
- Although these are supported by GCC, I have encountered multiple issues
with them in the latest GCC  10. However, most (if not all) issues have
been resolved and should be shipped with GCC 10.3 or GCC 11.
  - instrumentation may break constexpr evaluation:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97145
  - these checks report valid issues in std::vector and std::stringbuf
implementations in libstdc++:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97659,
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97415
  - libasan crashes when the detect_stack_use_after_return runtime check is
also enabled: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97414

Despite the above issues, these checks have found some bugs in my projects.
I consider them useful if you develop in UB-free pedantic mode.

Have fun,
Paweł Bylica


On Sat, Mar 6, 2021 at 10:56 PM Demi M. Obenour via cfe-dev <
cfe-dev at lists.llvm.org> wrote:

> On 1/14/21 7:52 PM, Johannes Doerfert via cfe-dev wrote:
> > We would/should not exploit UB in such a case, at least not in the shown
> > example. The pointer computation might yield `poison` but that is it.
> > If you'd use the pointer afterwards, the situation is different though.
> >
> > ~ Johannes
>
> The problem is the following pattern in C:
>
> ```
> #include <stdlib.h>
> #include <stdint.h>
> #include <arpa/inet.h>
> #include <string.h>
> #include <stdbool.h>
>
> /* Struct for parsing results */
> typedef struct some_struct some_struct;
> /* Stream of untrusted data */
> typedef struct untrusted_stream untrusted_stream;
>
> /* Parse untrusted data into `output`.  Returns false on error */
> bool parse(untrusted_stream *stream, some_struct *output);
>
> /*
>  * Read `len` bytes from `stream` into `ptr`.  Returns false if it cannot
> read
>  * the full amount.
>  */
> bool read_data(untrusted_stream *stream, void *ptr, uint32_t len);
>
> /* Maximum input length */
> static const uint32_t MAX_INPUT = 1024 * 1024;
>
> static bool extract_be_u32(const uint8_t *ptr, const uint8_t *end,
> uint32_t *out);
>
> /**
>  * Parse some untrusted data of length `len`.  We are guaranteed that
> `ptr` points
>  * to `len` bytes of data.
>  * @param ptr Pointer to `len` bytes of untrusted data.
>  * @param len The length of the untrusted data.
>  */
> bool process_untrusted_input(const uint8_t *ptr, const size_t len,
> some_struct *bar) {
>    const uint8_t *end = ptr + len; /* end is now one-past-the-end */
>    uint32_t foo;
>    /* some parsing code */
>    if (!extract_be_u32(ptr, end, &foo))
>       return false;
>    foo = ntohl(foo);
>    /* more parsing code follows that uses foo */
>    return true;
> }
>
> bool extract_be_u32(const uint8_t *ptr, const uint8_t *end, uint32_t *out)
> {
>    uint32_t res;
>    /* this is the bug: it should be `end - ptr < sizeof res` */
>    if (ptr + sizeof res > end)
>       return false;
>    memcpy(&res, ptr, sizeof res);
>    *out = ntohl(res);
>    return true;
> }
>
> /* Parse some data from `stream` into `output`.  The data is not trusted.
> */
> bool parse(untrusted_stream *stream, some_struct *output) {
>    uint32_t len;
>    if (!read_data(stream, &len, sizeof len))
>       return false;
>    len = ntohl(len);
>    /* Avoid DoS attacks */
>    if (len > MAX_INPUT)
>       return false;
>    uint8_t *ptr = malloc(len);
>    if (!ptr || !read_data(stream, ptr, len))
>       return false;
>    bool res = process_untrusted_input(ptr, len, output);
>    free(ptr);
>    return res;
> }
> ```
>
> This code has undefined behavior if `process_untrusted_input` is passed
> less
> than 4 bytes of input, which an attacker can often arrange to happen.
> However,
> I know of no way to detect this, or even test for it.  Sanitizers and
> valgrind
> won’t catch it, since the out-of-bounds pointer is not dereferenced.  And
> the
> amount by which the pointer is out of bounds is bounded, so it won’t wrap
> in practice.  And if Clang optimized away the bounds check, the result is a
> security vulnerability.
>
> This isn’t just theoretical; I have seen similar bugs in the wild.
> Fortunately, I haven’t seen any cases of them being miscompiled.
>
> Will Clang miscompile code like this?  Even though it is undefined
> behavior,
> it happens often enough that I would prefer if Clang made it defined as
> an extension.  Dereferencing the out-of-bounds pointer would still be UB,
> of
> course.  I believe there is precedent for this, in that type-punning via
> unions
> is undefined behavior in C++ but (to my knowledge) works on both GCC and
> Clang.
> Also, GCC supports `-fwrapv-pointer`.
>
> Sincerely,
>
> Demi
>
> _______________________________________________
> cfe-dev mailing list
> cfe-dev at lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20210308/c1d1cccf/attachment.html>


More information about the cfe-dev mailing list