[cfe-dev] Detecting undefined pointer arithmetic

Demi M. Obenour via cfe-dev cfe-dev at lists.llvm.org
Sat Mar 6 13:56:02 PST 2021

On 1/14/21 7:52 PM, Johannes Doerfert via cfe-dev wrote:
> We would/should not exploit UB in such a case, at least not in the shown
> example. The pointer computation might yield `poison` but that is it.
> If you'd use the pointer afterwards, the situation is different though.
> ~ Johannes

The problem is the following pattern in C:

#include <stdlib.h>
#include <stdint.h>
#include <arpa/inet.h>
#include <string.h>
#include <stdbool.h>

/* Struct for parsing results */
typedef struct some_struct some_struct;
/* Stream of untrusted data */
typedef struct untrusted_stream untrusted_stream;

/* Parse untrusted data into `output`.  Returns false on error */
bool parse(untrusted_stream *stream, some_struct *output);

 * Read `len` bytes from `stream` into `ptr`.  Returns false if it cannot read
 * the full amount.
bool read_data(untrusted_stream *stream, void *ptr, uint32_t len);

/* Maximum input length */
static const uint32_t MAX_INPUT = 1024 * 1024;

static bool extract_be_u32(const uint8_t *ptr, const uint8_t *end, uint32_t *out);

 * Parse some untrusted data of length `len`.  We are guaranteed that `ptr` points
 * to `len` bytes of data.
 * @param ptr Pointer to `len` bytes of untrusted data.
 * @param len The length of the untrusted data.
bool process_untrusted_input(const uint8_t *ptr, const size_t len, some_struct *bar) {
   const uint8_t *end = ptr + len; /* end is now one-past-the-end */
   uint32_t foo;
   /* some parsing code */
   if (!extract_be_u32(ptr, end, &foo))
      return false;
   foo = ntohl(foo);
   /* more parsing code follows that uses foo */
   return true;

bool extract_be_u32(const uint8_t *ptr, const uint8_t *end, uint32_t *out) {
   uint32_t res;
   /* this is the bug: it should be `end - ptr < sizeof res` */
   if (ptr + sizeof res > end)
      return false;
   memcpy(&res, ptr, sizeof res);
   *out = ntohl(res);
   return true;

/* Parse some data from `stream` into `output`.  The data is not trusted. */
bool parse(untrusted_stream *stream, some_struct *output) {
   uint32_t len;
   if (!read_data(stream, &len, sizeof len))
      return false;
   len = ntohl(len);
   /* Avoid DoS attacks */
   if (len > MAX_INPUT)
      return false;
   uint8_t *ptr = malloc(len);
   if (!ptr || !read_data(stream, ptr, len))
      return false;
   bool res = process_untrusted_input(ptr, len, output);
   return res;

This code has undefined behavior if `process_untrusted_input` is passed less
than 4 bytes of input, which an attacker can often arrange to happen.  However,
I know of no way to detect this, or even test for it.  Sanitizers and valgrind
won’t catch it, since the out-of-bounds pointer is not dereferenced.  And the
amount by which the pointer is out of bounds is bounded, so it won’t wrap
in practice.  And if Clang optimized away the bounds check, the result is a
security vulnerability.

This isn’t just theoretical; I have seen similar bugs in the wild.
Fortunately, I haven’t seen any cases of them being miscompiled.

Will Clang miscompile code like this?  Even though it is undefined behavior,
it happens often enough that I would prefer if Clang made it defined as
an extension.  Dereferencing the out-of-bounds pointer would still be UB, of
course.  I believe there is precedent for this, in that type-punning via unions
is undefined behavior in C++ but (to my knowledge) works on both GCC and Clang.
Also, GCC supports `-fwrapv-pointer`.



-------------- next part --------------
A non-text attachment was scrubbed...
Name: OpenPGP_signature
Type: application/pgp-signature
Size: 833 bytes
Desc: OpenPGP digital signature
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20210306/f956a776/attachment.sig>

More information about the cfe-dev mailing list