[cfe-dev] Language extension: __attribute__((bounds))/__attribute__((__bounded__))

Török Edwin edwintorok at gmail.com
Wed Sep 23 11:59:42 PDT 2009


Hi,

I started working on adding a new attribute to Clang that annotates
pointers with bounds information. Before going further, I'd like to hear
your opinion on whether this is acceptable for Clang (or whether the
proposal can be modified to make it acceptable).

Given a pointer field in a struct, or a pointer parameter, the proposed
__attribute__ would tell the compiler which field or parameter holds the
bounds of the buffer the pointer points to.

This is not a completely new attribute: OpenBSD has had something
similar for GCC, but only for function parameters [1].

Some possible use cases for this attribute:
 - emit a warning when a buffer pointer is passed to a function together
   with a length argument that is larger than the actual length of the
   buffer
 - use this attribute in the Clang static analyzer to do bounds checks
 - use this attribute to do static analysis in LLVM
 - use Clang to compile a subset of C, with this attribute, in which the
   bounds of all pointers are known, and reject programs that are not
   part of this subset

OpenBSD's attribute gives warnings only when the parameters are
constants, but I'd like to take it further and also warn (or error) when
the length is stored in a struct, later read back, and passed as a
parameter to a function.
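To make the struct case concrete, here is a minimal sketch of the
pattern in question: the length lives in a struct, is read back later,
and is passed along with the buffer to a function. The names `struct
msg` and `msg_copy_out` are hypothetical, and the explicit runtime check
only illustrates the property that the proposed analysis would try to
prove (or reject) at compile time.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical struct: len is meant to be the number of bytes in data.
   With the proposed attribute it would be declared as
       unsigned char *data __attribute__((bounds(len)));              */
struct msg {
    unsigned len;
    unsigned char *data;
};

/* Copies m's payload into dst, which has room for dst_cap bytes.
   The length is read back from the struct and handed to memcpy; the
   explicit comparison is the fact the proposed analysis would try to
   establish statically. Returns the number of bytes copied, or -1 if
   the copy would overflow dst. */
int msg_copy_out(const struct msg *m, unsigned char *dst, size_t dst_cap)
{
    size_t len = m->len;        /* length read back from the struct */
    if (len > dst_cap)
        return -1;              /* would write past the end of dst */
    if (len > 0)
        memcpy(dst, m->data, len);
    return (int)len;
}
```

If `len` were trusted without the comparison, a call with a 4-byte
destination and an 8-byte payload would overflow `dst`; that is the
kind of call the proposal wants to flag even though no constant appears
at the call site.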

So far I have implemented the Sema and CodeGen parts of the bounds
attribute; a small example is attached at the end of this mail [2].

Since my attribute will do more than (and differ from) the OpenBSD
attribute, I named it 'bounds' to avoid clashes (suggestions welcome),
but I intend to support BSD's __bounded__ attribute too. The __bounded__
attribute also refers to parameters by index rather than by name, which
I consider error-prone.
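For comparison, here is a sketch of an index-based declaration. It
assumes the `__bounded__(__buffer__, buf_index, len_index)` form
described in [1]; because only OpenBSD's GCC understands that attribute,
it is wrapped in a hypothetical `BOUNDED_BUFFER` macro that expands to
nothing on other compilers. The function `fill` is purely illustrative.

```c
#include <stddef.h>
#include <string.h>

/* Assumption: only OpenBSD's GCC accepts __bounded__; elsewhere the
   hypothetical BOUNDED_BUFFER macro expands to nothing. */
#if defined(__OpenBSD__) && defined(__GNUC__) && !defined(__clang__)
#define BOUNDED_BUFFER(buf, len) \
    __attribute__((__bounded__(__buffer__, buf, len)))
#else
#define BOUNDED_BUFFER(buf, len)
#endif

/* Parameter 1 is the buffer and parameter 3 its length (1-based
   indexes). If the parameters are later reordered, these numbers
   silently point at the wrong arguments. */
void fill(unsigned char *buf, int c, size_t len) BOUNDED_BUFFER(1, 3);

void fill(unsigned char *buf, int c, size_t len)
{
    memset(buf, c, len);
}
```

A name-based annotation would tie the bound to the parameter `len`
itself, so reordering the parameters could not break it unnoticed.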

Proposed semantics:
struct foo {
  unsigned n;
  unsigned *x __attribute__((bounds(n)));
};

This declares that x is an array of 'n' elements, where 'n' is a field
of the same struct. NULL pointers will have size 0 (and so will buffers
of size 0, which may or may not be NULL).
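A minimal sketch of the invariant that bounds(n) is meant to express,
written as explicit runtime checks. The helpers `foo_invariant_ok` and
`foo_get` are hypothetical; the proposal would verify these conditions
statically instead. Whether `n` matches the real allocation size cannot
be checked in portable C, so that part is left out.

```c
#include <assert.h>
#include <stddef.h>

struct foo {
    unsigned n;
    unsigned *x; /* proposed: unsigned *x __attribute__((bounds(n))); */
};

/* A NULL x must come with n == 0. A non-NULL x with n == 0 is also
   accepted (a zero-sized buffer). */
int foo_invariant_ok(const struct foo *f)
{
    return f->x != NULL || f->n == 0;
}

/* Reads x[i], where valid indexes satisfy 0 <= i < n. Stores the
   element in *out and returns 0, or returns -1 if i is out of range. */
int foo_get(const struct foo *f, unsigned i, unsigned *out)
{
    assert(foo_invariant_ok(f));
    if (i >= f->n)
        return -1;
    *out = f->x[i];
    return 0;
}
```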
It will be enforced in the following ways:
 - storing values: 'n' must always be accurate and reflect the current
   size of 'x' (which means store order matters), with one exception: if
   the 'struct foo' variable, or a pointer to it, is only visible in the
   current function and doesn't escape, the fields may be stored in any
   order
 - dereferencing x: the index must always be in the range [0, n)
 - allocation: check that 'n' really is the size of the allocation
 - unknown size: if the size cannot be determined, a warning or error is
   emitted too
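One plausible reading of "store order matters" (an assumption; the exact
rule is not spelled out here) is that `n` must never exceed the real
size of `x`, even between the two stores. Under that reading, growing a
buffer stores the new pointer before the larger count, and shrinking
stores the smaller count before the pointer. `foo_resize` is a
hypothetical helper built on `realloc`.

```c
#include <stdint.h>
#include <stdlib.h>

struct foo {
    unsigned n;
    unsigned *x; /* proposed: bounds(n) */
};

/* Resizes f->x to new_n elements so that, after every individual store
   to *f, f->n never exceeds the number of elements x points to.
   Returns 0 on success, or -1 if the buffer could not be grown (f is
   then left unchanged). */
int foo_resize(struct foo *f, unsigned new_n)
{
    if (new_n == 0) {
        f->n = 0;              /* drop the count first ... */
        free(f->x);
        f->x = NULL;           /* ... then the buffer */
        return 0;
    }
    if ((size_t)new_n > SIZE_MAX / sizeof *f->x)
        return -1;             /* byte count would overflow */

    if (new_n < f->n) {
        f->n = new_n;          /* shrinking: store the smaller count first */
        unsigned *p = realloc(f->x, new_n * sizeof *f->x);
        if (p != NULL)
            f->x = p;          /* on failure the old, larger block stays */
        return 0;
    }

    /* growing: store the larger buffer before the larger count */
    unsigned *p = realloc(f->x, new_n * sizeof *f->x);
    if (p == NULL)
        return -1;
    f->x = p;
    f->n = new_n;
    return 0;
}
```

Under the exception in the first rule, a local `struct foo` that never
escapes the current function could assign `n` and `x` in either order.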

Note that these checks don't necessarily have to be done in Clang; they
can be done in LLVM (in fact I'd prefer that, since dataflow analysis is
easier there).

This attribute should work for pointers to constant-size arrays, VLAs
[3], and pointers returned by malloc (and malloc-like wrappers).
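A sketch of the three kinds of buffer listed above, each paired with an
accurate count. The helpers `foo_sum` and `demo_sources` are
hypothetical; with the attribute, the compiler would need to show that
each `n` really is the element count of the memory `x` points to.

```c
#include <stdlib.h>

struct foo {
    unsigned n;
    unsigned *x; /* proposed: bounds(n) */
};

/* Sums x[0..n-1]; correct only if n is the true element count. */
unsigned long foo_sum(const struct foo *f)
{
    unsigned long s = 0;
    for (unsigned i = 0; i < f->n; i++)
        s += f->x[i];
    return s;
}

/* Builds a struct foo from each kind of buffer and returns the total
   of the three sums, or 0 if malloc fails. k must be > 0 because it is
   used as a VLA length. */
unsigned long demo_sources(unsigned k)
{
    /* 1. constant-size array: count known at compile time */
    unsigned fixed[4] = {1, 2, 3, 4};
    struct foo a = { sizeof fixed / sizeof fixed[0], fixed };

    /* 2. VLA: count known only at run time (see [3]) */
    unsigned vla[k];
    for (unsigned i = 0; i < k; i++)
        vla[i] = 1;
    struct foo b = { k, vla };

    /* 3. malloc: the count must match the size passed to malloc */
    struct foo c = { k, malloc(k * sizeof(unsigned)) };
    if (c.x == NULL)
        return 0;
    for (unsigned i = 0; i < k; i++)
        c.x[i] = 2;

    unsigned long total = foo_sum(&a) + foo_sum(&b) + foo_sum(&c);
    free(c.x);
    return total;
}
```

For example, `demo_sources(3)` sums 10 (fixed), 3 (VLA) and 6 (malloc)
for a total of 19.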

For function parameters the semantics are as described in [1], except
that checks will also be done for buffers of non-constant size.

The emitted LLVM code will use metadata to capture this information;
see [2].

[1]
http://www.openbsd.org/cgi-bin/man.cgi?query=gcc-local&apropos=0&sektion=0&manpath=OpenBSD+Current&arch=i386&format=html
[2]
--------- Sema example
int n;
int *x __attribute__((bounds(n))); // expected-error {{'bounds' attribute applies to pointer fields only}}
int z __attribute__((bounds(n))); // expected-error {{'bounds' attribute applies to pointer fields only}}
unsigned o;

struct foo {
    unsigned n;
    int ns; // expected-note {{declared at}}
    int *a __attribute__((bounds(1))); // expected-error {{attribute requires unquoted parameter}}
    int *b __attribute__((bounds(n, 1))); // expected-error {{attribute requires 0 argument(s)}}
    int *c __attribute__((bounds(m))); // expected-error {{use of undeclared identifier 'm'}}
    int *d __attribute__((bounds(o))); // expected-error {{use of undeclared identifier 'o'}}
    int *e __attribute__((bounds(ns))); // expected-error {{'bounds' attribute requires parameter 1 to be the name of an unsigned field}}
};

struct foo foos;
-------------------- Codegen example
struct foo {
    unsigned n;
    int *b __attribute__((bounds(n)));
};

struct foo foos;
// CHECK: !llvm.boundsinfo.foo = !{!0}
// CHECK: !0 = metadata !{i32 1, i32 0}

[3] I'm not sure about VLAs: a VLA that is too large could exceed the
stack size and crash the program.

Best regards,
--Edwin


