[clang] [BoundsSafety] Initial documentation for -fbounds-safety (PR #70749)

Aaron Ballman via cfe-commits cfe-commits at lists.llvm.org
Wed Dec 6 11:36:55 PST 2023


================
@@ -0,0 +1,844 @@
+==================================================
+``-fbounds-safety``: Enforcing bounds safety for C
+==================================================
+
+.. contents::
+   :local:
+
+Overview
+========
+
+``-fbounds-safety`` is a C extension to enforce bounds safety to prevent
+out-of-bounds (OOB) memory accesses, which remain a major source of security
+vulnerabilities in C. ``-fbounds-safety`` aims to eliminate this class of bugs
+by turning OOB accesses into deterministic traps.
+
+The ``-fbounds-safety`` extension offers bounds annotations that programmers can
+use to attach bounds to pointers. For example, programmers can add the
+``__counted_by(N)`` annotation to parameter ``ptr``, indicating that the pointer
+has ``N`` valid elements:
+
+.. code-block:: c
+
+   void foo(int *__counted_by(N) ptr, size_t N);
+
+Using this bounds information, the compiler inserts bounds checks on every
+pointer dereference, ensuring that the program does not access memory outside
+the specified bounds. The compiler requires programmers to provide enough bounds
+information so that the accesses can be checked at either run time or compile
+time — and it rejects code if it cannot.
+
+The most important contribution of ``-fbounds-safety`` is how it reduces the
+programmer’s annotation burden by reconciling bounds annotations at ABI
+boundaries with the use of implicit wide pointers (a.k.a. “fat” pointers) that
+carry bounds information on local variables without the need for annotations. We
+designed this model so that it preserves ABI compatibility with C while
+minimizing adoption effort.
+
+The ``-fbounds-safety`` extension has been adopted on millions of lines of
+production C code and proven to work in a consumer operating system setting. The
+extension was designed to enable incremental adoption — a key requirement in
+real-world settings where modifying an entire project and its dependencies all
+at once is often not possible. It also addresses a number of other practical
+challenges that have made existing approaches to safer C dialects difficult to
+adopt, offering these properties that make it widely adoptable in practice:
+
+* It is designed to preserve the Application Binary Interface (ABI).
+* It interoperates well with plain C code.
+* It can be adopted partially and incrementally while still providing safety
+  benefits.
+* It is a conforming extension to C.
+* Consequently, source code that adopts the extension can continue to be
+  compiled by toolchains that do not support the extension (CAVEAT: this still
+  requires inclusion of a header file that macro-defines the bounds annotations
+  to empty).
+* It has a relatively low adoption cost.
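The caveat above about toolchains that lack the extension can be sketched as a small fallback header. This is a minimal illustration, not the header shipped with any toolchain: the `#ifndef` guards assume that a supporting toolchain's own header (if any) has already defined the annotations, and `sum` is a hypothetical function used only to show the effect.

```c
#include <stddef.h>

/* Hypothetical fallback: if the annotations are not already defined,
   expand them to nothing so the code compiles as plain C. */
#ifndef __counted_by
#define __counted_by(N)
#endif
#ifndef __single
#define __single
#endif

/* With the fallback active, the preprocessor turns this declaration into
   int sum(int *buf, size_t len), so no bounds checks are inserted. */
int sum(int *__counted_by(len) buf, size_t len) {
    int total = 0;
    for (size_t i = 0; i < len; ++i)
        total += buf[i];
    return total;
}
```

Compiled this way the annotations document intent only; the safety guarantees apply when the code is built with a toolchain that implements the extension.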
+
+This document discusses the key design decisions of ``-fbounds-safety``. It
+will be actively updated with a more detailed specification. The
+implementation plan can be found in Implementation plans for -fbounds-safety.
+
+.. Cross reference doesn't currently work
+   `Implementation plans for -fbounds-safety <BoundsSafetyImplPlans.rst>`_.
+
+Programming Model
+=================
+
+Overview
+--------
+
+``-fbounds-safety`` ensures that pointers are not used to access memory beyond
+their bounds by performing bounds checking. If a bounds check fails, the program
+will deterministically trap before out-of-bounds memory is accessed.
+
+In our model, every pointer has an explicit or implicit bounds attribute that
+determines its bounds and ensures guaranteed bounds checking. Consider the
+example below where the ``__counted_by(count)`` annotation indicates that
+parameter ``p`` points to a buffer of integers containing ``count`` elements. An
+off-by-one error is present in the loop condition, making ``p[i]`` an
+out-of-bounds access during the loop’s final iteration. The compiler inserts a
+bounds check before ``p`` is dereferenced to ensure that the access remains
+within the specified bounds.
+
+.. code-block:: c
+
+   void fill_array_with_indices(int *__counted_by(count) p, unsigned count) {
+      // off-by-one error: the condition should be `i < count`
+      for (unsigned i = 0; i <= count; ++i) {
+         // bounds check inserted:
+         //   if (i >= count) trap();
+         p[i] = i;
+      }
+   }
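The inserted check described above can be emulated in plain C to see where it fires. This is a sketch under stated assumptions: `index_would_trap` and `fill_array_checked` are hypothetical names, and the emulation stops the loop and reports how many elements were written, whereas the real extension would trap.

```c
#include <stdbool.h>

/* Hypothetical helper: the condition under which the compiler-inserted
   check for p[i] would fire, given a buffer of `count` elements. */
static bool index_would_trap(unsigned i, unsigned count) {
    return i >= count;
}

/* Plain-C emulation of fill_array_with_indices. Returns the number of
   elements written before the emulated check fired. */
unsigned fill_array_checked(int *p, unsigned count) {
    unsigned written = 0;
    for (unsigned i = 0; i <= count; ++i) { /* same off-by-one condition */
        if (index_would_trap(i, count))
            break; /* the real extension would trap here */
        p[i] = (int)i;
        ++written;
    }
    return written;
}
```

For a buffer of `count` elements the emulated check fires at `i == count`, so the element just past the end of the buffer is never written.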
+
+A bounds annotation defines an invariant for the pointer type, and the model
+ensures that this invariant remains true. In the example below, pointer ``p``
+annotated with ``__counted_by(count)`` must always point to a memory buffer
+containing at least ``count`` elements of the pointee type. Changing the value
+of ``count``, as in the example below, may violate this invariant and permit
+out-of-bounds access through the pointer. To avoid this, the compiler employs
+compile-time restrictions and emits run-time checks as necessary to ensure the
+new count value doesn't exceed the actual length of the buffer. Section
+`Maintaining correctness of bounds annotations`_ provides more details about
+this programming model.
+
+.. code-block:: c
+
+   int g;
+
+   void foo(int *__counted_by(count) p, size_t count) {
+      count++; // may violate the invariant of __counted_by
+      count--; // may violate the invariant of __counted_by if count was 0.
+      count = g; // may violate the invariant of __counted_by
+                 // depending on the value of `g`.
+   }
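The run-time side of this invariant can be sketched in plain C. The helper below is hypothetical and assumes the actual buffer length is available as `buffer_len`; it rejects an update where the real extension would trap, and it does not model the compile-time restrictions.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical emulation of the run-time check: accept a new count only if
   it does not exceed the actual number of elements in the buffer. */
bool try_set_count(size_t *count, size_t new_count, size_t buffer_len) {
    if (new_count > buffer_len)
        return false; /* the real extension would trap here */
    *count = new_count;
    return true;
}
```

Decrementing a `size_t` count of 0 wraps around to a very large value, which is why `count--` can also break the invariant; the sketch rejects that value like any other count larger than the buffer.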
+
+The requirement to annotate all pointers with explicit bounds information could
+present a significant adoption burden. To tackle this issue, the model
+incorporates the concept of a “wide pointer” (a.k.a. fat pointer) – a larger
+pointer that carries bounds information alongside the pointer value. Using wide
+pointers can reduce the adoption burden, because they contain bounds
+information internally and eliminate the need for explicit bounds annotations.
+However, wide pointers differ from standard C pointers in their data layout,
+which may result in incompatibilities with the application binary interface
+(ABI). Breaking the ABI complicates interoperability with external code that has
+not adopted the same programming model.
+
+``-fbounds-safety`` harmonizes the wide pointer and the bounds annotation
+approaches to reduce the adoption burden while maintaining the ABI. In this
+model, local variables of pointer type are implicitly treated as wide pointers,
+allowing them to carry bounds information without requiring explicit bounds
+annotations. This approach does not impact the ABI, as local variables are
+hidden from the ABI. Pointers associated with any other variables are treated as
+single object pointers (i.e., ``__single``), ensuring that they always have the
+tightest bounds by default and offering a strong bounds safety guarantee.
+
+By implementing default bounds annotations based on ABI visibility, a
+considerable portion of C code can operate without modifications within this
+programming model, reducing the adoption burden.
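To make the wide pointer idea concrete, the sketch below models one as an ordinary C struct. The layout (pointer plus lower and upper bound) and all names are assumptions for illustration only; the text above says only that a wide pointer carries bounds information alongside the pointer value.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical layout of a wide pointer to int. */
struct wide_int_ptr {
    int *ptr;   /* current pointer value */
    int *lower; /* first valid element */
    int *upper; /* one past the last valid element */
};

/* Build a wide pointer from a pointer and an element count, as could happen
   when a __counted_by parameter is assigned to a local variable. */
static struct wide_int_ptr wide_from_counted(int *p, size_t count) {
    struct wide_int_ptr w = { p, p, p + count };
    return w;
}

/* Check whether index i is in bounds, without forming an out-of-range
   pointer. */
static bool wide_in_bounds(struct wide_int_ptr w, ptrdiff_t i) {
    return i >= (w.lower - w.ptr) && i < (w.upper - w.ptr);
}
```

The struct is larger than a plain `int *`, which illustrates why wide pointers change the data layout and are therefore limited to local variables that are not visible in the ABI.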
+
+The rest of the section will discuss individual bounds annotations and the
+programming model in more detail.
+
+Bounds annotations
+------------------
+
+Annotation for pointers to a single object
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+The C language allows pointer arithmetic on arbitrary pointers and this has been
+a source of many bounds safety issues. In practice, many pointers are merely
+pointing to a single object and incrementing or decrementing such a pointer
+immediately makes the pointer go out-of-bounds. To prevent this,
+``-fbounds-safety`` provides the annotation ``__single`` that causes pointer
+arithmetic on annotated pointers to be a compile-time error.
+
+* ``__single`` : indicates that the pointer is either pointing to a single
+  object or null. Hence, pointers with ``__single`` permit neither pointer
+  arithmetic nor subscripting with a non-zero index. Dereferencing a
+  ``__single`` pointer is allowed but it requires a null check. Upper and lower
+  bounds checks are not required because the ``__single`` pointer should point
+  to a valid object unless it’s null.
+
+We use ``__single`` as the default annotation for ABI-visible pointers. This
----------------
AaronBallman wrote:

```suggestion
``__single`` is the default annotation for ABI-visible pointers. This
```

https://github.com/llvm/llvm-project/pull/70749

