[clang] [BoundsSafety] Initial documentation for -fbounds-safety (PR #70749)

Yeoul Na via cfe-commits cfe-commits at lists.llvm.org
Mon Dec 11 20:53:42 PST 2023


================
@@ -0,0 +1,362 @@
+==================================================
+``-fbounds-safety``: Enforcing bounds safety for C
+==================================================
+
+.. contents::
+   :local:
+
+Overview
+========
+
+``-fbounds-safety`` is a C extension to enforce bounds safety to prevent out-of-bounds (OOB) memory accesses, which remain a major source of security vulnerabilities in C. ``-fbounds-safety`` aims to eliminate this class of bugs by turning OOB accesses into deterministic traps.
+
+The ``-fbounds-safety`` extension offers bounds annotations that programmers can use to attach bounds to pointers. For example, programmers can add the ``__counted_by(N)`` annotation to parameter ``ptr``, indicating that the pointer has ``N`` valid elements:
+
+.. code-block:: c
+
+   void foo(int *__counted_by(N) ptr, size_t N);
+
+Using this bounds information, the compiler inserts bounds checks on every pointer dereference, ensuring that the program does not access memory outside the specified bounds. The compiler requires programmers to provide enough bounds information so that the accesses can be checked at either run time or compile time — and it rejects code if it cannot.
+
+The most important contribution of ``-fbounds-safety`` is how it reduces the programmer’s annotation burden by reconciling bounds annotations at ABI boundaries with the use of implicit wide pointers (a.k.a. “fat” pointers) that carry bounds information on local variables without the need for annotations. We designed this model so that it preserves ABI compatibility with C while minimizing adoption effort.
+
+The ``-fbounds-safety`` extension has been adopted on millions of lines of production C code and proven to work in a consumer operating system setting. The extension was designed to enable incremental adoption, a key requirement in real-world settings where modifying an entire project and its dependencies all at once is often not possible. It also addresses several other practical challenges that have made existing approaches to safer C dialects difficult to adopt, offering the following properties that make it widely adoptable in practice:
+
+* It is designed to preserve the Application Binary Interface (ABI).
+* It interoperates well with plain C code.
+* It can be adopted partially and incrementally while still providing safety benefits.
+* It is syntactically and semantically compatible with C.
+* Consequently, source code that adopts the extension can continue to be compiled by toolchains that do not support the extension.
+* It has a relatively low adoption cost.
+* It can be implemented on top of Clang.
+
+This document discusses the key design points of ``-fbounds-safety``. It will be actively updated with a more detailed specification. The implementation plan can be found in `Implementation plans for -fbounds-safety <BoundsSafetyImplPlans.rst>`_.
+
+Programming Model
+=================
+
+Overview
+--------
+
+``-fbounds-safety`` ensures that pointers are not used to access memory beyond their bounds by performing bounds checking. If a bounds check fails, the program will deterministically trap before out-of-bounds memory is accessed.
+
+In our model, every pointer has an explicit or implicit bounds attribute that determines its bounds and ensures guaranteed bounds checking. Consider the example below where the ``__counted_by(count)`` annotation indicates that parameter ``p`` points to a buffer of integers containing ``count`` elements. An off-by-one error is present in the loop condition, causing ``p[i]`` to be an out-of-bounds access during the loop’s final iteration. The compiler inserts a bounds check before ``p`` is dereferenced to ensure that the access remains within the specified bounds.
+
+.. code-block:: c
+
+   void fill_array_with_indices(int *__counted_by(count) p, unsigned count) {
+      // off-by-one error: the condition should be (i < count)
+      for (unsigned i = 0; i <= count; ++i) {
+         // bounds check inserted:
+         //   if (i >= count) trap();
+         p[i] = i;
+      }
+   }
+
+A bounds annotation defines an invariant for the pointer type, and the model ensures that this invariant remains true. In the example below, pointer ``p`` annotated with ``__counted_by(count)`` must always point to a memory buffer containing at least ``count`` elements of the pointee type. Incrementing ``count``, as the function body does, would violate this invariant and permit out-of-bounds accesses through the pointer. To avoid this, the compiler emits either a compile-time error or a run-time trap. Section `Maintaining correctness of bounds annotations`_ provides more details about the programming model.
+
+.. code-block:: c
+
+   void foo(int *__counted_by(count) p, size_t count) {
+      count++; // violates the invariant of __counted_by
+   }
+
+The requirement to annotate all pointers with explicit bounds information could present a significant adoption burden. To tackle this issue, the model incorporates the concept of a “wide pointer” (a.k.a. fat pointer), a larger pointer that carries bounds information alongside the pointer value. Wide pointers can reduce the adoption burden because they carry bounds information internally, which eliminates the need for explicit bounds annotations. However, wide pointers differ from standard C pointers in their data layout, which may result in incompatibilities with the application binary interface (ABI). Breaking the ABI complicates interoperability with external code that has not adopted the same programming model.
+
+``-fbounds-safety`` harmonizes the wide pointer and the bounds annotation approaches to reduce the adoption burden while maintaining the ABI. In this model, local variables of pointer type are implicitly treated as wide pointers, allowing them to carry bounds information without requiring explicit bounds annotations. This approach does not impact the ABI, as local variables are hidden from the ABI. Pointers associated with any other variables are treated as single object pointers (i.e., ``__single``), ensuring that they always have the tightest bounds by default and offering a strong bounds safety guarantee.
----------------
rapidsna wrote:

There are multiple things going on here.

Firstly, function parameters are on the ABI surface, so they are `__single` by default (not `__bidi_indexable`) unless otherwise annotated. I'll clarify that local variables are implicitly wide pointers, but function parameters are not.

Secondly, the type of `p` is `int *__counted_by(count)`, which depends on `count`, another parameter of `func`. Applying this exact type to another variable would create another dependency on the parameter `count`, which is undesirable in most cases. Function `hmm` in this example doesn't have a `count` to use for `typeof`, and even if it did, it would not be the same `count` as in `typeof(p)`. Thus, the compiler will report an error for using `typeof` on a type with any dependent variable. On the other hand, `typeof(q)` where `q` is of type `int *__counted_by(4)` will be allowed, as the type doesn't depend on another variable.

Therefore, it behaves as follows.

```C
void func(int *__counted_by(count) p, size_t count) {
  extern void hmm(typeof(p)); // error: `typeof` on a type with a dependent variable
}

void func(int *p, size_t count) { // `p` is a parameter so it's `__single` by default
  extern void hmm(typeof(p)); // creates an interface of type `extern void hmm(int *__single);`
}

void func(int *__counted_by(4) p) { // `p` is annotated with `__counted_by(4)` so it's not a wide pointer
  extern void hmm(typeof(p)); // creates an interface of type `extern void hmm(int *__counted_by(4));`
}
```

As you pointed out, however, a local variable isn't totally hidden from the ABI when we consider a non-parameter local variable here:

```C
void func(void) {
  int *p; // implicitly `int *__bidi_indexable p`
  extern void hmm(typeof(p)); // creates an interface of type `void hmm(int *__bidi_indexable)`
}
```

And this is problematic because at the definition of function `hmm`, the parameter will be `__single` by default, like any other parameter.
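
To make the mismatch concrete, here is a rough sketch in plain C that does not use `-fbounds-safety` itself. The `wide_int_ptr` struct, the two function-pointer typedefs, and the helpers `abi_mismatch` and `wide_in_bounds` are all hypothetical names made up for illustration. The three-field layout is only an assumption about how a wide pointer might be represented; the actual layout is up to the implementation.

```C
#include <assert.h>
#include <stddef.h>

/* Assumed stand-in for `int *__bidi_indexable`: the pointer value plus
   its lower and upper bounds. The real layout is an implementation detail. */
struct wide_int_ptr {
  int *ptr;
  int *lower;
  int *upper; /* one past the last valid element */
};

/* What a caller would pass if `hmm` were declared via `typeof(p)`
   on an implicitly wide local variable. */
typedef void (*hmm_as_declared)(struct wide_int_ptr);

/* What the definition of `hmm` expects: a pointer-sized `__single`. */
typedef void (*hmm_as_defined)(int *);

/* Returns 1 when the two parameter representations differ in size. */
static int abi_mismatch(void) {
  return sizeof(struct wide_int_ptr) != sizeof(int *);
}

/* Emulated bounds check: returns 1 if reading `*p.ptr` stays in bounds. */
static int wide_in_bounds(struct wide_int_ptr p) {
  return p.ptr >= p.lower && p.ptr < p.upper;
}
```

Under this assumed layout the declaration and the definition disagree about the size of the argument, so a call through the `typeof`-based declaration would pass data that the callee does not expect.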

I think we should report a warning when `typeof` is used on an interface with an implicitly wide pointer to avoid silently breaking the ABI.

Section "ABI implications of default bounds annotations" is trying to discuss how `-fbounds-safety` tries to avoid similar issues. I'll update the section to discuss this case.

https://github.com/llvm/llvm-project/pull/70749

