[clang] dda2ce8 - [BoundsSafety] Initial documentation for -fbounds-safety (#70749)

via cfe-commits cfe-commits at lists.llvm.org
Tue Jan 16 13:40:50 PST 2024


Author: Yeoul Na
Date: 2024-01-16T13:40:45-08:00
New Revision: dda2ce82c2ec8a3b988636c6cebb76eb32b4de05

URL: https://github.com/llvm/llvm-project/commit/dda2ce82c2ec8a3b988636c6cebb76eb32b4de05
DIFF: https://github.com/llvm/llvm-project/commit/dda2ce82c2ec8a3b988636c6cebb76eb32b4de05.diff

LOG: [BoundsSafety] Initial documentation for -fbounds-safety (#70749)

The document is mostly the exact copy of RFC: Enforcing Bounds Safety in
C, except some clarifications made over the PR review:

https://discourse.llvm.org/t/rfc-enforcing-bounds-safety-in-c-fbounds-safety/70854

Further changes and clarifications for the programming model will be
done as separate patches to make it easier to track history of changes.

Added: 
    clang/docs/BoundsSafety.rst
    clang/docs/BoundsSafetyImplPlans.rst

Modified: 
    clang/docs/index.rst

Removed: 
    


################################################################################
diff  --git a/clang/docs/BoundsSafety.rst b/clang/docs/BoundsSafety.rst
new file mode 100644
index 000000000000000..f1837675ec9bf88
--- /dev/null
+++ b/clang/docs/BoundsSafety.rst
@@ -0,0 +1,998 @@
+==================================================
+``-fbounds-safety``: Enforcing bounds safety for C
+==================================================
+
+.. contents::
+   :local:
+
+Overview
+========
+
+``-fbounds-safety`` is a C extension to enforce bounds safety to prevent
+out-of-bounds (OOB) memory accesses, which remain a major source of security
+vulnerabilities in C. ``-fbounds-safety`` aims to eliminate this class of bugs
+by turning OOB accesses into deterministic traps.
+
+The ``-fbounds-safety`` extension offers bounds annotations that programmers can
+use to attach bounds to pointers. For example, programmers can add the
+``__counted_by(N)`` annotation to parameter ``ptr``, indicating that the pointer
+has ``N`` valid elements:
+
+.. code-block:: c
+
+   void foo(int *__counted_by(N) ptr, size_t N);
+
+Using this bounds information, the compiler inserts bounds checks on every
+pointer dereference, ensuring that the program does not access memory outside
+the specified bounds. The compiler requires programmers to provide enough bounds
+information so that the accesses can be checked at either run time or compile
+time — and it rejects code if it cannot.
+
+The most important contribution of ``-fbounds-safety`` is how it reduces the
+programmer's annotation burden by reconciling bounds annotations at ABI
+boundaries with the use of implicit wide pointers (a.k.a. "fat" pointers) that
+carry bounds information on local variables without the need for annotations. We
+designed this model so that it preserves ABI compatibility with C while
+minimizing adoption effort.
+
+The ``-fbounds-safety`` extension has been adopted on millions of lines of
+production C code and proven to work in a consumer operating system setting. The
+extension was designed to enable incremental adoption — a key requirement in
+real-world settings where modifying an entire project and its dependencies all
+at once is often not possible. It also addresses several other practical
+challenges that have made existing approaches to safer C dialects difficult to
+adopt, offering these properties that make it widely adoptable in practice:
+
+* It is designed to preserve the Application Binary Interface (ABI).
+* It interoperates well with plain C code.
+* It can be adopted partially and incrementally while still providing safety
+  benefits.
+* It is a conforming extension to C.
+* Consequently, source code that adopts the extension can continue to be
+  compiled by toolchains that do not support the extension (CAVEAT: this still
+  requires inclusion of a header file macro-defining bounds annotations to
+  empty).
+* It has a relatively low adoption cost.
+
+This document discusses the key designs of ``-fbounds-safety``. It will be
+actively updated with a more detailed specification. The implementation plan
+can be found in :doc:`BoundsSafetyImplPlans`.
+
+
+Programming Model
+=================
+
+Overview
+--------
+
+``-fbounds-safety`` ensures that pointers are not used to access memory beyond
+their bounds by performing bounds checking. If a bounds check fails, the program
+will deterministically trap before out-of-bounds memory is accessed.
+
+In our model, every pointer has an explicit or implicit bounds attribute that
+determines its bounds and ensures guaranteed bounds checking. Consider the
+example below where the ``__counted_by(count)`` annotation indicates that
+parameter ``p`` points to a buffer of integers containing ``count`` elements. An
+off-by-one error is present in the loop condition, making ``p[i]`` an
+out-of-bounds access during the loop's final iteration. The compiler inserts a
+bounds check before ``p`` is dereferenced to ensure that the access remains
+within the specified bounds.
+
+.. code-block:: c
+
+   void fill_array_with_indices(int *__counted_by(count) p, unsigned count) {
+      // off-by-one error: the condition should be `i < count`
+      for (unsigned i = 0; i <= count; ++i) {
+         // bounds check inserted:
+         //   if (i >= count) trap();
+         p[i] = i;
+      }
+   }
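
The effect of the inserted check can be emulated by hand in plain C. The following is only a sketch under stated assumptions: the function name ``fill_array_with_indices_checked`` and its ``-1`` return convention are hypothetical, and it reports the failure to the caller at the point where the compiler-inserted check would trap.

```c
/* Hand-written emulation of the bounds check described above.
 * Hypothetical helper, not part of -fbounds-safety.
 * Returns 0 on success, or -1 at the point where the real check would trap. */
int fill_array_with_indices_checked(int *p, unsigned count) {
    /* Same off-by-one bug as the example: `<=` instead of `<`. */
    for (unsigned i = 0; i <= count; ++i) {
        if (i >= count)   /* emulated bounds check */
            return -1;    /* the real check traps here instead */
        p[i] = (int)i;
    }
    return 0;
}
```

With the buggy loop condition, the emulated check fires on the final iteration, so the element one past the end of the buffer is never written.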
+
+A bounds annotation defines an invariant for the pointer type, and the model
+ensures that this invariant remains true. In the example below, pointer ``p``
+annotated with ``__counted_by(count)`` must always point to a memory buffer
+containing at least ``count`` elements of the pointee type. Changing the value
+of ``count``, like in the example below, may violate this invariant and permit
+out-of-bounds access to the pointer. To avoid this, the compiler employs
+compile-time restrictions and emits run-time checks as necessary to ensure the
+new count value doesn't exceed the actual length of the buffer. Section
+`Maintaining correctness of bounds annotations`_ provides more details about
+this programming model.
+
+.. code-block:: c
+
+   int g;
+
+   void foo(int *__counted_by(count) p, size_t count) {
+      count++; // may violate the invariant of __counted_by
+      count--; // may violate the invariant of __counted_by if count was 0.
+      count = g; // may violate the invariant of __counted_by
+                 // depending on the value of `g`.
+   }
+
+The requirement to annotate all pointers with explicit bounds information could
+present a significant adoption burden. To tackle this issue, the model
+incorporates the concept of a "wide pointer" (a.k.a. fat pointer) – a larger
+pointer that carries bounds information alongside the pointer value. Utilizing
+wide pointers can potentially reduce the adoption burden, as they contain
+bounds information internally and eliminate the need for explicit bounds
+annotations. However, wide pointers differ from standard C pointers in their
+data layout, which may result in incompatibilities with the application binary
+interface (ABI). Breaking the ABI complicates interoperability with external
+code that has not adopted the same programming model.
+
+``-fbounds-safety`` harmonizes the wide pointer and the bounds annotation
+approaches to reduce the adoption burden while maintaining the ABI. In this
+model, local variables of pointer type are implicitly treated as wide pointers,
+allowing them to carry bounds information without requiring explicit bounds
+annotations. Please note that this approach doesn't apply to function parameters
+which are considered ABI-visible. As local variables are typically hidden from
+the ABI, this approach has a marginal impact on it. In addition,
+``-fbounds-safety`` employs compile-time restrictions to prevent implicit wide
+pointers from silently breaking the ABI (see `ABI implications of default bounds
+annotations`_). Pointers associated with any other variables, including function
+parameters, are treated as single object pointers (i.e., ``__single``), ensuring
+that they always have the tightest bounds by default and offering a strong
+bounds safety guarantee.
+
+By implementing default bounds annotations based on ABI visibility, a
+considerable portion of C code can operate without modifications within this
+programming model, reducing the adoption burden.
+
+The rest of the section will discuss individual bounds annotations and the
+programming model in more detail.
+
+Bounds annotations
+------------------
+
+Annotation for pointers to a single object
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+The C language allows pointer arithmetic on arbitrary pointers and this has been
+a source of many bounds safety issues. In practice, many pointers are merely
+pointing to a single object and incrementing or decrementing such a pointer
+immediately makes the pointer go out-of-bounds. To prevent this unsafety,
+``-fbounds-safety`` provides the annotation ``__single`` that causes pointer
+arithmetic on annotated pointers to be a compile-time error.
+
+* ``__single`` : indicates that the pointer is either pointing to a single
+  object or null. Hence, pointers with ``__single`` permit neither pointer
+  arithmetic nor subscripting with a non-zero index. Dereferencing a
+  ``__single`` pointer is allowed but it requires a null check. Upper and lower
+  bounds checks are not required because the ``__single`` pointer should point
+  to a valid object unless it's null.
+
+``__single`` is the default annotation for ABI-visible pointers. This
+gives strong security guarantees in that these pointers cannot be incremented or
+decremented unless they have an explicit, overriding bounds annotation that can
+be used to verify the safety of the operation. The compiler issues an error when
+a ``__single`` pointer is utilized for pointer arithmetic or array access, as
+these operations would immediately cause the pointer to exceed its bounds.
+Consequently, this prompts programmers to provide sufficient bounds information
+to pointers. In the following example, the pointer parameter ``p`` is
+single-by-default and is used for array access. As a result, the compiler
+generates an error suggesting that ``__counted_by`` be added to the pointer.
+
+.. code-block:: c
+
+   void fill_array_with_indices(int *p, unsigned count) {
+      for (unsigned i = 0; i < count; ++i) {
+         p[i] = i; // error
+      }
+   }
+
+
+External bounds annotations
+^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+"External" bounds annotations provide a way to express a relationship between a
+pointer variable and another variable (or expression) containing the bounds
+information of the pointer. In the following example, the
+``__counted_by(count)`` annotation expresses the bounds of parameter ``p``
+using another parameter ``count``. This model works naturally with many C
+interfaces and structs because the bounds of a pointer are often available
+adjacent to the pointer itself, e.g., at another parameter of the same function
+prototype, or at another field of the same struct declaration.
+
+.. code-block:: c
+
+   void fill_array_with_indices(int *__counted_by(count) p, size_t count) {
+      // off-by-one error
+      for (size_t i = 0; i <= count; ++i)
+         p[i] = i;
+   }
+
+External bounds annotations include ``__counted_by``, ``__sized_by``, and
+``__ended_by``. These annotations do not change the pointer representation,
+meaning they do not have ABI implications.
+
+* ``__counted_by(N)`` : The pointer points to memory that contains ``N``
+  elements of pointee type. ``N`` is an expression of integer type which can be
+  a simple reference to a declaration, a constant including calls to constant
+  functions, or an arithmetic expression that does not have side effects. The
+  ``__counted_by`` annotation cannot apply to pointers to incomplete types or
+  types without size such as ``void *``. Instead, ``__sized_by`` can be used to
+  describe the byte count.
+* ``__sized_by(N)`` : The pointer points to memory that contains ``N`` bytes.
+  Just like the argument of ``__counted_by``, ``N`` is an expression of integer
+  type which can be a constant, a simple reference to a declaration, or an
+  arithmetic expression that does not have side effects. This is mainly used for
+  pointers to incomplete types or types without size such as ``void *``.
+* ``__ended_by(P)`` : The pointer has the upper bound of value ``P``, which is
+  one past the last element of the pointer. In other words, this annotation
+  describes a range that starts with the pointer that has this annotation and
+  ends with ``P`` which is the argument of the annotation. ``P`` itself may be
+  annotated with ``__ended_by(Q)``. In this case, the end of the range extends
+  to the pointer ``Q``. This is used for "iterator" support in C where you're
+  iterating from one pointer value to another until a final pointer value is
+  reached (and the final pointer value is not dereferenceable).
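
The "iterator" idiom that ``__ended_by`` targets can be sketched in plain C as follows. This is an illustration only: ``sum_range`` is a hypothetical function, and the annotation appears only in a comment so that the code compiles without the extension.

```c
/* Emulates iterating a range that could be declared as
 *   long sum_range(const int *__ended_by(end) p, const int *end);
 * Hypothetical example: returns the sum of the elements in [p, end). */
long sum_range(const int *p, const int *end) {
    long sum = 0;
    while (p != end) {
        /* A bounds-checked build would verify that p is below `end`
         * before this dereference; `end` itself is never dereferenced. */
        sum += *p;
        ++p;
    }
    return sum;
}
```

The end pointer plays the role of the annotation argument ``P``: it marks one past the last element and bounds every dereference of the iterating pointer.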
+
+Accessing a pointer outside the specified bounds causes a run-time trap or a
+compile-time error. Also, the model maintains correctness of bounds annotations
+when the pointer and/or the related value containing the bounds information are
+updated or passed as arguments. This is done by compile-time restrictions or
+run-time checks (see `Maintaining correctness of bounds annotations`_
+for more detail). For instance, initializing ``buf`` with a null pointer while
+assigning a non-zero value to ``count``, as shown in the following example,
+would violate the ``__counted_by`` annotation because a null pointer does not
+point to any valid memory location. To avoid this, the compiler produces either
+a compile-time error or run-time trap.
+
+.. code-block:: c
+
+   void null_with_count_10(int *__counted_by(count) buf, unsigned count) {
+      buf = 0;
+      // This is not allowed as it creates a null pointer with non-zero length
+      count = 10;
+   }
+
+However, there are use cases where a pointer is either a null pointer or is
+pointing to memory of the specified size. To support this idiom,
+``-fbounds-safety`` provides ``*_or_null`` variants,
+``__counted_by_or_null(N)``, ``__sized_by_or_null(N)``, and
+``__ended_by_or_null(P)``. Accessing a pointer with any of these bounds
+annotations will require an extra null check to avoid a null pointer
+dereference.
+
+Internal bounds annotations
+^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+A wide pointer (sometimes known as a "fat" pointer) is a pointer that carries
+additional bounds information internally (as part of its data). The bounds
+require additional storage space making wide pointers larger than normal
+pointers, hence the name "wide pointer". The memory layout of a wide pointer is
+equivalent to a struct with the pointer, upper bound, and (optionally) lower
+bound as its fields as shown below.
+
+.. code-block:: c
+
+   struct wide_pointer_datalayout {
+      void* pointer; // Address used for dereferences and pointer arithmetic
+      void* upper_bound; // Points one past the highest address that can be
+                         // accessed
+      void* lower_bound; // (Optional) Points to lowest address that can be
+                         // accessed
+   };
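
To make the layout concrete, a wide pointer and its bounds check can be emulated by hand in plain C. This is a minimal sketch under stated assumptions: ``struct wide_int_ptr``, ``wide_from_array``, and ``wide_index_ok`` are hypothetical names, and the real checks are generated by the compiler rather than written by the programmer.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical hand-written stand-in for a compiler-generated wide pointer. */
struct wide_int_ptr {
    int *pointer;      /* address used for dereferences and arithmetic */
    int *upper_bound;  /* one past the highest accessible element */
    int *lower_bound;  /* lowest accessible element */
};

/* Build a wide pointer covering `count` elements starting at `base`. */
struct wide_int_ptr wide_from_array(int *base, size_t count) {
    struct wide_int_ptr w = { base, base + count, base };
    return w;
}

/* Emulated check for `w.pointer[index]`: true if the access is in bounds.
 * The comparison uses element offsets, so this portable sketch never forms
 * an out-of-bounds pointer itself. */
bool wide_index_ok(struct wide_int_ptr w, ptrdiff_t index) {
    ptrdiff_t lo = w.lower_bound - w.pointer;  /* zero or negative */
    ptrdiff_t hi = w.upper_bound - w.pointer;  /* elements left before the end */
    return index >= lo && index < hi;
}
```

After ``pointer`` is advanced within the buffer, negative indices down to the lower bound become valid, which is the behavior that distinguishes a pointer with both bounds from one that only carries an upper bound.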
+
+Even with this representational change, wide pointers act syntactically as
+normal pointers to allow standard pointer operations, such as pointer
+dereference (``*p``), array subscript (``p[i]``), member access (``p->``), and
+pointer arithmetic, with some restrictions on bounds-unsafe uses.
+
+``-fbounds-safety`` has a set of "internal" bounds annotations to turn pointers
+into wide pointers. These are ``__bidi_indexable`` and ``__indexable``. When a
+pointer has either of these annotations, the compiler changes the pointer to the
+corresponding wide pointer. This means these annotations will break the ABI and
+will not be compatible with plain C, and thus they should generally not be used
+in ABI surfaces.
+
+* ``__bidi_indexable`` : A pointer with this annotation becomes a wide pointer
+  to carry the upper bound and the lower bound, the layout of which is
+  equivalent to ``struct { T *ptr; T *upper_bound; T *lower_bound; };``. As the
+  name indicates, pointers with this annotation are "bidirectionally indexable",
+  meaning that they can be indexed with either a negative or a positive offset
+  and the pointers can be incremented or decremented using pointer arithmetic. A
+  ``__bidi_indexable`` pointer is allowed to hold an out-of-bounds pointer
+  value. While creating an OOB pointer is undefined behavior in C,
+  ``-fbounds-safety`` makes it well-defined behavior. That is, pointer
+  arithmetic overflow with ``__bidi_indexable`` is defined as equivalent of
+  two's complement integer computation, and at the LLVM IR level this means
+  ``getelementptr`` won't get ``inbounds`` keyword. Accessing memory using the
+  OOB pointer is prevented via a run-time bounds check.
+
+* ``__indexable`` : A pointer with this annotation becomes a wide pointer
+  carrying the upper bound (but no explicit lower bound), the layout of which is
+  equivalent to ``struct { T *ptr; T *upper_bound; };``. Since ``__indexable``
+  pointers do not have a separate lower bound, the pointer value itself acts as
+  the lower bound. An ``__indexable`` pointer can only be incremented or indexed
+  in the positive direction. Indexing it in the negative direction will trigger
+  a compile-time error. Otherwise, the compiler inserts a run-time
+  check to ensure pointer arithmetic doesn't make the pointer smaller than the
+  original ``__indexable`` pointer (Note that ``__indexable`` doesn't have a
+  lower bound so the pointer value is effectively the lower bound). As pointer
+  arithmetic overflow will make the pointer smaller than the original pointer,
+  it will cause a trap at runtime. Similar to ``__bidi_indexable``, an
+  ``__indexable`` pointer is allowed to have a pointer value above the upper
+  bound and creating such a pointer is well-defined behavior. Dereferencing such
+  a pointer, however, will cause a run-time trap.
+
+* ``__bidi_indexable`` offers the best flexibility out of all the pointer
+  annotations in this model, as ``__bidi_indexable`` pointers can be used for
+  any pointer operation. However, this comes with the largest code size and
+  memory cost out of the available pointer annotations in this model. In some
+  cases, use of the ``__bidi_indexable`` annotation may be duplicating bounds
+  information that exists elsewhere in the program. In such cases, using
+  external bounds annotations may be a better choice.
+
+``__bidi_indexable`` is the default annotation for non-ABI visible pointers,
+such as local pointer variables — that is, if the programmer does not specify
+another bounds annotation, a local pointer variable is implicitly
+``__bidi_indexable``. Since ``__bidi_indexable`` pointers automatically carry
+bounds information and have no restrictions on kinds of pointer operations that
+can be used with these pointers, most code inside a function works as is without
+modification. In the example below, ``int *buf`` doesn't require manual
+annotation as it's implicitly ``int *__bidi_indexable buf``, carrying the bounds
+information passed from the return value of malloc, which is necessary to insert
+bounds checking for ``buf[i]``.
+
+.. code-block:: c
+
+   void *__sized_by(size) malloc(size_t size);
+
+   int *__counted_by(n) get_array_with_0_to_n_1(size_t n) {
+      int *buf = malloc(sizeof(int) * n);
+      for (size_t i = 0; i < n; ++i)
+         buf[i] = i;
+      return buf;
+   }
+
+Annotations for sentinel-delimited arrays
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+A C string is an array of characters. The null terminator — the first null
+character ('\0') element in the array — marks the end of the string.
+``-fbounds-safety`` provides ``__null_terminated`` to annotate C strings and the
+generalized form ``__terminated_by(T)`` to annotate pointers and arrays with an
+end marked by a sentinel value. The model prevents dereferencing a
+``__terminated_by`` pointer beyond its end. Calculating the location of the end
+(i.e., the address of the sentinel value), requires reading the entire array in
+memory and would have some performance costs. To avoid an unintended performance
+hit, the model puts some restrictions on how these pointers can be used.
+``__terminated_by`` pointers cannot be indexed and can only be incremented one
+element at a time. To allow these operations, the pointers must be explicitly
+converted to ``__indexable`` pointers using the intrinsic function
+``__unsafe_terminated_by_to_indexable(P, T)`` (or
+``__unsafe_null_terminated_to_indexable(P)``) which converts the
+``__terminated_by`` pointer ``P`` to an ``__indexable`` pointer.
+
+* ``__null_terminated`` : The pointer or array is terminated by ``NULL`` or
+  ``0``. Modifying the terminator or incrementing the pointer beyond it is
+  prevented at run time.
+
+* ``__terminated_by(T)`` : The pointer or array is terminated by ``T`` which is
+  a constant expression. Accessing or incrementing the pointer beyond the
+  terminator is not allowed. This is a generalization of ``__null_terminated``
+  which is defined as ``__terminated_by(0)``.
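
The cost of locating the end of a sentinel-delimited array can be seen in a hand-written sketch. The names ``struct indexable_char_ptr`` and ``null_terminated_to_indexable_sketch`` are hypothetical, and placing the upper bound at the terminator's address is an assumption of this sketch, not a statement about what the intrinsics do.

```c
/* Hypothetical emulation of a pointer that carries only an upper bound. */
struct indexable_char_ptr {
    const char *pointer;      /* start of the string */
    const char *upper_bound;  /* assumed here: address of the terminator */
};

/* Scans the whole string to find the terminator, which is why converting a
 * sentinel-delimited pointer to one with explicit bounds costs O(length). */
struct indexable_char_ptr null_terminated_to_indexable_sketch(const char *s) {
    const char *end = s;
    while (*end != '\0')
        ++end;
    struct indexable_char_ptr r = { s, end };
    return r;
}
```

Because every conversion walks the array once, requiring an explicit conversion keeps this cost visible at the call site.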
+
+Annotation for interoperating with bounds-unsafe code
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+A pointer with the ``__unsafe_indexable`` annotation behaves the same as a plain
+C pointer. That is, the pointer does not have any bounds information and pointer
+operations are not checked.
+
+``__unsafe_indexable`` can be used to mark pointers from system headers or
+pointers from code that has not adopted ``-fbounds-safety``. This enables
+interoperation between code using ``-fbounds-safety`` and code that does not.
+
+Default pointer types
+---------------------
+
+ABI visibility and default annotations
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Requiring ``-fbounds-safety`` adopters to add bounds annotations to all pointers
+in the codebase would be a significant adoption burden. To avoid this and to
+secure all pointers by default, ``-fbounds-safety`` applies default bounds
+annotations to pointer types.
+
+``-fbounds-safety`` applies default bounds annotations to pointer types used in
+declarations. The default annotations are determined by the ABI visibility of
+the pointer. A pointer type is ABI-visible if changing its size or
+representation affects the ABI. For instance, changing the size of a type used
+in a function parameter will affect the ABI and thus pointers used in function
+parameters are ABI-visible pointers. On the other hand, changing the types of
+local variables won't have such ABI implications. Hence, ``-fbounds-safety``
+considers the outermost pointer types of local variables as non-ABI visible. The
+rest of the pointers such as nested pointer types, pointer types of global
+variables, struct fields, and function prototypes are considered ABI-visible.
+
+All ABI-visible pointers are treated as ``__single`` by default unless annotated
+otherwise. This default both preserves ABI and makes these pointers safe by
+default. This behavior can be controlled with macros, i.e.,
+``__ptrcheck_abi_assume_*ATTR*()``, to set the default annotation for
+ABI-visible pointers to be either ``__single``, ``__bidi_indexable``,
+``__indexable``, or ``__unsafe_indexable``. For instance,
+``__ptrcheck_abi_assume_unsafe_indexable()`` will make all ABI-visible pointers
+be ``__unsafe_indexable``. Non-ABI visible pointers — the outermost pointer
+types of local variables — are ``__bidi_indexable`` by default, so that these
+pointers have the bounds information necessary to perform bounds checks without
+the need for a manual annotation. All ``const char`` pointers or any typedefs
+equivalent to ``const char`` pointers are ``__null_terminated`` by default.
+Because ``char8_t`` is ``unsigned char``, ``const char8_t *`` won't be
+``__null_terminated`` by default. Similarly, ``const wchar_t *`` won't be
+``__null_terminated`` by default unless the platform defines it as ``typedef
+char wchar_t``. Please note, however, that the programmers can still explicitly
+use ``__null_terminated`` in any other pointers, e.g., ``char8_t
+*__null_terminated``, ``wchar_t *__null_terminated``, ``int
+*__null_terminated``, etc. if they should be treated as ``__null_terminated``.
+The same applies to other annotations.
+In system headers, the default pointer attribute for ABI-visible pointers is set
+to ``__unsafe_indexable`` by default.
+
+The ``__ptrcheck_abi_assume_*ATTR*()`` macros are defined as pragmas in the
+toolchain header (See `Portability with toolchains that do not support the
+extension`_ for more details about the toolchain header):
+
+.. code-block:: C
+
+   #define __ptrcheck_abi_assume_single() \
+      _Pragma("clang abi_ptr_attr set(single)")
+
+   #define __ptrcheck_abi_assume_indexable() \
+      _Pragma("clang abi_ptr_attr set(indexable)")
+
+   #define __ptrcheck_abi_assume_bidi_indexable() \
+      _Pragma("clang abi_ptr_attr set(bidi_indexable)")
+
+   #define __ptrcheck_abi_assume_unsafe_indexable() \
+      _Pragma("clang abi_ptr_attr set(unsafe_indexable)")
+
+
+ABI implications of default bounds annotations
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Although simply modifying types of a local variable doesn't normally impact the
+ABI, taking the address of such a modified type could create a pointer type that
+has an ABI mismatch. Looking at the following example, ``int *local`` is
+implicitly ``int *__bidi_indexable`` and thus the type of ``&local`` is a
+pointer to ``int *__bidi_indexable``. On the other hand, in ``void foo(int
+**)``, the parameter type is a pointer to ``int *__single`` (i.e., ``void
+foo(int *__single *__single)``) (or a pointer to ``int *__unsafe_indexable`` if
+it's from a system header). The compiler reports an error for casts between
+pointers whose elements have incompatible pointer attributes. This way,
+``-fbounds-safety`` prevents pointers that are implicitly ``__bidi_indexable``
+from silently escaping and thereby breaking the ABI.
+
+.. code-block:: c
+
+   void foo(int **);
+
+   void bar(void) {
+      int *local = 0;
+      // error: passing 'int *__bidi_indexable*__bidi_indexable' to parameter of
+      // incompatible nested pointer type 'int *__single*__single'
+      foo(&local);
+   }
+
+A local variable may still be exposed to the ABI if ``typeof()`` takes the type
+of a local variable to define an interface, as shown in the following example.
+
+.. code-block:: C
+
+   // bar.c
+   void bar(int *) { ... }
+
+   // foo.c
+   void foo(void) {
+      int *p; // implicitly `int *__bidi_indexable p`
+      extern void bar(typeof(p)); // creates an interface of type
+                                  // `void bar(int *__bidi_indexable)`
+   }
+
+Doing this may break the ABI if the parameter is not ``__bidi_indexable`` at the
+definition of function ``bar()`` which is likely the case because parameters are
+``__single`` by default without an explicit annotation.
+
+In order to avoid an implicitly wide pointer from silently breaking the ABI, the
+compiler reports a warning when ``typeof()`` is used on an implicit wide pointer
+at any ABI visible context (e.g., function prototype, struct definition, etc.).
+
+.. _Default pointer types in typeof:
+
+Default pointer types in ``typeof()``
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+When ``typeof()`` takes an expression, it respects the bounds annotation on
+the expression type, including when the bounds annotation is implicit. For
+example, the global variable ``g`` in the following code is implicitly
+``__single`` so ``typeof(g)`` gets ``char *__single``. The same is true for the
+parameter ``p``, so ``typeof(p)`` returns ``void *__single``. The local
+variable ``l`` is implicitly ``__bidi_indexable``, so ``typeof(l)`` becomes
+``int *__bidi_indexable``.
+
+.. code-block:: C
+
+   char *g; // typeof(g) == char *__single
+
+   void foo(void *p) {
+      // typeof(p) == void *__single
+
+      int *l; // typeof(l) == int *__bidi_indexable
+   }
+
+When the type of expression has an "external" bounds annotation, e.g.,
+``__sized_by``, ``__counted_by``, etc., the compiler may report an error on
+``typeof`` if the annotation creates a dependency with another declaration or
+variable. For example, the compiler reports an error on ``typeof(p1)`` shown in
+the following code because allowing it can potentially create another type
+dependent on the parameter ``size`` in a different context (Please note that an
+external bounds annotation on a parameter may only refer to another parameter of
+the same function). On the other hand, ``typeof(p2)`` works resulting in ``int
+*__counted_by(10)``, since it doesn't depend on any other declaration.
+
+.. TODO: add a section describing constraints on external bounds annotations
+
+.. code-block:: C
+
+   void foo(int *__counted_by(size) p1, size_t size) {
+      // typeof(p1) == int *__counted_by(size)
+      // -> a compiler error as it tries to create another type
+      // dependent on `size`.
+
+      int *__counted_by(10) p2; // typeof(p2) == int *__counted_by(10)
+                                // -> no error
+
+   }
+
+When ``typeof()`` takes a type name, the compiler doesn't apply an implicit
+bounds annotation on the named pointer types. For example, ``typeof(int*)``
+returns ``int *`` without any bounds annotation. A bounds annotation may be
+added after the fact depending on the context. In the following example,
+``typeof(int *)`` returns ``int *``, so the declaration is equivalent to
+``int *l`` and the local variable eventually becomes implicitly
+``__bidi_indexable``.
+
+.. code-block:: c
+
+   void foo(void) {
+      typeof(int *) l; // `int *__bidi_indexable` (same as `int *l`)
+   }
+
+The programmers can still explicitly add a bounds annotation on the types named
+inside ``typeof``, e.g., ``typeof(int *__bidi_indexable)``, which evaluates to
+``int *__bidi_indexable``.
+
+
+Default pointer types in ``sizeof()``
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+When ``sizeof()`` takes a type name, the compiler doesn't apply an implicit
+bounds annotation on the named pointer types. This means if a bounds annotation
+is not specified, the evaluated pointer type is treated identically to a plain C
+pointer type. Therefore, ``sizeof(int*)`` remains the same with or without
+``-fbounds-safety``. That said, programmers can explicitly add an attribute to
+the types, e.g., ``sizeof(int *__bidi_indexable)``, in which case the sizeof
+evaluates to the size of type ``int *__bidi_indexable`` (the value equivalent to
+``3 * sizeof(int*)``).
+
+When ``sizeof()`` takes an expression, i.e., ``sizeof(expr)``, it behaves as
+``sizeof(typeof(expr))``, except that ``sizeof(expr)`` does not report an error
+with ``expr`` that has a type with an external bounds annotation dependent on
+another declaration, whereas ``typeof()`` on the same expression would be an
+error as described in :ref:`Default pointer types in typeof`.
+The following example describes this behavior.
+
+.. code-block:: c
+
+   void foo(int *__counted_by(size) p, size_t size) {
+      // sizeof(p) == sizeof(int *__counted_by(size)) == sizeof(int *)
+      // typeof(p): error
+   }
+
+Default pointer types in ``alignof()``
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+``alignof()`` only takes a type name as the argument and it doesn't take an
+expression. Similar to ``sizeof()`` and ``typeof``, the compiler doesn't apply
+an implicit bounds annotation on the pointer types named inside ``alignof()``.
+Therefore, ``alignof(T *)`` remains the same with or without
+``-fbounds-safety``, evaluating into the alignment of the raw pointer ``T *``.
+The programmers can explicitly add a bounds annotation to the types, e.g.,
+``alignof(int *__bidi_indexable)``, which returns the alignment of ``int
+*__bidi_indexable``. A bounds annotation including an internal bounds annotation
+(i.e., ``__indexable`` and ``__bidi_indexable``) doesn't affect the alignment of
+the original pointer. Therefore, ``alignof(int *__bidi_indexable)`` is equal to
+``alignof(int *)``.
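+
+As a rough illustration of the size and alignment statements above, the sketch
+below models a ``__bidi_indexable`` pointer as a plain struct of three pointers.
+The names ``bidi_model``, ``bidi_model_size_in_pointers`` and
+``bidi_model_keeps_alignment`` are assumptions made for this example only; the
+real representation is chosen by the compiler.
+
```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>

/* Hypothetical stand-in for `int *__bidi_indexable`: the pointer value plus
 * its upper and lower bounds. The real layout is chosen by the compiler. */
struct bidi_model {
    int *ptr;
    int *upper;
    int *lower;
};

/* Three pointer-sized fields: matches the `3 * sizeof(int *)` figure. */
static_assert(sizeof(struct bidi_model) == 3 * sizeof(int *),
              "modeled wide pointer is three plain pointers wide");

/* Size of the modeled wide pointer, measured in plain pointers. */
size_t bidi_model_size_in_pointers(void) {
    return sizeof(struct bidi_model) / sizeof(int *);
}

/* Nonzero if the model has the same alignment as a plain `int *`. */
int bidi_model_keeps_alignment(void) {
    return alignof(struct bidi_model) == alignof(int *);
}
```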
+
+
+Default pointer types used in C-style casts
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+A pointer type used in a C-style cast (e.g., ``(int *)src``) inherits the
+pointer attribute from the type of ``src``. For instance, if the type of ``src``
+is ``T *__single`` (with ``T`` being an arbitrary C type), ``(int *)src`` will
+be ``int *__single``. This way, a C-style cast doesn't introduce unexpected side
+effects through an implicit conversion of the bounds attribute.
+
+Pointer casts can have explicit bounds annotations. For instance, ``(int
+*__bidi_indexable)src`` casts to ``int *__bidi_indexable`` as long as src has a
+bounds annotation that can implicitly convert to ``__bidi_indexable``. If
+``src`` has type ``int *__single``, it can implicitly convert to ``int
+*__bidi_indexable`` which then will have the upper bound pointing to one past
+the first element. However, if src has type ``int *__unsafe_indexable``, the
+explicit cast ``(int *__bidi_indexable)src`` will cause an error because
+``__unsafe_indexable`` cannot cast to ``__bidi_indexable`` as
+``__unsafe_indexable`` doesn't have bounds information. `Cast rules`_ describes
+in more detail what kinds of casts are allowed between pointers with different
+bounds annotations.
+
+Default pointer types in typedef
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Pointer types in ``typedef``\s do not have implicit default bounds annotations.
+Instead, the bounds annotation is determined when the ``typedef`` is used. The
+following example shows that no pointer annotation is specified in the ``typedef
+pint_t`` while each instance of ``typedef``'ed pointer gets its bounds
+annotation based on the context in which the type is used.
+
+.. code-block:: c
+
+   typedef int * pint_t; // int *
+
+   pint_t glob; // int *__single glob;
+
+   void foo(void) {
+      pint_t local; // int *__bidi_indexable local;
+   }
+
+Pointer types in a ``typedef`` can still have explicit annotations, e.g.,
+``typedef int *__single``, in which case the bounds annotation ``__single`` will
+apply to every use of the ``typedef``.
+
+Array to pointer promotion to secure arrays (including VLAs)
+------------------------------------------------------------
+
+Arrays on function prototypes
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+In C, arrays on function prototypes are promoted (or "decayed") to pointers to
+their first elements (e.g., ``&arr[0]``). In ``-fbounds-safety``, arrays are
+also decayed to pointers, but with the addition of an implicit bounds
+annotation; this includes variable-length arrays (VLAs). As shown in the
+following example, arrays on function prototypes are decayed to corresponding
+``__counted_by`` pointers.
+
+.. code-block:: c
+
+   // Function prototype: void foo(int n, int *__counted_by(n) arr);
+   void foo(int n, int arr[n]);
+
+   // Function prototype: void bar(int *__counted_by(10) arr);
+   void bar(int arr[10]);
+
+This means the array parameters are treated as ``__counted_by`` pointers within
+the function and callers of the function also see them as the corresponding
+``__counted_by`` pointers.
+
+Incomplete arrays on function prototypes will cause a compiler error unless they
+have a ``__counted_by`` annotation in their brackets.
+
+.. code-block:: c
+
+   void f1(int n, int arr[]); // error
+
+   void f2(int n, int arr[n]); // ok, decays to int *__counted_by(n)
+
+   void f3(int n, int arr[__counted_by(n)]); // ok
+
+   void f4(int n, int *__counted_by(n) arr); // ok
+
+   void f5(int n, int *arr); // ok, but decays to int *__single,
+                             // and cannot be used for pointer arithmetic
+
+Array references
+^^^^^^^^^^^^^^^^
+
+In C, similar to arrays on function prototypes, a reference to an array is
+automatically promoted (or "decayed") to a pointer to its first element (e.g.,
+``&arr[0]``).
+
+In ``-fbounds-safety``, array references are promoted to ``__bidi_indexable``
+pointers which contain the upper and lower bounds of the array, with the
+equivalent of ``&arr[0]`` serving as the lower bound and ``&arr[array_size]``
+(or one past the last element) serving as the upper bound. This applies to all
+types of arrays including constant-length arrays, variable-length arrays (VLAs),
+and flexible array members annotated with ``__counted_by``.
+
+In the following example, reference to ``vla`` promotes to ``int
+*__bidi_indexable``, with ``&vla[n]`` as the upper bound and ``&vla[0]`` as the
+lower bound. Then, it's copied to ``int *p``, which is implicitly ``int
+*__bidi_indexable p``. Note that the value of ``n`` used to create the upper
+bound is ``10``, not ``100``, because ``10`` is the actual length of ``vla``,
+i.e., the value of ``n`` at the time the array is allocated.
+
+.. code-block:: c
+
+   void foo(void) {
+      int n = 10;
+      int vla[n];
+      n = 100;
+      int *p = vla; // { .ptr: &vla[0], .upper: &vla[10], .lower: &vla[0] }
+                    // it's `&vla[10]` because the value of `n` was 10 at the
+                    // time when the array is actually allocated.
+      // ...
+   }
+
+By promoting array references to ``__bidi_indexable``, all array accesses are
+bounds checked in ``-fbounds-safety``, just as ``__bidi_indexable`` pointers
+are.
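+
+The promotion described above can be approximated in plain C. The sketch below
+is only a model: ``bidi_int_ptr``, ``bidi_from_array``, ``bidi_store`` and
+``vla_demo`` are hypothetical names introduced for this example, and a rejected
+access returns ``false`` where the extension would trap.
+
```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of `int *__bidi_indexable`. */
typedef struct {
    int *ptr;
    int *lower;
    int *upper; /* one past the last element */
} bidi_int_ptr;

/* Model of array-to-pointer promotion: the bounds cover the whole array. */
bidi_int_ptr bidi_from_array(int *arr, size_t len) {
    assert(arr != NULL);
    bidi_int_ptr p = { arr, arr, arr + len };
    return p;
}

/* Model of a checked `p[i] = value`; returns false where the extension
 * would trap. */
bool bidi_store(bidi_int_ptr p, ptrdiff_t i, int value) {
    ptrdiff_t below = p.ptr - p.lower; /* elements available below ptr */
    ptrdiff_t above = p.upper - p.ptr; /* elements available from ptr on */
    if (i < -below || i >= above)
        return false;
    p.ptr[i] = value;
    return true;
}

/* Mirrors the VLA example: the bounds use the length at allocation time. */
bool vla_demo(void) {
    int n = 10;
    int vla[n];
    bidi_int_ptr p = bidi_from_array(vla, (size_t)n);
    n = 100;                          /* does not change the bounds of `p` */
    return bidi_store(p, 9, 1)        /* last element: accepted */
        && !bidi_store(p, 10, 1)      /* one past the end: rejected */
        && !bidi_store(p, n - 1, 1)   /* index 99: rejected */
        && !bidi_store(p, -1, 1);     /* below the lower bound: rejected */
}
```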
+
+Maintaining correctness of bounds annotations
+---------------------------------------------
+
+``-fbounds-safety`` maintains correctness of bounds annotations by performing
+additional checks when a pointer object and/or its related value containing the
+bounds information is updated.
+
+For example, ``__single`` expresses an invariant that the pointer must either
+point to a single valid object or be a null pointer. To maintain this invariant,
+the compiler inserts checks when initializing a ``__single`` pointer, as shown
+in the following example:
+
+.. code-block:: c
+
+   void foo(void *__sized_by(size) vp, size_t size) {
+      // Inserted check:
+      // if ((char *)upper_bound(vp) - (char *)vp < sizeof(int) && !!vp) trap();
+      int *__single ip = (int *)vp;
+   }
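+
+The inserted check can be modeled in plain C as a function that receives the
+byte count explicitly. This is a sketch under assumptions: ``to_single_int`` is
+a hypothetical helper, not part of the extension, and it returns ``false``
+where the compiler-inserted check would trap.
+
```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Model of converting `void *__sized_by(size)` to `int *__single`.
 * Returns false where the extension would trap. */
bool to_single_int(void *vp, size_t size, int **out) {
    assert(out != NULL);
    /* A null pointer satisfies the __single invariant; otherwise at least
     * one whole int must lie within the `size` bytes. */
    if (vp != NULL && size < sizeof(int))
        return false;
    *out = (int *)vp;
    return true;
}
```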
+
+Additionally, an explicit bounds annotation such as ``int *__counted_by(count)
+buf`` defines a relationship between two variables, ``buf`` and ``count``:
+namely, that ``buf`` has ``count`` number of elements available. This
+relationship must hold even after any of these related variables are updated. To
+this end, the model requires that assignments to ``buf`` and ``count`` must be
+side by side, with no side effects between them. This prevents ``buf`` and
+``count`` from temporarily falling out of sync due to updates happening at a
+distance.
+
+The example below shows a function ``alloc_buf`` that initializes a struct whose
+members use the ``__counted_by`` annotation. The compiler allows these
+assignments because ``sbuf->buf`` and ``sbuf->count`` are updated side by side
+without any side effects in between the assignments.
+
+Furthermore, the compiler inserts additional run-time checks to ensure the new
+``buf`` has at least as many elements as the new ``count`` indicates as shown in
+the transformed pseudo code of function ``alloc_buf()`` in the example below.
+
+.. code-block:: c
+
+   typedef struct {
+      int *__counted_by(count) buf;
+      size_t count;
+   } sized_buf_t;
+
+   void alloc_buf(sized_buf_t *sbuf, size_t nelems) {
+      sbuf->buf = (int *)malloc(sizeof(int) * nelems);
+      sbuf->count = nelems;
+   }
+
+   // Transformed pseudo code:
+   void alloc_buf(sized_buf_t *sbuf, size_t nelems) {
+      // Materialize RHS values:
+      int *tmp_ptr = (int *)malloc(sizeof(int) * nelems);
+      size_t tmp_count = nelems;
+      // Inserted check:
+      //   - checks to ensure that `lower <= tmp_ptr <= upper`
+      //   - if (upper(tmp_ptr) - tmp_ptr < tmp_count) trap();
+      sbuf->buf = tmp_ptr;
+      sbuf->count = tmp_count;
+   }
+
+Whether the compiler can optimize such run-time checks depends on how the upper
+bound of the pointer is derived. If the source pointer has ``__sized_by``,
+``__counted_by``, or a variant of such, the compiler assumes that the upper
+bound calculation doesn't overflow, e.g., ``ptr + size`` (where the type of
+``ptr`` is ``void *__sized_by(size)``), because when the ``__sized_by`` pointer
+is initialized, ``-fbounds-safety`` inserts run-time checks to ensure that ``ptr
++ size`` doesn't overflow and that ``size >= 0``.
+
+Assuming the upper bound calculation doesn't overflow, the compiler can simplify
+the trap condition ``upper(tmp_ptr) - tmp_ptr < tmp_count`` to ``size <
+tmp_count``. If both ``size`` and ``tmp_count`` are known at compile time and
+``0 <= tmp_count <= size``, the optimizer can remove the check.
+
+``ptr + size`` may still overflow if the ``__sized_by`` pointer is created by
+code that doesn't enable ``-fbounds-safety``; such an overflow is undefined
+behavior.
+
+In the previous code example with the transformed ``alloc_buf()``, the upper
+bound of ``tmp_ptr`` is derived from ``void *__sized_by_or_null(size)``, which
+is the return type of ``malloc()``. Hence, either the pointer arithmetic doesn't
+overflow or ``tmp_ptr`` is null. Therefore, if ``nelems`` was given as a
+compile-time constant, the compiler could remove the checks.
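+
+The paired-assignment check can be sketched in plain C by passing the bounds of
+the new pointer explicitly. Everything here is a model, not the compiler's
+output: ``assign_buf`` and its ``bytes`` parameter are hypothetical and stand in
+for the bounds carried by the ``__sized_by_or_null`` result of ``malloc()``, and
+the size-wrap guard in ``alloc_buf`` is an addition for this example.
+
```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct {
    int *buf;     /* would be `int *__counted_by(count) buf` */
    size_t count;
} sized_buf_t;

/* Model of the checked paired assignment; returns false where the
 * extension would trap. `bytes` is the size backing `tmp_ptr`. */
bool assign_buf(sized_buf_t *sbuf, int *tmp_ptr, size_t bytes,
                size_t tmp_count) {
    assert(sbuf != NULL);
    /* A null pointer has no elements available. */
    size_t available = tmp_ptr ? bytes / sizeof(int) : 0;
    if (available < tmp_count)
        return false;
    /* Both fields are updated together, with nothing in between. */
    sbuf->buf = tmp_ptr;
    sbuf->count = tmp_count;
    return true;
}

bool alloc_buf(sized_buf_t *sbuf, size_t nelems) {
    /* Guard added for this sketch: the byte count must not wrap. */
    if (nelems > SIZE_MAX / sizeof(int))
        return false;
    size_t bytes = sizeof(int) * nelems;
    int *tmp_ptr = malloc(bytes);
    if (!assign_buf(sbuf, tmp_ptr, bytes, nelems)) {
        free(tmp_ptr);
        return false;
    }
    return true;
}
```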
+
+Cast rules
+----------
+
+``-fbounds-safety`` does not enforce overall type safety, and bounds invariants
+can still be violated by incorrect casts in some cases. That said,
+``-fbounds-safety`` prevents type conversions that change bounds attributes in a
+way that violates the bounds invariant of the destination's pointer annotation.
+A type conversion that changes bounds attributes may be allowed if it does not
+violate the invariant of the destination or if the invariant can be verified at
+run time. Here are some of the important cast rules.
+
+* Two pointers that have different bounds annotations on their nested pointer
+  types are incompatible and cannot implicitly cast to each other. For example,
+  ``T *__single *__single`` cannot be converted to ``T *__bidi_indexable
+  *__single``. Such a conversion between incompatible nested bounds annotations
+  can be allowed using an explicit cast (e.g., C-style cast). Hereafter, the
+  rules only apply to the top pointer types.
+
+* ``__unsafe_indexable`` cannot be converted to any other safe pointer types
+  (``__single``, ``__bidi_indexable``, ``__counted_by``, etc.) using a cast. The
+  extension provides builtins to force this conversion:
+  ``__unsafe_forge_bidi_indexable(type, pointer, char_count)`` converts
+  ``pointer`` to a ``__bidi_indexable`` pointer of type ``type`` with
+  ``char_count`` bytes available, and ``__unsafe_forge_single(type, pointer)``
+  converts ``pointer`` to a ``__single`` pointer of type ``type``. The following
+  examples show the usage of these builtins.
+
+* Function ``example_forge_bidi()`` gets an external buffer from an unsafe
+  library by calling ``get_buf()``, which returns ``void *__unsafe_indexable``.
+  Under the type rules, this cannot be directly assigned to ``void *buf``
+  (implicitly ``void *__bidi_indexable``). Thus,
+  ``__unsafe_forge_bidi_indexable`` is used to manually create a
+  ``__bidi_indexable`` from the unsafe buffer.
+
+.. code-block:: c
+
+   // unsafe_library.h
+   void *__unsafe_indexable get_buf(void);
+   size_t get_buf_size(void);
+
+   // my_source1.c (enables -fbounds-safety)
+   #include "unsafe_library.h"
+   void example_forge_bidi(void) {
+      void *buf =
+        __unsafe_forge_bidi_indexable(void *, get_buf(), get_buf_size());
+      // ...
+   }
+
+   // my_source2.c (enables -fbounds-safety)
+   #include <stdio.h>
+   void example_forge_single(void) {
+      FILE *fp = __unsafe_forge_single(FILE *, fopen("mypath", "rb"));
+      // ...
+   }
+
+* Function ``example_forge_single`` takes a file handle by calling fopen defined
+  in system header ``stdio.h``. Assuming ``stdio.h`` did not adopt
+  ``-fbounds-safety``, the return type of ``fopen`` would implicitly be ``FILE
+  *__unsafe_indexable`` and thus it cannot be directly assigned to ``FILE *fp``
+  in the bounds-safe source. To allow this operation, ``__unsafe_forge_single``
+  is used to create a ``__single`` from the return value of ``fopen``.
+
+* Similar to ``__unsafe_indexable``, any non-pointer type (including ``int``,
+  ``intptr_t``, ``uintptr_t``, etc.) cannot be converted to any safe pointer
+  type because these don't have bounds information. ``__unsafe_forge_single`` or
+  ``__unsafe_forge_bidi_indexable`` must be used to force the conversion.
+
+* Any safe pointer types can cast to ``__unsafe_indexable`` because it doesn't
+  have any invariant to maintain.
+
+* ``__single`` casts to ``__bidi_indexable`` if the pointee type has a known
+  size. After the conversion, the resulting ``__bidi_indexable`` has the size of
+  a single object of the pointee type of ``__single``. ``__single`` cannot cast
+  to ``__bidi_indexable`` if the pointee type is incomplete or sizeless. For
+  example, ``void *__single`` cannot convert to ``void *__bidi_indexable``
+  because void is an incomplete type and thus the compiler cannot correctly
+  determine the upper bound of a single void pointer.
+
+* Similarly, ``__single`` can cast to ``__indexable`` if the pointee type has a
+  known size. The resulting ``__indexable`` has the size of a single object of
+  the pointee type.
+
+* ``__single`` casts to ``__counted_by(E)`` only if ``E`` is 0 or 1.
+
+* ``__single`` can cast to ``__single`` including when they have different
+  pointee types as long as it is allowed in the underlying C standard.
+  ``-fbounds-safety`` doesn't guarantee type safety.
+
+* ``__bidi_indexable`` and ``__indexable`` can cast to ``__single``. The
+  compiler may insert run-time checks to ensure the pointer has at least a
+  single element or is a null pointer.
+
+* ``__bidi_indexable`` casts to ``__indexable`` if the pointer does not have an
+  underflow. The compiler may insert run-time checks to ensure the pointer is
+  not below the lower bound.
+
+* ``__indexable`` casts to ``__bidi_indexable``. The lower bound of the
+  resulting ``__bidi_indexable`` is set to the pointer value.
+
+* A type conversion may involve both a bitcast and a bounds annotation cast. For
+  example, casting from ``int *__bidi_indexable`` to ``char *__single`` involves
+  a bitcast (``int *`` to ``char *``) and a bounds annotation cast
+  (``__bidi_indexable`` to ``__single``). In this case, the compiler performs
+  the bitcast and then converts the bounds annotation. This means, ``int
+  *__bidi_indexable`` will be converted to ``char *__bidi_indexable`` and then
+  to ``char *__single``.
+
+* ``__terminated_by(T)`` cannot cast to any safe pointer type without the same
+  ``__terminated_by(T)`` attribute. To perform the cast, programmers can use an
+  intrinsic function such as ``__unsafe_terminated_by_to_indexable(P)`` to force
+  the conversion.
+
+* ``__terminated_by(T)`` can cast to ``__unsafe_indexable``.
+
+* Any type without ``__terminated_by(T)`` cannot cast to ``__terminated_by(T)``
+  without explicitly using an intrinsic function to allow it.
+
+  + ``__unsafe_terminated_by_from_indexable(T, PTR [, PTR_TO_TERM])`` casts any
+    safe pointer PTR to a ``__terminated_by(T)`` pointer. ``PTR_TO_TERM`` is an
+    optional argument where the programmer can provide the exact location of the
+    terminator. With this argument, the function can skip reading the entire
+    array in order to locate the end of the pointer (or the upper bound).
+    Providing an incorrect ``PTR_TO_TERM`` causes a run-time trap.
+
+  + ``__unsafe_forge_terminated_by(T, P, E)`` creates a ``T __terminated_by(E)``
+    pointer given any pointer ``P``. ``T`` must be a pointer type.
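+
+Several of the conversions listed above can be illustrated with a small plain-C
+model. The struct and function names below (``bidi_ptr``, ``indexable_ptr``,
+``single_to_bidi`` and so on) are hypothetical and exist only for this sketch; a
+``false`` return marks a case where the extension may insert a run-time check
+that traps.
+
```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical models of the two wide pointer kinds. */
typedef struct { int *ptr; int *lower; int *upper; } bidi_ptr;
typedef struct { int *ptr; int *upper; } indexable_ptr;

/* __single -> __bidi_indexable: the bounds cover exactly one element. */
bidi_ptr single_to_bidi(int *p) {
    bidi_ptr r = { p, p, p ? p + 1 : p };
    return r;
}

/* __indexable -> __bidi_indexable: the lower bound is the pointer value. */
bidi_ptr indexable_to_bidi(indexable_ptr p) {
    bidi_ptr r = { p.ptr, p.ptr, p.upper };
    return r;
}

/* __bidi_indexable -> __indexable: rejected if ptr is below the lower bound. */
bool bidi_to_indexable(bidi_ptr p, indexable_ptr *out) {
    assert(out != NULL);
    if (p.ptr < p.lower)
        return false;
    out->ptr = p.ptr;
    out->upper = p.upper;
    return true;
}

/* __bidi_indexable -> __single: null, or at least one element in bounds. */
bool bidi_to_single(bidi_ptr p, int **out) {
    assert(out != NULL);
    if (p.ptr != NULL && (p.ptr < p.lower || p.upper - p.ptr < 1))
        return false;
    *out = p.ptr;
    return true;
}
```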
+
+Portability with toolchains that do not support the extension
+-------------------------------------------------------------
+
+The language model is designed so that it doesn't alter the semantics of the
+original C program, other than introducing deterministic traps where otherwise
+the behavior is undefined and/or unsafe. Clang provides a toolchain header
+(``ptrcheck.h``) that macro-defines the annotations as type attributes when
+``-fbounds-safety`` is enabled and defines them to empty when the extension is
+disabled. Thus, the code adopting ``-fbounds-safety`` can compile with
+toolchains that do not support this extension, by including the header or adding
+macros to define the annotations to empty. For example, the toolchain not
+supporting this extension may not have a header defining ``__counted_by``, so
+the code using ``__counted_by`` must define it as nothing or include a header
+that has the define.
+
+.. code-block:: c
+
+   #if defined(__has_feature) && __has_feature(bounds_safety)
+   #define __counted_by(T) __attribute__((__counted_by__(T)))
+   // ... other bounds annotations
+   #else
+   #define __counted_by(T) // defined as nothing
+   // ... other bounds annotations
+   #endif
+
+   // expands to `void foo(int * ptr, size_t count);`
+   // when extension is not enabled or not available
+   void foo(int *__counted_by(count) ptr, size_t count);
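+
+One possible way to write the fallback so that it also preprocesses on
+compilers that lack ``__has_feature`` is to nest the feature test, as sketched
+below. The macro name ``MY_COUNTED_BY`` and the function ``sum`` are
+hypothetical and are used here only to avoid clashing with any definition a
+system header might already provide.
+
```c
#include <assert.h>
#include <stddef.h>

/* Probe the feature without evaluating __has_feature(...) on compilers
 * that do not define it. */
#ifndef MY_COUNTED_BY
#  ifdef __has_feature
#    if __has_feature(bounds_safety)
#      define MY_COUNTED_BY(N) __attribute__((__counted_by__(N)))
#    endif
#  endif
#endif

/* Fallback: the annotation expands to nothing. */
#ifndef MY_COUNTED_BY
#  define MY_COUNTED_BY(N)
#endif

/* Without the extension this is `long sum(const int *ptr, size_t count)`. */
long sum(const int *MY_COUNTED_BY(count) ptr, size_t count) {
    assert(ptr != NULL || count == 0);
    long total = 0;
    for (size_t i = 0; i < count; ++i)
        total += ptr[i];
    return total;
}
```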
+
+Other potential applications of bounds annotations
+==================================================
+
+The bounds annotations provided by the ``-fbounds-safety`` programming model
+have potential use cases beyond the language extension itself. For example,
+static and dynamic analysis tools could use the bounds information to improve
+diagnostics for out-of-bounds accesses, even if ``-fbounds-safety`` is not used.
+The bounds annotations could be used to improve C interoperability with
+bounds-safe languages, providing a better mapping to bounds-safe types in the
+safe language interface. The bounds annotations can also serve as documentation
+specifying the relationship between declarations.
+
+Limitations
+===========
+
+``-fbounds-safety`` aims to bring the bounds safety guarantee to the C language,
+and it does not guarantee other types of memory safety properties. Consequently,
+it may not prevent some of the secondary bounds safety violations caused by
+other types of safety violations such as type confusion. For instance,
+``-fbounds-safety`` does not perform type-safety checks on conversions between
+``__single`` pointers of different pointee types (e.g., ``char *__single`` →
+``void *__single`` → ``int *__single``) beyond what the foundation languages
+(C/C++) already offer.
+
+``-fbounds-safety`` relies heavily on run-time checks to maintain bounds safety
+and the soundness of the type system. This may incur significant code size
+overhead in unoptimized builds and leave some adoption mistakes to be caught
+only at run time. This is not a fundamental limitation, however, because
+incrementally adding necessary static analysis will allow us to catch issues
+early on and remove unnecessary bounds checks in unoptimized builds.
\ No newline at end of file

diff  --git a/clang/docs/BoundsSafetyImplPlans.rst b/clang/docs/BoundsSafetyImplPlans.rst
new file mode 100644
index 000000000000000..4fbf87f96635079
--- /dev/null
+++ b/clang/docs/BoundsSafetyImplPlans.rst
@@ -0,0 +1,255 @@
+============================================
+Implementation plans for ``-fbounds-safety``
+============================================
+
+.. contents::
+   :local:
+
+External bounds annotations
+===========================
+
+The bounds annotations are C type attributes appertaining to pointer types. If
+an attribute is added to the position of a declaration attribute, e.g., ``int
+*ptr __counted_by(size)``, the attribute appertains to the outermost pointer
+type of the declaration (``int *``).
+
+New sugar types
+===============
+
+An external bounds annotation creates a type sugar of the underlying pointer
+types. We will introduce a new sugar type, ``DynamicBoundsPointerType`` to
+represent ``__counted_by`` or ``__sized_by``. Using ``AttributedType`` would not
+be sufficient because the type needs to hold the count or size expression as
+well as some metadata necessary for analysis, while this type may be implemented
+through inheritance from ``AttributedType``. Treating the annotations as type
+sugars means two types with incompatible external bounds annotations may be
+considered canonically the same types. This is sometimes necessary, for example,
+to make the ``__counted_by`` and friends not participate in function
+overloading. However, this design requires a separate logic to walk through the
+entire type hierarchy to check type compatibility of bounds annotations.
+
+Late parsing for C
+==================
+
+A bounds annotation such as ``__counted_by(count)`` can be added to the type of
+a struct field declaration where ``count`` is another field of the same struct
+declared later. Similarly, the annotation may apply to the type of a function
+parameter declaration that precedes the parameter ``count`` in the same function.
+This means parsing the argument of bounds annotations must be done after the
+parser has the whole context of a struct or a function declaration. Clang has
+late parsing logic for C++ declaration attributes that require late parsing,
+while the C declaration attributes and C/C++ type attributes do not have the
+same logic. This requires introducing late parsing logic for C/C++ type
+attributes.
+
+Internal bounds annotations
+===========================
+
+``__indexable`` and ``__bidi_indexable`` alter pointer representations to be
+equivalent to a struct with the pointer and the corresponding bounds fields.
+Despite this difference in their representations, they are still pointers in
+terms of types of operations that are allowed and their semantics. For instance,
+a pointer dereference on a ``__bidi_indexable`` pointer will return the
+dereferenced value same as plain C pointers, modulo the extra bounds checks
+being performed before dereferencing the wide pointer. This means mapping the
+wide pointers to struct types with equivalent layout won’t be sufficient. To
+represent the wide pointers in Clang AST, we add an extra field in the
+PointerType class to indicate the internal bounds of the pointer. This ensures
+pointers of different representations are mapped to different canonical types
+while they are still treated as pointers.
+
+In LLVM IR, wide pointers will be emitted as structs of equivalent
+representations. Clang CodeGen will handle them as Aggregate in
+``TypeEvaluationKind (TEK)``. ``AggExprEmitter`` was extended to handle pointer
+operations returning wide pointers. Alternatively, a new ``TEK`` and an
+expression emitter dedicated to wide pointers could be introduced.
+
+Default bounds annotations
+==========================
+
+The model may implicitly add ``__bidi_indexable`` or ``__single`` depending on
+the context of the declaration that has the pointer type. ``__bidi_indexable``
+is implicitly added to local variables, while ``__single`` is implicitly added
+to pointer types specifying struct fields, function parameters, or global
+variables. This means the parser may first create the pointer type without any
+default pointer attribute and then recreate the type once the parser has the
+declaration context and determined the default attribute accordingly.
+
+This also requires the parser to reset the type of the declaration with the
+newly created type with the right default attribute.
+
+Promotion expression
+====================
+
+A new expression will be introduced to represent the conversion from a pointer
+with an external bounds annotation, such as ``__counted_by``, to
+``__bidi_indexable``. This type of conversion cannot be handled by normal
+CastExprs because it requires an extra subexpression(s) to provide the bounds
+information necessary to create a wide pointer.
+
+Bounds check expression
+=======================
+
+Bounds checks are part of semantics defined in the ``-fbounds-safety`` language
+model. Hence, exposing the bounds checks and other semantic actions in the AST
+is desirable. A new expression for bounds checks has been added to the AST. The
+bounds check expression has a ``BoundsCheckKind`` to indicate the kind of checks
+and has the additional sub-expressions that are necessary to perform the check
+according to the kind.
+
+Paired assignment check
+=======================
+
+``-fbounds-safety`` enforces that variables or fields related with the same
+external bounds annotation (e.g., ``buf`` and ``count`` related with
+``__counted_by`` in the example below) must be updated side by side within the
+same basic block and without side effect in between.
+
+.. code-block:: c
+
+   typedef struct {
+      int *__counted_by(count) buf;
+      size_t count;
+   } sized_buf_t;
+
+   void alloc_buf(sized_buf_t *sbuf, size_t nelems) {
+      sbuf->buf = (int *)malloc(sizeof(int) * nelems);
+      sbuf->count = nelems;
+   }
+
+To implement this rule, the compiler requires a linear representation of
+statements to understand the ordering and the adjacency between the two or more
+assignments. The Clang CFG is used to implement this analysis as Clang CFG
+provides a linear view of statements within each ``CFGBlock`` (Clang
+``CFGBlock`` represents a single basic block in a source-level CFG).
+
+Bounds check optimizations
+==========================
+
+In ``-fbounds-safety``, the Clang frontend emits run-time checks for every
+memory dereference if the type system or analyses in the frontend couldn’t
+verify its bounds safety. The implementation relies on LLVM optimizations to
+remove redundant run-time checks. With this optimization strategy, the more
+bounds checks the original source code already has, the fewer additional checks
+``-fbounds-safety`` will introduce. The LLVM ``ConstraintElimination`` pass is
+designed to remove provably redundant checks (please check Florian Hahn’s
+presentation in 2021 LLVM Dev Meeting and the implementation to learn more). In
+the following example, ``-fbounds-safety`` implicitly adds the redundant bounds
+checks that the optimizer can remove:
+
+.. code-block:: c
+
+   void fill_array_with_indices(int *__counted_by(count) p, size_t count) {
+      for (size_t i = 0; i < count; ++i) {
+         // implicit bounds checks:
+         //   if (p + i < p || p + i + 1 > p + count) trap();
+         p[i] = i;
+      }
+   }
+
+``ConstraintElimination`` collects the following facts and determines if the
+bounds checks can be safely removed:
+
+* Inside the for-loop, ``0 <= i < count``, hence ``1 <= i + 1 <= count``.
+* Pointer arithmetic ``p + count`` in the if-condition doesn’t wrap.
+* ``-fbounds-safety`` treats pointer arithmetic overflow as deterministic
+  two’s complement computation, not undefined behavior. Therefore,
+  ``getelementptr`` does not typically have the ``inbounds`` keyword. However,
+  the compiler does emit ``inbounds`` for ``p + count`` in this case because
+  ``__counted_by(count)`` has the invariant that ``p`` has at least as many
+  elements as ``count``. Using this information, ``ConstraintElimination`` is
+  able to determine ``p + count`` doesn’t wrap.
+* Accordingly, ``p + i`` and ``p + i + 1`` also don’t wrap.
+* Therefore, ``p <= p + i`` and ``p + i + 1 <= p + count``.
+* The if-condition simplifies to false and becomes dead code that the subsequent
+  optimization passes can remove.
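+
+To make the reasoning above concrete, the sketch below writes the implicit check
+out by hand in plain C. The function name ``fill_array_with_indices_checked``
+and its return value (the number of times the check fired) are assumptions for
+this example. Note that in plain C a compiler may already fold this check using
+the language's undefined-behavior rules for pointer arithmetic, which differs
+from the wrapping semantics described above.
+
```c
#include <assert.h>
#include <stddef.h>

/* Returns how many times the explicit bounds check fired, i.e. how many
 * times the extension would have trapped. */
size_t fill_array_with_indices_checked(int *p, size_t count) {
    assert(p != NULL || count == 0);
    size_t failed = 0;
    for (size_t i = 0; i < count; ++i) {
        /* The implicit check, spelled out. Inside the loop 0 <= i < count,
         * so p <= p + i and p + i + 1 <= p + count always hold. */
        if (p + i < p || p + i + 1 > p + count) {
            ++failed;
            continue;
        }
        p[i] = (int)i;
    }
    return failed;
}
```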
+
+``OptRemarks`` can be utilized to provide insights into performance tuning. It
+has the capability to report on checks that it cannot eliminate, possibly with
+reasons, allowing programmers to adjust their code to unlock further
+optimizations.
+
+Debugging
+=========
+
+Internal bounds annotations
+---------------------------
+
+Internal bounds annotations change a pointer into a wide pointer. The debugger
+needs to understand that wide pointers are essentially pointers with a struct
+layout. To handle this, a wide pointer is described as a record type in the
+debug info. The type name has a special name prefix (e.g.,
+``__bounds_safety$bidi_indexable``) which can be recognized by a debug info
+consumer to provide support that goes beyond showing the internal structure of
+the wide pointer. There are no DWARF extensions needed to support wide pointers.
+In our implementation, LLDB recognizes wide pointer types by name and
+reconstructs them as wide pointer Clang AST types for use in the expression
+evaluator.
+
+External bounds annotations
+---------------------------
+
+Similar to internal bounds annotations, external bounds annotations are described
+as a typedef to their underlying pointer type in the debug info, and the bounds
+are encoded as strings in the typedef’s name (e.g.,
+``__bounds_safety$counted_by:N``).
+
+Recognizing ``-fbounds-safety`` traps
+-------------------------------------
+
+Clang emits debug info for ``-fbounds-safety`` traps as inlined functions, where
+the function name encodes the error message. LLDB implements a frame recognizer
+to surface a human-readable error cause to the end user. A debug info consumer
+that is unaware of this sees an inlined function whose name encodes an error
+message (e.g., ``__bounds_safety$Bounds check failed``).
+
+Expression Parsing
+------------------
+
+In our implementation, LLDB’s expression evaluator does not enable the
+``-fbounds-safety`` language option because it’s currently unable to fully
+reconstruct the pointers with external bounds annotations, and also because the
+evaluator operates in C++ mode, utilizing C++ reference types, while
+``-fbounds-safety`` does not currently support C++. This means LLDB’s expression
+evaluator can only evaluate a subset of the ``-fbounds-safety`` language model.
+Specifically, it’s capable of evaluating the wide pointers that already exist in
+the source code. All other expressions are evaluated according to C/C++
+semantics.
+
+C++ support
+===========
+
+C++ has multiple options to write code in a bounds-safe manner, such as
+following the bounds-safety core guidelines and/or using hardened libc++ along
+with the `C++ Safe Buffer model
+<https://discourse.llvm.org/t/rfc-c-buffer-hardening/65734>`_. However, these
+techniques may require ABI changes and may not be applicable to code
+interoperating with C. When the ABI of an existing program needs to be preserved
+and for headers shared between C and C++, ``-fbounds-safety`` offers a potential
+solution.
+
+``-fbounds-safety`` is not currently supported in C++, but we believe the
+general approach would be applicable for future efforts.
+
+Upstreaming plan
+================
+
+Gradual updates with experimental flag
+--------------------------------------
+
+The upstreaming will take place as a series of smaller PRs and we will guard our
+implementation with an experimental flag ``-fexperimental-bounds-safety`` until
+the usable model is fully upstreamed. Once the model is ready for use, we will
+expose the flag ``-fbounds-safety``.
+
+Possible patch sets
+-------------------
+
+* External bounds annotations and the (late) parsing logic.
+* Internal bounds annotations (wide pointers) and their parsing logic.
+* Clang code generation for wide pointers with debug information.
+* Pointer cast semantics involving bounds annotations (this could be divided
+  into multiple sub-PRs).
+* CFG analysis for pairs of related pointer and count assignments and the like.
+* Bounds check expressions in AST and the Clang code generation (this could also
+  be divided into multiple sub-PRs).
+

diff  --git a/clang/docs/index.rst b/clang/docs/index.rst
index 5453a19564b873c..a35a867b96bd7ee 100644
--- a/clang/docs/index.rst
+++ b/clang/docs/index.rst
@@ -35,6 +35,8 @@ Using Clang as a Compiler
    SanitizerCoverage
    SanitizerStats
    SanitizerSpecialCaseList
+   BoundsSafety
+   BoundsSafetyImplPlans
    ControlFlowIntegrity
    LTOVisibility
    SafeStack


        

