[clang] [BoundsSafety] Initial documentation for -fbounds-safety (PR #70749)

Adrian Prantl via cfe-commits cfe-commits at lists.llvm.org
Thu Nov 30 12:54:18 PST 2023


================
@@ -0,0 +1,255 @@
+============================================
+Implementation plans for ``-fbounds-safety``
+============================================
+
+.. contents::
+   :local:
+
+External bounds annotations
+===========================
+
+The bounds annotations are C type attributes appertaining to pointer types. If
+an attribute is added to the position of a declaration attribute, e.g., ``int
+*ptr __counted_by(size)``, the attribute appertains to the outermost pointer
+type of the declaration (``int *``).
+
+New sugar types
+===============
+
+An external bounds annotation creates a type sugar of the underlying pointer
+types. We will introduce a new sugar type, ``DynamicBoundsPointerType`` to
+represent ``__counted_by`` or ``__sized_by``. Using ``AttributedType`` would not
+be sufficient because the type needs to hold the count or size expression as
+well as some metadata necessary for analysis, while this type may be implemented
+through inheritance from ``AttributedType``. Treating the annotations as type
+sugars means two types with incompatible external bounds annotations may be
+considered canonically the same types. This is sometimes necessary, for example,
+to make the ``__counted_by`` and friends not participate in function
+overloading. However, this design requires a separate logic to walk through the
+entire type hierarchy to check type compatibility of bounds annotations.
+
+Late parsing for C
+==================
+
+A bounds annotation such as ``__counted_by(count)`` can be added to type of a
+struct field declaration where count is another field of the same struct
+declared later. Similarly, the annotation may apply to type of a function
+parameter declaration which precedes the parameter count in the same function.
+This means parsing the argument of bounds annotations must be done after the
+parser has the whole context of a struct or a function declaration. Clang has
+late parsing logic for C++ declaration attributes that require late parsing,
+while the C declaration attributes and C/C++ type attributes do not have the
+same logic. This requires introducing late parsing logic for C/C++ type
+attributes.
+
+Internal bounds annotations
+===========================
+
+``__indexable`` and ``__bidi_indexable`` alter pointer representations to be
+equivalent to a struct with the pointer and the corresponding bounds fields.
+Despite this difference in their representations, they are still pointers in
+terms of types of operations that are allowed and their semantics. For instance,
+a pointer dereference on a ``__bidi_indexable`` pointer will return the
+dereferenced value same as plain C pointers, modulo the extra bounds checks
+being performed before dereferencing the wide pointer. This means mapping the
+wide pointers to struct types with equivalent layout won’t be sufficient. To
+represent the wide pointers in Clang AST, we add an extra field in the
+PointerType class to indicate the internal bounds of the pointer. This ensures
+pointers of different representations are mapped to different canonical types
+while they are still treated as pointers.
+
+In LLVM IR, wide pointers will be emitted as structs of equivalent
+representations. Clang CodeGen will handle them as Aggregate in
+``TypeEvaluationKind (TEK)``. ``AggExprEmitter`` was extended to handle pointer
+operations returning wide pointers. Alternatively, a new ``TEK`` and an
+expression emitter dedicated to wide pointers could be introduced.
+
+Default bounds annotations
+==========================
+
+The model may implicitly add ``__bidi_indexable`` or ``__single`` depending on
+the context of the declaration that has the pointer type. ``__bidi_indexable``
+implicitly adds to local variables, while ``__single`` implicitly adds to
+pointer types specifying struct fields, function parameters, or global
+variables. This means the parser may first create the pointer type without any
+default pointer attribute and then recreate the type once the parser has the
+declaration context and determined the default attribute accordingly.
+
+This also requires the parser to reset the type of the declaration with the
+newly created type with the right default attribute.
+
+Promotion expression
+====================
+
+A new expression will be introduced to represent the conversion from a pointer
+with an external bounds annotation, such as ``__counted_by``, to
+``__bidi_indexable``. This type of conversion cannot be handled by normal
+CastExprs because it requires an extra subexpression(s) to provide the bounds
+information necessary to create a wide pointer.
+
+Bounds check expression
+=======================
+
+Bounds checks are part of semantics defined in the ``-fbounds-safety`` language
+model. Hence, exposing the bounds checks and other semantic actions in the AST
+is desirable. A new expression for bounds checks has been added to the AST. The
+bounds check expression has a ``BoundsCheckKind`` to indicate the kind of checks
+and has the additional sub-expressions that are necessary to perform the check
+according to the kind.
+
+Paired assignment check
+=======================
+
+``-fbounds-safety`` enforces that variables or fields related with the same
+external bounds annotation (e.g., ``buf`` and ``count`` related with
+``__counted_by`` in the example below) must be updated side by side within the
+same basic block and without side effect in between.
+
+.. code-block:: c
+
+   typedef struct {
+      int *__counted_by(count) buf; size_t count;
+   } sized_buf_t;
+
+   void alloc_buf(sized_buf_t *sbuf, sized_t nelems) {
+      sbuf->buf = (int *)malloc(sizeof(int) * nelems);
+      sbuf->count = nelems;
+   }
+
+To implement this rule, the compiler requires a linear representation of
+statements to understand the ordering and the adjacency between the two or more
+assignments. The Clang CFG is used to implement this analysis as Clang CFG
+provides a linear view of statements within each ``CFGBlock`` (Clang
+``CFGBlock`` represents a single basic block in a source-level CFG).
+
+Bounds check optimizations
+==========================
+
+In ``-fbounds-safety``, the Clang frontend emits run-time checks for every
+memory dereference if the type system or analyses in the frontend couldn’t
+verify its bounds safety. The implementation relies on LLVM optimizations to
+remove redundant run-time checks. Using this optimization strategy, if the
+original source code already has bounds checks, the fewer additional checks
+``-fbounds-safety`` will introduce. The LLVM ``ConstraintElimination`` pass is
+design to remove provable redundant checks (please check Florian Hahn’s
+presentation in 2021 LLVM Dev Meeting and the implementation to learn more). In
+the following example, ``-fbounds-safety`` implicitly adds the redundant bounds
+checks that the optimizer can remove:
+
+.. code-block:: c
+
+   void fill_array_with_indices(int *__counted_by(count) p, size_t count) {
+      for (size_t i = 0; i < count; ++i) {
+         // implicit bounds checks:
+         //   if (p + i < p || p + i + 1 > p + count) trap();
+         p[i] = i;
+      }
+   }
+
+``ConstraintElimination`` collects the following facts and determines if the
+bounds checks can be safely removed:
+
+* Inside the for-loop, ``0 <= i < count``, hence ``1 <= i + 1 <= count``.
+* Pointer arithmetic ``p + count`` in the if-condition doesn’t wrap.
+* ``-fbounds-safety`` treats pointer arithmetic overflow as deterministically
+  two’s complement computation, not an undefined behavior. Therefore,
+  getelementptr does not typically have inbounds keyword. However, the compiler
+  does emit inbounds for ``p + count`` in this case because
+  ``__counted_by(count)`` has the invariant that p has at least as many as
+  elements as count. Using this information, ``ConstraintElimination`` is able
+  to determine ``p + count`` doesn’t wrap.
+* Accordingly, ``p + i`` and ``p + i + 1`` also don’t wrap.
+* Therefore, ``p <= p + i`` and ``p + i + 1 <= p + count``.
+* The if-condition simplifies to false and becomes dead code that the subsequent
+  optimization passes can remove.
+
+``OptRemarks`` can be utilized to provide insights into performance tuning. It
+has the capability to report on checks that it cannot eliminate, possibly with
+reasons, allowing programmers to adjust their code to unlock further
+optimizations.
+
+Debugging
+=========
+
+Internal bounds annotations
+---------------------------
+
+Internal bounds annotations change a pointer into a wide pointer. The debugger
+needs to understand that wide pointers are essentially pointers with a struct
+layout. To handle this, a wide pointer is described as a record type in the
+debug info. The type name has a special name prefix (e.g.,
+``__bounds_safety$bidi_indexable``) which can be recognized by a debug info
+consumer to provide support that goes beyond showing the internal structure of
+the wide pointer. There are no DWARF extensions needed to support wide pointers.
+In our implementation, LLDB recognized wide pointer types by name and
----------------
adrian-prantl wrote:

```suggestion
In our implementation, LLDB recognizes wide pointer types by name and
```

https://github.com/llvm/llvm-project/pull/70749


More information about the cfe-commits mailing list