[clang] [llvm] [Clang][IR] add TBAA metadata on pointer, union and array types. (PR #75177)

John McCall via cfe-commits cfe-commits at lists.llvm.org
Wed Dec 20 14:16:00 PST 2023


https://github.com/rjmccall commented:

In general, this patch needs to be clearer about what rules it's actually enforcing.  You're adding new command-line options, but users have to guess what they mean!

If you're going to be working on TBAA, would you mind adding a section to Clang's manual (`UsersManual.rst`) about type-based alias analysis?  We should start by documenting our current behavior, then document the behavior of all these new options.  To make this a less onerous request, let me suggest some starter text:

```
C and C++ forbid programmers from accessing objects using l-values that don't match the type of the object.  By default, Clang takes advantage of these rules to decide that certain pointers cannot point to the same object; this is called *strict aliasing* or *type-based alias analysis* (TBAA).  This can be completely disabled using the option ``-fno-strict-aliasing``.  ``-fno-strict-aliasing`` is the default for ``clang-cl``.

When strict aliasing is enabled, Clang uses the type-based aliasing rules from the appropriate standard for the current language mode.  In the C standard, the aliasing rules are laid out in section 6.5 (Expressions).  In the C++ standard, the aliasing rules are laid out in [basic.lval].  For the most part, the C and C++ rules coincide and can be summarized as follows:

- An object can be accessed through an l-value of character type (e.g. ``char``).
- An object of integer type can be accessed through an l-value of the corresponding integer type of different signedness; e.g. a ``signed short`` object can be accessed through an ``unsigned short`` l-value.
- Otherwise, objects can only be accessed through l-values of the type of the object.

For the exact rules, please consult the standards.  Clang generally reserves the flexibility to take advantage of the exact rules for the current language mode, except as noted here:

- While C gives all character types the power to arbitrarily alias, C++ reserves this to ``char`` and ``unsigned char``.  Clang relaxes this rule in C++ to match the C rule.

There are several ways to load from or store to an object as if it had a different type without violating the strict aliasing rule.  The most explicit and portable is to ``memcpy`` between the object and an object of the desired type; for aliasing purposes, ``memcpy`` behaves as if it used loads and stores of character type.  Clang also supports ``__attribute__((may_alias))``, which can be placed on a type declaration (such as a ``struct`` or ``typedef``) to give that type the equivalent aliasing power of a character type.

Clang uses an implementation model in which "sufficiently obvious" aliasing should override type-based assumptions.  Strict aliasing means that Clang will assume that ``int*`` and ``float*`` parameters to a function do not alias, and it may reorder loads and stores to those parameters accordingly.  However, if a ``float*`` parameter to a function is cast to ``int*``, Clang will understand that the result of the cast still aliases the original parameter, and it should not reorder loads and stores to those pointers.  This is only a best-effort attempt to avoid miscompiles, and programmers should generally still aim to write code which does not violate the strict aliasing rules, as discussed above.

An access to a member of an aggregate type (such as a ``struct``) is considered to also be an access to the aggregate.  This means that there must also be an object of the aggregate type at that location, and it means that accesses into different aggregates cannot alias.  This rule can be weakened to only consider the final accessed type using ``-fno-struct-path-tbaa``.

<document your new options here>
```
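
If it helps to have something concrete next to that text, here is a minimal C sketch of the cases the starter text describes: an `int*`/`float*` parameter pair that strict aliasing lets the compiler treat as distinct, and the two sanctioned ways to reinterpret a `float` (via `memcpy` and via a `may_alias` typedef). The function and typedef names are hypothetical, chosen only for illustration, and the sketch assumes `float` is 32 bits wide.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumption for this sketch: float and uint32_t have the same size. */
static_assert(sizeof(float) == sizeof(uint32_t), "sketch assumes a 32-bit float");

/* Under strict aliasing the compiler may assume *i and *f are different
   objects, so it is allowed to return 1 without reloading *i after the
   store through f. Callers must not pass pointers to the same storage. */
int store_both(int *i, float *f) {
    *i = 1;
    *f = 2.0f;
    return *i;
}

/* Portable reinterpretation: for aliasing purposes memcpy behaves as if it
   used loads and stores of character type. */
uint32_t float_bits_memcpy(float value) {
    uint32_t bits;
    memcpy(&bits, &value, sizeof bits);
    return bits;
}

/* Hypothetical typedef name. may_alias gives the type the aliasing power
   of a character type, so the load below does not violate strict aliasing. */
typedef uint32_t __attribute__((may_alias)) aliasing_u32;

uint32_t float_bits_may_alias(const float *value) {
    return *(const aliasing_u32 *)value;
}
```

Both `float_bits_*` helpers should yield the same bit pattern for the same input. The `memcpy` form is the more portable of the two, since `may_alias` is a compiler extension.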

https://github.com/llvm/llvm-project/pull/75177

