[clang] [llvm] [Clang][IR] add TBAA metadata on pointer, union and array types. (PR #75177)
John McCall via cfe-commits
cfe-commits at lists.llvm.org
Thu Dec 28 22:23:16 PST 2023
================
@@ -184,13 +205,59 @@ llvm::MDNode *CodeGenTBAA::getTypeInfoHelper(const Type *Ty) {
return getChar();
// Handle pointers and references.
- // TODO: Implement C++'s type "similarity" and consider dis-"similar"
- // pointers distinct.
- if (Ty->isPointerType() || Ty->isReferenceType())
- return createScalarTypeNode("any pointer", getChar(), Size);
+ //
+ // In C11 for two pointer type to alias it is required for them to be
+ // compatible [section 6.5 p7].
----------------
rjmccall wrote:
This comment is pretty good, but it'd be better if we had a good comment on the function itself, and then this could be specifically in the context of that. Please add something like this as a doc comment on the entire function:
```
/// Return an LLVM TBAA metadata node appropriate for an access through
/// an l-value of the given type. Type-based alias analysis takes advantage
/// of the following rules from the language standards:
///
/// C 6.5p7:
/// An object shall have its stored value accessed only by an lvalue
/// expression that has one of the following types:
/// - a type compatible with the effective type of the object,
/// - a qualified version of a type compatible with the effective
/// type of the object,
/// - a type that is the signed or unsigned type corresponding
/// to the effective type of the object,
/// - a type that is the signed or unsigned type corresponding
/// to a qualified version of the effective type of the object,
/// - an aggregate or union type that includes one of the
/// aforementioned types among its members (including,
/// recursively, a member of a subaggregate or contained union), or
/// - a character type.
///
/// C++ [basic.lval]p11:
/// If a program attempts to access the stored value of an object
/// through a glvalue whose type is not similar to one of the following
/// types the behavior is undefined:
/// - the dynamic type of the object,
/// - a type that is the signed or unsigned type corresponding
/// to the dynamic type of the object, or
/// - a char, unsigned char, or std::byte type.
///
/// The C and C++ rules about effective/dynamic type are broadly similar
/// and permit memory to be reused with a different type. C does not have
/// an explicit operation to change the effective type of memory; any store
/// can do it. While C++ arguably does have such an operation (the standard
/// global `operator new(size_t, void*)`), in practice it is important to
/// be just as permissive as C. We therefore treat all stores as being able to
/// change the effective type of memory, regardless of language mode. That is,
/// loads have both a precondition and a postcondition on the effective
/// type of the memory, but stores only have a postcondition. This imposes
/// an inherent limitation that TBAA can only be used to reorder loads
/// before stores. This is quite restrictive, but we don't have much of a
/// choice. In practice, hoisting loads is the most important optimization
/// for alias analysis to enable anyway.
///
/// Therefore, given a load (and its precondition) and an earlier store
/// (and its postcondition), the question posed to TBAA is whether there
/// exists a type that is consistent with both accesses. If there isn't,
/// it's fine to hoist the load because either the memory is non-overlapping
/// or the precondition on the load is wrong (which would be UB).
///
/// LLVM TBAA says that two accesses with TBAA metadata nodes may alias if:
/// - the metadata nodes are the same,
/// - one of the metadata nodes is a base of the other (this can be
/// recursive, but it has to be the original node that's a base,
/// not just that the nodes have a common base), or
/// - one of the metadata nodes is a `tbaa.struct` node (the access
/// necessarily being a `memcpy`) with a subobject node that would
/// be allowed to alias with the other.
///
/// Our job here is to produce metadata nodes that will never say that
/// an alias is not allowed when there exists a type that would be consistent
/// with the types of the accesses from which the nodes were produced.
///
/// The last clause in both language rules permits character types to
/// alias objects of any type. We handle this by converting all character
/// types (as well as `std::byte` and types with the `may_alias` attribute)
/// to a single metadata node (the `char` node), then making sure that
/// that node is a base of every other metadata node we generate.
/// We can always just conservatively use this node if we aren't otherwise
/// sure how to implement the language rules for a type.
///
/// Read literally, the C rule for aggregates permits an aggregate l-value
/// (e.g. of type `struct { int x; }`) to be used to access an object that
/// is not part of an aggregate object of that type (e.g. a local variable
/// of type `int`). That case is perhaps sensical, but it would also permit
/// e.g. an l-value of type `struct { int x; float f; }` to be used to
/// access an object of type `float`, which is nonsense. We interpret this
/// clause as just intending to permit objects to be accessed through an
/// l-value that properly references a containing object.
///
/// C++ does not have an explicit rule for aggregates because in C++
/// a non-member access to an aggregate l-value is always a call to a
/// constructor or assignment operator, which then accesses all the
/// subobjects. In general, however, our interpretation of member
/// accesses is that they are also an access to the containing object
/// and therefore require such an object to exist at that address;
/// this permits us to just use the C rule for the accesses done by
/// trivial copy/move constructors/operators.
///
/// Both C and C++ permit some qualification differences. In C, however,
/// qualification can only differ at the outermost level, whereas C++
/// allows qualification to differ in nested positions through the
/// similar-types rule. This means that e.g. an l-value of type
/// `const float *` is not permitted to access an object of type
/// `float *` in C, but it is in C++. We use the C++ rule
/// unconditionally; the C rule is needlessly strict and frequently
/// violated in practice by code that we don't want to say is wrong.
/// We implement this by just discarding type qualifiers within pointer-like
/// types when deriving TBAA nodes; basically, we produce the TBAA node
/// for the type that is unqualified at all the recursive positions
/// considered by the C++ similar type rule. The implementation
/// doesn't actually construct this recursively-unqualified type as a
/// `QualType`; it just ignores qualifiers when recursing into types.
///
/// The similar-type rule only really applies to the standard CVR
/// qualifiers, which never affect representations. Qualifiers such as
/// address spaces that may involve a representation difference would
/// be totally appropriate to distinguish for TBAA purposes. However,
/// the current implementation just discards all qualifiers.
///
/// We handle the signed/unsigned clause by just making unsigned types
/// use the metadata node for the signed variant of the type. In the
/// language rules, this only applies at the outermost level, and e.g. an
/// l-value of type `signed int *` is not permitted to alias an object of
/// type `unsigned int *`. We choose not to distinguish those types when
/// pointer-type TBAA is enabled, however.
///
/// After discarding qualifiers and signedness differences as above,
/// the language rules come down to whether the types are compatible
/// (in C) or identical (in C++). Even in C, most types are compatible
/// only with themselves. The exceptions will be considered in the cases
/// below.
```
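For reference, the may-alias rule described in the suggested comment can be sketched roughly as follows. This is a simplified illustration, not LLVM's implementation: `TbaaNode`, `isSameOrBaseOf`, and `mayAlias` are hypothetical names, each node is assumed to have a single base, and the `tbaa.struct` clause is left out.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for a scalar TBAA type node: a name plus one base.
struct TbaaNode {
    std::string name;
    const TbaaNode *base;  // nullptr for the root of the hierarchy
};

// True if `ancestor` is `node` itself or is reachable from `node` by
// following base links. Merely sharing a common base does not count.
bool isSameOrBaseOf(const TbaaNode *ancestor, const TbaaNode *node) {
    for (const TbaaNode *n = node; n != nullptr; n = n->base)
        if (n == ancestor)
            return true;
    return false;
}

// Two accesses may alias if their nodes are the same or if one node is a
// (possibly indirect) base of the other.
bool mayAlias(const TbaaNode *a, const TbaaNode *b) {
    return isSameOrBaseOf(a, b) || isSameOrBaseOf(b, a);
}
```

With a `char` node as the base of both an `int` node and a `float` node, `mayAlias` holds for `char` against either one, but not for `int` against `float`, even though the two share `char` as a base.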
Then the comment at this spot in the patch can just be something like this:
```
// When PointerTBAA is disabled, all pointers and references use the same
// "any pointer" TBAA node. Otherwise, we generate a type-specific TBAA
// node and use the "any pointer" node as its base for compatibility between
// TUs with different settings. To implement the C++ similar-type rules
// (which we also adopt in C), we need to ignore qualifiers on the
// pointee type, and that has to be done recursively if the pointee type
// is itself a pointer-like type.
//
// Currently we ignore the differences between pointer-like types and just
// use this tag for the type: `p<pointer depth> <inner type tag>`.
// This means we give e.g. `char **` and `char A::**` the same TBAA tag.
```
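To illustrate the tag scheme in that comment, here is a minimal sketch under stated assumptions. `Ty` and `pointerTag` are hypothetical and are not part of `CodeGenTBAA`; the sketch models only a base type, plain pointers, and member pointers, and the `char` tag string is a placeholder rather than the name Clang actually emits.

```cpp
#include <cassert>
#include <string>

// Hypothetical, heavily simplified type model for illustration only.
struct Ty {
    enum Kind { Base, Pointer, MemberPointer } kind;
    bool isConst;         // qualifier, deliberately ignored when tagging
    std::string baseTag;  // meaningful only when kind == Base
    const Ty *pointee;    // meaningful only for pointer-like kinds
};

// Builds `p<pointer depth> <inner type tag>` for a pointer-like type.
// Qualifiers are ignored at every level, and the different pointer-like
// kinds are not distinguished.
std::string pointerTag(const Ty &t) {
    unsigned depth = 0;
    const Ty *cur = &t;
    while (cur->kind != Ty::Base) {
        ++depth;
        cur = cur->pointee;
    }
    return "p" + std::to_string(depth) + " " + cur->baseTag;
}
```

Under this model, `char **`, `char A::**`, and `const char *const *` all map to the same tag, `p2 char`.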
https://github.com/llvm/llvm-project/pull/75177