[llvm-dev] [RFC] Design of a TBAA sanitizer

Tue Apr 4 13:13:24 PDT 2017

Hi everyone,

At EuroLLVM, Chandler and I chatted about the design for a potential 
TBAA sanitizer. Here's my attempt to summarize:

C/C++ have type-based aliasing rules, and LLVM's optimizer can exploit 
these given TBAA metadata added by Clang. Roughly, a pointer of given 
type cannot be used to access an object of a different type (with, of 
course, certain exceptions). Unfortunately, there's a lot of code in the 
wild that violates these rules (e.g. for type punning), and such code 
often must be built with -fno-strict-aliasing. Performance is often 
sacrificed as a result. Part of the problem is the difficulty of finding 
TBAA violations. A sanitizer would help.

A design goal of a TBAA sanitizer is to limit the shadow-memory overhead 
of the implementation. ASan, for example, uses 1 bit per byte. Here 
we're hoping to keep the overhead down to 2 bits per byte for the TBAA 
sanitizing. We might be able to do this, while handling all common types 
on the fast path, if we use both alignment and type information. When 
accessing data of B bytes, 2*B bits of shadow memory can be used. Thus, 
we'll get 2 bits for a one-byte type, 4 bits for a two-byte type, etc. 
Moreover, we need appropriate holes in the encoding space so that no 
type has a shadow encoding that overlaps with an aligned part of a 
larger type's encoding.
For example, we need to detect:

   double f = ...; return *(int*) &f; // We should catch this.

We might use the following encoding. The idea is that the common case, 
for which we need a reasonable fast path, is that type types are exactly 
equal. For this case, we want a simple comparison of the shadow type 
encodings to be sufficient to validate the access. For cases where the 
encodings don't match (and isn't zero to indicate an unknown type), or 
for which there is no direct encoding for the access type, a slow path 
must be used. All of this assumes that we're validating the the pointer 
alignment first, and then checking the type encodings.

1 Byte:
00 = 0  = unknown type
01 = 1 = hole
10 = 2 = hole
11 = 3  = all one-byte types (slow path, see note later on this)

2 Bytes:
0000 = 0 = unknown type
0101 = 5 = short
0110 = 6 = hole (A)
0111 = 7 = wchar_t (under some ABIs)
1001 = 9 = hole (B)
1010 = 10 = hole (C)
1011 = 11 = char16_t
1111 = 15 = all other types (slow path)

It is important here to have wchar_t have a direct encoding here because 
wchar_t is two bytes on Windows, and moreover, wchar_t is very commonly 
used on Windows. The partial encoding overlap of wchar_t (i.e. 0111) 
with the 11 one-byte-type encoding works because 11 always indicates a 
slow-path check.

4 Bytes:
0000 0000 = 0 = unknown type
A A = int
A B = float
B A  = pointer (under some ABIs)
B B = long (under some ABIs)
A 1111 = wchar_t (under some ABIs)
B 1111 = char32_t
A C = hole (D)
C A = hole (E)
B C = hole (F)
C B = hole (G)
C C = hole (H)
1111 1111 = 255 = all other types (slow path)

8 Bytes:
0000 0000 0000 0000 = 0 = unknown type
D D = double
D E = long (under some ABIs)
E D = long long (under some ABIs)
E E = long double (under some ABIs)
D F = pointer (under some ABIs)
F D = hole (I)
E F = hole (J)
F E = hole
F F = hole
...
1111 1111 1111 1111 = 65535 = all other types

16 Bytes:
0 = unknown type
| | = __int128_t
I J = long long (under some ABIs)
J I = long double (under some ABIs)
J J = hole
...
-1 = all other types

For pointers, this scheme would consider all pointers to be the same 
(regardless of pointee type). Doing otherwise would mostly requiring 
putting pointer-type checking on the slow path (i.e. access via a 
pointer pointer), and that could add considerable overhead. We might, 
however, split out function pointers from other pointers. We could 
provide a compile-time option to control the granularity of pointer-type 
checks.

Builtin vector types for which the vector element type has a direct 
encoding also naturally have a direct encoding (the concatenation of the 
encoding for the element type).

Obviously the fact that we have no fast-path encodings for one-byte 
types could be problematic. Note however that:

  1. If a larger type is being used to access a smaller type (plus 
more), the encodings won't match, so we always end up on the slow path.

  2. If the access type is a one-byte type, we would want to validate 
quickly. However, most common one-byte types are universally aliasing 
(i.e. not subject to TBAA violations). Specifically, for C/C++, pointers 
to char, unsigned char, signed char (C only), and std::byte, can be used 
to access any part of any type. That leaves signed char (C++ only), 
bool/_Bool, and enums with a [signed/unsigned] char base type (C++ only, 
std::byte exempted) as pointee types we might wish to validate. We'd 
always need to fall back to the slow path to validate these. We could 
provide a compile-time option to disable such one-byte access checking 
if necessary.

How would the slow path work? First, the code needs to find the 
beginning of the allocation. It can do this by scanning backwards in the 
ASan shadow memory. Once located, we'll read a pointer to a 
type-description structure from that "red zone" location. For dynamic 
allocations, ASan's allocator will ensure such a space for the pointer 
exists. For static allocations and globals, the compiler will ensure it 
exists. The compiler will make sure that all constructors locate this 
field and fill it in. Destructors can clear it. If two of these 
type-description-structure pointers are equal, then we can conclude that 
the types are equal. If not, then we need to interpret the structure. 
The pointer itself might be to an interval map (to deal with arrays, 
placement new, etc. - we can use the low bit of the pointer to 
differentiate between an actual type-description structure and an 
interval map), and the leaves of the interval map point to actual 
type-description structures. The type-description structure is an array 
of (offset, type) pairs, where the type field is also a 
type-description-structure pointer. The type-description structures 
themselves are comdat globals emitted in each relevant translation unit, 
where the comdat key is formed using the mangled type name (and size, 
etc.), and pointers to these symbols are then used to identify the types.

Thoughts?

Thanks again,
Hal

-- 
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory