[llvm-dev] [RFC] Design of a TBAA sanitizer

Fri Apr 7 07:26:13 PDT 2017

> At EuroLLVM, Chandler and I chatted about the design for a potential
> TBAA sanitizer. Here's my attempt to summarize:
> 
> C/C++ have type-based aliasing rules, and LLVM's optimizer can exploit
> these given TBAA metadata added by Clang. Roughly, a pointer of given
> type cannot be used to access an object of a different type (with, of
> course, certain exceptions). Unfortunately, there's a lot of code in the
> wild that violates these rules (e.g. for type punning), and such code
> often must be built with -fno-strict-aliasing. Performance is often
> sacrificed as a result. Part of the problem is the difficulty of finding
> TBAA violations. A sanitizer would help.

It's not quite the same thing, but at last year's EuroLLVM, Chris 
Diamand and I presented a Clang-based tool for detecting bad pointer 
casts. This seems to be fairly good at identifying code which needs 
-fno-strict-aliasing... although it is not designed for checking of 
exactly that. <http://www.llvm.org/devmtg/2016-03/#presentation9>

Funnily enough, I was about to e-mail this list with some notes about 
that system, since I currently have some spare cycles which could 
perhaps help get the Clang/LLVM parts contributed. I'll mostly save 
those thoughts for a separate thread (appearing soon!), but continuing 
on-topic below....

> A design goal of a TBAA sanitizer is to limit the shadow-memory overhead
> of the implementation. ASan, for example, uses 1 bit per byte. Here
> we're hoping to keep the overhead down to 2 bits per byte for the TBAA
> sanitizing. We might be able to do this, while handling all common types
> on the fast path, if we use both alignment and type information. 

Slightly provocative question, but are you sure that byte-scale shadow 
memory is a good fit? It might be, depending on the rest of the design 
(more below). A possible alternative is to maintain metadata only at 
allocation granularity... my approach does this, and it helps keep the 
overhead pretty low in time and space, with two caveats. One is about 
checking casts, not dereferences (more below); the other is needing more 
complex run-time machinery to understand what "allocations" are. 
(However, that has a ton of other applications, so I believe it's very 
good value.)

> When
> accessing data of B bytes, 2*B bits of shadow memory can be used. Thus,
> we'll get 2 bits for a one-byte type, 4 bits for a two-byte type, etc.
> Moreover, we need appropriate holes in the encoding space so that no
> type has a shadow encoding that overlaps with an aligned part of a
> larger type's encoding. For example, we need to detect:
> 
>   double f = ...; return *(int*) &f; // We should catch this.
> 
> We might use the following encoding. The idea is that the common case,
> for which we need a reasonable fast path, is that type types are exactly
> equal. For this case, we want a simple comparison of the shadow type
> encodings to be sufficient to validate the access. For cases where the
> encodings don't match (and isn't zero to indicate an unknown type), or
> for which there is no direct encoding for the access type, a slow path
> must be used. All of this assumes that we're validating the the pointer
> alignment first, and then checking the type encodings.
> 
> 1 Byte:
> 00 = 0  = unknown type
> 01 = 1 = hole
> 10 = 2 = hole
> 11 = 3  = all one-byte types (slow path, see note later on this)
> 
> 2 Bytes:
> 0000 = 0 = unknown type
> 0101 = 5 = short
> 0110 = 6 = hole (A)
> 0111 = 7 = wchar_t (under some ABIs)
> 1001 = 9 = hole (B)
> 1010 = 10 = hole (C)
> 1011 = 11 = char16_t
> 1111 = 15 = all other types (slow path)
> 
> It is important here to have wchar_t have a direct encoding here because
> wchar_t is two bytes on Windows, and moreover, wchar_t is very commonly
> used on Windows. The partial encoding overlap of wchar_t (i.e. 0111)
> with the 11 one-byte-type encoding works because 11 always indicates a
> slow-path check.
> 
> 4 Bytes:
> 0000 0000 = 0 = unknown type
> A A = int
> A B = float
> B A  = pointer (under some ABIs)
> B B = long (under some ABIs)
> A 1111 = wchar_t (under some ABIs)
> B 1111 = char32_t
> A C = hole (D)
> C A = hole (E)
> B C = hole (F)
> C B = hole (G)
> C C = hole (H)
> 1111 1111 = 255 = all other types (slow path)
> 
> 8 Bytes:
> 0000 0000 0000 0000 = 0 = unknown type
> D D = double
> D E = long (under some ABIs)
> E D = long long (under some ABIs)
> E E = long double (under some ABIs)
> D F = pointer (under some ABIs)
> F D = hole (I)
> E F = hole (J)
> F E = hole
> F F = hole
> ...
> 1111 1111 1111 1111 = 65535 = all other types
> 
> 16 Bytes:
> 0 = unknown type
> | | = __int128_t
> I J = long long (under some ABIs)
> J I = long double (under some ABIs)
> J J = hole
> ...
> -1 = all other types

Just so I'm understanding: is this talking about shadowing memory with 
its "leaf" type only, or with its "top-level" type? So if I have, say, 
the following usually-64-bit type:

struct compound { int x; float y; }

... would we paint the shadow area with an alternating pattern of int 
(0110 0110) and floats (0110 1001)? Or would we be using the "all other 
types" thing since the memory is actually "compound"?

> For pointers, this scheme would consider all pointers to be the same
> (regardless of pointee type). Doing otherwise would mostly requiring
> putting pointer-type checking on the slow path (i.e. access via a
> pointer pointer), and that could add considerable overhead. We might,
> however, split out function pointers from other pointers. We could
> provide a compile-time option to control the granularity of pointer-type
> checks.

Aha, the pointers-to-pointers problem. Treating all pointers the same 
feels like a mistake because, say, aliasing a T* through a void** is 
frequently done and really quite useful (although technically UB, if I 
understand correctly; is this exploited in LLVM?), whereas aliasing a T* 
through an arbitrary S** is probably a bug.

Just for contrast: I'm familiar with this problem, and my way around 
this comes in a few parts. I check pointer creation (casts) only; checks 
on pointer use (dereference) are unnecessary. Usually it takes a while 
before people believe me on this, but the short story is that 
pointer-creation checks enforce an invariant that no bad-type pointer 
gets into circulation. So dereferences are always okay (until the first 
warning, at least; execution can continue after a warning, in a 
best-effort no-guarantees fashion). This also greatly helps performance. 
I precisely record the type of pointer-typed storage, e.g. so an array 
of T** really is recorded as such. There are then some special 
allowances for "generic pointers to pointers", including void** , which 
need checking *writes* through them. These checks ignore the 
dereferenced pointer's type, and check directly for a match between the 
type of the written-to storage (the pointer in target memory) and the 
type of the pointed-to object (i.e. what's on the end of the pointer 
being written). Caching can part-amortise repeated similar checks.

> Builtin vector types for which the vector element type has a direct
> encoding also naturally have a direct encoding (the concatenation of the
> encoding for the element type).
> 
> Obviously the fact that we have no fast-path encodings for one-byte
> types could be problematic. Note however that:
> 
>  1. If a larger type is being used to access a smaller type (plus more),
> the encodings won't match, so we always end up on the slow path.
> 
>  2. If the access type is a one-byte type, we would want to validate
> quickly. 
> However, most common one-byte types are universally aliasing (i.e. not
> subject to TBAA violations). Specifically, for C/C++, pointers to char,
> unsigned char, signed char (C only), and std::byte, can be used to
> access any part of any type. That leaves signed char (C++ only),
> bool/_Bool, and enums with a [signed/unsigned] char base type (C++ only,
> std::byte exempted) as pointee types we might wish to validate. We'd
> always need to fall back to the slow path to validate these. We could
> provide a compile-time option to disable such one-byte access checking
> if necessary.
> 
> How would the slow path work? First, the code needs to find the
> beginning of the allocation. It can do this by scanning backwards in the
> ASan shadow memory. Once located, we'll read a pointer to a
> type-description structure from that "red zone" location. For dynamic
> allocations, ASan's allocator will ensure such a space for the pointer
> exists. For static allocations and globals, the compiler will ensure it
> exists. The compiler will make sure that all constructors locate this
> field and fill it in. Destructors can clear it. If two of these
> type-description-structure pointers are equal, then we can conclude that
> the types are equal. If not, then we need to interpret the structure.
> The pointer itself might be to an interval map (to deal with arrays,
> placement new, etc. - we can use the low bit of the pointer to
> differentiate between an actual type-description structure and an
> interval map), and the leaves of the interval map point to actual
> type-description structures. The type-description structure is an array
> of (offset, type) pairs, where the type field is also a
> type-description-structure pointer. The type-description structures
> themselves are comdat globals emitted in each relevant translation unit,
> where the comdat key is formed using the mangled type name (and size,
> etc.), and pointers to these symbols are then used to identify the
> types.

Just as a note, my infrastructure uses a similar comdat trick for the
type descriptors, although the internal structure is quite a bit more
elaborate.

> Thoughts?

My high-level thought is the following question. Is the plan fixed on 
developing a tool specifically for C/C++ effective-type rules, i.e. 
focused on catching/fixing those particular cases of UB and with the 
goal of improving user code performance? Or is there any interest in the 
overlapping but distinct problem of helping users detect and debug 
bad-pointer problems of a type flavour (i.e. those that are not bounds 
errors or temporal errors)?

Plenty of pointer bugs are not UB... e.g. for any heap allocation 
("object without a declared type", in C11-speak) I'm allowed to scribble 
over it, thereby changing its effective type. It's only UB if I read it 
with its previous effective type. But the scribbling itself is fairly 
likely to be the bug, because programmers rarely want to change the type 
of an allocation mid-lifetime. Meanwhile, many pointer-type problems are 
too indirect or complex for the compiler to actually do TBAA-derived 
optimisation on, leaving little to be gained from the UB/TBAA point of 
view... but again, plenty from the debugging perspective. I suspect the 
biggest value of any tool, even if geared specifically towards TBAA, 
will turn out to be largely in debugging scenarios more general than 
effective-type UB bugs.

I realise all the above might come across as "here's the problem I'd 
solve instead" which may not be helpful... I'm interested by any tool in 
this general space, so looking forward to hearing more, and happy to 
follow up on any of the above,

Stephen.

PS: on top of the EuroLLVM '16 talk, in case you're wondering about how 
my system works, the following research papers might help. Or feel free 
just to ask questions....

- "Dynamically diagnosing run-time type errors in unsafe code" (OOPSLA '16)
<http://www.cl.cam.ac.uk/~srk31/#oopsla16a>

- "Towards a dynamic object model within Unix processes" (Onward '15) 
<http://www.cl.cam.ac.uk/~srk31/#onward15>

... or code if you prefer: 
<https://github.com/stephenrkell/liballocs>
<https://github.com/stephenrkell/libcrunch>
<https://github.com/chrisdiamand/clangcrunch>.