[PATCH] D32199: [TBAASan] A TBAA Sanitizer (Clang)

Thu Apr 20 08:39:20 PDT 2017

hfinkel added a comment.

In https://reviews.llvm.org/D32199#731472, @rsmith wrote:

> > As I've currently implemented it, both reads and writes set the type of previously-unknown storage, and after that it says fixed (unless you memcpy to it, memset it, or its lifetime ends (the type gets reset on lifetime.start/end and for malloc/allocas/etc.). There's a flag to enable the "writes always set the type" rule, but that's not the default. Is this too strict?
>
> That seems like it will have at least three flavors of false positive:
>
> 1. C's "effective type" rule allows writes to set the type pretty much unconditionally, unless the storage is for a variable with a declared type

To come back to this point: We don't really implement these rules now, and it is not clear that we will. The problem here is that, if we take the specification literally, then we can't use our current TBAA at all. The problem is that if we have:

  write x, !tbaa "int"
  read x, !tbaa "int"
  write x, !tbaa "float"

TBAA will currently tell us that the "float" write aliases with neither the preceding read nor the preceding write. As a result, we might move the "float" write to before the read (which is wrong) or before the write (also wrong). It seems to me that following the effective-type rules strictly will require that we only emit TBAA for memory accesses we can prove are to declared variables (as these are the only ones whose types don't get changed just by writing to them). We could certainly do that (*), although it is going to make TBAA awfully limited. As @dberlin has asserted, GCC does not implement these rules either.

To be fair, there are inferences we might draw from TBAA on all access that are not incorrect. For example, if we have:

  write x, !tbaa "int"
  write y, !tbaa "float"
  read x, !tbaa "int"

and we can indeed conclude that the write to y and the read from x don't alias (because the write happens before the read). This is because the effective type of y must be float after the write and so we know that the read from x must be accessing a different object. We can also conclude that the writes don't alias, but only because of the later read. The sad part is that if we use this information to reorder the read before the write to y (which we might do to eliminate the read), we now lose our ability to use TBAA to tell us anything.

Also, a strict reading of C's access rules seems to rule out the premise underlying our struct-path TBAA entirely. So long as I'm accessing a value using a struct that has some member, including recursively, with that type, then it's fine. The matching of the relative offsets is a sufficient, but not necessary, condition for well-defined access. C++ has essentially the same language (and, thus, potentially the same problem).

While I'd like the sanitizer to follow the rules, that's really useful only to the extent that the compilers follow the rules. If the compilers are making stronger assumptions, then I think that the sanitizer should also. OTOH, maybe we should change our TBAA representation/implementation to actually follow the rules, and then have a sanitizer that does the same.

(*) The best way I can think of to do this is to tag globals and allocas with tbaa according to their declared type, something similar for function arguments, and then in TBAA, instead of comparing the TBAA metadata of both operands directly, we call getUnderlyingObjects on the pointers, get the corresponding most-generic TBAA from the objects themselves, and then compare that TBAA to the TBAA from the other access.

> 2. After a placement new in C++, you should be able to use the storage as a new type
> 3. Storing directly to a member access on a union (ie, with the syntax `x.a = b`) in C++ permits using the storage as the new type

Yes, although for the sake of discussion, this is only true if a is a "non-class, non-array type, or of a class type with a trivial default constructor that is not
deleted, or an array of such types." That seems potentially useful.

> If we want to follow the relevant language rules by default, that would suggest that "writes always set the type" should be enabled by default in C and disabled by default in C++. That may not be the right decision for other reasons, though. In C++, writes through union members and new-expressions should probably (re)set the type (do you have intrinsics the frontend can use to do so?).

Also, in C, memcpy gets to copy the type for a destination that does not have declared types. It looks like the same is true for C++ for trivially-copyable types. Is the first read/write sets the unknown type (i.e. memory from malloc/calloc/memset, etc.) correct for C++ also? I recall discussing something along these lines in Kona.

https://reviews.llvm.org/D32199