[llvm-dev] [RFC] Design of a TBAA sanitizer

Tue Apr 11 13:37:20 PDT 2017

Hi Kostya,

On April 11, 2017 at 1:30:10 PM, Kostya Serebryany (kcc at google.com) wrote:

> of course, but accesses are done via pointers, and if TBAA queries
> MayAlias(AccessViaP1, AccessViaP2)
> there should (??) be a point in the IR where both P1 and P2 exist together
> and can be compared.

That may not be possible (though I'm second guessing what exactly you
have in mind so maybe I'm missing something here):

ptr0 = malloc();
*(int*)ptr0 = 20;  // access0
free(ptr0);
ptr1 = calloc();   // bitwise equal to ptr0 by chance
float f = *(float *)ptr1;  // access1

The program above is fine (no TBAA violations), but at location
access1 ptr1 and ptr0 overlap despite being NoAlias.

-- Sanjoy

>
>
> >
> > This means, at least in LLVM IR, you have to account for cases like:
> >
> > if (A)
> > *(int *)ptr = 20;
> > if (B)
> > float f = *(float *)ptr;
> >
> > where ptr does not have a type (according to TBAA), but you want to
> > crash the program if both A and B are true.
> >
> > -- Sanjoy
> >
> > > For larger functions -- only for those pairs where the optimizer actually
> > > queried MayAlias(P1, P2).
> > >
> > > --kcc
> > >
> > >
> > > On Tue, Apr 11, 2017 at 3:49 AM, Hal Finkel via llvm-dev <
> > > llvm-dev at lists.llvm.org> wrote:
> > >
> > > >
> > > > On 04/11/2017 03:46 AM, Andrey Bokhanko wrote:
> > > >
> > > > Hal,
> > > >
> > > > To clarify, my example meant to illustrate that for memory references
> > to
> > > > structures' fields you have to keep a user-defined type, even for one
> > byte
> > > > accesses. C++ allows references to "initial member sequence" using
> > pointers
> > > > to structures of different types. And yes, there are programs in the
> > wild
> > > > that rely on this (don't remember details... this was from previous
> > life).
> > > >
> > > > Another thing to consider -- what about memset / memcpy / etc that
> > > > inherently rely on type punning? If not inlined, they don't present
> > > > problems for an optimizer -- probably shouldn't be considered as
> > aliasing
> > > > rules violation?
> > > >
> > > >
> > > > Good point. You (likely) wouldn't want to compile your memcpy / memset
> > /
> > > > etc. implementations with the TBAA sanitizer enabled (or we'd need to
> > > > provide an attribute to disable it for certain functions). If they
> > only use
> > > > char* themselves, then it's fine, but if not, that's implementation
> > magic.
> > > > However, memset, etc. does need to be able to 'unset' the type of
> > memory
> > > > and memcpy needs to be able to copy types (at least for PODs). The
> > > > sanitizer would need to hook them for that purpose.
> > > >
> > > > -Hal
> > > >
> > > >
> > > >
> > > > Yours,
> > > > Andrey
> > > > ===
> > > > Compiler Architect
> > > > NXP
> > > >
> > > >
> > > > On Tue, Apr 11, 2017 at 12:05 AM, Hal Finkel wrote:
> > > >
> > > >>
> > > >> On 04/10/2017 09:55 AM, Andrey Bokhanko wrote:
> > > >>
> > > >> Hi Hal,
> > > >>
> > > >> I wonder how your solution will handle the following?
> > > >>
> > > >> struct {
> > > >> int s1_f1;
> > > >> float s1_f2;
> > > >> int s1_f3;
> > > >> float s1_f4;
> > > >> } S1;
> > > >>
> > > >> struct {
> > > >> int s2_f1;
> > > >> float s2_f2;
> > > >> int *s2_f3; // to add some interest, suppose that sizeof(int) ==
> > > >> sizeof(int *)
> > > >> float s2_f4;
> > > >> } S2;
> > > >>
> > > >> S1 *s1; S2 *s2;
> > > >> ...
> > > >> s2 = (S1*)s1;
> > > >> s2->s2_f1 = 0; // allowed
> > > >> s2->s2_f2 = 0; // allowed
> > > >> s2->s2_f3 = 0; // not allowed
> > > >> s2->s2_f4 = 0; // not allowed
> > > >>
> > > >> Also, when you plan to set types for allocated memory?
> > > >>
> > > >>
> > > >> The most-general thing seems to be to set the types along with a
> > store.
> > > >> As a result, the proposed scheme would not find a fault with the code
> > > >> above, but would complain if anyone actually later read S1.s1_f3.
> > > >>
> > > >> If we want to catch these kinds of problems directly we'd need to have
> > > >> the compiler insert code when the type is constructed to mark the
> > types,
> > > >> and then we'd need to check those types around stores. This also
> > sounds
> > > >> like a useful enhancement (although somewhat more complicated to
> > implement).
> > > >>
> > > >> What types will be set for memory allocated by a malloc call?
> > > >>
> > > >>
> > > >> Memory would be untyped (or of unknown type) when allocated.
> > > >>
> > > >> Thanks again,
> > > >> Hal
> > > >>
> > > >>
> > > >>
> > > >> Yours,
> > > >> Andrey
> > > >>
> > > >>
> > > >> On Tue, Apr 4, 2017 at 10:13 PM, Hal Finkel via llvm-dev <
> > > >> llvm-dev at lists.llvm.org> wrote:
> > > >>
> > > >>> Hi everyone,
> > > >>>
> > > >>> At EuroLLVM, Chandler and I chatted about the design for a potential
> > > >>> TBAA sanitizer. Here's my attempt to summarize:
> > > >>>
> > > >>> C/C++ have type-based aliasing rules, and LLVM's optimizer can
> > exploit
> > > >>> these given TBAA metadata added by Clang. Roughly, a pointer of
> > given type
> > > >>> cannot be used to access an object of a different type (with, of
> > course,
> > > >>> certain exceptions). Unfortunately, there's a lot of code in the
> > wild that
> > > >>> violates these rules (e.g. for type punning), and such code often
> > must be
> > > >>> built with -fno-strict-aliasing. Performance is often sacrificed as a
> > > >>> result. Part of the problem is the difficulty of finding TBAA
> > violations. A
> > > >>> sanitizer would help.
> > > >>>
> > > >>> A design goal of a TBAA sanitizer is to limit the shadow-memory
> > overhead
> > > >>> of the implementation. ASan, for example, uses 1 bit per byte. Here
> > we're
> > > >>> hoping to keep the overhead down to 2 bits per byte for the TBAA
> > > >>> sanitizing. We might be able to do this, while handling all common
> > types on
> > > >>> the fast path, if we use both alignment and type information. When
> > > >>> accessing data of B bytes, 2*B bits of shadow memory can be used.
> > Thus,
> > > >>> we'll get 2 bits for a one-byte type, 4 bits for a two-byte type,
> > etc.
> > > >>> Moreover, we need appropriate holes in the encoding space so that no
> > type
> > > >>> has a shadow encoding that overlaps with an aligned part of a larger
> > type's
> > > >>> encoding.
> > > >>> For example, we need to detect:
> > > >>>
> > > >>> double f = ...; return *(int*) &f; // We should catch this.
> > > >>>
> > > >>> We might use the following encoding. The idea is that the common
> > case,
> > > >>> for which we need a reasonable fast path, is that type types are
> > exactly
> > > >>> equal. For this case, we want a simple comparison of the shadow type
> > > >>> encodings to be sufficient to validate the access. For cases where
> > the
> > > >>> encodings don't match (and isn't zero to indicate an unknown type),
> > or for
> > > >>> which there is no direct encoding for the access type, a slow path
> > must be
> > > >>> used. All of this assumes that we're validating the the pointer
> > alignment
> > > >>> first, and then checking the type encodings.
> > > >>>
> > > >>> 1 Byte:
> > > >>> 00 = 0 = unknown type
> > > >>> 01 = 1 = hole
> > > >>> 10 = 2 = hole
> > > >>> 11 = 3 = all one-byte types (slow path, see note later on this)
> > > >>>
> > > >>> 2 Bytes:
> > > >>> 0000 = 0 = unknown type
> > > >>> 0101 = 5 = short
> > > >>> 0110 = 6 = hole (A)
> > > >>> 0111 = 7 = wchar_t (under some ABIs)
> > > >>> 1001 = 9 = hole (B)
> > > >>> 1010 = 10 = hole (C)
> > > >>> 1011 = 11 = char16_t
> > > >>> 1111 = 15 = all other types (slow path)
> > > >>>
> > > >>> It is important here to have wchar_t have a direct encoding here
> > because
> > > >>> wchar_t is two bytes on Windows, and moreover, wchar_t is very
> > commonly
> > > >>> used on Windows. The partial encoding overlap of wchar_t (i.e. 0111)
> > with
> > > >>> the 11 one-byte-type encoding works because 11 always indicates a
> > slow-path
> > > >>> check.
> > > >>>
> > > >>> 4 Bytes:
> > > >>> 0000 0000 = 0 = unknown type
> > > >>> A A = int
> > > >>> A B = float
> > > >>> B A = pointer (under some ABIs)
> > > >>> B B = long (under some ABIs)
> > > >>> A 1111 = wchar_t (under some ABIs)
> > > >>> B 1111 = char32_t
> > > >>> A C = hole (D)
> > > >>> C A = hole (E)
> > > >>> B C = hole (F)
> > > >>> C B = hole (G)
> > > >>> C C = hole (H)
> > > >>> 1111 1111 = 255 = all other types (slow path)
> > > >>>
> > > >>> 8 Bytes:
> > > >>> 0000 0000 0000 0000 = 0 = unknown type
> > > >>> D D = double
> > > >>> D E = long (under some ABIs)
> > > >>> E D = long long (under some ABIs)
> > > >>> E E = long double (under some ABIs)
> > > >>> D F = pointer (under some ABIs)
> > > >>> F D = hole (I)
> > > >>> E F = hole (J)
> > > >>> F E = hole
> > > >>> F F = hole
> > > >>> ...
> > > >>> 1111 1111 1111 1111 = 65535 = all other types
> > > >>>
> > > >>> 16 Bytes:
> > > >>> 0 = unknown type
> > > >>> | | = __int128_t
> > > >>> I J = long long (under some ABIs)
> > > >>> J I = long double (under some ABIs)
> > > >>> J J = hole
> > > >>> ...
> > > >>> -1 = all other types
> > > >>>
> > > >>> For pointers, this scheme would consider all pointers to be the same
> > > >>> (regardless of pointee type). Doing otherwise would mostly requiring
> > > >>> putting pointer-type checking on the slow path (i.e. access via a
> > pointer
> > > >>> pointer), and that could add considerable overhead. We might,
> > however,
> > > >>> split out function pointers from other pointers. We could provide a
> > > >>> compile-time option to control the granularity of pointer-type
> > checks.
> > > >>>
> > > >>> Builtin vector types for which the vector element type has a direct
> > > >>> encoding also naturally have a direct encoding (the concatenation of
> > the
> > > >>> encoding for the element type).
> > > >>>
> > > >>> Obviously the fact that we have no fast-path encodings for one-byte
> > > >>> types could be problematic. Note however that:
> > > >>>
> > > >>> 1. If a larger type is being used to access a smaller type (plus
> > more),
> > > >>> the encodings won't match, so we always end up on the slow path.
> > > >>>
> > > >>> 2. If the access type is a one-byte type, we would want to validate
> > > >>> quickly. However, most common one-byte types are universally
> > aliasing (i.e.
> > > >>> not subject to TBAA violations). Specifically, for C/C++, pointers
> > to char,
> > > >>> unsigned char, signed char (C only), and std::byte, can be used to
> > access
> > > >>> any part of any type. That leaves signed char (C++ only),
> > bool/_Bool, and
> > > >>> enums with a [signed/unsigned] char base type (C++ only, std::byte
> > > >>> exempted) as pointee types we might wish to validate. We'd always
> > need to
> > > >>> fall back to the slow path to validate these. We could provide a
> > > >>> compile-time option to disable such one-byte access checking if
> > necessary.
> > > >>>
> > > >>> How would the slow path work? First, the code needs to find the
> > > >>> beginning of the allocation. It can do this by scanning backwards in
> > the
> > > >>> ASan shadow memory. Once located, we'll read a pointer to a
> > > >>> type-description structure from that "red zone" location. For dynamic
> > > >>> allocations, ASan's allocator will ensure such a space for the
> > pointer
> > > >>> exists. For static allocations and globals, the compiler will ensure
> > it
> > > >>> exists. The compiler will make sure that all constructors locate
> > this field
> > > >>> and fill it in. Destructors can clear it. If two of these
> > > >>> type-description-structure pointers are equal, then we can conclude
> > that
> > > >>> the types are equal. If not, then we need to interpret the
> > structure. The
> > > >>> pointer itself might be to an interval map (to deal with arrays,
> > placement
> > > >>> new, etc. - we can use the low bit of the pointer to differentiate
> > between
> > > >>> an actual type-description structure and an interval map), and the
> > leaves
> > > >>> of the interval map point to actual type-description structures. The
> > > >>> type-description structure is an array of (offset, type) pairs,
> > where the
> > > >>> type field is also a type-description-structure pointer. The
> > > >>> type-description structures themselves are comdat globals emitted in
> > each
> > > >>> relevant translation unit, where the comdat key is formed using the
> > mangled
> > > >>> type name (and size, etc.), and pointers to these symbols are then
> > used to
> > > >>> identify the types.
> > > >>>
> > > >>> Thoughts?
> > > >>>
> > > >>> Thanks again,
> > > >>> Hal
> > > >>>
> > > >>> --
> > > >>> Hal Finkel
> > > >>> Lead, Compiler Technology and Programming Languages
> > > >>> Leadership Computing Facility
> > > >>> Argonne National Laboratory
> > > >>>
> > > >>> _______________________________________________
> > > >>> LLVM Developers mailing list
> > > >>> llvm-dev at lists.llvm.org
> > > >>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
> > > >>>
> > > >>
> > > >>
> > > >> --
> > > >> Hal Finkel
> > > >> Lead, Compiler Technology and Programming Languages
> > > >> Leadership Computing Facility
> > > >> Argonne National Laboratory
> > > >>
> > > >>
> > > >
> > > > --
> > > > Hal Finkel
> > > > Lead, Compiler Technology and Programming Languages
> > > > Leadership Computing Facility
> > > > Argonne National Laboratory
> > > >
> > > >
> > > > _______________________________________________
> > > > LLVM Developers mailing list
> > > > llvm-dev at lists.llvm.org
> > > > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
> > > >
> > > >
> > > _______________________________________________
> > > LLVM Developers mailing list
> > > llvm-dev at lists.llvm.org
> > > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
> > >
> >
>