[LLVMdev] [cfe-dev] ubsan - active member check for unions

Richard Smith richard at metafoo.co.uk
Thu Dec 18 17:26:52 PST 2014


On Mon, Dec 15, 2014 at 11:24 AM, Ismail Pazarbasi <
ismail.pazarbasi at gmail.com> wrote:
>
> Hi,
>
> I decided to implement a sanitizer that tracks the active member, and
> reports reads from inactive members.
>
> e.g.:
>   struct S {
>     union {
>       long l;
>       char s[42];
>       double d;
>     };
>   };
>   void f() {
>     S s;
>     s.d = 42.0;
>     if (s.l > 100) // fire here
>       ;
>   }
>
> My current implementation is naïve (also possibly wrong), and doesn't
> care about "special guarantee" that allows inspecting common initial
> sequence. Briefly;
> 1. in EmitBinaryOperatorLValue, if we are storing value of a union
> member, we call a ubsan runtime function (not a handler) that maps
> pointer value (of the union) to a tuple/struct (of TypeDescr, active
> field index, expression location that has changed the active member,
> and array of field names)
> 2. When a union member is being loaded, a call is emitted to a ubsan
> runtime function that both checks for, and reports access to inactive
> members
>

I do not believe this is correct, and I do not believe it can be made to be
correct. Here's why:

* In C, there is no restriction on accessing an inactive union member. The
only relevant rule is the effective type rule, in C11 6.5/6, that says that
you can't store to memory with one type and then load from that same memory
with an incompatible type. But beware! C is deeply confused, and C11
footnote 95 says that reading a union member with the wrong effective type
will produce "the appropriate part of the object representation of the
value [...] reinterpreted as an object representation in the new type", and
then refers the reader to another part of the standard which says no such
thing.

* In C++, the only standard way to change the active member of a union
object is by applying placement new to the member (and even that doesn't
work in some cases). No-one actually writes code that way, so every
implementation in practice allows the active union member to change in
various undocumented and non-standard ways.

* It is an explicit goal of UBSan to allow some TUs to be built with it
enabled and some to be built with it disabled. This check does not and
cannot support that, because the active union member / effective type can
change in uninstrumented code. So this does not belong in UBSan.

However...

Implementing a sanitizer for C's effective type rule seems like a good
idea. Such a sanitizer would also be correct in C++ (whose rules are
strictly less permissive), but not complete. The best way to approach this
would be to provide shadow memory indicating the effective type of real
memory. In order to keep things simple, you'd probably want to ignore
aggregate types and simply track the primitive effective type of memory.

Up to compatibility, there are not many such types (bool, char, short, int,
long, long long, float, double, long double, and some language extensions),
so you can probably get away with a 4 bit representation of the type.
Further, if a byte has effective type of (signed/unsigned/plain) char, the
adjacent byte must have that type too (due to alignment restrictions), so
you can use the same type representation for each pair of bytes. There are
likely to be enough spare values that you can separately encode [4 char],
[2 char][short], [short][2 char], and [short][short], which would let you
use a single macrotype for each 4 byte chunk of memory. That gives only a
12.5% memory overhead, which seems very respectable.

For above example, sanitizer says:
>   union-track-active-t1.cpp:11:9: runtime error: active field of union
> 'S::(anonymous union at union-track-active-t1.cpp:2:3)' is 'd'; access
> to field 'l' is undefined
>   union-track-active-t1.cpp:10:7: note: active index set to 'd' here
>
> Runtime functions I've added:
>   struct StringVec {
>     unsigned NumElts;
>     const char *Elts[];
>   };
>   struct UnionStaticData {
>     TypeDescriptor *Type;
>     StringVec* FieldNames;
>   };
>
> extern "C" SANITIZER_INTERFACE_ATTRIBUTE
> void __ubsan_union_set_active_field_index(uptr Addr, const
> UnionStaticData *Data,
>                                             const SourceLocation
> *ActivationLoc,
>                                             unsigned ActiveFieldIndex);
>
> extern "C" SANITIZER_INTERFACE_ATTRIBUTE
> void __ubsan_union_check_access(uptr Addr, const SourceLocation *Loc,
>                                   unsigned FieldIndex);
>
> For above code, IR looks like:
>   // s.d = 42.0;
>   store double 4.200000e+01, double* %d, align 8
>   // injected by sanitizer
>   // __ubsan_union_set_active_field_index(/*union addr*/&s,
>   // /*type descr=*/'S::anonymous union',
>   // /*field names*/{"l", "s", "d"},
>   // /*expr loc=*/"file.cpp:10:7",
>   // /*field index*/2);
>   %1 = bitcast %union.anon* %0 to i8*
>   call void @__ubsan_union_set_active_field_index(i8* %1, i8* bitcast
> ({ { i16, i16, [56 x i8] }*, { i32, [3 x i8*] }* }* @6 to i8*), i8*
> bitcast ({ [26 x i8]*, i32, i32 }* @5 to i8*), i32 2)
>
>   // if (s.l > 100L) ;
>   %2 = getelementptr inbounds %struct.S* %s, i32 0, i32 0
>   %l = bitcast %union.anon* %2 to i64*
>   %3 = bitcast i64* %l to i8*
>   call void @__ubsan_union_check_access(i8* %3, i8* bitcast ({ [26 x
> i8]*, i32, i32 }* @7 to i8*), i32 0)
>   %4 = load i64* %l, align 8
>   %cmp = icmp sgt i64 %4, 100
>
> I have a few questions regarding the overall design:
> 1. Do you think this is a useful check?
> 2. Where can I store type and field info about the union; some form of
> a shadow memory or a simple array/map?
> 3. Should sanitizer abort immediately or continue upon detection?
> 4. When/how can I remove entries from ubsan shadow memory when union's
> lifetime ends; perhaps in a module pass or at the end of each
> function?
>
> It's far from submission quality at the moment, that's why I am
> holding back the patch. Before proceeding further, I'd like to get
> some feedback as to whether this is a valuable feature, and whether I
> am on the right track.
>
> Ismail
>
> _______________________________________________
> cfe-dev mailing list
> cfe-dev at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20141218/3aee0288/attachment.html>


More information about the llvm-dev mailing list