[LLVMdev] ubsan - active member check for unions

Ismail Pazarbasi ismail.pazarbasi at gmail.com
Mon Dec 15 11:24:22 PST 2014


I decided to implement a sanitizer that tracks the active member, and
reports reads from inactive members.

  struct S {
    union {
      long l;
      char s[42];
      double d;
  void f() {
    S s;
    s.d = 42.0;
    if (s.l > 100) // fire here

My current implementation is naïve (also possibly wrong), and doesn't
care about "special guarantee" that allows inspecting common initial
sequence. Briefly;
1. in EmitBinaryOperatorLValue, if we are storing value of a union
member, we call a ubsan runtime function (not a handler) that maps
pointer value (of the union) to a tuple/struct (of TypeDescr, active
field index, expression location that has changed the active member,
and array of field names)
2. When a union member is being loaded, a call is emitted to a ubsan
runtime function that both checks for, and reports access to inactive

For above example, sanitizer says:
  union-track-active-t1.cpp:11:9: runtime error: active field of union
'S::(anonymous union at union-track-active-t1.cpp:2:3)' is 'd'; access
to field 'l' is undefined
  union-track-active-t1.cpp:10:7: note: active index set to 'd' here

Runtime functions I've added:
  struct StringVec {
    unsigned NumElts;
    const char *Elts[];
  struct UnionStaticData {
    TypeDescriptor *Type;
    StringVec* FieldNames;

void __ubsan_union_set_active_field_index(uptr Addr, const
UnionStaticData *Data,
                                            const SourceLocation *ActivationLoc,
                                            unsigned ActiveFieldIndex);

void __ubsan_union_check_access(uptr Addr, const SourceLocation *Loc,
                                  unsigned FieldIndex);

For above code, IR looks like:
  // s.d = 42.0;
  store double 4.200000e+01, double* %d, align 8
  // injected by sanitizer
  // __ubsan_union_set_active_field_index(/*union addr*/&s,
  // /*type descr=*/'S::anonymous union',
  // /*field names*/{"l", "s", "d"},
  // /*expr loc=*/"file.cpp:10:7",
  // /*field index*/2);
  %1 = bitcast %union.anon* %0 to i8*
  call void @__ubsan_union_set_active_field_index(i8* %1, i8* bitcast
({ { i16, i16, [56 x i8] }*, { i32, [3 x i8*] }* }* @6 to i8*), i8*
bitcast ({ [26 x i8]*, i32, i32 }* @5 to i8*), i32 2)

  // if (s.l > 100L) ;
  %2 = getelementptr inbounds %struct.S* %s, i32 0, i32 0
  %l = bitcast %union.anon* %2 to i64*
  %3 = bitcast i64* %l to i8*
  call void @__ubsan_union_check_access(i8* %3, i8* bitcast ({ [26 x
i8]*, i32, i32 }* @7 to i8*), i32 0)
  %4 = load i64* %l, align 8
  %cmp = icmp sgt i64 %4, 100

I have a few questions regarding the overall design:
1. Do you think this is a useful check?
2. Where can I store type and field info about the union; some form of
a shadow memory or a simple array/map?
3. Should sanitizer abort immediately or continue upon detection?
4. When/how can I remove entries from ubsan shadow memory when union's
lifetime ends; perhaps in a module pass or at the end of each

It's far from submission quality at the moment, that's why I am
holding back the patch. Before proceeding further, I'd like to get
some feedback as to whether this is a valuable feature, and whether I
am on the right track.


More information about the llvm-dev mailing list