[LLVMdev] ubsan - active member check for unions
Ismail Pazarbasi
ismail.pazarbasi at gmail.com
Mon Dec 15 11:24:22 PST 2014
Hi,
I decided to implement a sanitizer that tracks the active member, and
reports reads from inactive members.
e.g.:
struct S {
union {
long l;
char s[42];
double d;
};
};
void f() {
S s;
s.d = 42.0;
if (s.l > 100) // fire here
;
}
My current implementation is naïve (also possibly wrong), and doesn't
care about "special guarantee" that allows inspecting common initial
sequence. Briefly;
1. in EmitBinaryOperatorLValue, if we are storing value of a union
member, we call a ubsan runtime function (not a handler) that maps
pointer value (of the union) to a tuple/struct (of TypeDescr, active
field index, expression location that has changed the active member,
and array of field names)
2. When a union member is being loaded, a call is emitted to a ubsan
runtime function that both checks for, and reports access to inactive
members
For above example, sanitizer says:
union-track-active-t1.cpp:11:9: runtime error: active field of union
'S::(anonymous union at union-track-active-t1.cpp:2:3)' is 'd'; access
to field 'l' is undefined
union-track-active-t1.cpp:10:7: note: active index set to 'd' here
Runtime functions I've added:
struct StringVec {
unsigned NumElts;
const char *Elts[];
};
struct UnionStaticData {
TypeDescriptor *Type;
StringVec* FieldNames;
};
extern "C" SANITIZER_INTERFACE_ATTRIBUTE
void __ubsan_union_set_active_field_index(uptr Addr, const
UnionStaticData *Data,
const SourceLocation *ActivationLoc,
unsigned ActiveFieldIndex);
extern "C" SANITIZER_INTERFACE_ATTRIBUTE
void __ubsan_union_check_access(uptr Addr, const SourceLocation *Loc,
unsigned FieldIndex);
For above code, IR looks like:
// s.d = 42.0;
store double 4.200000e+01, double* %d, align 8
// injected by sanitizer
// __ubsan_union_set_active_field_index(/*union addr*/&s,
// /*type descr=*/'S::anonymous union',
// /*field names*/{"l", "s", "d"},
// /*expr loc=*/"file.cpp:10:7",
// /*field index*/2);
%1 = bitcast %union.anon* %0 to i8*
call void @__ubsan_union_set_active_field_index(i8* %1, i8* bitcast
({ { i16, i16, [56 x i8] }*, { i32, [3 x i8*] }* }* @6 to i8*), i8*
bitcast ({ [26 x i8]*, i32, i32 }* @5 to i8*), i32 2)
// if (s.l > 100L) ;
%2 = getelementptr inbounds %struct.S* %s, i32 0, i32 0
%l = bitcast %union.anon* %2 to i64*
%3 = bitcast i64* %l to i8*
call void @__ubsan_union_check_access(i8* %3, i8* bitcast ({ [26 x
i8]*, i32, i32 }* @7 to i8*), i32 0)
%4 = load i64* %l, align 8
%cmp = icmp sgt i64 %4, 100
I have a few questions regarding the overall design:
1. Do you think this is a useful check?
2. Where can I store type and field info about the union; some form of
a shadow memory or a simple array/map?
3. Should sanitizer abort immediately or continue upon detection?
4. When/how can I remove entries from ubsan shadow memory when union's
lifetime ends; perhaps in a module pass or at the end of each
function?
It's far from submission quality at the moment, that's why I am
holding back the patch. Before proceeding further, I'd like to get
some feedback as to whether this is a valuable feature, and whether I
am on the right track.
Ismail
More information about the llvm-dev
mailing list