[PATCH] Implement function type checker for the undefined behavior sanitizer.
Peter Collingbourne
peter at pcc.me.uk
Fri Oct 18 16:09:55 PDT 2013
> It would be polite to issue a diagnostic if someone explicitly asks for this (not via -fsanitize=undefined) and we don't support it for their target.
Hmm, we should probably be doing this for other sanitizers too (e.g. MSan and DFSan are only supported on 64-bit Linux).
================
Comment at: lib/CodeGen/CGExpr.cpp:3128
@@ +3127,3 @@
+ if (getLangOpts().CPlusPlus && SanOpts->Function &&
+ !dyn_cast_or_null<FunctionDecl>(TargetDecl)) {
+ if (llvm::Constant* PDSig =
----------------
Richard Smith wrote:
> This is a bit hard to parse. `(!TargetDecl || !isa<FunctionDecl>(TargetDecl))` maybe?
Done.
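
For reference, a minimal standalone sketch of why the two spellings are equivalent. It uses plain `dynamic_cast` as a stand-in for LLVM's casting helpers, and the `Decl`, `FunctionDecl` and `VarDecl` types and the `isaDecl` helper below are hypothetical toys, not the real clang classes.

```cpp
#include <cassert>

// Hypothetical stand-ins for clang's declaration hierarchy (illustration only).
struct Decl { virtual ~Decl() = default; };
struct FunctionDecl : Decl {};
struct VarDecl : Decl {};

// Stand-in for isa<T>: like the LLVM helper, it must not be given null.
template <typename T> bool isaDecl(const Decl *D) {
  assert(D && "isa-style query on a null pointer");
  return dynamic_cast<const T *>(D) != nullptr;
}

// Original spelling: a null-tolerant cast whose result is only tested.
// dynamic_cast of a null pointer yields null, as dyn_cast_or_null does.
bool needsCheckOriginal(const Decl *TargetDecl) {
  return !dynamic_cast<const FunctionDecl *>(TargetDecl);
}

// Suggested spelling: the "no declaration" and "not a function
// declaration" cases are written out separately.
bool needsCheckSuggested(const Decl *TargetDecl) {
  return !TargetDecl || !isaDecl<FunctionDecl>(TargetDecl);
}
```

Both functions return true for a null declaration and for a non-function declaration, and false only for a `FunctionDecl`.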
================
Comment at: lib/CodeGen/CGExpr.cpp:3129
@@ +3128,3 @@
+ !dyn_cast_or_null<FunctionDecl>(TargetDecl)) {
+ if (llvm::Constant* PDSig =
+ CGM.getTargetCodeGenInfo().getUBSanFunctionSignature(CGM)) {
----------------
Richard Smith wrote:
> `*` on the right.
Done.
================
Comment at: lib/CodeGen/CGExpr.cpp:3133-3134
@@ +3132,4 @@
+ CGM.GetAddrOfRTTIDescriptor(QualType(FnType, 0), /*ForEH=*/true);
+ llvm::Type *PDStructTyElems[] = {PDSig->getType(),
+ FTRTTIConst->getType()};
+ llvm::StructType *PDStructTy = llvm::StructType::get(
----------------
Richard Smith wrote:
> Richard Smith wrote:
> > We usually format this as
> >
> > llvm::Type *PDStructTyElems[] = {
> > PDSig->getType(),
> > FTRTTIConst->getType()
> > };
> >
> > or similar.
> Some indication of what "PD" stands for here would be useful.
Done. clang-format prefers something closer to the formatting I originally used; I'll raise this with the clang-format folks.
================
Comment at: lib/CodeGen/CGExpr.cpp:3138-3142
@@ +3137,7 @@
+
+ llvm::Value *CalleePDStruct = Builder.CreateBitCast(
+ Callee, llvm::PointerType::getUnqual(PDStructTy));
+ llvm::Value *CalleeSigPtr =
+ Builder.CreateConstGEP2_32(CalleePDStruct, 0, 0);
+ llvm::Value *CalleeSig = Builder.CreateLoad(CalleeSigPtr);
+ llvm::Value *CalleeSigMatch = Builder.CreateICmpEQ(CalleeSig, PDSig);
----------------
Richard Smith wrote:
> What happens if the callee address is at the end of a page and doesn't have the extra data? Could this load trigger a segfault?
It could in principle. In practice, Clang always aligns function bodies to 16 bytes, GCC does so at -O2 and above, and at -O0 GCC (4.4 and 4.6) generates more than 4 bytes of code even for an empty function body. A quick survey of several system libraries (libc, libm, libstdc++) on an Ubuntu machine shows that their functions are suitably aligned. I think the circumstances in which a segfault might occur are rare enough to justify doing a single load. If it does turn out to be a problem in practice, we could consider splitting the load (perhaps conditional on a command-line flag).
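
To make the alignment argument concrete, here is a rough sketch of the arithmetic. It assumes a 4096-byte page (not stated anywhere in this thread) and a 4-byte signature load; the helper names are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: 4096-byte pages, typical for x86 Linux.
constexpr std::uint64_t kPageSize = 4096;

// True if reading `size` bytes starting at `addr` stays inside one page.
constexpr bool loadStaysInPage(std::uint64_t addr, std::uint64_t size) {
  return (addr % kPageSize) + size <= kPageSize;
}

// True if a `size`-byte load at every `align`-aligned offset within a
// page stays inside that page.
bool alignedLoadsAreSafe(std::uint64_t align, std::uint64_t size) {
  assert(align != 0 && kPageSize % align == 0);
  for (std::uint64_t off = 0; off < kPageSize; off += align)
    if (!loadStaysInPage(off, size))
      return false;
  return true;
}
```

Under these assumptions a load of up to 16 bytes from a 16-byte-aligned function address never crosses into the next page, whereas a 4-byte load from an unaligned address near the end of a page can.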
================
Comment at: lib/CodeGen/CGExpr.cpp:3155-3156
@@ +3154,4 @@
+ Builder.CreateICmpEQ(CalleeRTTI, FTRTTIConst);
+ llvm::Constant *StaticData[] = {EmitCheckSourceLocation(CallLoc),
+ EmitCheckTypeDescriptor(CalleeType)};
+ EmitCheck(CalleeRTTIMatch,
----------------
Richard Smith wrote:
> Likewise.
Done.
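
For readers following along, the emitted check can be modelled roughly as below in ordinary C++. This is only a sketch: the `PrefixData` layout, the `CheckResult` enum and `checkCallee` are hypothetical names, and it assumes the RTTI pointer is only loaded and compared once the signature has matched, which the quoted hunks do not show in full.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

enum class CheckResult { NotInstrumented, TypeMatch, TypeMismatch };

// Hypothetical mirror of the {signature, RTTI pointer} struct in the patch.
struct PrefixData {
  std::uint32_t Sig;
  const void *RTTI;
};

// Models the check at an indirect call: read the signature at the callee
// address, and only if it matches compare the stored RTTI pointer with
// the one expected for the call's function type.
CheckResult checkCallee(const unsigned char *Callee, std::uint32_t ExpectedSig,
                        const void *ExpectedRTTI) {
  assert(Callee);
  std::uint32_t Sig;
  std::memcpy(&Sig, Callee, sizeof(Sig));
  if (Sig != ExpectedSig)
    return CheckResult::NotInstrumented; // no prefix data, nothing to check
  const void *RTTI;
  std::memcpy(&RTTI, Callee + offsetof(PrefixData, RTTI), sizeof(RTTI));
  return RTTI == ExpectedRTTI ? CheckResult::TypeMatch
                              : CheckResult::TypeMismatch;
}
```

A callee without the signature is treated as uninstrumented and passes silently; only a matching signature with a different RTTI pointer would be reported.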
================
Comment at: lib/CodeGen/CodeGenFunction.cpp:523-524
@@ -521,1 +522,4 @@
+ // If we are checking function types, emit a function type signature as
+ // prefix data.
+ if (getLangOpts().CPlusPlus && SanOpts->Function) {
----------------
Richard Smith wrote:
> It'd be nice if we could only do this for address-taken functions. When we take the address of a function, we could emit `linkonce_odr` thunk with the extra data, and use that instead of the original address... except that would break address-of-function comparisons between instrumented and uninstrumented code. Can we provide an option to enable that behavior for the case where we're OK with an ABI change?
I like the idea of doing this as an option, but I would prefer to land this version first.
================
Comment at: lib/CodeGen/CodeGenFunction.cpp:529-530
@@ +528,4 @@
+ CGM.getTargetCodeGenInfo().getUBSanFunctionSignature(CGM)) {
+ llvm::Constant *FTRTTIConst =
+ CGM.GetAddrOfRTTIDescriptor(FD->getType(), /*ForEH=*/true);
+ llvm::Constant *PDStructElems[] = {PDSig, FTRTTIConst};
----------------
Richard Smith wrote:
> Does this include the calling convention attributes?
Not with the Itanium ABI, which has no representation for calling conventions. The Microsoft ABI (for which calling conventions matter more) does have one, so once RTTI has been implemented for that ABI, this check should respect calling conventions there.
================
Comment at: lib/CodeGen/CodeGenFunction.cpp:531
@@ +530,3 @@
+ CGM.GetAddrOfRTTIDescriptor(FD->getType(), /*ForEH=*/true);
+ llvm::Constant *PDStructElems[] = {PDSig, FTRTTIConst};
+ llvm::Constant *PDStructConst =
----------------
Richard Smith wrote:
> Spaces around braces.
Done.
================
Comment at: lib/CodeGen/CGExpr.cpp:3133
@@ +3132,3 @@
+ CGM.GetAddrOfRTTIDescriptor(QualType(FnType, 0), /*ForEH=*/true);
+ llvm::Type *PDStructTyElems[] = {PDSig->getType(),
+ FTRTTIConst->getType()};
----------------
Peter Collingbourne wrote:
> Richard Smith wrote:
> > Richard Smith wrote:
> > > We usually format this as
> > >
> > > llvm::Type *PDStructTyElems[] = {
> > > PDSig->getType(),
> > > FTRTTIConst->getType()
> > > };
> > >
> > > or similar.
> > Some indication of what "PD" stands for here would be useful.
> Done. clang-format prefers something closer to the formatting I originally used; I'll raise this with the clang-format folks.
Globally replaced 'PD' with 'Prefix'.
================
Comment at: lib/CodeGen/TargetInfo.cpp:607-608
@@ +606,4 @@
+ (0x06 << 8) | // .+0x08
+ ('F' << 16) |
+ ('T' << 24);
+ return llvm::ConstantInt::get(CGM.Int32Ty, Sig);
----------------
Richard Smith wrote:
> What do 'F' and 'T' demangle as? An invalid instruction encoding would give me more confidence here.
"rex.RX push %rsp", according to objdump. But I don't think it really matters what this decodes to. If the instruction pointer finds itself here the situation isn't dissimilar to jumping into the middle of a multibyte instruction. (Besides, the RTTI pointer could decode to anything.)
http://llvm-reviews.chandlerc.com/D1338
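
As an aside on the signature constant in the last comment, here is a small sketch of how the four bytes pack into the 32-bit value and end up in memory. It assumes the low byte, which the quoted hunk does not show, is 0xEB (the x86 short-jump opcode `jmp rel8`); `packSignature`, `signatureBytes` and `shortJumpTargetOffset` are illustrative names only.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Packs four bytes into a 32-bit value, lowest byte first, which is the
// order they occupy in memory on little-endian x86.
constexpr std::uint32_t packSignature(std::uint8_t B0, std::uint8_t B1,
                                      std::uint8_t B2, std::uint8_t B3) {
  return (std::uint32_t{B0} << 0) | (std::uint32_t{B1} << 8) |
         (std::uint32_t{B2} << 16) | (std::uint32_t{B3} << 24);
}

// Assumption: the first byte is 0xEB (jmp rel8); the rest follows the hunk.
constexpr std::uint32_t kSig = packSignature(0xEB, 0x06, 'F', 'T');

// Splits the value back into its in-memory byte order.
std::array<std::uint8_t, 4> signatureBytes(std::uint32_t Sig) {
  return {static_cast<std::uint8_t>(Sig & 0xFF),
          static_cast<std::uint8_t>((Sig >> 8) & 0xFF),
          static_cast<std::uint8_t>((Sig >> 16) & 0xFF),
          static_cast<std::uint8_t>((Sig >> 24) & 0xFF)};
}

// Offset the short jump lands on, relative to the start of the signature:
// the 2-byte instruction length plus the signed 8-bit displacement.
int shortJumpTargetOffset(const std::array<std::uint8_t, 4> &Bytes) {
  assert(Bytes[0] == 0xEB && "expected a jmp rel8 opcode");
  return 2 + static_cast<std::int8_t>(Bytes[1]);
}
```

With these bytes the jump lands at offset 8, matching the `.+0x08` comment in the hunk, and the 'F' and 'T' bytes are 0x46 and 0x54, which is what objdump decodes as "rex.RX push %rsp".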