[llvm-commits] [PATCH] MemorySanitizer instrumentation pass.

Chandler Carruth chandlerc at gmail.com
Sat Oct 20 01:50:42 PDT 2012


  Deleted some comments that were already handled, and added some.

  I still need to make a more detailed pass over the visitor.

  One meta-comment -- you might consider adding the VA-args intrinsics to the IntrinsicInst thing, and adding visitor support for them much the same as the memory intrinsics now have. I'm happy with doing that before or afterward, but in a separate commit either way.


================
Comment at: llvm/lib/Transforms/Instrumentation/MemorySanitizer.cpp:106
@@ +105,3 @@
+// As of 2012-08-28 this flag adds 20% slowdown.
+static cl::opt<bool> ClTrapOnDirtyAccess("msan-trap-on-dirty-access",
+       cl::desc("trap on access to a pointer which has poisoned shadow"),
----------------
How about "msan-check-access-address"?

In general, I think the flag names could be regularized.

================
Comment at: llvm/lib/Transforms/Instrumentation/MemorySanitizer.cpp:120
@@ +119,3 @@
+
+/// MemorySanitizer: instrument the code in module to find uninitialized reads.
+struct MemorySanitizer : public FunctionPass {
----------------
Please use a proper doxygen class comment, with plenty of details here.

The way I would differentiate between this comment and the file comment:
- The class comment is for clients of the pass. Not necessarily people hacking on the class.
- The file comment is for developers working on the pass.

The file comment can defer to the class comment for details of how to use the current incarnation, and give background and all the other great information you have in the existing file comment. =]

================
Comment at: llvm/lib/Transforms/Instrumentation/MemorySanitizer.cpp:121
@@ +120,3 @@
+/// MemorySanitizer: instrument the code in module to find uninitialized reads.
+struct MemorySanitizer : public FunctionPass {
+  MemorySanitizer() : FunctionPass(ID), TD(NULL) {  }
----------------
Typically, passes are 'class'es with most of their implementation details private, and mostly just the FunctionPass interface bits public.

================
Comment at: llvm/lib/Transforms/Instrumentation/MemorySanitizer.cpp:122
@@ +121,3 @@
+struct MemorySanitizer : public FunctionPass {
+  MemorySanitizer() : FunctionPass(ID), TD(NULL) {  }
+  const char *getPassName() const { return "MemorySanitizer"; }
----------------
LLVM style is to use '0' for null poniters.

================
Comment at: llvm/lib/Transforms/Instrumentation/MemorySanitizer.cpp:163
@@ +162,3 @@
+
+// Split the basic block and insert an if-then code.
+// Before:
----------------
I would appreciate using doxygen comments throughout the pass. I think it makes it easier to read them.

================
Comment at: llvm/lib/Transforms/Instrumentation/MemorySanitizer.cpp:177-178
@@ +176,4 @@
+//
+// FIXME: AddressSanitizer has a similar function.
+// What is the best place to move it?
+static BranchInst *splitBlockAndInsertIfThen(Value *Cmp,
----------------
include/llvm/Transforms/Utils/BasicBlockUtils.h

Feel free to send a separate patch that just hoists this from ASan into that utility location, we can get that checked in quickly.

================
Comment at: llvm/lib/Transforms/Instrumentation/MemorySanitizer.cpp:179-180
@@ +178,4 @@
+// What is the best place to move it?
+static BranchInst *splitBlockAndInsertIfThen(Value *Cmp,
+    MDNode *BranchWeights = 0) {
+  Instruction *SplitBefore = cast<Instruction>(Cmp)->getNextNode();
----------------
Please line up parameters after the (

================
Comment at: llvm/lib/Transforms/Instrumentation/MemorySanitizer.cpp:196
@@ +195,3 @@
+// Create a non-const global for Str so that we can pass it to the run-time lib.
+static GlobalVariable *createPrivateNonConstGlobalForString(Module &M, StringRef Str) {
+  Constant *StrConst = ConstantDataArray::getString(M.getContext(), Str);
----------------
80-columns

================
Comment at: llvm/lib/Transforms/Instrumentation/MemorySanitizer.cpp:209
@@ +208,3 @@
+  C = &(M.getContext());
+  int PtrSize = TD->getPointerSizeInBits();
+  switch (PtrSize) {
----------------
'unsigned' is the more conventional type for places like this.

================
Comment at: llvm/lib/Transforms/Instrumentation/MemorySanitizer.cpp:222
@@ +221,3 @@
+  IntptrTy = Type::getIntNTy(*C, PtrSize);
+  OriginTy = Type::getIntNTy(*C, 32);
+
----------------
IRB.getInt32Ty()?

We should probably add IRB.getIntPtrTy(TD) ....

================
Comment at: llvm/lib/Transforms/Instrumentation/MemorySanitizer.cpp:232
@@ +231,3 @@
+  new GlobalVariable(M, IRB.getInt32Ty(), true, GlobalValue::LinkOnceODRLinkage,
+                     ConstantInt::get(IRB.getInt32Ty(), ClTrackOrigins),
+                     "__msan_track_origins");
----------------
IRB.getInt32(ClTrackOrigins)

================
Comment at: llvm/lib/Transforms/Instrumentation/MemorySanitizer.cpp:236-237
@@ +235,4 @@
+  // Create the callback.
+  // FIXME: this function should have "Cold" calling conv,
+  // which is not yet implemented. Alternatively, we may use llvm.trap.
+  if (ClUseTrap) {
----------------
The alternate here doesn't make sense to me.... if we are OK trapping, we don't need any calling convention...

================
Comment at: llvm/lib/Transforms/Instrumentation/MemorySanitizer.cpp:239-242
@@ +238,6 @@
+  if (ClUseTrap) {
+    // WarningFn = Intrinsic::getDeclaration(&M, Intrinsic::trap);
+    // We use inline asm because Intrinsic::trap is treated as never return.
+    WarningFn = InlineAsm::get(FunctionType::get(Type::getVoidTy(*C), false),
+                                  StringRef("ud2"), StringRef(""), true);
+  } else {
----------------
I don't understand -- this does in fact never return... This should just use llvm.trap AFAICT.

================
Comment at: llvm/lib/Transforms/Instrumentation/MemorySanitizer.cpp:253-254
@@ +252,4 @@
+    IRB.getVoidTy(), IRB.getInt8PtrTy(), IntptrTy, NULL);
+  MemmoveFn = M.getOrInsertFunction("memmove",
+    IRB.getInt8PtrTy(), IRB.getInt8PtrTy(), IRB.getInt8PtrTy(), IntptrTy, NULL);
+  // Create globals.
----------------
Why not use the LLVM memmove intrinsic?

================
Comment at: llvm/lib/Transforms/Instrumentation/MemorySanitizer.cpp:284-285
@@ +283,4 @@
+namespace {
+// This class does all the work for a given function.
+struct MemorySanitizerVisitor : public InstVisitor<MemorySanitizerVisitor> {
+  Function &F;
----------------
Doxygen comments please.

Also, I find that making these visitors have fairly narrow public interfaces and befriending the base class so that the visit dispatch can take place makes for a nicer final interface.

See SROA.cpp for lots of examples here.

================
Comment at: llvm/lib/Transforms/Instrumentation/MemorySanitizer.cpp:318
@@ +317,3 @@
+
+  LLVM_ATTRIBUTE_NOINLINE
+  void materializeChecks() {
----------------
Huh?

================
Comment at: llvm/lib/Transforms/Instrumentation/MemorySanitizer.cpp:9
@@ +8,3 @@
+//===----------------------------------------------------------------------===//
+//
+// This file is a part of MemorySanitizer, a detector of uninitialized
----------------
Use a `/// \file` comment block for this so the high level description makes it to doxygen?

================
Comment at: llvm/lib/Transforms/Instrumentation/MemorySanitizer.cpp:46
@@ +45,3 @@
+// (__msan_retval_tls) and parameters (__msan_param_tls).
+// ===----------------------------------------------------------------------===//
+
----------------
80-columns

================
Comment at: llvm/lib/Transforms/Instrumentation/MemorySanitizer.cpp:129
@@ +128,3 @@
+
+  TargetData *TD;
+  LLVMContext *C;
----------------
DataLayout and DL now...

================
Comment at: llvm/lib/Transforms/Instrumentation/MemorySanitizer.cpp:122
@@ +121,3 @@
+/// uninitialized reads.
+struct MemorySanitizer : public FunctionPass {
+  MemorySanitizer() : FunctionPass(ID), TD(NULL) { }
----------------
I generally prefer to only make the actual public interface of the FunctionPass public on passes... Not that it *really* matters, but it makes me a bit more comfortable.

================
Comment at: llvm/lib/Transforms/Instrumentation/MemorySanitizer.cpp:133-138
@@ +132,8 @@
+  Type *OriginTy;
+  // We store the shadow for parameters and retvals in separate TLS globals.
+  GlobalVariable *ParamTLS, *ParamOriginTLS;
+  GlobalVariable *RetvalTLS, *RetvalOriginTLS;
+  GlobalVariable *VAArgTLS;
+  GlobalVariable *VAArgOverflowSizeTLS;
+  GlobalVariable *OriginTLS;
+  // The run-time callback to print a warning.
----------------
Give this block some vertical whitespace, and maybe use a Doxygen group to document it?

================
Comment at: llvm/lib/Transforms/Instrumentation/MemorySanitizer.cpp:139-144
@@ +138,8 @@
+  GlobalVariable *OriginTLS;
+  // The run-time callback to print a warning.
+  Value *WarningFn;
+  Value *MsanCopyOriginFn;
+  Value *MsanSetAllocaOriginFn;
+  Value *MsanPoisonStackFn;
+  Value *MemmoveFn;
+  // ShadowAddr is computed as ApplicationAddr & ~ShadowMask.
----------------
Same here.

================
Comment at: llvm/lib/Transforms/Instrumentation/MemorySanitizer.cpp:157-158
@@ +156,4 @@
+INITIALIZE_PASS(MemorySanitizer, "msan",
+    "MemorySanitizer: detects uninitialized reads.",
+    false, false)
+
----------------
Indent with the `(` please.

================
Comment at: llvm/lib/Transforms/Instrumentation/MemorySanitizer.cpp:164-166
@@ +163,5 @@
+
+// Create a non-const global for Str so that we can pass it to the
+// run-time lib. Runtime uses first 4 bytes of the string to keep the
+// frame ID, so the string needs to be mutable.
+static GlobalVariable *createPrivateNonConstGlobalForString(Module &M,
----------------
Doxygen comment please.

================
Comment at: llvm/lib/Transforms/Instrumentation/MemorySanitizer.cpp:175
@@ +174,3 @@
+
+bool MemorySanitizer::doInitialization(Module &M) {
+  TD = getAnalysisIfAvailable<TargetData>();
----------------
Doxygen comments for these and other interfaces on FunctionPass that you are implementing. Maintainers should know about these, especially when they do something unusual -- doInitialization of this complexity is fairly uncommon...

================
Comment at: llvm/lib/Transforms/Instrumentation/MemorySanitizer.cpp:183-190
@@ +182,10 @@
+  switch (PtrSize) {
+    case 64:
+      ShadowMask = 1ULL << 46;
+      OriginOffset = 1ULL << 45;
+      break;
+    case 32:
+      ShadowMask = 1ULL << 31;
+      OriginOffset = 1ULL << 30;
+      break;
+    default: report_fatal_error("unsupported pointer size");
----------------
I would feel a bit more comfortable if the 32-bit and 64-bit constants were named constants with their own docs above, and this just selected them.  Magical constants worry me.

================
Comment at: llvm/lib/Transforms/Instrumentation/MemorySanitizer.cpp:191
@@ +190,3 @@
+      break;
+    default: report_fatal_error("unsupported pointer size");
+  }
----------------
If you break after a `case-label:`, please handle `default-label:` consistently...

================
Comment at: llvm/lib/Transforms/Instrumentation/MemorySanitizer.cpp:208-214
@@ +207,9 @@
+  // Create the callback.
+  // FIXME: this function should have "Cold" calling conv,
+  // which is not yet implemented. Alternatively, we may use llvm.trap.
+  if (ClUseTrap) {
+    // WarningFn = Intrinsic::getDeclaration(&M, Intrinsic::trap);
+    // We use inline asm because Intrinsic::trap is treated as never return.
+    WarningFn = InlineAsm::get(FunctionType::get(Type::getVoidTy(*C), false),
+                                  StringRef("ud2"), StringRef(""), true);
+  } else {
----------------
I don't understand any of this. Why aren't we using trap? Why aren't we following the pattern of ASan? What is the goal of UseTrap?

================
Comment at: llvm/lib/Transforms/Instrumentation/MemorySanitizer.cpp:198-199
@@ +197,4 @@
+
+  // Insert a call to __msan_init/__msan_track_origins into the module's CTORs.
+  IRBuilder<> IRB(*C);
+  appendToGlobalCtors(M, cast<Function>(M.getOrInsertFunction(
----------------
If you bild this 3 statemetns earlier you can use its integer type building facilities.

================
Comment at: llvm/lib/Transforms/Instrumentation/MemorySanitizer.cpp:227
@@ +226,3 @@
+    IRB.getInt8PtrTy(), IRB.getInt8PtrTy(), IRB.getInt8PtrTy(), IntptrTy, NULL);
+  // Create globals.
+  RetvalTLS = new GlobalVariable(M, ArrayType::get(IRB.getInt64Ty(), 8),
----------------
Some vertical whitespace before this batch would help group things.

================
Comment at: llvm/lib/Transforms/Instrumentation/MemorySanitizer.cpp:225-226
@@ +224,4 @@
+    IRB.getVoidTy(), IRB.getInt8PtrTy(), IntptrTy, NULL);
+  MemmoveFn = M.getOrInsertFunction("memmove",
+    IRB.getInt8PtrTy(), IRB.getInt8PtrTy(), IRB.getInt8PtrTy(), IntptrTy, NULL);
+  // Create globals.
----------------
Shouldn't this use the LLVM intrinsic?

================
Comment at: llvm/lib/Transforms/Instrumentation/MemorySanitizer.cpp:284
@@ +283,3 @@
+  };
+  SmallVector<ShadowOriginAndInsertPoint, 16> InstrumentationSet;
+
----------------
...List instead of ...Set?

================
Comment at: llvm/lib/Transforms/Instrumentation/MemorySanitizer.cpp:286
@@ +285,3 @@
+
+  SmallVector<CallInst*, 16> VAStartInstrumentationSet;
+
----------------
ditto

================
Comment at: llvm/lib/Transforms/Instrumentation/MemorySanitizer.cpp:398
@@ +397,3 @@
+
+  Type *getShadowTy(Type *OrigTy) {
+    if (!OrigTy->isSized()) {
----------------
Doxygen comments throughout please.

================
Comment at: llvm/lib/Transforms/Instrumentation/MemorySanitizer.cpp:401
@@ +400,3 @@
+      // dbgs() << " notSized() " << *V << "\n";
+      return NULL;
+    }
----------------
s/NULL/0/ 

================
Comment at: llvm/lib/Transforms/Instrumentation/MemorySanitizer.cpp:838-846
@@ +837,11 @@
+    Value *MinusOne = Constant::getAllOnesValue(Sc->getType());
+#if 0
+    errs() << "Sc:  " << *Sc << "\n";
+    errs() << "Sa:  " << *Sa << "\n";
+    errs() << "Sb:  " << *Sb << "\n";
+    errs() << "Zero:" << *Zero << "\n";
+    errs() << "C:   " << *C << "\n";
+    errs() << "A:   " << *A << "\n";
+    errs() << "B:   " << *B << "\n";
+#endif
+    Value *Si = IRB.CreateAnd(IRB.CreateICmpNE(Sc, Zero),
----------------
No if-ed out code.

================
Comment at: llvm/lib/Transforms/Instrumentation/MemorySanitizer.cpp:979-984
@@ +978,8 @@
+      if (IntrinsicInst* II = dyn_cast<IntrinsicInst>(&I)) {
+        if (MemSetInst *MemSet = dyn_cast<MemSetInst>(&I))
+          handleMemSet(*MemSet);
+        else if (MemCpyInst *MemCpy = dyn_cast<MemCpyInst>(&I))
+          handleMemCpy(*MemCpy);
+        else if (MemMoveInst *MemMove = dyn_cast<MemMoveInst>(&I))
+          handleMemMove(*MemMove);
+        else if (II->getIntrinsicID() == llvm::Intrinsic::vastart)
----------------
We now have dedicated visitor methods for these intrinsics.

================
Comment at: llvm/lib/Transforms/Instrumentation/MemorySanitizer.cpp:1035-1042
@@ +1034,10 @@
+    DEBUG(dbgs() << "  done with call args\n");
+    // For VarArg functions, store the argument shadow in an ABI-specific format
+    // that corresponds to va_list layout.
+    // We do this because Clang lowers va_arg in the frontend, and this pass
+    // only sees the low level code that deals with va_list internals.
+    // A much easier alternative (provided that Clang emits va_arg instructions)
+    // would have been to associate each live instance of va_list with a copy of
+    // MSanParamTLS, and extract shadow on va_arg() call in the argument list
+    // order.
+    FunctionType *FT = cast<FunctionType>(CS.getCalledValue()->getType()->
----------------
I would split the entire varargs handling into a helper.


http://llvm-reviews.chandlerc.com/D56



More information about the llvm-commits mailing list