[llvm-commits] [PATCH] MemorySanitizer instrumentation pass.

Tue Oct 16 07:28:18 PDT 2012

  What would be the utility of the new TBAA tag?

================
Comment at: llvm/lib/Transforms/Instrumentation/MemorySanitizer.cpp:105
@@ +104,3 @@
+// likely to be zeroed.
+// As of 2012-08-28 this flag adds 20% slowdown.
+static cl::opt<bool> ClTrapOnDirtyAccess("msan-trap-on-dirty-access",
----------------
Nick Lewycky wrote:
> How slow is it if we only check these on stores but not loads? The thing about wild stores is that they can really ruin the rest of your analysis.
A very quick benchmark shows 4-5% slowdown when checking only stores, compared to 18-22% slowdown when checking both stores and loads.

Unchecked stores can potentially lead to confusing reports, but it seems unlikely, and can be verified with ASan.

================
Comment at: llvm/lib/Transforms/Instrumentation/MemorySanitizer.cpp:195
@@ +194,3 @@
+
+// Create a non-const global for Str so that we can pass it to the run-time lib.
+static GlobalVariable *createPrivateNonConstGlobalForString(Module &M, StringRef Str) {
----------------
Nick Lewycky wrote:
> Could you elaborate here? Why is the run-time library going to modify the contents of this string?
I've expanded the comment a bit.
If the string is made const, __msan_set_alloca_origin will be accepting 1 extra argument which will increase regalloc pressure (it is called on entry to most functions).

================
Comment at: llvm/lib/Transforms/Instrumentation/MemorySanitizer.cpp:208
@@ +207,3 @@
+  BL.reset(new BlackList(ClBlackListFile));
+  C = &(M.getContext());
+  int PtrSize = TD->getPointerSizeInBits();
----------------
Nick Lewycky wrote:
> It's common to just store it as LLVMContext & instead of a pointer.
It seems that it's not yet available in the MemorySanitizer constructor.

================
Comment at: llvm/lib/Transforms/Instrumentation/MemorySanitizer.cpp:240
@@ +239,3 @@
+    // WarningFn = Intrinsic::getDeclaration(&M, Intrinsic::trap);
+    // We use inline asm because Intrinsic::trap is treated as never return.
+    WarningFn = InlineAsm::get(FunctionType::get(Type::getVoidTy(*C), false),
----------------
Nick Lewycky wrote:
> True! Why is that a problem? ud2 doesn't return either, right?
It does. We can restore context in the SIGILL handle and let it run further. Essentially, we use it as "cold call".

================
Comment at: llvm/lib/Transforms/Instrumentation/MemorySanitizer.cpp:284
@@ +283,3 @@
+namespace {
+// This class does all the work for a given function.
+struct MemorySanitizerVisitor : public InstVisitor<MemorySanitizerVisitor> {
----------------
Nick Lewycky wrote:
> Suggestion: // If the instruction is accessing memory, insert the instrumentation for it.
Not exactly true. I've expanded the comment.

================
Comment at: llvm/lib/Transforms/Instrumentation/MemorySanitizer.cpp:294
@@ +293,3 @@
+
+  static const unsigned AMD64GpEndOffset = 48; // AMD64 ABI Draft 0.99.6 p3.5.7
+  static const unsigned AMD64FpEndOffset = 176;
----------------
Nick Lewycky wrote:
> I don't have my copy of the ABI spec handy, so I'm not sure what it is. I'm thinking that it'd be great if some day this pass weren't target-specific, either by removing the need for this value, or by moving it into TargetData^WDataLayout.
That will be possible when Clang emits va_arg intrinsics and stops lowering them in the frontend. There is quite a bit of code below (see visitCallInst) that uses these constants, we can not move all that outside of MSan, and constants don't make much sense by themselves.

Yes, this is ugly, but works so far.

================
Comment at: llvm/lib/Transforms/Instrumentation/MemorySanitizer.cpp:315
@@ +314,3 @@
+      dbgs() << "MemorySanitizer is not inserting checks into "
+             << F.getName() << "\n";
+  }
----------------
Nick Lewycky wrote:
> What if it doesn't have a name? If I write a test with
> 
>   define void @0() {
>     ret void
>   }
> 
> will it crash? (I realize blacklist has to know this function somehow, but assume I have my own that recognizes the function by pointer.)
getName returns an empty string when there is no name.
Added '' to make output a bit prettier. 

================
Comment at: llvm/lib/Transforms/Instrumentation/MemorySanitizer.cpp:430
@@ +429,3 @@
+    if (VectorType *VT = dyn_cast<VectorType>(OrigTy))
+      return VectorType::getInteger(VT);
+    if (StructType *ST = dyn_cast<StructType>(OrigTy)) {
----------------
Nick Lewycky wrote:
> What happens if VT is <4 x i64*>? Does this just assert inside getInteger()?
VectorType::getInteger code seems to handle this case correctly.
http://llvm.org/docs/doxygen/html/DerivedTypes_8h_source.html#l00374

================
Comment at: llvm/lib/Transforms/Instrumentation/MemorySanitizer.cpp:443
@@ +442,3 @@
+
+  LLVM_ATTRIBUTE_NOINLINE
+  Type *getShadowTyNoVec(Type *ty) {
----------------
Nick Lewycky wrote:
> What? Why? Please explain all the inlining hints. The bar for adding these is very high (as in, include a link to the llvm bug explaining how the inliner goofed up).
Right. Those are debugging leftovers. Removed.

================
Comment at: llvm/lib/Transforms/Instrumentation/MemorySanitizer.cpp:521
@@ +520,3 @@
+  void setShadow(Value *V, Value *SV) {
+    assert(ShadowMap[V] == 0);
+    ShadowMap[V] = SV;
----------------
Nick Lewycky wrote:
> Did you mean "assert(!ShadowMap.count(V));" ?
> 
> Also, asserts should include some text explaining what the condition really means. Something like
> 
>   assert(ShadowMap[V] == 0 && "Values may only have one shadow.");
Yes, fixed.

================
Comment at: llvm/lib/Transforms/Instrumentation/MemorySanitizer.cpp:532
@@ +531,3 @@
+
+  // Create a clean (zero) shadow value for a given value.
+  Value *getCleanShadow(Value *V) {
----------------
Nick Lewycky wrote:
> It's not clear to me whether this is the shadow value that means that all accesses are okay, or all accesses are invalid.
It means that this value does not carry any undefined-ness with it. Expanded the comment.

================
Comment at: llvm/lib/Transforms/Instrumentation/MemorySanitizer.cpp:548
@@ +547,3 @@
+    for (unsigned i = 0, n = ST->getNumElements(); i < n; i++)
+      Vals.push_back(getPoisonedShadow(ST->getElementType(i)));
+    return ConstantStruct::get(ST, Vals);
----------------
Nick Lewycky wrote:
> This function assumes that ShadowTy will be an Integer, Vector or Struct. However, the contents of a struct could also be a Half, Float, Double, X86_FP80, FP128, PPC_FP128, Array or Pointer. (Can't be void, metadata, label or function.)
All primitive subtypes in shadow types are integers, see getShadowTy().

================
Comment at: llvm/lib/Transforms/Instrumentation/MemorySanitizer.cpp:919
@@ +918,3 @@
+
+    IRB.CreateMemSet(ShadowPtr, ShadowVal, Size, Align, isVolatile);
+  }
----------------
Nick Lewycky wrote:
> Why would the memset to the shadow have the same alignment and volatility as the original memset?
Alignment - because shadow address differs from the app address by a very large and very much aligned constant. Volatility - there is a general feeling that shadow access should be kept as close as possible to the original access and not be the subject of certain optimizations that are inapplicable to the latter.

http://llvm-reviews.chandlerc.com/D56