[LLVMdev] MemorySanitizer, a tool that finds uninitialized reads and more

Kostya Serebryany kcc at google.com
Mon Jun 18 03:39:34 PDT 2012


Hello llvmdev,

I would like to propose and discuss yet another dynamic tool, which we call
MemorySanitizer (msan).
The main goal of the tool is to detect uses of uninitialized memory (the
major feature of Valgrind/Memcheck not covered by AddressSanitizer).
It will also find use-after-destruction-but-before-free in C++.

The algorithm of the tool is similar to that of Memcheck (
http://static.usenix.org/event/usenix05/tech/general/full_papers/seward/seward.pdf
).
We associate a few shadow bits with every byte of the application memory,
poison the shadow of the malloc-ed or alloca-ed memory,
load the shadow bits on every memory read,
propagate the shadow bits through some of the arithmetic instructions
(including MOV),
store the shadow bits on every memory write,
report a bug on some other instructions (e.g. JMP) if the associated shadow
is poisoned.
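
To make the steps above concrete, here is a toy sketch of the scheme. It is
not the msan implementation: it models memory as two arrays, uses one shadow
byte per application byte, and approximates propagation through addition by
OR-ing the operand shadows. All names (ToyMemory, Shadowed, toy_load,
toy_store, toy_add, branch_would_report) are hypothetical and exist only for
illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy model: one shadow byte per application byte.
// Shadow 0x00 means "initialized", any non-zero bit means "poisoned".
struct ToyMemory {
    std::vector<uint8_t> app;
    std::vector<uint8_t> shadow;

    explicit ToyMemory(std::size_t n) : app(n, 0), shadow(n, 0) {}

    // Models malloc/alloca: the contents are undefined, so poison the shadow.
    void poison(std::size_t off, std::size_t len) {
        for (std::size_t i = 0; i < len; ++i) shadow[off + i] = 0xFF;
    }
};

// A value together with its shadow, as it would travel through registers.
struct Shadowed {
    uint8_t value;
    uint8_t shadow;
};

// Every memory read also loads the shadow.
Shadowed toy_load(const ToyMemory& m, std::size_t off) {
    return Shadowed{m.app[off], m.shadow[off]};
}

// Every memory write also stores the shadow.
void toy_store(ToyMemory& m, std::size_t off, Shadowed v) {
    m.app[off] = v.value;
    m.shadow[off] = v.shadow;
}

// Conservative approximation (an assumption of this sketch): the result is
// poisoned wherever either operand is poisoned. Carries are ignored.
Shadowed toy_add(Shadowed a, Shadowed b) {
    return Shadowed{static_cast<uint8_t>(a.value + b.value),
                    static_cast<uint8_t>(a.shadow | b.shadow)};
}

// A conditional branch is where the tool would report a bug.
bool branch_would_report(Shadowed cond) {
    return cond.shadow != 0;
}
```

In this model, copying or adding a poisoned value is silent; only branching
on it triggers a report, which mirrors the "propagate, then check at a few
instructions" structure described above.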

But there are differences too.

The first and the major one: compiler instrumentation instead of binary
instrumentation.
This gives us much better register allocation (function-wide instead of
local),
possible compiler optimizations (static analysis can prove that some
accesses always read initialized memory),
and a fast start-up.
Our preliminary measurements show a 3x-4x slowdown; compare that to Memcheck's
20x and DrMemory's 10x.
(See
http://groups.csail.mit.edu/commit/papers/2011/bruening-cgo11-drmemory.pdf for
those numbers).
But this brings a major issue as well: msan needs to see all program
events, including system calls and reads/writes in system libraries,
so we either need to compile *everything* with msan or use a binary
translation component to instrument pre-built libraries (with DynamoRIO?
PIN?).

Question: is there any usable project in LLVM land which performs binary
instrumentation (x86->LLVM->x86), either statically or dynamically?

Another difference from Memcheck is that we propose to use 8 shadow bits
per byte of application memory and use a
direct shadow mapping (for 64-bit Linux that is just clearing the 46th bit of
the application memory address).
This greatly simplifies the instrumentation code and avoids races on shadow
updates
(Memcheck is single-threaded so races are not a concern there.
Memcheck uses 2 shadow bits per byte with a slow path storage that uses 8
bits per byte).
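
As a rough sketch of that direct mapping, the address computation could look
like the following. This assumes "46-th bit" means the bit with zero-based
index 46 and that application memory lives at addresses with that bit set;
the names kShadowBit and app_to_shadow are hypothetical.

```cpp
#include <cstdint>

// Assumption: the "46-th bit" is the bit with zero-based index 46.
constexpr uint64_t kShadowBit = 1ULL << 46;

// Map an application address to the address of its shadow byte.
// With 8 shadow bits per application byte the mapping is one-to-one,
// so adjacent application bytes have adjacent shadow bytes.
constexpr uint64_t app_to_shadow(uint64_t app_addr) {
    return app_addr & ~kShadowBit;
}
```

For example, under these assumptions the application address
0x7fff00001000 maps to the shadow address 0x3fff00001000. Since each
application byte has its own shadow byte, a one-byte store touches exactly
one shadow byte and needs no read-modify-write of bits shared with
neighbouring bytes.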

Suggestions? Objections?
Unless there is a general resentment against msan, we will soon start
sending the code for review.
(We already have a somewhat messy implementation, which at the top level looks
very much like asan and tsan, and even shares some code with them.
The major difference here is that the compiler part is relatively more
complicated than in asan/tsan and the run-time part is very simple.)


FAQ:
  Q: Why can't we combine msan and asan?
  A: Valgrind/Memcheck and DrMemory do exactly that -- and pay large
performance and memory costs.
      An addressability checker (like asan) requires little shadow memory, but
needs large redzones around allocated objects.
      Tools that track uninitialized/tainted data need bit-per-bit shadow
in the worst case, but don't need redzones.
      So, if we merge the tools together we multiply the memory overheads.
      The instrumentation costs in a combined tool are mostly added to each
other (e.g. asan needs to poison redzones and msan needs to propagate
shadow through arithmetic insns).

Thanks,

--kcc