[LLVMdev] MemorySanitizer, a tool that finds uninitialized reads and more

Kostya Serebryany kcc at google.com
Tue Jun 19 00:34:35 PDT 2012


I've just sent a code review request to llvm-commits.

--kcc

On Mon, Jun 18, 2012 at 2:39 PM, Kostya Serebryany <kcc at google.com> wrote:

> Hello llvmdev,
>
> I would like to propose and discuss yet another dynamic tool, which we
> call MemorySanitizer (msan).
> The main goal of the tool is to detect uses of uninitialized memory (the
> major feature of Valgrind/Memcheck not covered by AddressSanitizer).
> It will also find use-after-destruction-but-before-free in C++.
>
> The algorithm of the tool is similar to that of Memcheck (
> http://static.usenix.org/event/usenix05/tech/general/full_papers/seward/seward.pdf
> ).
> We associate a few shadow bits with every byte of the application memory,
> poison the shadow of the malloc-ed or alloca-ed memory,
> load the shadow bits on every memory read,
> propagate the shadow bits through some of the arithmetic instructions
> (including MOV),
> store the shadow bits on every memory write,
> report a bug on some other instructions (e.g. JMP) if the associated
> shadow is poisoned.
>
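
The steps listed above can be illustrated with a toy model. This is only a
sketch under stated assumptions, not the msan implementation: the names
ToyMemory, Tracked, load, store, add and branch_if_nonzero are hypothetical,
the "application memory" is a small vector, one shadow byte is kept per
application byte, and the rule used for addition (OR of the operand shadows)
is a simplification chosen for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Toy model (hypothetical names): one shadow byte per application byte.
// A set shadow bit means the corresponding application bit is uninitialized.
struct ToyMemory {
    std::vector<std::uint8_t> app;     // simulated application memory
    std::vector<std::uint8_t> shadow;  // shadow for each application byte
    std::size_t next = 0;              // bump-allocator cursor
    int reports = 0;                   // number of reported bugs

    explicit ToyMemory(std::size_t size) : app(size, 0), shadow(size, 0) {}

    // Stand-in for malloc/alloca: hand out n bytes and poison their shadow.
    std::size_t allocate(std::size_t n) {
        if (n > app.size() - next) throw std::out_of_range("toy memory exhausted");
        std::size_t addr = next;
        next += n;
        for (std::size_t i = 0; i < n; ++i) shadow[addr + i] = 0xFF;
        return addr;
    }
};

// A one-byte value held in a "register", together with its shadow.
struct Tracked {
    std::uint8_t value;
    std::uint8_t shadow;
};

// Constants are always fully initialized.
Tracked constant(std::uint8_t v) { return Tracked{v, 0}; }

// Memory read: load the value and its shadow.
Tracked load(const ToyMemory& m, std::size_t addr) {
    return Tracked{m.app.at(addr), m.shadow.at(addr)};
}

// Memory write: store the value and its shadow.
void store(ToyMemory& m, std::size_t addr, Tracked t) {
    m.app.at(addr) = t.value;
    m.shadow.at(addr) = t.shadow;
}

// Arithmetic: propagate shadow. OR-ing the operand shadows is an
// approximation used here for illustration only.
Tracked add(Tracked a, Tracked b) {
    return Tracked{static_cast<std::uint8_t>(a.value + b.value),
                   static_cast<std::uint8_t>(a.shadow | b.shadow)};
}

// Conditional branch: report if the condition depends on poisoned bits.
bool branch_if_nonzero(ToyMemory& m, Tracked cond) {
    if (cond.shadow != 0) ++m.reports;
    return cond.value != 0;
}
```

In this model, loading from freshly allocated memory and branching on the
result bumps the report counter; storing a constant first clears the shadow,
so the same branch is then silent.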
> But there are differences too.
>
> The first and the major one: compiler instrumentation instead of binary
> instrumentation.
> This gives us much better register allocation (function-wide instead of
> local),
> possible compiler optimizations (static analysis can prove that some
> accesses always read initialized memory),
> and a fast start-up.
> Our preliminary measurements show a 3x-4x slowdown; compare that to
> Memcheck's 20x and DrMemory's 10x.
> (See
> http://groups.csail.mit.edu/commit/papers/2011/bruening-cgo11-drmemory.pdf for
> those numbers).
> But this brings a major issue as well: msan needs to see all program
> events, including system calls and reads/writes in system libraries,
> so we either need to compile *everything* with msan or use a binary
> translation component to instrument pre-built libraries (with DynamoRIO?
> PIN?).
>
> Question: is there any usable project in LLVM land which performs binary
> instrumentation (x86->LLVM->x86), either statically or dynamically?
>
> Another difference from Memcheck is that we propose to use 8 shadow bits
> per byte of application memory and use a
> direct shadow mapping (for 64-bit Linux that is just clearing the 46th bit of
> the application memory address).
> This greatly simplifies the instrumentation code and avoids races on
> shadow updates
> (Memcheck is single-threaded so races are not a concern there.
> Memcheck uses 2 shadow bits per byte with a slow path storage that uses 8
> bits per byte).
>
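
The direct mapping described above can be written as a single mask operation.
This is a minimal sketch, assuming that "46-th bit" means the zero-based bit
46 (mask 0x400000000000); the names kShadowBit and app_to_shadow are
hypothetical and are not taken from the msan sources.

```cpp
#include <cstdint>

// Assumption: the "46th bit" is zero-based bit 46, i.e. 0x400000000000.
constexpr std::uint64_t kShadowBit = 1ULL << 46;

// Map an application address to the address of its shadow byte by
// clearing that single bit. With 8 shadow bits per application byte,
// each application byte has its own whole shadow byte.
constexpr std::uint64_t app_to_shadow(std::uint64_t app_addr) {
    return app_addr & ~kShadowBit;
}
```

Because neighbouring application bytes map to neighbouring, distinct shadow
bytes, a shadow update in this scheme is a plain store and not a
read-modify-write of bits packed into a shared byte, which is consistent with
the point about avoiding races on shadow updates.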
> Suggestions? Objections?
> Unless there is a general objection to msan, we will soon start
> sending the code for review.
> (We already have a somewhat messy implementation, which at the top level
> looks very much like asan and tsan, and even shares some code with them.
> The major difference here is that the compiler part is relatively more
> complicated than in asan/tsan and the run-time part is very simple.)
>
>
> FAQ:
>   Q: Why can't we combine msan and asan?
>   A: Valgrind/Memcheck and DrMemory do exactly that -- and pay large
> performance and memory costs.
>       An addressability checker (like asan) requires little shadow memory,
> but needs large redzones around allocated objects.
>       Tools that track uninitialized/tainted data need bit-per-bit shadow
> in the worst case, but don't need redzones.
>       So, if we merge the tools together, we multiply the memory overheads.
>       The instrumentation costs in a combined tool are mostly added to
> each other (e.g. asan needs to poison redzones and msan needs to propagate
> shadow through arithmetic insns).
>
> Thanks,
>
> --kcc
>