[LLVMdev] MemorySanitizer, a tool that finds uninitialized reads and more
kcc at google.com
Mon Oct 15 23:48:04 PDT 2012
MemorySanitizer (msan) is now mature enough to bootstrap LLVM and run it
w/o any additional tools.
Msan has already found one bug in LLVM itself:
Would anyone be willing to do a code review (it was sent to llvm-commits:
On Tue, Jun 19, 2012 at 11:34 AM, Kostya Serebryany <kcc at google.com> wrote:
> I've just sent a code review request to llvm-commits.
> On Mon, Jun 18, 2012 at 2:39 PM, Kostya Serebryany <kcc at google.com> wrote:
>> Hello llvmdev,
>> I would like to propose and discuss yet another dynamic tool, which we
>> call MemorySanitizer (msan).
>> The main goal of the tool is to detect uses of uninitialized memory (the
>> major feature of Valgrind/Memcheck not covered by AddressSanitizer).
>> It will also find use-after-destruction-but-before-free in C++.
>> The algorithm of the tool is similar to that of Memcheck (
>> We associate a few shadow bits with every byte of the application memory,
>> poison the shadow of the malloc-ed or alloca-ed memory,
>> load the shadow bits on every memory read,
>> propagate the shadow bits through some of the arithmetic instructions
>> (including MOV),
>> store the shadow bits on every memory write,
>> report a bug on some other instructions (e.g. JMP) if the associated
>> shadow is poisoned.
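
The steps listed above can be illustrated with a toy model. This is only a sketch under stated assumptions, not msan's implementation: the names ToyMsan, malloc, store, load, add and branch_on are hypothetical, the model keeps one shadow byte per application byte, and it approximates shadow propagation through an addition as the bitwise OR of the operand shadows.

```python
# Toy model of shadow-bit tracking. All names here are hypothetical and
# exist only for illustration; this is not how msan is implemented.

POISONED = 0xFF  # every bit of the byte is considered uninitialized
CLEAN = 0x00     # every bit of the byte is considered initialized


class UninitializedUse(Exception):
    """Raised when a poisoned value decides a branch."""


class ToyMsan:
    def __init__(self, size):
        self.mem = bytearray(size)     # application memory
        self.shadow = bytearray(size)  # one shadow byte per application byte
        self.next_free = 0

    def malloc(self, n):
        """Bump-allocate n bytes and poison their shadow."""
        if self.next_free + n > len(self.mem):
            raise MemoryError("toy heap exhausted")
        addr = self.next_free
        self.next_free += n
        self.shadow[addr:addr + n] = bytes([POISONED]) * n
        return addr

    def store(self, addr, value, value_shadow=CLEAN):
        """Write the value and its shadow together."""
        self.mem[addr] = value & 0xFF
        self.shadow[addr] = value_shadow

    def load(self, addr):
        """Read the value together with its shadow."""
        return self.mem[addr], self.shadow[addr]

    @staticmethod
    def add(a, b):
        """Propagate shadow through an addition.

        Assumption: the result shadow is the OR of the operand shadows,
        which is a simplification (carries are ignored).
        """
        (a_val, a_shadow), (b_val, b_shadow) = a, b
        return (a_val + b_val) & 0xFF, a_shadow | b_shadow

    @staticmethod
    def branch_on(v):
        """Report a bug if the branch condition is poisoned."""
        value, shadow = v
        if shadow != CLEAN:
            raise UninitializedUse("branch depends on uninitialized data")
        return value != 0
```

In this model, copying or adding poisoned values is silent; only the branch reports, which mirrors the "report on some other instructions (e.g. JMP)" step.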
>> But there are differences too.
>> The first and the major one: compiler instrumentation instead of binary
>> instrumentation.
>> This gives us much better register allocation (function-wide instead of
>> possible compiler optimizations (static analysis can prove that some
>> accesses always read initialized memory),
>> and a fast start-up.
>> Our preliminary measurements show a 3x-4x slowdown; compare it to
>> Memcheck's 20x and DrMemory's 10x (see
>> http://groups.csail.mit.edu/commit/papers/2011/bruening-cgo11-drmemory.pdf for
>> those numbers).
>> But this brings a major issue as well: msan needs to see all program
>> events, including system calls and reads/writes in system libraries,
>> so we either need to compile *everything* with msan or use a binary
>> translation component to instrument pre-built libraries (with DynamoRIO?
>> Question: is there any usable project in LLVM land which performs binary
>> instrumentation (x86->LLVM->x86), either statically or dynamically?
>> Another difference from Memcheck is that we propose to use 8 shadow bits
>> per byte of application memory and use a
>> direct shadow mapping (for 64-bit Linux that is just clearing the 46th bit
>> of the application memory address).
>> This greatly simplifies the instrumentation code and avoids races on
>> shadow updates
>> (Memcheck is single-threaded so races are not a concern there.
>> Memcheck uses 2 shadow bits per byte with a slow path storage that uses 8
>> bits per byte).
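
The direct shadow mapping described above can be sketched in a few lines. This is a minimal illustration, assuming that "46-th bit" means bit 46 counted from zero (mask 1 << 46); the names SHADOW_BIT and shadow_address are hypothetical.

```python
# Sketch of a direct shadow mapping for 64-bit addresses.
# Assumption: "46-th bit" is read as zero-based bit 46.
SHADOW_BIT = 1 << 46


def shadow_address(app_addr):
    """Map an application address to its shadow by clearing one bit."""
    return app_addr & ~SHADOW_BIT


# Example with a typical high user-space address.
print(hex(shadow_address(0x7FFF00001234)))  # prints 0x3fff00001234
```

Because the mapping is a single AND with a constant and each application byte has its own shadow byte, there is no table lookup and no shadow byte shared between neighbouring application bytes.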
>> Suggestions? Objections?
>> Unless there is a general resentment against msan, we will soon start
>> sending the code for review.
>> (We already have a somewhat messy implementation, which at the top level
>> looks very much like asan and tsan, and even shares some code with them.
>> The major difference here is that the compiler part is relatively more
>> complicated than in asan/tsan and the run-time part is very simple.)
>> Q: Why can't we combine msan and asan?
>> A: Valgrind/Memcheck and DrMemory do exactly that -- and pay large
>> performance and memory costs.
>> An addressability checker (like asan) requires little shadow memory,
>> but needs large redzones around allocated objects.
>> Tools that track uninitialized/tainted data need bit-per-bit shadow
>> in the worst case, but don't need redzones.
>> So, if we merge the tools together we multiply the memory overheads.
>> The instrumentation costs in a combined tool are mostly added to
>> each other (e.g. asan needs to poison redzones and msan needs to propagate
>> shadow through arithmetic insns).