[LLVMdev] MemorySanitizer, a tool that finds uninitialized reads and more

Kostya Serebryany kcc at google.com
Mon Oct 15 23:48:04 PDT 2012


Hi again,

MemorySanitizer (msan) is now mature enough to bootstrap LLVM and run it
w/o any additional tools.
Msan has already found one bug in LLVM itself:
http://llvm.org/bugs/show_bug.cgi?id=13929

Would anyone be willing to do a code review? (It was sent to llvm-commits:
http://permalink.gmane.org/gmane.comp.compilers.llvm.cvs/123253)

Thanks,

--kcc

On Tue, Jun 19, 2012 at 11:34 AM, Kostya Serebryany <kcc at google.com> wrote:

> I've just sent a code review request to llvm-commits.
>
> --kcc
>
>
> On Mon, Jun 18, 2012 at 2:39 PM, Kostya Serebryany <kcc at google.com> wrote:
>
>> Hello llvmdev,
>>
>> I would like to propose and discuss yet another dynamic tool, which we
>> call MemorySanitizer (msan).
>> The main goal of the tool is to detect uses of uninitialized memory (the
>> major feature of Valgrind/Memcheck not covered by AddressSanitizer).
>> It will also find use-after-destruction-but-before-free in C++.
>>
>> The algorithm of the tool is similar to that of Memcheck (
>> http://static.usenix.org/event/usenix05/tech/general/full_papers/seward/seward.pdf
>> ).
>> We associate a few shadow bits with every byte of the application memory,
>> poison the shadow of the malloc-ed or alloca-ed memory,
>> load the shadow bits on every memory read,
>> propagate the shadow bits through some of the arithmetic instructions
>> (including MOV),
>> store the shadow bits on every memory write,
>> report a bug on some other instructions (e.g. JMP) if the associated
>> shadow is poisoned.
>>
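A minimal C++ sketch of the propagation scheme listed above. It is an illustrative model only, not msan's actual instrumentation: the `Shadowed` struct and every function name here are hypothetical, and the ADD rule is a deliberate simplification that ignores carries. A set shadow bit means "this value bit is uninitialized".

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model: a 32-bit value paired with its shadow.
// A shadow bit of 1 means the matching value bit is poisoned (uninitialized).
struct Shadowed {
  uint32_t value;
  uint32_t shadow;
};

// Memory fresh from malloc/alloca: every bit is poisoned.
inline Shadowed Fresh() { return {0, 0xFFFFFFFFu}; }

// A constant is fully initialized, so its shadow is clean.
inline Shadowed Const(uint32_t v) { return {v, 0}; }

// MOV-like copy: the shadow travels with the value unchanged.
inline Shadowed Move(Shadowed a) { return a; }

// Simplified ADD rule (assumption): a result bit is poisoned if the same bit
// of either operand is poisoned. Carry effects are ignored.
inline Shadowed Add(Shadowed a, Shadowed b) {
  return {a.value + b.value, a.shadow | b.shadow};
}

// AND rule: a result bit is poisoned if both inputs are poisoned, or if one
// input is poisoned and the other input's bit is 1. A defined 0 on either
// side therefore forces a defined 0 in the result.
inline Shadowed And(Shadowed a, Shadowed b) {
  uint32_t s = (a.shadow & b.shadow) | (a.shadow & b.value) |
               (b.shadow & a.value);
  return {a.value & b.value, s};
}

// Check performed at a conditional branch: report if the condition depends
// on any poisoned bit.
inline bool WouldReportAtBranch(Shadowed cond) { return cond.shadow != 0; }

// Walks through the steps; returns true if the final branch would be reported.
inline bool Demo() {
  Shadowed x = Fresh();            // e.g. a stack slot that was never written
  Shadowed y = Const(5);
  Shadowed sum = Add(Move(x), y);  // poison flows through MOV and ADD
  assert(sum.shadow == 0xFFFFFFFFu);
  Shadowed masked = And(sum, Const(0));  // AND with a defined 0 is defined
  assert(masked.shadow == 0);
  return WouldReportAtBranch(sum);
}
```

Note that nothing is reported when the poisoned value is merely copied or added; the report happens only at the point where the poisoned value would influence control flow.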
>> But there are differences too.
>>
>> The first and the major one: compiler instrumentation instead of binary
>> instrumentation.
>> This gives us much better register allocation (function-wide instead of
>> local),
>> possible compiler optimizations (static analysis can prove that some
>> accesses always read initialized memory),
>> and a fast start-up.
>> Our preliminary measurements show a 3x-4x slowdown; compare that to
>> Memcheck's 20x and DrMemory's 10x.
>> (See
>> http://groups.csail.mit.edu/commit/papers/2011/bruening-cgo11-drmemory.pdf for
>> those numbers).
>> But this brings the major issue as well: msan needs to see all program
>> events, including system calls and reads/writes in system libraries,
>> so we either need to compile *everything* with msan or use a binary
>> translation component to instrument pre-built libraries (with DynamoRIO?
>> PIN?).
>>
>> Question: is there any usable project in LLVM land which performs binary
>> instrumentation (x86->LLVM->x86), either statically or dynamically?
>>
>> Another difference from Memcheck is that we propose to use 8 shadow bits
>> per byte of application memory and use a
>> direct shadow mapping (for 64-bit Linux that is just clearing the 46th bit
>> of the application memory address).
>> This greatly simplifies the instrumentation code and avoids races on
>> shadow updates
>> (Memcheck is single-threaded so races are not a concern there.
>> Memcheck uses 2 shadow bits per byte with a slow path storage that uses 8
>> bits per byte).
>>
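The direct mapping described above can be sketched in a few lines of C++. This assumes that "46th bit" means bit index 46 counting from zero (mask `1 << 46`); the names `kShadowBit` and `MemToShadow` are hypothetical and chosen for illustration only.

```cpp
#include <cstdint>

// Assumption: the "46th bit" is bit index 46, counting from zero.
constexpr uint64_t kShadowBit = 1ULL << 46;

// Maps an application address to the address of its shadow by clearing one
// bit. With 8 shadow bits per application byte the mapping is one-to-one at
// byte granularity, so no scaling or table lookup is involved.
constexpr uint64_t MemToShadow(uint64_t app_addr) {
  return app_addr & ~kShadowBit;
}

// Example: a typical high (stack-like) address on 64-bit Linux.
static_assert(MemToShadow(0x7fff12345678ULL) == 0x3fff12345678ULL,
              "clearing bit 46 turns the leading 0x7 into 0x3");
```

Because each application byte gets its own shadow byte, two threads writing to different application bytes never update the same shadow byte, which is where the race-freedom mentioned above comes from.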
>> Suggestions? Objections?
>> Unless there is general opposition to msan, we will soon start
>> sending the code for review.
>> (We already have a somewhat messy implementation, which at the top level looks
>> very much like asan and tsan, and even shares some code with them.
>> The major difference here is that the compiler part is relatively more
>> complicated than in asan/tsan and the run-time part is very simple.)
>>
>>
>> FAQ:
>>   Q: Why can't we combine msan and asan?
>>   A: Valgrind/Memcheck and DrMemory do exactly that -- and pay large
>> performance and memory costs.
>>       An addressability checker (like asan) requires little shadow memory,
>> but needs large redzones around allocated objects.
>>       Tools that track uninitialized/tainted data need bit-per-bit shadow
>> in the worst case, but don't need redzones.
>>       So, if we merge the tools together we multiply the memory overheads.
>>       The instrumentation costs in a combined tool are mostly added to
>> each other (e.g. asan needs to poison redzones and msan needs to propagate
>> shadow through arithmetic insns).
>>
>> Thanks,
>>
>> --kcc
>>
>
>