[llvm-commits] MemorySanitizer, first patch
Chandler Carruth
chandlerc at google.com
Fri Jul 27 02:26:08 PDT 2012
I'm going to be going over this code with a fine-tooth comb over the next
few days and getting you a thorough review, but I wanted to start off by
saying: *wow*. Those are phenomenal performance numbers. I'm very excited
about this.
On Fri, Jul 27, 2012 at 2:20 AM, Kostya Serebryany <kcc at google.com> wrote:
> Fresh numbers after several performance fixes in msan run-time.
> The average slowdown of msan is 3x (it ranges between 2x and 4x).
> When running under msan+DynamoRIO (msandr), the average slowdown is 3.6x.
>
> # benchmark      native-time(seconds)  msan-slowdown(times)  msan+dr-slowdown(times)
> 400.perlbench 379 3.71 4.75
> 401.bzip2 483 3.02 2.93
> 403.gcc 312 2.91 4.22
> 429.mcf 313 2.97 2.97
> 445.gobmk 429 3.32 6.41
> 456.hmmer 606 3.69 2.55
> 458.sjeng 491 3.98 6.39
> 462.libquantum 446 2.72 2.18
> 464.h264ref 573 3.61 5.13
> 471.omnetpp 317 2.16 2.91
> 473.astar 412 2.26 2.32
> 483.xalancbmk 241 3.11 4.99
> 433.milc 402 2.25 2.31
> 444.namd 374 2.53 2.22
> 447.dealII 311 2.82 3.39
> 450.soplex 232 2.60 2.80
> 453.povray 197 3.35 5.28
> 470.lbm 321 2.10 2.28
> 482.sphinx3 496 4.00 3.35
>
> Funny fact: sometimes the DynamoRIO tool makes things faster due to its
> "trace optimization". We could probably do the same optimization in the LLVM pass.
>
> Thanks,
> --kcc
>
>
>
> On Thu, Jul 19, 2012 at 7:42 PM, Evgeniy Stepanov <
> eugeni.stepanov at gmail.com> wrote:
>
>> We've solved the issue of external code with the help of dynamic
>> binary instrumentation. A simple (~500 LOC) DynamoRIO client zero-fills
>> shadow memory for all stores and return values in uninstrumented code,
>> thus avoiding false positives.
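The zero-fill strategy described above can be sketched with a toy shadow-memory model. All names here are illustrative, not the actual DynamoRIO client API; it only shows the idea that stores from uninstrumented code clear the corresponding shadow:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Toy model of the zero-fill strategy. One shadow byte per application
// byte: 0xFF means "uninitialized" (poisoned), 0x00 means "initialized".
constexpr size_t kMem = 64;
uint8_t app[kMem];
uint8_t shadow[kMem];

// Instrumented code poisons freshly allocated memory.
void poison(size_t off, size_t n) { memset(shadow + off, 0xFF, n); }

// The DBI client intercepts every store made by uninstrumented code and
// zero-fills the matching shadow bytes, so a later read of that memory in
// instrumented code is not reported as a use of an uninitialized value.
void uninstrumented_store(size_t off, const void *src, size_t n) {
  memcpy(app + off, src, n);
  memset(shadow + off, 0x00, n);  // mark the written range initialized
}

bool is_poisoned(size_t off) { return shadow[off] != 0; }
```

The same rule is applied to return values of uninstrumented functions: anything external code produces is treated as initialized, trading false positives for false negatives.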
>>
>> I'm attaching the new patch to LLVM with a slightly improved
>> instrumentation pass. Repository at
>> http://code.google.com/p/memory-sanitizer/source/browse/ contains the
>> runtime library and the DynamoRio client.
>>
>> In its current state, the tool can handle all C and C++ tests in SPEC
>> CPU2006, finding at least one use of uninitialized data that is also
>> detected by Memcheck (Valgrind), plus one or two false positives coming
>> from our inexact handling of certain instructions.
>>
>> See the performance numbers. "MSan time" was measured by running
>> without the dynamic tool, initializing shadow to 0x0 instead of 0xFF
>> to avoid reporting any errors. "MSan/DR time" is from an honest run of
>> an instrumented benchmark under the DynamoRio-based tool.
>>
>> # test, native time, MSan time, MSan slowdown, MSan/DR time, MSan/DR slowdown
>> 400.perlbench, 22.40, 116.00, 5.18, 138.00, 6.16
>> 401.bzip2, 41.00, 155.00, 3.78, 149.00, 3.63
>> 403.gcc, 0.79, 3.52, 4.48, 10.50, 13.38
>> 429.mcf, 13.80, 37.00, 2.68, 46.70, 3.38
>> 445.gobmk, 83.10, 328.00, 3.95, 440.00, 5.29
>> 456.hmmer, 59.70, 172.00, 2.88, 145.00, 2.43
>> 458.sjeng, 105.00, 305.00, 2.90, 391.00, 3.72
>> 462.libquantum, 1.66, 4.37, 2.63, 3.88, 2.34
>> 464.h264ref, 71.30, 267.00, 3.74, 358.00, 5.02
>> 471.omnetpp, 42.20, 246.00, 5.83, 436.00, 10.33
>> 473.astar, 82.40, 243.00, 2.95, 285.00, 3.46
>> 483.xalancbmk, 48.40, 644.00, 13.31, 1164.00, 24.05
>> 433.milc, 14.00, 37.60, 2.69, 40.80, 2.91
>> 444.namd, 10.20, 33.20, 3.25, 34.20, 3.35
>> 453.povray, 6.50, 32.20, 4.95, 48.80, 7.51
>> 470.lbm, 31.00, 107.00, 3.45, 138.00, 4.45
>> 482.sphinx3, 7.66, 23.90, 3.12, 26.80, 3.50
>>
>>
>> On Wed, Jun 27, 2012 at 10:50 PM, Kostya Serebryany <kcc at google.com>
>> wrote:
>> >
>> >
>> > On Wed, Jun 27, 2012 at 7:30 PM, John Criswell <criswell at illinois.edu>
>> > wrote:
>> >>
>> >> On 6/26/12 11:20 AM, Kostya Serebryany wrote:
>> >>
>> >> John,
>> >>
>> >> On Tue, Jun 26, 2012 at 7:45 PM, John Criswell <criswell at illinois.edu>
>> >> wrote:
>> >>>
>> >>> On 6/26/12 4:18 AM, Kostya Serebryany wrote:
>> >>>
>> >>> Updated the patch to use the new interface for TLS
>> >>> (GlobalVariable::GeneralDynamicTLSModel).
>> >>> Comments?
>> >>>
>> >>>
>> >>> First, there's the non-technical question of whether this belongs in
>> LLVM
>> >>> mainline or not in its current state. Do you have users of this tool
>> in the
>> >>> same way that you had users of ASan? What platforms have you tested
>> this
>> >>> tool on? Do you get false positives when you link with native code
>> >>> libraries, and if so, is this false positive rate low? I get the
>> impression
>> >>> that this tool isn't as mature as TSan or ASan,
>> >>
>> >>
>> >> Your impression is correct -- msan is not yet usable. It works only on
>> >> linux-x86_64, and only on small libc-free tests.
>> >> It has been heavily tested (on SPEC and other large programs), but
>> >> without actually poisoning the shadow.
>> >>
>> >>>
>> >>> so I'm wondering whether it's really ready to go in yet.
>> >>
>> >>
>> >> Having these 500 LOC in trunk will simplify further development a lot.
>> >> (we will also need a few lines of driver changes).
>> >> I can't tell whether this justifies the commit to trunk now (it
>> certainly
>> >> does for me).
>> >>
>> >>
>> >> I think it would be better to fix the external code issue first before
>> >> trying to commit to LLVM trunk; it's the only obstacle I see that
>> prevents
>> >> you from having a working tool. That way, you can have a working tool
>> and
>> >> get feedback on it; you'll also know what the ramifications are for
>> your
>> >> approach to handling external library code. As it is right now, I
>> think
>> >> this new tool is still in too early a stage to add to LLVM.
>> >
>> >
>> > Having this code in trunk will simplify and speed up the development of
>> the
>> > dynamic component(s).
>> > It may also inspire some research in the area of LLVM-based binary
>> > instrumentation.
>> > But I'll agree on what the community says.
>> >
>> >
>> >>
>> >>
>> >>
>> >>
>> >>>
>> >>>
>> >>> Second, on the technical side, I think dealing with the external code
>> >>> problem has to be addressed before the code can be committed; I
>> suspect the
>> >>> tool will return too many false positives if you don't address it (if
>> you
>> >>> have experiments indicating otherwise, please let us know). I'm
>> assuming
>> >>> from the comment at the beginning of the pass's implementation that
>> it's
>> >>> still unaddressed.
>> >>
>> >>
>> >> We are working on this, in fact experimenting in a few different
>> >> directions (DynamoRIO, PIN, static instrumentation, even full
>> recompile).
>> >>
>> >>
>> >> Okay.
>> >>
>> >>
>> >>
>> >>>
>> >>> I think instrumenting all library code is unrealistic; you'd have to
>> have
>> >>> every library on your system compiled with the sanitizer (at least
>> for large
>> >>> GUI applications).
>> >>
>> >>
>> >> It may still be doable in some restricted environments, but yes, I
>> agree
>> >> in general.
>> >>
>> >>>
>> >>> I think there's a few other approaches that you can take:
>> >>>
>> >>> 1) Assume external code works properly. To do this, you first ensure
>> >>> that memory allocated by external code is always marked as
>> initialized.
>> >>> Second, when passing memory objects from internal, instrumented code
>> to
>> >>> external library code, you mark the memory as initialized (since
>> you're
>> >>> assuming the external code will initialize it).
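A rough sketch of this option (1), under a byte-granularity shadow model and with illustrative names, would clear the shadow of the argument's block around the external call:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Byte-granularity shadow: 0xFF = uninitialized, 0x00 = initialized.
// Names and layout are illustrative only.
constexpr size_t kMem = 32;
uint8_t shadow[kMem];

void poison(size_t off, size_t n) { memset(shadow + off, 0xFF, n); }
void unpoison(size_t off, size_t n) { memset(shadow + off, 0x00, n); }

// At an instrumented -> external call boundary, assume the library
// initializes whatever it touches: clear the shadow for the argument's
// block before (and, conservatively, after) the call.
void call_external(void (*fn)(), size_t arg_off, size_t arg_size) {
  unpoison(arg_off, arg_size);
  fn();
  unpoison(arg_off, arg_size);
}
```

Note that this only covers the top-level block handed to the library; pointers stored inside that block are not followed.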
>> >>
>> >>
>> >> That doesn't work.
>> >> What if I pass my pointer to a library function?
>> >> What has to be un-poisoned (marked as initialized)?
>> >> The contents of the heap/stack/bss block to which this pointer belongs?
>> >>
>> >>
>> >> Yes, that's what I was thinking.
>> >>
>> >>
>> >> How about pointers that reside in this block? Do we need to go deeper?
>> >>
>> >>
>> >> Ah, that's a problem. You might be able to do this conservatively by
>> >> treating values that point into valid memory objects as pointers (like
>> >> conservative garbage collection), but it'd be expensive and probably
>> not
>> >> worth the trouble.
>> >
>> >
>> > It'll be prohibitively expensive, if possible at all.
>> > Even with all the hacks we put into the allocator code, checking whether
>> > a given integer is a currently addressable pointer is not free. For
>> > stack buffers that's even more expensive.
>> > Some pointers may point slightly off the buffer (e.g., to the -1'th
>> > element).
>> >
>> >>
>> >>
>> >>
>> >>
>> >>>
>> >>>
>> >>> Of course, you can do things like special-casing the C library to
>> reduce
>> >>> false negatives, and you can instruct users to compile library code
>> with the
>> >>> compiler to find more bugs.
>> >>>
>> >>> 2) Use something like SAFECode's CompleteChecks pass to determine
>> which
>> >>> checks are on memory objects handled completely within the analyzed
>> portion
>> >>> of the program and which checks can access memory handled by external
>> code.
>> >>> Checks on memory objects handled by external code always pass (or just
>> >>> generate a warning instead of an error, or something like that). This
>> >>> approach requires a good points-to analysis algorithm. SAFECode uses
>> DSA
>> >>> which is unification-based. You can either use that or, if you
>> prefer, use
>> >>> a subset based algorithm like Anderson's. I think there's one
>> >>> implementation of Anderson's for LLVM out there, but I don't know how
>> robust
>> >>> it is.
>> >>
>> >>
>> >> I'll have to look at this more, but I don't (yet) see how it can help.
>> >> Again, suppose
>> >>
>> >> struct A { int stuff; };
>> >> struct B { A *a; };
>> >> struct C { A *a; B *b; };
>> >> ...
>> >> struct Z { A *a; B *b; C *c; ... Y *y; };
>> >>
>> >> ...
>> >> Z *z = new Z;
>> >> library_function(z);
>> >>
>> >> library_function can write to any part of Z, but we can't even traverse
>> >> all of it.
>> >> Can we do anything?
>> >>
>> >>
>> >> No, you can't really say anything about the object to which z points or
>> >> the objects reachable from the pointer z. Any checks on z, z->a, etc.
>> would
>> >> have to conservatively pass.
>> >
>> >
>> >
>> >>
>> >> This approach doesn't help you check for initialization errors with
>> >> library code; it helps prevent library code from triggering false
>> positives.
>> >> It ensures that all of your checks are performed on memory objects
>> handled
>> >> by internal code.
>> >>
>> >> For memory safety, relaxing the safety guarantees for memory objects
>> >> passed to library code still results in a usable system that catches
>> memory
>> >> safety errors. I don't know if the same is true for uninitialized
>> value
>> >> detection; you'll probably have to experiment and see.
>> >
>> >
>> > That's not true for uninitialized value detection.
>> >
>> > Both asan and tsan know nothing about the code in the libraries and work
>> > just fine.
>> > They miss some percentage of bugs (based on our experience with Valgrind,
>> > our estimate is 5%-10% of bugs).
>> > Adding a dynamic component to asan/tsan will add that coverage, and we
>> > are going to do it.
>> >
>> > For msan the story is completely different.
>> > You either cover everything, including system calls, or don't go there.
>> > Any memory write that the tool does not observe, followed by a read in
>> > the instrumented code, leads to a false report.
>> >
>> >
>> >>
>> >>
>> >>
>> >>
>> >>
>> >>>
>> >>>
>> >>> 3) Use dynamic binary instrumentation to handle external code.
>> >>>
>> >>> The first option is probably the easiest to implement. The second
>> option
>> >>> requires a points-to analysis that can go into LLVM mainline. The
>> third
>> >>> option is a fair amount of work but probably easier than using
>> points-to
>> >>> analysis and will probably yield a better debugging tool.
>> >>
>> >>
>> >> We *are* working on #3 (see above).
>> >>
>> >>
>> >> I think using dynamic binary translation is probably the best option
>> out
>> >> of the three. Just take my opinion with the caveat that I haven't run
>> any
>> >> experiments on uninitialized value usage.
>> >
>> >
>> > It's actually the worst option, but the only one I know how to
>> implement.
>> > I'd really prefer a fully static approach (static binary translation, or
>> > full recompile) but I don't see any suitable technology. :(
>> >
>> >>>
>> >>>
>> >>> Third, other technical nit-picks:
>> >>>
>> >>> 1) IMHO, in general, every method and function should have a comment
>> >>> explaining what it does and the meaning of its inputs and return
>> values.
>> >>> Descriptions should use complete sentences.
>> >>
>> >>
>> >> Agree, I'll add comments.
>> >>
>> >>>
>> >>>
>> >>> 2) You may want to consider using a class derived from InstVisitor
>> >>> instead of writing large if/then/else or switch code.
>> >>
>> >>
>> >> I didn't know about InstVisitor.
>> >> However, I am not sure whether it would be simpler. I will have a
>> >> closer look.
>> >>
>> >>
>> >> InstVisitor is faster than using dyn_cast<>() and is the same as using
>> >> a switch on the instruction opcode (which your code does in some
>> >> places). It may make your code more readable.
>> >
>> >
>> > I am convinced. The new patch uses InstVisitor.
>> >
>> >>
>> >>
>> >>
>> >>
>> >>>
>> >>>
>> >>> 3) If I understand correctly, a "shadow instruction" propagates
>> >>> information about initialization status through a computation (so, for
>> >>> example, if you have a c = add a, b instruction, there's a shadow
>> >>> instruction that computes c's initialization status based on a and b's
>> >>> initialization status).
>> >>
>> >>
>> >> Correct.
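For illustration, the propagation rule for `c = add a, b` can be modeled as the bitwise OR of the operand shadows. This is a deliberate simplification (the actual pass computes more precise shadows for some instructions), and the names below are made up:

```cpp
#include <cassert>
#include <cstdint>

// A value paired with its per-bit shadow: a set shadow bit means that bit
// of the value is (possibly) uninitialized. Simplified model only.
struct SV { uint32_t val; uint32_t shadow; };

// Conservative rule for c = add a, b: any uninitialized bit in either
// operand may affect the result, so OR the operand shadows together.
SV add(SV a, SV b) { return {a.val + b.val, a.shadow | b.shadow}; }

// Constants are fully initialized, i.e., they carry a zero shadow.
SV constant(uint32_t x) { return {x, 0}; }
```

Under this rule a result is reported only if a poisoned value actually flows into an observable use, rather than on the first copy of uninitialized data.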
>> >>
>> >>>
>> >>> It looks like you add shadow instructions for all instructions and
>> then
>> >>> "fill in" the operands to the shadow instructions by iterating over
>> the
>> >>> instructions a second time. Is this correct?
>> >>
>> >>
>> >> Yes.
>> >>
>> >>
>> >> How do you handle passing taint (i.e., initialization status) between
>> >> actual parameters and formal parameters in a function call? Do you
>> store
>> >> the information to memory and then read it in the callee?
>> >
>> >
>> > Yes, the shadow is passed via a dedicated thread-local memory slot.
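That scheme could be modeled roughly as follows; the slot names and layout here are made up for illustration and the real runtime's ABI differs:

```cpp
#include <cassert>
#include <cstdint>

// Toy model of passing argument/return shadow through thread-local slots.
thread_local uint32_t param_shadow[8];  // written by caller, read by callee
thread_local uint32_t retval_shadow;    // written by callee, read by caller

uint32_t square(uint32_t x) {
  uint32_t sx = param_shadow[0];  // fetch the shadow of formal parameter x
  retval_shadow = sx;             // result's shadow depends only on x's
  return x * x;
}

uint32_t call_square(uint32_t x, uint32_t x_shadow) {
  param_shadow[0] = x_shadow;     // store the actual argument's shadow
  return square(x);
}
```

Because the slots are thread-local, no synchronization is needed, and the scheme works without changing the function's visible signature.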
>> >
>> >>
>> >>
>> >>
>> >>
>> >>>
>> >>> If so, there's an alternative approach: if you fetch the dominator
>> tree
>> >>> for the function and visit instructions in dominator tree order, then
>> you'll
>> >>> visit all definitions before all uses. That will alleviate the need
>> to make
>> >>> two passes over all the instructions. The only tricky part is phi
>> nodes,
>> >>> but for those, you add shadow instructions (phi nodes with undef
>> operands)
>> >>> to them first and then visit all the other instructions in dominator
>> tree
>> >>> order; you fill in the operands of the shadow phi nodes at the end.
>> >>>
>> >>> I'm not sure which approach is faster (I don't think LLVM's dominator
>> >>> tree pass is the most efficient implementation), but it's something
>> to think
>> >>> about.
>> >>
>> >>
>> >> I actually did think about this. I did not experiment with using the
>> >> dominator tree, but I think that the current way is at least as
>> efficient.
>> >>
>> >>
>> >> It's really a question of whether computing the dominator tree is
>> >> faster or slower than iterating over all instructions twice (or whether
>> >> the dominator tree has already been computed by some previously run
>> >> transform). I suppose the dominator tree might be better if you're not
>> >> going to instrument all instructions.
>> >
>> >
>> > Unless I am mistaken, depth-first order is enough.
>> > The new patch uses that.
>> >
>> >
>> > Thanks!
>> >
>> > --kcc
>> >
>> > _______________________________________________
>> > llvm-commits mailing list
>> > llvm-commits at cs.uiuc.edu
>> > http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>> >
>>
>
>