[llvm-commits] MemorySanitizer, first patch

Kostya Serebryany kcc at google.com
Fri Jul 27 02:20:45 PDT 2012


Fresh numbers after several performance fixes in msan run-time.
The average slowdown of msan is 3x (ranging between 2x and 4x).
When running under msan+DynamoRIO (msandr) the average slowdown is 3.6x.

# benchmark     native-time(seconds)  msan-slowdown(times)  msan+dr-slowdown(times)
400.perlbench   379     3.71    4.75
401.bzip2       483     3.02    2.93
403.gcc         312     2.91    4.22
429.mcf         313     2.97    2.97
445.gobmk       429     3.32    6.41
456.hmmer       606     3.69    2.55
458.sjeng       491     3.98    6.39
462.libquantum  446     2.72    2.18
464.h264ref     573     3.61    5.13
471.omnetpp     317     2.16    2.91
473.astar       412     2.26    2.32
483.xalancbmk   241     3.11    4.99
433.milc        402     2.25    2.31
444.namd        374     2.53    2.22
447.dealII      311     2.82    3.39
450.soplex      232     2.60    2.80
453.povray      197     3.35    5.28
470.lbm         321     2.10    2.28
482.sphinx3     496     4.00    3.35

Funny fact: sometimes the DynamoRIO tool makes things faster than plain msan
due to its "trace optimization". We could probably do the same optimization in
the LLVM pass.

Thanks,
--kcc



On Thu, Jul 19, 2012 at 7:42 PM, Evgeniy Stepanov <eugeni.stepanov at gmail.com> wrote:

> We've solved the issue of external code with the help of dynamic
> binary instrumentation. A simple (~500 LOC) DynamoRIO client zero-fills
> shadow memory for all stores and return values in uninstrumented code,
> thus avoiding false positives.
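>
> A minimal skeleton of such a client looks roughly like this (illustrative
> only; the real msandr client also restricts this to uninstrumented code,
> handles return values, and actually computes the written address range):
>
> #include "dr_api.h"
>
> static dr_emit_flags_t
> event_basic_block(void *drcontext, void *tag, instrlist_t *bb,
>                   bool for_trace, bool translating)
> {
>     instr_t *ins;
>     for (ins = instrlist_first(bb); ins != NULL; ins = instr_get_next(ins)) {
>         if (instr_writes_memory(ins)) {
>             /* Insert code before `ins` that computes the effective address
>              * and size of the store and writes 0 into the corresponding
>              * shadow bytes, so an unobserved store never leaves poisoned
>              * shadow behind. */
>         }
>     }
>     return DR_EMIT_DEFAULT;
> }
>
> DR_EXPORT void dr_init(client_id_t id)
> {
>     dr_register_bb_event(event_basic_block);
> }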
>
> I'm attaching the new patch to LLVM with a slightly improved
> instrumentation pass. Repository at
> http://code.google.com/p/memory-sanitizer/source/browse/ contains the
> runtime library and the DynamoRIO client.
>
> In its current state, the tool can handle all C and C++ tests in SPEC
> CPU2006, finding at least one use of uninitialized data that's also
> detected by Memcheck (Valgrind), and one or two false positives coming
> from our inexact handling of certain instructions.
>
> See the performance numbers. "MSan time" was measured by running
> without the dynamic tool, initializing shadow to 0x0 instead of 0xFF
> to avoid reporting any errors. "MSan/DR time" is from an honest run of
> an instrumented benchmark under the DynamoRIO-based tool.
>
> #            test   native time   MSan time   MSan slowdown   MSan/DR time   MSan/DR slowdown
>   400.perlbench           22.40      116.00            5.18         138.00               6.16
>       401.bzip2           41.00      155.00            3.78         149.00               3.63
>         403.gcc            0.79        3.52            4.48          10.50              13.38
>         429.mcf           13.80       37.00            2.68          46.70               3.38
>       445.gobmk           83.10      328.00            3.95         440.00               5.29
>       456.hmmer           59.70      172.00            2.88         145.00               2.43
>       458.sjeng          105.00      305.00            2.90         391.00               3.72
>  462.libquantum            1.66        4.37            2.63           3.88               2.34
>     464.h264ref           71.30      267.00            3.74         358.00               5.02
>     471.omnetpp           42.20      246.00            5.83         436.00              10.33
>       473.astar           82.40      243.00            2.95         285.00               3.46
>   483.xalancbmk           48.40      644.00           13.31        1164.00              24.05
>        433.milc           14.00       37.60            2.69          40.80               2.91
>        444.namd           10.20       33.20            3.25          34.20               3.35
>      453.povray            6.50       32.20            4.95          48.80               7.51
>         470.lbm           31.00      107.00            3.45         138.00               4.45
>     482.sphinx3            7.66       23.90            3.12          26.80               3.50
>
>
> On Wed, Jun 27, 2012 at 10:50 PM, Kostya Serebryany <kcc at google.com>
> wrote:
> >
> >
> > On Wed, Jun 27, 2012 at 7:30 PM, John Criswell <criswell at illinois.edu>
> > wrote:
> >>
> >> On 6/26/12 11:20 AM, Kostya Serebryany wrote:
> >>
> >> John,
> >>
> >> On Tue, Jun 26, 2012 at 7:45 PM, John Criswell <criswell at illinois.edu>
> >> wrote:
> >>>
> >>> On 6/26/12 4:18 AM, Kostya Serebryany wrote:
> >>>
> >>> Updated the patch to use the new interface for TLS
> >>> (GlobalVariable::GeneralDynamicTLSModel).
> >>> Comments?
> >>>
> >>>
> >>> First, there's the non-technical question of whether this belongs in
> LLVM
> >>> mainline or not in its current state.  Do you have users of this tool
> in the
> >>> same way that you had users of ASan?  What platforms have you tested
> this
> >>> tool on?  Do you get false positives when you link with native code
> >>> libraries, and if so, is this false positive rate low?  I get the
> impression
> >>> that this tool isn't as mature as TSan or ASan,
> >>
> >>
> >> Your impression is correct -- msan is not yet usable. It works only on
> >> linux-x86_64, only on small libc-free tests.
> >> It has been heavily tested (on SPEC and other large programs), but w/o
> >> actually poisoning the shadow.
> >>
> >>>
> >>> so I'm wondering whether it's really ready to go in yet.
> >>
> >>
> >> Having these 500 LOC in trunk will simplify further development a lot.
> >> (we will also need a few lines of driver changes).
> >> I can't tell whether this justifies the commit to trunk now (it
> certainly
> >> does for me).
> >>
> >>
> >> I think it would be better to fix the external code issue first before
> >> trying to commit to LLVM trunk; it's the only obstacle I see that
> prevents
> >> you from having a working tool.  That way, you can have a working tool
> and
> >> get feedback on it; you'll also know what the ramifications are for your
> >> approach to handling external library code.  As it is right now, I think
> >> this new tool is still in too early a stage to add to LLVM.
> >
> >
> > Having this code in trunk will simplify and speedup the development of
> the
> > dynamic component(s).
> > It may also inspire some research in the area of LLVM-based binary
> > instrumentation.
> > But I'll agree on what the community says.
> >
> >
> >>
> >>
> >>
> >>
> >>>
> >>>
> >>> Second, on the technical side, I think dealing with the external code
> >>> problem has to be addressed before the code can be committed; I
> suspect the
> >>> tool will return too many false positives if you don't address it (if
> you
> >>> have experiments indicating otherwise, please let us know).  I'm
> assuming
> >>> from the comment at the beginning of the pass's implementation that
> it's
> >>> still unaddressed.
> >>
> >>
> >> We are working on this, in fact experimenting in a few different
> >> directions (DynamoRIO, PIN, static instrumentation, even full
> recompile).
> >>
> >>
> >> Okay.
> >>
> >>
> >>
> >>>
> >>> I think instrumenting all library code is unrealistic; you'd have to
> have
> >>> every library on your system compiled with the sanitizer (at least for
> large
> >>> GUI applications).
> >>
> >>
> >> It may still be doable in some restricted environments, but yes, I agree
> >> in general.
> >>
> >>>
> >>> I think there's a few other approaches that you can take:
> >>>
> >>> 1) Assume external code works properly.  To do this, you first ensure
> >>> that memory allocated by external code is always marked as initialized.
> >>> Second, when passing memory objects from internal, instrumented code to
> >>> external library code, you mark the memory as initialized (since you're
> >>> assuming the external code will initialize it).
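> >>
> >> For concreteness, at an instrumented call site option 1 would look roughly
> >> like this (a sketch, assuming an unpoison entry point in the runtime along
> >> the lines of __msan_unpoison):
> >>
> >> #include <stddef.h>
> >>
> >> // Assumed runtime interface: mark [a, a+size) as initialized.
> >> void __msan_unpoison(const volatile void *a, size_t size);
> >>
> >> struct Packet { int len; char data[64]; };
> >> void library_function(struct Packet *p);   // uninstrumented
> >>
> >> void send_packet(struct Packet *p) {
> >>   // Mark the object as initialized before the library sees it, so that
> >>   // nothing the library does to it can later trigger a report.
> >>   __msan_unpoison(p, sizeof *p);
> >>   library_function(p);
> >> }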
> >>
> >>
> >> That doesn't work.
> >> What if I pass my pointer to a library function.
> >> What has to be un-poisoned (marked as inited)?
> >> The contents of the heap/stack/bss block to which this pointer belongs?
> >>
> >>
> >> Yes, that's what I was thinking.
> >>
> >>
> >> How about pointers that reside in this block? Do we need to go deeper?
> >>
> >>
> >> Ah, that's a problem.  You might be able to do this conservatively by
> >> treating values that point into valid memory objects as pointers (like
> >> conservative garbage collection), but it'd be expensive and probably not
> >> worth the trouble.
> >
> >
> > It'll be prohibitively expensive, if at all possible.
> > Even with all the hacks we put into the allocator code, checking whether
> a
> > given integer is a currently addressable pointer is not free. For stack
> > buffers that's even more expensive.
> > Some pointers may point slightly off the buffer (e.g. to -1'th element).
> >
> >>
> >>
> >>
> >>
> >>>
> >>>
> >>> Of course, you can do things like special casing the C library to
> reduce
> >>> false negatives, and you can instruct users to compile library code
> with the
> >>> compiler to find more bugs.
> >>>
> >>> 2) Use something like SAFECode's CompleteChecks pass to determine which
> >>> checks are on memory objects handled completely within the analyzed
> portion
> >>> of the program and which checks can access memory handled by external
> code.
> >>> Checks on memory objects handled by external code always pass (or just
> >>> generate a warning instead of an error, or something like that).  This
> >>> approach requires a good points-to analysis algorithm.  SAFECode uses
> DSA
> >>> which is unification-based.  You can either use that or, if you
> prefer, use
> >>> a subset based algorithm like Anderson's.  I think there's one
> >>> implementation of Anderson's for LLVM out there, but I don't know how
> robust
> >>> it is.
> >>
> >>
> >> I'll have to look at this more, but I don't (yet) see how it can help.
> >> Again, suppose
> >>
> >> struct A { int stuff; };
> >> struct B { A *a; };
> >> struct C { A *a; B *b; };
> >> ...
> >> struct Z { A *a; B *b; C *c; ... Y *y; };
> >>
> >> ...
> >> Z *z = new Z;
> >> library_function(z);
> >>
> >> library_function can write to any part of Z, but we can't even traverse
> >> all of it.
> >> Can we do anything?
> >>
> >>
> >> No, you can't really say anything about the object to which Z points or
> >> the objects reachable from the pointer Z.  Any checks on z, z->a, etc.
> would
> >> have to conservatively pass.
> >
> >
> >
> >>
> >> This approach doesn't help you check for initialization errors with
> >> library code; it helps prevent library code from triggering false
> positives.
> >> It ensures that all of your checks are performed on memory objects
> handled
> >> by internal code.
> >>
> >> For memory safety, relaxing the safety guarantees for memory objects
> >> passed to library code still results in a usable system that catches
> memory
> >> safety errors.  I don't know if the same is true for uninitialized value
> >> detection; you'll probably have to experiment and see.
> >
> >
> > That's not true for uninitialized value detection.
> >
> > Both asan and tsan know nothing about the code in the libraries and work
> > just fine.
> > They miss some percentage of bugs (based on our experience with Valgrind,
> > we estimate 5%-10%).
> > Adding a dynamic component to asan/tsan will add that coverage, and we
> are
> > going to do it.
> >
> > For msan the story is completely different.
> > You either cover everything including system calls or don't go there.
> > Any memory write which the tool does not observe, followed by a read in
> > the instrumented code, leads to a false report.
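> >
> > For example (sketch; assume only app.cc is built with the msan pass):
> >
> >   // lib.cc -- not instrumented
> >   void lib_fill(int *p) { *p = 42; }   // this store is invisible to msan
> >
> >   // app.cc -- instrumented
> >   void lib_fill(int *p);
> >   int main() {
> >     int x;             // shadow of x: poisoned (uninitialized)
> >     lib_fill(&x);      // shadow stays poisoned: the write was not observed
> >     return x == 42;    // msan reports a use of an uninitialized value here,
> >   }                    // even though lib_fill did initialize x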
> >
> >
> >>
> >>
> >>
> >>
> >>
> >>>
> >>>
> >>> 3) Use dynamic binary instrumentation to handle external code.
> >>>
> >>> The first option is probably the easiest to implement.  The second
> option
> >>> requires a points-to analysis that can go into LLVM mainline.  The
> third
> >>> option is a fair amount of work but probably easier than using
> points-to
> >>> analysis and will probably yield a better debugging tool.
> >>
> >>
> >> We *are* working on #3 (see above).
> >>
> >>
> >> I think using dynamic binary translation is probably the best option out
> >> of the three.  Just take my opinion with the caveat that I haven't run
> any
> >> experiments on uninitialized value usage.
> >
> >
> > It's actually the worst option, but the only one I know how to implement.
> > I'd really prefer a fully static approach (static binary translation, or
> > full recompile) but I don't see any suitable technology. :(
> >
> >>>
> >>>
> >>> Third, other technical nit-picks:
> >>>
> >>> 1) IMHO, in general, every method and function should have a comment
> >>> explaining what it does and the meaning of its inputs and return
> values.
> >>> Descriptions should use complete sentences.
> >>
> >>
> >> Agree, I'll add comments.
> >>
> >>>
> >>>
> >>> 2) You may want to consider using a class derived from InstVisitor
> >>> instead of writing large if/then/else or switch code.
> >>
> >>
> >> I didn't know about InstVisitor.
> >> However, I am not sure if that would be simpler. Will have a closer
> look.
> >>
> >>
> >> InstVisitor is faster than using dyn_cast<>() and is the same as using a
> >> switch on the instruction opcode (which your code does in some places).
> >> It may make your code more readable.
> >
> >
> > I am convinced. The new patch uses InstVisitor.
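> >
> > The visitor is roughly shaped like this (a sketch, not the patch itself):
> >
> > #include "llvm/Support/InstVisitor.h"  // llvm/IR/InstVisitor.h in new trees
> > using namespace llvm;
> >
> > struct MsanVisitor : public InstVisitor<MsanVisitor> {
> >   void visitLoadInst(LoadInst &I)   { /* load the shadow of *ptr */ }
> >   void visitStoreInst(StoreInst &I) { /* store the shadow of the value */ }
> >   void visitBinaryOperator(BinaryOperator &I) { /* combine operand shadows */ }
> >   // Anything without an explicit overload falls back here, so unhandled
> >   // instructions can be treated conservatively in one place.
> >   void visitInstruction(Instruction &I) {}
> > };
> > // Usage: MsanVisitor V; V.visit(F);   // F is the llvm::Function&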
> >
> >>
> >>
> >>
> >>
> >>>
> >>>
> >>> 3) If I understand correctly, a "shadow instruction" propagates
> >>> information about initialization status through a computation (so, for
> >>> example, if you have a c = add a, b instruction, there's a shadow
> >>> instruction that computes c's initialization status based on a and b's
> >>> initialization status).
> >>
> >>
> >> Correct.
> >>
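> >> E.g. for c = add a, b the propagation is roughly (a sketch; getShadow and
> >> setShadow stand for the pass's bookkeeping helpers):
> >>
> >> #include "llvm/IRBuilder.h"   // llvm/IR/IRBuilder.h in newer trees
> >> using namespace llvm;
> >>
> >> Value *getShadow(Value *V);                // bookkeeping helper (assumed)
> >> void setShadow(Instruction *I, Value *S);  // bookkeeping helper (assumed)
> >>
> >> void visitAdd(BinaryOperator &I) {
> >>   IRBuilder<> IRB(&I);
> >>   Value *Sa = getShadow(I.getOperand(0));  // shadow of a
> >>   Value *Sb = getShadow(I.getOperand(1));  // shadow of b
> >>   // c is fully initialized only if both inputs are, so OR the shadows
> >>   // (a conservative approximation for add).
> >>   setShadow(&I, IRB.CreateOr(Sa, Sb));
> >> }
> >>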
> >>>
> >>> It looks like you add shadow instructions for all instructions and then
> >>> "fill in" the operands to the shadow instructions by iterating over the
> >>> instructions a second time.  Is this correct?
> >>
> >>
> >> Yes.
> >>
> >>
> >> How do you handle passing taint (i.e., initialization status) between
> >> actual parameters and formal parameters in a function call?  Do you
> store
> >> the information to memory and then read it in the callee?
> >
> >
> > Yes, the shadow is passed via a dedicated thread-local memory slot.
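> >
> > Conceptually it boils down to this (illustrative names and layout, not the
> > runtime's exact ones):
> >
> > // One shadow slot per argument word, living in thread-local storage.
> > static __thread unsigned long __param_shadow_tls[8];
> >
> > unsigned long shadow_of_y;   // shadow the caller has computed for y
> >
> > int callee(int x) {
> >   // prologue emitted by the pass: pick up the shadow of the formal x
> >   unsigned long sx = __param_shadow_tls[0];
> >   (void)sx;  /* ...the pass would propagate sx through the body... */
> >   return x;
> > }
> >
> > int caller(int y) {
> >   // emitted before the call: publish the shadow of the actual argument
> >   __param_shadow_tls[0] = shadow_of_y;
> >   return callee(y);
> > }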
> >
> >>
> >>
> >>
> >>
> >>>
> >>> If so, there's an alternative approach: if you fetch the dominator tree
> >>> for the function and visit instructions in dominator tree order, then
> you'll
> >>> visit all definitions before all uses.  That will alleviate the need
> to make
> >>> two passes over all the instructions.  The only tricky part is phi
> nodes,
> >>> but for those, you add shadow instructions (phi nodes with undef
> operands)
> >>> to them first and then visit all the other instructions in dominator
> tree
> >>> order; you fill in the operands of the shadow phi nodes at the end.
> >>>
> >>> I'm not sure which approach is faster (I don't think LLVM's dominator
> >>> tree pass is the most efficient implementation), but it's something to
> think
> >>> about.
> >>
> >>
> >> I actually did think about this. I did not experiment with using the
> >> dominator tree, but I think that the current way is at least as
> efficient.
> >>
> >>
> >> It's really a question of whether computing the dominator tree is faster
> >> or slower than iterating over all instructions twice (or whether
> dominator
> >> tree has already been computed for some previously run transform).  I
> >> suppose dominator tree might be better if you're not going to
> instrument all
> >> instructions.
> >
> >
> > Unless I am mistaken, depth-first order is enough.
> > The new patch uses that.
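> >
> > I.e. roughly (a sketch; the visitor stands for the instrumentation code):
> >
> > #include "llvm/ADT/DepthFirstIterator.h"
> > #include "llvm/Function.h"     // llvm/IR/Function.h in newer trees
> > #include "llvm/Support/CFG.h"  // CFG GraphTraits; llvm/IR/CFG.h today
> > using namespace llvm;
> >
> > // Pre-order DFS from the entry block reaches every block after all of its
> > // dominators, and a definition dominates its (non-phi) uses, so each
> > // value's shadow exists by the time an instruction needs it.
> > template <typename VisitorT>
> > void instrumentFunction(Function &F, VisitorT &Visitor) {
> >   for (df_iterator<BasicBlock*> DI = df_begin(&F.getEntryBlock()),
> >                                 DE = df_end(&F.getEntryBlock());
> >        DI != DE; ++DI)
> >     for (BasicBlock::iterator BI = (*DI)->begin(), BE = (*DI)->end();
> >          BI != BE; ++BI)
> >       Visitor.visit(*BI);   // phi-node shadow operands get patched up last
> > }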
> >
> >
> > Thanks!
> >
> > --kcc
> >
> > _______________________________________________
> > llvm-commits mailing list
> > llvm-commits at cs.uiuc.edu
> > http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
> >
>

