[llvm-commits] MemorySanitizer, first patch

Evgeniy Stepanov eugeni.stepanov at gmail.com
Wed Aug 1 12:37:19 PDT 2012


We've improved performance some more. The average slowdown of msan is now
just 2.47x, and msan+DynamoRIO 3.5x.
DynamoRIO no longer speeds up any of the SPEC benchmarks.
Repeating an assortment of general optimization passes after the msan
pass improved performance by 9% - we'll need to take a better look at
what's really needed, and maybe have msan run a bit earlier in the
pipeline.

# benchmark  native-time(seconds) msan-slowdown(times) msan+dr-slowdown(times)
  400.perlbench,          383,         2.77,         3.85
      401.bzip2,          483,         1.94,         1.94
        403.gcc,          311,         2.56,         3.55
        429.mcf,          313,         2.90,         3.06
      445.gobmk,          429,         2.49,         5.99
      456.hmmer,          606,         1.91,         2.07
      458.sjeng,          490,         3.77,         6.24
 462.libquantum,          445,         1.90,         2.40
    464.h264ref,          572,         2.67,         4.51
    471.omnetpp,          312,         1.78,         3.30
      473.astar,          395,         1.70,         2.18
  483.xalancbmk,          221,         2.33,         6.40
       433.milc,          407,         1.95,         2.07
       444.namd,          374,         1.58,         1.72
     447.dealII,          311,         1.84,         3.39
     450.soplex,          223,         2.02,         2.64
     453.povray,          196,         2.86,         6.27
        470.lbm,          303,         2.19,         2.34
    482.sphinx3,          495,         2.05,         2.49


On Fri, Jul 27, 2012 at 1:26 PM, Chandler Carruth <chandlerc at google.com> wrote:
> I'm going to be going over this code with a fine tooth comb over the next
> few days and getting you a thorough review, but I wanted to start off by
> saying: *wow*. Those are phenomenal performance numbers. I'm very excited
> about this.
>
> On Fri, Jul 27, 2012 at 2:20 AM, Kostya Serebryany <kcc at google.com> wrote:
>>
>> Fresh numbers after several performance fixes in the msan run-time.
>> The average slowdown of msan is 3x (ranging between 2x and 4x).
>> When running under msan+DynamoRIO (msandr), the average slowdown is 3.6x.
>>
>> # benchmark  native-time(seconds) msan-slowdown(times) msan+dr-slowdown(times)
>> 400.perlbench   379     3.71    4.75
>> 401.bzip2       483     3.02    2.93
>> 403.gcc         312     2.91    4.22
>> 429.mcf         313     2.97    2.97
>> 445.gobmk       429     3.32    6.41
>> 456.hmmer       606     3.69    2.55
>> 458.sjeng       491     3.98    6.39
>> 462.libquantum  446     2.72    2.18
>> 464.h264ref     573     3.61    5.13
>> 471.omnetpp     317     2.16    2.91
>> 473.astar       412     2.26    2.32
>> 483.xalancbmk   241     3.11    4.99
>> 433.milc        402     2.25    2.31
>> 444.namd        374     2.53    2.22
>> 447.dealII      311     2.82    3.39
>> 450.soplex      232     2.60    2.80
>> 453.povray      197     3.35    5.28
>> 470.lbm         321     2.10    2.28
>> 482.sphinx3     496     4.00    3.35
>>
>> Funny fact: sometimes the DynamoRIO tool makes things faster due to "trace
>> optimization". We could probably fix this in the LLVM pass.
>>
>> Thanks,
>> --kcc
>>
>>
>>
>> On Thu, Jul 19, 2012 at 7:42 PM, Evgeniy Stepanov
>> <eugeni.stepanov at gmail.com> wrote:
>>>
>>> We've solved the issue of external code with the help of dynamic
>>> binary instrumentation. A simple (~500 LOC) DynamoRIO client zero-fills
>>> shadow memory for all stores and return values in uninstrumented code,
>>> thus avoiding false positives.
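
A toy Python model of the zero-fill approach described above (the names and the per-address shadow encoding are invented for illustration; the real tool works on shadow memory pages, not a dict):

```python
# Toy model: shadow is a parallel map, True = "byte is uninitialized".
shadow = {}

def instrumented_store(addr, poisoned):
    # Instrumented code updates shadow alongside every store.
    shadow[addr] = poisoned

def uninstrumented_store(addr):
    # Library code writes memory but, without the DBI client, leaves
    # the shadow stale -- a later instrumented read then misfires.
    pass

def dbi_client_store(addr):
    # What the DynamoRIO client does: zero-fill (unpoison) the shadow
    # for every store made by uninstrumented code.
    shadow[addr] = False

def check_read(addr):
    # Instrumented read: True means a report would be raised.
    return shadow.get(addr, False)

instrumented_store(0x10, poisoned=True)  # e.g. fresh, uninitialized stack slot
uninstrumented_store(0x10)               # library initializes it; shadow stale
assert check_read(0x10)                  # false positive!

dbi_client_store(0x10)                   # zero-fill shadow for the library write
assert not check_read(0x10)              # no report
```

The same model also shows why partial coverage is fatal for msan: any single unobserved write leaves poisoned shadow behind a now-initialized byte.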
>>>
>>> I'm attaching the new patch to LLVM with a slightly improved
>>> instrumentation pass. Repository at
>>> http://code.google.com/p/memory-sanitizer/source/browse/ contains the
>>> runtime library and the DynamoRio client.
>>>
>>> In its current state, the tool can handle all C and C++ tests in SPEC
>>> CPU2006, finding at least one use of uninitialized data that's also
>>> detected by Memcheck (Valgrind), and one or two false positives coming
>>> from our inexact handling of certain instructions.
>>>
>>> See the performance numbers. "MSan time" was measured by running
>>> without the dynamic tool, initializing shadow to 0x0 instead of 0xFF
>>> to avoid reporting any errors. "MSan/DR time" is from an honest run of
>>> an instrumented benchmark under the DynamoRio-based tool.
>>>
>>> #          test,  native time,  MSan time,  MSan slowdown,  MSan/DR time,  MSan/DR slowdown
>>>   400.perlbench,        22.40,      116.00,          5.18,        138.00,              6.16
>>>       401.bzip2,        41.00,      155.00,          3.78,        149.00,              3.63
>>>         403.gcc,         0.79,        3.52,          4.48,         10.50,             13.38
>>>         429.mcf,        13.80,       37.00,          2.68,         46.70,              3.38
>>>       445.gobmk,        83.10,      328.00,          3.95,        440.00,              5.29
>>>       456.hmmer,        59.70,      172.00,          2.88,        145.00,              2.43
>>>       458.sjeng,       105.00,      305.00,          2.90,        391.00,              3.72
>>>  462.libquantum,         1.66,        4.37,          2.63,          3.88,              2.34
>>>     464.h264ref,        71.30,      267.00,          3.74,        358.00,              5.02
>>>     471.omnetpp,        42.20,      246.00,          5.83,        436.00,             10.33
>>>       473.astar,        82.40,      243.00,          2.95,        285.00,              3.46
>>>   483.xalancbmk,        48.40,      644.00,         13.31,       1164.00,             24.05
>>>        433.milc,        14.00,       37.60,          2.69,         40.80,              2.91
>>>        444.namd,        10.20,       33.20,          3.25,         34.20,              3.35
>>>      453.povray,         6.50,       32.20,          4.95,         48.80,              7.51
>>>         470.lbm,        31.00,      107.00,          3.45,        138.00,              4.45
>>>     482.sphinx3,         7.66,       23.90,          3.12,         26.80,              3.50
>>>
>>>
>>> On Wed, Jun 27, 2012 at 10:50 PM, Kostya Serebryany <kcc at google.com>
>>> wrote:
>>> >
>>> >
>>> > On Wed, Jun 27, 2012 at 7:30 PM, John Criswell <criswell at illinois.edu>
>>> > wrote:
>>> >>
>>> >> On 6/26/12 11:20 AM, Kostya Serebryany wrote:
>>> >>
>>> >> John,
>>> >>
>>> >> On Tue, Jun 26, 2012 at 7:45 PM, John Criswell <criswell at illinois.edu>
>>> >> wrote:
>>> >>>
>>> >>> On 6/26/12 4:18 AM, Kostya Serebryany wrote:
>>> >>>
>>> >>> Updated the patch to use the new interface for TLS
>>> >>> (GlobalVariable::GeneralDynamicTLSModel).
>>> >>> Comments?
>>> >>>
>>> >>>
>>> >>> First, there's the non-technical question of whether this belongs in
>>> >>> LLVM
>>> >>> mainline or not in its current state.  Do you have users of this tool
>>> >>> in the
>>> >>> same way that you had users of ASan?  What platforms have you tested
>>> >>> this
>>> >>> tool on?  Do you get false positives when you link with native code
>>> >>> libraries, and if so, is this false positive rate low?  I get the
>>> >>> impression
>>> >>> that this tool isn't as mature as TSan or ASan,
>>> >>
>>> >>
>>> >> Your impression is correct -- msan is not yet usable. It works only on
>>> >> linux-x86_64, only on small libc-free tests.
>>> >> It has been heavily tested (on SPEC and other large programs), but w/o
>>> >> actually poisoning the shadow.
>>> >>
>>> >>>
>>> >>> so I'm wondering whether it's really ready to go in yet.
>>> >>
>>> >>
>>> >> Having these 500 LOC in trunk will simplify further development a lot.
>>> >> (we will also need a few lines of driver changes).
>>> >> I can't tell whether this justifies the commit to trunk now (it
>>> >> certainly
>>> >> does for me).
>>> >>
>>> >>
>>> >> I think it would be better to fix the external code issue first before
>>> >> trying to commit to LLVM trunk; it's the only obstacle I see that
>>> >> prevents
>>> >> you from having a working tool.  That way, you can have a working tool
>>> >> and
>>> >> get feedback on it; you'll also know what the ramifications are for
>>> >> your
>>> >> approach to handling external library code.  As it is right now, I
>>> >> think
>>> >> this new tool is still in too early a stage to add to LLVM.
>>> >
>>> >
>>> > Having this code in trunk will simplify and speedup the development of
>>> > the
>>> > dynamic component(s).
>>> > It may also inspire some research in the area of LLVM-based binary
>>> > instrumentation.
>>> > But I'll agree on what the community says.
>>> >
>>> >
>>> >>
>>> >>
>>> >>
>>> >>
>>> >>>
>>> >>>
>>> >>> Second, on the technical side, I think dealing with the external code
>>> >>> problem has to be addressed before the code can be committed; I
>>> >>> suspect the
>>> >>> tool will return too many false positives if you don't address it (if
>>> >>> you
>>> >>> have experiments indicating otherwise, please let us know).  I'm
>>> >>> assuming
>>> >>> from the comment at the beginning of the pass's implementation that
>>> >>> it's
>>> >>> still unaddressed.
>>> >>
>>> >>
>>> >> We are working on this, in fact experimenting in a few different
>>> >> directions (DynamoRIO, PIN, static instrumentation, even full
>>> >> recompile).
>>> >>
>>> >>
>>> >> Okay.
>>> >>
>>> >>
>>> >>
>>> >>>
>>> >>> I think instrumenting all library code is unrealistic; you'd have to
>>> >>> have
>>> >>> every library on your system compiled with the sanitizer (at least
>>> >>> for large
>>> >>> GUI applications).
>>> >>
>>> >>
>>> >> It may still be doable in some restricted environments, but yes, I
>>> >> agree
>>> >> in general.
>>> >>
>>> >>>
>>> >>> I think there's a few other approaches that you can take:
>>> >>>
>>> >>> 1) Assume external code works properly.  To do this, you first ensure
>>> >>> that memory allocated by external code is always marked as
>>> >>> initialized.
>>> >>> Second, when passing memory objects from internal, instrumented code
>>> >>> to
>>> >>> external library code, you mark the memory as initialized (since
>>> >>> you're
>>> >>> assuming the external code will initialize it).
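
A minimal sketch of this approach (1), with invented names and a byte-per-byte shadow array: unpoison whatever is handed across the library boundary, on the assumption the callee initializes it.

```python
# Hypothetical boundary rule: anything passed to uninstrumented
# library code is marked initialized before the call.
shadow = bytearray(64)   # 1 shadow byte per app byte; nonzero = poisoned

def unpoison(addr, size):
    # Zero the shadow for [addr, addr+size).
    shadow[addr:addr + size] = bytes(size)

def call_library(fn, addr, size):
    # Assume the external code initializes the memory it receives.
    unpoison(addr, size)
    return fn(addr, size)

shadow[8:16] = b"\xff" * 8               # a poisoned 8-byte region
call_library(lambda a, s: None, 8, 8)    # stand-in for the library call
assert all(b == 0 for b in shadow[8:16])
```

As the reply below this proposal points out, this only covers the top-level object: memory reachable through pointers stored inside the block is not traversed.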
>>> >>
>>> >>
>>> >> That doesn't work.
>>> >> What if I pass my pointer to a library function?
>>> >> What has to be un-poisoned (marked as initialized)?
>>> >> The contents of the heap/stack/bss block to which this pointer
>>> >> belongs?
>>> >>
>>> >>
>>> >> Yes, that's what I was thinking.
>>> >>
>>> >>
>>> >> How about pointers that reside in this block? Do we need to go deeper?
>>> >>
>>> >>
>>> >> Ah, that's a problem.  You might be able to do this conservatively by
>>> >> treating values that point into valid memory objects as pointers (like
>>> >> conservative garbage collection), but it'd be expensive and probably
>>> >> not
>>> >> worth the trouble.
>>> >
>>> >
>>> > It'll be prohibitively expensive, if at all possible.
>>> > Even with all the hacks we put into the allocator code, checking
>>> > whether a
>>> > given integer is a currently addressable pointer is not free. For stack
>>> > buffers that's even more expensive.
>>> > Some pointers may point slightly off the buffer (e.g. to -1'th
>>> > element).
>>> >
>>> >>
>>> >>
>>> >>
>>> >>
>>> >>>
>>> >>>
>>> >>> Of course, you can do things like special casing the C library to
>>> >>> reduce
>>> >>> false negatives, and you can instruct users to compile library code
>>> >>> with the
>>> >>> compiler to find more bugs.
>>> >>>
>>> >>> 2) Use something like SAFECode's CompleteChecks pass to determine
>>> >>> which
>>> >>> checks are on memory objects handled completely within the analyzed
>>> >>> portion
>>> >>> of the program and which checks can access memory handled by external
>>> >>> code.
>>> >>> Checks on memory objects handled by external code always pass (or
>>> >>> just
>>> >>> generate a warning instead of an error, or something like that).
>>> >>> This
>>> >>> approach requires a good points-to analysis algorithm.  SAFECode uses
>>> >>> DSA
>>> >>> which is unification-based.  You can either use that or, if you
>>> >>> prefer, use
>>> >>> a subset based algorithm like Anderson's.  I think there's one
>>> >>> implementation of Anderson's for LLVM out there, but I don't know how
>>> >>> robust
>>> >>> it is.
>>> >>
>>> >>
>>> >> I'll have to look at this more, but I don't (yet) see how it can help.
>>> >> Again, suppose
>>> >>
>>> >> struct A { int stuff; };
>>> >> struct B { A *a; };
>>> >> struct C { A *a; B *b; };
>>> >> ...
>>> >> struct Z { A *a; B *b; C *c; ... Y *y; };
>>> >>
>>> >> ...
>>> >> Z *z = new Z;
>>> >> library_function(z);
>>> >>
>>> >> library_function can write to any part of Z, but we can't even
>>> >> traverse
>>> >> all of it.
>>> >> Can we do anything?
>>> >>
>>> >>
>>> >> No, you can't really say anything about the object to which Z points
>>> >> or
>>> >> the objects reachable from the pointer Z.  Any checks on z, z->a, etc.
>>> >> would
>>> >> have to conservatively pass.
>>> >
>>> >
>>> >
>>> >>
>>> >> This approach doesn't help you check for initialization errors with
>>> >> library code; it helps prevent library code from triggering false
>>> >> positives.
>>> >> It ensures that all of your checks are performed on memory objects
>>> >> handled
>>> >> by internal code.
>>> >>
>>> >> For memory safety, relaxing the safety guarantees for memory objects
>>> >> passed to library code still results in a usable system that catches
>>> >> memory
>>> >> safety errors.  I don't know if the same is true for uninitialized
>>> >> value
>>> >> detection; you'll probably have to experiment and see.
>>> >
>>> >
>>> > That's not true for uninitialized value detection.
>>> >
>>> > Both asan and tsan know nothing about the code in the libraries and
>>> > work just fine.
>>> > They miss some percentage of bugs (based on experience with valgrind
>>> > our
>>> > estimate is 5%-10% of bugs).
>>> > Adding a dynamic component to asan/tsan will add that coverage, and we
>>> > are
>>> > going to do it.
>>> >
>>> > For msan the story is completely different.
>>> > You either cover everything including system calls or don't go there.
>>> > Any memory write which the tool does not observe followed by a read in
>>> > the
>>> > instrumented code leads to a false report.
>>> >
>>> >
>>> >>
>>> >>
>>> >>
>>> >>
>>> >>
>>> >>>
>>> >>>
>>> >>> 3) Use dynamic binary instrumentation to handle external code.
>>> >>>
>>> >>> The first option is probably the easiest to implement.  The second
>>> >>> option
>>> >>> requires a points-to analysis that can go into LLVM mainline.  The
>>> >>> third
>>> >>> option is a fair amount of work but probably easier than using
>>> >>> points-to
>>> >>> analysis and will probably yield a better debugging tool.
>>> >>
>>> >>
>>> >> We *are* working on #3 (see above).
>>> >>
>>> >>
>>> >> I think using dynamic binary translation is probably the best option
>>> >> out
>>> >> of the three.  Just take my opinion with the caveat that I haven't run
>>> >> any
>>> >> experiments on uninitialized value usage.
>>> >
>>> >
>>> > It's actually the worst option, but the only one I know how to
>>> > implement.
>>> > I'd really prefer a fully static approach (static binary translation,
>>> > or
>>> > full recompile) but I don't see any suitable technology. :(
>>> >
>>> >>>
>>> >>>
>>> >>> Third, other technical nit-picks:
>>> >>>
>>> >>> 1) IMHO, in general, every method and function should have a comment
>>> >>> explaining what it does and the meaning of its inputs and return
>>> >>> values.
>>> >>> Descriptions should use complete sentences.
>>> >>
>>> >>
>>> >> Agree, I'll add comments.
>>> >>
>>> >>>
>>> >>>
>>> >>> 2) You may want to consider using a class derived from InstVisitor
>>> >>> instead of writing large if/then/else or switch code.
>>> >>
>>> >>
>>> >> I didn't know about InstVisitor.
>>> >> However, I am not sure if that would be simpler. Will have a closer
>>> >> look.
>>> >>
>>> >>
>>> >> InstVisitor is faster than using dyn_cast<>() and is the same as using
>>> >> a switch on the instruction opcode (which your code does in some
>>> >> places). It may make your code more readable.
>>> >
>>> >
>>> > I am convinced. The new patch uses InstVisitor.
>>> >
>>> >>
>>> >>
>>> >>
>>> >>
>>> >>>
>>> >>>
>>> >>> 3) If I understand correctly, a "shadow instruction" propagates
>>> >>> information about initialization status through a computation (so,
>>> >>> for
>>> >>> example, if you have a c = add a, b instruction, there's a shadow
>>> >>> instruction that computes c's initialization status based on a and
>>> >>> b's
>>> >>> initialization status).
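
The propagation being described can be modeled in a couple of lines. This is a deliberately conservative sketch (real msan is more precise for some instructions): each value carries a bitmask whose 1-bits mark uninitialized bits, and the shadow of an add's result is the OR of the operand shadows.

```python
# Conservative shadow propagation for "c = add a, b":
# any uninitialized bit in either operand taints the result.
def shadow_add(shadow_a, shadow_b):
    return shadow_a | shadow_b

sa = 0x00000000            # a fully initialized
sb = 0x000000FF            # b's low byte uninitialized
assert shadow_add(sa, sb) == 0x000000FF   # result's low byte is tainted
assert shadow_add(0, 0) == 0              # both clean -> result clean
```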
>>> >>
>>> >>
>>> >> Correct.
>>> >>
>>> >>>
>>> >>> It looks like you add shadow instructions for all instructions and
>>> >>> then
>>> >>> "fill in" the operands to the shadow instructions by iterating over
>>> >>> the
>>> >>> instructions a second time.  Is this correct?
>>> >>
>>> >>
>>> >> Yes.
>>> >>
>>> >>
>>> >> How do you handle passing taint (i.e., initialization status) between
>>> >> actual parameters and formal parameters in a function call?  Do you
>>> >> store
>>> >> the information to memory and then read it in the callee?
>>> >
>>> >
>>> > Yes, the shadow is passed via a dedicated thread-local memory slot.
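
That calling convention can be modeled roughly as follows (names are invented; the real pass uses thread-local globals written by the caller and read in the callee's prologue):

```python
import threading

_tls = threading.local()   # stands in for the dedicated TLS shadow slots

def call_with_shadow(fn, args, shadows):
    # Caller: store each actual parameter's shadow in the TLS slots,
    # then make the ordinary call.
    _tls.param_shadow = list(shadows)
    return fn(*args)

def callee(x, y):
    # Callee prologue: read the formals' shadow back out of TLS.
    sx, sy = _tls.param_shadow
    result = x + y
    result_shadow = sx | sy   # conservative propagation through the add
    return result, result_shadow

val, sh = call_with_shadow(callee, (2, 3), (0x0, 0xFF))
assert val == 5 and sh == 0xFF
```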
>>> >
>>> >>
>>> >>
>>> >>
>>> >>
>>> >>>
>>> >>> If so, there's an alternative approach: if you fetch the dominator
>>> >>> tree
>>> >>> for the function and visit instructions in dominator tree order, then
>>> >>> you'll
>>> >>> visit all definitions before all uses.  That will alleviate the need
>>> >>> to make
>>> >>> two passes over all the instructions.  The only tricky part is phi
>>> >>> nodes,
>>> >>> but for those, you add shadow instructions (phi nodes with undef
>>> >>> operands)
>>> >>> to them first and then visit all the other instructions in dominator
>>> >>> tree
>>> >>> order; you fill in the operands of the shadow phi nodes at the end.
>>> >>>
>>> >>> I'm not sure which approach is faster (I don't think LLVM's dominator
>>> >>> tree pass is the most efficient implementation), but it's something
>>> >>> to think
>>> >>> about.
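
The phi trick above can be sketched on an invented dictionary-based toy IR (not LLVM's API): create placeholder shadow phis first, build every other shadow in def-before-use order, then patch the phi operands once all definitions have shadows.

```python
def instrument(blocks, shadow_of):
    pending_phis = []
    for blk in blocks:
        for inst in blk:
            if inst["op"] == "phi":
                # Placeholder shadow phi; operands filled in later,
                # since they may name values not yet visited.
                shadow_of[inst["name"]] = {"op": "phi", "ops": None}
                pending_phis.append(inst)
            else:
                # Non-phi operands are already defined in this order.
                shadow_of[inst["name"]] = {
                    "op": "or",
                    "ops": [shadow_of[o] for o in inst["ops"]],
                }
    # Final pass: patch the shadow-phi operands.
    for inst in pending_phis:
        shadow_of[inst["name"]]["ops"] = [shadow_of[o] for o in inst["ops"]]

# Loop-shaped example: the phi's second operand "c" is defined after it.
blocks = [
    [{"name": "a", "op": "const", "ops": []}],
    [{"name": "p", "op": "phi", "ops": ["a", "c"]},
     {"name": "c", "op": "add", "ops": ["p", "a"]}],
]
shadow_of = {}
instrument(blocks, shadow_of)
assert shadow_of["p"]["ops"][1] is shadow_of["c"]
```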
>>> >>
>>> >>
>>> >> I actually did think about this. I did not experiment with using the
>>> >> dominator tree, but I think that the current way is at least as
>>> >> efficient.
>>> >>
>>> >>
>>> >> It's really a question of whether computing the dominator tree is
>>> >> faster
>>> >> or slower than iterating over all instructions twice (or whether
>>> >> dominator
>>> >> tree has already been computed for some previously run transform).  I
>>> >> suppose dominator tree might be better if you're not going to
>>> >> instrument all
>>> >> instructions.
>>> >
>>> >
>>> > Unless I am mistaken, depth-first order is enough.
>>> > The new patch uses that.
>>> >
>>> >
>>> > Thanks!
>>> >
>>> > --kcc
>>> >
>>> > _______________________________________________
>>> > llvm-commits mailing list
>>> > llvm-commits at cs.uiuc.edu
>>> > http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>>> >
>>
>>
>


