<div>I'm going to be going over this code with a fine tooth comb over the next few days and getting you a thorough review, but I wanted to start off by saying: *wow*. Those are phenomenal performance numbers. I'm very excited about this.</div>
<br><div class="gmail_extra"><div class="gmail_quote">On Fri, Jul 27, 2012 at 2:20 AM, Kostya Serebryany <span dir="ltr"><<a href="mailto:kcc@google.com" target="_blank" class="cremed">kcc@google.com</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Fresh numbers after several performance fixes in msan run-time. <div>The average slowdown of msan is 3x (ranges between 2x and 4x). </div>
<div>When running under msan+DynamoRio (msandr) the average slowdown is 3.6x</div><div>
<br></div><div><div># benchmark native-time(seconds) msan-slowdown(times) msan+dr-slowdown(times)</div><div>400.perlbench 379 3.71 4.75</div><div><div>401.bzip2 483 3.02 2.93</div><div>403.gcc 312 2.91 4.22</div>
<div>429.mcf 313 2.97 2.97</div><div>445.gobmk 429 3.32 6.41</div><div>456.hmmer 606 3.69 2.55</div><div>458.sjeng 491 3.98 6.39</div><div>462.libquantum 446 2.72 2.18</div>
<div>464.h264ref 573 3.61 5.13</div><div>471.omnetpp 317 2.16 2.91</div><div>473.astar 412 2.26 2.32</div><div>483.xalancbmk 241 3.11 4.99</div><div>433.milc 402 2.25 2.31</div>
<div>444.namd 374 2.53 2.22</div><div>447.dealII 311 2.82 3.39</div><div>450.soplex 232 2.60 2.80</div><div>453.povray 197 3.35 5.28</div><div>470.lbm 321 2.10 2.28</div>
<div>482.sphinx3 496 4.00 3.35</div><div><br></div><div><div>Funny fact: sometimes the DynamoRio tool makes things faster due to "trace optimization". We could probably fix this in the LLVM pass. </div>
<div><br></div><div>Thanks, </div><div>--kcc </div></div><div><div class="h5"><div><br></div><div><br></div><br><div class="gmail_quote">On Thu, Jul 19, 2012 at 7:42 PM, Evgeniy Stepanov <span dir="ltr"><<a href="mailto:eugeni.stepanov@gmail.com" target="_blank" class="cremed">eugeni.stepanov@gmail.com</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">We've solved the issue of external code with the help of dynamic<br>
binary instrumentation. A simple (~500 LOC) DynamoRio client zero-fills<br>
shadow memory for all stores and return values in uninstrumented code,<br>
thus avoiding false positives.<br>
<br>
I'm attaching the new patch to LLVM with a slightly improved<br>
instrumentation pass. Repository at<br>
<a href="http://code.google.com/p/memory-sanitizer/source/browse/" target="_blank" class="cremed">http://code.google.com/p/memory-sanitizer/source/browse/</a> contains the<br>
runtime library and the DynamoRio client.<br>
<br>
In its current state, the tool can handle all C and C++ tests in SPEC<br>
CPU2006, finding at least one use of uninitialized data that is also<br>
detected by Memcheck (Valgrind), plus one or two false positives caused<br>
by our inexact handling of certain instructions.<br>
<br>
See the performance numbers. "MSan time" was measured by running<br>
without the dynamic tool, initializing shadow to 0x0 instead of 0xFF<br>
to avoid reporting any errors. "MSan/DR time" is from an honest run of<br>
an instrumented benchmark under the DynamoRio-based tool.<br>
<br>
# test, native time, MSan time, MSan slowdown, MSan/DR time, MSan/DR slowdown<br>
400.perlbench, 22.40, 116.00, 5.18, 138.00, 6.16<br>
401.bzip2, 41.00, 155.00, 3.78, 149.00, 3.63<br>
403.gcc, 0.79, 3.52, 4.48, 10.50, 13.38<br>
429.mcf, 13.80, 37.00, 2.68, 46.70, 3.38<br>
445.gobmk, 83.10, 328.00, 3.95, 440.00, 5.29<br>
456.hmmer, 59.70, 172.00, 2.88, 145.00, 2.43<br>
458.sjeng, 105.00, 305.00, 2.90, 391.00, 3.72<br>
462.libquantum, 1.66, 4.37, 2.63, 3.88, 2.34<br>
464.h264ref, 71.30, 267.00, 3.74, 358.00, 5.02<br>
471.omnetpp, 42.20, 246.00, 5.83, 436.00, 10.33<br>
473.astar, 82.40, 243.00, 2.95, 285.00, 3.46<br>
483.xalancbmk, 48.40, 644.00, 13.31, 1164.00, 24.05<br>
433.milc, 14.00, 37.60, 2.69, 40.80, 2.91<br>
444.namd, 10.20, 33.20, 3.25, 34.20, 3.35<br>
453.povray, 6.50, 32.20, 4.95, 48.80, 7.51<br>
470.lbm, 31.00, 107.00, 3.45, 138.00, 4.45<br>
482.sphinx3, 7.66, 23.90, 3.12, 26.80, 3.50<br>
<div><div><br>
<br>
On Wed, Jun 27, 2012 at 10:50 PM, Kostya Serebryany <<a href="mailto:kcc@google.com" target="_blank" class="cremed">kcc@google.com</a>> wrote:<br>
><br>
><br>
> On Wed, Jun 27, 2012 at 7:30 PM, John Criswell <<a href="mailto:criswell@illinois.edu" target="_blank" class="cremed">criswell@illinois.edu</a>><br>
> wrote:<br>
>><br>
>> On 6/26/12 11:20 AM, Kostya Serebryany wrote:<br>
>><br>
>> John,<br>
>><br>
>> On Tue, Jun 26, 2012 at 7:45 PM, John Criswell <<a href="mailto:criswell@illinois.edu" target="_blank" class="cremed">criswell@illinois.edu</a>><br>
>> wrote:<br>
>>><br>
>>> On 6/26/12 4:18 AM, Kostya Serebryany wrote:<br>
>>><br>
>>> Updated the patch to use the new interface for TLS<br>
>>> (GlobalVariable::GeneralDynamicTLSModel).<br>
>>> Comments?<br>
>>><br>
>>><br>
>>> First, there's the non-technical question of whether this belongs in LLVM<br>
>>> mainline or not in its current state. Do you have users of this tool in the<br>
>>> same way that you had users of ASan? What platforms have you tested this<br>
>>> tool on? Do you get false positives when you link with native code<br>
>>> libraries, and if so, is this false positive rate low? I get the impression<br>
>>> that this tool isn't as mature as TSan or ASan,<br>
>><br>
>><br>
>> Your impression is correct -- msan is not yet usable. It works only on<br>
>> linux-x86_64, only on small libc-free tests.<br>
>> It has been heavily tested (on SPEC and other large programs), but w/o<br>
>> actually poisoning the shadow.<br>
>><br>
>>><br>
>>> so I'm wondering whether it's really ready to go in yet.<br>
>><br>
>><br>
>> Having these 500 LOC in trunk will simplify further development a lot.<br>
>> (we will also need a few lines of driver changes).<br>
>> I can't tell whether this justifies the commit to trunk now (it certainly<br>
>> does for me).<br>
>><br>
>><br>
>> I think it would be better to fix the external code issue first before<br>
>> trying to commit to LLVM trunk; it's the only obstacle I see that prevents<br>
>> you from having a working tool. That way, you can have a working tool and<br>
>> get feedback on it; you'll also know what the ramifications are for your<br>
>> approach to handling external library code. As it is right now, I think<br>
>> this new tool is still in too early a stage to add to LLVM.<br>
><br>
><br>
> Having this code in trunk will simplify and speedup the development of the<br>
> dynamic component(s).<br>
> It may also inspire some research in the area of LLVM-based binary<br>
> instrumentation.<br>
> But I'll agree on what the community says.<br>
><br>
><br>
>><br>
>><br>
>><br>
>><br>
>>><br>
>>><br>
>>> Second, on the technical side, I think dealing with the external code<br>
>>> problem has to be addressed before the code can be committed; I suspect the<br>
>>> tool will return too many false positives if you don't address it (if you<br>
>>> have experiments indicating otherwise, please let us know). I'm assuming<br>
>>> from the comment at the beginning of the pass's implementation that it's<br>
>>> still unaddressed.<br>
>><br>
>><br>
>> We are working on this, in fact experimenting in a few different<br>
>> directions (DynamoRIO, PIN, static instrumentation, even full recompile).<br>
>><br>
>><br>
>> Okay.<br>
>><br>
>><br>
>><br>
>>><br>
>>> I think instrumenting all library code is unrealistic; you'd have to have<br>
>>> every library on your system compiled with the sanitizer (at least for large<br>
>>> GUI applications).<br>
>><br>
>><br>
>> It may still be doable in some restricted environments, but yes, I agree<br>
>> in general.<br>
>><br>
>>><br>
>>> I think there's a few other approaches that you can take:<br>
>>><br>
>>> 1) Assume external code works properly. To do this, you first ensure<br>
>>> that memory allocated by external code is always marked as initialized.<br>
>>> Second, when passing memory objects from internal, instrumented code to<br>
>>> external library code, you mark the memory as initialized (since you're<br>
>>> assuming the external code will initialize it).<br>
>><br>
>><br>
>> That doesn't work.<br>
>> What if I pass my pointer to a library function.<br>
>> What has to be un-poisoned (marked as inited)?<br>
>> The contents of the heap/stack/bss block to which this pointer belongs?<br>
>><br>
>><br>
>> Yes, that's what I was thinking.<br>
>><br>
>><br>
>> How about pointers that reside in this block? Do we need to go deeper?<br>
>><br>
>><br>
>> Ah, that's a problem. You might be able to do this conservatively by<br>
>> treating values that point into valid memory objects as pointers (like<br>
>> conservative garbage collection), but it'd be expensive and probably not<br>
>> worth the trouble.<br>
><br>
><br>
> It'll be prohibitively expensive, if at all possible.<br>
> Even with all the hacks we put into the allocator code, checking whether a<br>
> given integer is a currently addressable pointer is not free. For stack<br>
> buffers that's even more expensive.<br>
> Some pointers may point slightly off the buffer (e.g. to -1'th element).<br>
><br>
>><br>
>><br>
>><br>
>><br>
>>><br>
>>><br>
>>> Of course, you can do things like special casing the C library to reduce<br>
>>> false negatives, and you can instruct users to compile library code with the<br>
>>> compiler to find more bugs.<br>
>>><br>
>>> 2) Use something like SAFECode's CompleteChecks pass to determine which<br>
>>> checks are on memory objects handled completely within the analyzed portion<br>
>>> of the program and which checks can access memory handled by external code.<br>
>>> Checks on memory objects handled by external code always pass (or just<br>
>>> generate a warning instead of an error, or something like that). This<br>
>>> approach requires a good points-to analysis algorithm. SAFECode uses DSA<br>
>>> which is unification-based. You can either use that or, if you prefer, use<br>
>>> a subset based algorithm like Anderson's. I think there's one<br>
>>> implementation of Anderson's for LLVM out there, but I don't know how robust<br>
>>> it is.<br>
>><br>
>><br>
>> I'll have to look at this more, but I don't (yet) see how it can help.<br>
>> Again, suppose<br>
>><br>
>> struct A { int stuff; };<br>
>> struct B { A *a; };<br>
>> struct C { A *a; B *b; };<br>
>> ...<br>
>> struct Z { A *a; B *b; C *c; ... Y *y; };<br>
>><br>
>> ...<br>
>> Z *z = new Z;<br>
>> library_function(z);<br>
>><br>
>> library_function can write to any part of Z, but we can't even traverse<br>
>> all of it.<br>
>> Can we do anything?<br>
>><br>
>><br>
>> No, you can't really say anything about the object to which Z points or<br>
>> the objects reachable from the pointer Z. Any checks on z, z->a, etc. would<br>
>> have to conservatively pass.<br>
><br>
><br>
><br>
>><br>
>> This approach doesn't help you check for initialization errors with<br>
>> library code; it helps prevent library code from triggering false positives.<br>
>> It ensures that all of your checks are performed on memory objects handled<br>
>> by internal code.<br>
>><br>
>> For memory safety, relaxing the safety guarantees for memory objects<br>
>> passed to library code still results in a usable system that catches memory<br>
>> safety errors. I don't know if the same is true for uninitialized value<br>
>> detection; you'll probably have to experiment and see.<br>
><br>
><br>
> That's not true for uninitialized value detection.<br>
><br>
> both asan and tsan know nothing about the code in the libraries and work<br>
> just fine.<br>
> They miss some percentage of bugs (based on experience with valgrind our<br>
> estimate is 5%-10% of bugs).<br>
> Adding a dynamic component to asan/tsan will add that coverage, and we are<br>
> going to do it.<br>
><br>
> For msan the story is completely different.<br>
> You either cover everything including system calls or don't go there.<br>
> Any memory write which the tool does not observe followed by a read in the<br>
> instrumented code leads to a false report.<br>
><br>
><br>
>><br>
>><br>
>><br>
>><br>
>><br>
>>><br>
>>><br>
>>> 3) Use dynamic binary instrumentation to handle external code.<br>
>>><br>
>>> The first option is probably the easiest to implement. The second option<br>
>>> requires a points-to analysis that can go into LLVM mainline. The third<br>
>>> option is a fair amount of work but probably easier than using points-to<br>
>>> analysis and will probably yield a better debugging tool.<br>
>><br>
>><br>
>> We *are* working on #3 (see above).<br>
>><br>
>><br>
>> I think using dynamic binary translation is probably the best option out<br>
>> of the three. Just take my opinion with the caveat that I haven't run any<br>
>> experiments on uninitialized value usage.<br>
><br>
><br>
> It's actually the worst option, but the only one I know how to implement.<br>
> I'd really prefer a fully static approach (static binary translation, or<br>
> full recompile) but I don't see any suitable technology. :(<br>
><br>
>>><br>
>>><br>
>>> Third, other technical nit-picks:<br>
>>><br>
>>> 1) IMHO, in general, every method and function should have a comment<br>
>>> explaining what it does and the meaning of its inputs and return values.<br>
>>> Descriptions should use complete sentences.<br>
>><br>
>><br>
>> Agree, I'll add comments.<br>
>><br>
>>><br>
>>><br>
>>> 2) You may want to consider using a class derived from InstVisitor<br>
>>> instead of writing large if/then/else or switch code.<br>
>><br>
>><br>
>> I didn't know about InstVisitor.<br>
>> However, I am not sure if that would be simpler. Will have a closer look.<br>
>><br>
>><br>
>> InstVisitor is faster than using dyn_cast<>() and is the same as using a<br>
switch on the instruction opcode (which your code does in some places). It<br>
>> may make your code more readable.<br>
><br>
><br>
> I am convinced. The new patch uses InstVisitor.<br>
><br>
>><br>
>><br>
>><br>
>><br>
>>><br>
>>><br>
>>> 3) If I understand correctly, a "shadow instruction" propagates<br>
>>> information about initialization status through a computation (so, for<br>
>>> example, if you have a c = add a, b instruction, there's a shadow<br>
>>> instruction that computes c's initialization status based on a and b's<br>
>>> initialization status).<br>
>><br>
>><br>
>> Correct.<br>
>><br>
>>><br>
>>> It looks like you add shadow instructions for all instructions and then<br>
>>> "fill in" the operands to the shadow instructions by iterating over the<br>
>>> instructions a second time. Is this correct?<br>
>><br>
>><br>
>> Yes.<br>
>><br>
>><br>
>> How do you handle passing taint (i.e., initialization status) between<br>
>> actual parameters and formal parameters in a function call? Do you store<br>
>> the information to memory and then read it in the callee?<br>
><br>
><br>
> Yes, the shadow is passed via a dedicated thread-local memory slot.<br>
><br>
>><br>
>><br>
>><br>
>><br>
>>><br>
>>> If so, there's an alternative approach: if you fetch the dominator tree<br>
>>> for the function and visit instructions in dominator tree order, then you'll<br>
>>> visit all definitions before all uses. That will alleviate the need to make<br>
>>> two passes over all the instructions. The only tricky part is phi nodes,<br>
>>> but for those, you add shadow instructions (phi nodes with undef operands)<br>
>>> to them first and then visit all the other instructions in dominator tree<br>
>>> order; you fill in the operands of the shadow phi nodes at the end.<br>
>>><br>
>>> I'm not sure which approach is faster (I don't think LLVM's dominator<br>
>>> tree pass is the most efficient implementation), but it's something to think<br>
>>> about.<br>
>><br>
>><br>
>> I actually did think about this. I did not experiment with using the<br>
>> dominator tree, but I think that the current way is at least as efficient.<br>
>><br>
>><br>
>> It's really a question of whether computing the dominator tree is faster<br>
>> or slower than iterating over all instructions twice (or whether dominator<br>
>> tree has already been computed for some previously run transform). I<br>
>> suppose dominator tree might be better if you're not going to instrument all<br>
>> instructions.<br>
><br>
><br>
> Unless I am mistaken, depth-first order is enough.<br>
> The new patch uses that.<br>
><br>
><br>
> Thanks!<br>
><br>
> --kcc<br>
><br>
</div></div><div><div>> _______________________________________________<br>
> llvm-commits mailing list<br>
> <a href="mailto:llvm-commits@cs.uiuc.edu" target="_blank" class="cremed">llvm-commits@cs.uiuc.edu</a><br>
> <a href="http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits" target="_blank" class="cremed">http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits</a><br>
><br>
</div></div></blockquote></div><br></div></div></div></div>
</blockquote></div><br></div>