[llvm-commits] MemorySanitizer, first patch

Wed Jun 27 08:05:10 PDT 2012

Hello,
I've addressed John's comments on the code (thanks again!).
The new patch attached, please comment.
http://codereview.appspot.com/6298091

Thanks,

--kcc

On Tue, Jun 26, 2012 at 8:20 PM, Kostya Serebryany <kcc at google.com> wrote:

> John,
>
> On Tue, Jun 26, 2012 at 7:45 PM, John Criswell <criswell at illinois.edu>wrote:
>
>>  On 6/26/12 4:18 AM, Kostya Serebryany wrote:
>>
>> Updated the patch to use the new interface for TLS
>> (GlobalVariable::GeneralDynamicTLSModel).
>> Comments?
>>
>>
>> First, there's the non-technical question of whether this belongs in LLVM
>> mainline or not in its current state.  Do you have users of this tool in
>> the same way that you had users of ASan?  What platforms have you tested
>> this tool on?  Do you get false positives when you link with native code
>> libraries, and if so, is this false positive rate low?  I get the
>> impression that this tool isn't as mature as TSan or ASan,
>>
>
> Your impression is correct -- msan is not yet usable. It works only on
> linux-x86_64, only on small libc-free tests.
> It has been heavily tested (on SPEC and other large programs), but w/o
> actually poisoning the shadow.
>
>
>> so I'm wondering whether it's really ready to go in yet.
>>
>
> Having these 500 LOC in trunk will simplify further development a lot.
> (we will also need a few lines of driver changes).
> I can't tell whether this justifies the commit to trunk now (it certainly
> does for me).
>
>
>>
>> Second, on the technical side, I think dealing with the external code
>> problem has to be addressed before the code can be committed; I suspect the
>> tool will return too many false positives if you don't address it (if you
>> have experiments indicating otherwise, please let us know).  I'm assuming
>> from the comment at the beginning of the pass's implementation that it's
>> still unaddressed.
>>
>
> We are working on this, in fact experimenting in a few different
> directions (DynamoRIO, PIN, static instrumentation, even full recompile).
>
>
>
>> I think instrumenting all library code is unrealistic; you'd have to have
>> every library on your system compiled with the sanitizer (at least for
>> large GUI applications).
>>
>
> It may still be doable in some restricted environments, but yes, I agree
> in general.
>
>
>>  I think there's a few other approaches that you can take:
>>
>> 1) Assume external code works properly.  To do this, you first ensure
>> that memory allocated by external code is always marked as initialized.
>> Second, when passing memory objects from internal, instrumented code to
>> external library code, you mark the memory as initialized (since you're
>> assuming the external code will initialize it).
>>
>
> That doesn't work.
> What if I pass my pointer to a library function.
> What has to be un-poisoned (marked as inited)?
> The contents of the heap/stack/bss block to which this pointer belongs?
> How about pointers that reside in this block? Do we need to go deeper?
>
>
>>
>> Of course, you can do things like special casing the C library to reduce
>> false negatives, and you can instruct users to compile library code with
>> the compiler to find more bugs.
>>
>> 2) Use something like SAFECode's CompleteChecks pass to determine which
>> checks are on memory objects handled completely within the analyzed portion
>> of the program and which checks can access memory handled by external
>> code.  Checks on memory objects handled by external code always pass (or
>> just generate a warning instead of an error, or something like that).  This
>> approach requires a good points-to analysis algorithm.  SAFECode uses DSA
>> which is unification-based.  You can either use that or, if you prefer, use
>> a subset based algorithm like Anderson's.  I think there's one
>> implementation of Anderson's for LLVM out there, but I don't know how
>> robust it is.
>>
>
> I'll have to look at this more, but I don't (yet) see how it can help.
> Again, suppose
>
> struct A { int stuff; }
> struct B { A *a; }
> struct C { A *a, B* b};
> ...
> struct Z { A* a, B *b, C *c, ... Y *y;}
>
> ...
> Z *z = new Z;
> library_function(z);
>
> library_function can write to any part of Z, but we can't even traverse
> all of it.
> Can we do anything?
>
>
>>
>> 3) Use dynamic binary instrumentation to handle external code.
>>
>> The first option is probably the easiest to implement.  The second option
>> requires a points-to analysis that can go into LLVM mainline.  The third
>> option is a fair amount of work but probably easier than using points-to
>> analysis and will probably yield a better debugging tool.
>>
>
> We *are* working on #3 (see above).
>
>
>>
>> Third, other technical nit-picks:
>>
>> 1) IMHO, in general, every method and function should have a comment
>> explaining what it does and the meaning of its inputs and return values.
>> Descriptions should use complete sentences.
>>
>
> Agree, I'll add comments.
>
>
>>
>> 2) You may want to consider using a class derived from InstVisitor
>> instead of writing large if/then/else or switch code.
>>
>
> I Didn't know about InstVisitor.
> However, I am not sure if that would be simpler. Will have a closer look.
>
>
>
>>
>> 3) If I understand correctly, a "shadow instruction" propagates
>> information about initialization status through a computation (so, for
>> example, if you have a c = add a, b instruction, there's a shadow
>> instruction that computes c's initialization status based on a and b's
>> initialization status).
>>
>
> Correct.
>
>
>> It looks like you add shadow instructions for all instructions and then
>> "fill in" the operands to the shadow instructions by iterating over the
>> instructions a second time.  Is this correct?
>>
>
> Yes.
>
>
>> If so, there's an alternative approach: if you fetch the dominator tree
>> for the function and visit instructions in dominator tree order, then
>> you'll visit all definitions before all uses.  That will alleviate the need
>> to make two passes over all the instructions.  The only tricky part is phi
>> nodes, but for those, you add shadow instructions (phi nodes with undef
>> operands) to them first and then visit all the other instructions in
>> dominator tree order; you fill in the operands of the shadow phi nodes at
>> the end.
>>
>> I'm not sure which approach is faster (I don't think LLVM's dominator
>> tree pass is the most efficient implementation), but it's something to
>> think about.
>>
>
> I actually did think about this. I did not experiment with using
> the dominator tree, but I think that the current way is at least as
> efficient.
> Another point is that this will make the implementation simpler -- I'll
> give it a try.
>
> Thanks!
>
> --kcc
>
>
>
>>
>> -- John T.
>>
>>
>>
>>
>>  On Fri, Jun 22, 2012 at 8:26 PM, Kostya Serebryany <kcc at google.com>wrote:
>>
>>> Any comments? Objections?
>>> May I commit with post-commit review?
>>>
>>>  --kcc
>>>
>>>
>>>  On Tue, Jun 19, 2012 at 11:33 AM, Kostya Serebryany <kcc at google.com>wrote:
>>>
>>>> Hello,
>>>>
>>>>  Please review the following patch which adds MemorySanitizer
>>>> instrumentation pass.
>>>> For the detailed description see
>>>> http://lists.cs.uiuc.edu/pipermail/llvmdev/2012-June/050821.html
>>>> The patch contains instrumentation pass itself and basic LLVM-ish
>>>> tests.
>>>> Clang flag (-fmemory-sanitizer), the run-time library and more tests
>>>> will follow.
>>>>
>>>>  Code review URL: http://codereview.appspot.com/6298091/
>>>>
>>>>  Thanks,
>>>>
>>>>  --kcc
>>>>
>>>
>>>
>>
>>
>> _______________________________________________
>> llvm-commits mailing listllvm-commits at cs.uiuc.eduhttp://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>>
>>
>>
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20120627/54007780/attachment.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: issue6298091_5001.diff
Type: application/octet-stream
Size: 25587 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20120627/54007780/attachment.obj>