[llvm-commits] MemorySanitizer, first patch

John Criswell criswell at illinois.edu
Tue Jun 26 08:45:53 PDT 2012


On 6/26/12 4:18 AM, Kostya Serebryany wrote:
> Updated the patch to use the new interface for TLS 
> (GlobalVariable::GeneralDynamicTLSModel).
> Comments?

First, there's the non-technical question of whether this belongs in 
LLVM mainline or not in its current state.  Do you have users of this 
tool in the same way that you had users of ASan?  What platforms have 
you tested this tool on?  Do you get false positives when you link with 
native code libraries, and if so, is this false positive rate low?  I 
get the impression that this tool isn't as mature as TSan or ASan, so 
I'm wondering whether it's really ready to go in yet.

Second, on the technical side, I think dealing with the external code 
problem has to be addressed before the code can be committed; I suspect 
the tool will return too many false positives if you don't address it 
(if you have experiments indicating otherwise, please let us know).  I'm 
assuming from the comment at the beginning of the pass's implementation 
that it's still unaddressed.

I think instrumenting all library code is unrealistic; you'd have to 
have every library on your system compiled with the sanitizer (at least 
for large GUI applications).  I think there's a few other approaches 
that you can take:

1) Assume external code works properly.  To do this, you first ensure 
that memory allocated by external code is always marked as initialized.  
Second, when passing memory objects from internal, instrumented code to 
external library code, you mark the memory as initialized (since you're 
assuming the external code will initialize it).

Of course, you can do things like special casing the C library to reduce 
false negatives, and you can instruct users to compile library code with 
the compiler to find more bugs.

2) Use something like SAFECode's CompleteChecks pass to determine which 
checks are on memory objects handled completely within the analyzed 
portion of the program and which checks can access memory handled by 
external code.  Checks on memory objects handled by external code always 
pass (or just generate a warning instead of an error, or something like 
that).  This approach requires a good points-to analysis algorithm.  
SAFECode uses DSA which is unification-based.  You can either use that 
or, if you prefer, use a subset based algorithm like Anderson's.  I 
think there's one implementation of Anderson's for LLVM out there, but I 
don't know how robust it is.

3) Use dynamic binary instrumentation to handle external code.

The first option is probably the easiest to implement.  The second 
option requires a points-to analysis that can go into LLVM mainline.  
The third option is a fair amount of work but probably easier than using 
points-to analysis and will probably yield a better debugging tool.

Third, other technical nit-picks:

1) IMHO, in general, every method and function should have a comment 
explaining what it does and the meaning of its inputs and return 
values.  Descriptions should use complete sentences.

2) You may want to consider using a class derived from InstVisitor 
instead of writing large if/then/else or switch code.

3) If I understand correctly, a "shadow instruction" propagates 
information about initialization status through a computation (so, for 
example, if you have a c = add a, b instruction, there's a shadow 
instruction that computes c's initialization status based on a and b's 
initialization status).  It looks like you add shadow instructions for 
all instructions and then "fill in" the operands to the shadow 
instructions by iterating over the instructions a second time.  Is this 
correct?

If so, there's an alternative approach: if you fetch the dominator tree 
for the function and visit instructions in dominator tree order, then 
you'll visit all definitions before all uses.  That will alleviate the 
need to make two passes over all the instructions. The only tricky part 
is phi nodes, but for those, you add shadow instructions (phi nodes with 
undef operands) to them first and then visit all the other instructions 
in dominator tree order; you fill in the operands of the shadow phi 
nodes at the end.

I'm not sure which approach is faster (I don't think LLVM's dominator 
tree pass is the most efficient implementation), but it's something to 
think about.

-- John T.

>
>
>
> On Fri, Jun 22, 2012 at 8:26 PM, Kostya Serebryany <kcc at google.com 
> <mailto:kcc at google.com>> wrote:
>
>     Any comments? Objections?
>     May I commit with post-commit review?
>
>     --kcc
>
>
>     On Tue, Jun 19, 2012 at 11:33 AM, Kostya Serebryany
>     <kcc at google.com <mailto:kcc at google.com>> wrote:
>
>         Hello,
>
>         Please review the following patch which adds MemorySanitizer
>         instrumentation pass.
>         For the detailed description see
>         http://lists.cs.uiuc.edu/pipermail/llvmdev/2012-June/050821.html
>         The patch contains instrumentation pass itself and basic
>         LLVM-ish tests.
>         Clang flag (-fmemory-sanitizer), the run-time library and more
>         tests will follow.
>
>         Code review URL: http://codereview.appspot.com/6298091/
>
>         Thanks,
>
>         --kcc
>
>
>
>
>
> _______________________________________________
> llvm-commits mailing list
> llvm-commits at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20120626/a26fb6db/attachment.html>


More information about the llvm-commits mailing list