<div style="font-family: arial, helvetica, sans-serif"><font size="2"><div>Hello, </div>I've addressed John's comments on the code (thanks again!). <div>The new patch attached, please comment. </div><div><a href="http://codereview.appspot.com/6298091">http://codereview.appspot.com/6298091</a></div>

<div><br></div><div>Thanks, </div><div><br></div><div>--kcc </div><div><br><br><div class="gmail_quote">On Tue, Jun 26, 2012 at 8:20 PM, Kostya Serebryany <span dir="ltr"><<a href="mailto:kcc@google.com" target="_blank">kcc@google.com</a>></span> wrote:<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div style="font-family:arial,helvetica,sans-serif"><font size="2">John, <br><br><div class="gmail_quote"><div class="im">

On Tue, Jun 26, 2012 at 7:45 PM, John Criswell <span dir="ltr"><<a href="mailto:criswell@illinois.edu" target="_blank">criswell@illinois.edu</a>></span> wrote:<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

  <div bgcolor="#FFFFFF" text="#000000"><div>

    <div>On 6/26/12 4:18 AM, Kostya Serebryany

      wrote:<br>

    </div>

    <blockquote type="cite">

      <div style="font-family:arial,helvetica,sans-serif"><font size="2">Updated the patch to use the new interface for TLS

          (GlobalVariable::GeneralDynamicTLSModel).

          <div>Comments? <br>

          </div>

        </font></div>

    </blockquote>

    <br></div>

    First, there's the non-technical question of whether this belongs in

    LLVM mainline or not in its current state.  Do you have users of

    this tool in the same way that you had users of ASan?  What

    platforms have you tested this tool on?  Do you get false positives

    when you link with native code libraries, and if so, is this false

    positive rate low?  I get the impression that this tool isn't as

    mature as TSan or ASan, </div></blockquote><div><br></div></div><div><div>Your impression is correct -- msan is not yet usable. It works only on linux-x86_64, only on small libc-free tests. </div><div>It has been heavily tested (on SPEC and other large programs), but w/o actually poisoning the shadow.</div>

<div> </div></div><div class="im"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div bgcolor="#FFFFFF" text="#000000">so I'm wondering whether it's really ready

    to go in yet.<br></div></blockquote><div><br></div></div><div>Having these 500 LOC in trunk will simplify further development a lot. </div><div>(we will also need a few lines of driver changes). </div><div>I can't tell whether this justifies the commit to trunk now (it certainly does for me). </div>

<div class="im">

<div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div bgcolor="#FFFFFF" text="#000000">

    <br>

    Second, on the technical side, I think dealing with the external

    code problem has to be addressed before the code can be committed; I

    suspect the tool will return too many false positives if you don't

    address it (if you have experiments indicating otherwise, please let

    us know).  I'm assuming from the comment at the beginning of the

    pass's implementation that it's still unaddressed.<br></div></blockquote><div><br></div></div><div>We are working on this, in fact experimenting in a few different directions (DynamoRIO, PIN, static instrumentation, even full recompile).</div>

<div class="im">

<div><br></div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div bgcolor="#FFFFFF" text="#000000">

    I think instrumenting all library code is unrealistic; you'd have to

    have every library on your system compiled with the sanitizer (at

    least for large GUI applications). </div></blockquote><div><br></div></div><div>It may still be doable in some restricted environments, but yes, I agree in general. </div><div class="im"><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<div bgcolor="#FFFFFF" text="#000000"> I think there's a few other

    approaches that you can take:<br>

    <br>

    1) Assume external code works properly.  To do this, you first

    ensure that memory allocated by external code is always marked as

    initialized.  Second, when passing memory objects from internal,

    instrumented code to external library code, you mark the memory as

    initialized (since you're assuming the external code will initialize

    it).<br></div></blockquote><div><br></div></div><div>That doesn't work. </div><div>What if I pass my pointer to a library function. </div><div>What has to be un-poisoned (marked as inited)? </div><div>The contents of the heap/stack/bss block to which this pointer belongs? </div>

<div>How about pointers that reside in this block? Do we need to go deeper? </div><div class="im"><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div bgcolor="#FFFFFF" text="#000000">

    <br>

    Of course, you can do things like special casing the C library to

    reduce false negatives, and you can instruct users to compile

    library code with the compiler to find more bugs.<br>

    <br>

    2) Use something like SAFECode's CompleteChecks pass to determine

    which checks are on memory objects handled completely within the

    analyzed portion of the program and which checks can access memory

    handled by external code.  Checks on memory objects handled by

    external code always pass (or just generate a warning instead of an

    error, or something like that).  This approach requires a good

    points-to analysis algorithm.  SAFECode uses DSA which is

    unification-based.  You can either use that or, if you prefer, use a

    subset based algorithm like Anderson's.  I think there's one

    implementation of Anderson's for LLVM out there, but I don't know

    how robust it is.<br></div></blockquote><div><br></div></div><div>I'll have to look at this more, but I don't (yet) see how it can help. </div><div>Again, suppose</div><div><br></div><div>struct A { int stuff; }</div>

<div>struct B { A *a; }</div><div>struct C { A *a, B* b};</div><div>...</div><div>struct Z { A* a, B *b, C *c, ... Y *y;}</div><div><br></div><div>...</div><div>Z *z = new Z;</div><div>library_function(z);</div><div><br>

</div>

<div>library_function can write to any part of Z, but we can't even traverse all of it. </div><div>Can we do anything? </div><div class="im"><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<div bgcolor="#FFFFFF" text="#000000">

    <br>

    3) Use dynamic binary instrumentation to handle external code.<br>

    <br>

    The first option is probably the easiest to implement.  The second

    option requires a points-to analysis that can go into LLVM

    mainline.  The third option is a fair amount of work but probably

    easier than using points-to analysis and will probably yield a

    better debugging tool.<br></div></blockquote><div><br></div></div><div>We *are* working on #3 (see above). </div><div class="im"><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<div bgcolor="#FFFFFF" text="#000000">

    <br>

    Third, other technical nit-picks:<br>

    <br>

    1) IMHO, in general, every method and function should have a comment

    explaining what it does and the meaning of its inputs and return

    values.  Descriptions should use complete sentences.<br></div></blockquote><div><br></div></div><div>Agree, I'll add comments. </div><div class="im"><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<div bgcolor="#FFFFFF" text="#000000">

    <br>

    2) You may want to consider using a class derived from InstVisitor

    instead of writing large if/then/else or switch code.<br></div></blockquote><div><br></div></div><div>I Didn't know about InstVisitor. </div><div>However, I am not sure if that would be simpler. Will have a closer look. </div>

<div class="im">

<div><br></div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div bgcolor="#FFFFFF" text="#000000">

    <br>

    3) If I understand correctly, a "shadow instruction" propagates

    information about initialization status through a computation (so,

    for example, if you have a c = add a, b instruction, there's a

    shadow instruction that computes c's initialization status based on

    a and b's initialization status).  </div></blockquote><div><br></div></div><div>Correct. </div><div class="im"><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<div bgcolor="#FFFFFF" text="#000000">

It looks like you add shadow

    instructions for all instructions and then "fill in" the operands to

    the shadow instructions by iterating over the instructions a second

    time.  Is this correct?<br></div></blockquote><div><br></div></div><div>Yes.  </div><div class="im"><div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<div bgcolor="#FFFFFF" text="#000000">

    <br>

    If so, there's an alternative approach: if you fetch the dominator

    tree for the function and visit instructions in dominator tree

    order, then you'll visit all definitions before all uses.  That will

    alleviate the need to make two passes over all the instructions. 

    The only tricky part is phi nodes, but for those, you add shadow

    instructions (phi nodes with undef operands) to them first and then

    visit all the other instructions in dominator tree order; you fill

    in the operands of the shadow phi nodes at the end.<br>

    <br>

    I'm not sure which approach is faster (I don't think LLVM's

    dominator tree pass is the most efficient implementation), but it's

    something to think about.<br></div></blockquote><div><br></div></div><div>I actually did think about this. I did not experiment with using the dominator tree, but I think that the current way is at least as efficient.</div>

<div>Another point is that this will make the implementation simpler -- I'll give it a try. </div><div><br></div><div>Thanks! </div><div><br></div><div>--kcc </div><div class="im"><div><br></div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<div bgcolor="#FFFFFF" text="#000000">

    <br>

    -- John T.<br>

    <br>

    <blockquote type="cite"><div>

      <div style="font-family:arial,helvetica,sans-serif"><font size="2">

          <div><br>

          </div>

          <div><br>

            <br>

            <div class="gmail_quote">

              On Fri, Jun 22, 2012 at 8:26 PM, Kostya Serebryany <span dir="ltr"><<a href="mailto:kcc@google.com" target="_blank">kcc@google.com</a>></span>

              wrote:<br>

              <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

                <div style="font-family:arial,helvetica,sans-serif"><font size="2">Any comments? Objections? 

                    <div>May I commit with post-commit review? </div>

                    <div><br>

                    </div>

                    <div>--kcc 

                      <div>

                        <div><br>

                          <br>

                          <div class="gmail_quote">

                            On Tue, Jun 19, 2012 at 11:33 AM, Kostya

                            Serebryany <span dir="ltr"><<a href="mailto:kcc@google.com" target="_blank">kcc@google.com</a>></span>

                            wrote:<br>

                            <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

                              <div style="font-family:arial,helvetica,sans-serif"><font size="2">Hello, 

                                  <div><br>

                                  </div>

                                  <div>Please review the following patch

                                    which adds MemorySanitizer

                                    instrumentation pass. </div>

                                  <div>For the detailed description see <a href="http://lists.cs.uiuc.edu/pipermail/llvmdev/2012-June/050821.html" target="_blank">http://lists.cs.uiuc.edu/pipermail/llvmdev/2012-June/050821.html</a></div>

                                  <div>The patch contains

                                    instrumentation pass itself and

                                    basic LLVM-ish tests. </div>

                                  <div>Clang flag (-fmemory-sanitizer),

                                    the run-time library and more tests

                                    will follow. </div>

                                  <div><br>

                                  </div>

                                  <div>Code review URL: <a href="http://codereview.appspot.com/6298091/" target="_blank">http://codereview.appspot.com/6298091/</a></div>

                                  <div><br>

                                  </div>

                                  <div>Thanks, </div>

                                  <div><br>

                                  </div>

                                  <div>--kcc </div>

                                </font></div>

                            </blockquote>

                          </div>

                          <br>

                        </div>

                      </div>

                    </div>

                  </font></div>

              </blockquote>

            </div>

            <br>

          </div>

        </font></div>

      <br>

      <fieldset></fieldset>

      <br>

      </div><pre>_______________________________________________

llvm-commits mailing list

<a href="mailto:llvm-commits@cs.uiuc.edu" target="_blank">llvm-commits@cs.uiuc.edu</a>

<a href="http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits" target="_blank">http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits</a>

</pre>

    </blockquote>

    <br>

    <br>

  </div>

</blockquote></div></div><br></font></div>

</blockquote></div><br></div></font></div>