<html>

  <head>

    <meta content="text/html; charset=ISO-8859-1"

      http-equiv="Content-Type">

  </head>

  <body bgcolor="#FFFFFF" text="#000000">

    On 4/6/12 12:50 AM, Kostya Serebryany wrote:

    <blockquote

cite="mid:CAN=P9phQ-VRe11_nS0YhfWM=T5G=WAiS0XN+qtM4zdJ_JsXthQ@mail.gmail.com"

      type="cite">


      I'd like some similar work to be done, although I view it a bit

      differently. 

      <div>This might be a separate analysis pass that knows nothing

        about ASAN or SAFECode</div>

      <div>and appends metadata nodes to memory access instructions

        saying things like</div>

    </blockquote>

    <br>

    This is a good idea, but metadata is the wrong way to implement it. 

    LLVM passes are not required to preserve metadata, and even if they

    were required to do so, there would always be a pass with a bug that

    would fail to preserve the metadata properly.  It's an approach that

    can lead to undesired headaches.  Furthermore, you're not guaranteed

    that an instruction that was deemed safe earlier is safe after

    transformation; when C code exhibits undefined behavior, LLVM

    optimizations can turn memory-safe code into memory-unsafe

    code.<br>

    <br>

    The correct way to do this is by writing generic LLVM analysis

    passes that compute this information and can be queried by

    SAFECode/ASAN-specific instrumentation and optimization passes.  In

    this fashion, the LLVM pass manager will re-run the analyses if an

    earlier transform comes along and modifies the IR.  It is also far

    more robust since analysis information cannot be discarded by LLVM

    passes written by other people.<br>
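    <br>
    A minimal sketch of this arrangement in C++, assuming toy stand-ins
    for the IR and for the pass manager's bookkeeping. None of the types
    or function names below are LLVM classes; they are hypothetical and
    only illustrate that a client always queries the analysis, and that a
    transform which changes the IR forces the analysis to be recomputed
    instead of leaving a stale annotation behind.<br>
    <br>

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy stand-in for IR (hypothetical, not LLVM): each access uses a
// constant index into a buffer with a known element count.
struct Access {
    std::size_t index;       // constant index being accessed
    std::size_t bufferSize;  // number of elements in the buffer
};

struct Function {
    std::vector<Access> accesses;
};

// Generic analysis: which accesses are provably in bounds?
// It caches its result for a single function and recomputes on demand.
class SafeAccessAnalysis {
public:
    bool isSafe(const Function &f, std::size_t i) {
        if (!valid_) {
            safe_.clear();
            for (const Access &a : f.accesses)
                safe_.push_back(a.index < a.bufferSize);
            valid_ = true;
            ++runs_;
        }
        return safe_.at(i);
    }
    void invalidate() { valid_ = false; }
    int runs() const { return runs_; }  // how often the analysis was computed

private:
    std::vector<bool> safe_;
    bool valid_ = false;
    int runs_ = 0;
};

// A transform that modifies the IR. In this toy model it invalidates the
// analysis itself; this stands in for the pass manager's invalidation step.
void shrinkBuffer(Function &f, std::size_t i, std::size_t newSize,
                  SafeAccessAnalysis &analysis) {
    assert(i < f.accesses.size());
    f.accesses[i].bufferSize = newSize;
    analysis.invalidate();
}

// Instrumentation client: counts the accesses that still need a run-time
// check by querying the analysis rather than reading stored annotations.
std::size_t countChecksNeeded(const Function &f, SafeAccessAnalysis &analysis) {
    std::size_t needed = 0;
    for (std::size_t i = 0; i < f.accesses.size(); ++i)
        if (!analysis.isSafe(f, i))
            ++needed;
    return needed;
}
```

    <br>
    Querying twice without an intervening transform reuses the cached
    result; calling shrinkBuffer between queries makes the next query
    recompute it, so an access that was safe earlier is re-examined.<br>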

    <br>

    <blockquote

cite="mid:CAN=P9phQ-VRe11_nS0YhfWM=T5G=WAiS0XN+qtM4zdJ_JsXthQ@mail.gmail.com"

      type="cite">

      <div>   - this access can not go out of buffer bounds</div>

      <div>   - this access can not touch free-ed memory</div>

      <div>   - this access can not participate in a race </div>

      <div>   - this read always reads initialized memory </div>

      <div>Then the actual instrumentation passes will simply consult

        the metadata. <br>

      </div>

    </blockquote>

    <br>

    One of the design principles I've been trying to follow in

    refactoring SAFECode is that we have dumb instrumentation passes

    that just instrument everything followed by optimization passes that

    remove run-time checks that are unnecessary.  This follows the

    compiler-building principle called Separation of Concerns, and it's

    useful in tools like SAFECode because it allows us to easily turn

    optimizations on/off by running/not running individual passes.  This

    makes performance analysis easier (we can see the effect of an

    optimization by not running a pass), and it makes it possible for

    bugpoint to figure out which optimization is causing a program to

    break.<br>

    <br>

    SAFECode used to have a single instrumentation pass that inserted

    both load/store and GEP checks with various options to

    enable/disable optimizations.  It made the code complex and

    difficult to read.  The new passes are reusable and so blindingly

    simple that a child can understand what they do.  I highly recommend

    that ASAN not make the mistake that SAFECode originally made.<br>
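    <br>
    The split described above can be sketched roughly as follows. This
    is a toy model in C++, not SAFECode or LLVM code: the Inst and
    BasicBlock types and both function names are hypothetical, and the
    optimization assumes that nothing in the block frees or reassigns a
    pointer between two checks of it.<br>
    <br>

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Toy instruction (hypothetical, not an LLVM class).
struct Inst {
    enum Kind { Load, Store, Check } kind;
    std::string pointer;  // name of the pointer operand
};

using BasicBlock = std::vector<Inst>;

// "Dumb" instrumentation pass: insert a check before every load and store,
// with no attempt to decide whether the check is needed.
BasicBlock instrumentAll(const BasicBlock &in) {
    BasicBlock out;
    for (const Inst &inst : in) {
        if (inst.kind == Inst::Load || inst.kind == Inst::Store)
            out.push_back(Inst{Inst::Check, inst.pointer});
        out.push_back(inst);
    }
    return out;
}

// Separate optimization pass: drop a check when the same pointer was
// already checked earlier in the same basic block.
// Assumption: no instruction in the block frees or reassigns the pointer.
BasicBlock removeRedundantChecks(const BasicBlock &in) {
    BasicBlock out;
    std::set<std::string> alreadyChecked;
    for (const Inst &inst : in) {
        if (inst.kind == Inst::Check &&
            !alreadyChecked.insert(inst.pointer).second)
            continue;  // an earlier check covers this pointer
        out.push_back(inst);
    }
    assert(out.size() <= in.size());  // the pass only ever removes checks
    return out;
}

// Helper for measuring the effect of an optimization pass.
std::size_t countChecks(const BasicBlock &bb) {
    std::size_t n = 0;
    for (const Inst &inst : bb)
        if (inst.kind == Inst::Check)
            ++n;
    return n;
}
```

    <br>
    Because the optimization is its own function, its effect can be
    measured by comparing countChecks with and without it, which is the
    same on/off experiment described above for real passes.<br>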

     <br>

    Finally, the common infrastructure idea I was talking about on the

    SAFECode open projects page is to have a common set of run-time

    check function names, plus a set of passes that insert those checks

    and optimize them.  In this way, SAFECode/SoftBound/ASAN can share

    not only the same analysis passes (e.g., an always-safe load/store

    analysis) but the actual optimization and instrumentation passes,

    too.  SAFECode/ASAN specific transforms can be run after the generic

    instrumentation passes to specialize the checks for the specific

    tool (e.g., SAFECode would have a pass that adds pool handles to the

    run-time checks).<br>
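    <br>
    As a rough illustration of the specialization step, here is a toy
    C++ sketch. The check name "memcheck_load", the prefixes, and the
    CheckCall type are all made up for this example; they are not the
    names used by SAFECode, SoftBound, or ASAN.<br>
    <br>

```cpp
#include <cassert>
#include <string>
#include <vector>

// A call to a run-time check (hypothetical type, not an LLVM class).
struct CheckCall {
    std::string callee;             // name of the run-time check function
    std::vector<std::string> args;  // operand names
};

// Generic check as a shared instrumentation pass might emit it.
// "memcheck_load" is an invented name for this sketch.
CheckCall makeGenericLoadCheck(const std::string &pointer) {
    return CheckCall{"memcheck_load", {pointer}};
}

// Tool-specific lowering that only renames the callee.
CheckCall lowerByRenaming(const CheckCall &generic,
                          const std::string &toolPrefix) {
    CheckCall out = generic;
    out.callee = toolPrefix + generic.callee;
    return out;
}

// Tool-specific lowering that also prepends a pool handle argument,
// in the spirit of the pool-handle pass mentioned above.
CheckCall lowerWithPoolHandle(const CheckCall &generic,
                              const std::string &poolHandle) {
    assert(!poolHandle.empty());
    CheckCall out = lowerByRenaming(generic, "pool_");
    out.args.insert(out.args.begin(), poolHandle);
    return out;
}
```

    <br>
    The generic passes only ever see the shared name; each tool's
    lowering runs afterwards and is the only place that knows about
    tool-specific arguments.<br>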

    <br>

    <blockquote

cite="mid:CAN=P9phQ-VRe11_nS0YhfWM=T5G=WAiS0XN+qtM4zdJ_JsXthQ@mail.gmail.com"

      type="cite">

      <div><br>

      </div>

      <div>Equally important would be an exhaustive test suite. </div>

      <div>Not sure if it should be in LLVM IR or in C (if in C, other

        compilers will benefit too).</div>

    </blockquote>

    <br>

    Wilander has a new suite of tests out that might be useful.<br>

    <br>

    -- John T.<br>

    <br>

    <blockquote

cite="mid:CAN=P9phQ-VRe11_nS0YhfWM=T5G=WAiS0XN+qtM4zdJ_JsXthQ@mail.gmail.com"

      type="cite">

      <div><br>

        <div class="gmail_quote">On Thu, Apr 5, 2012 at 6:49 PM, Ott

          Tinn <span dir="ltr">&lt;<a moz-do-not-send="true"

              href="mailto:llvm@otinn.com">llvm@otinn.com</a>&gt;</span>

          wrote:<br>

          <blockquote class="gmail_quote" style="margin:0 0 0

            .8ex;border-left:1px #ccc solid;padding-left:1ex">

            This is a proposal to create memory safety instrumentation

            and<br>

            optimization passes for LLVM.<br>

            <br>

            Abstract:<br>

            The goal of this project is to modify SAFECode and

            AddressSanitizer<br>

            (ASAN) to use a common set of memory safety instrumentation

            and<br>

            optimization passes to increase code reuse. These tools and

            other<br>

            similar ones use varying methods to detect whether memory

            accesses are<br>

            safe, but are fundamentally trying to do the same thing:

            check whether<br>

            each memory access is safe. It is desirable to optimize away

            redundant<br>

            runtime checks to improve such tools' runtime performance.

            This means<br>

            that there is a need for shared memory safety

            instrumentation and<br>

            optimization passes.<br>

            <br>

            <br>

            Proposal:<br>

            The general idea is to make SAFECode and ASAN use the

            following design:<br>

            1. Add checks to memory accesses (loads, stores, and some

            intrinsics).<br>

            2. Run the memory safety check optimization passes.<br>

            3. Transform the remaining checks to tool-specific runtime

            calls.<br>

            4. Do whatever the specific tool did before.<br>

            <br>

            This design would make it possible for SAFECode, ASAN, and

            other<br>

            similar tools to share the memory safety instrumentation and<br>

            optimization passes. The main benefit of the code reuse is

            that the<br>

            memory-safety-specific optimizations could be used by all

            such tools.<br>

            <br>

            The project proposes to modify SAFECode and ASAN as a proof

            of<br>

            concept. It might also be useful to modify SoftBound,

            ThreadSanitizer,<br>

            or some other tool but I have not analysed how

            difficult/useful that<br>

            would be. That is why they are excluded from the current

            proposal.<br>

            <br>

            Implementation plan:<br>

            1. Create the common instrumentation pass.<br>

            2. Add a pass to convert the common checks to ASAN-specific

            ones.<br>

            3. Add a pass to convert the common checks to

            SAFECode-specific ones.<br>

            4. Convert some of the simpler optimizations from SAFECode

            to run on<br>

            the common checks.<br>

            5. Add more optimizations (from SAFECode or otherwise).<br>

            <br>

            The plan is to make sure that it is possible to commit early

            and often<br>

            without breaking anything (unless absolutely needed). The

            conversion<br>

            passes are needed to make the tool work but a side-effect is

            that the<br>

            existing tool-specific optimizations should continue working

            without<br>

            changes.<br>

            <br>

            The "simpler" optimizations are defined to be the ones that

            are easy<br>

            for humans to verify and do not have large extra

            dependencies like<br>

            Poolalloc or SMT solvers.<br>

            <br>

            Optimizations that will definitely be implemented such that

            they work<br>

            on the common memory safety checks (milestone 3 or 4):<br>

            * Remove obviously redundant checks in the same basic block.<br>

            * Remove unnecessary constant checks to global variables /

            allocas.<br>

            * Combine struct member checks in the same basic block.<br>

          </blockquote>

          <div><br>

          </div>

          <div>Beware that some of these cases will be covered by GVN

            (load widening, etc.), although some will not. </div>

          <div>E.g. </div>

          <div>

            struct S {</div>

          <div>  int alignment;</div>

          <div>  short a, b;</div>

          <div>};</div>

          <div> </div>

          <div>S *x;</div>

          <div>...</div>

          <div>x->a = ... </div>

          <div>... = x->b</div>

          <div><br>

          </div>

          <div>These two accesses can be combined for ASAN, but not for

            TSAN. </div>

          <div><br>

          </div>

          <div>  </div>

          <blockquote class="gmail_quote" style="margin:0 0 0

            .8ex;border-left:1px #ccc solid;padding-left:1ex">

            * Hoisting constant checks from loops.<br>

          </blockquote>

          <div><br>

          </div>

          <div>In most cases, this should be handled by general LLVM

            loop invariant code motion. </div>

          <div> </div>

          <blockquote class="gmail_quote" style="margin:0 0 0

            .8ex;border-left:1px #ccc solid;padding-left:1ex">

            * Something more.<br>

            <br>

            An additional plan that outlines the optimizations to be

            added in the<br>

            later part of the program will be produced and agreed upon

            before the<br>

            mid-term evaluations. The general idea is to add slightly

            more<br>

            complicated optimizations that are useful in practice rather

            than<br>

            large and complicated optimizations that are difficult to

            verify by<br>

            humans.<br>

            <br>

            Timeline:<br>

            Milestone 1 (June 1): The common instrumentation pass works

            and there<br>

            are tests to verify it.<br>

            Milestone 2 (June 22): The tool-specific conversion passes

            work and<br>

            there are tests to verify it.<br>

            Milestone 3 (July 6): Some simple optimization passes from

            SAFECode<br>

            work on the common checks; there are unit tests to verify

            that.<br>

            Finished (and agreed upon) a specific plan that outlines

            which<br>

            optimizations will be converted / created for milestone 4.<br>

            Mid-term evaluations deadline (July 13)<br>

            Milestone 4 (August 13): Added more optimizations (and

            relevant unit<br>

            tests) as specified in the additional plan.<br>

            Firm 'pencils down' date (August 20): More testing and

            documentation.<br>

            <br>

            Basically the idea is to produce something practically

            useful and<br>

            thoroughly tested that will definitely be done in time.<br>

            <br>

            <br>

            Contact information:<br>

            Included in the official submission.<br>

            <br>

            <br>

            Interesting to me:<br>

            I am generally interested in developing bug

            finding/detecting systems<br>

            but this project would also have been useful to me for a

            project I<br>

            completed previously (see the experience section). I have

            previously<br>

            used SAFECode for automatically checking whether a program

            has a<br>

            buffer overflow on a specific run.  I was interested in

            reusing the<br>

            static memory safety optimization parts of SAFECode but it

            seemed to<br>

            be too tightly integrated to be easily reused for my

            purposes.<br>

            <br>

            <br>

            Useful for LLVM:<br>

            This project would be useful for LLVM in general because it

            would make<br>

            it easier to develop memory safety tools based on LLVM

            because of the<br>

            available relevant transforms. Reducing the amount of code

            each<br>

            subproject has to add should make it more likely that the

            subprojects<br>

            stay compatible with the latest LLVM changes.<br>

            <br>

            It would be useful for ASAN mostly because the optimizations

            should<br>

            reduce the runtime overhead.<br>

            <br>

            It would be useful for SAFECode because the code should

            become a bit<br>

            more modular and there should be more code reuse. The extra

            testing<br>

            and shared code should make it easier to keep up with the

            changes in<br>

            LLVM because there would be more people who are interested

            in that<br>

            being the case.<br>

            <br>

            It would be useful for both ASAN and SAFECode because

            optimizations<br>

            based on the common instrumentation would be useful for both

            of them.<br>

            <br>

            <br>

            Relevant experience:<br>

            I created a tool based on LLVM and KLEE that aimed to

            optimize a<br>

            specific type of C++ programs such that they would crash on

            exactly<br>

            the same inputs as before the optimizations. This made the

            system find<br>

            inputs on which the programs crashed faster than before.

            Most of the<br>

            project was about creating LLVM passes that might make the

            bug finding<br>

            process faster while retaining that property.<br>

            <br>

            One part of the system was adding and later removing memory

            safety<br>

            checks. That was necessary because a significant part of the

            code<br>

            became otherwise unused after aggressively transforming /

            essentially<br>

            removing output calls (printf, the cout stream, etc.) and

            the aim was<br>

            to still detect invalid but unused memory accesses.<br>

            <br>

            I successfully participated in GSoC 2011 by creating an AI

            player for<br>

            an open source RTS game, Unknown Horizons (written in

            Python). I have<br>

            continued to contribute to that project so far.<br>

            _______________________________________________<br>

            LLVM Developers mailing list<br>

            <a moz-do-not-send="true" href="mailto:LLVMdev@cs.uiuc.edu">LLVMdev@cs.uiuc.edu</a>

                    <a moz-do-not-send="true"

              href="http://llvm.cs.uiuc.edu" target="_blank">http://llvm.cs.uiuc.edu</a><br>

            <a moz-do-not-send="true"

              href="http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev"

              target="_blank">http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev</a><br>

          </blockquote>

        </div>

        <br>

      </div>

      <br>


    </blockquote>

    <br>

  </body>

</html>