<br><br><div class="gmail_quote">On Fri, Mar 30, 2012 at 11:49 AM, John Criswell <span dir="ltr"><<a href="mailto:criswell@illinois.edu">criswell@illinois.edu</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

  <div bgcolor="#FFFFFF" text="#000000"><div class="im">

    On 3/30/12 1:08 PM, Raphael Ernani Rodrigues wrote:

    <blockquote type="cite">

      <span>Dear LLVMers,</span><br>

      <br>

      <span>   My name is Raphael Ernani, and I am doing my MsC

        at the Federal</span><br>

      <span>University of Minas Gerais, Brazil. I have been

        using LLVM for a</span><br>

      <span>while, and I would like to participate in this

        year's Summer of Code.</span><br>

      <span>One particular idea, in your "open projects" page

        caught my eye, and I</span><br>

      <span>decided to write a proposal about it. The line that

        I liked in the</span><br>

      <span>page was "Create an LLVM pass that adds memory

        safety checks to code,</span><br>

      <span>like Valgrind does for binaries, or like mudflap

        does for gcc compiled</span><br>

      <span>code.", and my proposal is written below:</span></blockquote>

    <br></div>

    Hi Raphael!<br>

    <br>

    I haven't read your proposal in detail, but there are actually three

    projects that implement memory safety by transforming LLVM bitcode:<br>

    <br>

    1) ASAN is integrated into LLVM.  It is meant as a low-overhead

    debugging tool in the spirit of Valgrind.  It's primary technique is

    to place guard memory in between memory objects and check loads and

    stores to see if they return a poisoned bit-patter from a guard.  It

    isn't sound (certain dangling pointer errors and array bound

    violations can get past it), but it's easy to implement.<br>

    <br>

    2) SAFECode (my project: <a href="http://safecode.cs.illinois.edu" target="_blank">http://safecode.cs.illinois.edu</a>) aims to

    provide sound memory safety checks and is a sub-project of LLVM.  It

    inserts function pointer checks, precise structure and array bounds

    checks, load/store checks, and checks on C library function calls

    into code.  It is also designed to work with automatic pool

    allocation to provide sound points-to analysis and partial

    type-safety, but these features are currently disabled because they

    are not sufficiently robust.  While initially designed to protect

    applications during production, SAFECode has a generic pass to add

    debug information to its run-time checks, essentially make it a

    valgrind replacement like ASAN.<br>

    <br>

    3) SoftBound and its CETS extension

    (<a href="http://www.cis.upenn.edu/acg/softbound" target="_blank">http://www.cis.upenn.edu/acg/softbound</a>) have been integrated into

    the SAFECode compiler and can be enabled with options in SAFECode's

    clang.  It provides array bounds checking and, with CETS, optional

    dangling pointer detection.<br>

    <br>

    While I don't think we need another memory safety checker for LLVM,

    I do think that there's *a lot* of interesting projects in making

    each of these tools better.  Some ideas that come to mind:<br>

    <br>

    o Creating common instrumentation passes for ASAN, SAFECode, and

    SoftBound to permit code reuse.  All three essentially place checks

    on loads, stores, and atomic operations.  If they use the same

    run-time check names, then the same optimizations can improve the

    speed of all three.<br>

    <br>

    o Creating or improving optimizations for memory safety checks. 

    Static array bounds checking alone provides numerous options, such

    as implementing ABCD, using value-range analysis (mentioned in

    another GSoC post), investigating SMT solver-based techniques, etc,

    etc.  Other optimizations include hoisting checks out of loops,

    polishing and re-enabling the SAFECode type-safety optimization, and

    removing redundant load/store checks.<br></div></blockquote><div><br></div><div>"removing redundant load/store checks" could be quite useful for both</div><div>AddressSanitizer (<a href="http://clang.llvm.org/docs/AddressSanitizer.html">http://clang.llvm.org/docs/AddressSanitizer.html</a>) and</div>

<div>ThreadSanitizer (<a href="http://code.google.com/p/data-race-test/wiki/ThreadSanitizer2">http://code.google.com/p/data-race-test/wiki/ThreadSanitizer2</a>). </div><div>Some basic examples here: <a href="http://code.google.com/p/address-sanitizer/wiki/CompileTimeOptimizations">http://code.google.com/p/address-sanitizer/wiki/CompileTimeOptimizations</a></div>

<div>I am looking into having some simple and human verifiable analysis with lots of unit tests, </div><div>rather than something heavyweight and exhaustive.</div><div><br></div><div>--kcc </div><div><br></div><div> </div>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div bgcolor="#FFFFFF" text="#000000">

    <br>

    o Polishing and enabling the Baggy Bounds Checking (BBC) run-time. 

    This run-time can speed up the run-time checks and should be faster

    than the splay-tree approach that SAFECode currently uses.<br>

    <br>

    I'll go ahead and create a list of open projects on the SAFECode web

    page (<a href="http://sva.cs.illinois.edu" target="_blank">http://sva.cs.illinois.edu</a>), and I'll remove the "Valgrind"

    tool project off the open projects page as there are now several

    memory safety tools built using LLVM.  If you want my opinion, I

    think the static array bounds checker or the monotonic loop

    optimization make nice, self-contained projects.<br>

    <br>

    Finally, you might be interested in the Memory Safety Menagerie

    (<a href="http://sva.cs.illinois.edu/menagerie/" target="_blank">http://sva.cs.illinois.edu/menagerie/</a>).  This web page contains a

    whole list of papers on the subject of memory safety transforms.<br>

    <br>

    -- John T.<br>

    <br>

    <blockquote type="cite"><div><div class="h5">

      <div><span><br>

        </span></div>

      <div><span><br>

        </span></div>

      <div><span>==============================</span><span>==================</span> <br>

        <span>Adding memory safety checks to the LLVM bitcodes</span><br>

        <span>==============================</span><span>==================</span><br>

        <br>

        <br>

        <span>Objective</span><br>

        <span>---------</span><br>

        <br>

        <span>The objective of this project is to create an

          LLVM pass that adds</span><br>

        <span>memory safety checks to code, like Valgrind does

          for binaries, or like</span><br>

        <span>mudflap does for gcc compiled code.</span><br>

        <br>

        <br>

        <span>How it will be done</span><br>

        <span>-------------------</span><br>

        <br>

        <span>We will achieve our goal by creating a LLVM pass

          that does the following:</span><br>

        <span>1 - Propagate the array size to the sites in the

          program where the</span><br>

        <span>array is accessed.</span><br>

        <span>2 - Insert code in the program to make a

          bound-check and throw an</span><br>

        <span>exception if the program tries to access a

          position out of the array</span><br>

        <span>bounds.</span><br>

        <span>3 - Eliminate redundant bound-checks.</span><br>

        <br>

        <br>

        <span>Background</span><br>

        <span>----------</span><br>

        <br>

        <span>Invalid memory access is a major source of

          program failures. If a</span><br>

        <span>program statement dereferences a pointer that

          points to an invalid</span><br>

        <span>memory cell, the program is either aborted by the

          operating system or,</span><br>

        <span>often worse, the program continues to run with an

          undefined behavior.</span><br>

        <span>To avoid the latter, one can perform checks

          before every memory access</span><br>

        <span>at run time. For some programming languages

          (e.g., Java) this is done</span><br>

        <span>automatically by the compiler/run-time

          environment. For the language</span><br>

        <span>C, neither the compiler nor the run-time

          environment enforces</span><br>

        <span>memory-safety policies [1].</span><br>

        <br>

        <span>An example of access violation is given by the

          program below. When</span><br>

        <span>this program runs, probably no visible error

          happens. However, the</span><br>

        <span>program violates the semantics of well defined C

          code: it is accessing</span><br>

        <span>the 11-th cell of an array that was declared with

          only 10 cells. Thus,</span><br>

        <span>the behavior of this program is undefined.</span><br>

        <br>

        <span>#include <stdlib.h></span><br>

        <span>#include <stdio.h></span><br>

        <span>int main(int argc, char** argv){</span><br>

        <span>       int i;</span><br>

        <span>       int* A;</span><br>

        <span>       A = (int*)calloc(10,sizeof(int));</span><br>

        <span>       for(i=0;i<=11;i++)</span><br>

        <span>               A[i] = i;</span><br>

        <span>       printf("%d\n",A[11]);</span><br>

        <span>       return 0;</span><br>

        <span>}</span><br>

        <br>

        <span>This type of behavior can be the source of some

          very hard-to-find</span><br>

        <span>bugs. If no explicit error occurs, other

          variables can have their</span><br>

        <span>values changed without any reasonable

          explanation. Finding and fixing</span><br>

        <span>this type of bug can be really hard, especially

          if the program runs</span><br>

        <span>with more than one active thread. There exists

          tools that have been</span><br>

        <span>built specially to track access violations like

          the one in the program</span><br>

        <span>above. Valgrind is probably the best known

          example. These tools are</span><br>

        <span>dynamic: they are used during the execution of

          the program, either</span><br>

        <span>emulating it, or instrumenting it to catch errors

          at runtime. Tools</span><br>

        <span>like Valgrind and mudflap are very useful;

          however, LLVM does not have</span><br>

        <span>any similar function yet. Our goal is to help

          filling this gap.</span><br>

        <br>

        <br>

        <span>How we plan to improve the code</span><br>

        <span>------------------------------</span><span>-</span><br>

        <br>

        <span>Before every read or write in a position of an

          array, our pass will</span><br>

        <span>add instructions to check if the index of the

          array is outside the</span><br>

        <span>bounds.</span><br>

        <span>The secured program, without optimizations, is

          shown below:</span><br>

        <br>

        <span>#include <stdlib.h></span><br>

        <span>#include <stdio.h></span><br>

        <span>int main(int argc, char** argv){</span><br>

        <span>       int i, tmp0;</span><br>

        <span>       int* A;</span><br>

        <span>       A = (int*)calloc(10,sizeof(int));</span><br>

        <span>       for(i=0;i<=11;i++){</span><br>

        <span>               if (i<0 or i>10)

          raise("Array out of bounds.");</span><br>

        <span>               A[i] = i;</span><br>

        <span>       }</span><br>

        <span>       tmp0 = 11;</span><br>

        <span>       if (tmp0<0 or tmp0>10) raise("Array

          out of bounds.");</span><br>

        <span>       printf("%d\n",A[11]);</span><br>

        <span>       return 0;</span><br>

        <span>}</span><br>

        <br>

        <span>In order to accomplish this object, the new LLVM

          pass will have to</span><br>

        <span>perform a number of actions:</span><br>

        <span>1) Collect the allocation sizes of each array.

          This can be done at</span><br>

        <span>runtime, simply reading the arguments passed to

          the allocation</span><br>

        <span>function.</span><br>

        <span>2) Store the size of the array together with the

          allocated area. This</span><br>

        <span>can also be done dynamically, increasing in 32

          bits the size of the</span><br>

        <span>allocated area that is given to the array base

          address.</span><br>

        <span>3) Prefix any array access code with a bounds

          check to ensure that the</span><br>

        <span>used index is inside the correct limits.</span><br>

        <br>

        <span>After this, to improve the performance of the

          "safe programs", I can</span><br>

        <span>remove redundant checks, by not adding new checks

          where already exist</span><br>

        <span>a safety check. I can use two different abstract

          domains to accomplish</span><br>

        <span>this goal, a simpler, and another more complex

          one:</span><br>

        <br>

        <span>1) The domain of "less-than" names. This is the

          domain used in the</span><br>

        <span>ABCD array bounds check algorithm [2]. For each

          variable v is gives</span><br>

        <span>the set of variables that are provably less than

          v. It is relatively</span><br>

        <span>simple - in terms of implementation - to compute

          this domain; however,</span><br>

        <span>this computation is costly: it is cubic on the

          number of variables in</span><br>

        <span>the program. Nevertheless, I can implement the

          domain, to see if it</span><br>

        <span>pays off its costs. There was an implementation

          of ABCD in LLVM 2.7,</span><br>

        <span>and I can reuse a substantial part of that code.</span><br>

        <br>

        <span>2) The domain of intervals. For each variable v,

          this domain gives an</span><br>

        <span>interval [l, u] with the minimum and maximum

          values that v can assume</span><br>

        <span>during the program execution. I can use the

          implementation of range</span><br>

        <span>analysis that we have developed in the Federal

          University of Minas</span><br>

        <span>Gerais to compute this domain. The implementation

          is quite mature now,</span><br>

        <span>and I was one of the developers that worked on

          it.</span><br>

        <br>

        <span>Linked programs</span><br>

        <span>----------------</span><br>

        <br>

        <span>In order to be effective, the instrumentation

          pass should be a whole</span><br>

        <span>program analysis. However, even in this case,

          only the code that has</span><br>

        <span>been compiled with the pass will be secure. It is

          possible that</span><br>

        <span>library calls still lead to undefined behavior,

          like in the program</span><br>

        <span>below:</span><br>

        <br>

        <span>// We do not have access to this code, because it

          is an external library:</span><br>

        <span>void foo(int* A, int i){</span><br>

        <span>       A[i] = i;</span><br>

        <span>}</span><br>

        <br>

        <span>// We have secured this code with our

          instrumentation:</span><br>

        <span>#include <stdlib.h></span><br>

        <span>#include <stdio.h></span><br>

        <span>#include "A.h"</span><br>

        <span>int main(int argc, char** argv){</span><br>

        <span>       int i;</span><br>

        <span>       int* A;</span><br>

        <span>       A = (int*)calloc(10,sizeof(int));</span><br>

        <span>       for(i=0;i<=11;i++){</span><br>

        <span>               foo(A,i);</span><br>

        <span>       }</span><br>

        <span>       printf("%d\n",A[11]);</span><br>

        <span>       return 0;</span><br>

        <span>}</span><br>

        <br>

        <span>Timeline</span><br>

        <span>--------</span><br>

        <br>

        <span>This is a three months project, and we would like

          to schedule the time</span><br>

        <span>in the following way (schedule given in weeks):</span><br>

        <br>

        <span>[1-4]: Change the allocation scheme used by LLVM

          to store the array</span><br>

        <span>size next to the area that is allocated to the

          array.</span><br>

        <br>

        <span>[5-6] Write the pass to insert access checks

          before array accesses.</span><br>

        <br>

        <span>[7-10] Improve the performance of safe programs.</span><br>

        <span>- Avoid inserting bound-checks where already

          exists a memory check.</span><br>

        <span>- Avoid inserting bound-checks where the

          "less-than" domain indicates</span><br>

        <span>that the memory access is safe.</span><br>

        <br>

        <span>[11-12] Final tests and report</span><br>

        <span>- Correctly compile the whole LLVM test suite.</span><br>

        <span>- Generate performance numbers.</span><br>

        <span>- Write the final report.</span><br>

        <br>

        <br>

        <span>My background</span><br>

        <span>-------------</span><br>

        <br>

        <span>I am currently a first year master student at the

          Federal University</span><br>

        <span>of Minas Gerais, under orientation of professor

          Fernando Pereira. I</span><br>

        <span>have graduated in this school on the year of

          2011. I've worked with</span><br>

        <span>software development for five years, in different

          projects. I built a</span><br>

        <span>Data Warehouse viewer and Business Intelligence

          graphics and gauges</span><br>

        <span>using Delphi; Commercial programs in MS Visual

          FoxPro; Improvements in</span><br>

        <span>a logistics planning system, in Excel VBA. I have

          five years of</span><br>

        <span>experience with SQL. Now, I'm working with LLVM,

          writing passes to</span><br>

        <span>support scientifical researchers.</span><br>

        <br>

        <br>

        <span>Why I can do well in this project</span><br>

        <span>------------------------------</span><span>---</span><br>

        <br>

        <span>I believe that I am a good candidate for this

          Summer of Code for four</span><br>

        <span>reasons. Firstly, I am a master student of

          computer science in the</span><br>

        <span>Federal</span><br>

        <span>University of Minas Gerais (UFMG), which is one

          of the best schools of</span><br>

        <span>computer science in Brazil. To support this

          claim, notice that UFMG</span><br>

        <span>has the maximum grade in the ranking of CAPES

          (Coordenação de</span><br>

        <span>Aperfeiçoamento de Pessoal de Nivel Superior -

          the Evaluation Body</span><br>

        <span>under the Ministry of Education). I am working in

          the compiler's lab,</span><br>

        <span>under the advice of Fernando Pereira, who was a

          Summer of Coder</span><br>

        <span>himself in 2006. Second, I have good experience

          with programming, with</span><br>

        <span>almost six years of experience in many languages

          (Java, C, C++, C#,</span><br>

        <span>VB, VBA, Object Pascal, SQL, etc...). Third, I'm

          already working with</span><br>

        <span>LLVM passes. I already know some LLVM tools and I

          already have the</span><br>

        <span>basic knowledge to start this project. I have

          written a pass to</span><br>

        <span>instrument LLVM bitcodes, in order to find the

          minimum and maximum</span><br>

        <span>values that each variable assumes at runtime.

          This code has been used</span><br>

        <span>to check the precision of our range analysis. It

          is publicly available</span><br>

        <span>at (</span><a href="http://code.google.com/p/range-analysis/source/browse/#svn%2Ftrunk%2Fsrc%2FRAInstrumentation" target="_blank">http://code.google.com/p/range-analysis/source/browse/#svn%2Ftrunk%2Fsrc%2FRAInstrumentation</a><span>). </span></div>

      <div><span>Finally, in the lab where I work there are six

          other people</span></div>

      <div><span>who work everyday with LLVM; thus, in addition

          to getting help in the</span><br>

        <span>forum, I can easily talk about my project to my

          colleagues.</span><br>

        <br>

        <span>References</span><br>

        <span>----------</span><br>

        <br>

        <span>[1] Dirk Beyer, Thomas A. Henzinger, Ranjit Jhala

          and Rupak Majumdar.</span><br>

        <span>Checking Memory Safety with Blast. FASE 2005,

          LNCS 3442, pages 2-18,</span><br>

        <span>Springer-Verlag, 2005</span><br>

        <br>

        <span>[2] ABCD: eliminating array bounds checks on

          demand. Rastislav Bodik</span><br>

        <span>and Rajiv Gupta and Vivek Sarkar. PLDI 2000, ACM,

          pages 321-333</span>

      </div>

      <br>

      <fieldset></fieldset>

      <br>

      </div></div><pre>_______________________________________________

LLVM Developers mailing list

<a href="mailto:LLVMdev@cs.uiuc.edu" target="_blank">LLVMdev@cs.uiuc.edu</a>         <a href="http://llvm.cs.uiuc.edu" target="_blank">http://llvm.cs.uiuc.edu</a>

<a href="http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev" target="_blank">http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev</a>

</pre>

    </blockquote>

    <br>

  </div>

<br>_______________________________________________<br>

LLVM Developers mailing list<br>

<a href="mailto:LLVMdev@cs.uiuc.edu">LLVMdev@cs.uiuc.edu</a>         <a href="http://llvm.cs.uiuc.edu" target="_blank">http://llvm.cs.uiuc.edu</a><br>

<a href="http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev" target="_blank">http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev</a><br>

<br></blockquote></div><br>