<br><br><div class="gmail_quote">On Fri, Mar 30, 2012 at 11:49 AM, John Criswell <span dir="ltr"><<a href="mailto:criswell@illinois.edu">criswell@illinois.edu</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000"><div class="im">
On 3/30/12 1:08 PM, Raphael Ernani Rodrigues wrote:
<blockquote type="cite">
<span>Dear LLVMers,</span><br>
<br>
<span> My name is Raphael Ernani, and I am doing my MsC
at the Federal</span><br>
<span>University of Minas Gerais, Brazil. I have been
using LLVM for a</span><br>
<span>while, and I would like to participate in this
year's Summer of Code.</span><br>
<span>One particular idea, in your "open projects" page
caught my eye, and I</span><br>
<span>decided to write a proposal about it. The line that
I liked in the</span><br>
<span>page was "Create an LLVM pass that adds memory
safety checks to code,</span><br>
<span>like Valgrind does for binaries, or like mudflap
does for gcc compiled</span><br>
<span>code.", and my proposal is written below:</span></blockquote>
<br></div>
Hi Raphael!<br>
<br>
I haven't read your proposal in detail, but there are actually three
projects that implement memory safety by transforming LLVM bitcode:<br>
<br>
1) ASAN is integrated into LLVM. It is meant as a low-overhead
debugging tool in the spirit of Valgrind. It's primary technique is
to place guard memory in between memory objects and check loads and
stores to see if they return a poisoned bit-patter from a guard. It
isn't sound (certain dangling pointer errors and array bound
violations can get past it), but it's easy to implement.<br>
<br>
2) SAFECode (my project: <a href="http://safecode.cs.illinois.edu" target="_blank">http://safecode.cs.illinois.edu</a>) aims to
provide sound memory safety checks and is a sub-project of LLVM. It
inserts function pointer checks, precise structure and array bounds
checks, load/store checks, and checks on C library function calls
into code. It is also designed to work with automatic pool
allocation to provide sound points-to analysis and partial
type-safety, but these features are currently disabled because they
are not sufficiently robust. While initially designed to protect
applications during production, SAFECode has a generic pass to add
debug information to its run-time checks, essentially make it a
valgrind replacement like ASAN.<br>
<br>
3) SoftBound and its CETS extension
(<a href="http://www.cis.upenn.edu/acg/softbound" target="_blank">http://www.cis.upenn.edu/acg/softbound</a>) have been integrated into
the SAFECode compiler and can be enabled with options in SAFECode's
clang. It provides array bounds checking and, with CETS, optional
dangling pointer detection.<br>
<br>
While I don't think we need another memory safety checker for LLVM,
I do think that there's *a lot* of interesting projects in making
each of these tools better. Some ideas that come to mind:<br>
<br>
o Creating common instrumentation passes for ASAN, SAFECode, and
SoftBound to permit code reuse. All three essentially place checks
on loads, stores, and atomic operations. If they use the same
run-time check names, then the same optimizations can improve the
speed of all three.<br>
<br>
o Creating or improving optimizations for memory safety checks.
Static array bounds checking alone provides numerous options, such
as implementing ABCD, using value-range analysis (mentioned in
another GSoC post), investigating SMT solver-based techniques, etc,
etc. Other optimizations include hoisting checks out of loops,
polishing and re-enabling the SAFECode type-safety optimization, and
removing redundant load/store checks.<br></div></blockquote><div><br></div><div>"removing redundant load/store checks" could be quite useful for both</div><div>AddressSanitizer (<a href="http://clang.llvm.org/docs/AddressSanitizer.html">http://clang.llvm.org/docs/AddressSanitizer.html</a>) and</div>
<div>ThreadSanitizer (<a href="http://code.google.com/p/data-race-test/wiki/ThreadSanitizer2">http://code.google.com/p/data-race-test/wiki/ThreadSanitizer2</a>). </div><div>Some basic examples here: <a href="http://code.google.com/p/address-sanitizer/wiki/CompileTimeOptimizations">http://code.google.com/p/address-sanitizer/wiki/CompileTimeOptimizations</a></div>
<div>I am looking into having some simple and human verifiable analysis with lots of unit tests, </div><div>rather than something heavyweight and exhaustive.</div><div><br></div><div>--kcc </div><div><br></div><div> </div>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div bgcolor="#FFFFFF" text="#000000">
<br>
o Polishing and enabling the Baggy Bounds Checking (BBC) run-time.
This run-time can speed up the run-time checks and should be faster
than the splay-tree approach that SAFECode currently uses.<br>
<br>
I'll go ahead and create a list of open projects on the SAFECode web
page (<a href="http://sva.cs.illinois.edu" target="_blank">http://sva.cs.illinois.edu</a>), and I'll remove the "Valgrind"
tool project off the open projects page as there are now several
memory safety tools built using LLVM. If you want my opinion, I
think the static array bounds checker or the monotonic loop
optimization make nice, self-contained projects.<br>
<br>
Finally, you might be interested in the Memory Safety Menagerie
(<a href="http://sva.cs.illinois.edu/menagerie/" target="_blank">http://sva.cs.illinois.edu/menagerie/</a>). This web page contains a
whole list of papers on the subject of memory safety transforms.<br>
<br>
-- John T.<br>
<br>
<blockquote type="cite"><div><div class="h5">
<div><span><br>
</span></div>
<div><span><br>
</span></div>
<div><span>==============================</span><span>==================</span> <br>
<span>Adding memory safety checks to the LLVM bitcodes</span><br>
<span>==============================</span><span>==================</span><br>
<br>
<br>
<span>Objective</span><br>
<span>---------</span><br>
<br>
<span>The objective of this project is to create an
LLVM pass that adds</span><br>
<span>memory safety checks to code, like Valgrind does
for binaries, or like</span><br>
<span>mudflap does for gcc compiled code.</span><br>
<br>
<br>
<span>How it will be done</span><br>
<span>-------------------</span><br>
<br>
<span>We will achieve our goal by creating a LLVM pass
that does the following:</span><br>
<span>1 - Propagate the array size to the sites in the
program where the</span><br>
<span>array is accessed.</span><br>
<span>2 - Insert code in the program to make a
bound-check and throw an</span><br>
<span>exception if the program tries to access a
position out of the array</span><br>
<span>bounds.</span><br>
<span>3 - Eliminate redundant bound-checks.</span><br>
<br>
<br>
<span>Background</span><br>
<span>----------</span><br>
<br>
<span>Invalid memory access is a major source of
program failures. If a</span><br>
<span>program statement dereferences a pointer that
points to an invalid</span><br>
<span>memory cell, the program is either aborted by the
operating system or,</span><br>
<span>often worse, the program continues to run with an
undefined behavior.</span><br>
<span>To avoid the latter, one can perform checks
before every memory access</span><br>
<span>at run time. For some programming languages
(e.g., Java) this is done</span><br>
<span>automatically by the compiler/run-time
environment. For the language</span><br>
<span>C, neither the compiler nor the run-time
environment enforces</span><br>
<span>memory-safety policies [1].</span><br>
<br>
<span>An example of access violation is given by the
program below. When</span><br>
<span>this program runs, probably no visible error
happens. However, the</span><br>
<span>program violates the semantics of well defined C
code: it is accessing</span><br>
<span>the 11-th cell of an array that was declared with
only 10 cells. Thus,</span><br>
<span>the behavior of this program is undefined.</span><br>
<br>
<span>#include <stdlib.h></span><br>
<span>#include <stdio.h></span><br>
<span>int main(int argc, char** argv){</span><br>
<span> int i;</span><br>
<span> int* A;</span><br>
<span> A = (int*)calloc(10,sizeof(int));</span><br>
<span> for(i=0;i<=11;i++)</span><br>
<span> A[i] = i;</span><br>
<span> printf("%d\n",A[11]);</span><br>
<span> return 0;</span><br>
<span>}</span><br>
<br>
<span>This type of behavior can be the source of some
very hard-to-find</span><br>
<span>bugs. If no explicit error occurs, other
variables can have their</span><br>
<span>values changed without any reasonable
explanation. Finding and fixing</span><br>
<span>this type of bug can be really hard, especially
if the program runs</span><br>
<span>with more than one active thread. There exists
tools that have been</span><br>
<span>built specially to track access violations like
the one in the program</span><br>
<span>above. Valgrind is probably the best known
example. These tools are</span><br>
<span>dynamic: they are used during the execution of
the program, either</span><br>
<span>emulating it, or instrumenting it to catch errors
at runtime. Tools</span><br>
<span>like Valgrind and mudflap are very useful;
however, LLVM does not have</span><br>
<span>any similar function yet. Our goal is to help
filling this gap.</span><br>
<br>
<br>
<span>How we plan to improve the code</span><br>
<span>------------------------------</span><span>-</span><br>
<br>
<span>Before every read or write in a position of an
array, our pass will</span><br>
<span>add instructions to check if the index of the
array is outside the</span><br>
<span>bounds.</span><br>
<span>The secured program, without optimizations, is
shown below:</span><br>
<br>
<span>#include <stdlib.h></span><br>
<span>#include <stdio.h></span><br>
<span>int main(int argc, char** argv){</span><br>
<span> int i, tmp0;</span><br>
<span> int* A;</span><br>
<span> A = (int*)calloc(10,sizeof(int));</span><br>
<span> for(i=0;i<=11;i++){</span><br>
<span> if (i<0 or i>10)
raise("Array out of bounds.");</span><br>
<span> A[i] = i;</span><br>
<span> }</span><br>
<span> tmp0 = 11;</span><br>
<span> if (tmp0<0 or tmp0>10) raise("Array
out of bounds.");</span><br>
<span> printf("%d\n",A[11]);</span><br>
<span> return 0;</span><br>
<span>}</span><br>
<br>
<span>In order to accomplish this object, the new LLVM
pass will have to</span><br>
<span>perform a number of actions:</span><br>
<span>1) Collect the allocation sizes of each array.
This can be done at</span><br>
<span>runtime, simply reading the arguments passed to
the allocation</span><br>
<span>function.</span><br>
<span>2) Store the size of the array together with the
allocated area. This</span><br>
<span>can also be done dynamically, increasing in 32
bits the size of the</span><br>
<span>allocated area that is given to the array base
address.</span><br>
<span>3) Prefix any array access code with a bounds
check to ensure that the</span><br>
<span>used index is inside the correct limits.</span><br>
<br>
<span>After this, to improve the performance of the
"safe programs", I can</span><br>
<span>remove redundant checks, by not adding new checks
where already exist</span><br>
<span>a safety check. I can use two different abstract
domains to accomplish</span><br>
<span>this goal, a simpler, and another more complex
one:</span><br>
<br>
<span>1) The domain of "less-than" names. This is the
domain used in the</span><br>
<span>ABCD array bounds check algorithm [2]. For each
variable v is gives</span><br>
<span>the set of variables that are provably less than
v. It is relatively</span><br>
<span>simple - in terms of implementation - to compute
this domain; however,</span><br>
<span>this computation is costly: it is cubic on the
number of variables in</span><br>
<span>the program. Nevertheless, I can implement the
domain, to see if it</span><br>
<span>pays off its costs. There was an implementation
of ABCD in LLVM 2.7,</span><br>
<span>and I can reuse a substantial part of that code.</span><br>
<br>
<span>2) The domain of intervals. For each variable v,
this domain gives an</span><br>
<span>interval [l, u] with the minimum and maximum
values that v can assume</span><br>
<span>during the program execution. I can use the
implementation of range</span><br>
<span>analysis that we have developed in the Federal
University of Minas</span><br>
<span>Gerais to compute this domain. The implementation
is quite mature now,</span><br>
<span>and I was one of the developers that worked on
it.</span><br>
<br>
<span>Linked programs</span><br>
<span>----------------</span><br>
<br>
<span>In order to be effective, the instrumentation
pass should be a whole</span><br>
<span>program analysis. However, even in this case,
only the code that has</span><br>
<span>been compiled with the pass will be secure. It is
possible that</span><br>
<span>library calls still lead to undefined behavior,
like in the program</span><br>
<span>below:</span><br>
<br>
<span>// We do not have access to this code, because it
is an external library:</span><br>
<span>void foo(int* A, int i){</span><br>
<span> A[i] = i;</span><br>
<span>}</span><br>
<br>
<span>// We have secured this code with our
instrumentation:</span><br>
<span>#include <stdlib.h></span><br>
<span>#include <stdio.h></span><br>
<span>#include "A.h"</span><br>
<span>int main(int argc, char** argv){</span><br>
<span> int i;</span><br>
<span> int* A;</span><br>
<span> A = (int*)calloc(10,sizeof(int));</span><br>
<span> for(i=0;i<=11;i++){</span><br>
<span> foo(A,i);</span><br>
<span> }</span><br>
<span> printf("%d\n",A[11]);</span><br>
<span> return 0;</span><br>
<span>}</span><br>
<br>
<span>Timeline</span><br>
<span>--------</span><br>
<br>
<span>This is a three months project, and we would like
to schedule the time</span><br>
<span>in the following way (schedule given in weeks):</span><br>
<br>
<span>[1-4]: Change the allocation scheme used by LLVM
to store the array</span><br>
<span>size next to the area that is allocated to the
array.</span><br>
<br>
<span>[5-6] Write the pass to insert access checks
before array accesses.</span><br>
<br>
<span>[7-10] Improve the performance of safe programs.</span><br>
<span>- Avoid inserting bound-checks where already
exists a memory check.</span><br>
<span>- Avoid inserting bound-checks where the
"less-than" domain indicates</span><br>
<span>that the memory access is safe.</span><br>
<br>
<span>[11-12] Final tests and report</span><br>
<span>- Correctly compile the whole LLVM test suite.</span><br>
<span>- Generate performance numbers.</span><br>
<span>- Write the final report.</span><br>
<br>
<br>
<span>My background</span><br>
<span>-------------</span><br>
<br>
<span>I am currently a first year master student at the
Federal University</span><br>
<span>of Minas Gerais, under orientation of professor
Fernando Pereira. I</span><br>
<span>have graduated in this school on the year of
2011. I've worked with</span><br>
<span>software development for five years, in different
projects. I built a</span><br>
<span>Data Warehouse viewer and Business Intelligence
graphics and gauges</span><br>
<span>using Delphi; Commercial programs in MS Visual
FoxPro; Improvements in</span><br>
<span>a logistics planning system, in Excel VBA. I have
five years of</span><br>
<span>experience with SQL. Now, I'm working with LLVM,
writing passes to</span><br>
<span>support scientifical researchers.</span><br>
<br>
<br>
<span>Why I can do well in this project</span><br>
<span>------------------------------</span><span>---</span><br>
<br>
<span>I believe that I am a good candidate for this
Summer of Code for four</span><br>
<span>reasons. Firstly, I am a master student of
computer science in the</span><br>
<span>Federal</span><br>
<span>University of Minas Gerais (UFMG), which is one
of the best schools of</span><br>
<span>computer science in Brazil. To support this
claim, notice that UFMG</span><br>
<span>has the maximum grade in the ranking of CAPES
(Coordenação de</span><br>
<span>Aperfeiçoamento de Pessoal de Nivel Superior -
the Evaluation Body</span><br>
<span>under the Ministry of Education). I am working in
the compiler's lab,</span><br>
<span>under the advice of Fernando Pereira, who was a
Summer of Coder</span><br>
<span>himself in 2006. Second, I have good experience
with programming, with</span><br>
<span>almost six years of experience in many languages
(Java, C, C++, C#,</span><br>
<span>VB, VBA, Object Pascal, SQL, etc...). Third, I'm
already working with</span><br>
<span>LLVM passes. I already know some LLVM tools and I
already have the</span><br>
<span>basic knowledge to start this project. I have
written a pass to</span><br>
<span>instrument LLVM bitcodes, in order to find the
minimum and maximum</span><br>
<span>values that each variable assumes at runtime.
This code has been used</span><br>
<span>to check the precision of our range analysis. It
is publicly available</span><br>
<span>at (</span><a href="http://code.google.com/p/range-analysis/source/browse/#svn%2Ftrunk%2Fsrc%2FRAInstrumentation" target="_blank">http://code.google.com/p/range-analysis/source/browse/#svn%2Ftrunk%2Fsrc%2FRAInstrumentation</a><span>). </span></div>
<div><span>Finally, in the lab where I work there are six
other people</span></div>
<div><span>who work everyday with LLVM; thus, in addition
to getting help in the</span><br>
<span>forum, I can easily talk about my project to my
colleagues.</span><br>
<br>
<span>References</span><br>
<span>----------</span><br>
<br>
<span>[1] Dirk Beyer, Thomas A. Henzinger, Ranjit Jhala
and Rupak Majumdar.</span><br>
<span>Checking Memory Safety with Blast. FASE 2005,
LNCS 3442, pages 2-18,</span><br>
<span>Springer-Verlag, 2005</span><br>
<br>
<span>[2] ABCD: eliminating array bounds checks on
demand. Rastislav Bodik</span><br>
<span>and Rajiv Gupta and Vivek Sarkar. PLDI 2000, ACM,
pages 321-333</span>
</div>
<br>
<fieldset></fieldset>
<br>
</div></div><pre>_______________________________________________
LLVM Developers mailing list
<a href="mailto:LLVMdev@cs.uiuc.edu" target="_blank">LLVMdev@cs.uiuc.edu</a> <a href="http://llvm.cs.uiuc.edu" target="_blank">http://llvm.cs.uiuc.edu</a>
<a href="http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev" target="_blank">http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev</a>
</pre>
</blockquote>
<br>
</div>
<br>_______________________________________________<br>
LLVM Developers mailing list<br>
<a href="mailto:LLVMdev@cs.uiuc.edu">LLVMdev@cs.uiuc.edu</a> <a href="http://llvm.cs.uiuc.edu" target="_blank">http://llvm.cs.uiuc.edu</a><br>
<a href="http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev" target="_blank">http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev</a><br>
<br></blockquote></div><br>