I'd like to see similar work done, although I view it a bit differently. <div>This could be a separate analysis pass that knows nothing about ASAN or SAFECode</div><div>and attaches metadata nodes to memory access instructions, saying things like:</div>

<div>   - this access cannot go out of buffer bounds</div><div>   - this access cannot touch freed memory</div><div>   - this access cannot participate in a race </div><div>   - this read always reads initialized memory </div>

<div>Then the actual instrumentation passes will simply consult the metadata. </div><div><br></div><div>Equally important would be an exhaustive test suite. </div><div>I am not sure whether it should be written in LLVM IR or in C (if in C, other compilers would benefit too).</div>
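<div><br></div><div>To make the metadata idea concrete, here is a minimal sketch in plain C++ rather than against the real LLVM API. The names SafetyFact, MemAccess, and the needs*Check helpers are hypothetical; a real pass would attach metadata nodes to load and store instructions instead of setting bits in a struct.</div>

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical fact bits that a tool-agnostic analysis could attach to an access.
enum SafetyFact : std::uint8_t {
    kInBounds    = 1 << 0,  // cannot go out of buffer bounds
    kNotFreed    = 1 << 1,  // cannot touch freed memory
    kNoRace      = 1 << 2,  // cannot participate in a race
    kInitialized = 1 << 3,  // a read always sees initialized memory
};

// Toy stand-in for a load or store with its attached facts.
struct MemAccess {
    bool isWrite;
    std::uint8_t facts;  // bitwise OR of SafetyFact values
};

// True if every fact in `required` has been proven for this access.
inline bool hasAll(const MemAccess &a, std::uint8_t required) {
    assert(required != 0 && "a tool must ask about at least one fact");
    return (a.facts & required) == required;
}

// Each tool consults only the facts it cares about (assumed policies).
bool needsAddressCheck(const MemAccess &a) { return !hasAll(a, kInBounds | kNotFreed); }
bool needsRaceCheck(const MemAccess &a) { return !hasAll(a, kNoRace); }
bool needsInitCheck(const MemAccess &a) { return !a.isWrite && !hasAll(a, kInitialized); }

// Number of address checks an ASAN-like tool would still have to emit.
int countAddressChecks(const std::vector<MemAccess> &accesses) {
    int n = 0;
    for (const MemAccess &a : accesses)
        if (needsAddressCheck(a)) ++n;
    return n;
}
```

<div>The point of the split is that the analysis only records facts, while each instrumentation pass decides which facts let it skip a check. An access proven in-bounds and not freed needs no address check here, but it still needs a race check unless kNoRace is also set.</div>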

<div><br><div class="gmail_quote">On Thu, Apr 5, 2012 at 6:49 PM, Ott Tinn <span dir="ltr"><<a href="mailto:llvm@otinn.com">llvm@otinn.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

This is a proposal to create memory safety instrumentation and<br>

optimization passes for LLVM.<br>

<br>

Abstract:<br>

The goal of this project is to modify SAFECode and AddressSanitizer<br>

(ASAN) to use a common set of memory safety instrumentation and<br>

optimization passes to increase code reuse. These tools and other<br>

similar ones use varying methods to detect whether memory accesses are<br>

safe, but are fundamentally trying to do the same thing: check whether<br>

each memory access is safe. It is desirable to optimize away redundant<br>

runtime checks to improve such tools' runtime performance. This means<br>

that there is a need for shared memory safety instrumentation and<br>

optimization passes.<br>

<br>

<br>

Proposal:<br>

The general idea is to make SAFECode and ASAN use the following design:<br>

1. Add checks to memory accesses (loads, stores, and some intrinsics).<br>

2. Run the memory safety check optimization passes.<br>

3. Transform the remaining checks to tool-specific runtime calls.<br>

4. Do whatever the specific tool did before.<br>

<br>

This design would make it possible for SAFECode, ASAN, and other<br>

similar tools to share the memory safety instrumentation and<br>

optimization passes. The main benefit of the code reuse is that the<br>

memory-safety-specific optimizations could be used by all such tools.<br>

<br>

The project proposes to modify SAFECode and ASAN as a proof of<br>

concept. It might also be useful to modify SoftBound, ThreadSanitizer,<br>

or some other tool but I have not analysed how difficult/useful that<br>

would be. That is why they are excluded from the current proposal.<br>

<br>

Implementation plan:<br>

1. Create the common instrumentation pass.<br>

2. Add a pass to convert the common checks to ASAN-specific ones.<br>

3. Add a pass to convert the common checks to SAFECode-specific ones.<br>

4. Convert some of the simpler optimizations from SAFECode to run on<br>

the common checks.<br>

5. Add more optimizations (from SAFECode or otherwise).<br>

<br>

The plan is to make sure that it is possible to commit early and often<br>

without breaking anything (unless absolutely needed). The conversion<br>

passes are needed to make the tool work but a side-effect is that the<br>

existing tool-specific optimizations should continue working without<br>

changes.<br>

<br>

The "simpler" optimizations are defined to be the ones that are easy<br>

for humans to verify and do not have large extra dependencies like<br>

Poolalloc or SMT solvers.<br>

<br>

Optimizations that will definitely be implemented such that they work<br>

on the common memory safety checks (milestone 3 or 4):<br>

* Remove obviously redundant checks in the same basic block.<br>

* Remove unnecessary constant checks to global variables / allocas.<br>

* Combine struct member checks in the same basic block.<br></blockquote><div><br></div><div>Beware that some of these cases will be covered by GVN (load widening, etc.), although some will not. </div><div>For example: </div><div>

struct S {</div><div>  int alignment;</div><div>  short a, b;</div><div>};</div><div> </div><div>S *x;</div><div>...</div><div>x->a = ... </div><div>... = x->b</div><div><br></div><div>The checks for these two accesses can be combined for ASAN, but not for TSAN. </div>
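<div><br></div><div>A toy sketch of why the merge policy has to be tool-specific. This is not ASAN's or TSAN's actual logic; FieldAccess, combineChecks, and the mergeAcrossKinds flag are hypothetical names. The assumption is that an address checker only cares about which bytes are touched, while a race detector must keep reads and writes apart.</div>

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct S {
    int alignment;
    short a, b;
};

// One pending check on an access relative to the same base pointer.
struct FieldAccess {
    std::size_t offset;  // byte offset from the base pointer
    std::size_t size;    // number of bytes accessed
    bool isWrite;
};

// Merge touching or overlapping checks. If mergeAcrossKinds is false,
// a read and a write are never folded into one check (assumed policy
// for a race detector).
std::vector<FieldAccess> combineChecks(std::vector<FieldAccess> accesses,
                                       bool mergeAcrossKinds) {
    std::sort(accesses.begin(), accesses.end(),
              [](const FieldAccess &l, const FieldAccess &r) {
                  return l.offset < r.offset;
              });
    std::vector<FieldAccess> out;
    for (const FieldAccess &a : accesses) {
        assert(a.size > 0);
        if (!out.empty()) {
            FieldAccess &last = out.back();
            bool touching = a.offset <= last.offset + last.size;
            bool sameKind = a.isWrite == last.isWrite;
            if (touching && (mergeAcrossKinds || sameKind)) {
                std::size_t end = std::max(last.offset + last.size,
                                           a.offset + a.size);
                last.size = end - last.offset;
                last.isWrite = last.isWrite || a.isWrite;
                continue;
            }
        }
        out.push_back(a);
    }
    return out;
}

// The two accesses from the example: a store to x->a and a load of x->b.
std::vector<FieldAccess> exampleAccesses() {
    return {{offsetof(S, a), sizeof(short), true},
            {offsetof(S, b), sizeof(short), false}};
}
```

<div>With mergeAcrossKinds set, the store to a and the load of b collapse into one check covering both fields. With it cleared, they stay as two checks, since treating the pair as a single write or a single read would change what a race detector sees.</div>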

<div><br></div><div>  </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

* Hoisting constant checks from loops.<br></blockquote><div><br></div><div>In most cases, this should be handled by LLVM's general loop-invariant code motion (LICM). </div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">


* Something more.<br>

<br>

An additional plan that outlines the optimizations to be added in the<br>

later part of the program will be produced and agreed upon before the<br>

mid-term evaluations. The general idea is to add slightly more<br>

complicated optimizations that are useful in practice rather than<br>

large and complicated optimizations that are difficult to verify by<br>

humans.<br>

<br>

Timeline:<br>

Milestone 1 (June 1): The common instrumentation pass works and there<br>

are tests to verify it.<br>

Milestone 2 (June 22): The tool-specific conversion passes work and<br>

there are tests to verify it.<br>

Milestone 3 (July 6): Some simple optimization passes from SAFECode<br>

work on the common checks; there are unit tests to verify that.<br>

Finished (and agreed upon) a specific plan that outlines which<br>

optimizations will be converted / created for milestone 4.<br>

Mid-term evaluations deadline (July 13)<br>

Milestone 4 (August 13): Added more optimizations (and relevant unit<br>

tests) as specified in the additional plan.<br>

Firm 'pencils down' date (August 20): More testing and documentation.<br>

<br>

Basically the idea is to produce something practically useful and<br>

thoroughly tested that will definitely be done in time.<br>

<br>

<br>

Contact information:<br>

Included in the official submission.<br>

<br>

<br>

Interesting to me:<br>

I am generally interested in developing bug finding/detecting systems<br>

but this project would also have been useful to me for a project I<br>

completed previously (see the experience section). I have previously<br>

used SAFECode for automatically checking whether a program has a<br>

buffer overflow on a specific run.  I was interested in reusing the<br>

static memory safety optimization parts of SAFECode but it seemed to<br>

be too tightly integrated to be easily reused for my purposes.<br>

<br>

<br>

Useful for LLVM:<br>

This project would be useful for LLVM in general because it would make<br>

it easier to develop memory safety tools based on LLVM because of the<br>

available relevant transforms. Reducing the amount of code each<br>

subproject has to add should make it more likely that the subprojects<br>

stay compatible with the latest LLVM changes.<br>

<br>

It would be useful for ASAN mostly because the optimizations should<br>

reduce the runtime overhead.<br>

<br>

It would be useful for SAFECode because the code should become a bit<br>

more modular and there should be more code reuse. The extra testing<br>

and shared code should make it easier to keep up with the changes in<br>

LLVM because there would be more people who are interested in that<br>

being the case.<br>

<br>

It would be useful for both ASAN and SAFECode because optimizations<br>

based on the common instrumentation would be useful for both of them.<br>

<br>

<br>

Relevant experience:<br>

I created a tool based on LLVM and KLEE that aimed to optimize a<br>

specific type of C++ programs such that they would crash on exactly<br>

the same inputs as before the optimizations. This made the system find<br>

inputs on which the programs crashed faster than before. Most of the<br>

project was about creating LLVM passes that might make the bug finding<br>

process faster while retaining that property.<br>

<br>

One part of the system was adding and later removing memory safety<br>

checks. That was necessary because a significant part of the code<br>

became otherwise unused after aggressively transforming / essentially<br>

removing output calls (printf, the cout stream, etc.) and the aim was<br>

to still detect invalid but unused memory accesses.<br>

<br>

I successfully participated in GSoC 2011 by creating an AI player for<br>

an open source RTS game, Unknown Horizons (written in Python). I have<br>

continued to contribute to that project so far.<br>

_______________________________________________<br>

LLVM Developers mailing list<br>

<a href="mailto:LLVMdev@cs.uiuc.edu">LLVMdev@cs.uiuc.edu</a>         <a href="http://llvm.cs.uiuc.edu" target="_blank">http://llvm.cs.uiuc.edu</a><br>

<a href="http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev" target="_blank">http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev</a><br>

</blockquote></div><br></div>