[LLVMdev] GSoC proposal: Common memory safety instrumentation and optimization passes for LLVM

Fri Apr 6 08:29:18 PDT 2012

On Fri, Apr 6, 2012 at 7:50 AM, John Criswell <criswell at illinois.edu> wrote:

>  On 4/6/12 12:50 AM, Kostya Serebryany wrote:
>
> I'd like some similar work to be done, although I view it a bit
> differently.
> This might be a separate analysis pass that knows nothing about ASAN or
> SAFECode
> and appends metadata nodes to memory access instructions saying things like
>
>
> This is a good idea but is the wrong way to implement the idea.  LLVM
> passes are not required to preserve metadata, and even if they were
> required to do so, there would always be a pass with a bug that would fail
> to preserve the metadata properly.  It's an approach that can lead to
> undesired headaches.  Furthermore, you're not guaranteed that an
> instruction that was deemed safe earlier is safe after transformation;
> there are optimizations that LLVM can do on C code exhibiting undefined
> behavior that can change it from memory safe to memory-unsafe code.
>

Oh, surely the analysis pass should be called directly from the
instrumentation pass, so that no other pass can interfere.
But a separate pass that marks insns with metadata might be easier to test.

>
> The correct way to do this is by writing generic LLVM analysis passes that
> compute this information and can be queried by SAFECode/ASAN-specific
> instrumentation and optimization passes.  In this fashion, the LLVM pass
> manager will re-run the analyses if an earlier transform comes along and
> modifies the IR.  It is also far more robust since analysis information
> cannot be discarded by LLVM passes written by other people.
>
>
>     - this access can not go out of buffer bounds
>    - this access can not touch free-ed memory
>    - this access can not participate in a race
>    - this read always reads initialized memory
> Then the actual instrumentation passes will simply consult the metadata.
>
>
> One of the design principles I've been trying to follow in refactoring
> SAFECode is that we have dumb instrumentation passes that just instrument
> everything followed by optimization passes that remove run-time checks that
> are unnecessary.  This follows the compiler-building principle called
> Separation of Concerns, and it's useful in tools like SAFECode because it
> allows us to easily turn optimizations on/off by running/not running
> individual passes.  This makes performance analysis easier (we can see the
> effect of an optimization by not running a pass), and it makes it possible
> for bugpoint to figure out which optimization is causing a program to break.
>
> SAFECode used to have a single instrumentation pass that inserted both
> load/store and GEP checks with various options to enable/disable
> optimizations.  It made the code complex and difficult to read.  The new
> passes are reusable and so blindingly simple that a child can understand
> what they do.  I highly recommend that ASAN not make the mistake that
> SAFECode originally made.
>
> Finally, the common infrastructure idea I was talking about on the
> SAFECode open projects page is to have a common set of run-time check
> function names
>
and set of instrumentation passes to add them and optimize them.  In this
> way, SAFECode/SoftBound/ASAN can share not only the same analysis passes
> (e.g., an always-safe load/store analysis) but the actual optimization and
> instrumentation passes, too.  SAFECode/ASAN specific transforms can be run
> after the generic instrumentation passes to specialize the checks for the
> specific tool (e.g., SAFECode would have a pass that adds pool handles to
> the run-time checks).
>
>
>
>  Equally important would be an exhaustive test suite.
> Not sure if it should be in LLVM IR or in C (if in C, other compilers will
> benefit too).
>
>
> Wilander has a new suite of tests out that might be useful.
>
> -- John T.
>
>
>
> On Thu, Apr 5, 2012 at 6:49 PM, Ott Tinn <llvm at otinn.com> wrote:
>
>> This is a proposal to create memory safety instrumentation and
>> optimization passes for LLVM.
>>
>> Abstract:
>> The goal of this project is to modify SAFECode and AddressSanitizer
>> (ASAN) to use a common set of memory safety instrumentation and
>> optimization passes to increase code reuse. These tools and other
>> similar ones use varying methods to detect whether memory accesses are
>> safe, but are fundamentally trying to do the same thing: check whether
>> each memory access is safe. It is desirable to optimize away redundant
>> runtime checks to improve such tools' runtime performance. This means
>> that there is a need for shared memory safety instrumentation and
>> optimization passes.
>>
>>
>> Proposal:
>> The general idea is to make SAFECode and ASAN use the following design:
>> 1. Add checks to memory accesses (loads, stores, and some intrinsics).
>> 2. Run the memory safety check optimization passes.
>> 3. Transform the remaining checks to tool-specific runtime calls.
>> 4. Do whatever the specific tool did before.
>>
>> This design would make it possible for SAFECode, ASAN, and other
>> similar tools to share the memory safety instrumentation and
>> optimization passes. The main benefit of the code reuse is that the
>> memory-safety-specific optimizations could be used by all such tools.
>>
>> The project proposes to modify SAFECode and ASAN as a proof of
>> concept. It might also be useful to modify SoftBound, ThreadSanitizer,
>> or some other tool but I have not analysed how difficult/useful that
>> would be. That is why they are excluded from the current proposal.
>>
>> Implementation plan:
>> 1. Create the common instrumentation pass.
>> 2. Add a pass to convert the common checks to ASAN-specific ones.
>> 3. Add a pass to convert the common checks to SAFECode-specific ones.
>> 4. Convert some of the simpler optimizations from SAFECode to run on
>> the common checks.
>> 5. Add more optimizations (from SAFECode or otherwise).
>>
>> The plan is to make sure that it is possible to commit early and often
>> without breaking anything (unless absolutely needed). The conversion
>> passes are needed to make the tool work but a side-effect is that the
>> existing tool-specific optimizations should continue working without
>> changes.
>>
>> The "simpler" optimizations are defined to be the ones that are easy
>> for humans to verify and do not have large extra dependencies like
>> Poolalloc or SMT solvers.
>>
>> Optimizations that will definitely be implemented such that they work
>> on the common memory safety checks (milestone 3 or 4):
>> * Remove obviously redundant checks in the same basic block.
>> * Remove unnecessary constant checks to global variables / allocas.
>> * Combine struct member checks in the same basic block.
>>
>
>  Beware, that some of such cases will be covered by GVN (load widening,
> etc). Although some will not.
> E.g.
>  struct S {
>   int alignment;
>   short a, b;
> };
>
> S *x;
> ...
> x->a = ...
> ... = x->b
>
>  These two accesses can be combined for ASAN, but not for TSAN.
>
>
>
>> * Hoisting constant checks from loops.
>>
>
>  In most cases, this should be handled by general LLVM loop invariant
> code motion.
>
>
>> * Something more.
>>
>> An additional plan that outlines the optimizations to be added in the
>> later part of the program will be produced and agreed upon before the
>> mid-term evaluations. The general idea is to add slightly more
>> complicated optimizations that are useful in practice rather than
>> large and complicated optimizations that are difficult to verify by
>> humans.
>>
>> Timeline:
>> Milestone 1 (June 1): The common instrumentation pass works and there
>> are tests to verify it.
>> Milestone 2 (June 22): The tool-specific conversion passes work and
>> there are tests to verify it.
>> Milestone 3 (July 6): Some simple optimization passes from SAFECode
>> work on the common checks; there are unit tests to verify that.
>> Finished (and agreed upon) a specific plan that outlines which
>> optimizations will be converted / created for milestone 4.
>> Mid-term evaluations deadline (July 13)
>> Milestone 4 (August 13): Added more optimizations (and relevant unit
>> tests) as specified in the additional plan.
>> Firm 'pencils down' date (August 20): More testing and documentation.
>>
>> Basically the idea is to produce something practically useful and
>> thoroughly tested that will definitely be done in time.
>>
>>
>> Contact information:
>> Included in the official submission.
>>
>>
>> Interesting to me:
>> I am generally interested in developing bug finding/detecting systems
>> but this project would also have been useful to me for a project I
>> completed previously (see the experience section). I have previously
>> used SAFECode for automatically checking whether a program has a
>> buffer overflow on a specific run.  I was interested in reusing the
>> static memory safety optimization parts of SAFECode but it seemed to
>> be too tightly integrated to be easily reused for my purposes.
>>
>>
>> Useful for LLVM:
>> This project would be useful for LLVM in general because it would make
>> it easier to develop memory safety tools based on LLVM because of the
>> available relevant transforms. Reducing the amount of code each
>> subproject has to add should make it more likely that the subprojects
>> stay compatible with the latest LLVM changes.
>>
>> It would be useful for ASAN mostly because the optimizations should
>> reduce the runtime overhead.
>>
>> It would be useful for SAFECode because the code should become a bit
>> more modular and there should be more code reuse. The extra testing
>> and shared code should make it easier to keep up with the changes in
>> LLVM because there would be more people who are interested in that
>> being the case.
>>
>> It would be useful for both ASAN and SAFECode because optimizations
>> based on the common instrumentation would be useful for both of them.
>>
>>
>> Relevant experience:
>> I created a tool based on LLVM and KLEE that aimed to optimize a
>> specific type of C++ programs such that they would crash on exactly
>> the same inputs as before the optimizations. This made the system find
>> inputs on which the programs crashed faster than before. Most of the
>> project was about creating LLVM passes that might make the bug finding
>> process faster while retaining that property.
>>
>> One part of the system was adding and later removing memory safety
>> checks. That was necessary because a significant part of the code
>> became otherwise unused after aggressively transforming / essentially
>> removing output calls (printf, the cout stream, etc.) and the aim was
>> to still detect invalid but unused memory accesses.
>>
>> I successfully participated in GSoC 2011 by creating an AI player for
>> an open source RTS game, Unknown Horizons (written in Python). I have
>> continued to contribute to that project so far.
>> _______________________________________________
>> LLVM Developers mailing list
>> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>>
>
>
>
> _______________________________________________
> LLVM Developers mailing listLLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.eduhttp://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20120406/a8a6f9ba/attachment.html>