<html>
<head>
<meta content="text/html; charset=ISO-8859-1"
http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
On 4/6/12 12:50 AM, Kostya Serebryany wrote:
<blockquote
cite="mid:CAN=P9phQ-VRe11_nS0YhfWM=T5G=WAiS0XN+qtM4zdJ_JsXthQ@mail.gmail.com"
type="cite">
I'd like some similar work to be done, although I view it a bit
differently.
<div>This might be a separate analysis pass that knows nothing
about ASAN or SAFECode</div>
<div>and appends metadata nodes to memory access instructions
saying things like</div>
</blockquote>
<br>
The idea is good, but metadata is the wrong way to implement it.
LLVM passes are not required to preserve metadata, and even if they
were, some pass would inevitably have a bug that fails to preserve
the metadata properly. That approach invites hard-to-diagnose
failures. Furthermore, an instruction that was deemed safe earlier is
not guaranteed to remain safe after transformation; LLVM can perform
optimizations on C code exhibiting undefined behavior that change it
from memory-safe to memory-unsafe code.<br>
<br>
The correct way to do this is by writing generic LLVM analysis
passes that compute this information and can be queried by
SAFECode/ASAN-specific instrumentation and optimization passes. In
this fashion, the LLVM pass manager will re-run the analyses if an
earlier transform comes along and modifies the IR. It is also far
more robust since analysis information cannot be discarded by LLVM
passes written by other people.<br>
<br>
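A minimal sketch of the difference, using a toy IR rather than real LLVM classes. Every name below (Access, Function, AlwaysSafeAnalysis, widenAccess) is hypothetical and only illustrates the idea that a queried analysis is recomputed after the IR changes instead of relying on stale annotations:<br>

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy IR (an assumption for illustration): each access touches `width` bytes
// at a constant `offset` into an object of `objectSize` bytes.
struct Access {
    std::size_t objectSize;
    std::size_t offset;
    std::size_t width;
};

struct Function {
    std::vector<Access> accesses;
    unsigned version = 0;  // bumped by every transform that modifies the IR
};

// Hypothetical analysis: results are cached together with the IR version they
// were computed from, so a query after a modification triggers a re-run.
class AlwaysSafeAnalysis {
public:
    bool isAlwaysSafe(const Function &f, std::size_t index) {
        if (!valid || cachedVersion != f.version)
            recompute(f);
        assert(index < safe.size());
        return safe[index];
    }
    unsigned recomputations() const { return runs; }

private:
    void recompute(const Function &f) {
        safe.clear();
        for (const Access &a : f.accesses)
            safe.push_back(a.offset <= a.objectSize &&
                           a.width <= a.objectSize - a.offset);
        cachedVersion = f.version;
        valid = true;
        ++runs;
    }

    std::vector<bool> safe;
    unsigned cachedVersion = 0;
    bool valid = false;
    unsigned runs = 0;
};

// A toy transform that widens an access and, like any modifying transform,
// marks the IR as changed.
void widenAccess(Function &f, std::size_t index, std::size_t newWidth) {
    f.accesses[index].width = newWidth;
    ++f.version;
}
```

In this toy model, an access that was in bounds before widenAccess is re-evaluated on the next query, whereas a "safe" annotation attached to the instruction earlier would still claim safety.<br>
<br>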
<blockquote
cite="mid:CAN=P9phQ-VRe11_nS0YhfWM=T5G=WAiS0XN+qtM4zdJ_JsXthQ@mail.gmail.com"
type="cite">
<div> - this access cannot go out of buffer bounds</div>
<div> - this access cannot touch freed memory</div>
<div> - this access cannot participate in a race </div>
<div> - this read always reads initialized memory </div>
<div>Then the actual instrumentation passes will simply consult
the metadata. <br>
</div>
</blockquote>
<br>
One of the design principles I've been trying to follow in
refactoring SAFECode is to have dumb instrumentation passes
that simply instrument everything, followed by optimization passes
that remove unnecessary run-time checks. This follows the
compiler-building principle called Separation of Concerns, and it's
useful in tools like SAFECode because it allows us to easily turn
optimizations on/off by running/not running individual passes. This
makes performance analysis easier (we can see the effect of an
optimization by not running a pass), and it makes it possible for
bugpoint to figure out which optimization is causing a program to
break.<br>
<br>
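As a rough illustration of this split, here is a sketch over a toy basic block. It is not SAFECode's code; the types and function names are hypothetical, and the optimization assumes that pointer names are SSA-like values and that nothing in the block (such as a free) invalidates an earlier check:<br>

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

enum class Kind { Load, Store, Check };

struct Inst {
    Kind kind;
    std::string pointer;  // name of the pointer being accessed or checked
};

using BasicBlock = std::vector<Inst>;

// "Dumb" instrumentation: put a check in front of every load and store.
BasicBlock instrumentEverything(const BasicBlock &bb) {
    BasicBlock out;
    for (const Inst &i : bb) {
        assert(i.kind != Kind::Check && "input is expected to be uninstrumented");
        out.push_back({Kind::Check, i.pointer});
        out.push_back(i);
    }
    return out;
}

// Separate optimization: drop a check if the same pointer was already checked
// earlier in the same block.
BasicBlock removeDuplicateChecks(const BasicBlock &bb) {
    BasicBlock out;
    std::set<std::string> checked;
    for (const Inst &i : bb) {
        if (i.kind == Kind::Check && !checked.insert(i.pointer).second)
            continue;  // already checked in this block
        out.push_back(i);
    }
    return out;
}

std::size_t countChecks(const BasicBlock &bb) {
    std::size_t n = 0;
    for (const Inst &i : bb)
        if (i.kind == Kind::Check)
            ++n;
    return n;
}
```

Because the optimization is its own function, measuring its effect is just a matter of comparing countChecks with and without the removeDuplicateChecks step.<br>
<br>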
SAFECode used to have a single instrumentation pass that inserted
both load/store and GEP checks with various options to
enable/disable optimizations. It made the code complex and
difficult to read. The new passes are reusable and so blindingly
simple that a child can understand what they do. I highly recommend
that ASAN not make the mistake that SAFECode originally made.<br>
<br>
Finally, the common infrastructure idea I was talking about on the
SAFECode open projects page is to have a common set of run-time
check function names and set of instrumentation passes to add them
and optimize them. In this way, SAFECode/SoftBound/ASAN can share
not only the same analysis passes (e.g., an always-safe load/store
analysis) but the actual optimization and instrumentation passes,
too. SAFECode/ASAN-specific transforms can be run after the generic
instrumentation passes to specialize the checks for the specific
tool (e.g., SAFECode would have a pass that adds pool handles to the
run-time checks).<br>
<br>
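A small sketch of what such a specialization step could look like on a toy list of calls. The "generic_" prefix, the "toolA_" prefix, and the extra leading argument standing in for a pool handle are all made-up names for illustration, not the actual run-time check names of any tool:<br>

```cpp
#include <cassert>
#include <string>
#include <vector>

struct Call {
    std::string callee;
    std::vector<std::string> args;
};

// Rewrites calls whose name starts with the hypothetical "generic_" prefix
// into tool-specific calls, prepending any extra arguments the tool needs
// (for example, a pool handle). All other calls are left untouched.
std::vector<Call> specializeChecks(const std::vector<Call> &calls,
                                   const std::string &toolPrefix,
                                   const std::vector<std::string> &extraArgs) {
    assert(!toolPrefix.empty());
    const std::string generic = "generic_";
    std::vector<Call> out;
    for (const Call &c : calls) {
        if (c.callee.compare(0, generic.size(), generic) != 0) {
            out.push_back(c);  // not a generic check
            continue;
        }
        Call s;
        s.callee = toolPrefix + c.callee.substr(generic.size());
        s.args = extraArgs;
        s.args.insert(s.args.end(), c.args.begin(), c.args.end());
        out.push_back(s);
    }
    return out;
}
```

The generic instrumentation and optimization steps never need to know about the tool; only this last rewrite does.<br>
<br>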
<blockquote
cite="mid:CAN=P9phQ-VRe11_nS0YhfWM=T5G=WAiS0XN+qtM4zdJ_JsXthQ@mail.gmail.com"
type="cite">
<div><br>
</div>
<div>Equally important would be an exhaustive test suite. </div>
<div>Not sure if it should be in LLVM IR or in C (if in C, other
compilers will benefit too).</div>
</blockquote>
<br>
Wilander has a new suite of tests out that might be useful.<br>
<br>
-- John T.<br>
<br>
<blockquote
cite="mid:CAN=P9phQ-VRe11_nS0YhfWM=T5G=WAiS0XN+qtM4zdJ_JsXthQ@mail.gmail.com"
type="cite">
<div><br>
<div class="gmail_quote">On Thu, Apr 5, 2012 at 6:49 PM, Ott
Tinn <span dir="ltr">&lt;<a moz-do-not-send="true"
href="mailto:llvm@otinn.com">llvm@otinn.com</a>&gt;</span>
wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
This is a proposal to create memory safety instrumentation
and<br>
optimization passes for LLVM.<br>
<br>
Abstract:<br>
The goal of this project is to modify SAFECode and
AddressSanitizer<br>
(ASAN) to use a common set of memory safety instrumentation
and<br>
optimization passes to increase code reuse. These tools and
other<br>
similar ones use varying methods to detect whether memory
accesses are<br>
safe, but are fundamentally trying to do the same thing:
check whether<br>
each memory access is safe. It is desirable to optimize away
redundant<br>
runtime checks to improve such tools' runtime performance.
This means<br>
that there is a need for shared memory safety
instrumentation and<br>
optimization passes.<br>
<br>
<br>
Proposal:<br>
The general idea is to make SAFECode and ASAN use the
following design:<br>
1. Add checks to memory accesses (loads, stores, and some
intrinsics).<br>
2. Run the memory safety check optimization passes.<br>
3. Transform the remaining checks to tool-specific runtime
calls.<br>
4. Do whatever the specific tool did before.<br>
<br>
This design would make it possible for SAFECode, ASAN, and
other<br>
similar tools to share the memory safety instrumentation and<br>
optimization passes. The main benefit of the code reuse is
that the<br>
memory-safety-specific optimizations could be used by all
such tools.<br>
<br>
The project proposes to modify SAFECode and ASAN as a proof
of<br>
concept. It might also be useful to modify SoftBound,
ThreadSanitizer,<br>
or some other tool but I have not analysed how
difficult/useful that<br>
would be. That is why they are excluded from the current
proposal.<br>
<br>
Implementation plan:<br>
1. Create the common instrumentation pass.<br>
2. Add a pass to convert the common checks to ASAN-specific
ones.<br>
3. Add a pass to convert the common checks to
SAFECode-specific ones.<br>
4. Convert some of the simpler optimizations from SAFECode
to run on<br>
the common checks.<br>
5. Add more optimizations (from SAFECode or otherwise).<br>
<br>
The plan is to make sure that it is possible to commit early
and often<br>
without breaking anything (unless absolutely needed). The
conversion<br>
passes are needed to make the tool work but a side-effect is
that the<br>
existing tool-specific optimizations should continue working
without<br>
changes.<br>
<br>
The "simpler" optimizations are defined to be the ones that
are easy<br>
for humans to verify and do not have large extra
dependencies like<br>
Poolalloc or SMT solvers.<br>
<br>
Optimizations that will definitely be implemented such that
they work<br>
on the common memory safety checks (milestone 3 or 4):<br>
* Remove obviously redundant checks in the same basic block.<br>
* Remove unnecessary constant checks to global variables /
allocas.<br>
* Combine struct member checks in the same basic block.<br>
</blockquote>
<div><br>
</div>
<div>Beware that some of these cases will already be covered by GVN
(load widening, etc.), although some will not. </div>
<div>E.g. </div>
<div>
struct S {</div>
<div> int alignment;</div>
<div> short a, b;</div>
<div>};</div>
<div> </div>
<div>S *x;</div>
<div>...</div>
<div>x->a = ... </div>
<div>... = x->b</div>
<div><br>
</div>
<div>These two accesses can be combined for ASAN, but not for
TSAN. </div>
<div><br>
</div>
<div> </div>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
* Hoisting constant checks from loops.<br>
</blockquote>
<div><br>
</div>
<div>In most cases, this should be handled by general LLVM
loop invariant code motion. </div>
<div> </div>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
* Something more.<br>
<br>
An additional plan that outlines the optimizations to be
added in the<br>
later part of the program will be produced and agreed upon
before the<br>
mid-term evaluations. The general idea is to add slightly
more<br>
complicated optimizations that are useful in practice rather
than<br>
large and complicated optimizations that are difficult to
verify by<br>
humans.<br>
<br>
Timeline:<br>
Milestone 1 (June 1): The common instrumentation pass works
and there<br>
are tests to verify it.<br>
Milestone 2 (June 22): The tool-specific conversion passes
work and<br>
there are tests to verify it.<br>
Milestone 3 (July 6): Some simple optimization passes from
SAFECode<br>
work on the common checks; there are unit tests to verify
that.<br>
Finished (and agreed upon) a specific plan that outlines
which<br>
optimizations will be converted / created for milestone 4.<br>
Mid-term evaluations deadline (July 13)<br>
Milestone 4 (August 13): Added more optimizations (and
relevant unit<br>
tests) as specified in the additional plan.<br>
Firm 'pencils down' date (August 20): More testing and
documentation.<br>
<br>
Basically the idea is to produce something practically
useful and<br>
thoroughly tested that will definitely be done in time.<br>
<br>
<br>
Contact information:<br>
Included in the official submission.<br>
<br>
<br>
Interesting to me:<br>
I am generally interested in developing bug
finding/detecting systems<br>
but this project would also have been useful to me for a
project I<br>
completed previously (see the experience section). I have
previously<br>
used SAFECode for automatically checking whether a program
has a<br>
buffer overflow on a specific run. I was interested in
reusing the<br>
static memory safety optimization parts of SAFECode but it
seemed to<br>
be too tightly integrated to be easily reused for my
purposes.<br>
<br>
<br>
Useful for LLVM:<br>
This project would be useful for LLVM in general because it
would make<br>
it easier to develop memory safety tools based on LLVM
because of the<br>
available relevant transforms. Reducing the amount of code
each<br>
subproject has to add should make it more likely that the
subprojects<br>
stay compatible with the latest LLVM changes.<br>
<br>
It would be useful for ASAN mostly because the optimizations
should<br>
reduce the runtime overhead.<br>
<br>
It would be useful for SAFECode because the code should
become a bit<br>
more modular and there should be more code reuse. The extra
testing<br>
and shared code should make it easier to keep up with the
changes in<br>
LLVM because there would be more people who are interested
in that<br>
being the case.<br>
<br>
It would be useful for both ASAN and SAFECode because
optimizations<br>
based on the common instrumentation would be useful for both
of them.<br>
<br>
<br>
Relevant experience:<br>
I created a tool based on LLVM and KLEE that aimed to
optimize a<br>
specific type of C++ programs such that they would crash on
exactly<br>
the same inputs as before the optimizations. This made the
system find<br>
inputs on which the programs crashed faster than before.
Most of the<br>
project was about creating LLVM passes that might make the
bug finding<br>
process faster while retaining that property.<br>
<br>
One part of the system was adding and later removing memory
safety<br>
checks. That was necessary because a significant part of the
code<br>
became otherwise unused after aggressively transforming /
essentially<br>
removing output calls (printf, the cout stream, etc.) and
the aim was<br>
to still detect invalid but unused memory accesses.<br>
<br>
I successfully participated in GSoC 2011 by creating an AI
player for<br>
an open source RTS game, Unknown Horizons (written in
Python). I have<br>
continued to contribute to that project so far.<br>
_______________________________________________<br>
LLVM Developers mailing list<br>
<a moz-do-not-send="true" href="mailto:LLVMdev@cs.uiuc.edu">LLVMdev@cs.uiuc.edu</a>
<a moz-do-not-send="true"
href="http://llvm.cs.uiuc.edu" target="_blank">http://llvm.cs.uiuc.edu</a><br>
<a moz-do-not-send="true"
href="http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev"
target="_blank">http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev</a><br>
</blockquote>
</div>
<br>
</div>
<br>
</blockquote>
<br>
</body>
</html>