[llvm-dev] llvm-dev Digest, Vol 136, Issue 22

Mon Oct 12 08:51:21 PDT 2015

On Fri, Oct 9, 2015, at 05:45 PM, Adve, Vikram Sadanand wrote:
> > Maybe I should have been a bit clearer; we're really interested in full
> > memory and type safety. We want to harden the system against memory
> > corruption vulnerabilities. Process isolation isn't an issue, as we are
> > in an embedded context where we don't have processes.
> 
> 
> Do you also care about use-after-free and free-after-free errors? 
> Handling free-after-free is not too hard, but general use-after-free can
> be expensive, unless you’re willing to spend a fair amount of extra
> memory.  If dynamic allocation isn’t an issue for your (embedded)
> applications, that makes this whole problem go away, of course.

Yes, we do care about these issues. We are happy with the trick used in
the TECS paper: a use-after-free is type safe, in that the dangling
pointer will refer either to the original object or to another object
of the same type.
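
To make sure I have the idea right, here is a minimal sketch in C of
why reuse within a type-homogeneous pool keeps a dangling pointer type
safe. This is only my illustration, not code from the paper or from
poolalloc; struct node, node_pool and the pool_* functions are
hypothetical names, and the pool is a fixed-size array rather than
anything the real runtime does.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node type; every slot in the pool holds exactly this type. */
struct node {
    int value;
    struct node *next;
};

#define POOL_SLOTS 4

/* A fixed-size, type-homogeneous pool: a freed slot is only ever reused
 * for another struct node, never for an object of a different type. */
struct node_pool {
    struct node slots[POOL_SLOTS];
    struct node *free_list; /* singly linked list of free slots */
};

/* Put every slot on the free list. */
void pool_init(struct node_pool *p) {
    p->free_list = NULL;
    for (size_t i = 0; i < POOL_SLOTS; i++) {
        p->slots[i].value = 0;
        p->slots[i].next = p->free_list;
        p->free_list = &p->slots[i];
    }
}

/* Pop a slot from the free list; returns NULL when the pool is empty. */
struct node *pool_alloc(struct node_pool *p) {
    struct node *n = p->free_list;
    if (n != NULL) {
        p->free_list = n->next;
        n->value = 0;
        n->next = NULL;
    }
    return n;
}

/* Push a slot back on the free list (no double-free detection here). */
void pool_free(struct node_pool *p, struct node *n) {
    n->next = p->free_list;
    p->free_list = n;
}

/* Returns 1 if a dangling pointer ends up aliasing a new struct node. */
int dangling_pointer_aliases_same_type(void) {
    struct node_pool pool;
    pool_init(&pool);

    struct node *a = pool_alloc(&pool);
    assert(a != NULL);
    a->value = 1;
    pool_free(&pool, a);                /* 'a' is now dangling */

    struct node *b = pool_alloc(&pool); /* reuses the slot 'a' points to */
    assert(b != NULL);
    b->value = 2;

    /* Reading through 'a' is a logic error, but it still sees a valid
     * struct node, so type safety is preserved. */
    return a == b && a->value == 2;
}
```

The use through 'a' is still a bug in the program, but it can only
observe or modify a well-typed struct node, which is the "harmless"
property we are happy to accept.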

> 
> This brings me to the new question: exactly what unsafe features do you
> need to be concerned about?  Our first paper in the SAFECode project
> focused on identifying a subset of C for which type and memory safety
> could be enforced without *any* run-time checks or GC.  We found that
> array bounds checks were the biggest problem; ignoring these, we can
> ensure the safety of pointer and dynamic memory usage in all the embedded
> benchmarks we tried without any run-time checks.  The paper appeared at
> LCTES:
> 	http://llvm.org/pubs/2003-05-05-LCTES03-CodeSafety.html
> Like the later work, this did not eliminate use-after-free but used APA
> to ensure that any use-after-free errors are “harmless” in that they do
> not violate any of the other type- and memory-safety guarantees.
> 
> Let me know if you have any questions about this.
> 

We're not sure which features to be concerned about until we actually
try it on some of the intended code base, so probably everything. But
we are happy to spend some effort refactoring programs if it means we
can get type and memory safety without run-time checks. Most of the C
semantics that need to be dropped appear to be quite bad practice
anyway (except perhaps complex array indexing).

I've been reading through this and a few other publications to try to
get a better handle on the code. Having not done anything with LLVM
before, and not knowing DSA/APA/SAFECode intimately, I am a bit out of
my depth, so the background reading is great. Also, your code is really
well commented :)

The main question I have at the moment is: what is actually broken in
APA in its current state? Or is it only suspected to be broken? I've
managed to build it and run it using opt, and it appears to work just
as well as it did with LLVM 3.2. I was expecting to have to modify
something to work around the untyped GEPs that John mentioned, but in
fact I haven't had any such issues. I run it using:

    path/to/opt -S -load path/to/LLVMDataStructure.so \
        -load path/to/poolalloc.so -paheur-AllHeapNodes -poolalloc \
        myprog.bc > myprog.pa.ll

and get the same output (modulo IR syntax changes) as with LLVM 3.2. I
have also tried some of the other heuristics and found no differences.
However, I don't think I have tried any programs that need more than
one pool yet (so I haven't really exercised it), and I haven't tried
running sc over the output yet either.
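
As a first step towards something that needs more than one pool, I am
thinking of a small C input along the lines of the sketch below. It
builds two heap data structures of different types that never point at
each other, so my expectation (an assumption on my part, not something
I have verified against the pass) is that they would end up in separate
pools. All names here are hypothetical.

```c
#include <stdlib.h>

/* Two unrelated heap data structures with different node types. */
struct list_node { int value; struct list_node *next; };
struct tree_node { int key; struct tree_node *left, *right; };

/* Prepend a value; returns the old head unchanged if malloc fails. */
struct list_node *list_push(struct list_node *head, int value) {
    struct list_node *n = malloc(sizeof *n);
    if (n == NULL) return head;
    n->value = value;
    n->next = head;
    return n;
}

/* Insert a key into an unbalanced binary search tree; duplicates are ignored. */
struct tree_node *tree_insert(struct tree_node *root, int key) {
    if (root == NULL) {
        struct tree_node *n = malloc(sizeof *n);
        if (n == NULL) return NULL;
        n->key = key;
        n->left = n->right = NULL;
        return n;
    }
    if (key < root->key) root->left = tree_insert(root->left, key);
    else if (key > root->key) root->right = tree_insert(root->right, key);
    return root;
}

long list_sum(const struct list_node *head) {
    long sum = 0;
    for (; head != NULL; head = head->next) sum += head->value;
    return sum;
}

long tree_sum(const struct tree_node *root) {
    if (root == NULL) return 0;
    return root->key + tree_sum(root->left) + tree_sum(root->right);
}

void list_free(struct list_node *head) {
    while (head != NULL) {
        struct list_node *next = head->next;
        free(head);
        head = next;
    }
}

void tree_free(struct tree_node *root) {
    if (root == NULL) return;
    tree_free(root->left);
    tree_free(root->right);
    free(root);
}

/* Build both structures with n elements each, sum them, and free them. */
long build_and_sum(int n) {
    struct list_node *list = NULL;
    struct tree_node *tree = NULL;
    for (int i = 0; i < n; i++) {
        list = list_push(list, i);
        tree = tree_insert(tree, (i * 7) % n); /* scrambled insertion order */
    }
    long total = list_sum(list) + tree_sum(tree);
    list_free(list);
    tree_free(tree);
    return total;
}
```

With a small main that calls build_and_sum, this could be compiled to
bitcode and passed through the opt invocation above to see how many
pools are created.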

It would be useful if I could get a pointer to some test programs that
really exercise PA. This is next on my todo list (going to check those
referenced in your publications first).

Thanks for your help Vikram. Best,
Ed