[llvm-dev] GSoC and SAFECode

Tue Mar 22 18:45:00 PDT 2016

On 3/22/16 7:52 PM, Michael McConville wrote:
> John Criswell wrote:
>> If you're interested in SAFECode, the first step is to get SAFECode
>> working with a newer version of LLVM.  A Master's student did some
>> work on this last summer with LLVM 3.7 but didn't finish.  It would
>> now need to be updated to LLVM 3.8 (though I suppose a completed LLVM
>> 3.7 port would be fine with me).
>>
>> After that, there are some interesting projects on which to work. One
>> would be static array bounds checking.  That could be interesting, but
>> it doesn't really address my immediate research needs.  Right now, I'm
>> more interested in getting the Baggy Bounds with Accurate Checking
>> (BBAC) feature enabled so that we can use it in research.  For
>> example, we could try to get faster enforcement of memory safety on
>> operating system kernels, examine the use of combined safe/unsafe
>> languages for OS kernels (without letting C code violate the safety
>> provided by the safe language), and enforce dynamic security policies
>> on kernel modules (to thwart rootkits).
>>
>> If you're interested in security projects on the kernel, you could
>> enhance the KCoFI prototype to use a more accurate control-flow graph
>> or to use code pointer integrity, or you could write optimizations for
>> the software-fault isolation instrumentation (which would improve both
>> KCoFI and Virtual Ghost, if you are familiar with those papers of
>> mine).
>>
>> Do any of these projects sound interesting to you?
> Yeah, definitely. Porting to LLVM 3.8 or finishing the 3.7 port would be
> a good way to get more familiar with LLVM internals.
>
> BBAC looks very interesting. I, like you (according to the BBAC paper's
> intro), am a little frustrated by the fact that these sorts of checkers
> still aren't used in standard software builds, so I find optimizing for
> performance and simplicity particularly interesting. Also, this is an
> anecdote, but have you considered writing pseudo-random data to the
> padding area and using its checksum as a canary?

No, I have not considered canaries, and I'd be very wary of doing so.  
Canaries are (IMHO) a hack; Stephen Checkoway has his students defeat 
stack canaries as a homework assignment.  I'd need to see a strong 
argument that a heap object canary would not be defeated easily.

I'm more interested in storing information like the following in the 
padding:

o The exact length of the memory object (BBAC)
o The points-to set(s) to which the memory object belongs (useful for
  finding casting errors, dangling pointer errors, and bugs in the
  compiler's points-to analysis)
o Policy information on which part of a program can modify which fields
  in the object (useful for restricting the behavior of kernel modules
  within a monolithic kernel)
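
To make the first item concrete, here is a minimal sketch in C of keeping an
object's exact length in the padding of a power-of-two-sized slot. It is
an illustration under stated assumptions, not SAFECode's implementation: the
names bb_alloc, bb_check, and round_up_pow2 are hypothetical, the length is
assumed to live in the last sizeof(size_t) bytes of the slot, and the slot
size is passed in explicitly, whereas a real baggy-bounds runtime would look
it up in a bounds table.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: round n up to the next power of two. */
static size_t round_up_pow2(size_t n) {
    size_t p = 1;
    while (p < n) p <<= 1;
    return p;
}

/* Hypothetical allocator: the object lives in a slot whose size is a
 * power of two and whose base is aligned to that size. The exact
 * length is written into the final sizeof(size_t) bytes of the slot,
 * i.e. into the padding (assumed layout, for illustration only). */
static void *bb_alloc(size_t len, size_t *slot_out) {
    size_t slot = round_up_pow2(len + sizeof(size_t));
    if (slot < 16) slot = 16;          /* keep the alignment reasonable */
    unsigned char *base = aligned_alloc(slot, slot);
    if (base == NULL) return NULL;
    memcpy(base + slot - sizeof(size_t), &len, sizeof len);
    *slot_out = slot;
    return base;
}

/* Hypothetical check: returns 1 if [p, p + access) lies within the
 * object's exact length, 0 otherwise. Assumes p still points into the
 * object's slot; a baggy-bounds runtime enforces that separately on
 * pointer arithmetic. */
static int bb_check(const void *p, size_t access, size_t slot) {
    uintptr_t addr = (uintptr_t)p;
    uintptr_t base = addr & ~((uintptr_t)slot - 1);  /* slot-aligned base */
    size_t len;
    memcpy(&len, (const unsigned char *)base + slot - sizeof(size_t),
           sizeof len);
    size_t off = (size_t)(addr - base);
    return off <= len && access <= len - off;
}
```

With a 10-byte object the slot is 32 bytes, so an access at offset 10 is
still inside the slot (and would pass a slot-granularity check) but is
rejected by bb_check because the stored length says the object ends at 10.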

I'm rather hoping that there's a research paper within the latter two 
ideas.

>   Alternately, you could
> even just use the first few bytes of the padding directly. We recently
> added optional canaries to OpenBSD and it's been useful in finding bugs.

Bug finding and online protection make very different tradeoffs that I 
won't get into right now due to lack of time.  If you're interested, we 
could probably meet up at a conference sometime (or discuss it if your 
GSoC proposal is accepted :) ).

Regards,

John Criswell

>
> I'll have to read more about the kernel projects before I can comment.
>
> Thanks,
> Michael

-- 
John Criswell
Assistant Professor
Department of Computer Science, University of Rochester
http://www.cs.rochester.edu/u/criswell