[llvm-dev] GSoC and SAFECode
John Criswell via llvm-dev
llvm-dev at lists.llvm.org
Tue Mar 22 18:45:00 PDT 2016
On 3/22/16 7:52 PM, Michael McConville wrote:
> John Criswell wrote:
>> If you're interested in SAFECode, the first step is to get SAFECode
>> working with a newer version of LLVM. A Master's student did some
>> work on this last summer with LLVM 3.7 but didn't finish. It would
>> now need to be updated to LLVM 3.8 (though I suppose a completed LLVM
>> 3.7 port would be fine with me).
>>
>> After that, there are some interesting projects on which to work. One
>> would be static array bounds checking. That could be interesting, but
>> it doesn't really address my immediate research needs. Right now, I'm
>> more interested in getting the Baggy Bounds with Accurate Checking
>> (BBAC) feature enabled so that we can use it in research. For
>> example, we could try to get faster enforcement of memory safety on
>> operating system kernels, examine the use of combined safe/unsafe
>> languages for OS kernels (without letting C code violate the safety
>> provided by the safe language), and enforce dynamic security policies
>> on kernel modules (to thwart rootkits).
>>
>> If you're interested in security projects on the kernel, you could
>> enhance the KCoFI prototype to use a more accurate control-flow graph
>> or to use code pointer integrity, or you could write optimizations for
>> the software-fault isolation instrumentation (which would improve both
>> KCoFI and Virtual Ghost, if you are familiar with those papers of
>> mine).
>>
>> Do any of these projects sound interesting to you?
> Yeah, definitely. Porting to LLVM 3.8 or finishing the 3.7 port would be
> a good way to get more familiar with LLVM internals.
>
> BBAC looks very interesting. I, like you (according to the BBAC paper's
> intro), am a little frustrated by the fact that these sorts of checkers
> still aren't used in standard software builds, so I find optimizing for
> performance and simplicity particularly interesting. Also, this is an
> anecdote, but have you considered writing pseudo-random data to the
> padding area and using its checksum as a canary?
No, I have not considered canaries, and I'd be very wary of doing so.
Canaries are (IMHO) a hack; Stephen Checkoway has his students defeat
stack canaries as a homework assignment. I'd need to see a strong
argument that a heap object canary would not be defeated easily.
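For concreteness, the padding canary suggested in the quoted text could be sketched roughly as follows. This is only an illustration of the idea, not code from SAFECode or OpenBSD: the function names, the xorshift32 generator, and the FNV-1a checksum are all assumptions chosen for brevity, and neither is cryptographically secure, so an attacker who can read the padding or predict the seed could recompute the checksum.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One step of xorshift32. Hypothetical choice of generator;
 * it is NOT cryptographically secure. */
uint32_t xorshift32(uint32_t *state) {
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *state = x;
    return x;
}

/* 32-bit FNV-1a checksum over n bytes (also just an assumption). */
uint32_t fnv1a(const unsigned char *p, size_t n) {
    uint32_t h = 2166136261u;
    for (size_t i = 0; i < n; i++) {
        h ^= p[i];
        h *= 16777619u;
    }
    return h;
}

/* Fill the padding [len, slot_size) with pseudo-random bytes and
 * return the checksum of that padding, to be stored out of line. */
uint32_t canary_arm(unsigned char *slot, size_t len, size_t slot_size,
                    uint32_t seed) {
    assert(len <= slot_size);
    uint32_t state = seed ? seed : 1u; /* xorshift state must be nonzero */
    for (size_t i = len; i < slot_size; i++)
        slot[i] = (unsigned char)xorshift32(&state);
    return fnv1a(slot + len, slot_size - len);
}

/* Return 1 if the padding still matches the stored checksum, else 0. */
int canary_intact(const unsigned char *slot, size_t len, size_t slot_size,
                  uint32_t expected) {
    assert(len <= slot_size);
    return fnv1a(slot + len, slot_size - len) == expected;
}
```

A check like canary_intact() would typically run when the object is freed, so it detects an overflow after the fact rather than preventing it.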
I'm more interested in storing information like the following in the
padding:
o The exact length of the memory object (BBAC)
o The points-to set(s) to which the memory object belongs (useful for
  finding casting errors, dangling pointer errors, and bugs in the
  compiler's points-to analysis)
o Policy information on which part of a program can modify which fields
  in the object (useful for restricting the behavior of kernel modules
  within a monolithic kernel)
I'm rather hoping that there's a research paper within the latter two
projects.
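As a rough illustration of the first item, here is a minimal sketch of keeping the exact object length in the padding of a power-of-two slot. It is not SAFECode's implementation: the padded_obj struct, the function names, the 16-byte minimum slot, and the choice to put the length in the last sizeof(size_t) bytes are all assumptions. A real baggy-bounds system would look up the slot size from a table indexed by address instead of carrying it in a handle.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical handle; a real system derives this from the pointer. */
typedef struct {
    unsigned char *base; /* start of the slot */
    size_t slot;         /* power-of-two slot size in bytes */
} padded_obj;

/* Smallest power of two (at least 16) holding len bytes plus the
 * stored length field. */
size_t slot_size_for(size_t len) {
    size_t need = len + sizeof(size_t);
    size_t slot = 16;
    while (slot < need)
        slot <<= 1;
    return slot;
}

/* Allocate a slot and record the exact length in its last bytes.
 * Returns 0 on success, -1 on failure. */
int padded_alloc(padded_obj *obj, size_t len) {
    assert(obj != NULL);
    if (len > SIZE_MAX / 4)
        return -1; /* avoid overflow when rounding up */
    size_t slot = slot_size_for(len);
    unsigned char *base = malloc(slot);
    if (base == NULL)
        return -1;
    memcpy(base + slot - sizeof len, &len, sizeof len);
    obj->base = base;
    obj->slot = slot;
    return 0;
}

/* Read the exact length back out of the padding. */
size_t exact_length(const padded_obj *obj) {
    size_t len;
    memcpy(&len, obj->base + obj->slot - sizeof len, sizeof len);
    return len;
}

/* Coarse check: is [off, off + n) inside the power-of-two slot? */
int baggy_check(const padded_obj *obj, size_t off, size_t n) {
    return n <= obj->slot && off <= obj->slot - n;
}

/* Accurate check: is [off, off + n) inside the object itself? */
int exact_check(const padded_obj *obj, size_t off, size_t n) {
    size_t len = exact_length(obj);
    return n <= len && off <= len - n;
}
```

With a 20-byte object in a 32-byte slot, a 4-byte access at offset 20 passes the coarse check but fails the exact one. The points-to set and policy information in the other two items could be stored in the same padding in a similar way, space permitting.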
> Alternately, you could
> even just use the first few bytes of the padding directly. We recently
> added optional canaries to OpenBSD and it's been useful in finding bugs.
Bug finding and online protection make very different tradeoffs that I
won't get into right now due to lack of time. If you're interested, we
could probably meet up at a conference sometime (or discuss it if your
GSoC proposal is accepted :) ).
Regards,
John Criswell
>
> I'll have to read more about the kernel projects before I can comment.
>
> Thanks,
> Michael
--
John Criswell
Assistant Professor
Department of Computer Science, University of Rochester
http://www.cs.rochester.edu/u/criswell