[llvm-dev] Finding all pointers to functions

Wed Dec 23 10:26:16 PST 2015

On 12/23/15 12:55 PM, Russell Wallace wrote:
> On Wed, Dec 23, 2015 at 5:35 PM, John Criswell <jtcriswel at gmail.com> wrote:
>
>     DSA was built when LLVM's optimizations maintained the type
>     information on GEP and other instructions (DSA existed before LLVM
>     was open-source).  As such, it uses LLVM's type information to aid
>     in its type-inference which, in turn, gives it field sensitivity
>     which, in turn, improves its accuracy.  Over time, LLVM
>     optimizations have come to modify the type information so that it
>     is just simple byte-level indexing (as opposed to
>     array-of-structure indexing).  DSA hasn't been updated to handle
>     that well.  That is why its precision is better pre-optimization
>     than post-optimization.
>
>
> Ah! I don't suppose you could point to some examples of this? E.g. a 
> simple test program such that one could eyeball the intermediate code 
> before and after optimization?

Off the top of my head, no, I don't have an example, but I suspect any 
program that indexes into an array inside a for loop will do.
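
A minimal sketch of the kind of test program that might show the 
difference, assuming an array of structures is a reasonable trigger. The 
struct and function names (point, sum_y) are made up for illustration; 
nothing here comes from the DSA test suite.

```c
/* Hypothetical record type; the names are illustrative only. */
struct point {
    int x;
    int y;
};

/* Sums the y field of every element.  Without optimization, the access
   to pts[i].y is normally expressed as a typed GEP over struct point.
   After optimization, the address computation may be rewritten into a
   different form (assumption: this depends on the LLVM version and on
   which passes run). */
int sum_y(const struct point *pts, int n) {
    int total = 0;
    for (int i = 0; i < n; i++)
        total += pts[i].y;
    return total;
}
```

Compiling this once with `clang -O0 -S -emit-llvm` and once with 
`clang -O2 -S -emit-llvm` gives two .ll files whose GEP instructions can 
be compared by eye.  Whether the optimized version actually shows 
byte-level indexing is not guaranteed for a program this small.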

>
>     Just out of curiosity, what are you trying to do?  I need call
>     graph analysis for C/C++ code with function pointers, and so I'm
>     writing an NSF proposal to seek funding to do that (among other
>     enhancements to my SVA infrastructure).  If it's something that
>     would be useful to you (or other LLVM community members), it would
>     be useful for me to know that.
>
>
> SVA?

Sorry.  SVA is Secure Virtual Architecture.  It's my LLVM-based 
infrastructure for controlling operating system kernel behavior via 
compiler instrumentation and hardware configuration.  I've used it to 
build a system that protects applications from a compromised operating 
system kernel as well as to enforce memory safety and control-flow 
integrity on operating system kernel code.

I need DSA for doing things like:

1) Creating an accurate call graph for kernel code to enforce better 
control-flow integrity and to test our future infrastructure for 
measuring the efficacy of defenses against code reuse attacks.

2) Analyzing the memory accesses of kernel modules to see if they modify 
kernel data structures that they should not modify (e.g., to find 
rootkits that modify the process list).

3) Optimizing run-time checks that protect kernel data structures from 
other kernel components (useful for a number of things).

In short, strong points-to and call graph analysis enable some 
interesting research projects.
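
To make the call graph problem concrete, here is a small sketch of the 
pattern that makes kernel code hard to analyze: calls dispatched through 
a structure of function pointers.  The table and function names 
(file_ops, disk_open, and so on) are hypothetical and only loosely 
modeled on how kernels are commonly written.

```c
/* Hypothetical operations table of function pointers. */
struct file_ops {
    int (*open)(int id);
    int (*close)(int id);
};

static int disk_open(int id)  { return id + 1; }
static int disk_close(int id) { return id - 1; }
static int net_open(int id)   { return id * 2; }
static int net_close(int id)  { (void)id; return 0; }

static const struct file_ops disk_ops = { disk_open, disk_close };
static const struct file_ops net_ops  = { net_open,  net_close  };

/* Selects one of the two tables at run time. */
const struct file_ops *pick_ops(int use_net) {
    return use_net ? &net_ops : &disk_ops;
}

/* Indirect call: the target depends on which table ops points to.
   A field-sensitive points-to analysis can narrow the targets to
   disk_open and net_open.  A field-insensitive one would merge the
   open and close fields and also report disk_close and net_close. */
int do_open(const struct file_ops *ops, int id) {
    return ops->open(id);
}
```

For example, do_open(pick_ops(0), 5) goes through disk_open and 
do_open(pick_ops(1), 5) goes through net_open.  The smaller the target 
set the analysis can prove for the call in do_open, the tighter the 
control-flow integrity policy that can be enforced on it.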

>
> I'm trying to write a superoptimizer that can optimize code based on a 
> high-level understanding of what it's actually doing, so yes, call 
> graph analysis that can deal with function pointers does seem likely 
> to be one of the things that will be needed.

Nice.

One thing you might want to investigate is whether building a call graph 
analysis off of the TBAA metadata would work.  If TBAA works for lots of 
programs (I hear some non-conformant programs cause it problems), then 
using it as a springboard for analysis may be effective (as TBAA is 
already well maintained in the LLVM source tree).
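
As a rough illustration of what TBAA relies on, and of why 
non-conformant programs are a problem for it, consider the sketch below.  
The function name store_both is made up, and the comments describe what 
an optimizer is permitted to assume under C's strict-aliasing rules, not 
what any particular LLVM version is guaranteed to do.

```c
/* Under C's strict-aliasing rules, an int * and a float * may be
   assumed to refer to different objects.  TBAA metadata carries that
   assumption to the optimizer. */
int store_both(int *ip, float *fp) {
    *ip = 1;
    *fp = 2.0f;   /* assumed not to modify *ip */
    return *ip;   /* so this load may be folded to the constant 1 */
}

/* A conformant caller passes distinct objects of the matching types:
       int i; float f; store_both(&i, &f);
   A non-conformant caller would pass the same storage through both
   pointers, e.g. store_both(&i, (float *)&i).  That is undefined
   behavior, and type-based reasoning about such a program is
   unreliable. */
```

An analysis built on TBAA inherits the same assumption, which is why it 
would only be trustworthy for programs that follow the aliasing rules 
(or are not compiled with something like -fno-strict-aliasing, which 
turns the type-based assumption off).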

Regards,

John Criswell

-- 
John Criswell
Assistant Professor
Department of Computer Science, University of Rochester
http://www.cs.rochester.edu/u/criswell
