[llvm-dev] Finding all pointers to functions

Wed Dec 23 09:35:49 PST 2015

On 12/23/15 2:09 AM, Russell Wallace wrote:
> On Tue, Dec 22, 2015 at 10:55 AM, John Criswell <jtcriswel at gmail.com> wrote:
>
>     You could conservatively assume that any function that has its
>     address taken has a pointer to it that escapes into memory or
>     external code.
>
>
> Right, that's what I'm doing to start with.
>
>     To make things a little more accurate, you could scan the uses of
>     any function for which hasAddressTaken() returns true and see if
>     any of its uses escapes its function or escapes into memory or
>     external code.  I believe hasAddressTaken() returns true if the
>     function is subjected to a cast instruction, and functions are
>     often cast if they are used in a call that uses a different
>     signature than the function's declared signature.
>
>
> I'll look into that. It seems reasonable to guess that the major 
> confounding factor in many C++ programs will be references from 
> virtual function tables; there should be some way to optimize those 
> specifically.
>
>
>     To get anything more accurate, you'll need to use alias analysis
>     or points-to analysis.  DSA tracks function pointers in the heap
>     and can tell you whether the function is called from external
>     code.  However, DSA's accuracy currently suffers if it is run
>     after LLVM's optimizations, and the code needs some serious TLC.
>
>
> DSA presumably stands for data structure analysis. TLC = tender loving 
> care? Why does DSA become less accurate if run after optimization?
>

DSA was built when LLVM's optimizations preserved the type information 
on GEP and other instructions (DSA existed before LLVM was 
open source).  It therefore relies on LLVM's type information to aid its 
type inference, which gives it field sensitivity, which in turn 
improves its precision.  Over time, LLVM's optimizations have come to 
rewrite typed indexing (array-of-structure indexing) into simple 
byte-level indexing, which discards that type information.  DSA hasn't 
been updated to handle that well, so its precision is better 
pre-optimization than post-optimization.
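
To make the difference concrete, here is a rough source-level analogy in 
C++ (not LLVM IR, and not DSA code).  The Handler struct, twice(), and 
the two call_* helpers are hypothetical names made up for this sketch.  
Both helpers load the same function pointer, but only the first one says 
which field it is reading; the second only says "some bytes at an 
offset", which is roughly the form an analysis sees once typed indexing 
has been lowered to byte-level indexing.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical record type, used only for illustration.
struct Handler {
    int id;
    int (*callback)(int);
};

// Hypothetical target function whose address is stored in memory.
int twice(int x) { return 2 * x; }

// Typed access: the field is named, so a field-sensitive analysis can
// tell that the loaded value is the function pointer in 'callback'.
int call_typed(const Handler *h, int arg) {
    return h->callback(arg);
}

// Byte-level access: the same load expressed as a raw byte offset.
// The struct type and field identity are no longer visible at the
// access itself; only the offset and the size of the load remain.
int call_bytes(const Handler *h, int arg) {
    const unsigned char *raw = reinterpret_cast<const unsigned char *>(h);
    int (*fn)(int);
    std::memcpy(&fn, raw + offsetof(Handler, callback), sizeof fn);
    return fn(arg);
}
```

Both helpers call the same function at run time; the difference is only 
in how much structure the access exposes to a static analysis.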

Just out of curiosity, what are you trying to do?  I need call graph 
analysis for C/C++ code with function pointers, and so I'm writing an 
NSF proposal to seek funding to do that (among other enhancements to my 
SVA infrastructure).  If that would be useful to you (or to other LLVM 
community members), I'd like to know.

Regards,

John Criswell

-- 
John Criswell
Assistant Professor
Department of Computer Science, University of Rochester
http://www.cs.rochester.edu/u/criswell
