<html>
<head>
<meta content="text/html; charset=utf-8" http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
<div class="moz-cite-prefix">On 12/23/15 2:09 AM, Russell Wallace
wrote:<br>
</div>
<blockquote
cite="mid:CAH+nB+yBu-dCHfU=hg7G41Dupzui1RxpNn6Dcq4YDHhDs-2LjA@mail.gmail.com"
type="cite">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote">On Tue, Dec 22, 2015 at 10:55 AM,
John Criswell <span dir="ltr">&lt;<a moz-do-not-send="true"
href="mailto:jtcriswel@gmail.com" target="_blank">jtcriswel@gmail.com</a>&gt;</span>
wrote:<br>
<blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000"><span class=""></span>You
could conservatively assume that any function that has
its address taken has a pointer to it that escapes into
memory or external code. </div>
</blockquote>
<div><br>
Right, that's what I'm doing to start with.<br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000">To make things a
little more accurate, you could scan the uses of any
function for which hasAddressTaken() returns true and
see if any of its uses escapes its function or escapes
into memory or external code. I believe
hasAddressTaken() returns true if the function is
subjected to a cast instruction, and functions are often
cast if they are used in a call that uses a different
signature than the function's declared signature.<br>
</div>
</blockquote>
<div><br>
I'll look into that. It seems reasonable to guess that the
major confounding factor in many C++ programs will be
references from virtual function tables; there should be
some way to optimize those specifically. <br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px
0.8ex;border-left:1px solid
rgb(204,204,204);padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000"> <br>
To get anything more accurate, you'll need to use alias
analysis or points-to analysis. DSA tracks function
pointers in the heap and can tell you whether the
function is called from external code. However, DSA's
accuracy currently suffers if it is run after LLVM's
optimizations, and the code needs some serious TLC.<br>
</div>
</blockquote>
<div><br>
DSA presumably stands for data structure analysis. TLC =
tender loving care? Why does DSA become less accurate if
run after optimization?<br>
</div>
</div>
<br>
</div>
</div>
</blockquote>
<br>
DSA was built when LLVM's optimizations preserved the type
information on GEP (getelementptr) and other instructions (DSA
existed before LLVM was open source). As such, it uses LLVM's type
information to aid its type inference, which gives it field
sensitivity, which in turn improves its accuracy. Over time, LLVM's
optimizations have come to rewrite typed address computations into
simple byte-level indexing (as opposed to array-of-structure
indexing). DSA hasn't been updated to handle that well, which is
why its precision is better pre-optimization than post-optimization.<br>
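<br>
As a rough illustration of the difference, here is a sketch in plain
C rather than LLVM IR. The struct and function names are made up for
the example and are not taken from DSA or LLVM. Both functions read
the same function pointer field, but only the first one names the
field in the access itself:<br>
<pre>
```c
/* Hypothetical type: a function pointer stored in a struct field. */
typedef void (*handler_fn)(void);

struct node {
    int key;
    handler_fn handler;
};

/* Two hypothetical targets that could be stored in the field. */
void on_open(void) {}
void on_close(void) {}

/* Typed (array-of-structure) indexing: the access says "element i,
   field handler", so an analysis can attribute the load to that field. */
handler_fn typed_lookup(struct node *nodes, int i) {
    return nodes[i].handler;
}

/* Byte-level indexing: the same address computed as base + raw bytes.
   Which field is being read is now implicit in a constant offset.
   (The offset is computed by hand here; offsetof from stddef.h is the
   usual spelling.) */
handler_fn byte_lookup(struct node *nodes, int i) {
    struct node probe;
    unsigned long field_off =
        (unsigned long)((char *)&probe.handler - (char *)&probe);
    char *bytes = (char *)nodes;
    return *(handler_fn *)(bytes
                           + (unsigned long)i * sizeof(struct node)
                           + field_off);
}
```
</pre>
Both lookups return the same pointer at run time. The point is only
that the second form no longer carries the field name, so an analysis
that relies on types has to recover the field from the offset.<br>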
<br>
Just out of curiosity, what are you trying to do? I need call graph
analysis for C/C++ code with function pointers, and so I'm writing
an NSF proposal to seek funding to do that (among other enhancements
to my SVA infrastructure). If that is something that would be useful
to you (or to other LLVM community members), I would like to know
about it.<br>
<br>
Regards,<br>
<br>
John Criswell<br>
<br>
<br>
<pre class="moz-signature" cols="72">--
John Criswell
Assistant Professor
Department of Computer Science, University of Rochester
<a class="moz-txt-link-freetext" href="http://www.cs.rochester.edu/u/criswell">http://www.cs.rochester.edu/u/criswell</a></pre>
</body>
</html>