<div dir="ltr"><div class="gmail_extra"><div class="gmail_quote">On Mon, Jun 15, 2015 at 11:02 AM, Daniel Berlin <span dir="ltr"><<a href="mailto:dberlin@dberlin.org" target="_blank">dberlin@dberlin.org</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Points-to analysis on LLVM-IR itself is fine (see the current CFL-AA,<br>

or the old, deleted Andersen's implementations), and giving may-alias<br>

and no-alias results also works. Giving must-alias answers, however,<br>

is difficult.<br>

<br>

In particular, I would not simply ignore some types of constructs and<br>

expect to produce valid answers.<br></blockquote><div><br></div><div>Makes sense.  Thanks for the advice.</div><div> <br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class=""><br>

</span>There are plenty of things that are illegal in C but legal in LLVM IR.<br>

<br>

For example, the following is legal LLVM IR (sorry for the C-style pseudocode, it's early):<br>

<br>

bar(int64 a) {<br>

int64 * foo = inttoptr(a);<br>

baz = load *foo;<br>

}<br>

<br>

This is not illegal, and will produce a valid result.<br>

<br>

Same with stuff like:<br>

bar(int64 *a) {<br>

int64 foo = ptrtoint(a);<br>

baz = foo + 5;<br>

int64 *b = inttoptr(baz);<br>

c = load *b;<br>

}<br>

<br>

Again, not illegal, and produces a valid result.<br>

You can pretty much do what you want.<br>

<br>

Things like "C pointer aliasing rules" exist only as metadata.<br>

So in general, you can't expect "invalid pointers" to buy you very much.<br></blockquote><div><br></div><div>I see, thanks for clarifying.  The AA algorithm I've been working with assumes that the type system is going to lie, since C allows type punning.  I'm pretty sure I can port that distrust to the LLVM IR version of the algorithm.  It sounds like that would cover the examples you gave above, if I'm also appropriately pessimistic about the behavior of unknown / unanalyzed callers and callees.  </div><div><br></div><div>Maybe what I'll try is to add a flag to each vertex in the may-point-to graph, indicating whether or not the vertex's memory might hold additional, poorly understood pointers.  Then I can let an appropriate amount of hell break loose in the analysis, if a piece of memory with that flag is used in various ways.</div><div><br></div><div>That way, if over time I can make the algorithm better at detecting and making sense of code which generates new pointer values, I can just gradually reduce the cases where I need to set that flag.</div><div><br></div><div>  <br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><span class="">

> I did look at the LLVM IR for calling a virtual function in C++, since you<br></span><span class="">

> mentioned that as an example earlier.  From manual inspection, I thought I<br></span><span class="">

> could spot the value flow of the virtual function pointer from where the<br></span><span class="">

> function was defined, into the vtable constant for that class, and then into<br></span><span class="">

> the class instance's vtable pointer.</span><span class=""><br></span>This depends on the frontend generating the LLVM IR :)</blockquote></div><br></div><div class="gmail_extra">Touché.</div></div>
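<div>A minimal sketch of the per-vertex flag idea described in the reply above, written in Python purely for illustration. Every name here (<code>Vertex</code>, <code>PointsToGraph</code>, <code>may_hold_unknown</code>, <code>mark_unknown</code>) is hypothetical and not part of LLVM or of any existing alias analysis. It assumes the crudest possible policy: any query that touches flagged memory is answered "may alias".</div>
<pre>
```python
from dataclasses import dataclass, field


@dataclass
class Vertex:
    """One abstract memory location in a may-point-to graph (hypothetical)."""
    name: str
    # Names of the vertices this location may point to.
    points_to: set = field(default_factory=set)
    # True when this memory might hold pointers the analysis does not
    # understand, e.g. the result of an inttoptr or a value written by an
    # unanalyzed callee.
    may_hold_unknown: bool = False


class PointsToGraph:
    def __init__(self):
        self.vertices = {}

    def vertex(self, name):
        """Return the vertex called `name`, creating it on first use."""
        if name not in self.vertices:
            self.vertices[name] = Vertex(name)
        return self.vertices[name]

    def add_edge(self, src, dst):
        """Record that `src` may point to `dst`."""
        self.vertex(src).points_to.add(self.vertex(dst).name)

    def mark_unknown(self, name):
        """Flag `name` as possibly holding poorly understood pointers."""
        self.vertex(name).may_hold_unknown = True

    def copy(self, dst, src):
        """Model `dst = src`: dst inherits both the targets and the flag."""
        s, d = self.vertex(src), self.vertex(dst)
        d.points_to |= s.points_to
        d.may_hold_unknown = d.may_hold_unknown or s.may_hold_unknown

    def may_alias(self, a, b):
        """Conservative query: flagged memory may alias anything."""
        va, vb = self.vertex(a), self.vertex(b)
        if va.may_hold_unknown or vb.may_hold_unknown:
            return True
        return bool(va.points_to.intersection(vb.points_to))


# Mirrors the second quoted example: b comes from inttoptr(ptrtoint(a) + 5),
# so the sketch gives up on b instead of trying to model the arithmetic.
g = PointsToGraph()
g.add_edge("a", "x")
g.add_edge("c", "y")
g.mark_unknown("b")
print(g.may_alias("a", "c"), g.may_alias("b", "c"))
```
</pre>
<div>Getting better at recognizing code that manufactures pointers would then mean calling <code>mark_unknown</code> in fewer places, while the query logic stays unchanged.</div>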