[LLVMdev] Expressing ambiguous points-to info in AliasAnalysis::alias(...) results?

Daniel Berlin dberlin at dberlin.org
Mon Jun 15 15:03:06 PDT 2015


On Mon, Jun 15, 2015 at 2:20 PM, Christian Convey
<christian.convey at gmail.com> wrote:
> On Mon, Jun 15, 2015 at 4:54 PM, Daniel Berlin <dberlin at dberlin.org> wrote:
>>
>> > I'm mostly going from Robert Wilson's 1997 PhD thesis, although I'm
>> > pretty
>> > sure I've seen a lot of the same ideas elsewhere as well.
>>
>> Yes, using summary/transfer functions has been tried a lot.
>> Note: the numbers in most of these PhD theses are *not* borne out in
>> practice.
>>
>> See, e.g, http://www.cse.unsw.edu.au/~ysui/papers/spe14.pdf
>>
>> vs
>>
>> http://impact.crhc.illinois.edu/shared/report/phd-thesis-erik-nystrom.pdf
>>
>> vs
>> the Wilson thesis.
>
>
> Cool, I'll give those papers a read, thanks.
>
>>
>> In particular, I expect the Wilson algorithm is going to fall over
>> very badly on anything of even medium size.
>
>
> Interesting.  Do you mean "fall over" in terms of running time, or
> precision, or something else?

Running time.  Wilson's approach took 16 seconds for 4,500 lines of code.
Let's assume his implementation actually did things right.  Most
research implementations cut corners (e.g., ignoring the effects of
external function calls), which makes their reported runtimes look
significantly better than they would be in practice.

Let's further assume that the scaling of computer speed from 1995 to now
eats your entire exponential factor, making the algorithm effectively
linear in practice (this is not even close to true).

His algorithm, on a million lines of code, would then take about an hour
to run, which would be pretty unacceptable.
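
As a rough back-of-the-envelope check, using his reported numbers and the
charitable linear-scaling assumption above:

  16 s / 4,500 lines       ~= 3.6 ms per line
  3.6 ms/line * 1,000,000  ~= 3,600 s ~= 1 hour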

In practice, I think you'll find it probably never finishes :)



>
>>
>> Nowadays, the thing of the day seems to be CFL-reachability-based
>> formulations, doing context sensitivity on demand.
>
>
> Out of curiosity, what's the appeal (aside from academic novelty) of the CFL
> approaches?

They are demand-driven and easy to add context sensitivity to.
At the same time, on an all-pairs basis, they are competitive with
constraint-based Andersen's solvers.

You can't really do much better than this: being demand-driven means you
pay pretty close to the minimal cost for the stuff you actually query.
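
To make "demand-driven" concrete, here is a deliberately simplified C++
sketch (my own illustration, not the grammar from the CFL papers): nodes
are pointer values, edges are direct assignments, and a may-alias query
only explores the part of the graph reachable backwards from the two
queried pointers.  A real CFL-reachability formulation additionally
matches loads and stores with a balanced-parentheses grammar, and gets
context sensitivity by also matching call/return edges; none of that is
shown here.

// Simplified demand-driven may-alias query.
#include <unordered_set>
#include <vector>

struct Node {
  std::vector<Node *> AssignedFrom; // one edge q -> p for every "p = q"
  int AddressOfObject = -1;         // >= 0 if this node is "&obj"
};

// Walk assignment edges backwards from P, on demand, collecting the
// objects whose addresses can flow into P.
static void collectSources(Node *P, std::unordered_set<Node *> &Visited,
                           std::unordered_set<int> &Sources) {
  if (!Visited.insert(P).second)
    return;
  if (P->AddressOfObject >= 0)
    Sources.insert(P->AddressOfObject);
  for (Node *Pred : P->AssignedFrom)
    collectSources(Pred, Visited, Sources);
}

// Two pointers may alias if some object's address can reach both of them.
// Only the subgraph reachable from A and B is ever touched.
bool mayAlias(Node *A, Node *B) {
  std::unordered_set<Node *> VisitedA, VisitedB;
  std::unordered_set<int> SourcesA, SourcesB;
  collectSources(A, VisitedA, SourcesA);
  collectSources(B, VisitedB, SourcesB);
  for (int Obj : SourcesA)
    if (SourcesB.count(Obj))
      return true;
  return false;
}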

>
> From a personal perspective, I'm particularly interested in the maximum
> analytic precision each AA approach can take, almost without regard to how
> much time or memory the computation takes to run.

I'm wildly curious why.

> So one thing I found
> appealing about Wilson's thesis was his "location set" approach for
> providing field-sensitivity.

I never saw this paper (it was too old for where I started), but GCC
uses something similar.

We create a variable for each field of a given variable.
These field variables are chained in a linked list, each with an offset
and size (if known), off the variable that contains them.

Constraints carry stride info, and intersection is then used during
solving to compute which sub-variables a constraint touches.
It is a variant of this:
http://homepages.mcs.vuw.ac.nz/~djp/files/paste04.ps
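
As a rough sketch of the intersection step (my own simplification, with
byte offsets and sizes assumed known, and the stride handling left out):
each sub-variable records its offset and size within the parent object,
and a constraint's access range is intersected against those ranges to
find every sub-variable it can touch.

#include <cstdint>
#include <vector>

struct SubVar {
  uint64_t Offset; // byte offset within the parent object
  uint64_t Size;   // size in bytes (assumed known here)
  unsigned Id;     // solver variable for this field range
};

// Return the solver variables whose byte ranges overlap the access
// [AccessOffset, AccessOffset + AccessSize).
std::vector<unsigned>
touchedSubVars(const std::vector<SubVar> &Fields,
               uint64_t AccessOffset, uint64_t AccessSize) {
  std::vector<unsigned> Touched;
  for (const SubVar &F : Fields) {
    bool Overlaps = AccessOffset < F.Offset + F.Size &&
                    F.Offset < AccessOffset + AccessSize;
    if (Overlaps)
      Touched.push_back(F.Id);
  }
  return Touched;
}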

Note a few things:

Some programs have so many sub-variables that you will run out of memory.
This is true of the location-set approach as well.
To limit this, GCC only creates variables for possibly-used field ranges.
I can point you at GCC bugs offline if you are interested.


>
> I also liked Alain Deutsch's 1994 paper "Interprocedural may-alias analysis
> for pointers: Beyond k-limiting" for the same reason.  But I didn't take a
> crack at implementing *that* because, based on the paper's complexity, I
> thought I'd go clinically insane trying to turn it into code :)


