[LLVMdev] GSoC 2012 Ideas - Alias Analysis

Wed Apr 4 21:27:16 PDT 2012

Hi guys,

I've gone ahead and written my proposal to GSoC. Comments are welcome! :)

=== Introduction ===

Pointer analysis has the power of increasing the effectiveness and the
efficiency of other compiler optimizations, and it is a prerequisite
for many of them. The quality of pointer information can considerably
affect the performance and the quality of those optimizations. The
most precise kind of pointer analysis is context-sensitive and
flow-sensitive. However, it is hard to scale context-sensitive pointer
analysis to big programs. On the other hand, it has been shown that it
is possible to scale flow-sensitive pointer analysis to programs with
millions of lines of code[1].

There are several passes on LLVM to perform pointer analysis: -no-aa,
-basic-aa, -globalsmodref-aa, -scev-aa, ds-aa, and steens-aa -- the
last two ones are not part of the LLVM main tree; they are part of
poolalloc and work as a separated module of LLVM. None of these passes
implements flow-sensitive pointer analysis, though.

Thus, I propose to implement a flow-sensitive pointer analysis
algorithm on LLVM, as part of the Google Summer of Code program. The
implementation that I propose is based on the value flow graph and was
created by Li et. al.[2] Additionally to the pointer analysis
implementation, I propose to implement a program slicer that uses the
results of flow-sensitive pointer analysis. I'll explain bellow the
details about both implementations. Finally, I propose to add an
AliasAnalysis interface to the new implementation of pointer analysis
so that other optimizations can use its results.

=== Implementation ===

== Value-Flow-Based Pointer Analysis ==

The paper "Boosting the performance of flow-sensitive points-to
analysis using value flow"[2] introduces a novel method to perform
flow-sensitive pointer analysis. This method transforms the points-to
analysis problem to a general graph reachability problem. And
according to the authors, the runtime of their implementation is
comparable to the state-of-the-art flow-insensitive points-to
analysis. Implementing such a technique on LLVM would potentially
benefit several optimizations. Besides, the authors claim that their
approach is intuitive and easy to implement, which, in my opinion, is
good for a summer project.

== The Alias Analysis Interface ==

All of the LLVM's alias analysis implementations have a common
interface to which the clients make queries regarding to pointer
information. One of the objectives of this work would be implement
this same interface in the proposed flow-sensitive pointer analysis
algorithm. It would make possible to chain the other LLVM's alias
analysis implementations with the flow-sensitive implementation and
get the best of both.

== The Program Slicer ==

A program slicer aims to find a set of program instructions that may
affect a variable at a given program point. Program slicing uses
points-to information. And its performance and precision depend on the
performance and precision of pointer analysis. A program slicer builds
a System Dependence Graph (SDG) based on the points-to information. If
that information is imprecise -- points-to set bigger than necessary
-- the SDG become too big, making program slicing inefficient. For
example, one of the students in our research group performed an
experiment with program slicing on the Vim benchmark, using a
Andersen-style pointer analysis. It produced a SDG with more than 30
millions of nodes. We believe that with a more precise pointer
information we can reduce the size of the graph and therefore make
program slicing more efficient.

=== Timeline ===

* Weeks 1 - 4: Implement the value-flow-based pointer analysis algorithm;

* Weeks 5 - 6: Implement the AliasAnalysis interface;

* Weeks 7 - 10: Implement the program slicer;

* Week 11: Run tests to compare the impact of the flow-sensitive
pointer analysis and the other LLVM pointer analyses on the
effectiveness and efficiency of the clients. As clients, I'd suggest:
  - Agressive Dead Code Elimination;
  - Loop Invariant Code Motion;
  - Global Value Numbering;
  - Program Slicing.

* Week 12: Write a report with the results of the experiments and
explaining the implementation.

=== Background ===

I'm a fourth year undergraduate student at the Federal University of
Minas Gerais (UFMG), in Brazil. I've been working with C++ for almost
four years and with LLVM for almost three years. I've worked on UFMG's
compiler group in a project to design and implement an algorithm to
perform range analysis, and in a number of other small projects using
LLVM.

Nowadays I'm working as research assistant at The University of Texas
at Austin, under the orientation of professor Calvin Lin. I'm working
on two projects at UT: one is to profile several pointer analysis
algorithms and explore algorithmic improvements, and the other is to
compare the impact of different pointer analysis algorithms in the
precision and efficiency of other analyses and optimizations.

=== References ===

[1] Hardekopf and Lin: The Ant and the Grasshopper: Fast and Accurate
Pointer Analysis for Millions of Lines of Code

[2] Li et. al.: Boosting the performance of flow-sensitive points-to
analysis using value flow

[3] LLVM Alias Analysis Infrastructure: http://llvm.org/docs/AliasAnalysis.html

On Mon, Apr 2, 2012 at 1:55 PM, Douglas do Couto Teixeira
<douglasdocouto at gmail.com> wrote:
> Hi John, thanks for your feedback. I'll try to answer your comments bellow.
>
> On Mon, Apr 2, 2012 at 10:59 AM, John Criswell <criswell at illinois.edu> wrote:
>> On 3/30/12 7:07 PM, Douglas do Couto Teixeira wrote:
>>>
>>> Hi guys,
>>>
>>> I'm an undergraduate computer science student and I've been working
>>> with pointer analysis this semester under the orientation of professor
>>> Calvin Lin, at The University of Texas at Austin. I'm interested on
>>> helping to develop LLVM's alias analysis infrastructure. I have two
>>> ideas for a possible proposal and I'd like to know if you have
>>> interest on any of them.
>>>
>>> My first idea is to improve LLVM's existing alias analysis
>>> infrastructure. I saw on the open projects page that there are a
>>> couple of things that need to be done on LLVM's Pointer and Alias
>>> Analysis[1]. However, some things seem to be solved
>>> (http://llvm.org/PR1604 and http://llvm.org/PR452) and it made me a
>>> little confused because I don't know exactly what should be addressed
>>> by a possible proposal.
>>>
>>> My second idea is to implement a pointer analysis algorithm in LLVM
>>> using Value Flow Graph, as described by the paper "Boosting the
>>> Performance of Flow-sensitive Points-to Analysis using Value Flow"[2].
>>>
>>> Thus, I'd like to know whether or not you think improving LLVM's alias
>>> analysis infrastructure using one of those two suggestions would be a
>>> reasonable thing to do as a GSoC.
>>
>>
>> If you want to work on points-to analysis, then you need to have some
>> feature which will benefit from using that points-to analysis.  Example
>> applications of points-to analysis would be optimization and static
>> backwards slicing.
>>
>
> I think implementing a program slicer would be doable. We have a
> prototype implemented; extending it and porting it to the newer
> version of LLVM shouldn't be hard.
>
> As for the optimizations, is there any optimization (maybe a client of
> pointer information, maybe not) that you would like to see implemented
> in LLVM that is not implemented yet?
>
> I'd like to point out that, although it'd be easier for me working
> with pointer analysis, if there is anything that you think has more
> priority or is more appealing for the LLVM community, I'd be happy to
> give it a try.
>
>> If you write a proposal to work on points-to analysis, then I recommend that
>> your proposal also include work on code that will use that points-to
>> analysis.  For example, you could implement a points-to analysis and then
>> add an AliasAnalysis interface to it so that LLVM optimizations can use it
>> without modification.
>
> Yes, that would be part of the proposal.
>
>>  Alternatively, you could implement some new
>> optimization that uses the points-to results from your analysis directly.
>>
>> If you can provide evidence in your proposal that LLVM is missing
>> optimization opportunities because of a lack of good points-to analysis,
>> that would make your proposal even stronger.
>>
>
> Well, off the top of my head, I can't think of any missing
> optimizations related to points-to analysis. However, I'm developing a
> study right now to investigate which clients would benefit from a
> stronger pointer analysis. There is a famous paper about this "Which
> Pointer Analysis Should I Use?" where they conclude that most of the
> clients do not need a really strong pointer analysis. However, it has
> been 12 years since they developed that study, they only analyzed very
> small benchmarks, and they experimented just a few clients. So,
> hopefully, we'll be able to find potencial clients that benefit from a
> strong pointer analysis.
>
> Douglas
>
>> -- John T.
>>
>>>
>>> [1] http://llvm.org/OpenProjects.html#pointeranalysis
>>> [2] http://labs.oracle.com/docs/2011/150-299/2011-0189/esec053-li.pdf
>>>
>>> Best,
>>>
>>> Douglas
>>> _______________________________________________
>>> LLVM Developers mailing list
>>> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>>
>>