[LLVMdev] Range Analysis GSoC 2011 Proposal

Thu Mar 24 16:06:37 PDT 2011

Dear LLVM community,

A revised version of my GSoC proposal is listed below. Please, let me know
your comments.

Adding Range Analysis to LLVM

Abstract

The objective of this work is patch our implementation of range analysis
into LLVM. I have a working implementation of range analysis in LLVM, but it
is not currently part of the main distribution. I propose to integrate our
range analysis implementation into the Lazy Value Info (LVI) interface that
LLVM provides. Range analysis finds the intervals of values that may be
bound to the integer variables during the execution of programs and is
useful in several scenarios: constant propagation, detection of potential
buffer overflow attacks, dead branch elimination, array bound check
elimination, elimination of overflow tests in scripting languages such as
JavaScript and Lua, etc.

Objective

The objective of this project is to augment LLVM with range analysis. We
will do this integration by patching the current implementation of range
analysis that we have onto the Lazy Value Info (LVI) interface that LLVM
already provides. In addition, we will develop new optimizations using LVI.
In particular, we will provide a pass that performs conditional constant
propagation [5], and elimination of dead-branches.

Criteria of Success

   - To improve substantially the precision of the current implementation of
   LVI. Currently, LVI’s interface only allows a client to know if a variable
   contains a constant. We want to allow LVI to report that a variable either
   contains a constant, or a known-range.
   - To improve the current implementation of constant propagation that LLVM
   uses. We hope to obtain a small performance gain on the C benchmarks in the
   LLVM test suite, and a larger gain on Java programs that are compiled using
   VMKit.
   - To improve the implementation of JumpThreading, in such a way that more
   dead-branches will be eliminated. Again, we hope to achieve a small speed-up
   on the C benchmarks, and a larger speed-up on the Java benchmarks.

Background

Range Analysis is a technique that maps integer variables to the possible
ranges of values that they may assume throughout the execution of a program.
Thus, for each integer variable, a range analysis determines its lower and
upper limits. A very simple range analysis would, for instance, map each
variable to the limits imposed by its type. That is, an 8-bit signed integer
variable can be correctly mapped to the interval [-128, 127], and an 16-bit
signed integer variable can be mapped to [-32768, 32767]. However, the
precision of this analysis can greatly be improved from information inferred
from the program text.

Ideally this range should be as constrained as possible, so that an
optimizing compiler could learn more information about each variable.
However, the range analysis must be conservative, that is, it will only
constrain the range of a variable if it can prove that it is safe to do so.
As an example, consider the program:

i = read();
if (i < 10) {
  print (i + 1);
else {
  print(i - 1);
}

In this program we know, from the conditional test, that the value of*
‘i’*in the true side of the branch is in the range [-INF, 9], and in
the false
side is in the range [10, +INF].

During the Summer of Code 2010 I have designed and implemented, under the
orientation of Duncan Sands, a non-iterative range analysis algorithm. Our
implementation is currently fully functional, been able to analyze the whole
LLVM test suite. For more details, see [4]. However, this implementation has
never been integrated into the LLVM main trunc, for two reasons:

   1. We use an intermediate representation called extended static
   assignment form [6], which the LLVM contributors were reluctant to use;
   2. During the SoC 2010 we did not have time to completely finish our
   implementation, and runtime numbers were available only by the end of 2010.

In order to address the first item, we propose to integrate the intermediate
representation directly into our analysis, yet, as a module that can be used
in separate by other clients, if necessary. We are using the e-SSA form to
gain precision but it is not a requirement. The use of the e-SSA form
increases a lot of the precision of the analysis, but we still can work
without it. A previous Summer of Code, authored by Andre Tavares, has shown
that the e-SSA form increases the number of phi-functions in the program
code by less than 10%, and it is very fast to build [6].

The second item of our list of hindrances is no longer a problem. Our
implementation is ready for use. We have been able to analyze the whole LLVM
test suite, plus SPEC CPU 2006 - over 4 million LLVM bytecoes - in 44
seconds on a 2.4GHz machine. We obtain non-trivial bit size reductions for
the small benchmarks, having results that match those found by previous,
non-conservative works, such as Stephenson’s et al’s [2]. Moreover, our
implementation is based on a very modern algorithm, by Su and Wagner [1],
augmented with Gawlitza’s technique to handle cycles [3]. We believe that
this is the fastest implementation of such an analysis.

In small benchmarks (such as the bitwise benchmark used by Stephenson et al.
[2]), we have been able to reduce the bitwidth of the variables by 35%, in
average. In the other hand, in large benchmarks (such as SPEC CPU 2006), the
bitwidth reduction is 8%, in average. This happens because the analysis is
intra-procedural. I guess the other optimizations that LLVM uses suffer from
this limitation too. This would happen with any type of range analysis
algorithm. In order to solve this, one of my lab mates is working on an
inter-procedural version of the analysis but his project is on a very
initial stage though.

Timeline and Testing Methodology

   1. Change the LVI interface, adding a new method to it: getRange(Value
   *V), so that we can not only know if a variable is a constant, but also
   know its range of values whenever it is not a constant.
   2. Change the implementation of the prototype range analysis that LVI
   already uses, so that it will use our implementation.
   3. Implement the sparse conditional constant propagation to use LVI. I
   plan to implement Zadeck’s optimization to deal with ranges instead of
   simple constants. So, instead of being able to eliminate:  if (x != 10)
   {} when we know that x is, say, 11, we could eliminate also branches that
   uses other relational operators, eg, <, <=, >, >=. I believe that it
   would be good for array bounds check elimination in memory safe languages
   such as Java. This, in addition to JumpThreading will be another client that
   will use LVI.
   4. Run experiments on the LLVM test suite to verify that our new
   optimization improves on the current dead branch elimination pass that LLVM
   already uses.
   5. Run experiments using VMKit, to check the impact of dead branch
   elimination on Java programs. We hope to deliver better performance numbers
   in this case because Java, as a memory safe language, is notorious for using
   many checks to ensure that memory is used only in ways that obey the
   contract of the variable types.

Biography

I am currently a third year Computer Science undergraduate student at
the Federal
University of Minas Gerais <http://www.dcc.ufmg.br/dcc/>, Brazil, and I work
as a research assistant at the Programming Languages
Lab<http://www2.dcc.ufmg.br/laboratorios/llp/>(LLP), in that
university.

I believe I am a good candidate to work in the proposed project because I
have already a good knowledge of LLVM. I have implemented range analysis,
and currently it can find the ranges of integer variables used in the
programs. In addition to this, I have been working as a programmer for three
years before joining the LLP as a research assistant and I am experienced
with C++, the language in which LLVM is implemented.

Furthermore, I am a student at a very good university (UFMG). To justify
this statement I would like to point out that the Department of Computer
Science of UFMG got the best
mark<http://www.ufmg.br/online/arquivos/012977.shtml>in the Brazilian
National Undergrad Exam (ENADE).

References

1. A Class of Polynomially Solvable Range Constraints for Interval Analysis
without Widenings and Narrowings, Zhendong Su and David Wagner, In
Theoretical Computer Science, Volume 345 , Issue 1, 2005.

2. Bitwidth Analysis with Application to Silicon Compilation, Mark
Stephenson, Jonathan Babb, and Saman Amarasinghe, In Proceedings of the
SIGPLAN conference on Programming Language Design and Implementation, 2000.

3. Polynomial Precise Interval Analysis Revisited. Thomas Gawlitza, Jérôme
Leroux, Jan Reineke, Helmut Seidl, Grégoire Sutre, Reinhard Wilhelm:
Efficient Algorithms 2009: 422-437

4. Linear Time Range Analysis with Affine Constraints. Douglas do Couto
Teixeira and Fernando Magno Quintao Pereira (
http://homepages.dcc.ufmg.br/~douglas/projects/RangeAnalysis/RangeAnalysis.paper.pdf
)

5. Constant propagation with conditional branches. Mark N. Wegman and F.
Kenneth Zadeck, In ACM Transactions on Programming Languages and Systems
(TOPLAS), 181-210, 1991.

6. Efficient SSI Conversion. André Luiz C. Tavares, Fernando Magno Quintão
Pereira, Mariza A. S. Bigonha and Roberto S. Bigonha. Simpósio Brasileiro de
Linguagens de Programação. 2010.

7. ABCD: eliminating array bounds checks on demand, Rajkslav Bodik, Rajiv
Gupta and Vivek Sarkar, In Proceedings of the SIGPLAN conference on
Programming Language Design and Implementation, 2000.

Contact Info

Name: Douglas do Couto Teixeira
e-mail 1: douglas at dcc dot ufmg dot br
e-mail 2: douglasdocouto at gmail dot com

--
Douglas do Couto Teixeira

On Thu, Mar 24, 2011 at 11:05 AM, Allan Tong <actong88 at gmail.com> wrote:

> On Wed, Mar 23, 2011 at 6:24 PM, John Criswell <criswell at illinois.edu>
> wrote:
> >
> > > On 3/23/11 8:06 AM, Douglas do Couto Teixeira wrote:
> > >
> > > the execution of a program. Thus, for each integer variable, a range
> > > analysis determines its lower and upper limits. A very simple range
> analysis
> > > > would, for instance, map each variable to the limits imposed by its
> type.
> > > That is, an 8-bit unsigned integer variable can be correctly mapped to
> the
> > > interval [0, 255], and an 16-bit signed integer can be mapped to
> [-32767,
> > > 32766].
> >
> > You probably want to be consistent in how numbers are interpreted.  Use
> the
> > unsigned or signed interpretations in both examples above.
>
> Also, if you keep the 16-bit signed example, it should be [-32768, 32767].
>
>  - Allan
>
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20110324/fb644532/attachment.html>