[LLVMdev] Summer of Code

Fri Mar 19 07:12:29 PDT 2010

Dear LLVMers,

    my name is Douglas, and I would like to participate in the Summer of
Code this year. I am currently a Computer Science student at the
Federal University of Minas Gerais, Brazil, and I work as a research
assistant at the Programming Languages Lab, in that university. I work
together with Andre Tavares and Andrei Rimsa, two summer of coders
last year, and my advisor is Fernando Pereira, a Summer of Coder in
2006.
    I would like to work with range analysis. This is a super-set of
bitwidth analysis. The objective is to find an approximation for the
range of integer values that a variable may assume. Bitwidth analysis
was initially part of Andre's Summer of Code, but it was not finished.
I took over Andre's project, and I have already started working on
range analysis. Currently I can find the range of variables used in
simple loops, but there is still a lot to be done. Range analysis
would be useful in a number of scenarios, for instance:

1) To eliminate overflow tests in dynamically compiled languages, such as
java script, where numbers are represented as floating point values. A
optimizing compiler may try to convert these numbers to integers, under
the carpet, but it must keep the overflow tests, in case the values are
larger than 32 bits. This is how the traceMonkey compiler does in
JavaScript, for instance.

2) To reduce the size of variables, in order to keep more of them in
registers. X86, as you know, has some aliased registers, but very few
variables fit in there, as these registers require 8-bit values.
Programmers generally declare variables with bigger types. It would be
nice if an optimizing compiler could infer that the variables are small
enough to fit into the eight-bit registers, so that less variables are
sent to memory.

3) As a more general ABCD algorithm, that tries to remove array bound
checks. Given that we have an approximation of the range of the variables,
then it is possible to remove array bound checks of accesses that we prove
that are always inside the limits of the arrays.

4) In security analyses. For instance, to detect buffer overflows. If we
know that the range of a variable might be greater than the buffer that
the variable ranges over, then a overflow might happen.

Well, there are many other applications. In short, I would like to know if
the LLVM community would be interested in such a project. I have been
working on this for two months, and I am pretty confident I can implement
a good range analysis. I am basing my work on a paper by Zhendong and
Wagner: "A class of polynomially solvable range constraints for interval
analysis without widenings". I have opted for this work, instead of
Stephenson's more well known paper because I found Zhendong's easier to
understand and implement. As I told you, I can already infer correct
ranges on very simple loops, such as:

for (int i = 0; i < 10; i++) {
  printf("%d\n", i);
}

In this case, I can find out that i is in [0, 0] before the loop, and is
in [0, 9] inside, and in [10, +INF] outside. There is, of course, a lot of
work to be done: handling loops limited by variables, nested loops, nested
conditionals, etc, but I think that, with the help of a mentor, plus the
help of the past Summer of Coders in my lab, I can do a good job.

Thanks a lot,

Douglas