[LLVMdev] Range Analysis GSoC 2011 Proposal
Douglas do Couto Teixeira
douglasdocouto at gmail.com
Thu Mar 24 04:39:38 PDT 2011
On Wed, Mar 23, 2011 at 7:24 PM, John Criswell <criswell at illinois.edu>wrote:
> Dear Douglas,
>
> Comments below.
>
>
>
Dear John,
I'm planning to send a revised version of the proposal to the list today,
but before that let me try answer some of your questions.
> On 3/23/11 8:06 AM, Douglas do Couto Teixeira wrote:
>
> Dear LLVM community,
>
> I would like to contribute to LLVM in the Google Summer of Code project. My
> proposal is listed below. Please let me know your comments.
>
>
> Adding Range Analysis to LLVM
>
> Abstract
>
> The objective of this work is patch our implementation of range analysis
> into LLVM. I have a running implementation of range
>
>
> Say "working implementing" instead of "running implementation."
>
>
> analysis in LLVM, but it is not currently part of the main distribution. I
> propose to integrate our range analysis implementation into the Lazy Value
> Info (LVI) interface that LLVM provides. Range analysis finds the intervals
> of values that may be bound to the integer variables during the execution of
> programs and is useful in several scenarios: constant propagation, detection
> of potential buffer overflow attacks, dead branch elimination, array bound
> check elimination, elimination of overflow tests in scripting languages such
> as JavaScript and Lua, etc.
>
>
> Objective
>
> The objective of this project is to augment LLVM with range analysis. We
> will do this integration by patching the current implementation of range
> analysis that we have onto the Lazy Value Info (LVI) interface that LLVM
> already provides. In addition, we will develop new optimizations using LVI.
> In particular, we will provide a pass that performs conditional constant
> propagation [5], and elimination of dead-branches.
>
> Criteria of Success
>
> - To improve substantially the precision of the current implementation
> of LVI. Currently, LVI’s interface only allows a client to know if a
> variable contains a constant. We want to allow LVI to report that a variable
> either contains a constant, or a known-range.
> - To improve the current implementation of constant propagation that
> LLVM uses. We hope to obtain a small performance gain on the C benchmarks in
> the LLVM test suite, and a larger gain on Java programs that are compiled
> using VMKit.
> - To improve the implementation of JumpThreading, in such a way that
> more dead-branches will be eliminated. Again, we hope to achieve a small
> speed-up on the C benchmarks, and a larger speed-up on the Java benchmarks.
>
>
>
> Background
>
> Range Analysis is a technique that maps integer variables to the possible
> ranges of values that they may assume through out
>
>
> throughout is a single word in this instance.
>
>
> the execution of a program. Thus, for each integer variable, a range
> analysis determines its lower and upper limits. A very simple range analysis
> would, for instance, map each variable to the limits imposed by its type.
> That is, an 8-bit unsigned integer variable can be correctly mapped to the
> interval [0, 255], and an 16-bit signed integer can be mapped to [-32767,
> 32766].
>
>
> You probably want to be consistent in how numbers are interpreted. Use the
> unsigned or signed interpretations in both examples above.
>
>
> However, the precision of this analysis can greatly be improved from
> information inferred from the program text.
>
> Ideally this range should be as constrained as possible, so that an
> optimizing compiler could learn more information about each variable.
> However, the range analysis must be conservative, that is, it will only
> constraint the range of a variable if it can
>
>
> "constrain the range"
>
>
> prove that it is safe to do so. As an example, consider the program:
>
> i = read();
> if (i < 10) {
> print (i + 1);
> else {
> print(i - 1);
> }
>
> In this program we know, from the conditional test, that the value of ‘i’
> in the true side of the branch is in the range [-INF, 9], and in the false
> side is in the range [10, +INF].
>
> During the Summer of Code 2010 I have designed and implemented, under the
> orientation of Duncan Sands, a non-iterative
>
>
> Remove the word "have" in the sentence above.
>
>
> range analysis algorithm. Our implementation is currently fully functional,
> been able to analyze the whole LLVM test suite. For more details, see [4].
> However, this implementation has never been integrated into the LLVM main
> trunc, for two reasons:
>
> 1. We use an intermediate representation called extended static
> assignment form [6], which the LLVM contributors were reluctant to use;
> 2. During the SoC 2010 we did not have time to completely finish our
> implementation, and runtime numbers were available only by the end of 2010.
> 3. There was not really an infra-structure already in place in LLVM to
> take benefit of our analysis.
>
>
> I think your text is a little unclear about point #3. From your text
> below, it sounds like LLVM lacks optimizations that utilize value range
> analysis. If that is the case, then you should state below (like you do in
> the beginning of your proposal) what optimizations you'll write.
>
>
I meant: There was not really an infra-structure already in place in LLVM to
take benefit of our analysis. There were not clients for this pass, and not
a common interface that those clients could use. Now, LLVM provides the LVI
interface, and there are some clients that use it: JumpThreading, etc.
> As an aside, we'd be interested in using trying out value-range analysis in
> SAFECode (http://safecode.cs.illinois.edu) for use in static array bounds
> checking. Creating a static array bounds checking pass for SAFECode using
> your analysis would probably be trivial. The only difficulty is that
> SAFECode is currently built using LLVM 2.7, and I'm guessing that you're
> using a newer version of LLVM.
>
>
>
No, I'm using LLVM 2.7 too. That's because the pass that produces e-SSA form
was not ported to the newer LLVM versions.
>
>
>
>
>
> In order to address the first item, we propose to integrate the
> intermediate representation directly into our analysis, yet, as a module
> that can be used in separate by other clients, if necessary.
>
>
> This part isn't completely clear to me. It sounds like what you're
> suggesting is to write one analysis pass that internally constructs e-SSA
> form and a second analysis that actually does the value-range analysis. It
> also sounds like you're writing the value-range analysis so that it can be
> used with and without e-SSA form. Is this correct, or have I misinterpreted
> what you're saying?
>
> Either way, I think the text above and the paragraph below about e-SSA form
> could be made a little more clear.
>
>
>
Yes, we use e-SSA form to gain precision, but it is not a requirement.
Compare, for instance, Figures 1, 4 and 5 in our report (
http://homepages.dcc.ufmg.br/~douglas/projects/RangeAnalysis/RangeAnalysis.paper.pdf).
E-SSA increases a lot the precision of the analysis, but we still can work
without it
> We are splitting the live ranges of variables using single-arity
> phi-functions, which are automatically handled by the SSA elimination pass
> that LLVM already includes. This live range splitting is only necessary for
> greater precision. We can do our live range analysis without it, although
> the results are less precise. A previous Summer of Code, authored by Andre
> Tavares, has shown that the e-SSA form increases the number of phi-functions
> in the program code by less than 10%, and it is very fast to build [6].
>
> The second item of our list of hindrances is no longer a problem. Our
> implementation is ready for use. We have been able to analyze the whole LLVM
> test suite, plus SPEC CPU 2006 - over 4 million LLVM bytecoes - in 44
> seconds on a 2.4GHz machine. We obtain non-trivial bit size reductions for
> the small benchmarks, having results that match those found by previous,
> non-conservative works, such as Stephenson’s et al’s [2].
>
>
> What kind of improvements do you see in large benchmarks? If small
> benchmarks yield good results and large benchmarks yield poor results, then
> your analysis probably needs more work to be useful for real-world programs.
>
>
>
>
This happens because the analysis is intra-procedural. I guess the other
optimizations that LLVM uses suffer from this limitation too. For SPEC CPU
2006 we have been able to reduce the bitwidth of the variables by 8%. This
would happen with any type of range analysis algorithm. Every time we have a
function, like:
foo(int n) {...}
We must assume that [-inf, +inf] \subseteq n. One of my lab mates is working
on an inter-procedural version of the analysis. His project is on a very
initial stage though.
> Moreover, our implementation is based on a very modern algorithm, by Su
> and Wagner [1], augmented with Gawlitza’s technique to handle cycles [3]. We
> believe that this is the fastest implementation of such an analysis.
>
> Finally, the third item is also no longer a problem. Presently LLVM offers
> the Lazy Value Info interface that reports when variables are constants. The
> current LVI implementation also provides infra-structure to deal with ranges
> of integer intervals. However, it does not contain a fully functional
> implementation yet, an omission that we hope to fix with this project.
>
>
> Again, you need to be clear about whether any optimizations use this for
> anything beyond seeing if a value is constant, and if not, describe what
> optimizations you plan to write to fix that.
>
>
>
I think the big client would be dead-code elimination:
if (x > 10) {...}
If we know that x is [0, 10], for instance, then this branch is dead. This
kind of optimization is stronger than Zadeck's conditional constant
propagation, and I believe that it would be good for array-bounds check
elimination in memory safe languages such as Java.
>
> Timeline and Testing Methodology
>
> 1. Change the LVI interface, adding a new method to it: getRange(Value
> *V), so that we can, not only know if a variable is
>
>
> No comma after can.
>
>
>
> 1. a constant, but also know its range of values whenever it is not a
> constant.
> 2. Change the implementation of the prototype range analysis that LVI
> already uses, so that it will use our implementation.
> 3. Implement the sparse conditional constant propagation to use LVI.
> This, in addition to JumpThreading will be another client that will use LVI.
>
>
> Is there an advantage to using your analysis for conditional constant
> propagation? I see the value in range analysis, but I don't see what your
> algorithm adds above what is already there. Please specify.
>
>
>
Yes! I think I was not clear here. The main contribution, again, would be
able to implement Zadeck's optimization with ranges, instead of simple
constants. So, instead of being able to eliminate:
if (x != 10) {...}
when we know that x is, say, 11, we could eliminate also branches that use
other relational operators, e.g, <, <=, >, >=, !=.
>
> 1. Run experiments on the LLVM test suite to verify that our new
> optimization improves on the current dead branch elimination pass that LLVM
> already uses.
> 2. Run experiments using VMKit, to check the impact of dead branch
> elimination on Java programs. We hope to deliver better performance numbers
> in this case because Java, as a memory safe language, is notorious for using
> many checks to ensure that memory is used only in ways that obey the
> contract of the variable types.
>
>
> You may want to check that VMKit is working with the version of LLVM that
> you're using.
>
>
>
> Biograph
>
>
> Biography
>
>
>
> I am currently a third year Computer Science student at the Federal
> University of Minas Gerais <http://www.dcc.ufmg.br/dcc/>, Brazil, and I
> work as a research assistant at the Programming Languages Lab<http://www2.dcc.ufmg.br/laboratorios/llp/>(LLP), in that university.
>
>
> You might want to state more explicitly whether you're an undergraduate or
> graduate student. I'm guessing your an undergraduate but am not sure.
>
>
>
I'm an undergraduate student.
>
> I believe I am a good candidate to work in the proposed project because I
> have already a good knowledge of LLVM. I have
>
>
> "on the proposed project" and "because I am already knowledgeable about
> LLVM"
>
>
>
> implemented range analysis, and currently it can find the ranges of integer
> variables used in the programs. In addition to this, I have been working as
> a programmer for three years, before joining the LLP as a research
> assistant, and I am experienced
>
>
> I think you want to say that you worked as a programmer for three years
> before joining LLP.
>
>
> with C++, the language in which LLVM is implemented. Furthermore, I am a
> student at a very good university (UFMG). To ground this statement I would
> like to point that the Department of Computer Science of UFMG got the best
> mark <http://www.ufmg.br/online/arquivos/012977.shtml> in the
>
>
> "To justify" instead of "to ground" and "point out" instead of just "point"
>
>
> Brazilian National Undergrad Exam (ENADE). Finally, I work on a lab in
> which three other students work with LLVM. We had
>
>
> work in a lab
>
>
> three Summer of Codes on LLVM in the past, and a number of papers have been
> published out of these experiences.
>
>
> I think you should just point out the SoC's that you've been involved in.
> The ones from your lab mates seems less relevant to me.
>
> All in all, I think your wrote a good proposal. You may want to send a
> revised version to the list; others may want to comment on the things that I
> think need to be clarified.
>
> BTW, are you looking for a mentor, or has Duncan volunteered for this year
> again?
>
>
Well, I made contact with Duncan some weeks ago but he doesn't officially
volunteered to be my mentor this year. So, yes, I'm looking for a mentor.
With best wishes,
Douglas
Good luck!
>
> -- John T.
>
>
>
> References
>
> 1. A Class of Polynomially Solvable Range Constraints for Interval
> Analysis without Widenings and Narrowings, Zhendong Su and David Wagner,
> In Theoretical Computer Science, Volume 345 , Issue 1, 2005.
>
> 2. Bitwidth Analysis with Application to Silicon Compilation, Mark
> Stephenson, Jonathan Babb, and Saman Amarasinghe, In Proceedings of the
> SIGPLAN conference on Programming Language Design and Implementation, 2000.
>
> 3. Polynomial Precise Interval Analysis Revisited. Thomas Gawlitza, Jérôme
> Leroux, Jan Reineke, Helmut Seidl, Grégoire Sutre, Reinhard Wilhelm:
> Efficient Algorithms 2009: 422-437
>
> 4. Linear Time Range Analysis with Affine Constraints. Douglas do Couto
> Teixeira and Fernando Magno Quintao Pereira (
> http://homepages.dcc.ufmg.br/~douglas/projects/RangeAnalysis/RangeAnalysis.paper.pdf
> )
>
> 5. Constant propagation with conditional branches. Mark N. Wegman and F.
> Kenneth Zadeck, In ACM Transactions on Programming Languages and Systems
> (TOPLAS), 181-210, 1991.
>
> 6. Efficient SSI Conversion. André Luiz C. Tavares, Fernando Magno Quintão
> Pereira, Mariza A. S. Bigonha and Roberto S. Bigonha. Simpósio Brasileiro de
> Linguagens de Programação. 2010.
>
> 7. ABCD: eliminating array bounds checks on demand, Rajkslav Bodik, Rajiv
> Gupta and Vivek Sarkar, In Proceedings of the SIGPLAN conference on
> Programming Language Design and Implementation, 2000.
>
> Contact Info
>
> Name: Douglas do Couto Teixeira
> e-mail 1: douglas at dcc dot ufmg dot br
> e-mail 2: douglasdocouto at gmail dot com
>
>
>
> --
> Douglas do Couto Teixeira
>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20110324/c71ba915/attachment.html>
More information about the llvm-dev
mailing list