[LLVMdev] GSoC - Range Analysis

John Criswell criswell at illinois.edu
Sat Mar 31 09:17:06 PDT 2012


On 3/30/12 2:32 PM, Victor Campos wrote:
> > What version of LLVM does your analysis use currently?
>
> We are working with LLVM 3.0 (stable release)
>
> > It sounds like your analysis is fast.  Can you show results on how
> > fast it is on various programs?  Do you have measurements on how much
> > memory it uses?  How large is the largest program you've compiled with it?
>
> Yes, we have a very extensive report about it. Take a look at the PDF at
> http://code.google.com/p/range-analysis/source/browse/trunk/doc/Manuscript/paper.pdf

I took a brief look at your report.  I'm not sure what others in the 
LLVM community think (and I'd like their input), but I suspect that 
10-15 seconds for analyzing GCC means that users would only want your 
analysis enabled at optimization levels higher than -O2.

When I asked for memory usage, I meant the amount of RAM used by the 
analysis.  As far as I can tell, your paper only reports the number of 
constraints, so it doesn't really tell me how much RAM the analysis 
uses.  That said, your test machine had about 3 GB of RAM, so I assume 
all your test runs used less than that.  Do you have a more precise (and 
hopefully smaller) upper bound?

One other thing I've been wondering about is whether your analysis is 
able to find value ranges for the results of load instructions.  Can it 
do that?  If so, I assume it relies on a points-to analysis.

Finally, do you think it would be possible in the future to use 
parallelism to improve the performance of your analysis?

>
> In particular, take a look at Fig. 10, which shows program size vs.
> runtime. We are truly linear in the program size. The largest program
> that we have compiled is SPEC CPU 2006 gcc; it takes less than 10
> seconds to analyze, and its constraint graph has almost 1.5 million
> nodes.

I recommend that your proposal mention that you'll try your analysis on 
larger and more diverse codes (especially non-benchmark codes).  
Programs like Apache, Firefox, MySQL, LLVM, and the FreeBSD kernel come 
to mind.  I'm not sure that the GCC SPEC benchmark is considered a 
monster-sized program these days.

>
> > We in the SAFECode project would be interested in trying your analysis
> > out for eliminating bounds checks.  We just haven't had the time to do
> > that yet (I'm assuming you're the same person that wanted to do Value
> > Range analysis last year).
>
> No, that was Douglas, a friend of mine. He is in Texas nowadays.

Fascinating.  A former student of ours is a postdoc at UT-Austin.  Small 
world.


> Now I
> am responsible for the range analysis project at UFMG. There are two
> other guys working with me on it too, and there are a few users
> outside our school who use the pass. If you guys are willing to give
> our pass a try, we will be more than happy to help you. Note that we
> have put detailed instructions on how to use our range analysis at
> http://code.google.com/p/range-analysis/wiki/HowToUseRangeAnalysisInAnotherPass

I'd like to look into that, but my schedule so far hasn't permitted me 
to do so.  We have a new person, though, who might be able to try it in 
the coming months.  It seems Raphael Ernani Rodrigues may be 
interested in working on it, too.
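
For anyone who wants to experiment with the pass in the meantime, the 
sketch below shows the general shape of an LLVM 3.0 client pass that 
requests an analysis and queries a range per value.  The range-analysis 
class name (IntraProceduralRA), its getRange method, the Range accessors, 
and the header name are assumptions here; the authoritative interface is 
on the wiki page linked above.

// Sketch of a client pass (LLVM 3.0 style).  The LLVM boilerplate is
// standard; IntraProceduralRA, getRange(), getLower() and getUpper()
// are assumed names standing in for the project's real interface.
#include "llvm/Pass.h"
#include "llvm/Function.h"
#include "llvm/Support/raw_ostream.h"
#include "RangeAnalysis.h"   // assumed header name

using namespace llvm;

namespace {
  struct RangeDump : public FunctionPass {
    static char ID;
    RangeDump() : FunctionPass(ID) {}

    virtual void getAnalysisUsage(AnalysisUsage &AU) const {
      AU.setPreservesAll();
      AU.addRequired<IntraProceduralRA>();   // assumed pass name
    }

    virtual bool runOnFunction(Function &F) {
      IntraProceduralRA &RA = getAnalysis<IntraProceduralRA>();
      for (Function::iterator BB = F.begin(), BE = F.end(); BB != BE; ++BB)
        for (BasicBlock::iterator I = BB->begin(), IE = BB->end();
             I != IE; ++I) {
          Range R = RA.getRange(&*I);          // assumed query method
          errs() << *I << " -> [" << R.getLower() << ", "
                 << R.getUpper() << "]\n";     // assumed accessors
        }
      return false;   // analysis only; the IR is not modified
    }
  };
}

char RangeDump::ID = 0;
static RegisterPass<RangeDump> X("range-dump", "Print computed value ranges");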

>
> > The idea of integrating your pass as a lazy value pass sounds good.  The
> > question I have is: which optimizations benefit from LazyValueInfo?  I
> > only see one or two transforms that use it in LLVM 3.0.
>
> Yes, I do not think there are many optimizations that use LazyValueInfo
> so far. But maybe, backed by a more aggressive analysis, LazyValueInfo
> will be more effective. I can implement dead-code elimination based on
> value ranges, and I think it can be quite effective, especially in code
> produced from type-safe languages such as Java, which is full of
> safety checks.
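
To make that concrete, here is a hypothetical example of the kind of 
check such a range-based dead-code elimination could remove; nothing 
below is taken from the project, it only illustrates the idea.

// Hypothetical example: with i proven to lie in [0, 10), both halves of
// the guard are always false, so the branch and its body are dead code.
int sum10(const int *a) {
  int s = 0;
  for (int i = 0; i < 10; ++i) {
    if (i < 0 || i >= 10)   // always false once ranges are known
      return -1;            // dead code a range-based DCE could remove
    s += a[i];
  }
  return s;
}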

Regarding your comment about Java, I think you're correct that your 
range analysis would be better at optimizing Java code.  However, LLVM 
really isn't used for Java, so I'm not sure if that's a strong argument 
for your proposal.  To get the best bang for the buck, I think your 
analysis will need to work well in optimizing C/C++/Objective-C[++] code.

One thing that concerns me is that your project currently sounds like a 
small research project: you want to hook your analysis into dead code 
elimination and see how well it works.  I'm not sure if the GSoC people 
are really looking for a project like that; they may want something with 
a higher assurance of a payoff.

One thing you could do to mitigate that problem is to do an experiment 
that *shows* that your analysis has the potential to find more 
optimization opportunities than what LLVM does today.  For example, for 
dead code elimination, you could run your analysis on optimized LLVM 
bitcode and find the percentage of icmp/fcmp instructions that can be 
changed into constants.  If you find a large percentage, then it is more 
likely that your analysis will help dead code elimination or other 
optimizations.
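
A rough sketch of that measurement follows.  It walks a function's 
signed less-than compares and counts the ones whose operand ranges 
already decide the result; the ratio of that count to the total, 
measured over whole programs, is the number that would make the case.  
As before, IntraProceduralRA, getRange, getLower and getUpper are 
assumed names, and the ranges are assumed to be inclusive intervals.

#include "llvm/Function.h"
#include "llvm/Instructions.h"
#include "RangeAnalysis.h"   // assumed header name

using namespace llvm;

// Count ICMP_SLT compares whose result the computed ranges already decide.
// Only the signed less-than predicate is handled to keep the sketch short;
// a real harness would cover every predicate and report Foldable/Total
// across the whole test suite.
static unsigned countFoldableICmps(Function &F, IntraProceduralRA &RA,
                                   unsigned &Total) {
  unsigned Foldable = 0;
  for (Function::iterator BB = F.begin(), BE = F.end(); BB != BE; ++BB)
    for (BasicBlock::iterator I = BB->begin(), IE = BB->end(); I != IE; ++I)
      if (ICmpInst *Cmp = dyn_cast<ICmpInst>(&*I)) {
        if (Cmp->getPredicate() != CmpInst::ICMP_SLT)
          continue;
        ++Total;                                     // SLT compares examined
        Range L = RA.getRange(Cmp->getOperand(0));   // assumed query method
        Range R = RA.getRange(Cmp->getOperand(1));
        // With inclusive bounds: a < b is always true when max(a) < min(b),
        // and always false when min(a) >= max(b).
        if (L.getUpper().slt(R.getLower()) || L.getLower().sge(R.getUpper()))
          ++Foldable;
      }
  return Foldable;
}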

I'd also recommend that you think of other optimizations that could 
benefit from range analysis and at least list them in your proposal as 
optimizations you'll look at if time permits.

-- John T.



