[LLVMdev] [Fwd: Optimization: Conclusions from Evolutionary Analysis]

Tue Nov 18 17:12:01 PST 2003

This is a hot topic in the compiler research community, but the focus 
there is on
(a) choosing the right optimization sequences internally and 
transparently, rather than through combinations of options,
(b) performance prediction techniques so you don't actually have to run 
gazillion different choices, and perhaps can even avoid the problem of 
choosing representative inputs, as you talked about, and
(c) developing flexible internal representations so that you can 
dynamically generate appropriate code sequences.

We are actually involved in a project with IBM that aims to look at all 
of these issues.  Needless to say, I'm hoping that they will use LLVM 
as one of the components of the infrastructure for this work :)  They 
do seem interested.

--Vikram
http://www.cs.uiuc.edu/~vadve
http://llvm.cs.uiuc.edu/

On Nov 18, 2003, at 2:18 PM, Reid Spencer wrote:

> I'm cross-posting the message below (from GCC list) because I believe 
> it
> would (at some point) be very beneficial to build an evolutionary
> optimization pass into LLVM. The idea would be to discover the perfect
> set of optimizations for a given program by trying them all and
> analyzing the execution times of each. This would be somewhat like
> profile driven optimization except the profile is easier to gather: its
> the whole program timing. I imagine a "mega pass" could be written that
> tries all other passes in all their combinations to (eventually) arrive
> at the right set for a given program.  There's one big issue with this:
> workload. Unlike profile-driven optimization, which can find parts of
> the program to optimize based on data collected from a run or several
> runs of the program, this evolutionary pass would need to run the exact
> same workload multiple times to discover which optimizations to apply.
> Obtaining a useful/typical workload for a given program will be
> difficult. Its easy for compilers and the like, but not all programs
> work that way (consider optimizing a web server this way). Probably the
> right approach is to punt on the workload issue leaving it to the
> programmer to invoke/drive the program in such a way that it finds a
> consistent workload to execute.
>
> I know its early in LLVM's life for something like this, but I thought
> I'd mention it so it can be tucked away in the "think about this later"
> file.
>
> Reid.
>
> -----Forwarded Message-----
>> From: Scott Robert Ladd <coyote at coyotegulch.com>
>> To: gcc mailing list <gcc at gcc.gnu.org>
>> Subject: Optimization: Conclusions from Evolutionary Analysis
>> Date: Tue, 18 Nov 2003 10:11:55 -0500
>>
>> For those who haven't the time to read my full article, I thought I'd
>> pass along the conclusions section. One of my goals with Acovea was to
>> provide some empirical evidence of how optimizations behave, and to
>> dispell some common misconceptions about optimization.
>>
>> The settings for -O1, -O2, and -O3 are valid. Every option implied by
>> -O3 may not be applicable to every program -- but enabling all those
>> options does not seem to be detrimental, except in the case of
>> huffbench. Several of the recently-added flags (-ftracer in 
>> particular)
>> may be candidates for inclusion in -O2 or -O3 for future versions of 
>> the
>> compiler.
>>
>> Processor-specific options do not appear to be a major factor in
>> performance on these benchmarks. I don't know if this is due to the
>> nature of the processor, or if GCC can't take advantage of
>> processor-specific instructions. I have double-checked my results;
>> adding -mfpmath=sse (or any of its variants, or -msse) to a compile of
>> almabench does not make the code run any faster. The only 
>> ia32-specific
>> option that showed consistent value was -momit-leaf-frame-pointer.
>>
>> The genetic algorithm was able to find sets of flags that produced
>> faster code than that emitted by the default -O1, -O2, and -O3 options
>> (with the exception of almabench, which is largely unoptimizable by
>> nature). In many cases, this was due to the inclusion of "new" flags
>> introduced with more recent version of GCC. Acovea discovers 
>> additional
>> options that improve performance over that provided by the default
>> settings; in essence, the genetic algorithm determines which of the
>> non-standard and processor- specific options are most effective for a
>> given algorithm.
>>
>> A well-designed program will encapsulate algorithms. For most 
>> programs,
>> performance is predicated on specific algorithms that can be 
>> identified
>> via profiling; Acovea can then be applied to those algorithms for
>> finding the set of optimizations that produce the fastest code. Only
>> rarely does an entire program need to be fully optimized; instead,
>> optimization should be applied to specific, critical routines that
>> perform concise tasks.
>>
>> The full article can be found at:
>>
>> http://www.coyotegulch.com/acovea/index.html
>
>
> _______________________
> Reid Spencer
> President & CTO
> eXtensible Systems, Inc.
> rspencer at x10sys.com