[LLVMdev] Phase Interactions

Sun Jun 19 06:08:07 PDT 2011

On 19 June 2011 14:44, Suresh Purini <suresh.purini at gmail.com> wrote:
> I am doing a few experiments to understand optimization phase
> interactions. Here is a brief description of my experiments.
>
> 1. I picked the list of machine-independent optimizations acting on
> LLVM IR (those that are enabled at O3).
> 2. for each optimization in the optimization list
>             a) Compile the program using 'clang -c -O0 -flto program.c'
>             b) opt -optimization program.o -o optprogram.o
>             c) llc optprogram.o
>             d) gcc optprogram.o.s
>             e) Measure the performance of the generated executable.
> 3. for each optimization pair [opt1, opt2] from the optimization list
>             a) Compile the program using 'clang -c -O0 -flto program.c'
>             b) opt -opt1 -opt2 program.o -o optprogram.o
>             c) llc optprogram.o
>             d) gcc optprogram.o.s
>             e) Measure the performance of the generated executable.
>
> My intention is to understand or model phase interactions by observing
> this data and the corresponding program's static/dynamic features.
> However, I couldn't glean much information from this data: in almost
> all cases there is no change in the runtime compared to O0, except
> for a few programs where gvn and loop-rotate improved performance to
> some extent. The 'scalarrepl' optimization is the exception: it
> improved performance almost consistently and, in fact, almost matches
> the O3-level performance of the program.
>
> Can someone explain what is happening? Is there anything
> wrong in my experimental setup?

In short: it doesn't really make sense to run most of the
optimizations before running -scalarrepl (or -mem2reg), and it makes
even less sense to leave it out entirely.

Almost all optimizations assume that -scalarrepl (or the less
aggressive -mem2reg) has been run first. If neither has been run, then
all local variables still live in stack slots (allocas), meaning they
have to be loaded before each use and stored whenever they change.
That makes it hard for other optimizations to see what's really
happening because they usually consider every load to be a separate
value.
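One way to see this on a real file is to count the load instructions in the unoptimized IR and again after promoting allocas. The sketch below assumes the 2011-era clang and opt command lines used in this thread are on PATH; the function names and output file names are hypothetical, and it uses -mem2reg only because it is the simpler of the two passes.

```shell
#!/bin/sh
# Sketch: compare load counts in -O0 IR before and after -mem2reg.
# Assumes 2011-era clang and opt on PATH; function and file names are hypothetical.

# count_loads FILE.ll: print the number of lines containing a load instruction.
count_loads() {
  grep -c '= load ' "$1"
}

# compare_loads SRC.c: emit textual -O0 IR, promote allocas, report both counts.
compare_loads() {
  base=${1%.c}
  clang -S -emit-llvm -O0 "$1" -o "$base.ll" || return 1
  opt -S -mem2reg "$base.ll" -o "$base.mem2reg.ll" || return 1
  echo "before -mem2reg: $(count_loads "$base.ll") loads"
  echo "after -mem2reg:  $(count_loads "$base.mem2reg.ll") loads"
}
```

Running `compare_loads program.c` should report noticeably fewer loads after promotion, since most loads of local variables become direct uses of SSA values. Newer LLVM releases changed some of these tools and passes, so the exact flags may need adjusting.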

A better setup might be to always run -scalarrepl (or -mem2reg) before
steps 2b/3b. Running it in a separate opt invocation lets you save
some cycles by computing it once and reusing the result for every run.
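A minimal sketch of that setup as a POSIX shell script, assuming the same 2011-era clang/opt/llc/gcc command lines as the original experiment. The pass list, the file names base.o, single-* and pair-*, and the DRY_RUN switch are hypothetical additions; with DRY_RUN set, the functions only print the commands they would run.

```shell
#!/bin/sh
# Sketch: run -scalarrepl once, then build every single-pass and pair variant from it.
# Assumes 2011-era clang/opt/llc/gcc on PATH; PASSES is a hypothetical subset.
PASSES="gvn loop-rotate licm"

# run CMD...: execute CMD, or only print it when DRY_RUN is non-empty.
run() {
  if [ -n "$DRY_RUN" ]; then echo "$*"; else "$@"; fi
}

# build_base: compile to bitcode once and promote allocas a single time.
build_base() {
  run clang -c -O0 -flto program.c -o program.o &&
  run opt -scalarrepl program.o -o base.o
}

# build_variant NAME PASS...: apply PASS... (in order) to base.o and link NAME.
build_variant() {
  name=$1; shift
  flags=""
  for p in "$@"; do flags="$flags -$p"; done
  # $flags is left unquoted on purpose so each -pass becomes its own argument.
  run opt $flags base.o -o "$name.o" &&
  run llc "$name.o" -o "$name.s" &&
  run gcc "$name.s" -o "$name"
}

# run_experiments: step 2 (single passes) and step 3 (ordered pairs).
run_experiments() {
  build_base || return 1
  for p1 in $PASSES; do
    build_variant "single-$p1" "$p1" || return 1
    for p2 in $PASSES; do
      [ "$p1" = "$p2" ] && continue   # skip running the same pass twice
      build_variant "pair-$p1-$p2" "$p1" "$p2" || return 1
    done
  done
  # Timing each executable is left to whatever harness is already in use.
}
```

Calling `run_experiments` builds base.o once and derives every variant from it. Pairs are ordered (gvn then licm is a separate variant from licm then gvn) because pass order is what the experiment is studying. On recent LLVM releases -scalarrepl no longer exists (SROA took its place), so the base step would need updating there.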