[LLVMdev] Landing my new development on the trunk ...
Brian West
bnwest at rice.edu
Thu Nov 11 15:26:09 PST 2010
Evan Cheng <evan.cheng <at> apple.com> writes:
> Eli is right. We do need to see some benchmark numbers and understand
how the
pass will fit in the target
> independent optimizer. While we encourage contribution, we typically
don't
commit new passes unless it
> introduce new functionalities that have active clients. It would also
help if
you provide us with compile
> time numbers.
>
> Evan
Evan,
Let's restart this conversation ...
I propose that I test OSR versus LSR compile time performance via
opt shell.ll -mem2reg -licm -osr -instcombine -stats
-time-passes -disable-output
and
opt shell.ll -mem2reg -licm -loop-reduce -instcombine -stats
-time-passes -disable-output
Caveats which may skew any compile time testing:
1. LST is target dependent and was developed to run closer to code
generation time. Thus, we are testing LSR out of its normal context.
Also as mentioned previously, -instcombine is not normally called after
LSR (though LSR does create opportunities for -instcombine)
2. OSR finds loops in the SSA graph and will find more reduction
opportunities than LSR. Thus, OSR may spend more time doing strength
reduction, since OSR does more work.
3. From Dan Gohman's reply, "LSR's primary goal is to reduce integer
register pressure, with strength reduction being just one of its
available tools." Thus, LSR may be doing more than just strength reduction.
4. It is nontrivial to compare OSR and LSR optimizations in a given
program, especially the larger tests in the test suite where the compile
time differences may be the largest.
These caveats makes it a bit like comparing apples and oranges.
I created a simple test for the shell sort with a triple nested loop. I
hand verified that both OSR and LSR performed the same strength
reductions. Here are the results:
% opt shell.ll -mem2reg -licm -osr -instcombine -stats -time-passes
-disable-output
===-------------------------------------------------------------------------===
... Statistics Collected ...
===-------------------------------------------------------------------------===
5 instcombine - Number of dead inst eliminated
1 instcombine - Number of instructions sunk
12 instcombine - Number of insts combined
5 mem2reg - Number of PHI nodes inserted
8 mem2reg - Number of alloca's promoted
5 mem2reg - Number of alloca's promoted with a single store
9 osr - Number of operators reduced
===-------------------------------------------------------------------------===
... Pass execution timing report ...
===-------------------------------------------------------------------------===
Total Execution Time: 0.0058 seconds (0.0058 wall clock)
---User Time--- --System Time-- --User+System-- ---Wall
Time--- --- Name ---
0.0018 ( 34.5%) 0.0001 ( 26.5%) 0.0020 ( 33.8%) 0.0020 (
34.1%) Combine redundant instructions
0.0007 ( 13.2%) 0.0001 ( 25.5%) 0.0008 ( 14.2%) 0.0008 (
14.2%) Promote Memory to Register
0.0007 ( 13.8%) 0.0000 ( 6.6%) 0.0008 ( 13.3%) 0.0008 (
13.2%) Operator Strength Reduction
0.0005 ( 9.2%) 0.0001 ( 16.1%) 0.0006 ( 9.7%) 0.0006 (
9.6%) Dominator Tree Construction
0.0005 ( 8.7%) 0.0000 ( 5.3%) 0.0005 ( 8.4%) 0.0005 (
8.4%) Module Verifier
0.0003 ( 6.3%) 0.0000 ( 4.0%) 0.0004 ( 6.2%) 0.0004 (
6.1%) Loop Invariant Code Motion
0.0003 ( 5.4%) 0.0000 ( 3.6%) 0.0003 ( 5.3%) 0.0003 (
5.3%) Natural Loop Information
0.0002 ( 4.2%) 0.0000 ( 5.3%) 0.0003 ( 4.3%) 0.0003 (
4.3%) Dominance Frontier Construction
0.0002 ( 4.3%) 0.0000 ( 4.9%) 0.0003 ( 4.4%) 0.0002 (
4.3%) Canonicalize natural loops
0.0000 ( 0.2%) 0.0000 ( 1.7%) 0.0000 ( 0.3%) 0.0000 (
0.3%) No Alias Analysis (always returns 'may' alias)
0.0000 ( 0.2%) 0.0000 ( 0.4%) 0.0000 ( 0.2%) 0.0000 (
0.2%) Preliminary module verification
0.0053 (100.0%) 0.0005 (100.0%) 0.0058 (100.0%) 0.0058
(100.0%) Total
% opt shell.ll -mem2reg -licm -loop-reduce -instcombine -stats
-time-passes -disable-output
===-------------------------------------------------------------------------===
... Statistics Collected ...
===-------------------------------------------------------------------------===
7 instcombine - Number of dead inst eliminated
2 instcombine - Number of instructions sunk
19 instcombine - Number of insts combined
5 mem2reg - Number of PHI nodes inserted
8 mem2reg - Number of alloca's promoted
5 mem2reg - Number of alloca's promoted with a single store
2 scalar-evolution - Number of loops with predictable loop counts
2 scalar-evolution - Number of loops without predictable loop counts
===-------------------------------------------------------------------------===
... Pass execution timing report ...
===-------------------------------------------------------------------------===
Total Execution Time: 0.0114 seconds (0.0114 wall clock)
---User Time--- --System Time-- --User+System-- ---Wall
Time--- --- Name ---
0.0050 ( 46.0%) 0.0001 ( 26.6%) 0.0051 ( 45.1%) 0.0051 (
45.1%) Loop Strength Reduction
0.0020 ( 18.3%) 0.0001 ( 17.2%) 0.0021 ( 18.3%) 0.0021 (
18.3%) Combine redundant instructions
0.0013 ( 12.0%) 0.0001 ( 12.0%) 0.0014 ( 12.0%) 0.0014 (
12.0%) Induction Variable Users
0.0005 ( 4.5%) 0.0001 ( 14.8%) 0.0006 ( 5.0%) 0.0006 (
5.0%) Promote Memory to Register
0.0005 ( 4.8%) 0.0000 ( 5.6%) 0.0005 ( 4.8%) 0.0005 (
4.8%) Natural Loop Information
0.0003 ( 3.0%) 0.0000 ( 2.6%) 0.0003 ( 3.0%) 0.0003 (
3.0%) Loop Invariant Code Motion
0.0003 ( 2.4%) 0.0000 ( 3.6%) 0.0003 ( 2.5%) 0.0003 (
2.5%) Module Verifier
0.0003 ( 2.4%) 0.0000 ( 1.6%) 0.0003 ( 2.4%) 0.0003 (
2.4%) Induction Variable Users
0.0002 ( 2.1%) 0.0000 ( 2.6%) 0.0002 ( 2.1%) 0.0002 (
2.2%) Canonicalize natural loops
0.0002 ( 1.7%) 0.0000 ( 7.2%) 0.0002 ( 1.9%) 0.0002 (
1.9%) Dominator Tree Construction
0.0002 ( 1.6%) 0.0000 ( 1.4%) 0.0002 ( 1.6%) 0.0002 (
1.6%) Canonicalize natural loops
0.0001 ( 0.8%) 0.0000 ( 1.8%) 0.0001 ( 0.9%) 0.0001 (
0.9%) Dominance Frontier Construction
0.0000 ( 0.3%) 0.0000 ( 1.2%) 0.0000 ( 0.3%) 0.0000 (
0.3%) Scalar Evolution Analysis
0.0000 ( 0.1%) 0.0000 ( 1.6%) 0.0000 ( 0.2%) 0.0000 (
0.2%) No Alias Analysis (always returns 'may' alias)
0.0000 ( 0.0%) 0.0000 ( 0.2%) 0.0000 ( 0.1%) 0.0000 (
0.0%) Preliminary module verification
0.0109 (100.0%) 0.0005 (100.0%) 0.0114 (100.0%) 0.0114
(100.0%) Total
If you think that the above is a fair methodology for comparing OSR and
LSR compile times, I can run additional tests. Which tests are an open
question.
Brian
More information about the llvm-dev
mailing list