[PATCH] D42717: [JumpThreading] sync DT for LVI analysis (PR 36133)
Brian Rzycki via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Mon Feb 12 16:34:13 PST 2018
brzycki added a comment.
I used `clang -O3 -fno-exceptions -ftime-report -c tramp3d-v4.cpp -o /tmp/foo.o` with an upstream (non-patched) compiler. Here are the hottest areas of pass execution according to the reports. These lines are displayed in the same order:
---User Time--- --System Time-- --User+System-- ---Wall Time--- --- Name ---
37.7480 ( 9.8%) 0.1080 ( 3.9%) 37.8560 ( 9.7%) 37.8099 ( 9.7%) Global Value Numbering
32.5400 ( 8.4%) 0.3320 ( 11.9%) 32.8720 ( 8.4%) 32.8960 ( 8.5%) Function Integration/Inlining
26.7520 ( 6.9%) 0.2280 ( 8.2%) 26.9800 ( 6.9%) 26.9368 ( 6.9%) X86 DAG->DAG Instruction Selection
25.8080 ( 6.7%) 0.0960 ( 3.4%) 25.9040 ( 6.7%) 25.8395 ( 6.6%) Combine redundant instructions
16.7080 ( 4.3%) 0.1160 ( 4.2%) 16.8240 ( 4.3%) 17.0581 ( 4.4%) Combine redundant instructions
14.7120 ( 3.8%) 0.1000 ( 3.6%) 14.8120 ( 3.8%) 14.8012 ( 3.8%) Combine redundant instructions
13.6320 ( 3.5%) 0.1120 ( 4.0%) 13.7440 ( 3.5%) 13.6522 ( 3.5%) Combine redundant instructions
11.3840 ( 2.9%) 0.0120 ( 0.4%) 11.3960 ( 2.9%) 11.2862 ( 2.9%) Loop Strength Reduction
10.6120 ( 2.7%) 0.0080 ( 0.3%) 10.6200 ( 2.7%) 10.6267 ( 2.7%) Combine redundant instructions
10.0360 ( 2.6%) 0.0320 ( 1.1%) 10.0680 ( 2.6%) 10.1269 ( 2.6%) Combine redundant instructions
9.7280 ( 2.5%) 0.0160 ( 0.6%) 9.7440 ( 2.5%) 9.7972 ( 2.5%) Combine redundant instructions
8.1040 ( 2.1%) 0.0240 ( 0.9%) 8.1280 ( 2.1%) 8.0646 ( 2.1%) Dead Store Elimination
7.6040 ( 2.0%) 0.0480 ( 1.7%) 7.6520 ( 2.0%) 7.6756 ( 2.0%) Value Propagation
7.5520 ( 2.0%) 0.0200 ( 0.7%) 7.5720 ( 1.9%) 7.5224 ( 1.9%) SLP Vectorizer
6.6240 ( 1.7%) 0.0800 ( 2.9%) 6.7040 ( 1.7%) 6.7257 ( 1.7%) Induction Variable Simplification
6.6320 ( 1.7%) 0.0320 ( 1.1%) 6.6640 ( 1.7%) 6.7028 ( 1.7%) Loop Invariant Code Motion
6.5040 ( 1.7%) 0.1000 ( 3.6%) 6.6040 ( 1.7%) 6.5648 ( 1.7%) Combine redundant instructions
6.0480 ( 1.6%) 0.0440 ( 1.6%) 6.0920 ( 1.6%) 6.2509 ( 1.6%) Memory SSA
5.9080 ( 1.5%) 0.0560 ( 2.0%) 5.9640 ( 1.5%) 6.0655 ( 1.6%) Value Propagation
5.8120 ( 1.5%) 0.0160 ( 0.6%) 5.8280 ( 1.5%) 5.9670 ( 1.5%) Early CSE w/ MemorySSA
5.0480 ( 1.3%) 0.0040 ( 0.1%) 5.0520 ( 1.3%) 5.0647 ( 1.3%) Loop Invariant Code Motion
4.3280 ( 1.1%) 0.0120 ( 0.4%) 4.3400 ( 1.1%) 4.3600 ( 1.1%) Loop Invariant Code Motion
4.0080 ( 1.0%) 0.0280 ( 1.0%) 4.0360 ( 1.0%) 4.0010 ( 1.0%) Greedy Register Allocator
3.7480 ( 1.0%) 0.0400 ( 1.4%) 3.7880 ( 1.0%) 3.7836 ( 1.0%) CodeGen Prepare
3.2240 ( 0.8%) 0.0080 ( 0.3%) 3.2320 ( 0.8%) 3.3606 ( 0.9%) Machine Instruction Scheduler
3.2640 ( 0.8%) 0.0280 ( 1.0%) 3.2920 ( 0.8%) 3.3383 ( 0.9%) MemCpy Optimization
3.1040 ( 0.8%) 0.0160 ( 0.6%) 3.1200 ( 0.8%) 3.1925 ( 0.8%) Induction Variable Users
2.5760 ( 0.7%) 0.0160 ( 0.6%) 2.5920 ( 0.7%) 2.6184 ( 0.7%) Unroll loops
2.5720 ( 0.7%) 0.0320 ( 1.1%) 2.6040 ( 0.7%) 2.5598 ( 0.7%) SROA
2.5320 ( 0.7%) 0.0200 ( 0.7%) 2.5520 ( 0.7%) 2.3682 ( 0.6%) Jump Threading
2.3400 ( 0.6%) 0.0040 ( 0.1%) 2.3440 ( 0.6%) 2.3639 ( 0.6%) Jump Threading
2.2880 ( 0.6%) 0.0280 ( 1.0%) 2.3160 ( 0.6%) 2.2819 ( 0.6%) Reassociate expressions
2.1440 ( 0.6%) 0.0120 ( 0.4%) 2.1560 ( 0.6%) 2.1243 ( 0.5%) Loop Load Elimination
1.9400 ( 0.5%) 0.0080 ( 0.3%) 1.9480 ( 0.5%) 2.0576 ( 0.5%) SROA
1.9640 ( 0.5%) 0.0160 ( 0.6%) 1.9800 ( 0.5%) 1.9998 ( 0.5%) Loop Vectorization
1.5960 ( 0.4%) 0.0280 ( 1.0%) 1.6240 ( 0.4%) 1.6763 ( 0.4%) Simplify the CFG
1.5000 ( 0.4%) 0.0160 ( 0.6%) 1.5160 ( 0.4%) 1.6653 ( 0.4%) Simplify the CFG
1.6280 ( 0.4%) 0.0160 ( 0.6%) 1.6440 ( 0.4%) 1.5298 ( 0.4%) Remove redundant instructions
1.5160 ( 0.4%) 0.0160 ( 0.6%) 1.5320 ( 0.4%) 1.4801 ( 0.4%) Module Verifier
1.4280 ( 0.4%) 0.0240 ( 0.9%) 1.4520 ( 0.4%) 1.4439 ( 0.4%) Module Verifier
1.3960 ( 0.4%) 0.0000 ( 0.0%) 1.3960 ( 0.4%) 1.4080 ( 0.4%) Module Verifier
1.4440 ( 0.4%) 0.0080 ( 0.3%) 1.4520 ( 0.4%) 1.3567 ( 0.3%) Simplify the CFG
1.2880 ( 0.3%) 0.0160 ( 0.6%) 1.3040 ( 0.3%) 1.3025 ( 0.3%) Simplify the CFG
1.2040 ( 0.3%) 0.0200 ( 0.7%) 1.2240 ( 0.3%) 1.2963 ( 0.3%) Early CSE
1.0320 ( 0.3%) 0.0120 ( 0.4%) 1.0440 ( 0.3%) 1.1418 ( 0.3%) Sparse Conditional Constant Propagation
and
---User Time--- --System Time-- --User+System-- ---Wall Time--- --- Name ---
38.5520 ( 9.9%) 0.1040 ( 3.9%) 38.6560 ( 9.9%) 38.7477 ( 9.9%) Global Value Numbering
32.6440 ( 8.4%) 0.2720 ( 10.2%) 32.9160 ( 8.4%) 32.8803 ( 8.4%) Function Integration/Inlining
26.3040 ( 6.8%) 0.2440 ( 9.1%) 26.5480 ( 6.8%) 26.6897 ( 6.8%) X86 DAG->DAG Instruction Selection
26.3120 ( 6.8%) 0.0760 ( 2.8%) 26.3880 ( 6.7%) 26.2355 ( 6.7%) Combine redundant instructions
17.1440 ( 4.4%) 0.0800 ( 3.0%) 17.2240 ( 4.4%) 17.3240 ( 4.4%) Combine redundant instructions
14.9080 ( 3.8%) 0.0480 ( 1.8%) 14.9560 ( 3.8%) 15.0418 ( 3.8%) Combine redundant instructions
13.6120 ( 3.5%) 0.0560 ( 2.1%) 13.6680 ( 3.5%) 13.8665 ( 3.5%) Combine redundant instructions
11.0240 ( 2.8%) 0.0400 ( 1.5%) 11.0640 ( 2.8%) 11.0079 ( 2.8%) Loop Strength Reduction
10.7280 ( 2.8%) 0.0320 ( 1.2%) 10.7600 ( 2.7%) 10.7927 ( 2.7%) Combine redundant instructions
10.2160 ( 2.6%) 0.0400 ( 1.5%) 10.2560 ( 2.6%) 10.2907 ( 2.6%) Combine redundant instructions
9.9440 ( 2.6%) 0.0200 ( 0.7%) 9.9640 ( 2.5%) 9.9617 ( 2.5%) Combine redundant instructions
8.3960 ( 2.2%) 0.0320 ( 1.2%) 8.4280 ( 2.1%) 8.2761 ( 2.1%) Dead Store Elimination
7.7520 ( 2.0%) 0.0440 ( 1.6%) 7.7960 ( 2.0%) 7.8317 ( 2.0%) Value Propagation
7.5040 ( 1.9%) 0.0360 ( 1.3%) 7.5400 ( 1.9%) 7.4545 ( 1.9%) SLP Vectorizer
6.8080 ( 1.7%) 0.0240 ( 0.9%) 6.8320 ( 1.7%) 6.8237 ( 1.7%) Loop Invariant Code Motion
6.7880 ( 1.7%) 0.1160 ( 4.3%) 6.9040 ( 1.8%) 6.7762 ( 1.7%) Combine redundant instructions
6.5240 ( 1.7%) 0.0880 ( 3.3%) 6.6120 ( 1.7%) 6.6196 ( 1.7%) Induction Variable Simplification
6.3320 ( 1.6%) 0.0360 ( 1.3%) 6.3680 ( 1.6%) 6.4497 ( 1.6%) Memory SSA
6.2720 ( 1.6%) 0.0080 ( 0.3%) 6.2800 ( 1.6%) 6.2596 ( 1.6%) Value Propagation
5.9040 ( 1.5%) 0.0480 ( 1.8%) 5.9520 ( 1.5%) 6.0199 ( 1.5%) Early CSE w/ MemorySSA
5.1880 ( 1.3%) 0.0080 ( 0.3%) 5.1960 ( 1.3%) 5.1699 ( 1.3%) Loop Invariant Code Motion
4.4320 ( 1.1%) 0.0040 ( 0.1%) 4.4360 ( 1.1%) 4.4453 ( 1.1%) Loop Invariant Code Motion
4.0040 ( 1.0%) 0.0200 ( 0.7%) 4.0240 ( 1.0%) 4.0277 ( 1.0%) Greedy Register Allocator
3.7680 ( 1.0%) 0.0040 ( 0.1%) 3.7720 ( 1.0%) 3.7186 ( 0.9%) CodeGen Prepare
3.3680 ( 0.9%) 0.0200 ( 0.7%) 3.3880 ( 0.9%) 3.4240 ( 0.9%) MemCpy Optimization
3.3920 ( 0.9%) 0.0120 ( 0.4%) 3.4040 ( 0.9%) 3.3959 ( 0.9%) Machine Instruction Scheduler
3.1280 ( 0.8%) 0.0120 ( 0.4%) 3.1400 ( 0.8%) 3.1442 ( 0.8%) Induction Variable Users
2.5080 ( 0.6%) 0.0360 ( 1.3%) 2.5440 ( 0.6%) 2.5870 ( 0.7%) SROA
2.4840 ( 0.6%) 0.0320 ( 1.2%) 2.5160 ( 0.6%) 2.5860 ( 0.7%) Unroll loops
2.3200 ( 0.6%) 0.0160 ( 0.6%) 2.3360 ( 0.6%) 2.3900 ( 0.6%) Jump Threading
2.3520 ( 0.6%) 0.0200 ( 0.7%) 2.3720 ( 0.6%) 2.3781 ( 0.6%) Jump Threading
2.1680 ( 0.6%) 0.0240 ( 0.9%) 2.1920 ( 0.6%) 2.2602 ( 0.6%) Reassociate expressions
2.1120 ( 0.5%) 0.0160 ( 0.6%) 2.1280 ( 0.5%) 2.0903 ( 0.5%) Loop Load Elimination
2.0600 ( 0.5%) 0.0240 ( 0.9%) 2.0840 ( 0.5%) 2.0447 ( 0.5%) SROA
2.0080 ( 0.5%) 0.0000 ( 0.0%) 2.0080 ( 0.5%) 1.9710 ( 0.5%) Loop Vectorization
1.5600 ( 0.4%) 0.0160 ( 0.6%) 1.5760 ( 0.4%) 1.7095 ( 0.4%) Simplify the CFG
1.6880 ( 0.4%) 0.0200 ( 0.7%) 1.7080 ( 0.4%) 1.6968 ( 0.4%) Simplify the CFG
1.4960 ( 0.4%) 0.0080 ( 0.3%) 1.5040 ( 0.4%) 1.5448 ( 0.4%) Remove redundant instructions
1.4880 ( 0.4%) 0.0200 ( 0.7%) 1.5080 ( 0.4%) 1.4811 ( 0.4%) Module Verifier
1.4680 ( 0.4%) 0.0160 ( 0.6%) 1.4840 ( 0.4%) 1.4391 ( 0.4%) Module Verifier
1.4120 ( 0.4%) 0.0120 ( 0.4%) 1.4240 ( 0.4%) 1.4021 ( 0.4%) Module Verifier
1.3120 ( 0.3%) 0.0040 ( 0.1%) 1.3160 ( 0.3%) 1.3868 ( 0.4%) Simplify the CFG
1.3240 ( 0.3%) 0.0160 ( 0.6%) 1.3400 ( 0.3%) 1.3271 ( 0.3%) Simplify the CFG
1.2680 ( 0.3%) 0.0200 ( 0.7%) 1.2880 ( 0.3%) 1.3067 ( 0.3%) Early CSE
1.1160 ( 0.3%) 0.0160 ( 0.6%) 1.1320 ( 0.3%) 1.1449 ( 0.3%) Sparse Conditional Constant Propagation
There are small differences between the execution times, but none of them stand out to me. The total difference in wall-clock time between these runs is `3.16` seconds.
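For anyone who wants to repeat this comparison without eyeballing the tables, here is a minimal sketch that sums wall time per pass name for each report and prints the difference. It assumes the four-column row layout shown above; the names `wall_time_by_pass` and `wall_time_delta` are made up for illustration, and the sample rows are copied from the two tables. If a report contains a summary row in the same layout, it would be treated as just another name.

```python
import re
from collections import defaultdict

# One pass row: three "seconds ( pct%)" columns (user, system, user+system),
# then the wall-time column, then the pass name. Assumes the layout above.
ROW = re.compile(
    r"^\s*(?:[\d.]+\s*\(\s*[\d.]+%\)\s+){3}"
    r"([\d.]+)\s*\(\s*[\d.]+%\)\s+(.+?)\s*$"
)

def wall_time_by_pass(report):
    """Sum wall-clock seconds per pass name (passes can run several times)."""
    totals = defaultdict(float)
    for line in report.splitlines():
        m = ROW.match(line)
        if m:
            totals[m.group(2)] += float(m.group(1))
    return dict(totals)

def wall_time_delta(first, second):
    """Per-pass wall time of `second` minus `first`, in seconds."""
    a, b = wall_time_by_pass(first), wall_time_by_pass(second)
    return {name: b.get(name, 0.0) - a.get(name, 0.0)
            for name in sorted(set(a) | set(b))}

# Sample rows copied from the first and second tables above.
FIRST = """\
---User Time--- --System Time-- --User+System-- ---Wall Time--- --- Name ---
37.7480 ( 9.8%) 0.1080 ( 3.9%) 37.8560 ( 9.7%) 37.8099 ( 9.7%) Global Value Numbering
2.5320 ( 0.7%) 0.0200 ( 0.7%) 2.5520 ( 0.7%) 2.3682 ( 0.6%) Jump Threading
2.3400 ( 0.6%) 0.0040 ( 0.1%) 2.3440 ( 0.6%) 2.3639 ( 0.6%) Jump Threading
"""
SECOND = """\
---User Time--- --System Time-- --User+System-- ---Wall Time--- --- Name ---
38.5520 ( 9.9%) 0.1040 ( 3.9%) 38.6560 ( 9.9%) 38.7477 ( 9.9%) Global Value Numbering
2.3200 ( 0.6%) 0.0160 ( 0.6%) 2.3360 ( 0.6%) 2.3900 ( 0.6%) Jump Threading
2.3520 ( 0.6%) 0.0200 ( 0.7%) 2.3720 ( 0.6%) 2.3781 ( 0.6%) Jump Threading
"""

delta = wall_time_delta(FIRST, SECOND)
for name, seconds in delta.items():
    print(f"{seconds:+8.4f}s  {name}")
```

Summing by name matters because passes such as Jump Threading appear more than once in the pipeline, so a single row does not show the whole cost of a pass.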
https://reviews.llvm.org/D42717