[PATCH] D11304: [NVPTX] run LSR before straight-line optimizations
Jingyue Wu
jingyue at google.com
Fri Jul 17 20:55:20 PDT 2015
jingyue added a comment.
Below is the compilation-time breakdown of running "opt -O3" and "llc" on one of our GPU programs, which compiles to more than 100k lines of PTX.
The extra GVN takes 2.6% of the total time. There are three Global Value Numbering entries in the report: the first (4.8%) runs in the target-independent pipeline, and the other two (2.7% and 2.6%) run in NVPTX's own pass pipeline.
I'll add a check to enable it for -O3 only.
===-------------------------------------------------------------------------===
... Pass execution timing report ...
===-------------------------------------------------------------------------===
Total Execution Time: 8.3537 seconds (8.3467 wall clock)
---User Time--- --System Time-- --User+System-- ---Wall Time--- --- Name ---
0.9868 ( 11.9%) 0.0161 ( 32.1%) 1.0029 ( 12.0%) 1.0048 ( 12.0%) NVPTX DAG->DAG Pattern Instruction Selection
0.9987 ( 12.0%) 0.0000 ( 0.0%) 0.9987 ( 12.0%) 0.9995 ( 12.0%) Straight line strength reduction
0.4514 ( 5.4%) 0.0000 ( 0.0%) 0.4514 ( 5.4%) 0.4490 ( 5.4%) Function Integration/Inlining
0.4348 ( 5.2%) 0.0000 ( 0.0%) 0.4348 ( 5.2%) 0.4354 ( 5.2%) Nary reassociation
0.4033 ( 4.9%) 0.0000 ( 0.0%) 0.4033 ( 4.8%) 0.4002 ( 4.8%) Global Value Numbering
0.2823 ( 3.4%) 0.0001 ( 0.1%) 0.2824 ( 3.4%) 0.2780 ( 3.3%) Combine redundant instructions
0.2696 ( 3.2%) 0.0002 ( 0.3%) 0.2697 ( 3.2%) 0.2647 ( 3.2%) Combine redundant instructions
0.2423 ( 2.9%) 0.0000 ( 0.0%) 0.2423 ( 2.9%) 0.2387 ( 2.9%) Combine redundant instructions
0.2328 ( 2.8%) 0.0000 ( 0.1%) 0.2328 ( 2.8%) 0.2291 ( 2.7%) Combine redundant instructions
0.2232 ( 2.7%) 0.0000 ( 0.0%) 0.2232 ( 2.7%) 0.2223 ( 2.7%) Global Value Numbering
0.2123 ( 2.6%) 0.0001 ( 0.1%) 0.2124 ( 2.5%) 0.2161 ( 2.6%) Global Value Numbering
0.2000 ( 2.4%) 0.0002 ( 0.4%) 0.2001 ( 2.4%) 0.1944 ( 2.3%) Loop Invariant Code Motion
0.1929 ( 2.3%) 0.0001 ( 0.3%) 0.1931 ( 2.3%) 0.1927 ( 2.3%) Combine redundant instructions
0.1928 ( 2.3%) 0.0000 ( 0.0%) 0.1928 ( 2.3%) 0.1924 ( 2.3%) Combine redundant instructions
0.1907 ( 2.3%) 0.0000 ( 0.0%) 0.1907 ( 2.3%) 0.1919 ( 2.3%) Value Propagation
0.1759 ( 2.1%) 0.0006 ( 1.1%) 0.1764 ( 2.1%) 0.1715 ( 2.1%) Induction Variable Simplification
0.1735 ( 2.1%) 0.0001 ( 0.1%) 0.1736 ( 2.1%) 0.1714 ( 2.1%) Loop Invariant Code Motion
0.1671 ( 2.0%) 0.0010 ( 2.0%) 0.1682 ( 2.0%) 0.1714 ( 2.1%) Combine redundant instructions
0.1426 ( 1.7%) 0.0000 ( 0.0%) 0.1426 ( 1.7%) 0.1415 ( 1.7%) Loop Invariant Code Motion
0.1302 ( 1.6%) 0.0001 ( 0.2%) 0.1304 ( 1.6%) 0.1302 ( 1.6%) Loop Strength Reduction
0.1226 ( 1.5%) 0.0000 ( 0.0%) 0.1226 ( 1.5%) 0.1287 ( 1.5%) Unroll loops
0.1248 ( 1.5%) 0.0002 ( 0.4%) 0.1251 ( 1.5%) 0.1269 ( 1.5%) SROA
0.0994 ( 1.2%) 0.0000 ( 0.0%) 0.0994 ( 1.2%) 0.0988 ( 1.2%) Value Propagation
0.0991 ( 1.2%) 0.0000 ( 0.0%) 0.0991 ( 1.2%) 0.0979 ( 1.2%) Combine redundant instructions
0.0748 ( 0.9%) 0.0039 ( 7.8%) 0.0787 ( 0.9%) 0.0752 ( 0.9%) Simple Register Coalescing
0.0659 ( 0.8%) 0.0000 ( 0.0%) 0.0659 ( 0.8%) 0.0664 ( 0.8%) Induction Variable Users
0.0659 ( 0.8%) 0.0038 ( 7.6%) 0.0697 ( 0.8%) 0.0633 ( 0.8%) Early CSE
0.0639 ( 0.8%) 0.0000 ( 0.0%) 0.0639 ( 0.8%) 0.0631 ( 0.8%) Sparse Conditional Constant Propagation
0.0632 ( 0.8%) 0.0000 ( 0.1%) 0.0632 ( 0.8%) 0.0625 ( 0.7%) Early CSE
0.0579 ( 0.7%) 0.0000 ( 0.0%) 0.0579 ( 0.7%) 0.0580 ( 0.7%) NVPTX Assembly Printer
0.0565 ( 0.7%) 0.0000 ( 0.0%) 0.0565 ( 0.7%) 0.0566 ( 0.7%) CodeGen Prepare
0.0564 ( 0.7%) 0.0000 ( 0.0%) 0.0564 ( 0.7%) 0.0553 ( 0.7%) Live Interval Analysis
0.0574 ( 0.7%) 0.0001 ( 0.2%) 0.0575 ( 0.7%) 0.0530 ( 0.6%) Early CSE
0.0506 ( 0.6%) 0.0000 ( 0.1%) 0.0506 ( 0.6%) 0.0468 ( 0.6%) Dead Store Elimination
0.0437 ( 0.5%) 0.0000 ( 0.0%) 0.0437 ( 0.5%) 0.0436 ( 0.5%) Dead Code Elimination
0.0438 ( 0.5%) 0.0002 ( 0.3%) 0.0439 ( 0.5%) 0.0424 ( 0.5%) Bit-Tracking Dead Code Elimination
0.0428 ( 0.5%) 0.0000 ( 0.0%) 0.0428 ( 0.5%) 0.0419 ( 0.5%) Machine Loop Invariant Code Motion
0.0415 ( 0.5%) 0.0000 ( 0.0%) 0.0415 ( 0.5%) 0.0405 ( 0.5%) Module Verifier
0.0413 ( 0.5%) 0.0000 ( 0.0%) 0.0413 ( 0.5%) 0.0401 ( 0.5%) SROA
0.0367 ( 0.4%) 0.0000 ( 0.0%) 0.0367 ( 0.4%) 0.0373 ( 0.4%) Machine Common Subexpression Elimination
0.0352 ( 0.4%) 0.0000 ( 0.0%) 0.0352 ( 0.4%) 0.0351 ( 0.4%) Interprocedural Sparse Conditional Constant Propagation
0.0339 ( 0.4%) 0.0000 ( 0.0%) 0.0339 ( 0.4%) 0.0342 ( 0.4%) Module Verifier
0.0349 ( 0.4%) 0.0001 ( 0.2%) 0.0350 ( 0.4%) 0.0318 ( 0.4%) Simplify the CFG
0.0304 ( 0.4%) 0.0000 ( 0.0%) 0.0304 ( 0.4%) 0.0299 ( 0.4%) Live Variable Analysis
0.0278 ( 0.3%) 0.0000 ( 0.0%) 0.0278 ( 0.3%) 0.0274 ( 0.3%) Reassociate expressions
0.0263 ( 0.3%) 0.0039 ( 7.7%) 0.0301 ( 0.4%) 0.0260 ( 0.3%) Aggressive Dead Code Elimination
0.0215 ( 0.3%) 0.0000 ( 0.0%) 0.0215 ( 0.3%) 0.0228 ( 0.3%) Jump Threading
0.0211 ( 0.3%) 0.0000 ( 0.0%) 0.0211 ( 0.3%) 0.0205 ( 0.2%) Split GEPs to a variadic base and a constant offset for better CSE
0.0193 ( 0.2%) 0.0000 ( 0.0%) 0.0193 ( 0.2%) 0.0200 ( 0.2%) Simplify the CFG
0.0131 ( 0.2%) 0.0038 ( 7.6%) 0.0169 ( 0.2%) 0.0175 ( 0.2%) Unnamed pass: implement Pass::getPassName()
0.0169 ( 0.2%) 0.0000 ( 0.0%) 0.0170 ( 0.2%) 0.0174 ( 0.2%) Rotate Loops
0.0166 ( 0.2%) 0.0000 ( 0.0%) 0.0166 ( 0.2%) 0.0170 ( 0.2%) convert address space of alloca'ed memory to local
0.0148 ( 0.2%) 0.0039 ( 7.8%) 0.0188 ( 0.2%) 0.0168 ( 0.2%) Lower aggregate copies/intrinsics into loops
0.0168 ( 0.2%) 0.0000 ( 0.0%) 0.0168 ( 0.2%) 0.0164 ( 0.2%) Machine code sinking
0.0171 ( 0.2%) 0.0000 ( 0.0%) 0.0171 ( 0.2%) 0.0160 ( 0.2%) Unroll loops
0.0156 ( 0.2%) 0.0000 ( 0.0%) 0.0156 ( 0.2%) 0.0160 ( 0.2%) Simplify the CFG
0.0107 ( 0.1%) 0.0000 ( 0.0%) 0.0107 ( 0.1%) 0.0154 ( 0.2%) Recognize loop idioms
0.0141 ( 0.2%) 0.0000 ( 0.0%) 0.0141 ( 0.2%) 0.0141 ( 0.2%) Eliminate PHI nodes for register allocation
0.0127 ( 0.2%) 0.0000 ( 0.1%) 0.0128 ( 0.2%) 0.0127 ( 0.2%) Unnamed pass: implement Pass::getPassName()
0.0099 ( 0.1%) 0.0000 ( 0.0%) 0.0099 ( 0.1%) 0.0111 ( 0.1%) Jump Threading
0.0111 ( 0.1%) 0.0000 ( 0.0%) 0.0111 ( 0.1%) 0.0111 ( 0.1%) Remove unused exception handling info
0.0105 ( 0.1%) 0.0000 ( 0.0%) 0.0106 ( 0.1%) 0.0107 ( 0.1%) Simplify the CFG
0.0101 ( 0.1%) 0.0000 ( 0.0%) 0.0101 ( 0.1%) 0.0102 ( 0.1%) Simplify the CFG
0.0095 ( 0.1%) 0.0000 ( 0.0%) 0.0095 ( 0.1%) 0.0098 ( 0.1%) Float to int
0.0066 ( 0.1%) 0.0000 ( 0.0%) 0.0066 ( 0.1%) 0.0095 ( 0.1%) Tail Call Elimination
0.0094 ( 0.1%) 0.0000 ( 0.0%) 0.0094 ( 0.1%) 0.0094 ( 0.1%) Dead Global Elimination
0.0051 ( 0.1%) 0.0001 ( 0.2%) 0.0052 ( 0.1%) 0.0088 ( 0.1%) Simplify the CFG
0.0060 ( 0.1%) 0.0000 ( 0.0%) 0.0060 ( 0.1%) 0.0084 ( 0.1%) Loop-Closed SSA Form Pass
0.0079 ( 0.1%) 0.0000 ( 0.0%) 0.0079 ( 0.1%) 0.0082 ( 0.1%) Promote 'by reference' arguments to scalars
0.0074 ( 0.1%) 0.0000 ( 0.0%) 0.0074 ( 0.1%) 0.0074 ( 0.1%) Two-Address instruction pass
0.0068 ( 0.1%) 0.0000 ( 0.0%) 0.0068 ( 0.1%) 0.0073 ( 0.1%) Unswitch loops
0.0033 ( 0.0%) 0.0001 ( 0.1%) 0.0034 ( 0.0%) 0.0072 ( 0.1%) Dominator Tree Construction
0.0031 ( 0.0%) 0.0000 ( 0.0%) 0.0031 ( 0.0%) 0.0068 ( 0.1%) Deduce function attributes
0.0048 ( 0.1%) 0.0000 ( 0.0%) 0.0048 ( 0.1%) 0.0063 ( 0.1%) Lazy Value Information Analysis
0.0040 ( 0.0%) 0.0000 ( 0.0%) 0.0040 ( 0.0%) 0.0057 ( 0.1%) MemCpy Optimization
0.0062 ( 0.1%) 0.0000 ( 0.0%) 0.0062 ( 0.1%) 0.0055 ( 0.1%) Remove unnecessary non-generic-to-generic addrspacecasts
0.0049 ( 0.1%) 0.0000 ( 0.0%) 0.0049 ( 0.1%) 0.0052 ( 0.1%) SROA
0.0054 ( 0.1%) 0.0000 ( 0.0%) 0.0054 ( 0.1%) 0.0052 ( 0.1%) Peephole Optimizations
0.0049 ( 0.1%) 0.0038 ( 7.6%) 0.0088 ( 0.1%) 0.0050 ( 0.1%) Loop-Closed SSA Form Pass
0.0042 ( 0.1%) 0.0000 ( 0.0%) 0.0042 ( 0.1%) 0.0050 ( 0.1%) Slot index numbering
0.0046 ( 0.1%) 0.0000 ( 0.0%) 0.0046 ( 0.1%) 0.0048 ( 0.1%) CallGraph Construction
0.0043 ( 0.1%) 0.0000 ( 0.0%) 0.0043 ( 0.1%) 0.0047 ( 0.1%) Slot index numbering
0.0025 ( 0.0%) 0.0000 ( 0.0%) 0.0025 ( 0.0%) 0.0046 ( 0.1%) Dominator Tree Construction
0.0050 ( 0.1%) 0.0000 ( 0.0%) 0.0050 ( 0.1%) 0.0045 ( 0.1%) Dead Argument Elimination
0.0049 ( 0.1%) 0.0000 ( 0.0%) 0.0049 ( 0.1%) 0.0042 ( 0.1%) Remove dead machine instructions
0.0020 ( 0.0%) 0.0000 ( 0.0%) 0.0020 ( 0.0%) 0.0041 ( 0.0%) Natural Loop Information
0.0043 ( 0.1%) 0.0000 ( 0.0%) 0.0043 ( 0.1%) 0.0040 ( 0.0%) Dominator Tree Construction
0.0042 ( 0.1%) 0.0000 ( 0.0%) 0.0042 ( 0.0%) 0.0040 ( 0.0%) Loop-Closed SSA Form Pass
0.0039 ( 0.0%) 0.0000 ( 0.0%) 0.0039 ( 0.0%) 0.0038 ( 0.0%) Branch Probability Analysis
0.0030 ( 0.0%) 0.0000 ( 0.0%) 0.0030 ( 0.0%) 0.0038 ( 0.0%) Dominator Tree Construction
0.0037 ( 0.0%) 0.0000 ( 0.0%) 0.0037 ( 0.0%) 0.0038 ( 0.0%) Dominator Tree Construction
0.0024 ( 0.0%) 0.0000 ( 0.0%) 0.0025 ( 0.0%) 0.0037 ( 0.0%) Dominator Tree Construction
0.0029 ( 0.0%) 0.0000 ( 0.0%) 0.0029 ( 0.0%) 0.0037 ( 0.0%) Lazy Value Information Analysis
0.0037 ( 0.0%) 0.0000 ( 0.0%) 0.0037 ( 0.0%) 0.0036 ( 0.0%) Branch Probability Analysis
0.0034 ( 0.0%) 0.0000 ( 0.0%) 0.0034 ( 0.0%) 0.0035 ( 0.0%) Branch Probability Basic Block Placement
0.0016 ( 0.0%) 0.0000 ( 0.0%) 0.0016 ( 0.0%) 0.0034 ( 0.0%) Dominator Tree Construction
0.0030 ( 0.0%) 0.0000 ( 0.0%) 0.0030 ( 0.0%) 0.0033 ( 0.0%) Dominator Tree Construction
0.0031 ( 0.0%) 0.0000 ( 0.0%) 0.0031 ( 0.0%) 0.0033 ( 0.0%) Constant Hoisting
0.0034 ( 0.0%) 0.0000 ( 0.0%) 0.0034 ( 0.0%) 0.0032 ( 0.0%) Dominator Tree Construction
0.0021 ( 0.0%) 0.0035 ( 6.9%) 0.0055 ( 0.1%) 0.0032 ( 0.0%) Dominator Tree Construction
0.0038 ( 0.0%) 0.0000 ( 0.0%) 0.0038 ( 0.0%) 0.0032 ( 0.0%) Loop-Closed SSA Form Pass
0.0034 ( 0.0%) 0.0000 ( 0.0%) 0.0034 ( 0.0%) 0.0029 ( 0.0%) Loop-Closed SSA Form Pass
0.0016 ( 0.0%) 0.0000 ( 0.0%) 0.0016 ( 0.0%) 0.0029 ( 0.0%) Natural Loop Information
0.0029 ( 0.0%) 0.0000 ( 0.0%) 0.0029 ( 0.0%) 0.0028 ( 0.0%) Loop-Closed SSA Form Pass
0.0026 ( 0.0%) 0.0000 ( 0.0%) 0.0026 ( 0.0%) 0.0027 ( 0.0%) Partially inline calls to library functions
0.0020 ( 0.0%) 0.0000 ( 0.0%) 0.0020 ( 0.0%) 0.0026 ( 0.0%) Dominator Tree Construction
0.0017 ( 0.0%) 0.0000 ( 0.0%) 0.0017 ( 0.0%) 0.0026 ( 0.0%) Machine Function Analysis
0.0020 ( 0.0%) 0.0000 ( 0.0%) 0.0020 ( 0.0%) 0.0025 ( 0.0%) NVPTX specific alloca hoisting
0.0023 ( 0.0%) 0.0000 ( 0.0%) 0.0023 ( 0.0%) 0.0024 ( 0.0%) Dominator Tree Construction
0.0030 ( 0.0%) 0.0000 ( 0.0%) 0.0030 ( 0.0%) 0.0023 ( 0.0%) Dominator Tree Construction
0.0026 ( 0.0%) 0.0000 ( 0.0%) 0.0026 ( 0.0%) 0.0022 ( 0.0%) Post-RA pseudo instruction expansion pass
0.0012 ( 0.0%) 0.0000 ( 0.0%) 0.0012 ( 0.0%) 0.0022 ( 0.0%) Canonicalize natural loops
0.0020 ( 0.0%) 0.0000 ( 0.0%) 0.0020 ( 0.0%) 0.0022 ( 0.0%) Dominator Tree Construction
0.0012 ( 0.0%) 0.0000 ( 0.0%) 0.0012 ( 0.0%) 0.0022 ( 0.0%) Dominator Tree Construction
0.0014 ( 0.0%) 0.0000 ( 0.0%) 0.0014 ( 0.0%) 0.0021 ( 0.0%) Dominator Tree Construction
0.0026 ( 0.0%) 0.0000 ( 0.0%) 0.0026 ( 0.0%) 0.0021 ( 0.0%) Canonicalize natural loops
0.0011 ( 0.0%) 0.0000 ( 0.0%) 0.0011 ( 0.0%) 0.0021 ( 0.0%) Delete dead loops
0.0019 ( 0.0%) 0.0000 ( 0.0%) 0.0019 ( 0.0%) 0.0020 ( 0.0%) MachineDominator Tree Construction
0.0017 ( 0.0%) 0.0000 ( 0.0%) 0.0017 ( 0.0%) 0.0020 ( 0.0%) MachineDominator Tree Construction
0.0023 ( 0.0%) 0.0000 ( 0.0%) 0.0023 ( 0.0%) 0.0020 ( 0.0%) MachineDominator Tree Construction
0.0016 ( 0.0%) 0.0000 ( 0.0%) 0.0016 ( 0.0%) 0.0020 ( 0.0%) Dominator Tree Construction
0.0012 ( 0.0%) 0.0000 ( 0.0%) 0.0012 ( 0.0%) 0.0020 ( 0.0%) Dominator Tree Construction
0.0003 ( 0.0%) 0.0000 ( 0.1%) 0.0003 ( 0.0%) 0.0020 ( 0.0%) Lower 'expect' Intrinsics
0.0016 ( 0.0%) 0.0000 ( 0.0%) 0.0016 ( 0.0%) 0.0019 ( 0.0%) Block Frequency Analysis
0.0017 ( 0.0%) 0.0000 ( 0.0%) 0.0017 ( 0.0%) 0.0019 ( 0.0%) MachinePostDominator Tree Construction
0.0014 ( 0.0%) 0.0000 ( 0.0%) 0.0014 ( 0.0%) 0.0017 ( 0.0%) MachineDominator Tree Construction
0.0011 ( 0.0%) 0.0000 ( 0.0%) 0.0011 ( 0.0%) 0.0017 ( 0.0%) Natural Loop Information
0.0015 ( 0.0%) 0.0000 ( 0.0%) 0.0015 ( 0.0%) 0.0017 ( 0.0%) Machine Block Frequency Analysis
0.0014 ( 0.0%) 0.0000 ( 0.0%) 0.0014 ( 0.0%) 0.0017 ( 0.0%) Machine Block Frequency Analysis
0.0011 ( 0.0%) 0.0039 ( 7.7%) 0.0050 ( 0.1%) 0.0017 ( 0.0%) Scalar Evolution Analysis
0.0009 ( 0.0%) 0.0000 ( 0.0%) 0.0009 ( 0.0%) 0.0016 ( 0.0%) Natural Loop Information
0.0017 ( 0.0%) 0.0000 ( 0.0%) 0.0017 ( 0.0%) 0.0015 ( 0.0%) Machine Block Frequency Analysis
0.0015 ( 0.0%) 0.0000 ( 0.0%) 0.0015 ( 0.0%) 0.0015 ( 0.0%) Natural Loop Information
0.0009 ( 0.0%) 0.0000 ( 0.0%) 0.0009 ( 0.0%) 0.0014 ( 0.0%) Natural Loop Information
0.0015 ( 0.0%) 0.0000 ( 0.0%) 0.0015 ( 0.0%) 0.0014 ( 0.0%) Natural Loop Information
0.0007 ( 0.0%) 0.0000 ( 0.0%) 0.0007 ( 0.0%) 0.0014 ( 0.0%) Scalar Evolution Analysis
0.0004 ( 0.0%) 0.0000 ( 0.0%) 0.0004 ( 0.0%) 0.0014 ( 0.0%) Scalar Evolution Analysis
0.0017 ( 0.0%) 0.0000 ( 0.0%) 0.0017 ( 0.0%) 0.0014 ( 0.0%) Unnamed pass: implement Pass::getPassName()
0.0004 ( 0.0%) 0.0000 ( 0.0%) 0.0004 ( 0.0%) 0.0013 ( 0.0%) Canonicalize natural loops
0.0011 ( 0.0%) 0.0000 ( 0.0%) 0.0011 ( 0.0%) 0.0012 ( 0.0%) Global Variable Optimizer
0.0011 ( 0.0%) 0.0000 ( 0.0%) 0.0011 ( 0.0%) 0.0012 ( 0.0%) Canonicalize natural loops
0.0012 ( 0.0%) 0.0000 ( 0.0%) 0.0012 ( 0.0%) 0.0012 ( 0.0%) Merge disjoint stack slots
0.0011 ( 0.0%) 0.0000 ( 0.0%) 0.0011 ( 0.0%) 0.0012 ( 0.0%) Machine Natural Loop Construction
0.0010 ( 0.0%) 0.0000 ( 0.0%) 0.0010 ( 0.0%) 0.0011 ( 0.0%) Machine Natural Loop Construction
0.0007 ( 0.0%) 0.0000 ( 0.0%) 0.0007 ( 0.0%) 0.0011 ( 0.0%) Speculatively execute instructions
0.0007 ( 0.0%) 0.0000 ( 0.0%) 0.0007 ( 0.0%) 0.0011 ( 0.0%) Process Implicit Definitions
0.0008 ( 0.0%) 0.0000 ( 0.0%) 0.0008 ( 0.0%) 0.0011 ( 0.0%) Expand ISel Pseudo-instructions
0.0008 ( 0.0%) 0.0000 ( 0.0%) 0.0008 ( 0.0%) 0.0011 ( 0.0%) Machine Natural Loop Construction
0.0007 ( 0.0%) 0.0000 ( 0.0%) 0.0007 ( 0.0%) 0.0011 ( 0.0%) NVPTX optimize redundant cvta.to.local instruction
0.0006 ( 0.0%) 0.0000 ( 0.0%) 0.0006 ( 0.0%) 0.0011 ( 0.0%) Canonicalize natural loops
0.0004 ( 0.0%) 0.0000 ( 0.0%) 0.0004 ( 0.0%) 0.0010 ( 0.0%) Scalar Evolution Analysis
0.0005 ( 0.0%) 0.0000 ( 0.0%) 0.0005 ( 0.0%) 0.0008 ( 0.0%) MergedLoadStoreMotion
0.0002 ( 0.0%) 0.0000 ( 0.0%) 0.0002 ( 0.0%) 0.0008 ( 0.0%) Lower pointer arguments of CUDA kernels
0.0008 ( 0.0%) 0.0000 ( 0.0%) 0.0008 ( 0.0%) 0.0008 ( 0.0%) Remove unreachable blocks from the CFG
0.0006 ( 0.0%) 0.0000 ( 0.0%) 0.0006 ( 0.0%) 0.0007 ( 0.0%) Replace occurrences of __nvvm_reflect() calls with 0/1
0.0004 ( 0.0%) 0.0000 ( 0.0%) 0.0004 ( 0.0%) 0.0007 ( 0.0%) Memory Dependence Analysis
0.0006 ( 0.0%) 0.0000 ( 0.0%) 0.0006 ( 0.0%) 0.0007 ( 0.0%) Remove unreachable machine basic blocks
0.0009 ( 0.0%) 0.0000 ( 0.0%) 0.0009 ( 0.0%) 0.0006 ( 0.0%) Optimize machine instruction PHIs
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0006 ( 0.0%) Canonicalize natural loops
0.0004 ( 0.0%) 0.0000 ( 0.0%) 0.0004 ( 0.0%) 0.0005 ( 0.0%) Memory Dependence Analysis
0.0001 ( 0.0%) 0.0000 ( 0.0%) 0.0001 ( 0.0%) 0.0005 ( 0.0%) Inline Cost Analysis
0.0002 ( 0.0%) 0.0000 ( 0.0%) 0.0002 ( 0.0%) 0.0005 ( 0.0%) Speculatively execute instructions
0.0003 ( 0.0%) 0.0000 ( 0.0%) 0.0003 ( 0.0%) 0.0005 ( 0.0%) Remove unreachable blocks from the CFG
0.0001 ( 0.0%) 0.0000 ( 0.0%) 0.0001 ( 0.0%) 0.0005 ( 0.0%) Canonicalize natural loops
0.0002 ( 0.0%) 0.0000 ( 0.0%) 0.0002 ( 0.0%) 0.0005 ( 0.0%) Rotate Loops
0.0003 ( 0.0%) 0.0000 ( 0.0%) 0.0003 ( 0.0%) 0.0005 ( 0.0%) Memory Dependence Analysis
0.0003 ( 0.0%) 0.0000 ( 0.0%) 0.0003 ( 0.0%) 0.0004 ( 0.0%) Lower invoke and unwind, for unwindless code generators
0.0003 ( 0.0%) 0.0000 ( 0.0%) 0.0003 ( 0.0%) 0.0004 ( 0.0%) Tail Duplication
0.0001 ( 0.0%) 0.0000 ( 0.0%) 0.0001 ( 0.0%) 0.0004 ( 0.0%) Memory Dependence Analysis
0.0002 ( 0.0%) 0.0000 ( 0.0%) 0.0002 ( 0.0%) 0.0003 ( 0.0%) SROA
0.0001 ( 0.0%) 0.0000 ( 0.0%) 0.0001 ( 0.0%) 0.0003 ( 0.0%) Memory Dependence Analysis
0.0004 ( 0.0%) 0.0000 ( 0.0%) 0.0004 ( 0.0%) 0.0002 ( 0.0%) Internalize Global Symbols
0.0001 ( 0.0%) 0.0000 ( 0.0%) 0.0001 ( 0.0%) 0.0002 ( 0.0%) Scalar Evolution Analysis
0.0001 ( 0.0%) 0.0000 ( 0.0%) 0.0001 ( 0.0%) 0.0001 ( 0.0%) Insert stack protectors
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0001 ( 0.0%) SLP Vectorizer
0.0002 ( 0.0%) 0.0000 ( 0.0%) 0.0002 ( 0.0%) 0.0001 ( 0.0%) Loop Vectorization
0.0001 ( 0.0%) 0.0000 ( 0.0%) 0.0001 ( 0.0%) 0.0001 ( 0.0%) Post RA top-down list latency scheduler
0.0001 ( 0.0%) 0.0000 ( 0.0%) 0.0001 ( 0.0%) 0.0001 ( 0.0%) Strip Unused Function Prototypes
0.0001 ( 0.0%) 0.0000 ( 0.0%) 0.0001 ( 0.0%) 0.0001 ( 0.0%) Loop Access Analysis
0.0001 ( 0.0%) 0.0000 ( 0.0%) 0.0001 ( 0.0%) 0.0001 ( 0.0%) StackMap Liveness Analysis
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0001 ( 0.0%) Scalar Evolution Analysis
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0001 ( 0.0%) Safe Stack instrumentation pass
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0001 ( 0.0%) Scalar Evolution Analysis
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0001 ( 0.0%) Analyze Machine Code For Garbage Collection
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0001 ( 0.0%) Machine Instruction Scheduler
0.0001 ( 0.0%) 0.0000 ( 0.0%) 0.0001 ( 0.0%) 0.0001 ( 0.0%) Lower Garbage Collection Instructions
0.0001 ( 0.0%) 0.0000 ( 0.0%) 0.0001 ( 0.0%) 0.0001 ( 0.0%) Scalar Evolution Analysis
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0001 ( 0.0%) Alignment from assumptions
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0001 ( 0.0%) Live Stack Slot Analysis
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0001 ( 0.0%) Shadow Stack GC Lowering
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0001 ( 0.0%) Stack Slot Coloring
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Local Stack Slot Allocation
0.0001 ( 0.0%) 0.0000 ( 0.0%) 0.0001 ( 0.0%) 0.0000 ( 0.0%) Assign valid PTX names to globals
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Merge Duplicate Global Constants
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Basic Alias Analysis (stateless AA impl)
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Target Transform Information
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Assumption Cache Tracker
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Assumption Cache Tracker
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Ensure that the global variables are in the global address space
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Unnamed pass: implement Pass::getPassName()
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Rewrite Symbols
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Target Library Information
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) No Alias Analysis (always returns 'may' alias)
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Create Garbage Collector Module Metadata
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Basic Alias Analysis (stateless AA impl)
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) A No-Op Barrier Pass
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Target Transform Information
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Target Pass Configuration
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Target Library Information
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Machine Module Information
0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) 0.0000 ( 0.0%) Machine Branch Probability Analysis
8.3034 (100.0%) 0.0502 (100.0%) 8.3537 (100.0%) 8.3467 (100.0%) Total
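To check the GVN figures quoted above, the rows of a report like this can be summed per pass name. This is a minimal Python sketch. It assumes the report is LLVM's -time-passes output, in which each row has four "seconds (percent%)" columns (user, system, user+system, wall) followed by the pass name. The helper name wall_time_by_pass and the EXCERPT string are illustrative only. EXCERPT holds the three GVN rows and the Total row copied from the report.

```python
import re
from collections import defaultdict

# One timing row: four "seconds (percent%)" columns (user, system,
# user+system, wall) followed by the pass name. Assumes the layout above.
ROW = re.compile(r"^\s*" + r"([\d.]+)\s+\(\s*[\d.]+%\)\s+" * 4 + r"(.+?)\s*$")


def wall_time_by_pass(report: str) -> tuple[dict[str, float], float]:
    """Sum wall-clock seconds per pass name; return (per-pass sums, Total row)."""
    totals: dict[str, float] = defaultdict(float)
    overall = 0.0
    for line in report.splitlines():
        m = ROW.match(line)
        if not m:
            continue  # skip banners, column headers, and blank lines
        wall, name = float(m.group(4)), m.group(5)
        if name == "Total":
            overall = wall
        else:
            totals[name] += wall  # a pass may run several times in the pipeline
    return dict(totals), overall


# Rows copied from the report above (GVN runs plus the Total row).
EXCERPT = """\
0.4033 ( 4.9%) 0.0000 ( 0.0%) 0.4033 ( 4.8%) 0.4002 ( 4.8%) Global Value Numbering
0.2232 ( 2.7%) 0.0000 ( 0.0%) 0.2232 ( 2.7%) 0.2223 ( 2.7%) Global Value Numbering
0.2123 ( 2.6%) 0.0001 ( 0.1%) 0.2124 ( 2.5%) 0.2161 ( 2.6%) Global Value Numbering
8.3034 (100.0%) 0.0502 (100.0%) 8.3537 (100.0%) 8.3467 (100.0%) Total
"""

if __name__ == "__main__":
    per_pass, total = wall_time_by_pass(EXCERPT)
    gvn = per_pass["Global Value Numbering"]
    print(f"GVN: {gvn:.4f}s of {total:.4f}s ({100 * gvn / total:.1f}%)")
```

On this excerpt the three GVN runs add up to 0.8386 s of 8.3467 s wall time, about 10%. Summing by name instead of reading single rows matters in this report because several passes (GVN, "Combine redundant instructions", Loop Invariant Code Motion) appear more than once in the pipeline.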
http://reviews.llvm.org/D11304