[PATCH] D11304: [NVPTX] run LSR before straight-line optimizations

Jingyue Wu jingyue at google.com
Fri Jul 17 20:55:20 PDT 2015


jingyue added a comment.

Below is the compilation time breakdown of running "opt -O3" and "llc" on one of our GPU programs, which produces more than 100k lines of PTX.

This extra GVN takes 2.6% of the total time. There are three GVN runs in the list: the first one (4.8%) happens in the target-independent stage, and the other two (2.7% and 2.6%) happen in NVPTX's private pipeline.

I'll add a check to enable it for -O3 only.
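
The gating described above can be modeled with a small sketch. This is a hypothetical illustration, not LLVM code: the function add_extra_gvn and the starting pipeline list are made up, pass names follow opt's command-line spelling ("loop-reduce" for LSR, "gvn" for GVN), and where the extra GVN actually sits among the NVPTX IR passes is not modeled. In LLVM's legacy codegen pipeline, the equivalent check would plausibly compare TargetPassConfig::getOptLevel() against CodeGenOpt::Aggressive.

```python
from typing import List


def add_extra_gvn(pipeline: List[str], opt_level: int) -> List[str]:
    """Hypothetical model: append the extra GVN run only when compiling at -O3.

    `pipeline` is an illustrative list of opt-style pass names; the real
    position of the extra GVN in the NVPTX pipeline is not modeled here.
    """
    result = list(pipeline)  # copy so the caller's list is not mutated
    if opt_level == 3:  # -O3 only; -O0, -O1 and -O2 skip the extra GVN
        result.append("gvn")
    return result


for level in range(4):
    print(f"-O{level}:", add_extra_gvn(["loop-reduce"], level))
```

At -O2 and below the extra run is simply skipped, so those levels avoid its compile-time cost (about 2.6% in the report below).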

  ===-------------------------------------------------------------------------===
                        ... Pass execution timing report ...
  ===-------------------------------------------------------------------------===
    Total Execution Time: 8.3537 seconds (8.3467 wall clock)
  
     ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
     0.9868 ( 11.9%)   0.0161 ( 32.1%)   1.0029 ( 12.0%)   1.0048 ( 12.0%)  NVPTX DAG->DAG Pattern Instruction Selection
     0.9987 ( 12.0%)   0.0000 (  0.0%)   0.9987 ( 12.0%)   0.9995 ( 12.0%)  Straight line strength reduction
     0.4514 (  5.4%)   0.0000 (  0.0%)   0.4514 (  5.4%)   0.4490 (  5.4%)  Function Integration/Inlining
     0.4348 (  5.2%)   0.0000 (  0.0%)   0.4348 (  5.2%)   0.4354 (  5.2%)  Nary reassociation
     0.4033 (  4.9%)   0.0000 (  0.0%)   0.4033 (  4.8%)   0.4002 (  4.8%)  Global Value Numbering
     0.2823 (  3.4%)   0.0001 (  0.1%)   0.2824 (  3.4%)   0.2780 (  3.3%)  Combine redundant instructions
     0.2696 (  3.2%)   0.0002 (  0.3%)   0.2697 (  3.2%)   0.2647 (  3.2%)  Combine redundant instructions
     0.2423 (  2.9%)   0.0000 (  0.0%)   0.2423 (  2.9%)   0.2387 (  2.9%)  Combine redundant instructions
     0.2328 (  2.8%)   0.0000 (  0.1%)   0.2328 (  2.8%)   0.2291 (  2.7%)  Combine redundant instructions
     0.2232 (  2.7%)   0.0000 (  0.0%)   0.2232 (  2.7%)   0.2223 (  2.7%)  Global Value Numbering
     0.2123 (  2.6%)   0.0001 (  0.1%)   0.2124 (  2.5%)   0.2161 (  2.6%)  Global Value Numbering
     0.2000 (  2.4%)   0.0002 (  0.4%)   0.2001 (  2.4%)   0.1944 (  2.3%)  Loop Invariant Code Motion
     0.1929 (  2.3%)   0.0001 (  0.3%)   0.1931 (  2.3%)   0.1927 (  2.3%)  Combine redundant instructions
     0.1928 (  2.3%)   0.0000 (  0.0%)   0.1928 (  2.3%)   0.1924 (  2.3%)  Combine redundant instructions
     0.1907 (  2.3%)   0.0000 (  0.0%)   0.1907 (  2.3%)   0.1919 (  2.3%)  Value Propagation
     0.1759 (  2.1%)   0.0006 (  1.1%)   0.1764 (  2.1%)   0.1715 (  2.1%)  Induction Variable Simplification
     0.1735 (  2.1%)   0.0001 (  0.1%)   0.1736 (  2.1%)   0.1714 (  2.1%)  Loop Invariant Code Motion
     0.1671 (  2.0%)   0.0010 (  2.0%)   0.1682 (  2.0%)   0.1714 (  2.1%)  Combine redundant instructions
     0.1426 (  1.7%)   0.0000 (  0.0%)   0.1426 (  1.7%)   0.1415 (  1.7%)  Loop Invariant Code Motion
     0.1302 (  1.6%)   0.0001 (  0.2%)   0.1304 (  1.6%)   0.1302 (  1.6%)  Loop Strength Reduction
     0.1226 (  1.5%)   0.0000 (  0.0%)   0.1226 (  1.5%)   0.1287 (  1.5%)  Unroll loops
     0.1248 (  1.5%)   0.0002 (  0.4%)   0.1251 (  1.5%)   0.1269 (  1.5%)  SROA
     0.0994 (  1.2%)   0.0000 (  0.0%)   0.0994 (  1.2%)   0.0988 (  1.2%)  Value Propagation
     0.0991 (  1.2%)   0.0000 (  0.0%)   0.0991 (  1.2%)   0.0979 (  1.2%)  Combine redundant instructions
     0.0748 (  0.9%)   0.0039 (  7.8%)   0.0787 (  0.9%)   0.0752 (  0.9%)  Simple Register Coalescing
     0.0659 (  0.8%)   0.0000 (  0.0%)   0.0659 (  0.8%)   0.0664 (  0.8%)  Induction Variable Users
     0.0659 (  0.8%)   0.0038 (  7.6%)   0.0697 (  0.8%)   0.0633 (  0.8%)  Early CSE
     0.0639 (  0.8%)   0.0000 (  0.0%)   0.0639 (  0.8%)   0.0631 (  0.8%)  Sparse Conditional Constant Propagation
     0.0632 (  0.8%)   0.0000 (  0.1%)   0.0632 (  0.8%)   0.0625 (  0.7%)  Early CSE
     0.0579 (  0.7%)   0.0000 (  0.0%)   0.0579 (  0.7%)   0.0580 (  0.7%)  NVPTX Assembly Printer
     0.0565 (  0.7%)   0.0000 (  0.0%)   0.0565 (  0.7%)   0.0566 (  0.7%)  CodeGen Prepare
     0.0564 (  0.7%)   0.0000 (  0.0%)   0.0564 (  0.7%)   0.0553 (  0.7%)  Live Interval Analysis
     0.0574 (  0.7%)   0.0001 (  0.2%)   0.0575 (  0.7%)   0.0530 (  0.6%)  Early CSE
     0.0506 (  0.6%)   0.0000 (  0.1%)   0.0506 (  0.6%)   0.0468 (  0.6%)  Dead Store Elimination
     0.0437 (  0.5%)   0.0000 (  0.0%)   0.0437 (  0.5%)   0.0436 (  0.5%)  Dead Code Elimination
     0.0438 (  0.5%)   0.0002 (  0.3%)   0.0439 (  0.5%)   0.0424 (  0.5%)  Bit-Tracking Dead Code Elimination
     0.0428 (  0.5%)   0.0000 (  0.0%)   0.0428 (  0.5%)   0.0419 (  0.5%)  Machine Loop Invariant Code Motion
     0.0415 (  0.5%)   0.0000 (  0.0%)   0.0415 (  0.5%)   0.0405 (  0.5%)  Module Verifier
     0.0413 (  0.5%)   0.0000 (  0.0%)   0.0413 (  0.5%)   0.0401 (  0.5%)  SROA
     0.0367 (  0.4%)   0.0000 (  0.0%)   0.0367 (  0.4%)   0.0373 (  0.4%)  Machine Common Subexpression Elimination
     0.0352 (  0.4%)   0.0000 (  0.0%)   0.0352 (  0.4%)   0.0351 (  0.4%)  Interprocedural Sparse Conditional Constant Propagation
     0.0339 (  0.4%)   0.0000 (  0.0%)   0.0339 (  0.4%)   0.0342 (  0.4%)  Module Verifier
     0.0349 (  0.4%)   0.0001 (  0.2%)   0.0350 (  0.4%)   0.0318 (  0.4%)  Simplify the CFG
     0.0304 (  0.4%)   0.0000 (  0.0%)   0.0304 (  0.4%)   0.0299 (  0.4%)  Live Variable Analysis
     0.0278 (  0.3%)   0.0000 (  0.0%)   0.0278 (  0.3%)   0.0274 (  0.3%)  Reassociate expressions
     0.0263 (  0.3%)   0.0039 (  7.7%)   0.0301 (  0.4%)   0.0260 (  0.3%)  Aggressive Dead Code Elimination
     0.0215 (  0.3%)   0.0000 (  0.0%)   0.0215 (  0.3%)   0.0228 (  0.3%)  Jump Threading
     0.0211 (  0.3%)   0.0000 (  0.0%)   0.0211 (  0.3%)   0.0205 (  0.2%)  Split GEPs to a variadic base and a constant offset for better CSE
     0.0193 (  0.2%)   0.0000 (  0.0%)   0.0193 (  0.2%)   0.0200 (  0.2%)  Simplify the CFG
     0.0131 (  0.2%)   0.0038 (  7.6%)   0.0169 (  0.2%)   0.0175 (  0.2%)  Unnamed pass: implement Pass::getPassName()
     0.0169 (  0.2%)   0.0000 (  0.0%)   0.0170 (  0.2%)   0.0174 (  0.2%)  Rotate Loops
     0.0166 (  0.2%)   0.0000 (  0.0%)   0.0166 (  0.2%)   0.0170 (  0.2%)  convert address space of alloca'ed memory to local
     0.0148 (  0.2%)   0.0039 (  7.8%)   0.0188 (  0.2%)   0.0168 (  0.2%)  Lower aggregate copies/intrinsics into loops
     0.0168 (  0.2%)   0.0000 (  0.0%)   0.0168 (  0.2%)   0.0164 (  0.2%)  Machine code sinking
     0.0171 (  0.2%)   0.0000 (  0.0%)   0.0171 (  0.2%)   0.0160 (  0.2%)  Unroll loops
     0.0156 (  0.2%)   0.0000 (  0.0%)   0.0156 (  0.2%)   0.0160 (  0.2%)  Simplify the CFG
     0.0107 (  0.1%)   0.0000 (  0.0%)   0.0107 (  0.1%)   0.0154 (  0.2%)  Recognize loop idioms
     0.0141 (  0.2%)   0.0000 (  0.0%)   0.0141 (  0.2%)   0.0141 (  0.2%)  Eliminate PHI nodes for register allocation
     0.0127 (  0.2%)   0.0000 (  0.1%)   0.0128 (  0.2%)   0.0127 (  0.2%)  Unnamed pass: implement Pass::getPassName()
     0.0099 (  0.1%)   0.0000 (  0.0%)   0.0099 (  0.1%)   0.0111 (  0.1%)  Jump Threading
     0.0111 (  0.1%)   0.0000 (  0.0%)   0.0111 (  0.1%)   0.0111 (  0.1%)  Remove unused exception handling info
     0.0105 (  0.1%)   0.0000 (  0.0%)   0.0106 (  0.1%)   0.0107 (  0.1%)  Simplify the CFG
     0.0101 (  0.1%)   0.0000 (  0.0%)   0.0101 (  0.1%)   0.0102 (  0.1%)  Simplify the CFG
     0.0095 (  0.1%)   0.0000 (  0.0%)   0.0095 (  0.1%)   0.0098 (  0.1%)  Float to int
     0.0066 (  0.1%)   0.0000 (  0.0%)   0.0066 (  0.1%)   0.0095 (  0.1%)  Tail Call Elimination
     0.0094 (  0.1%)   0.0000 (  0.0%)   0.0094 (  0.1%)   0.0094 (  0.1%)  Dead Global Elimination
     0.0051 (  0.1%)   0.0001 (  0.2%)   0.0052 (  0.1%)   0.0088 (  0.1%)  Simplify the CFG
     0.0060 (  0.1%)   0.0000 (  0.0%)   0.0060 (  0.1%)   0.0084 (  0.1%)  Loop-Closed SSA Form Pass
     0.0079 (  0.1%)   0.0000 (  0.0%)   0.0079 (  0.1%)   0.0082 (  0.1%)  Promote 'by reference' arguments to scalars
     0.0074 (  0.1%)   0.0000 (  0.0%)   0.0074 (  0.1%)   0.0074 (  0.1%)  Two-Address instruction pass
     0.0068 (  0.1%)   0.0000 (  0.0%)   0.0068 (  0.1%)   0.0073 (  0.1%)  Unswitch loops
     0.0033 (  0.0%)   0.0001 (  0.1%)   0.0034 (  0.0%)   0.0072 (  0.1%)  Dominator Tree Construction
     0.0031 (  0.0%)   0.0000 (  0.0%)   0.0031 (  0.0%)   0.0068 (  0.1%)  Deduce function attributes
     0.0048 (  0.1%)   0.0000 (  0.0%)   0.0048 (  0.1%)   0.0063 (  0.1%)  Lazy Value Information Analysis
     0.0040 (  0.0%)   0.0000 (  0.0%)   0.0040 (  0.0%)   0.0057 (  0.1%)  MemCpy Optimization
     0.0062 (  0.1%)   0.0000 (  0.0%)   0.0062 (  0.1%)   0.0055 (  0.1%)  Remove unnecessary non-generic-to-generic addrspacecasts
     0.0049 (  0.1%)   0.0000 (  0.0%)   0.0049 (  0.1%)   0.0052 (  0.1%)  SROA
     0.0054 (  0.1%)   0.0000 (  0.0%)   0.0054 (  0.1%)   0.0052 (  0.1%)  Peephole Optimizations
     0.0049 (  0.1%)   0.0038 (  7.6%)   0.0088 (  0.1%)   0.0050 (  0.1%)  Loop-Closed SSA Form Pass
     0.0042 (  0.1%)   0.0000 (  0.0%)   0.0042 (  0.1%)   0.0050 (  0.1%)  Slot index numbering
     0.0046 (  0.1%)   0.0000 (  0.0%)   0.0046 (  0.1%)   0.0048 (  0.1%)  CallGraph Construction
     0.0043 (  0.1%)   0.0000 (  0.0%)   0.0043 (  0.1%)   0.0047 (  0.1%)  Slot index numbering
     0.0025 (  0.0%)   0.0000 (  0.0%)   0.0025 (  0.0%)   0.0046 (  0.1%)  Dominator Tree Construction
     0.0050 (  0.1%)   0.0000 (  0.0%)   0.0050 (  0.1%)   0.0045 (  0.1%)  Dead Argument Elimination
     0.0049 (  0.1%)   0.0000 (  0.0%)   0.0049 (  0.1%)   0.0042 (  0.1%)  Remove dead machine instructions
     0.0020 (  0.0%)   0.0000 (  0.0%)   0.0020 (  0.0%)   0.0041 (  0.0%)  Natural Loop Information
     0.0043 (  0.1%)   0.0000 (  0.0%)   0.0043 (  0.1%)   0.0040 (  0.0%)  Dominator Tree Construction
     0.0042 (  0.1%)   0.0000 (  0.0%)   0.0042 (  0.0%)   0.0040 (  0.0%)  Loop-Closed SSA Form Pass
     0.0039 (  0.0%)   0.0000 (  0.0%)   0.0039 (  0.0%)   0.0038 (  0.0%)  Branch Probability Analysis
     0.0030 (  0.0%)   0.0000 (  0.0%)   0.0030 (  0.0%)   0.0038 (  0.0%)  Dominator Tree Construction
     0.0037 (  0.0%)   0.0000 (  0.0%)   0.0037 (  0.0%)   0.0038 (  0.0%)  Dominator Tree Construction
     0.0024 (  0.0%)   0.0000 (  0.0%)   0.0025 (  0.0%)   0.0037 (  0.0%)  Dominator Tree Construction
     0.0029 (  0.0%)   0.0000 (  0.0%)   0.0029 (  0.0%)   0.0037 (  0.0%)  Lazy Value Information Analysis
     0.0037 (  0.0%)   0.0000 (  0.0%)   0.0037 (  0.0%)   0.0036 (  0.0%)  Branch Probability Analysis
     0.0034 (  0.0%)   0.0000 (  0.0%)   0.0034 (  0.0%)   0.0035 (  0.0%)  Branch Probability Basic Block Placement
     0.0016 (  0.0%)   0.0000 (  0.0%)   0.0016 (  0.0%)   0.0034 (  0.0%)  Dominator Tree Construction
     0.0030 (  0.0%)   0.0000 (  0.0%)   0.0030 (  0.0%)   0.0033 (  0.0%)  Dominator Tree Construction
     0.0031 (  0.0%)   0.0000 (  0.0%)   0.0031 (  0.0%)   0.0033 (  0.0%)  Constant Hoisting
     0.0034 (  0.0%)   0.0000 (  0.0%)   0.0034 (  0.0%)   0.0032 (  0.0%)  Dominator Tree Construction
     0.0021 (  0.0%)   0.0035 (  6.9%)   0.0055 (  0.1%)   0.0032 (  0.0%)  Dominator Tree Construction
     0.0038 (  0.0%)   0.0000 (  0.0%)   0.0038 (  0.0%)   0.0032 (  0.0%)  Loop-Closed SSA Form Pass
     0.0034 (  0.0%)   0.0000 (  0.0%)   0.0034 (  0.0%)   0.0029 (  0.0%)  Loop-Closed SSA Form Pass
     0.0016 (  0.0%)   0.0000 (  0.0%)   0.0016 (  0.0%)   0.0029 (  0.0%)  Natural Loop Information
     0.0029 (  0.0%)   0.0000 (  0.0%)   0.0029 (  0.0%)   0.0028 (  0.0%)  Loop-Closed SSA Form Pass
     0.0026 (  0.0%)   0.0000 (  0.0%)   0.0026 (  0.0%)   0.0027 (  0.0%)  Partially inline calls to library functions
     0.0020 (  0.0%)   0.0000 (  0.0%)   0.0020 (  0.0%)   0.0026 (  0.0%)  Dominator Tree Construction
     0.0017 (  0.0%)   0.0000 (  0.0%)   0.0017 (  0.0%)   0.0026 (  0.0%)  Machine Function Analysis
     0.0020 (  0.0%)   0.0000 (  0.0%)   0.0020 (  0.0%)   0.0025 (  0.0%)  NVPTX specific alloca hoisting
     0.0023 (  0.0%)   0.0000 (  0.0%)   0.0023 (  0.0%)   0.0024 (  0.0%)  Dominator Tree Construction
     0.0030 (  0.0%)   0.0000 (  0.0%)   0.0030 (  0.0%)   0.0023 (  0.0%)  Dominator Tree Construction
     0.0026 (  0.0%)   0.0000 (  0.0%)   0.0026 (  0.0%)   0.0022 (  0.0%)  Post-RA pseudo instruction expansion pass
     0.0012 (  0.0%)   0.0000 (  0.0%)   0.0012 (  0.0%)   0.0022 (  0.0%)  Canonicalize natural loops
     0.0020 (  0.0%)   0.0000 (  0.0%)   0.0020 (  0.0%)   0.0022 (  0.0%)  Dominator Tree Construction
     0.0012 (  0.0%)   0.0000 (  0.0%)   0.0012 (  0.0%)   0.0022 (  0.0%)  Dominator Tree Construction
     0.0014 (  0.0%)   0.0000 (  0.0%)   0.0014 (  0.0%)   0.0021 (  0.0%)  Dominator Tree Construction
     0.0026 (  0.0%)   0.0000 (  0.0%)   0.0026 (  0.0%)   0.0021 (  0.0%)  Canonicalize natural loops
     0.0011 (  0.0%)   0.0000 (  0.0%)   0.0011 (  0.0%)   0.0021 (  0.0%)  Delete dead loops
     0.0019 (  0.0%)   0.0000 (  0.0%)   0.0019 (  0.0%)   0.0020 (  0.0%)  MachineDominator Tree Construction
     0.0017 (  0.0%)   0.0000 (  0.0%)   0.0017 (  0.0%)   0.0020 (  0.0%)  MachineDominator Tree Construction
     0.0023 (  0.0%)   0.0000 (  0.0%)   0.0023 (  0.0%)   0.0020 (  0.0%)  MachineDominator Tree Construction
     0.0016 (  0.0%)   0.0000 (  0.0%)   0.0016 (  0.0%)   0.0020 (  0.0%)  Dominator Tree Construction
     0.0012 (  0.0%)   0.0000 (  0.0%)   0.0012 (  0.0%)   0.0020 (  0.0%)  Dominator Tree Construction
     0.0003 (  0.0%)   0.0000 (  0.1%)   0.0003 (  0.0%)   0.0020 (  0.0%)  Lower 'expect' Intrinsics
     0.0016 (  0.0%)   0.0000 (  0.0%)   0.0016 (  0.0%)   0.0019 (  0.0%)  Block Frequency Analysis
     0.0017 (  0.0%)   0.0000 (  0.0%)   0.0017 (  0.0%)   0.0019 (  0.0%)  MachinePostDominator Tree Construction
     0.0014 (  0.0%)   0.0000 (  0.0%)   0.0014 (  0.0%)   0.0017 (  0.0%)  MachineDominator Tree Construction
     0.0011 (  0.0%)   0.0000 (  0.0%)   0.0011 (  0.0%)   0.0017 (  0.0%)  Natural Loop Information
     0.0015 (  0.0%)   0.0000 (  0.0%)   0.0015 (  0.0%)   0.0017 (  0.0%)  Machine Block Frequency Analysis
     0.0014 (  0.0%)   0.0000 (  0.0%)   0.0014 (  0.0%)   0.0017 (  0.0%)  Machine Block Frequency Analysis
     0.0011 (  0.0%)   0.0039 (  7.7%)   0.0050 (  0.1%)   0.0017 (  0.0%)  Scalar Evolution Analysis
     0.0009 (  0.0%)   0.0000 (  0.0%)   0.0009 (  0.0%)   0.0016 (  0.0%)  Natural Loop Information
     0.0017 (  0.0%)   0.0000 (  0.0%)   0.0017 (  0.0%)   0.0015 (  0.0%)  Machine Block Frequency Analysis
     0.0015 (  0.0%)   0.0000 (  0.0%)   0.0015 (  0.0%)   0.0015 (  0.0%)  Natural Loop Information
     0.0009 (  0.0%)   0.0000 (  0.0%)   0.0009 (  0.0%)   0.0014 (  0.0%)  Natural Loop Information
     0.0015 (  0.0%)   0.0000 (  0.0%)   0.0015 (  0.0%)   0.0014 (  0.0%)  Natural Loop Information
     0.0007 (  0.0%)   0.0000 (  0.0%)   0.0007 (  0.0%)   0.0014 (  0.0%)  Scalar Evolution Analysis
     0.0004 (  0.0%)   0.0000 (  0.0%)   0.0004 (  0.0%)   0.0014 (  0.0%)  Scalar Evolution Analysis
     0.0017 (  0.0%)   0.0000 (  0.0%)   0.0017 (  0.0%)   0.0014 (  0.0%)  Unnamed pass: implement Pass::getPassName()
     0.0004 (  0.0%)   0.0000 (  0.0%)   0.0004 (  0.0%)   0.0013 (  0.0%)  Canonicalize natural loops
     0.0011 (  0.0%)   0.0000 (  0.0%)   0.0011 (  0.0%)   0.0012 (  0.0%)  Global Variable Optimizer
     0.0011 (  0.0%)   0.0000 (  0.0%)   0.0011 (  0.0%)   0.0012 (  0.0%)  Canonicalize natural loops
     0.0012 (  0.0%)   0.0000 (  0.0%)   0.0012 (  0.0%)   0.0012 (  0.0%)  Merge disjoint stack slots
     0.0011 (  0.0%)   0.0000 (  0.0%)   0.0011 (  0.0%)   0.0012 (  0.0%)  Machine Natural Loop Construction
     0.0010 (  0.0%)   0.0000 (  0.0%)   0.0010 (  0.0%)   0.0011 (  0.0%)  Machine Natural Loop Construction
     0.0007 (  0.0%)   0.0000 (  0.0%)   0.0007 (  0.0%)   0.0011 (  0.0%)  Speculatively execute instructions
     0.0007 (  0.0%)   0.0000 (  0.0%)   0.0007 (  0.0%)   0.0011 (  0.0%)  Process Implicit Definitions
     0.0008 (  0.0%)   0.0000 (  0.0%)   0.0008 (  0.0%)   0.0011 (  0.0%)  Expand ISel Pseudo-instructions
     0.0008 (  0.0%)   0.0000 (  0.0%)   0.0008 (  0.0%)   0.0011 (  0.0%)  Machine Natural Loop Construction
     0.0007 (  0.0%)   0.0000 (  0.0%)   0.0007 (  0.0%)   0.0011 (  0.0%)  NVPTX optimize redundant cvta.to.local instruction
     0.0006 (  0.0%)   0.0000 (  0.0%)   0.0006 (  0.0%)   0.0011 (  0.0%)  Canonicalize natural loops
     0.0004 (  0.0%)   0.0000 (  0.0%)   0.0004 (  0.0%)   0.0010 (  0.0%)  Scalar Evolution Analysis
     0.0005 (  0.0%)   0.0000 (  0.0%)   0.0005 (  0.0%)   0.0008 (  0.0%)  MergedLoadStoreMotion
     0.0002 (  0.0%)   0.0000 (  0.0%)   0.0002 (  0.0%)   0.0008 (  0.0%)  Lower pointer arguments of CUDA kernels
     0.0008 (  0.0%)   0.0000 (  0.0%)   0.0008 (  0.0%)   0.0008 (  0.0%)  Remove unreachable blocks from the CFG
     0.0006 (  0.0%)   0.0000 (  0.0%)   0.0006 (  0.0%)   0.0007 (  0.0%)  Replace occurrences of __nvvm_reflect() calls with 0/1
     0.0004 (  0.0%)   0.0000 (  0.0%)   0.0004 (  0.0%)   0.0007 (  0.0%)  Memory Dependence Analysis
     0.0006 (  0.0%)   0.0000 (  0.0%)   0.0006 (  0.0%)   0.0007 (  0.0%)  Remove unreachable machine basic blocks
     0.0009 (  0.0%)   0.0000 (  0.0%)   0.0009 (  0.0%)   0.0006 (  0.0%)  Optimize machine instruction PHIs
     0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0006 (  0.0%)  Canonicalize natural loops
     0.0004 (  0.0%)   0.0000 (  0.0%)   0.0004 (  0.0%)   0.0005 (  0.0%)  Memory Dependence Analysis
     0.0001 (  0.0%)   0.0000 (  0.0%)   0.0001 (  0.0%)   0.0005 (  0.0%)  Inline Cost Analysis
     0.0002 (  0.0%)   0.0000 (  0.0%)   0.0002 (  0.0%)   0.0005 (  0.0%)  Speculatively execute instructions
     0.0003 (  0.0%)   0.0000 (  0.0%)   0.0003 (  0.0%)   0.0005 (  0.0%)  Remove unreachable blocks from the CFG
     0.0001 (  0.0%)   0.0000 (  0.0%)   0.0001 (  0.0%)   0.0005 (  0.0%)  Canonicalize natural loops
     0.0002 (  0.0%)   0.0000 (  0.0%)   0.0002 (  0.0%)   0.0005 (  0.0%)  Rotate Loops
     0.0003 (  0.0%)   0.0000 (  0.0%)   0.0003 (  0.0%)   0.0005 (  0.0%)  Memory Dependence Analysis
     0.0003 (  0.0%)   0.0000 (  0.0%)   0.0003 (  0.0%)   0.0004 (  0.0%)  Lower invoke and unwind, for unwindless code generators
     0.0003 (  0.0%)   0.0000 (  0.0%)   0.0003 (  0.0%)   0.0004 (  0.0%)  Tail Duplication
     0.0001 (  0.0%)   0.0000 (  0.0%)   0.0001 (  0.0%)   0.0004 (  0.0%)  Memory Dependence Analysis
     0.0002 (  0.0%)   0.0000 (  0.0%)   0.0002 (  0.0%)   0.0003 (  0.0%)  SROA
     0.0001 (  0.0%)   0.0000 (  0.0%)   0.0001 (  0.0%)   0.0003 (  0.0%)  Memory Dependence Analysis
     0.0004 (  0.0%)   0.0000 (  0.0%)   0.0004 (  0.0%)   0.0002 (  0.0%)  Internalize Global Symbols
     0.0001 (  0.0%)   0.0000 (  0.0%)   0.0001 (  0.0%)   0.0002 (  0.0%)  Scalar Evolution Analysis
     0.0001 (  0.0%)   0.0000 (  0.0%)   0.0001 (  0.0%)   0.0001 (  0.0%)  Insert stack protectors
     0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0001 (  0.0%)  SLP Vectorizer
     0.0002 (  0.0%)   0.0000 (  0.0%)   0.0002 (  0.0%)   0.0001 (  0.0%)  Loop Vectorization
     0.0001 (  0.0%)   0.0000 (  0.0%)   0.0001 (  0.0%)   0.0001 (  0.0%)  Post RA top-down list latency scheduler
     0.0001 (  0.0%)   0.0000 (  0.0%)   0.0001 (  0.0%)   0.0001 (  0.0%)  Strip Unused Function Prototypes
     0.0001 (  0.0%)   0.0000 (  0.0%)   0.0001 (  0.0%)   0.0001 (  0.0%)  Loop Access Analysis
     0.0001 (  0.0%)   0.0000 (  0.0%)   0.0001 (  0.0%)   0.0001 (  0.0%)  StackMap Liveness Analysis
     0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0001 (  0.0%)  Scalar Evolution Analysis
     0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0001 (  0.0%)  Safe Stack instrumentation pass
     0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0001 (  0.0%)  Scalar Evolution Analysis
     0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0001 (  0.0%)  Analyze Machine Code For Garbage Collection
     0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0001 (  0.0%)  Machine Instruction Scheduler
     0.0001 (  0.0%)   0.0000 (  0.0%)   0.0001 (  0.0%)   0.0001 (  0.0%)  Lower Garbage Collection Instructions
     0.0001 (  0.0%)   0.0000 (  0.0%)   0.0001 (  0.0%)   0.0001 (  0.0%)  Scalar Evolution Analysis
     0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0001 (  0.0%)  Alignment from assumptions
     0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0001 (  0.0%)  Live Stack Slot Analysis
     0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0001 (  0.0%)  Shadow Stack GC Lowering
     0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0001 (  0.0%)  Stack Slot Coloring
     0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Local Stack Slot Allocation
     0.0001 (  0.0%)   0.0000 (  0.0%)   0.0001 (  0.0%)   0.0000 (  0.0%)  Assign valid PTX names to globals
     0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Merge Duplicate Global Constants
     0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Basic Alias Analysis (stateless AA impl)
     0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Target Transform Information
     0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Assumption Cache Tracker
     0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Assumption Cache Tracker
     0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Ensure that the global variables are in the global address space
     0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Unnamed pass: implement Pass::getPassName()
     0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Rewrite Symbols
     0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Target Library Information
     0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  No Alias Analysis (always returns 'may' alias)
     0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Create Garbage Collector Module Metadata
     0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Basic Alias Analysis (stateless AA impl)
     0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  A No-Op Barrier Pass
     0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Target Transform Information
     0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Target Pass Configuration
     0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Target Library Information
     0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Machine Module Information
     0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Machine Branch Probability Analysis
     8.3034 (100.0%)   0.0502 (100.0%)   8.3537 (100.0%)   8.3467 (100.0%)  Total
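
The GVN percentages quoted above can be cross-checked by summing the wall-clock column per pass name, which also folds together passes that run several times. The following is a minimal Python sketch, assuming the layout shown in this report (four "seconds (percent)" columns followed by the pass name, presumably as printed with -time-passes); the helper name wall_time_by_pass is hypothetical, and the excerpt uses only the three GVN rows and the Total row copied from the report.

```python
import re
from collections import defaultdict
from typing import Dict, Tuple

# One "seconds (pct%)" column, e.g. "0.4002 (  4.8%)"; captures the seconds.
_COL = r"([\d.]+)\s*\(\s*[\d.]+%\)"
# User, System, User+System and Wall columns, then the pass name.
_ROW = re.compile(r"^\s*" + r"\s+".join([_COL] * 4) + r"\s+(.+?)\s*$")


def wall_time_by_pass(report: str) -> Tuple[Dict[str, float], float]:
    """Sum wall-clock seconds per pass name; also return the report's Total."""
    totals: Dict[str, float] = defaultdict(float)
    grand_total = 0.0
    for line in report.splitlines():
        m = _ROW.match(line)
        if not m:
            continue  # banner, header, separator or blank line
        wall, name = float(m.group(4)), m.group(5)
        if name == "Total":
            grand_total = wall
        else:
            totals[name] += wall
    return dict(totals), grand_total


excerpt = """\
     0.4033 (  4.9%)   0.0000 (  0.0%)   0.4033 (  4.8%)   0.4002 (  4.8%)  Global Value Numbering
     0.2232 (  2.7%)   0.0000 (  0.0%)   0.2232 (  2.7%)   0.2223 (  2.7%)  Global Value Numbering
     0.2123 (  2.6%)   0.0001 (  0.1%)   0.2124 (  2.5%)   0.2161 (  2.6%)  Global Value Numbering
     8.3034 (100.0%)   0.0502 (100.0%)   8.3537 (100.0%)   8.3467 (100.0%)  Total
"""

totals, total = wall_time_by_pass(excerpt)
gvn = totals["Global Value Numbering"]
print(f"GVN: {gvn:.4f} s, {100 * gvn / total:.1f}% of wall time")  # GVN: 0.8386 s, 10.0% of wall time
```

Taken together, the three GVN runs account for roughly 10% of the wall-clock time, of which the extra run (0.2161 s) is about 2.6%.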


http://reviews.llvm.org/D11304

More information about the llvm-commits mailing list