[PATCH] D42717: [JumpThreading] sync DT for LVI analysis (PR 36133)

Brian Rzycki via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Mon Feb 12 16:34:13 PST 2018


brzycki added a comment.

I ran `clang -O3 -fno-exceptions -ftime-report -c tramp3d-v4.cpp -o /tmp/foo.o` with an upstream (non-patched) compiler. Below are the hottest passes according to the time reports. The rows appear in the same order as in the reports:

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
  37.7480 (  9.8%)   0.1080 (  3.9%)  37.8560 (  9.7%)  37.8099 (  9.7%)  Global Value Numbering
  32.5400 (  8.4%)   0.3320 ( 11.9%)  32.8720 (  8.4%)  32.8960 (  8.5%)  Function Integration/Inlining
  26.7520 (  6.9%)   0.2280 (  8.2%)  26.9800 (  6.9%)  26.9368 (  6.9%)  X86 DAG->DAG Instruction Selection
  25.8080 (  6.7%)   0.0960 (  3.4%)  25.9040 (  6.7%)  25.8395 (  6.6%)  Combine redundant instructions
  16.7080 (  4.3%)   0.1160 (  4.2%)  16.8240 (  4.3%)  17.0581 (  4.4%)  Combine redundant instructions
  14.7120 (  3.8%)   0.1000 (  3.6%)  14.8120 (  3.8%)  14.8012 (  3.8%)  Combine redundant instructions
  13.6320 (  3.5%)   0.1120 (  4.0%)  13.7440 (  3.5%)  13.6522 (  3.5%)  Combine redundant instructions
  11.3840 (  2.9%)   0.0120 (  0.4%)  11.3960 (  2.9%)  11.2862 (  2.9%)  Loop Strength Reduction
  10.6120 (  2.7%)   0.0080 (  0.3%)  10.6200 (  2.7%)  10.6267 (  2.7%)  Combine redundant instructions
  10.0360 (  2.6%)   0.0320 (  1.1%)  10.0680 (  2.6%)  10.1269 (  2.6%)  Combine redundant instructions
   9.7280 (  2.5%)   0.0160 (  0.6%)   9.7440 (  2.5%)   9.7972 (  2.5%)  Combine redundant instructions
   8.1040 (  2.1%)   0.0240 (  0.9%)   8.1280 (  2.1%)   8.0646 (  2.1%)  Dead Store Elimination
   7.6040 (  2.0%)   0.0480 (  1.7%)   7.6520 (  2.0%)   7.6756 (  2.0%)  Value Propagation
   7.5520 (  2.0%)   0.0200 (  0.7%)   7.5720 (  1.9%)   7.5224 (  1.9%)  SLP Vectorizer
   6.6240 (  1.7%)   0.0800 (  2.9%)   6.7040 (  1.7%)   6.7257 (  1.7%)  Induction Variable Simplification
   6.6320 (  1.7%)   0.0320 (  1.1%)   6.6640 (  1.7%)   6.7028 (  1.7%)  Loop Invariant Code Motion
   6.5040 (  1.7%)   0.1000 (  3.6%)   6.6040 (  1.7%)   6.5648 (  1.7%)  Combine redundant instructions
   6.0480 (  1.6%)   0.0440 (  1.6%)   6.0920 (  1.6%)   6.2509 (  1.6%)  Memory SSA
   5.9080 (  1.5%)   0.0560 (  2.0%)   5.9640 (  1.5%)   6.0655 (  1.6%)  Value Propagation
   5.8120 (  1.5%)   0.0160 (  0.6%)   5.8280 (  1.5%)   5.9670 (  1.5%)  Early CSE w/ MemorySSA
   5.0480 (  1.3%)   0.0040 (  0.1%)   5.0520 (  1.3%)   5.0647 (  1.3%)  Loop Invariant Code Motion
   4.3280 (  1.1%)   0.0120 (  0.4%)   4.3400 (  1.1%)   4.3600 (  1.1%)  Loop Invariant Code Motion
   4.0080 (  1.0%)   0.0280 (  1.0%)   4.0360 (  1.0%)   4.0010 (  1.0%)  Greedy Register Allocator
   3.7480 (  1.0%)   0.0400 (  1.4%)   3.7880 (  1.0%)   3.7836 (  1.0%)  CodeGen Prepare
   3.2240 (  0.8%)   0.0080 (  0.3%)   3.2320 (  0.8%)   3.3606 (  0.9%)  Machine Instruction Scheduler
   3.2640 (  0.8%)   0.0280 (  1.0%)   3.2920 (  0.8%)   3.3383 (  0.9%)  MemCpy Optimization
   3.1040 (  0.8%)   0.0160 (  0.6%)   3.1200 (  0.8%)   3.1925 (  0.8%)  Induction Variable Users
   2.5760 (  0.7%)   0.0160 (  0.6%)   2.5920 (  0.7%)   2.6184 (  0.7%)  Unroll loops
   2.5720 (  0.7%)   0.0320 (  1.1%)   2.6040 (  0.7%)   2.5598 (  0.7%)  SROA
   2.5320 (  0.7%)   0.0200 (  0.7%)   2.5520 (  0.7%)   2.3682 (  0.6%)  Jump Threading
   2.3400 (  0.6%)   0.0040 (  0.1%)   2.3440 (  0.6%)   2.3639 (  0.6%)  Jump Threading
   2.2880 (  0.6%)   0.0280 (  1.0%)   2.3160 (  0.6%)   2.2819 (  0.6%)  Reassociate expressions
   2.1440 (  0.6%)   0.0120 (  0.4%)   2.1560 (  0.6%)   2.1243 (  0.5%)  Loop Load Elimination
   1.9400 (  0.5%)   0.0080 (  0.3%)   1.9480 (  0.5%)   2.0576 (  0.5%)  SROA
   1.9640 (  0.5%)   0.0160 (  0.6%)   1.9800 (  0.5%)   1.9998 (  0.5%)  Loop Vectorization
   1.5960 (  0.4%)   0.0280 (  1.0%)   1.6240 (  0.4%)   1.6763 (  0.4%)  Simplify the CFG
   1.5000 (  0.4%)   0.0160 (  0.6%)   1.5160 (  0.4%)   1.6653 (  0.4%)  Simplify the CFG
   1.6280 (  0.4%)   0.0160 (  0.6%)   1.6440 (  0.4%)   1.5298 (  0.4%)  Remove redundant instructions
   1.5160 (  0.4%)   0.0160 (  0.6%)   1.5320 (  0.4%)   1.4801 (  0.4%)  Module Verifier
   1.4280 (  0.4%)   0.0240 (  0.9%)   1.4520 (  0.4%)   1.4439 (  0.4%)  Module Verifier
   1.3960 (  0.4%)   0.0000 (  0.0%)   1.3960 (  0.4%)   1.4080 (  0.4%)  Module Verifier
   1.4440 (  0.4%)   0.0080 (  0.3%)   1.4520 (  0.4%)   1.3567 (  0.3%)  Simplify the CFG
   1.2880 (  0.3%)   0.0160 (  0.6%)   1.3040 (  0.3%)   1.3025 (  0.3%)  Simplify the CFG
   1.2040 (  0.3%)   0.0200 (  0.7%)   1.2240 (  0.3%)   1.2963 (  0.3%)  Early CSE
   1.0320 (  0.3%)   0.0120 (  0.4%)   1.0440 (  0.3%)   1.1418 (  0.3%)  Sparse Conditional Constant Propagation

and

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
  38.5520 (  9.9%)   0.1040 (  3.9%)  38.6560 (  9.9%)  38.7477 (  9.9%)  Global Value Numbering
  32.6440 (  8.4%)   0.2720 ( 10.2%)  32.9160 (  8.4%)  32.8803 (  8.4%)  Function Integration/Inlining
  26.3040 (  6.8%)   0.2440 (  9.1%)  26.5480 (  6.8%)  26.6897 (  6.8%)  X86 DAG->DAG Instruction Selection
  26.3120 (  6.8%)   0.0760 (  2.8%)  26.3880 (  6.7%)  26.2355 (  6.7%)  Combine redundant instructions
  17.1440 (  4.4%)   0.0800 (  3.0%)  17.2240 (  4.4%)  17.3240 (  4.4%)  Combine redundant instructions
  14.9080 (  3.8%)   0.0480 (  1.8%)  14.9560 (  3.8%)  15.0418 (  3.8%)  Combine redundant instructions
  13.6120 (  3.5%)   0.0560 (  2.1%)  13.6680 (  3.5%)  13.8665 (  3.5%)  Combine redundant instructions
  11.0240 (  2.8%)   0.0400 (  1.5%)  11.0640 (  2.8%)  11.0079 (  2.8%)  Loop Strength Reduction
  10.7280 (  2.8%)   0.0320 (  1.2%)  10.7600 (  2.7%)  10.7927 (  2.7%)  Combine redundant instructions
  10.2160 (  2.6%)   0.0400 (  1.5%)  10.2560 (  2.6%)  10.2907 (  2.6%)  Combine redundant instructions
   9.9440 (  2.6%)   0.0200 (  0.7%)   9.9640 (  2.5%)   9.9617 (  2.5%)  Combine redundant instructions
   8.3960 (  2.2%)   0.0320 (  1.2%)   8.4280 (  2.1%)   8.2761 (  2.1%)  Dead Store Elimination
   7.7520 (  2.0%)   0.0440 (  1.6%)   7.7960 (  2.0%)   7.8317 (  2.0%)  Value Propagation
   7.5040 (  1.9%)   0.0360 (  1.3%)   7.5400 (  1.9%)   7.4545 (  1.9%)  SLP Vectorizer
   6.8080 (  1.7%)   0.0240 (  0.9%)   6.8320 (  1.7%)   6.8237 (  1.7%)  Loop Invariant Code Motion
   6.7880 (  1.7%)   0.1160 (  4.3%)   6.9040 (  1.8%)   6.7762 (  1.7%)  Combine redundant instructions
   6.5240 (  1.7%)   0.0880 (  3.3%)   6.6120 (  1.7%)   6.6196 (  1.7%)  Induction Variable Simplification
   6.3320 (  1.6%)   0.0360 (  1.3%)   6.3680 (  1.6%)   6.4497 (  1.6%)  Memory SSA
   6.2720 (  1.6%)   0.0080 (  0.3%)   6.2800 (  1.6%)   6.2596 (  1.6%)  Value Propagation
   5.9040 (  1.5%)   0.0480 (  1.8%)   5.9520 (  1.5%)   6.0199 (  1.5%)  Early CSE w/ MemorySSA
   5.1880 (  1.3%)   0.0080 (  0.3%)   5.1960 (  1.3%)   5.1699 (  1.3%)  Loop Invariant Code Motion
   4.4320 (  1.1%)   0.0040 (  0.1%)   4.4360 (  1.1%)   4.4453 (  1.1%)  Loop Invariant Code Motion
   4.0040 (  1.0%)   0.0200 (  0.7%)   4.0240 (  1.0%)   4.0277 (  1.0%)  Greedy Register Allocator
   3.7680 (  1.0%)   0.0040 (  0.1%)   3.7720 (  1.0%)   3.7186 (  0.9%)  CodeGen Prepare
   3.3680 (  0.9%)   0.0200 (  0.7%)   3.3880 (  0.9%)   3.4240 (  0.9%)  MemCpy Optimization
   3.3920 (  0.9%)   0.0120 (  0.4%)   3.4040 (  0.9%)   3.3959 (  0.9%)  Machine Instruction Scheduler
   3.1280 (  0.8%)   0.0120 (  0.4%)   3.1400 (  0.8%)   3.1442 (  0.8%)  Induction Variable Users
   2.5080 (  0.6%)   0.0360 (  1.3%)   2.5440 (  0.6%)   2.5870 (  0.7%)  SROA
   2.4840 (  0.6%)   0.0320 (  1.2%)   2.5160 (  0.6%)   2.5860 (  0.7%)  Unroll loops
   2.3200 (  0.6%)   0.0160 (  0.6%)   2.3360 (  0.6%)   2.3900 (  0.6%)  Jump Threading
   2.3520 (  0.6%)   0.0200 (  0.7%)   2.3720 (  0.6%)   2.3781 (  0.6%)  Jump Threading
   2.1680 (  0.6%)   0.0240 (  0.9%)   2.1920 (  0.6%)   2.2602 (  0.6%)  Reassociate expressions
   2.1120 (  0.5%)   0.0160 (  0.6%)   2.1280 (  0.5%)   2.0903 (  0.5%)  Loop Load Elimination
   2.0600 (  0.5%)   0.0240 (  0.9%)   2.0840 (  0.5%)   2.0447 (  0.5%)  SROA
   2.0080 (  0.5%)   0.0000 (  0.0%)   2.0080 (  0.5%)   1.9710 (  0.5%)  Loop Vectorization
   1.5600 (  0.4%)   0.0160 (  0.6%)   1.5760 (  0.4%)   1.7095 (  0.4%)  Simplify the CFG
   1.6880 (  0.4%)   0.0200 (  0.7%)   1.7080 (  0.4%)   1.6968 (  0.4%)  Simplify the CFG
   1.4960 (  0.4%)   0.0080 (  0.3%)   1.5040 (  0.4%)   1.5448 (  0.4%)  Remove redundant instructions
   1.4880 (  0.4%)   0.0200 (  0.7%)   1.5080 (  0.4%)   1.4811 (  0.4%)  Module Verifier
   1.4680 (  0.4%)   0.0160 (  0.6%)   1.4840 (  0.4%)   1.4391 (  0.4%)  Module Verifier
   1.4120 (  0.4%)   0.0120 (  0.4%)   1.4240 (  0.4%)   1.4021 (  0.4%)  Module Verifier
   1.3120 (  0.3%)   0.0040 (  0.1%)   1.3160 (  0.3%)   1.3868 (  0.4%)  Simplify the CFG
   1.3240 (  0.3%)   0.0160 (  0.6%)   1.3400 (  0.3%)   1.3271 (  0.3%)  Simplify the CFG
   1.2680 (  0.3%)   0.0200 (  0.7%)   1.2880 (  0.3%)   1.3067 (  0.3%)  Early CSE
   1.1160 (  0.3%)   0.0160 (  0.6%)   1.1320 (  0.3%)   1.1449 (  0.3%)  Sparse Conditional Constant Propagation

There are small differences between the execution times, but none of them stand out to me. The total wall-clock difference between the two runs is `3.16` seconds.
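For anyone who wants to check the per-pass deltas instead of reading the two tables side by side, here is a minimal sketch in Python. It is not part of the patch or of any LLVM tooling; the names `wall_time_by_pass` and `compare` are made up for illustration, and the regular expression only assumes the row layout shown above. Several passes run more than once, so the wall times are summed per pass name before subtracting. The sample rows are copied from the two tables.

```python
import re
from collections import defaultdict

# One report row: four "seconds ( pct%)" columns, then the pass name.
# The fourth column is wall time, which is the one captured here.
ROW = re.compile(
    r"^\s*(?:\d+\.\d+\s*\(\s*\d+\.\d+%\)\s+){3}"
    r"(\d+\.\d+)\s*\(\s*\d+\.\d+%\)\s+(.+?)\s*$"
)

def wall_time_by_pass(report):
    """Sum wall time per pass name (hypothetical helper, not an LLVM API)."""
    totals = defaultdict(float)
    for line in report.splitlines():
        m = ROW.match(line)
        if m:  # header and blank lines do not match and are skipped
            totals[m.group(2)] += float(m.group(1))
    return dict(totals)

def compare(first, second):
    """Return second-minus-first wall time in seconds for every pass name."""
    a, b = wall_time_by_pass(first), wall_time_by_pass(second)
    return {name: round(b.get(name, 0.0) - a.get(name, 0.0), 4)
            for name in sorted(set(a) | set(b))}

# A few rows copied from the first and second tables in this comment.
FIRST = """
   ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
  37.7480 (  9.8%)   0.1080 (  3.9%)  37.8560 (  9.7%)  37.8099 (  9.7%)  Global Value Numbering
   2.5320 (  0.7%)   0.0200 (  0.7%)   2.5520 (  0.7%)   2.3682 (  0.6%)  Jump Threading
   2.3400 (  0.6%)   0.0040 (  0.1%)   2.3440 (  0.6%)   2.3639 (  0.6%)  Jump Threading
"""
SECOND = """
   ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
  38.5520 (  9.9%)   0.1040 (  3.9%)  38.6560 (  9.9%)  38.7477 (  9.9%)  Global Value Numbering
   2.3200 (  0.6%)   0.0160 (  0.6%)   2.3360 (  0.6%)   2.3900 (  0.6%)  Jump Threading
   2.3520 (  0.6%)   0.0200 (  0.7%)   2.3720 (  0.6%)   2.3781 (  0.6%)  Jump Threading
"""

for name, delta in compare(FIRST, SECOND).items():
    print(f"{delta:+8.4f}s  {name}")
```

Feeding the full reports in place of the sample rows would give one delta per pass name, which makes it easier to see whether any single pass accounts for a noticeable share of the overall difference.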


https://reviews.llvm.org/D42717




