[PATCH] D58476: [llvm-exegesis] Split Epsilon param into two (PR40787)

Roman Lebedev via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Wed Feb 20 13:51:53 PST 2019


lebedev.ri created this revision.
lebedev.ri added reviewers: courbet, gchatelet.
lebedev.ri added a project: LLVM.
Herald added a subscriber: tschuett.

The epsilon parameter is used for two distinct things:

- initial clustering of the benchmark points
- checking the clusters against the LLVM scheduling-model values
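A minimal sketch of why a single shared epsilon couples the two steps. It assumes one-dimensional latency values and a deliberately simplified grouping rule: sorted values are chained together while neighbours are within `eps`. This is not the tool's actual clustering algorithm, and `cluster_points`, `inconsistent_clusters` and the sample numbers are hypothetical.

```python
from statistics import mean


def cluster_points(values, eps):
    """Hypothetical, simplified 1-D grouping (not llvm-exegesis' algorithm):
    a sorted value joins the current cluster while it is within eps of the
    previous value, otherwise it starts a new cluster."""
    clusters = []
    for v in sorted(values):
        if clusters and v - clusters[-1][-1] <= eps:
            clusters[-1].append(v)
        else:
            clusters.append([v])
    return clusters


def inconsistent_clusters(clusters, model_value, eps):
    """Flag clusters whose mean differs from the model value by more than eps."""
    return [c for c in clusters if abs(mean(c) - model_value) > eps]


# Hypothetical measurements for one opcode; the model predicts latency 3.
measured = [3.0, 3.05, 4.0, 4.1, 9.0]
model_latency = 3.0

# One epsilon drives BOTH the grouping and the consistency check.
for eps in (0.2, 2.0):
    clusters = cluster_points(measured, eps)
    flagged = inconsistent_clusters(clusters, model_latency, eps)
    print(f"eps={eps}: {len(clusters)} clusters, {len(flagged)} flagged")
# eps=0.2: 3 clusters, 2 flagged
# eps=2.0: 2 clusters, 1 flagged
```

With `eps=0.2` the 3.x and 4.x measurements form separate clusters; raising `eps` to 2.0 so that only the far-off 9.0 cluster is flagged also merges the 3.x and 4.x measurements into one cluster.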

What if one wants to look only at clusters that differ greatly from the LLVM
values, without changing the clustering itself? In particular, this helps to
weed out noisy measurements: since the clustering epsilon stays small, noisy
measurements of the same opcode have a better chance of landing in different
clusters.

Splitting it into two params makes this possible.
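A minimal sketch of the split, assuming one-dimensional latency values and a simplified grouping rule that is not the tool's real clustering. The `Epsilons` fields and all helper names are hypothetical; they are not llvm-exegesis flags or APIs.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Epsilons:
    # Hypothetical names: one threshold per use of the old single epsilon.
    clustering: float     # how close points must be to share a cluster
    inconsistency: float  # how far a cluster may be from the model value


def cluster_points(values, eps):
    """Simplified 1-D grouping: chain sorted values while gaps are <= eps."""
    clusters = []
    for v in sorted(values):
        if clusters and v - clusters[-1][-1] <= eps:
            clusters[-1].append(v)
        else:
            clusters.append([v])
    return clusters


def analyze(values, model_value, eps: Epsilons):
    """Cluster with one epsilon, check consistency with the other."""
    clusters = cluster_points(values, eps.clustering)
    flagged = [c for c in clusters
               if abs(mean(c) - model_value) > eps.inconsistency]
    return clusters, flagged


measured = [3.0, 3.05, 4.0, 4.1, 9.0]  # hypothetical data, model latency 3
clusters, flagged = analyze(measured, 3.0,
                            Epsilons(clustering=0.2, inconsistency=2.0))
print(len(clusters), "clusters; flagged:", flagged)
# 3 clusters; flagged: [[9.0]]
```

Here the small clustering epsilon keeps the 3.x and 4.x measurements in separate clusters, while the larger inconsistency epsilon reports only the cluster near 9.0.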

This is nearly free performance-wise:
Old:

  $ perf stat -r 25 ./bin/llvm-exegesis -mode=analysis -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-latency-1.yaml -analysis-inconsistencies-output-file=/tmp/clusters-old.html
  no exegesis target for x86_64-unknown-linux-gnu, using default
  Parsed 10099 benchmark points
  Printing sched class consistency analysis results to file '/tmp/clusters-old.html'
  ...
   Performance counter stats for './bin/llvm-exegesis -mode=analysis -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-latency-1.yaml -analysis-inconsistencies-output-file=/tmp/clusters-old.html' (25 runs):
  
              390.01 msec task-clock                #    0.998 CPUs utilized            ( +-  0.25% )
                  12      context-switches          #   31.735 M/sec                    ( +- 27.38% )
                   0      cpu-migrations            #    0.000 K/sec                  
                4745      page-faults               # 12183.732 M/sec                   ( +-  0.54% )
          1562711900      cycles                    # 4012303.327 GHz                   ( +-  0.24% )  (82.90%)
           185567822      stalled-cycles-frontend   #   11.87% frontend cycles idle     ( +-  0.52% )  (83.30%)
           392106234      stalled-cycles-backend    #   25.09% backend cycles idle      ( +-  1.31% )  (33.79%)
          1839236666      instructions              #    1.18  insn per cycle         
                                                    #    0.21  stalled cycles per insn  ( +-  0.15% )  (50.37%)
           407035764      branches                  # 1045074878.710 M/sec              ( +-  0.12% )  (66.80%)
            10896459      branch-misses             #    2.68% of all branches          ( +-  0.17% )  (83.20%)
  
            0.390629 +- 0.000972 seconds time elapsed  ( +-  0.25% )

  $ perf stat -r 9 ./bin/llvm-exegesis -mode=analysis -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-latency.yml -analysis-inconsistencies-output-file=/tmp/clusters-old.html
  no exegesis target for x86_64-unknown-linux-gnu, using default
  Parsed 50572 benchmark points
  Printing sched class consistency analysis results to file '/tmp/clusters-old.html'
  ...
   Performance counter stats for './bin/llvm-exegesis -mode=analysis -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-latency.yml -analysis-inconsistencies-output-file=/tmp/clusters-old.html' (9 runs):
  
             6803.36 msec task-clock                #    0.999 CPUs utilized            ( +-  0.96% )
                 262      context-switches          #   38.546 M/sec                    ( +- 23.06% )
                   0      cpu-migrations            #    0.065 M/sec                    ( +- 76.03% )
               13287      page-faults               # 1953.206 M/sec                    ( +-  0.32% )
         27252537904      cycles                    # 4006024.257 GHz                   ( +-  0.95% )  (83.31%)
          1496314935      stalled-cycles-frontend   #    5.49% frontend cycles idle     ( +-  0.97% )  (83.32%)
         16128404524      stalled-cycles-backend    #   59.18% backend cycles idle      ( +-  0.30% )  (33.37%)
         17611143370      instructions              #    0.65  insn per cycle         
                                                    #    0.92  stalled cycles per insn  ( +-  0.05% )  (50.04%)
          3894906599      branches                  # 572537147.437 M/sec               ( +-  0.03% )  (66.69%)
           116314514      branch-misses             #    2.99% of all branches          ( +-  0.20% )  (83.35%)
  
              6.8118 +- 0.0689 seconds time elapsed  ( +-  1.01%)

New:

  $ perf stat -r 25 ./bin/llvm-exegesis -mode=analysis -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-latency-1.yaml -analysis-inconsistencies-output-file=/tmp/clusters-new.html
  no exegesis target for x86_64-unknown-linux-gnu, using default
  Parsed 10099 benchmark points
  Printing sched class consistency analysis results to file '/tmp/clusters-new.html'
  ...
   Performance counter stats for './bin/llvm-exegesis -mode=analysis -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-latency-1.yaml -analysis-inconsistencies-output-file=/tmp/clusters-new.html' (25 runs):
  
              400.14 msec task-clock                #    0.998 CPUs utilized            ( +-  0.66% )
                  12      context-switches          #   29.429 M/sec                    ( +- 25.95% )
                   0      cpu-migrations            #    0.100 M/sec                    ( +-100.00% )
                4714      page-faults               # 11796.496 M/sec                   ( +-  0.55% )
          1603131306      cycles                    # 4011840.105 GHz                   ( +-  0.66% )  (82.85%)
           199538509      stalled-cycles-frontend   #   12.45% frontend cycles idle     ( +-  2.40% )  (83.10%)
           402249109      stalled-cycles-backend    #   25.09% backend cycles idle      ( +-  1.19% )  (34.05%)
          1847783963      instructions              #    1.15  insn per cycle         
                                                    #    0.22  stalled cycles per insn  ( +-  0.18% )  (50.64%)
           407162722      branches                  # 1018925730.631 M/sec              ( +-  0.12% )  (67.02%)
            10932779      branch-misses             #    2.69% of all branches          ( +-  0.51% )  (83.28%)
  
             0.40077 +- 0.00267 seconds time elapsed  ( +-  0.67% )
  
  $ perf stat -r 9 ./bin/llvm-exegesis -mode=analysis -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-latency.yml -analysis-inconsistencies-output-file=/tmp/clusters-new.html
  no exegesis target for x86_64-unknown-linux-gnu, using default
  Parsed 50572 benchmark points
  Printing sched class consistency analysis results to file '/tmp/clusters-new.html'
  ...
   Performance counter stats for './bin/llvm-exegesis -mode=analysis -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-latency.yml -analysis-inconsistencies-output-file=/tmp/clusters-new.html' (9 runs):
  
             6947.79 msec task-clock                #    1.000 CPUs utilized            ( +-  0.90% )
                 217      context-switches          #   31.236 M/sec                    ( +- 36.16% )
                   1      cpu-migrations            #    0.096 M/sec                    ( +- 50.00% )
               13258      page-faults               # 1908.389 M/sec                    ( +-  0.34% )
         27830796523      cycles                    # 4006032.286 GHz                   ( +-  0.89% )  (83.30%)
          1504554006      stalled-cycles-frontend   #    5.41% frontend cycles idle     ( +-  2.10% )  (83.32%)
         16716574843      stalled-cycles-backend    #   60.07% backend cycles idle      ( +-  0.65% )  (33.38%)
         17755545931      instructions              #    0.64  insn per cycle         
                                                    #    0.94  stalled cycles per insn  ( +-  0.09% )  (50.04%)
          3897255686      branches                  # 560980426.597 M/sec               ( +-  0.06% )  (66.70%)
           117045395      branch-misses             #    3.00% of all branches          ( +-  0.47% )  (83.34%)
  
              6.9507 +- 0.0627 seconds time elapsed  ( +-  0.90% )

I.e. it's a +2.6% slowdown for one whole sweep (10099 points), or +2.0% for
5 whole sweeps (50572 points). Within noise, I'd say.
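The percentages above can be rechecked from the mean elapsed times that `perf stat` reports; the dictionary labels are just descriptive names for the two runs.

```python
def slowdown_percent(old_s, new_s):
    """Relative change of new vs. old wall-clock time, in percent."""
    return (new_s / old_s - 1.0) * 100.0


# Mean "seconds time elapsed" values copied from the perf stat output above.
runs = {
    "one sweep (10099 points, 25 runs)": (0.390629, 0.40077),
    "5 sweeps (50572 points, 9 runs)": (6.8118, 6.9507),
}

for label, (old_s, new_s) in runs.items():
    print(f"{label}: {slowdown_percent(old_s, new_s):+.1f}%")
# one sweep (10099 points, 25 runs): +2.6%
# 5 sweeps (50572 points, 9 runs): +2.0%

# The larger file holds about five times as many benchmark points.
print(f"point ratio: {50572 / 10099:.1f}x")  # point ratio: 5.0x
```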

Should help with PR40787 <https://bugs.llvm.org/show_bug.cgi?id=40787>.


Repository:
  rL LLVM

https://reviews.llvm.org/D58476

Files:
  docs/CommandGuide/llvm-exegesis.rst
  test/tools/llvm-exegesis/X86/analysis-cluster-stabilization.test
  test/tools/llvm-exegesis/X86/analysis-epsilons.test
  tools/llvm-exegesis/lib/Analysis.cpp
  tools/llvm-exegesis/lib/Analysis.h
  tools/llvm-exegesis/lib/Clustering.cpp
  tools/llvm-exegesis/lib/Clustering.h
  tools/llvm-exegesis/llvm-exegesis.cpp


