[PATCH] D58355: [llvm-exegesis] Opcode stabilization / reclusterization (PR40715)

Roman Lebedev via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Mon Feb 18 08:04:04 PST 2019


lebedev.ri created this revision.
lebedev.ri added reviewers: courbet, gchatelet.
lebedev.ri added a project: LLVM.
Herald added subscribers: jdoerfert, tschuett.

Given an instruction `Opcode`, we can benchmark (measure) the instruction's
performance characteristics. Then, to facilitate further analysis, we group
the benchmarks with *similar* characteristics into clusters.
This is not entirely deterministic, though. Some instructions have variable
characteristics that depend on their arguments. Thus, if we run several
benchmarks of the same instruction `Opcode`, we may end up with *different*
measured performance characteristics. When we then do the clustering, these
benchmarks of the same instruction `Opcode` may end up in *different*
clusters. This is not great for further analysis.

We shall find every `Opcode` whose benchmarks are spread over more than one
cluster, and move *all* the benchmarks of said `Opcode` into one new unstable
cluster per `Opcode`.
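
To make the step above concrete, here is a minimal standalone sketch of it. It is not the code from the patch: `Point`, `stabilize`, and the plain integer cluster indices are hypothetical stand-ins for illustration, and the real clustering data structures in llvm-exegesis differ.

```cpp
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for one benchmark point; this is not the
// llvm-exegesis type.
struct Point {
  std::string Opcode;  // instruction that the benchmark measured
  std::size_t Cluster; // cluster index assigned by the initial clustering
  bool Unstable;       // set once the point was moved by stabilize()
};

// Moves all points of every opcode that spans more than one cluster into a
// fresh per-opcode cluster and marks them unstable. Returns the cluster count
// after stabilization (previously used clusters may end up empty).
std::size_t stabilize(std::vector<Point> &Points, std::size_t NumClusters) {
  // Collect the set of clusters each opcode currently appears in.
  std::map<std::string, std::set<std::size_t>> ClustersOfOpcode;
  for (const Point &P : Points)
    ClustersOfOpcode[P.Opcode].insert(P.Cluster);

  // Allocate one new cluster per opcode that is spread over several clusters.
  // std::map iterates in key order, so the new ids do not depend on the order
  // of the input points.
  std::map<std::string, std::size_t> NewClusterOfOpcode;
  for (const auto &Entry : ClustersOfOpcode)
    if (Entry.second.size() > 1)
      NewClusterOfOpcode[Entry.first] = NumClusters++;

  // Relocate every point of the affected opcodes.
  for (Point &P : Points) {
    auto It = NewClusterOfOpcode.find(P.Opcode);
    if (It == NewClusterOfOpcode.end())
      continue;
    P.Cluster = It->second;
    P.Unstable = true;
  }
  return NumClusters;
}
```

In this sketch an opcode measured only once, or always landing in the same cluster, keeps its original cluster; only opcodes seen in two or more clusters are relocated.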

I have solved this by making `ClusterId` a bit field, adding an `IsUnstable` bit,
and introducing the `-analysis-display-unstable-clusters` switch to toggle
between displaying stable-only clusters and unstable-only clusters.
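
As a rough illustration of the bit-field idea (again a sketch, not the class from the patch): only the names `ClusterId` and `IsUnstable` come from the description above. The 31/1 bit split, the factory functions, and the `shouldDisplay` helper are assumptions made up for this example.

```cpp
#include <cstdint>

// Hypothetical sketch of a cluster id that carries a stability flag in the
// same 32-bit word. The field widths are an assumption.
class ClusterId {
public:
  static ClusterId makeValid(std::uint32_t Id) {
    return ClusterId(Id, /*IsUnstable=*/false);
  }
  static ClusterId makeValidUnstable(std::uint32_t Id) {
    return ClusterId(Id, /*IsUnstable=*/true);
  }

  std::uint32_t getId() const { return Id_; }
  bool isUnstable() const { return IsUnstable_ != 0; }

  // Two ids are equal only if both the index and the stability flag match.
  bool operator==(const ClusterId &O) const {
    return Id_ == O.Id_ && IsUnstable_ == O.IsUnstable_;
  }

private:
  ClusterId(std::uint32_t Id, bool IsUnstable)
      : Id_(Id), IsUnstable_(IsUnstable ? 1u : 0u) {}

  std::uint32_t Id_ : 31;        // cluster index
  std::uint32_t IsUnstable_ : 1; // 1 if the cluster was created by stabilization
};

// Hypothetical filter mirroring what the switch toggles: show a cluster only
// if its stability matches the requested view.
inline bool shouldDisplay(ClusterId Id, bool DisplayUnstableClusters) {
  return Id.isUnstable() == DisplayUnstableClusters;
}
```

Keeping the flag inside the id means the report code can filter clusters without a separate lookup table.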

The reclusterization is deterministic: it produces identical reports between
runs. (Or at least that is what I'm seeing; maybe it isn't.)

Timings/comparisons:
old (current trunk/head) F8303582: clusters-old.html <https://reviews.llvm.org/F8303582>

  $ perf stat -r 25 ./bin/llvm-exegesis -mode=analysis -analysis-epsilon=0.5 -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-inverse_throughput.yaml -analysis-inconsistencies-output-file=/tmp/clusters-old.html
  no exegesis target for x86_64-unknown-linux-gnu, using default
  Parsed 43970 benchmark points
  Printing sched class consistency analysis results to file '/tmp/clusters-old.html'
  ...
  no exegesis target for x86_64-unknown-linux-gnu, using default
  Parsed 43970 benchmark points
  Printing sched class consistency analysis results to file '/tmp/clusters-old.html'
  
   Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=0.5 -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-inverse_throughput.yaml -analysis-inconsistencies-output-file=/tmp/clusters-old.html' (25 runs):
  
             6624.73 msec task-clock                #    0.999 CPUs utilized            ( +-  0.53% )
                 172      context-switches          #   25.965 M/sec                    ( +- 29.89% )
                   0      cpu-migrations            #    0.042 M/sec                    ( +- 56.54% )
               31073      page-faults               # 4690.754 M/sec                    ( +-  0.08% )
         26538711696      cycles                    # 4006230.292 GHz                   ( +-  0.53% )  (83.31%)
          2017496807      stalled-cycles-frontend   #    7.60% frontend cycles idle     ( +-  0.93% )  (83.32%)
         13403650062      stalled-cycles-backend    #   50.51% backend cycles idle      ( +-  0.33% )  (33.37%)
         19770706799      instructions              #    0.74  insn per cycle         
                                                    #    0.68  stalled cycles per insn  ( +-  0.04% )  (50.04%)
          4419821812      branches                  # 667207369.714 M/sec               ( +-  0.03% )  (66.69%)
           121741669      branch-misses             #    2.75% of all branches          ( +-  0.28% )  (83.34%)
  
              6.6283 +- 0.0358 seconds time elapsed  ( +-  0.54% )

patch, with reclustering but without filtering (i.e. outputting all the stable *and* unstable clusters) F8303586: clusters-new-all.html <https://reviews.llvm.org/F8303586>

  $ perf stat -r 25 ./bin/llvm-exegesis -mode=analysis -analysis-epsilon=0.5 -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-inverse_throughput.yaml -analysis-inconsistencies-output-file=/tmp/clusters-new-all.html
  no exegesis target for x86_64-unknown-linux-gnu, using default
  Parsed 43970 benchmark points
  Printing sched class consistency analysis results to file '/tmp/clusters-new-all.html'
  ...
  no exegesis target for x86_64-unknown-linux-gnu, using default
  Parsed 43970 benchmark points
  Printing sched class consistency analysis results to file '/tmp/clusters-new-all.html'
  
   Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=0.5 -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-inverse_throughput.yaml -analysis-inconsistencies-output-file=/tmp/clusters-new-all.html' (25 runs):
  
             6475.29 msec task-clock                #    0.999 CPUs utilized            ( +-  0.31% )
                 213      context-switches          #   32.952 M/sec                    ( +- 23.81% )
                   1      cpu-migrations            #    0.130 M/sec                    ( +- 43.84% )
               31287      page-faults               # 4832.057 M/sec                    ( +-  0.08% )
         25939086577      cycles                    # 4006160.279 GHz                   ( +-  0.31% )  (83.31%)
          1958812858      stalled-cycles-frontend   #    7.55% frontend cycles idle     ( +-  0.68% )  (83.32%)
         13218961512      stalled-cycles-backend    #   50.96% backend cycles idle      ( +-  0.29% )  (33.37%)
         19752995402      instructions              #    0.76  insn per cycle         
                                                    #    0.67  stalled cycles per insn  ( +-  0.04% )  (50.04%)
          4417079244      branches                  # 682195472.305 M/sec               ( +-  0.03% )  (66.70%)
           121510065      branch-misses             #    2.75% of all branches          ( +-  0.19% )  (83.34%)
  
              6.4832 +- 0.0229 seconds time elapsed  ( +-  0.35% )

Funnily, *this* measurement shows that said reclustering actually improved performance.

patch, with reclustering, only the stable clusters F8303594: clusters-new-stable.html <https://reviews.llvm.org/F8303594>

  $ perf stat -r 25 ./bin/llvm-exegesis -mode=analysis -analysis-epsilon=0.5 -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-inverse_throughput.yaml -analysis-inconsistencies-output-file=/tmp/clusters-new-stable.html
  no exegesis target for x86_64-unknown-linux-gnu, using default
  Parsed 43970 benchmark points
  Printing sched class consistency analysis results to file '/tmp/clusters-new-stable.html'
  ...
  no exegesis target for x86_64-unknown-linux-gnu, using default
  Parsed 43970 benchmark points
  Printing sched class consistency analysis results to file '/tmp/clusters-new-stable.html'
  
   Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=0.5 -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-inverse_throughput.yaml -analysis-inconsistencies-output-file=/tmp/clusters-new-stable.html' (25 runs):
  
             6387.71 msec task-clock                #    0.999 CPUs utilized            ( +-  0.13% )
                 133      context-switches          #   20.792 M/sec                    ( +- 23.39% )
                   0      cpu-migrations            #    0.063 M/sec                    ( +- 61.24% )
               31318      page-faults               # 4903.256 M/sec                    ( +-  0.08% )
         25591984967      cycles                    # 4006786.266 GHz                   ( +-  0.13% )  (83.31%)
          1881234904      stalled-cycles-frontend   #    7.35% frontend cycles idle     ( +-  0.25% )  (83.33%)
         13209749965      stalled-cycles-backend    #   51.62% backend cycles idle      ( +-  0.16% )  (33.36%)
         19767554347      instructions              #    0.77  insn per cycle         
                                                    #    0.67  stalled cycles per insn  ( +-  0.04% )  (50.03%)
          4417480305      branches                  # 691618858.046 M/sec               ( +-  0.03% )  (66.68%)
           118676358      branch-misses             #    2.69% of all branches          ( +-  0.07% )  (83.33%)
  
              6.3954 +- 0.0118 seconds time elapsed  ( +-  0.18% )

Performance improved even further?! Makes sense, I guess: fewer clusters to print.

patch, with reclustering, only the unstable clusters F8303601: clusters-new-unstable.html <https://reviews.llvm.org/F8303601>

  $ perf stat -r 25 ./bin/llvm-exegesis -mode=analysis -analysis-epsilon=0.5 -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-inverse_throughput.yaml -analysis-inconsistencies-output-file=/tmp/clusters-new-unstable.html -analysis-display-unstable-clusters
  no exegesis target for x86_64-unknown-linux-gnu, using default
  Parsed 43970 benchmark points
  Printing sched class consistency analysis results to file '/tmp/clusters-new-unstable.html'
  ...
  no exegesis target for x86_64-unknown-linux-gnu, using default
  Parsed 43970 benchmark points
  Printing sched class consistency analysis results to file '/tmp/clusters-new-unstable.html'
  
   Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=0.5 -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-inverse_throughput.yaml -analysis-inconsistencies-output-file=/tmp/clusters-new-unstable.html -analysis-display-unstable-clusters' (25 runs):
  
             6124.96 msec task-clock                #    1.000 CPUs utilized            ( +-  0.20% )
                 194      context-switches          #   31.709 M/sec                    ( +- 20.46% )
                   0      cpu-migrations            #    0.039 M/sec                    ( +- 49.77% )
               31413      page-faults               # 5129.261 M/sec                    ( +-  0.06% )
         24536794267      cycles                    # 4006425.858 GHz                   ( +-  0.19% )  (83.31%)
          1676085087      stalled-cycles-frontend   #    6.83% frontend cycles idle     ( +-  0.46% )  (83.32%)
         13035595603      stalled-cycles-backend    #   53.13% backend cycles idle      ( +-  0.16% )  (33.36%)
         18260877653      instructions              #    0.74  insn per cycle         
                                                    #    0.71  stalled cycles per insn  ( +-  0.05% )  (50.03%)
          4112411983      branches                  # 671484364.603 M/sec               ( +-  0.03% )  (66.68%)
           114066929      branch-misses             #    2.77% of all branches          ( +-  0.11% )  (83.32%)
  
              6.1278 +- 0.0121 seconds time elapsed  ( +-  0.20% )

This tells us that actually writing the `-analysis-inconsistencies-output-file=` output takes only ~0.4 sec for 43970 benchmark points (3 whole sweeps).
(Also, wow, this is fast; it used to take several minutes originally.)

Fixes PR40715 <https://bugs.llvm.org/show_bug.cgi?id=40715>.


Repository:
  rL LLVM

https://reviews.llvm.org/D58355

Files:
  docs/CommandGuide/llvm-exegesis.rst
  test/tools/llvm-exegesis/X86/analysis-cluster-stabilization.test
  tools/llvm-exegesis/lib/Analysis.cpp
  tools/llvm-exegesis/lib/Analysis.h
  tools/llvm-exegesis/lib/BenchmarkResult.h
  tools/llvm-exegesis/lib/Clustering.cpp
  tools/llvm-exegesis/lib/Clustering.h
  tools/llvm-exegesis/llvm-exegesis.cpp

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D58355.187245.patch
Type: text/x-patch
Size: 20319 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20190218/671ff330/attachment.bin>

