[llvm] r332708 - [llvm-exegesis] Improve documentation.

Fri May 18 05:33:57 PDT 2018

Author: courbet
Date: Fri May 18 05:33:57 2018
New Revision: 332708

URL: http://llvm.org/viewvc/llvm-project?rev=332708&view=rev
Log:
[llvm-exegesis] Improve documentation.

Summary:
- Better flag names.
- Fix flag reference in doc.
- Add usage examples in doc.

Fixes PR37497.

Reviewers: gchatelet

Subscribers: llvm-commits, tschuett

Differential Revision: https://reviews.llvm.org/D47015

Modified:
    llvm/trunk/docs/CommandGuide/llvm-exegesis.rst
    llvm/trunk/tools/llvm-exegesis/llvm-exegesis.cpp

Modified: llvm/trunk/docs/CommandGuide/llvm-exegesis.rst
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/docs/CommandGuide/llvm-exegesis.rst?rev=332708&r1=332707&r2=332708&view=diff
==============================================================================

--- llvm/trunk/docs/CommandGuide/llvm-exegesis.rst (original)
+++ llvm/trunk/docs/CommandGuide/llvm-exegesis.rst Fri May 18 05:33:57 2018
@@ -22,7 +22,116 @@ The code snippet is jitted and executed
 result is printed out as YAML to the standard output.
 
 The main goal of this tool is to automatically (in)validate the LLVM's TableDef
-scheduling models.
+scheduling models. To that end, we also provide analysis of the results.
+
+EXAMPLES: benchmarking
+----------------------
+
+Assume you have an X86-64 machine. To measure the latency of a single
+instruction, run:
+
+.. code-block:: bash
+
+    $ llvm-exegesis -mode=latency -opcode-name=ADD64rr
+
+Measuring the uop decomposition of an instruction works similarly:
+
+.. code-block:: bash
+
+    $ llvm-exegesis -mode=uops -opcode-name=ADD64rr
+
+The output is a YAML document (the default is to write to stdout, but you can
+redirect the output to a file using `-benchmarks-file`):
+
+.. code-block:: none
+
+  ---
+  key:
+    opcode_name:     ADD64rr
+    mode:            latency
+    config:          ''
+  cpu_name:        haswell
+  llvm_triple:     x86_64-unknown-linux-gnu
+  num_repetitions: 10000
+  measurements:
+    - { key: latency, value: 1.0058, debug_string: '' }
+  error:           ''
+  info:            'explicit self cycles, selecting one aliasing configuration.
+  Snippet:
+  ADD64rr R8, R8, R10
+  '
+  ...
+
+To measure the latency of all instructions for the host architecture, run:
+
+.. code-block:: bash
+
+  #!/bin/bash
+  readonly INSTRUCTIONS=$(grep INSTRUCTION_LIST_END build/lib/Target/X86/X86GenInstrInfo.inc | cut -f2 -d=)
+  for INSTRUCTION in $(seq 1 ${INSTRUCTIONS});
+  do
+    ./build/bin/llvm-exegesis -mode=latency -opcode-index=${INSTRUCTION} | sed -n '/---/,$p'
+  done
+
+FIXME: Provide an :program:`llvm-exegesis` option to test all instructions.
+
+EXAMPLES: analysis
+----------------------
+
+Assuming you have a set of benchmarked instructions (either latency or uops) as
+YAML in file `/tmp/benchmarks.yaml`, you can analyze the results using the
+following command:
+
+.. code-block:: bash
+
+    $ llvm-exegesis -mode=analysis \
+  -benchmarks-file=/tmp/benchmarks.yaml \
+  -analysis-clusters-output-file=/tmp/clusters.csv \
+  -analysis-inconsistencies-output-file=/tmp/inconsistencies.txt
+
+This will group the instructions into clusters with the same performance
+characteristics. The clusters will be written out to `/tmp/clusters.csv` in the
+following format:
+
+.. code-block:: none
+
+  cluster_id,opcode_name,config,sched_class
+  ...
+  2,ADD32ri8_DB,,WriteALU,1.00
+  2,ADD32ri_DB,,WriteALU,1.01
+  2,ADD32rr,,WriteALU,1.01
+  2,ADD32rr_DB,,WriteALU,1.00
+  2,ADD32rr_REV,,WriteALU,1.00
+  2,ADD64i32,,WriteALU,1.01
+  2,ADD64ri32,,WriteALU,1.01
+  2,MOVSX64rr32,,BSWAP32r_BSWAP64r_MOVSX64rr32,1.00
+  2,VPADDQYrr,,VPADDBYrr_VPADDDYrr_VPADDQYrr_VPADDWYrr_VPSUBBYrr_VPSUBDYrr_VPSUBQYrr_VPSUBWYrr,1.02
+  2,VPSUBQYrr,,VPADDBYrr_VPADDDYrr_VPADDQYrr_VPADDWYrr_VPSUBBYrr_VPSUBDYrr_VPSUBQYrr_VPSUBWYrr,1.01
+  2,ADD64ri8,,WriteALU,1.00
+  2,SETBr,,WriteSETCC,1.01
+  ...
+
+:program:`llvm-exegesis` will also analyze the clusters to point out
+inconsistencies in the scheduling information. For example,
+`/tmp/inconsistencies.txt` will contain messages like:
+
+.. code-block:: none
+
+  Sched Class EXTRACTPSrr_VEXTRACTPSrr contains instructions with distinct performance characteristics, falling into 2 clusters:
+  4,EXTRACTPSrr,,3.00
+  3,VEXTRACTPSrr,,2.01
+
+  Sched Class WriteCRC32 contains instructions with distinct performance characteristics, falling into 2 clusters:
+  4,CRC32r32r16,,3.01
+  4,CRC32r32r32,,3.00
+  11,CRC32r32r8,,4.01
+  4,CRC32r64r64,,3.01
+  4,CRC32r64r8,,3.00
+
+Note that the scheduling class names will be resolved only when
+:program:`llvm-exegesis` is compiled in debug mode, else only the class id will
+be shown. This does not invalidate any of the analysis results though.
+
 
 OPTIONS
 -------
@@ -41,15 +150,40 @@ OPTIONS
  Specify the opcode to measure, by name.
  Either `opcode-index` or `opcode-name` must be set.
 
-.. option:: -benchmark-mode=[Latency|Uops]
+.. option:: -mode=[latency|uops|analysis]
 
- Specify which characteristic of the opcode to measure.
+ Specify the run mode.
 
 .. option:: -num-repetitions=<Number of repetition>
 
  Specify the number of repetitions of the asm snippet.
  Higher values lead to more accurate measurements but lengthen the benchmark.
 
+ .. option:: -benchmarks-file=</path/to/file>
+
+ File to read (`analysis` mode) or write (`latency`/`uops` modes) benchmark
+ results. "-" uses stdin/stdout.
+
+.. option:: -analysis-clusters-output-file=</path/to/file>
+
+ If provided, write the analysis clusters as CSV to this file. "-" prints to
+ stdout.
+
+.. option:: -analysis-inconsistencies-output-file=</path/to/file>
+
+ If non-empty, write inconsistencies found during analysis to this file. `-`
+ prints to stdout.
+
+.. option:: -analysis-numpoints=<dbscan numPoints parameter>
+
+ Specify the numPoints parameters to be used for DBSCAN clustering
+ (`analysis` mode).
+
+.. option:: -analysis-espilon=<dbscan epsilon parameter>
+
+ Specify the numPoints parameters to be used for DBSCAN clustering
+ (`analysis` mode).
+
 
 EXIT STATUS
 -----------

Modified: llvm/trunk/tools/llvm-exegesis/llvm-exegesis.cpp
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/tools/llvm-exegesis/llvm-exegesis.cpp?rev=332708&r1=332707&r2=332708&view=diff
==============================================================================
--- llvm/trunk/tools/llvm-exegesis/llvm-exegesis.cpp (original)
+++ llvm/trunk/tools/llvm-exegesis/llvm-exegesis.cpp Fri May 18 05:33:57 2018
@@ -49,7 +49,7 @@ static llvm::cl::opt<std::string>
 
 enum class BenchmarkModeE { Latency, Uops, Analysis };
 static llvm::cl::opt<BenchmarkModeE> BenchmarkMode(
-    "benchmark-mode", llvm::cl::desc("the benchmark mode to run"),
+    "mode", llvm::cl::desc("the mode to run"),
     llvm::cl::values(
         clEnumValN(BenchmarkModeE::Latency, "latency", "Instruction Latency"),
         clEnumValN(BenchmarkModeE::Uops, "uops", "Uop Decomposition"),