[llvm] 43f4e2b - [Docs][llvm-exegesis] Use double dash long options
Aiden Grossman via llvm-commits
llvm-commits at lists.llvm.org
Fri Aug 11 16:17:25 PDT 2023
Author: Aiden Grossman
Date: 2023-08-11T23:17:07Z
New Revision: 43f4e2be8963a8bc0f273eca82f5fb1a59927c59
URL: https://github.com/llvm/llvm-project/commit/43f4e2be8963a8bc0f273eca82f5fb1a59927c59
DIFF: https://github.com/llvm/llvm-project/commit/43f4e2be8963a8bc0f273eca82f5fb1a59927c59.diff
LOG: [Docs][llvm-exegesis] Use double dash long options
Currently the llvm-exegesis docs use a mix of double dash and single
dash options with seemingly no pattern. This patch makes everything
double dash options as it has been suggested that we should be
advertising double dash long options exclusively in the documentation.
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D157641
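The rewrite described in the log message can be sketched as a regular-expression substitution. This is only an illustration, not how the patch was produced: the helper name `use_double_dash` and the pattern are assumptions made for this example. It is meant for the documentation text itself, not for diff lines, where a leading `-` marks a removed line. It also assumes every long option starts with a lowercase letter and is at least two characters long.

```python
import re

# Hypothetical pattern (not part of the commit): a single dash that
#   - is not preceded by a word character or another dash, so "--mode",
#     "llvm-exegesis" and "opcode-name" are left alone, and
#   - is followed by a lowercase letter plus at least one more option
#     character, so values such as "=-1" or a bare "-" are left alone.
_SINGLE_DASH_LONG_OPTION = re.compile(r"(?<![\w-])-(?=[a-z][a-z0-9-]+)")


def use_double_dash(text: str) -> str:
    """Rewrite single-dash long options in `text` to double-dash form."""
    return _SINGLE_DASH_LONG_OPTION.sub("--", text)


if __name__ == "__main__":
    samples = [
        "$ llvm-exegesis -mode=latency -opcode-index=-1",
        '$ echo "vzeroupper" | llvm-exegesis -mode=uops -snippets-file=-',
        ".. option:: -x86-lbr-sample-period=<nBranches/sample>",
    ]
    for line in samples:
        print(use_double_dash(line))
```

Applying the function twice gives the same result as applying it once, because options that already use a double dash never match. Ordinary prose containing a word preceded by a lone dash would also be rewritten, so any output of such a script would still need review by hand.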
Added:
Modified:
llvm/docs/CommandGuide/llvm-exegesis.rst
Removed:
################################################################################
diff --git a/llvm/docs/CommandGuide/llvm-exegesis.rst b/llvm/docs/CommandGuide/llvm-exegesis.rst
index 9650df4023d86b..f44f3a74ffbe3d 100644
--- a/llvm/docs/CommandGuide/llvm-exegesis.rst
+++ b/llvm/docs/CommandGuide/llvm-exegesis.rst
@@ -89,18 +89,18 @@ instruction, run:
.. code-block:: bash
- $ llvm-exegesis -mode=latency -opcode-name=ADD64rr
+ $ llvm-exegesis --mode=latency --opcode-name=ADD64rr
Measuring the uop decomposition or inverse throughput of an instruction works similarly:
.. code-block:: bash
- $ llvm-exegesis -mode=uops -opcode-name=ADD64rr
- $ llvm-exegesis -mode=inverse_throughput -opcode-name=ADD64rr
+ $ llvm-exegesis --mode=uops --opcode-name=ADD64rr
+ $ llvm-exegesis --mode=inverse_throughput --opcode-name=ADD64rr
The output is a YAML document (the default is to write to stdout, but you can
-redirect the output to a file using `-benchmarks-file`):
+redirect the output to a file using `--benchmarks-file`):
.. code-block:: none
@@ -125,7 +125,7 @@ To measure the latency of all instructions for the host architecture, run:
.. code-block:: bash
- $ llvm-exegesis -mode=latency -opcode-index=-1
+ $ llvm-exegesis --mode=latency --opcode-index=-1
EXAMPLE 2: benchmarking a custom code snippet
@@ -136,7 +136,7 @@ To measure the latency/uops of a custom piece of code, you can specify the
.. code-block:: bash
- $ echo "vzeroupper" | llvm-exegesis -mode=uops -snippets-file=-
+ $ echo "vzeroupper" | llvm-exegesis --mode=uops --snippets-file=-
Real-life code snippets typically depend on registers or memory.
:program:`llvm-exegesis` checks the liveness of registers (i.e. any register
@@ -189,10 +189,10 @@ following command:
.. code-block:: bash
- $ llvm-exegesis -mode=analysis \
- -benchmarks-file=/tmp/benchmarks.yaml \
- -analysis-clusters-output-file=/tmp/clusters.csv \
- -analysis-inconsistencies-output-file=/tmp/inconsistencies.html
+ $ llvm-exegesis --mode=analysis \
+ --benchmarks-file=/tmp/benchmarks.yaml \
+ --analysis-clusters-output-file=/tmp/clusters.csv \
+ --analysis-inconsistencies-output-file=/tmp/inconsistencies.html
This will group the instructions into clusters with the same performance
characteristics. The clusters will be written out to `/tmp/clusters.csv` in the
@@ -230,28 +230,28 @@ be shown. This does not invalidate any of the analysis results though.
OPTIONS
-------
-.. option:: -help
+.. option:: --help
Print a summary of command line options.
-.. option:: -opcode-index=<LLVM opcode index>
+.. option:: --opcode-index=<LLVM opcode index>
Specify the opcode to measure, by index. Specifying `-1` will result
in measuring every existing opcode. See example 1 for details.
Either `opcode-index`, `opcode-name` or `snippets-file` must be set.
-.. option:: -opcode-name=<opcode name 1>,<opcode name 2>,...
+.. option:: --opcode-name=<opcode name 1>,<opcode name 2>,...
Specify the opcode to measure, by name. Several opcodes can be specified as
a comma-separated list. See example 1 for details.
Either `opcode-index`, `opcode-name` or `snippets-file` must be set.
-.. option:: -snippets-file=<filename>
+.. option:: --snippets-file=<filename>
Specify the custom code snippet to measure. See example 2 for details.
Either `opcode-index`, `opcode-name` or `snippets-file` must be set.
-.. option:: -mode=[latency|uops|inverse_throughput|analysis]
+.. option:: --mode=[latency|uops|inverse_throughput|analysis]
Specify the run mode. Note that some modes have additional requirements and options.
@@ -274,7 +274,7 @@ OPTIONS
* ``assemble-measured-code``: Same as ``prepare-and-assemble-snippet``, but also creates the full sequence that can be dumped to a file using ``--dump-object-to-disk``.
* ``measure``: Same as ``assemble-measured-code``, but also runs the measurement.
-.. option:: -x86-lbr-sample-period=<nBranches/sample>
+.. option:: --x86-lbr-sample-period=<nBranches/sample>
Specify the LBR sampling period - how many branches before we take a sample.
When a positive value is specified for this option and when the mode is `latency`,
@@ -283,7 +283,7 @@ OPTIONS
could occur if the sampling is too frequent. A prime number should be used to
avoid consistently skipping certain blocks.
-.. option:: -x86-disable-upper-sse-registers
+.. option:: --x86-disable-upper-sse-registers
Using the upper xmm registers (xmm8-xmm15) forces a longer instruction encoding
which may put greater pressure on the frontend fetch and decode stages,
@@ -292,7 +292,7 @@ OPTIONS
enabled can help determine the effects of the frontend and can be used to
improve latency and throughput estimates.
-.. option:: -repetition-mode=[duplicate|loop|min]
+.. option:: --repetition-mode=[duplicate|loop|min]
Specify the repetition mode. `duplicate` will create a large, straight line
basic block with `num-repetitions` instructions (repeating the snippet
@@ -307,13 +307,13 @@ OPTIONS
instead use the `min` mode, which will run each other mode,
and produce the minimal measured result.
-.. option:: -num-repetitions=<Number of repetitions>
+.. option:: --num-repetitions=<Number of repetitions>
Specify the target number of executed instructions. Note that the actual
repetition count of the snippet will be `num-repetitions`/`snippet size`.
Higher values lead to more accurate measurements but lengthen the benchmark.
-.. option:: -loop-body-size=<Preferred loop body size>
+.. option:: --loop-body-size=<Preferred loop body size>
Only effective for `-repetition-mode=[loop|min]`.
Instead of looping over the snippet directly, first duplicate it so that the
@@ -321,7 +321,7 @@ OPTIONS
in loop body being cached in the CPU Op Cache / Loop Cache,
which may have higher throughput than the CPU decoders.
-.. option:: -max-configs-per-opcode=<value>
+.. option:: --max-configs-per-opcode=<value>
Specify the maximum configurations that can be generated for each opcode.
By default this is `1`, meaning that we assume that a single measurement is
@@ -333,22 +333,22 @@ OPTIONS
lead to different performance characteristics.
-.. option:: -benchmarks-file=</path/to/file>
+.. option:: --benchmarks-file=</path/to/file>
File to read (`analysis` mode) or write (`latency`/`uops`/`inverse_throughput`
modes) benchmark results. "-" uses stdin/stdout.
-.. option:: -analysis-clusters-output-file=</path/to/file>
+.. option:: --analysis-clusters-output-file=</path/to/file>
If provided, write the analysis clusters as CSV to this file. "-" prints to
stdout. By default, this analysis is not run.
-.. option:: -analysis-inconsistencies-output-file=</path/to/file>
+.. option:: --analysis-inconsistencies-output-file=</path/to/file>
If non-empty, write inconsistencies found during analysis to this file. `-`
prints to stdout. By default, this analysis is not run.
-.. option:: -analysis-filter=[all|reg-only|mem-only]
+.. option:: --analysis-filter=[all|reg-only|mem-only]
By default, all benchmark results are analysed, but sometimes it may be useful
to only look at those that do not involve memory, or vice versa. This option
@@ -356,44 +356,44 @@ OPTIONS
ones that do involve memory (involve instructions that may read or write to
memory), or the opposite, to only keep such benchmarks.
-.. option:: -analysis-clustering=[dbscan,naive]
+.. option:: --analysis-clustering=[dbscan,naive]
Specify the clustering algorithm to use. By default DBSCAN will be used.
Naive clustering algorithm is better for doing further work on the
`-analysis-inconsistencies-output-file=` output, it will create one cluster
per opcode, and check that the cluster is stable (all points are neighbours).
-.. option:: -analysis-numpoints=<dbscan numPoints parameter>
+.. option:: --analysis-numpoints=<dbscan numPoints parameter>
Specify the numPoints parameters to be used for DBSCAN clustering
(`analysis` mode, DBSCAN only).
-.. option:: -analysis-clustering-epsilon=<dbscan epsilon parameter>
+.. option:: --analysis-clustering-epsilon=<dbscan epsilon parameter>
Specify the epsilon parameter used for clustering of benchmark points
(`analysis` mode).
-.. option:: -analysis-inconsistency-epsilon=<epsilon>
+.. option:: --analysis-inconsistency-epsilon=<epsilon>
Specify the epsilon parameter used for detection of when the cluster
is different from the LLVM schedule profile values (`analysis` mode).
-.. option:: -analysis-display-unstable-clusters
+.. option:: --analysis-display-unstable-clusters
If there is more than one benchmark for an opcode, said benchmarks may end up
not being clustered into the same cluster if the measured performance
characteristics are different. by default all such opcodes are filtered out.
This flag will instead show only such unstable opcodes.
-.. option:: -ignore-invalid-sched-class=false
+.. option:: --ignore-invalid-sched-class=false
If set, ignore instructions that do not have a sched class (class idx = 0).
-.. option:: -mtriple=<triple name>
+.. option:: --mtriple=<triple name>
Target triple. See `-version` for available targets.
-.. option:: -mcpu=<cpu name>
+.. option:: --mcpu=<cpu name>
If set, measure the cpu characteristics using the counters for this CPU. This
is useful when creating new sched models (the host CPU is unknown to LLVM).