[libcxx-commits] [libcxx] a6af641 - [libc++] Update utilities to compare benchmarks (#157556)
via libcxx-commits
libcxx-commits at lists.llvm.org
Tue Sep 9 07:52:48 PDT 2025
Author: Louis Dionne
Date: 2025-09-09T10:52:44-04:00
New Revision: a6af641b89d6fbfdaaa74406a3d1354d24c58393
URL: https://github.com/llvm/llvm-project/commit/a6af641b89d6fbfdaaa74406a3d1354d24c58393
DIFF: https://github.com/llvm/llvm-project/commit/a6af641b89d6fbfdaaa74406a3d1354d24c58393.diff
LOG: [libc++] Update utilities to compare benchmarks (#157556)
This patch replaces the previous `libcxx-compare-benchmarks` wrapper with a
new `compare-benchmarks` script that works with LNT-compatible data.
This allows comparing benchmark results across libc++ microbenchmarks,
SPEC, and anything else that produces LNT-compatible data.
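For reference, the LNT text format consumed by these tools is line-based: each
line holds a `<benchmark>.<metric> <value>` pair, which is the shape assumed by
the parser in `compare-benchmarks` below. A hypothetical input file:
```
BM_join_view_deques/0.execution_time 8.11
BM_join_view_deques/0.execution_time 8.13
BM_join_view_deques/1.execution_time 13.56
```
Repeated entries for the same benchmark and metric are treated as multiple
samples and aggregated (median by default) when comparing.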
It also adds a simple script to consolidate LNT benchmark output into a
single file, simplifying the process of doing A/B runs locally. After this
patch, the simplest way to do an A/B comparison no longer requires creating
two build directories.
It also adds the ability to produce either a standalone HTML chart or
plain-text output for diffing results locally when prototyping changes.
Example text output of the new tool:
```
Benchmark Baseline Candidate Difference % Difference
----------------------------------- ---------- ----------- ------------ --------------
BM_join_view_deques/0 8.11 8.16 0.05 0.63
BM_join_view_deques/1 13.56 13.79 0.23 1.69
BM_join_view_deques/1024 6606.51 7011.34 404.83 6.13
BM_join_view_deques/2 17.99 19.92 1.93 10.72
BM_join_view_deques/4000 27655.58 29864.72 2209.14 7.99
BM_join_view_deques/4096 26218.07 30520.13 4302.05 16.41
BM_join_view_deques/512 3231.66 2832.47 -399.19 -12.35
BM_join_view_deques/5500 47144.82 42207.41 -4937.42 -10.47
BM_join_view_deques/64 247.23 262.66 15.43 6.24
BM_join_view_deques/64000 756221.63 511247.48 -244974.15 -32.39
BM_join_view_deques/65536 537110.91 560241.61 23130.70 4.31
BM_join_view_deques/70000 815739.07 616181.34 -199557.73 -24.46
BM_join_view_out_vectors/0 0.93 0.93 0.00 0.07
BM_join_view_out_vectors/1 3.11 3.14 0.03 0.82
BM_join_view_out_vectors/1024 3090.92 3563.29 472.37 15.28
BM_join_view_out_vectors/2 5.52 5.56 0.04 0.64
BM_join_view_out_vectors/4000 9887.21 9774.40 -112.82 -1.14
BM_join_view_out_vectors/4096 10158.78 10190.44 31.66 0.31
BM_join_view_out_vectors/512 1218.68 1209.59 -9.09 -0.75
BM_join_view_out_vectors/5500 13559.23 13676.06 116.84 0.86
BM_join_view_out_vectors/64 158.95 157.91 -1.04 -0.65
BM_join_view_out_vectors/64000 178514.73 226520.97 48006.24 26.89
BM_join_view_out_vectors/65536 184639.37 207180.35 22540.98 12.21
BM_join_view_out_vectors/70000 235006.69 213886.93 -21119.77 -8.99
```
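As a sketch of what an end-to-end A/B comparison might look like with these
tools (the flags below come from the script's argument parser; the paths and
filter expression are illustrative):
```
$ libcxx/utils/consolidate-benchmarks <build> > baseline.lnt
# ... apply the candidate patch and re-run the benchmarks ...
$ libcxx/utils/consolidate-benchmarks <build> > candidate.lnt
$ libcxx/utils/compare-benchmarks baseline.lnt candidate.lnt \
    --filter 'join_view' --format chart -o comparison.html
```
Here `--filter` restricts the comparison to benchmarks whose names match the
regular expression, and `--format chart` writes a self-contained HTML page
instead of the plain-text table.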
Added:
libcxx/utils/compare-benchmarks
libcxx/utils/consolidate-benchmarks
libcxx/utils/requirements.txt
Modified:
libcxx/docs/TestingLibcxx.rst
Removed:
libcxx/utils/libcxx-benchmark-json
libcxx/utils/libcxx-compare-benchmarks
################################################################################
diff --git a/libcxx/docs/TestingLibcxx.rst b/libcxx/docs/TestingLibcxx.rst
index 56cf4aca236f9..44463385b81a7 100644
--- a/libcxx/docs/TestingLibcxx.rst
+++ b/libcxx/docs/TestingLibcxx.rst
@@ -471,7 +471,7 @@ removed from the Standard. These tests should be written like:
Benchmarks
==========
-Libc++'s test suite also contains benchmarks. The benchmarks are written using the `Google Benchmark`_
+Libc++'s test suite also contains benchmarks. Many benchmarks are written using the `Google Benchmark`_
library, a copy of which is stored in the LLVM monorepo. For more information about using the Google
Benchmark library, see the `official documentation <https://github.com/google/benchmark>`_.
@@ -490,27 +490,46 @@ run through ``check-cxx`` for anything, instead run the benchmarks manually usin
the instructions for running individual tests.
If you want to compare the results of different benchmark runs, we recommend using the
-``libcxx-compare-benchmarks`` helper tool. First, configure CMake in a build directory
-and run the benchmark:
+``compare-benchmarks`` helper tool. Note that the script has some dependencies, which can
+be installed with:
.. code-block:: bash
- $ cmake -S runtimes -B <build1> [...]
- $ libcxx/utils/libcxx-lit <build1> libcxx/test/benchmarks/string.bench.cpp --param optimization=speed
+ $ python -m venv .venv && source .venv/bin/activate # Optional but recommended
+ $ pip install -r libcxx/utils/requirements.txt
-Then, do the same for the second configuration you want to test. Use a different build
-directory for that configuration:
+Once that's done, start by configuring CMake in a build directory and running one or
+more benchmarks, as usual:
.. code-block:: bash
- $ cmake -S runtimes -B <build2> [...]
- $ libcxx/utils/libcxx-lit <build2> libcxx/test/benchmarks/string.bench.cpp --param optimization=speed
+ $ cmake -S runtimes -B <build> [...]
+ $ libcxx/utils/libcxx-lit <build> libcxx/test/benchmarks/string.bench.cpp --param optimization=speed
-Finally, use ``libcxx-compare-benchmarks`` to compare both:
+Then, get the consolidated benchmark output for that run using ``consolidate-benchmarks``:
.. code-block:: bash
- $ libcxx/utils/libcxx-compare-benchmarks <build1> <build2> libcxx/test/benchmarks/string.bench.cpp
+ $ libcxx/utils/consolidate-benchmarks <build> > baseline.lnt
+
+The ``baseline.lnt`` file will contain a consolidation of all the benchmark results present in the build
+directory. You can then make the desired modifications to the code, run the benchmark(s) again, and then run:
+
+.. code-block:: bash
+
+ $ libcxx/utils/consolidate-benchmarks <build> > candidate.lnt
+
+Finally, use ``compare-benchmarks`` to compare both:
+
+.. code-block:: bash
+
+ $ libcxx/utils/compare-benchmarks baseline.lnt candidate.lnt
+
+ # Useful one-liner when iterating locally:
+ $ libcxx/utils/compare-benchmarks baseline.lnt <(libcxx/utils/consolidate-benchmarks <build>)
+
+The ``compare-benchmarks`` script provides some useful options like creating a chart to easily visualize
+differences in a browser window. Use ``compare-benchmarks --help`` for details.
.. _`Google Benchmark`: https://github.com/google/benchmark
diff --git a/libcxx/utils/compare-benchmarks b/libcxx/utils/compare-benchmarks
new file mode 100755
index 0000000000000..9bda5f1a27949
--- /dev/null
+++ b/libcxx/utils/compare-benchmarks
@@ -0,0 +1,123 @@
+#!/usr/bin/env python3
+
+import argparse
+import re
+import statistics
+import sys
+
+import plotly
+import tabulate
+
+def parse_lnt(lines):
+    """
+    Parse lines in LNT format and return a dictionary of the form:
+
+        {
+            'benchmark1': {
+                'metric1': [float],
+                'metric2': [float],
+                ...
+            },
+            'benchmark2': {
+                'metric1': [float],
+                'metric2': [float],
+                ...
+            },
+            ...
+        }
+
+    Each metric may have multiple values.
+    """
+    results = {}
+    for line in lines:
+        line = line.strip()
+        if not line:
+            continue
+
+        (identifier, value) = line.split(' ')
+        (name, metric) = identifier.split('.')
+        if name not in results:
+            results[name] = {}
+        if metric not in results[name]:
+            results[name][metric] = []
+        results[name][metric].append(float(value))
+    return results
+
+def plain_text_comparison(benchmarks, baseline, candidate):
+    """
+    Create a tabulated comparison of the baseline and the candidate.
+    """
+    headers = ['Benchmark', 'Baseline', 'Candidate', 'Difference', '% Difference']
+    fmt = (None, '.2f', '.2f', '.2f', '.2f')
+    table = []
+    for (bm, base, cand) in zip(benchmarks, baseline, candidate):
+        diff = (cand - base) if base and cand else None
+        percent = 100 * (diff / base) if base and cand else None
+        row = [bm, base, cand, diff, percent]
+        table.append(row)
+    return tabulate.tabulate(table, headers=headers, floatfmt=fmt, numalign='right')
+
+def create_chart(benchmarks, baseline, candidate):
+    """
+    Create a bar chart comparing 'baseline' and 'candidate'.
+    """
+    figure = plotly.graph_objects.Figure()
+    figure.add_trace(plotly.graph_objects.Bar(x=benchmarks, y=baseline, name='Baseline'))
+    figure.add_trace(plotly.graph_objects.Bar(x=benchmarks, y=candidate, name='Candidate'))
+    return figure
+
+def prepare_series(baseline, candidate, metric, aggregate=statistics.median):
+    """
+    Prepare the data for being formatted or displayed as a chart.
+
+    Metrics that have more than one value are aggregated using the given aggregation function.
+    """
+    all_benchmarks = sorted(list(set(baseline.keys()) | set(candidate.keys())))
+    baseline_series = []
+    candidate_series = []
+    for bm in all_benchmarks:
+        baseline_series.append(aggregate(baseline[bm][metric]) if bm in baseline and metric in baseline[bm] else None)
+        candidate_series.append(aggregate(candidate[bm][metric]) if bm in candidate and metric in candidate[bm] else None)
+    return (all_benchmarks, baseline_series, candidate_series)
+
+def main(argv):
+    parser = argparse.ArgumentParser(
+        prog='compare-benchmarks',
+        description='Compare the results of two sets of benchmarks in LNT format.',
+        epilog='This script requires the `tabulate` and the `plotly` Python modules.')
+    parser.add_argument('baseline', type=argparse.FileType('r'),
+                        help='Path to a LNT format file containing the benchmark results for the baseline.')
+    parser.add_argument('candidate', type=argparse.FileType('r'),
+                        help='Path to a LNT format file containing the benchmark results for the candidate.')
+    parser.add_argument('--metric', type=str, default='execution_time',
+                        help='The metric to compare. LNT data may contain multiple metrics (e.g. code size, execution time, etc.) -- '
+                             'this option allows selecting which metric is being analyzed. The default is "execution_time".')
+    parser.add_argument('--output', '-o', type=argparse.FileType('w'), default=sys.stdout,
+                        help='Path of a file to write the resulting comparison to. Defaults to stdout.')
+    parser.add_argument('--filter', type=str, required=False,
+                        help='An optional regular expression used to filter the benchmarks included in the comparison. '
+                             'Only benchmarks whose names match the regular expression will be included.')
+    parser.add_argument('--format', type=str, choices=['text', 'chart'], default='text',
+                        help='Select the output format. "text" generates a plain-text comparison in tabular form, and "chart" '
+                             'generates a self-contained HTML graph that can be opened in a browser. The default is "text".')
+    args = parser.parse_args(argv)
+
+    baseline = parse_lnt(args.baseline.readlines())
+    candidate = parse_lnt(args.candidate.readlines())
+
+    if args.filter is not None:
+        regex = re.compile(args.filter)
+        baseline = {k: v for (k, v) in baseline.items() if regex.search(k)}
+        candidate = {k: v for (k, v) in candidate.items() if regex.search(k)}
+
+    (benchmarks, baseline_series, candidate_series) = prepare_series(baseline, candidate, args.metric)
+
+    if args.format == 'chart':
+        figure = create_chart(benchmarks, baseline_series, candidate_series)
+        plotly.io.write_html(figure, file=args.output)
+    else:
+        diff = plain_text_comparison(benchmarks, baseline_series, candidate_series)
+        args.output.write(diff)
+
+if __name__ == '__main__':
+    main(sys.argv[1:])
diff --git a/libcxx/utils/consolidate-benchmarks b/libcxx/utils/consolidate-benchmarks
new file mode 100755
index 0000000000000..c84607f1991c1
--- /dev/null
+++ b/libcxx/utils/consolidate-benchmarks
@@ -0,0 +1,36 @@
+#!/usr/bin/env python3
+
+import argparse
+import pathlib
+import sys
+
+def main(argv):
+    parser = argparse.ArgumentParser(
+        prog='consolidate-benchmarks',
+        description='Consolidate benchmark result files (in LNT format) into a single LNT-format file.')
+    parser.add_argument('files_or_directories', type=str, nargs='+',
+                        help='Path to files or directories containing LNT data to consolidate. Directories are searched '
+                             'recursively for files with a .lnt extension.')
+    parser.add_argument('--output', '-o', type=argparse.FileType('w'), default=sys.stdout,
+                        help='Where to output the result. Defaults to stdout.')
+    args = parser.parse_args(argv)
+
+    files = []
+    for arg in args.files_or_directories:
+        path = pathlib.Path(arg)
+        if path.is_dir():
+            for p in path.rglob('*.lnt'):
+                files.append(p)
+        else:
+            files.append(path)
+
+    for file in files:
+        for line in file.open().readlines():
+            line = line.strip()
+            if not line:
+                continue
+            args.output.write(line)
+            args.output.write('\n')
+
+if __name__ == '__main__':
+    main(sys.argv[1:])
diff --git a/libcxx/utils/libcxx-benchmark-json b/libcxx/utils/libcxx-benchmark-json
deleted file mode 100755
index 7f743c32caf40..0000000000000
--- a/libcxx/utils/libcxx-benchmark-json
+++ /dev/null
@@ -1,57 +0,0 @@
-#!/usr/bin/env bash
-
-set -e
-
-PROGNAME="$(basename "${0}")"
-MONOREPO_ROOT="$(realpath $(dirname "${PROGNAME}"))"
-function usage() {
-cat <<EOF
-Usage:
-${PROGNAME} [-h|--help] <build-directory> benchmarks...
-
-Print the path to the JSON files containing benchmark results for the given benchmarks.
-
-This requires those benchmarks to have already been run, i.e. this only resolves the path
-to the benchmark .json file within the build directory.
-
-<build-directory> The path to the build directory.
-benchmarks... Paths of the benchmarks to extract the results for. Those paths are relative to '<monorepo-root>'.
-
-Example
-=======
-$ cmake -S runtimes -B build/ -DLLVM_ENABLE_RUNTIMES="libcxx;libcxxabi"
-$ libcxx-lit build/ -sv libcxx/test/benchmarks/algorithms/for_each.bench.cpp
-$ less \$(${PROGNAME} build/ libcxx/test/benchmarks/algorithms/for_each.bench.cpp)
-EOF
-}
-
-if [[ "${1}" == "-h" || "${1}" == "--help" ]]; then
- usage
- exit 0
-fi
-
-if [[ $# -lt 1 ]]; then
- usage
- exit 1
-fi
-
-build_dir="${1}"
-shift
-
-for benchmark in ${@}; do
- # Normalize the paths by turning all benchmarks paths into absolute ones and then making them
- # relative to the root of the monorepo.
- benchmark="$(realpath ${benchmark})"
- relative=$(python -c "import os; import sys; print(os.path.relpath(sys.argv[1], sys.argv[2]))" "${benchmark}" "${MONOREPO_ROOT}")
-
- # Extract components of the benchmark path
- directory="$(dirname ${relative})"
- file="$(basename ${relative})"
-
- # Reconstruct the (slightly weird) path to the benchmark json file. This should be kept in sync
- # whenever the test suite changes.
- json="${build_dir}/${directory}/Output/${file}.dir/benchmark-result.json"
- if [[ -f "${json}" ]]; then
- echo "${json}"
- fi
-done
diff --git a/libcxx/utils/libcxx-compare-benchmarks b/libcxx/utils/libcxx-compare-benchmarks
deleted file mode 100755
index 08c53b2420c8e..0000000000000
--- a/libcxx/utils/libcxx-compare-benchmarks
+++ /dev/null
@@ -1,73 +0,0 @@
-#!/usr/bin/env bash
-
-set -e
-
-PROGNAME="$(basename "${0}")"
-MONOREPO_ROOT="$(realpath $(dirname "${PROGNAME}"))"
-function usage() {
-cat <<EOF
-Usage:
-${PROGNAME} [-h|--help] <baseline-build> <candidate-build> benchmarks... [-- gbench-args...]
-
-Compare the given benchmarks between the baseline and the candidate build directories.
-
-This requires those benchmarks to have already been generated in both build directories.
-
-<baseline-build> The path to the build directory considered the baseline.
-<candidate-build> The path to the build directory considered the candidate.
-benchmarks... Paths of the benchmarks to compare. Those paths are relative to '<monorepo-root>'.
-[-- gbench-args...] Any arguments provided after '--' will be passed as-is to GoogleBenchmark's compare.py tool.
-
-Example
-=======
-$ libcxx-lit build1/ -sv libcxx/test/benchmarks/algorithms/for_each.bench.cpp
-$ libcxx-lit build2/ -sv libcxx/test/benchmarks/algorithms/for_each.bench.cpp
-$ ${PROGNAME} build1/ build2/ libcxx/test/benchmarks/algorithms/for_each.bench.cpp
-EOF
-}
-
-if [[ "${1}" == "-h" || "${1}" == "--help" ]]; then
- usage
- exit 0
-fi
-
-if [[ $# -lt 1 ]]; then
- usage
- exit 1
-fi
-
-baseline="${1}"
-candidate="${2}"
-shift; shift
-
-GBENCH="${MONOREPO_ROOT}/third-party/benchmark"
-
-python3 -m venv /tmp/libcxx-compare-benchmarks-venv
-source /tmp/libcxx-compare-benchmarks-venv/bin/activate
-pip3 install -r ${GBENCH}/tools/requirements.txt
-
-benchmarks=""
-while [[ $# -gt 0 ]]; do
- if [[ "${1}" == "--" ]]; then
- shift
- break
- fi
- benchmarks+=" ${1}"
- shift
-done
-
-for benchmark in ${benchmarks}; do
- base="$(${MONOREPO_ROOT}/libcxx/utils/libcxx-benchmark-json ${baseline} ${benchmark})"
- cand="$(${MONOREPO_ROOT}/libcxx/utils/libcxx-benchmark-json ${candidate} ${benchmark})"
-
- if [[ ! -e "${base}" ]]; then
- echo "Benchmark ${benchmark} does not exist in the baseline"
- continue
- fi
- if [[ ! -e "${cand}" ]]; then
- echo "Benchmark ${benchmark} does not exist in the candidate"
- continue
- fi
-
- "${GBENCH}/tools/compare.py" benchmarks "${base}" "${cand}" ${@}
-done
diff --git a/libcxx/utils/requirements.txt b/libcxx/utils/requirements.txt
new file mode 100644
index 0000000000000..de6e123eec54a
--- /dev/null
+++ b/libcxx/utils/requirements.txt
@@ -0,0 +1,2 @@
+plotly
+tabulate