[libcxx-commits] [libcxx] [libc++] Measure retired CPU instructions when running SPEC (PR #177669)

Mon Jan 26 08:40:12 PST 2026

llvmbot wrote:




@llvm/pr-subscribers-libcxx

Author: Louis Dionne (ldionne)

<details>
<summary>Changes</summary>

Fixes #177611

Things to check:
1. [x] Extract building and running of SPEC benchmarks into distinct steps and make sure it works as intended
2. [x] Time each step to ensure this doesn't add too much overhead
3. [x] Run the benchmarks a few times to ensure there's not too much variability

(1) I checked the SPEC output: the code isn't rebuilt in the second `runcpu` call, as we'd expect.
(2) Building and running `omnetpp_r` takes 46s in total before the patch and 50s after it. I think this overhead is acceptable.
(3) Most of the metrics we're gathering seem to have no more variability than the execution time, so I think this is acceptable too. See [this](https://github.com/user-attachments/files/24864249/omnetpp_r.spread.of.metrics.-.Sheet1.pdf) for the data from 10 sample runs of `omnetpp_r`.
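
One way to quantify the comparison in (3) is the coefficient of variation (sample standard deviation divided by mean) of each metric across runs. The following is a minimal sketch, assuming the samples have been collected into a dict of lists. The function name `relative_spread` is hypothetical, and the numbers are placeholders for illustration, not the data from the linked PDF.

```python
import statistics

def relative_spread(samples):
    """Return the coefficient of variation (sample stdev / mean) per metric."""
    return {
        metric: statistics.stdev(values) / statistics.mean(values)
        for metric, values in samples.items()
    }

# Placeholder values for illustration only; these are NOT real measurements.
example = {
    'execution_time': [10.0, 12.0, 11.0],
    'instructions': [1000, 1001, 999],
}

spread = relative_spread(example)

# A metric is considered no noisier than the execution time if its
# relative spread is not larger than that of the execution time.
stable = [m for m, cv in spread.items() if cv <= spread['execution_time']]
```

Using a relative measure makes metrics with very different magnitudes (seconds vs. instruction counts) directly comparable.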



---
Full diff: https://github.com/llvm/llvm-project/pull/177669.diff


2 Files Affected:

- (modified) libcxx/test/benchmarks/spec.gen.py (+13-4) 
- (added) libcxx/utils/parse-time-output (+42) 


``````````diff

diff --git a/libcxx/test/benchmarks/spec.gen.py b/libcxx/test/benchmarks/spec.gen.py
index f355b2c6036d1..c08541a57ac7c 100644
--- a/libcxx/test/benchmarks/spec.gen.py
+++ b/libcxx/test/benchmarks/spec.gen.py
@@ -66,11 +66,19 @@
 
 for benchmark in spec_benchmarks:
     print(f'#--- {benchmark}.sh.test')
-    print(f'RUN: rm -rf %{{temp}}') # clean up any previous (potentially incomplete) run
+    # Clean up any previous (potentially incomplete) run
+    print(f'RUN: rm -rf %{{temp}}')
+
+    # Build the benchmark
     print(f'RUN: mkdir %{{temp}}')
     print(f'RUN: cp {spec_config} %{{temp}}/spec-config.cfg')
-    print(f'RUN: %{{spec_dir}}/bin/runcpu --config %{{temp}}/spec-config.cfg --size train --output-root %{{temp}} --rebuild {benchmark}')
-    print(f'RUN: rm -rf %{{temp}}/benchspec') # remove the temporary directory, which can become quite large
+    print(f'RUN: %{{spec_dir}}/bin/runcpu --config %{{temp}}/spec-config.cfg --action build --output_root %{{temp}} {benchmark}')
+
+    # Run the benchmark
+    print(f'RUN: /usr/bin/time -l -o %{{temp}}/time.txt %{{spec_dir}}/bin/runcpu --config %{{temp}}/spec-config.cfg --action run --size train --output_root %{{temp}} {benchmark}')
+
+    # Clean up, since there can be lots of content created
+    print(f'RUN: rm -rf %{{temp}}/benchspec')
 
     # The `runcpu` command above doesn't fail even if the benchmark fails to run. To determine failure, parse the CSV
     # results and ensure there are no compilation errors or runtime errors in the status row. Also print the logs and
@@ -78,6 +86,7 @@
     print(f'RUN: %{{libcxx-dir}}/utils/parse-spec-results --extract "Base Status" --keep-failed %{{temp}}/result/*.train.csv > %{{temp}}/status || ! cat %{{temp}}/result/*.log')
     print(f'RUN: ! grep -E "CE|RE" %{{temp}}/status || ! cat %{{temp}}/result/*.log')
 
-    # If there were no errors, parse the results into LNT-compatible format and print them.
+    # If there were no errors, parse the SPEC results and the `time` output into LNT-compatible format and print them.
     print(f'RUN: %{{libcxx-dir}}/utils/parse-spec-results %{{temp}}/result/*.train.csv --output-format=lnt > %{{temp}}/results.lnt')
+    print(f'RUN: %{{libcxx-dir}}/utils/parse-time-output %{{temp}}/time.txt --benchmark {benchmark} --extract instructions max_rss cycles peak_memory >> %{{temp}}/results.lnt')
     print(f'RUN: cat %{{temp}}/results.lnt')
diff --git a/libcxx/utils/parse-time-output b/libcxx/utils/parse-time-output
new file mode 100755
index 0000000000000..3be225226b7e0
--- /dev/null
+++ b/libcxx/utils/parse-time-output
@@ -0,0 +1,42 @@
+#!/usr/bin/env python3
+
+import argparse
+import re
+import sys
+
+def main(argv):
+    parser = argparse.ArgumentParser(
+        prog='parse-time-output',
+        description='Parse the output of /usr/bin/time and output it in LNT-compatible format.')
+    parser.add_argument('input_file', type=argparse.FileType('r'), default='-',
+        help='Path of the file to extract results from. By default, stdin.')
+    parser.add_argument('--benchmark', type=str, required=True,
+        help='The name of the benchmark to use in the resulting LNT output.')
+    parser.add_argument('--extract', type=str, choices=['instructions', 'max_rss', 'cycles', 'peak_memory'], nargs='+',
+        help='The name of the metrics to extract from the time output.')
+    args = parser.parse_args(argv)
+
+    # Mapping from metric names to field names in the time output.
+    field_mapping = {
+        'instructions': 'instructions retired',
+        'max_rss': 'maximum resident set size',
+        'cycles': 'cycles elapsed',
+        'peak_memory': 'peak memory footprint',
+    }
+    to_extract = [field_mapping[e] for e in args.extract]
+
+    metrics = {}
+    for line in args.input_file:
+        match = re.match(r'\s*(\d+)\s+(\w+.*)', line)
+        if match is not None:
+            time_desc = match.group(2)
+            for metric, desc in field_mapping.items():
+                if time_desc == desc:
+                    metrics[metric] = int(match.group(1))
+                    break
+
+    for metric, value in metrics.items():
+        print(f'{args.benchmark}.{metric} {value}')
+
+if __name__ == '__main__':
+    main(sys.argv[1:])

``````````
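
The parsing step in `parse-time-output` can be sketched in isolation as follows. This is a simplified illustration rather than the script from the patch: the function `parse_time_output` and the sample input are hypothetical, the sample layout is an assumption about what `/usr/bin/time -l` prints, and the numbers are made up. Unlike the loop in the diff, which matches against every entry of `field_mapping`, this sketch only reports the metrics that were requested.

```python
import re

# Field names copied from the `field_mapping` dict in the diff.
FIELD_MAPPING = {
    'instructions': 'instructions retired',
    'max_rss': 'maximum resident set size',
    'cycles': 'cycles elapsed',
    'peak_memory': 'peak memory footprint',
}

def parse_time_output(lines, benchmark, extract):
    """Return '<benchmark>.<metric> <value>' lines for the requested metrics."""
    # Invert the mapping, restricted to the metrics the caller asked for.
    wanted = {FIELD_MAPPING[name]: name for name in extract}
    results = []
    for line in lines:
        # Expect an integer followed by a textual description of the field.
        match = re.match(r'\s*(\d+)\s+(\w.*?)\s*$', line)
        if match is None:
            continue
        metric = wanted.get(match.group(2))
        if metric is not None:
            results.append(f'{benchmark}.{metric} {int(match.group(1))}')
    return results

# Assumed shape of the time output; all values are invented for illustration.
sample = """\
        1.23 real         1.00 user         0.10 sys
            52428800  maximum resident set size
         12345678901  instructions retired
          4567890123  cycles elapsed
            41943040  peak memory footprint
"""

lines_out = parse_time_output(sample.splitlines(), 'omnetpp_r',
                              ['instructions', 'cycles'])
```

The first line of the sample is skipped because `1.23` is not a plain integer followed by whitespace, so only the integer-valued resource lines are considered.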

</details>


https://github.com/llvm/llvm-project/pull/177669