[llvm] de22d71 - [llvm-exegesis] 'Min' repetition mode

Wed Apr 1 23:29:17 PDT 2020

Author: Roman Lebedev
Date: 2020-04-02T09:28:35+03:00
New Revision: de22d7154b4a1400c3b19e3df107a7e178c3a604

URL: https://github.com/llvm/llvm-project/commit/de22d7154b4a1400c3b19e3df107a7e178c3a604
DIFF: https://github.com/llvm/llvm-project/commit/de22d7154b4a1400c3b19e3df107a7e178c3a604.diff

LOG: [llvm-exegesis] 'Min' repetition mode

Summary:
As noted in documentation, different repetition modes have different trade-offs:

> .. option:: -repetition-mode=[duplicate|loop]
>
>  Specify the repetition mode. `duplicate` will create a large, straight line
>  basic block with `num-repetitions` copies of the snippet. `loop` will wrap
>  the snippet in a loop which will be run `num-repetitions` times. The `loop`
>  mode tends to better hide the effects of the CPU frontend on architectures
>  that cache decoded instructions, but consumes a register for counting
>  iterations.

Indeed. Example:

>>! In D74156#1873657, @lebedev.ri wrote:
> At least for `CMOV`, i'm seeing wildly different results
> |           | Latency | RThroughput |
> | duplicate | 1       | 0.8         |
> | loop      | 2       | 0.6         |
> where latency=1 seems correct, and i'd expect the throughput to be close to 1/2 (since there are two execution units).

This isn't great for analysis, at least for schedule model development.

As discussed in excruciating detail in

>>! In D74156#1924514, @gchatelet wrote:
>>>! In D74156#1920632, @lebedev.ri wrote:
>> ... did that explanation of the question i'm having make any sense?
>
> Thx for digging in the conversation !
> Ok it makes more sense now.
>
> I discussed it a bit with @courbet:
>  - We want the analysis tool to stay simple so we'd rather not make it knowledgeable of the repetition mode.
>  - We'd like to still be able to select either repetition mode to dig into special cases
>
> So we could add a third `min` repetition mode that would run both and take the minimum. It could be the default option.
> Would you have some time to look what it would take to add this third mode?

there appears to be agreement that this is indeed sub-par,
and that we should provide an optional way to rectify the situation
at measurement time (not at analysis time!).

However, the solution isn't entirely straightforward.

We can't just add an actual 'multiplexer' `MinSnippetRepetitor`, because
if we just concatenate snippets produced by `DuplicateSnippetRepetitor`
and `LoopSnippetRepetitor` and run+measure that, the measurement will
naturally be different from what we'd get by running+measuring
them separately and taking the min.
([[ https://www.wolframalpha.com/input/?i=%28x%2By%29%2F2+%21%3D+min%28x%2C+y%29 | `time(D+L)/2 != min(time(D), time(L))` ]])
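
To make the inequality concrete, here is a minimal sketch using the two
per-instruction values measured further down (0.819 for `duplicate`,
0.6083 for `loop`). Both helper names are hypothetical, and the "costs simply
add up" model of a concatenated snippet is a simplifying assumption, not
something the tool does:
```cpp
#include <algorithm>
#include <cassert>

// Hypothetical model: per-instruction value of a concatenated D+L snippet,
// assuming the two halves contribute equally and their costs simply add.
double concatenatedPerInstruction(double Duplicate, double Loop) {
  assert(Duplicate >= 0.0 && Loop >= 0.0 && "measurements are non-negative");
  return (Duplicate + Loop) / 2.0;
}

// What a 'min' aggregation of two separate runs would report instead.
double minPerInstruction(double Duplicate, double Loop) {
  assert(Duplicate >= 0.0 && Loop >= 0.0 && "measurements are non-negative");
  return std::min(Duplicate, Loop);
}
```
With those inputs the averaged value lands strictly between the two
measurements, while the minimum is the `loop` value itself.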

Also, it seems best to me to have a single snippet instead of generating
a snippet per repetition mode, since the only difference here is that the
loop repetition mode reserves one register for loop counter.
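
A single snippet that is valid for every mode has to avoid any register that
any of the modes reserves. A minimal sketch of that union, assuming a made-up
16-register file and using `std::bitset` as a stand-in for `BitVector`
(`NumRegs`, `RegSet` and `unionReservedRegs` are hypothetical names):
```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical register file size, for illustration only.
constexpr std::size_t NumRegs = 16;
using RegSet = std::bitset<NumRegs>;

// OR together the registers reserved by each repetition mode, so that one
// generated snippet can be reused unchanged by all of them.
RegSet unionReservedRegs(const std::vector<RegSet> &PerMode) {
  assert(!PerMode.empty() && "expected at least one repetition mode");
  RegSet All;
  for (const RegSet &Reserved : PerMode)
    All |= Reserved;
  return All;
}
```
If the duplicate mode reserves nothing and the loop mode reserves one register
for its counter, the union contains just that one register.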

As far as i can tell, we can either teach `BenchmarkRunner::runConfiguration()`
to produce a single report given multiple repetitors (as in the patch),
or do that one layer higher - don't modify `BenchmarkRunner::runConfiguration()`,
produce multiple reports, don't actually print each one, but aggregate them somehow
and only print the final one.

Initially i've gone ahead with the latter approach, but it didn't look like a natural fit;
the former (as in the diff) does seem like a better fit to me.
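
The aggregation step can be sketched in isolation as an element-wise minimum
over two measurement lists. This is a standalone illustration under stated
assumptions: `Measure` is a simplified stand-in for `BenchmarkMeasure`, and
`aggregateMin` is a hypothetical helper, not a function in the patch:
```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Simplified stand-in for one measurement (illustrative only).
struct Measure {
  std::string Key;
  double PerInstructionValue;
  double PerSnippetValue;
};

// Fold the measurements of one more repetition mode into the accumulated
// result, keeping the smaller value for each key.
std::vector<Measure> aggregateMin(std::vector<Measure> Acc,
                                  const std::vector<Measure> &New) {
  if (Acc.empty())
    return New; // First mode: nothing to compare against yet.
  assert(Acc.size() == New.size() && "expected the same set of measurements");
  for (std::size_t I = 0; I < Acc.size(); ++I) {
    assert(Acc[I].Key == New[I].Key && "expected matching keys");
    Acc[I].PerInstructionValue =
        std::min(Acc[I].PerInstructionValue, New[I].PerInstructionValue);
    Acc[I].PerSnippetValue =
        std::min(Acc[I].PerSnippetValue, New[I].PerSnippetValue);
  }
  return Acc;
}
```
Note that the two fields are minimized independently, so in principle they
could come from different modes.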

There's also a question of the test coverage. It sure currently does work here:
```
$ ./bin/llvm-exegesis --opcode-name=CMOV64rr --mode=inverse_throughput --repetition-mode=duplicate
Check generated assembly with: /usr/bin/objdump -d /tmp/snippet-8fb949.o
---
mode:            inverse_throughput
key:
  instructions:
    - 'CMOV64rr RAX RAX R11 i_0x0'
    - 'CMOV64rr RBP RBP R15 i_0x0'
    - 'CMOV64rr RBX RBX RBX i_0x0'
    - 'CMOV64rr RCX RCX RBX i_0x0'
    - 'CMOV64rr RDI RDI R10 i_0x0'
    - 'CMOV64rr RDX RDX RAX i_0x0'
    - 'CMOV64rr RSI RSI RAX i_0x0'
    - 'CMOV64rr R8 R8 R8 i_0x0'
    - 'CMOV64rr R9 R9 RDX i_0x0'
    - 'CMOV64rr R10 R10 RBX i_0x0'
    - 'CMOV64rr R11 R11 R14 i_0x0'
    - 'CMOV64rr R12 R12 R9 i_0x0'
    - 'CMOV64rr R13 R13 R12 i_0x0'
    - 'CMOV64rr R14 R14 R15 i_0x0'
    - 'CMOV64rr R15 R15 R13 i_0x0'
  config:          ''
  register_initial_values:
    - 'RAX=0x0'
    - 'R11=0x0'
    - 'EFLAGS=0x0'
    - 'RBP=0x0'
    - 'R15=0x0'
    - 'RBX=0x0'
    - 'RCX=0x0'
    - 'RDI=0x0'
    - 'R10=0x0'
    - 'RDX=0x0'
    - 'RSI=0x0'
    - 'R8=0x0'
    - 'R9=0x0'
    - 'R14=0x0'
    - 'R12=0x0'
    - 'R13=0x0'
cpu_name:        bdver2
llvm_triple:     x86_64-unknown-linux-gnu
num_repetitions: 10000
measurements:
  - { key: inverse_throughput, value: 0.819, per_snippet_value: 12.285 }
error:           ''
info:            instruction has tied variables, using static renaming.
assembled_snippet: 5541574156415541545348B8000000000000000049BB00000000000000004883EC08C7042400000000C7442404000000009D48BD000000000000000049BF000000000000000048BB000000000000000048B9000000000000000048BF000000000000000049BA000000000000000048BA000000000000000048BE000000000000000049B8000000000000000049B9000000000000000049BE000000000000000049BC000000000000000049BD0000000000000000490F40C3490F40EF480F40DB480F40CB490F40FA480F40D0480F40F04D0F40C04C0F40CA4C0F40D34D0F40DE4D0F40E14D0F40EC4D0F40F74D0F40FD490F40C35B415C415D415E415F5DC3
...
$ ./bin/llvm-exegesis --opcode-name=CMOV64rr --mode=inverse_throughput --repetition-mode=loop
Check generated assembly with: /usr/bin/objdump -d /tmp/snippet-051eb3.o
---
mode:            inverse_throughput
key:
  instructions:
    - 'CMOV64rr RAX RAX R11 i_0x0'
    - 'CMOV64rr RBP RBP RSI i_0x0'
    - 'CMOV64rr RBX RBX R9 i_0x0'
    - 'CMOV64rr RCX RCX RSI i_0x0'
    - 'CMOV64rr RDI RDI RBP i_0x0'
    - 'CMOV64rr RDX RDX R9 i_0x0'
    - 'CMOV64rr RSI RSI RDI i_0x0'
    - 'CMOV64rr R9 R9 R12 i_0x0'
    - 'CMOV64rr R10 R10 R11 i_0x0'
    - 'CMOV64rr R11 R11 R9 i_0x0'
    - 'CMOV64rr R12 R12 RBP i_0x0'
    - 'CMOV64rr R13 R13 RSI i_0x0'
    - 'CMOV64rr R14 R14 R14 i_0x0'
    - 'CMOV64rr R15 R15 R10 i_0x0'
  config:          ''
  register_initial_values:
    - 'RAX=0x0'
    - 'R11=0x0'
    - 'EFLAGS=0x0'
    - 'RBP=0x0'
    - 'RSI=0x0'
    - 'RBX=0x0'
    - 'R9=0x0'
    - 'RCX=0x0'
    - 'RDI=0x0'
    - 'RDX=0x0'
    - 'R12=0x0'
    - 'R10=0x0'
    - 'R13=0x0'
    - 'R14=0x0'
    - 'R15=0x0'
cpu_name:        bdver2
llvm_triple:     x86_64-unknown-linux-gnu
num_repetitions: 10000
measurements:
  - { key: inverse_throughput, value: 0.6083, per_snippet_value: 8.5162 }
error:           ''
info:            instruction has tied variables, using static renaming.
assembled_snippet: 5541574156415541545348B8000000000000000049BB00000000000000004883EC08C7042400000000C7442404000000009D48BD000000000000000048BE000000000000000048BB000000000000000049B9000000000000000048B9000000000000000048BF000000000000000048BA000000000000000049BC000000000000000049BA000000000000000049BD000000000000000049BE000000000000000049BF000000000000000049B80200000000000000490F40C3480F40EE490F40D9480F40CE480F40FD490F40D1480F40F74D0F40CC4D0F40D34D0F40D94C0F40E54C0F40EE4D0F40F64D0F40FA4983C0FF75C25B415C415D415E415F5DC3
...
$ ./bin/llvm-exegesis --opcode-name=CMOV64rr --mode=inverse_throughput --repetition-mode=min
Check generated assembly with: /usr/bin/objdump -d /tmp/snippet-c7a47d.o
Check generated assembly with: /usr/bin/objdump -d /tmp/snippet-2581f1.o
---
mode:            inverse_throughput
key:
  instructions:
    - 'CMOV64rr RAX RAX R11 i_0x0'
    - 'CMOV64rr RBP RBP R10 i_0x0'
    - 'CMOV64rr RBX RBX R10 i_0x0'
    - 'CMOV64rr RCX RCX RDX i_0x0'
    - 'CMOV64rr RDI RDI RAX i_0x0'
    - 'CMOV64rr RDX RDX R9 i_0x0'
    - 'CMOV64rr RSI RSI RAX i_0x0'
    - 'CMOV64rr R9 R9 RBX i_0x0'
    - 'CMOV64rr R10 R10 R12 i_0x0'
    - 'CMOV64rr R11 R11 RDI i_0x0'
    - 'CMOV64rr R12 R12 RDI i_0x0'
    - 'CMOV64rr R13 R13 RDI i_0x0'
    - 'CMOV64rr R14 R14 R9 i_0x0'
    - 'CMOV64rr R15 R15 RBP i_0x0'
  config:          ''
  register_initial_values:
    - 'RAX=0x0'
    - 'R11=0x0'
    - 'EFLAGS=0x0'
    - 'RBP=0x0'
    - 'R10=0x0'
    - 'RBX=0x0'
    - 'RCX=0x0'
    - 'RDX=0x0'
    - 'RDI=0x0'
    - 'R9=0x0'
    - 'RSI=0x0'
    - 'R12=0x0'
    - 'R13=0x0'
    - 'R14=0x0'
    - 'R15=0x0'
cpu_name:        bdver2
llvm_triple:     x86_64-unknown-linux-gnu
num_repetitions: 10000
measurements:
  - { key: inverse_throughput, value: 0.6073, per_snippet_value: 8.5022 }
error:           ''
info:            instruction has tied variables, using static renaming.
assembled_snippet: 5541574156415541545348B8000000000000000049BB00000000000000004883EC08C7042400000000C7442404000000009D48BD000000000000000049BA000000000000000048BB000000000000000048B9000000000000000048BA000000000000000048BF000000000000000049B9000000000000000048BE000000000000000049BC000000000000000049BD000000000000000049BE000000000000000049BF0000000000000000490F40C3490F40EA490F40DA480F40CA480F40F8490F40D1480F40F04C0F40CB4D0F40D44C0F40DF4C0F40E74C0F40EF4D0F40F14C0F40FD490F40C3490F40EA5B415C415D415E415F5DC35541574156415541545348B8000000000000000049BB00000000000000004883EC08C7042400000000C7442404000000009D48BD000000000000000049BA000000000000000048BB000000000000000048B9000000000000000048BA000000000000000048BF000000000000000049B9000000000000000048BE000000000000000049BC000000000000000049BD000000000000000049BE000000000000000049BF000000000000000049B80200000000000000490F40C3490F40EA490F40DA480F40CA480F40F8490F40D1480F40F04C0F40CB4D0F40D44C0F40DF4C0F40E74C0F40EF4D0F40F14C0F40FD4983C0FF75C25B415C415D415E415F5DC3
...
```
but i'm open to suggestions as to how to test that.

I also have gone with the suggestion to default to this new mode.
This was irking me for some time, so i'm happy to finally see progress here.
Looking forward to feedback.

Reviewers: courbet, gchatelet

Reviewed By: courbet, gchatelet

Subscribers: mstojanovic, RKSimon, llvm-commits, courbet, gchatelet

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D76921

Added: 
    

Modified: 
    llvm/docs/CommandGuide/llvm-exegesis.rst
    llvm/tools/llvm-exegesis/lib/BenchmarkResult.h
    llvm/tools/llvm-exegesis/lib/BenchmarkRunner.cpp
    llvm/tools/llvm-exegesis/lib/BenchmarkRunner.h
    llvm/tools/llvm-exegesis/lib/SnippetRepetitor.cpp
    llvm/tools/llvm-exegesis/llvm-exegesis.cpp

Removed: 
    


################################################################################
diff  --git a/llvm/docs/CommandGuide/llvm-exegesis.rst b/llvm/docs/CommandGuide/llvm-exegesis.rst
index 31be33cc861f..321cdf5a6dab 100644

--- a/llvm/docs/CommandGuide/llvm-exegesis.rst
+++ b/llvm/docs/CommandGuide/llvm-exegesis.rst
@@ -196,14 +196,16 @@ OPTIONS
  to specify at least one of the `-analysis-clusters-output-file=` and
  `-analysis-inconsistencies-output-file=`.
 
-.. option:: -repetition-mode=[duplicate|loop]
+.. option:: -repetition-mode=[duplicate|loop|min]
 
  Specify the repetition mode. `duplicate` will create a large, straight line
  basic block with `num-repetitions` copies of the snippet. `loop` will wrap
  the snippet in a loop which will be run `num-repetitions` times. The `loop`
  mode tends to better hide the effects of the CPU frontend on architectures
  that cache decoded instructions, but consumes a register for counting
- iterations.
+ iterations. If performing an analysis over many opcodes, it may be best
+ to instead use the `min` mode, which will run each other mode, and produce
+ the minimal measured result.
 
 .. option:: -num-repetitions=<Number of repetitions>
 

diff  --git a/llvm/tools/llvm-exegesis/lib/BenchmarkResult.h b/llvm/tools/llvm-exegesis/lib/BenchmarkResult.h
index 1788e7490685..d4bad347f604 100644
--- a/llvm/tools/llvm-exegesis/lib/BenchmarkResult.h
+++ b/llvm/tools/llvm-exegesis/lib/BenchmarkResult.h
@@ -68,8 +68,7 @@ struct InstructionBenchmark {
   // The number of instructions inside the repeated snippet. For example, if a
   // snippet of 3 instructions is repeated 4 times, this is 12.
   int NumRepetitions = 0;
-  enum RepetitionModeE { Duplicate, Loop };
-  RepetitionModeE RepetitionMode;
+  enum RepetitionModeE { Duplicate, Loop, AggregateMin };
   // Note that measurements are per instruction.
   std::vector<BenchmarkMeasure> Measurements;
   std::string Error;

diff  --git a/llvm/tools/llvm-exegesis/lib/BenchmarkRunner.cpp b/llvm/tools/llvm-exegesis/lib/BenchmarkRunner.cpp
index 5b15bc29c60d..be778f29f52f 100644
--- a/llvm/tools/llvm-exegesis/lib/BenchmarkRunner.cpp
+++ b/llvm/tools/llvm-exegesis/lib/BenchmarkRunner.cpp
@@ -14,6 +14,7 @@
 #include "Error.h"
 #include "MCInstrDescView.h"
 #include "PerfHelper.h"
+#include "llvm/ADT/ScopeExit.h"
 #include "llvm/ADT/StringExtras.h"
 #include "llvm/ADT/StringRef.h"
 #include "llvm/ADT/Twine.h"
@@ -81,7 +82,8 @@ class FunctionExecutorImpl : public BenchmarkRunner::FunctionExecutor {
 
 Expected<InstructionBenchmark> BenchmarkRunner::runConfiguration(
     const BenchmarkCode &BC, unsigned NumRepetitions,
-    const SnippetRepetitor &Repetitor, bool DumpObjectToDisk) const {
+    ArrayRef<std::unique_ptr<const SnippetRepetitor>> Repetitors,
+    bool DumpObjectToDisk) const {
   InstructionBenchmark InstrBenchmark;
   InstrBenchmark.Mode = Mode;
   InstrBenchmark.CpuName = std::string(State.getTargetMachine().getTargetCPU());
@@ -94,70 +96,113 @@ Expected<InstructionBenchmark> BenchmarkRunner::runConfiguration(
 
   InstrBenchmark.Key = BC.Key;
 
-  // Assemble at least kMinInstructionsForSnippet instructions by repeating the
-  // snippet for debug/analysis. This is so that the user clearly understands
-  // that the inside instructions are repeated.
-  constexpr const int kMinInstructionsForSnippet = 16;
-  {
-    SmallString<0> Buffer;
-    raw_svector_ostream OS(Buffer);
-    if (Error E = assembleToStream(
-            State.getExegesisTarget(), State.createTargetMachine(), BC.LiveIns,
-            BC.Key.RegisterInitialValues,
-            Repetitor.Repeat(Instructions, kMinInstructionsForSnippet), OS)) {
-      return std::move(E);
+  // If we end up having an error, and we've previously succeeded with
+  // some other Repetitor, we want to discard the previous measurements.
+  struct ClearBenchmarkOnReturn {
+    ClearBenchmarkOnReturn(InstructionBenchmark *IB) : IB(IB) {}
+    ~ClearBenchmarkOnReturn() {
+      if (Clear)
+        IB->Measurements.clear();
+    }
+    void disarm() { Clear = false; }
+
+  private:
+    InstructionBenchmark *const IB;
+    bool Clear = true;
+  };
+  ClearBenchmarkOnReturn CBOR(&InstrBenchmark);
+
+  for (const std::unique_ptr<const SnippetRepetitor> &Repetitor : Repetitors) {
+    // Assemble at least kMinInstructionsForSnippet instructions by repeating
+    // the snippet for debug/analysis. This is so that the user clearly
+    // understands that the inside instructions are repeated.
+    constexpr const int kMinInstructionsForSnippet = 16;
+    {
+      SmallString<0> Buffer;
+      raw_svector_ostream OS(Buffer);
+      if (Error E = assembleToStream(
+              State.getExegesisTarget(), State.createTargetMachine(),
+              BC.LiveIns, BC.Key.RegisterInitialValues,
+              Repetitor->Repeat(Instructions, kMinInstructionsForSnippet),
+              OS)) {
+        return std::move(E);
+      }
+      const ExecutableFunction EF(State.createTargetMachine(),
+                                  getObjectFromBuffer(OS.str()));
+      const auto FnBytes = EF.getFunctionBytes();
+      InstrBenchmark.AssembledSnippet.insert(
+          InstrBenchmark.AssembledSnippet.end(), FnBytes.begin(),
+          FnBytes.end());
     }
-    const ExecutableFunction EF(State.createTargetMachine(),
-                                getObjectFromBuffer(OS.str()));
-    const auto FnBytes = EF.getFunctionBytes();
-    InstrBenchmark.AssembledSnippet.assign(FnBytes.begin(), FnBytes.end());
-  }
 
-  // Assemble NumRepetitions instructions repetitions of the snippet for
-  // measurements.
-  const auto Filler =
-      Repetitor.Repeat(Instructions, InstrBenchmark.NumRepetitions);
+    // Assemble NumRepetitions instructions repetitions of the snippet for
+    // measurements.
+    const auto Filler =
+        Repetitor->Repeat(Instructions, InstrBenchmark.NumRepetitions);
+
+    object::OwningBinary<object::ObjectFile> ObjectFile;
+    if (DumpObjectToDisk) {
+      auto ObjectFilePath = writeObjectFile(BC, Filler);
+      if (Error E = ObjectFilePath.takeError()) {
+        InstrBenchmark.Error = toString(std::move(E));
+        return InstrBenchmark;
+      }
+      outs() << "Check generated assembly with: /usr/bin/objdump -d "
+             << *ObjectFilePath << "\n";
+      ObjectFile = getObjectFromFile(*ObjectFilePath);
+    } else {
+      SmallString<0> Buffer;
+      raw_svector_ostream OS(Buffer);
+      if (Error E = assembleToStream(
+              State.getExegesisTarget(), State.createTargetMachine(),
+              BC.LiveIns, BC.Key.RegisterInitialValues, Filler, OS)) {
+        return std::move(E);
+      }
+      ObjectFile = getObjectFromBuffer(OS.str());
+    }
 
-  object::OwningBinary<object::ObjectFile> ObjectFile;
-  if (DumpObjectToDisk) {
-    auto ObjectFilePath = writeObjectFile(BC, Filler);
-    if (Error E = ObjectFilePath.takeError()) {
+    const FunctionExecutorImpl Executor(State, std::move(ObjectFile),
+                                        Scratch.get());
+    auto NewMeasurements = runMeasurements(Executor);
+    if (Error E = NewMeasurements.takeError()) {
+      if (!E.isA<SnippetCrash>())
+        return std::move(E);
       InstrBenchmark.Error = toString(std::move(E));
       return InstrBenchmark;
     }
-    outs() << "Check generated assembly with: /usr/bin/objdump -d "
-           << *ObjectFilePath << "\n";
-    ObjectFile = getObjectFromFile(*ObjectFilePath);
-  } else {
-    SmallString<0> Buffer;
-    raw_svector_ostream OS(Buffer);
-    if (Error E = assembleToStream(State.getExegesisTarget(),
-                                   State.createTargetMachine(), BC.LiveIns,
-                                   BC.Key.RegisterInitialValues, Filler, OS)) {
-      return std::move(E);
+    assert(InstrBenchmark.NumRepetitions > 0 && "invalid NumRepetitions");
+    for (BenchmarkMeasure &BM : *NewMeasurements) {
+      // Scale the measurements by instruction.
+      BM.PerInstructionValue /= InstrBenchmark.NumRepetitions;
+      // Scale the measurements by snippet.
+      BM.PerSnippetValue *= static_cast<double>(Instructions.size()) /
+                            InstrBenchmark.NumRepetitions;
+    }
+    if (InstrBenchmark.Measurements.empty()) {
+      InstrBenchmark.Measurements = std::move(*NewMeasurements);
+      continue;
     }
-    ObjectFile = getObjectFromBuffer(OS.str());
-  }
 
-  const FunctionExecutorImpl Executor(State, std::move(ObjectFile),
-                                      Scratch.get());
-  auto Measurements = runMeasurements(Executor);
-  if (Error E = Measurements.takeError()) {
-    if (!E.isA<SnippetCrash>())
-      return std::move(E);
-    InstrBenchmark.Error = toString(std::move(E));
-    return InstrBenchmark;
-  }
-  InstrBenchmark.Measurements = std::move(*Measurements);
-  assert(InstrBenchmark.NumRepetitions > 0 && "invalid NumRepetitions");
-  for (BenchmarkMeasure &BM : InstrBenchmark.Measurements) {
-    // Scale the measurements by instruction.
-    BM.PerInstructionValue /= InstrBenchmark.NumRepetitions;
-    // Scale the measurements by snippet.
-    BM.PerSnippetValue *= static_cast<double>(Instructions.size()) /
-                          InstrBenchmark.NumRepetitions;
+    assert(Repetitors.size() > 1 && !InstrBenchmark.Measurements.empty() &&
+           "We're in an 'min' repetition mode, and need to aggregate new "
+           "result to the existing result.");
+    assert(InstrBenchmark.Measurements.size() == NewMeasurements->size() &&
+           "Expected to have identical number of measurements.");
+    for (auto I : zip(InstrBenchmark.Measurements, *NewMeasurements)) {
+      BenchmarkMeasure &Measurement = std::get<0>(I);
+      BenchmarkMeasure &NewMeasurement = std::get<1>(I);
+      assert(Measurement.Key == NewMeasurement.Key &&
+             "Expected measurements to be symmetric");
+
+      Measurement.PerInstructionValue = std::min(
+          Measurement.PerInstructionValue, NewMeasurement.PerInstructionValue);
+      Measurement.PerSnippetValue =
+          std::min(Measurement.PerSnippetValue, NewMeasurement.PerSnippetValue);
+    }
   }
 
+  // We successfully measured everything, so don't discard the results.
+  CBOR.disarm();
   return InstrBenchmark;
 }
 

diff  --git a/llvm/tools/llvm-exegesis/lib/BenchmarkRunner.h b/llvm/tools/llvm-exegesis/lib/BenchmarkRunner.h
index 1a8866705d08..b0fdb34450ee 100644
--- a/llvm/tools/llvm-exegesis/lib/BenchmarkRunner.h
+++ b/llvm/tools/llvm-exegesis/lib/BenchmarkRunner.h
@@ -40,7 +40,7 @@ class BenchmarkRunner {
 
   Expected<InstructionBenchmark>
   runConfiguration(const BenchmarkCode &Configuration, unsigned NumRepetitions,
-                   const SnippetRepetitor &Repetitor,
+                   ArrayRef<std::unique_ptr<const SnippetRepetitor>> Repetitors,
                    bool DumpObjectToDisk) const;
 
   // Scratch space to run instructions that touch memory.

diff  --git a/llvm/tools/llvm-exegesis/lib/SnippetRepetitor.cpp b/llvm/tools/llvm-exegesis/lib/SnippetRepetitor.cpp
index ba618ac3f97e..c866e972a1c4 100644
--- a/llvm/tools/llvm-exegesis/lib/SnippetRepetitor.cpp
+++ b/llvm/tools/llvm-exegesis/lib/SnippetRepetitor.cpp
@@ -110,6 +110,8 @@ SnippetRepetitor::Create(InstructionBenchmark::RepetitionModeE Mode,
     return std::make_unique<DuplicateSnippetRepetitor>(State);
   case InstructionBenchmark::Loop:
     return std::make_unique<LoopSnippetRepetitor>(State);
+  case InstructionBenchmark::AggregateMin:
+    break;
   }
   llvm_unreachable("Unknown RepetitionModeE enum");
 }

diff  --git a/llvm/tools/llvm-exegesis/llvm-exegesis.cpp b/llvm/tools/llvm-exegesis/llvm-exegesis.cpp
index 3adc9f0ca19e..ce3a31c12d4a 100644
--- a/llvm/tools/llvm-exegesis/llvm-exegesis.cpp
+++ b/llvm/tools/llvm-exegesis/llvm-exegesis.cpp
@@ -86,10 +86,14 @@ static cl::opt<exegesis::InstructionBenchmark::ModeE> BenchmarkMode(
 static cl::opt<exegesis::InstructionBenchmark::RepetitionModeE> RepetitionMode(
     "repetition-mode", cl::desc("how to repeat the instruction snippet"),
     cl::cat(BenchmarkOptions),
-    cl::values(clEnumValN(exegesis::InstructionBenchmark::Duplicate,
-                          "duplicate", "Duplicate the snippet"),
-               clEnumValN(exegesis::InstructionBenchmark::Loop, "loop",
-                          "Loop over the snippet")));
+    cl::values(
+        clEnumValN(exegesis::InstructionBenchmark::Duplicate, "duplicate",
+                   "Duplicate the snippet"),
+        clEnumValN(exegesis::InstructionBenchmark::Loop, "loop",
+                   "Loop over the snippet"),
+        clEnumValN(exegesis::InstructionBenchmark::AggregateMin, "min",
+                   "All of the above and take the minimum of measurements")),
+    cl::init(exegesis::InstructionBenchmark::Duplicate));
 
 static cl::opt<unsigned>
     NumRepetitions("num-repetitions",
@@ -285,7 +289,22 @@ void benchmarkMain() {
 
   const auto Opcodes = getOpcodesOrDie(State.getInstrInfo());
 
-  const auto Repetitor = SnippetRepetitor::Create(RepetitionMode, State);
+  SmallVector<std::unique_ptr<const SnippetRepetitor>, 2> Repetitors;
+  if (RepetitionMode != InstructionBenchmark::RepetitionModeE::AggregateMin)
+    Repetitors.emplace_back(SnippetRepetitor::Create(RepetitionMode, State));
+  else {
+    for (InstructionBenchmark::RepetitionModeE RepMode :
+         {InstructionBenchmark::RepetitionModeE::Duplicate,
+          InstructionBenchmark::RepetitionModeE::Loop})
+      Repetitors.emplace_back(SnippetRepetitor::Create(RepMode, State));
+  }
+
+  BitVector AllReservedRegs;
+  llvm::for_each(Repetitors,
+                 [&AllReservedRegs](
+                     const std::unique_ptr<const SnippetRepetitor> &Repetitor) {
+                   AllReservedRegs |= Repetitor->getReservedRegs();
+                 });
 
   std::vector<BenchmarkCode> Configurations;
   if (!Opcodes.empty()) {
@@ -298,8 +317,8 @@ void benchmarkMain() {
                << ": ignoring instruction without sched class\n";
         continue;
       }
-      auto ConfigsForInstr =
-          generateSnippets(State, Opcode, Repetitor->getReservedRegs());
+
+      auto ConfigsForInstr = generateSnippets(State, Opcode, AllReservedRegs);
       if (!ConfigsForInstr) {
         logAllUnhandledErrors(
             ConfigsForInstr.takeError(), errs(),
@@ -324,7 +343,7 @@ void benchmarkMain() {
 
   for (const BenchmarkCode &Conf : Configurations) {
     InstructionBenchmark Result = ExitOnErr(Runner->runConfiguration(
-        Conf, NumRepetitions, *Repetitor, DumpObjectToDisk));
+        Conf, NumRepetitions, Repetitors, DumpObjectToDisk));
     ExitOnFileError(BenchmarkFile, Result.writeYaml(State, BenchmarkFile));
   }
   exegesis::pfm::pfmTerminate();