[PATCH] D60401: [llvm-exegesis] When generating templates with chained instructions, also add templates for helper instructions

Thu Apr 11 04:31:58 PDT 2019

gchatelet added a comment.

I understand the motivation behind this change but I think we need a more principled approach. I'll try to sum up my reasoning here.
To me there are two useful modes for `llvm-exegesis`:

- We want to look at the latency of a particular instruction - useful when we want to quickly check an assumption,
- We want to get the latency for all the instructions - useful to fully characterize or check a processor.

The analysis tool only makes sense for the second bullet.

For Latency analysis, an instruction falls in one of the following benchmarking modes:

1. Infeasible (privileged instructions, inadequate control flow, e.g. `HLT`)
2. Measurable in isolation
3. Measurable through another instruction

For 2, we want to explore all the dimensions of the instruction:

- Impact of using the same register several times (`XOR EAX, EAX, EAX`) or different ones (`XOR EAX, EBX, ECX`)
- Impact of choosing special values for immediates (`IMUL EAX, EAX, 0`) or for floating point operands (`sNaN`, `qNaN`, `±∞`, `±0`, normal and denormal values)

For 3, this is 2 combined with a second instruction. The paired instruction is another dimension to explore. Because we're mostly interested in the behavior of the first instruction, we don't need to explore all of this dimension. We can restrict ourselves to the compatible instructions with the fewest degrees of freedom.

For this exploration to be efficient (manageable), we can't eagerly generate all of the templates. We need a preprocessing step that determines which instructions belong to 2 or 3 and, for the ones in 3, which set of instructions is worth considering. We can then generate a dependency graph and process the instructions in an order that allows us to deduce the latencies for the instructions in 3 while still exploring the dimensions for 2.
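To make the ordering step concrete, here is a rough sketch, not the actual design. It assumes a hypothetical model where each instruction is an index and `Helpers[I]` lists the instructions whose latency must be known before `I` can be measured (empty for mode 2). `measurementOrder` and `Helpers` are names made up for this example; the ordering itself is a plain Kahn topological sort.

```cpp
#include <cassert>
#include <cstddef>
#include <queue>
#include <vector>

// Hypothetical model: instruction I can only be measured once every
// instruction in Helpers[I] has a known latency. An empty list means the
// instruction is measurable in isolation (mode 2).
std::vector<std::size_t>
measurementOrder(const std::vector<std::vector<std::size_t>> &Helpers) {
  const std::size_t N = Helpers.size();
  std::vector<std::size_t> Pending(N, 0);
  std::vector<std::vector<std::size_t>> Dependents(N);
  for (std::size_t I = 0; I < N; ++I) {
    Pending[I] = Helpers[I].size();
    for (std::size_t H : Helpers[I]) {
      assert(H < N && "helper index out of range");
      Dependents[H].push_back(I);
    }
  }
  // Start with everything that needs no helper.
  std::queue<std::size_t> Ready;
  for (std::size_t I = 0; I < N; ++I)
    if (Pending[I] == 0)
      Ready.push(I);
  // Kahn's algorithm: an instruction becomes ready once all of its helpers
  // have been emitted.
  std::vector<std::size_t> Order;
  while (!Ready.empty()) {
    const std::size_t I = Ready.front();
    Ready.pop();
    Order.push_back(I);
    for (std::size_t D : Dependents[I])
      if (--Pending[D] == 0)
        Ready.push(D);
  }
  // Instructions caught in a dependency cycle are left out of the result.
  return Order;
}
```

If the returned order is shorter than the input, some instructions only depend on each other and would need a different treatment.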

In this automated mode (second bullet), the values for instructions in 3 can be recovered by solving an ordinary least squares problem <https://en.wikipedia.org/wiki/Ordinary_least_squares>. The recovered measurements can then be processed by the analysis tool.
I've started working on this; I just need to dedicate more time to it.
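As an illustration of the least squares step, here is a minimal sketch under a simplifying assumption of mine: the measured latency of a chain is the sum of the latencies of the instructions in it. Each measurement is then a row of a 0/1 matrix `A` (which instructions were in the chain) and an entry of `B` (measured cycles). `solveLeastSquares` is a hypothetical helper, not something that exists in `llvm-exegesis`; it solves the normal equations with Gaussian elimination.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Solves min ||A*x - B||^2 through the normal equations (A^T A) x = A^T B,
// using Gaussian elimination with partial pivoting. Assumes A has full
// column rank, i.e. the measurements determine every latency.
std::vector<double> solveLeastSquares(const Matrix &A,
                                      const std::vector<double> &B) {
  assert(!A.empty() && A.size() == B.size());
  const std::size_t Rows = A.size(), Cols = A[0].size();
  // Build the augmented system [A^T A | A^T B].
  Matrix N(Cols, std::vector<double>(Cols + 1, 0.0));
  for (std::size_t R = 0; R < Rows; ++R)
    for (std::size_t I = 0; I < Cols; ++I) {
      for (std::size_t J = 0; J < Cols; ++J)
        N[I][J] += A[R][I] * A[R][J];
      N[I][Cols] += A[R][I] * B[R];
    }
  // Forward elimination with partial pivoting.
  for (std::size_t K = 0; K < Cols; ++K) {
    std::size_t Pivot = K;
    for (std::size_t I = K + 1; I < Cols; ++I)
      if (std::fabs(N[I][K]) > std::fabs(N[Pivot][K]))
        Pivot = I;
    std::swap(N[K], N[Pivot]);
    assert(std::fabs(N[K][K]) > 1e-12 && "rank-deficient system");
    for (std::size_t I = K + 1; I < Cols; ++I) {
      const double F = N[I][K] / N[K][K];
      for (std::size_t J = K; J <= Cols; ++J)
        N[I][J] -= F * N[K][J];
    }
  }
  // Back substitution.
  std::vector<double> X(Cols, 0.0);
  for (std::size_t K = Cols; K-- > 0;) {
    double Sum = N[K][Cols];
    for (std::size_t J = K + 1; J < Cols; ++J)
      Sum -= N[K][J] * X[J];
    X[K] = Sum / N[K][K];
  }
  return X;
}
```

For example, with columns (P, Q), two isolated measurements of P and one measurement of the chain P+Q give three rows; the solver then returns one latency per column, including the one for Q that was never measured alone.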

================
Comment at: tools/llvm-exegesis/lib/SnippetGenerator.cpp:106
+        llvm::for_each(
+            llvm::make_range(std::make_move_iterator(Templates.begin()),
+                             std::make_move_iterator(Templates.end())),
----------------
Can't you just:
```
std::move(Templates.begin(), Templates.end(), std::back_inserter(FinalTemplates));
```
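For reference, a self-contained sketch of that suggestion. `Template` is a stand-in type (`std::string`) and `appendByMove` is a hypothetical wrapper, both chosen for the example only; the three-argument `std::move` from `<algorithm>` is the part that matters.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical stand-in for the template type used in the patch.
using Template = std::string;

// Appends every element of Templates to FinalTemplates by moving it.
// The three-argument std::move from <algorithm> move-assigns each element
// through the back_inserter, so no for_each / make_move_iterator is needed.
void appendByMove(std::vector<Template> &Templates,
                  std::vector<Template> &FinalTemplates) {
  // Appending a vector to itself would invalidate the source iterators.
  assert(&Templates != &FinalTemplates);
  FinalTemplates.reserve(FinalTemplates.size() + Templates.size());
  std::move(Templates.begin(), Templates.end(),
            std::back_inserter(FinalTemplates));
  // Moved-from elements are valid but unspecified; drop them.
  Templates.clear();
}
```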

================
Comment at: unittests/tools/llvm-exegesis/X86/SnippetGeneratorTest.cpp:199

+TEST_F(LatencySnippetGeneratorTest, SETCCrExaustive) {
+  const unsigned Opcode = llvm::X86::SETCCr;
----------------
`SETCCrExhaustive`

================
Comment at: unittests/tools/llvm-exegesis/X86/SnippetGeneratorTest.cpp:213
+    // IT0: PrimaryInstruction0[, SecondaryInstruction0[, ...]]
+    // IT1: PrimaryInstruction1(==SecondaryInstruction0)[, ...]
+    // ...
----------------
I don't understand this second line.

================
Comment at: unittests/tools/llvm-exegesis/X86/SnippetGeneratorTest.cpp:224
+  ASSERT_THAT(SecondaryInstructions, SizeIs(Gt(0U)))
+      << "Some secondary templates are avaliable";
+
----------------
`available`

Repository:
  rL LLVM

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D60401/new/

https://reviews.llvm.org/D60401