[PATCH] D77422: [llvm-exegesis] Add benchmark mode that uses LBR for more precise measurements.

Wed Apr 8 07:02:34 PDT 2020

courbet added inline comments.

================
Comment at: llvm/docs/CommandGuide/llvm-exegesis.rst:210
+  could occur if the sampling is too frequent. A prime number should be used to
+  avoid hiding blocks.
+  With various testing/experiments, `521` seems like the best value so far.
----------------
Can you expand on what "hiding blocks" means ? e.g. an external reference ?

================
Comment at: llvm/tools/llvm-exegesis/lib/BenchmarkRunner.cpp:90
+
+class LbrFunctionExecutorImpl : public BenchmarkRunner::FunctionExecutor {
+public:
----------------
Apart from the PerfEvent, this does not really have anything to do with LBR.
On the one hand, this is not really a full-fledged `FunctionExecutor`, as it just throws away the `CounterName` argument and always measures the latency. This would be an argument for moving this to the X86 backend.
On the other hand, this has a lot of target-independent code, so it's a good thing that other targets can use it. The main issue, and the only thing that's really specific to LBR here is the perf event. Letting the target create the perf event as well as the counter does fix this.

So what we can do:
 - change `Target::createCounter` to take a counter name rather than a `PerfEvent`. The default implementation creates the PerfEvent, validates it and wraps it in a `Counter`. The X86 implementation tests if we're asking for the cycles counter in LBR mode, and creates the relevant `PerfEvent` and `LBRCounter`. It falls back to the default implementation else.
 - We stop ignoring the counter name argument in `LbrFunctionExecutorImpl::runAndMeasure`, and just propagate it to the ExegesisTarget.
 - We rename `LbrFunctionExecutorImpl` into `WorkerThreadFunctionExecutorImpl` (and `FunctionExecutorImpl` could be renamed to  `SameThreadFunctionExecutor`, and everything becomes target agnostic.

================
Comment at: llvm/tools/llvm-exegesis/lib/BenchmarkRunner.cpp:198
+  if (Mode == InstructionBenchmark::ModeE::Latency) {
+    if (Arg.LbrSamplePeriod > 0) {
+      if (supportsLbr(State)) {
----------------
Again, this is too specific to X86. I think it's one of these cases where is actually makes more sense for the generic code to know nothing about LBR. What about:
 - Moving `LbrSamplePeriod` from `llvm-exegesis` to `X86/Target.cpp`
 - Making createFunctionExecutor() a virtual member of the target. The default implementation returns a `FunctionExecutorImpl`.

================
Comment at: llvm/tools/llvm-exegesis/lib/BenchmarkRunner.h:50

+  // TODO(vyng) Make this "Deprecated" and switch caller to using RunArg
   Expected<InstructionBenchmark>
----------------
The LLVM style is `FIXME`

================
Comment at: llvm/tools/llvm-exegesis/lib/X86/Target.cpp:32
+
+#include <atomic>
+#include <cassert>
----------------
do we really need all these ?

================
Comment at: llvm/tools/llvm-exegesis/lib/X86/Target.cpp:594
+      return llvm::make_error<llvm::StringError>(
+          "LBR counter requested without PFM package available..",
+          llvm::errc::invalid_argument);
----------------
[nit] `s/PFM package available../HAVE_LIBPFM/` the package might be available but specifically disabled

================
Comment at: llvm/tools/llvm-exegesis/lib/X86/X86Counter.cpp:16
+#include <poll.h>
+#include <sys/mman.h>
+#include <unistd.h>
----------------
Does this work under windows ? If not, please use one of the LLVM wrappers.

================
Comment at: llvm/tools/llvm-exegesis/lib/X86/X86Counter.cpp:29
+// Waits for the LBR perf events.
+int pollLbrPerfEvent(const int FileDescriptor) {
+  struct pollfd PollFd;
----------------
internal functions use a `static` keyword instead of anonymous namespace in the LLVM style guide.

================
Comment at: llvm/tools/llvm-exegesis/lib/X86/X86Counter.cpp:34
+  PollFd.revents = 0;
+  return poll(&PollFd, 1 /* num of fds */, 10000 /* time out */);
+}
----------------
`s/time out/timeout`

================
Comment at: llvm/tools/llvm-exegesis/lib/X86/X86Counter.cpp:38
+// Copies the data-buffer into Buf, given the pointer to MMapped.
+void copyDataBuffer(void *MMappedBuffer, char *Buf, uint64_t Tail,
+                    size_t DataSize) {
----------------
`static`

================
Comment at: llvm/tools/llvm-exegesis/lib/X86/X86Counter.cpp:52
+
+// Rewrites the jmp (from the BM loop) with `pop $rbx; ret` to get
+// out of the benchmarked code.
----------------
This would benefit from a pointer to documentation (or argued speculation about) whether rewriting code inside a loop for another thread is valid and how it behaves w.r.t. caches and DSB.

================
Comment at: llvm/tools/llvm-exegesis/lib/X86/X86Counter.cpp:56
+// pc: Must point at the start of the jmp instruction.
+void patchBasicBlockToEndBenchmarkedLoop(char *pc) const {
+  void *page = reinterpret_cast<void *>(reinterpret_cast<uintptr_t>(pc) &
----------------
`static`

================
Comment at: llvm/tools/llvm-exegesis/lib/X86/X86Counter.cpp:83
+// out the benchmark loop.
+llvm::Error parseDataBuffer(const char const *DataBuf, size_t DataSize,
+                            std::vector<int64_t> *CycleArray,
----------------
`static`

================
Comment at: llvm/tools/llvm-exegesis/lib/X86/X86Counter.cpp:125
+
+} // namespace
+
----------------
you can remove this now

================
Comment at: llvm/tools/llvm-exegesis/lib/X86/X86Counter.cpp:197
+    if (!error) {
+      // TODO(vyng) Analyse the array and get proper value.
+      return CycleArray[0];
----------------
`FIXME`. Also, please give more explanations so that somebody else can look at that too ;)

================
Comment at: llvm/tools/llvm-exegesis/lib/X86/X86Counter.h:1
+#include "../PerfHelper.h"
+#include "llvm/Support/Error.h"
----------------
This file, and the cpp one, are missing the LLVM headers.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D77422/new/

https://reviews.llvm.org/D77422