[PATCH] D77422: [llvm-exegesis] Add benchmark mode that uses LBR for more precise measurements.

Thu Apr 9 19:03:37 PDT 2020

oontvoo added a comment.

Thanks for the thorough reviews! :-)

================
Comment at: llvm/tools/llvm-exegesis/lib/BenchmarkRunner.cpp:90
+
+class LbrFunctionExecutorImpl : public BenchmarkRunner::FunctionExecutor {
+public:
----------------
courbet wrote:
> Apart from the PerfEvent, this does not really have anything to do with LBR.
> On the one hand, this is not really a full-fledged `FunctionExecutor`, as it just throws away the `CounterName` argument and always measures the latency. This would be an argument for moving this to the X86 backend.
> On the other hand, this has a lot of target-independent code, so it's a good thing that other targets can use it. The main issue, and the only thing that's really specific to LBR here is the perf event. Letting the target create the perf event as well as the counter does fix this.
> 
> So what we can do:
>  - change `Target::createCounter` to take a counter name rather than a `PerfEvent`. The default implementation creates the PerfEvent, validates it and wraps it in a `Counter`. The X86 implementation tests if we're asking for the cycles counter in LBR mode, and creates the relevant `PerfEvent` and `LBRCounter`. It falls back to the default implementation else.
>  - We stop ignoring the counter name argument in `LbrFunctionExecutorImpl::runAndMeasure`, and just propagate it to the ExegesisTarget.
>  - We rename `LbrFunctionExecutorImpl` into `WorkerThreadFunctionExecutorImpl` (and `FunctionExecutorImpl` could be renamed to  `SameThreadFunctionExecutor`, and everything becomes target agnostic.
Done - instead of moving the executor out, I've just let the Target decides whether to use WorkerThread executor or otherwise. 

================
Comment at: llvm/tools/llvm-exegesis/lib/X86/X86Counter.cpp:16
+#include <poll.h>
+#include <sys/mman.h>
+#include <unistd.h>
----------------
courbet wrote:
> Does this work under windows ? If not, please use one of the LLVM wrappers.
Per offline discussion, we'll ifdef this out for non-linux and will add support for windows in follow-up CLs

================
Comment at: llvm/tools/llvm-exegesis/lib/X86/X86Counter.cpp:52
+
+// Rewrites the jmp (from the BM loop) with `pop $rbx; ret` to get
+// out of the benchmarked code.
----------------
courbet wrote:
> This would benefit from a pointer to documentation (or argued speculation about) whether rewriting code inside a loop for another thread is valid and how it behaves w.r.t. caches and DSB.
I meant to rewrite this hack to use a condition var for stopping the loop from another thread,  per Ondrej's suggestion . 
I just haven't got to it yet :-) ...  (would need more changes in the assembler.cpp )

So I'll add a FIXME for next CLs

================
Comment at: llvm/tools/llvm-exegesis/lib/X86/X86Counter.cpp:197
+    if (!error) {
+      // TODO(vyng) Analyse the array and get proper value.
+      return CycleArray[0];
----------------
courbet wrote:
> `FIXME`. Also, please give more explanations so that somebody else can look at that too ;)
Per offline discussions,  this requires re-working the read() interface because we need to return more than one value.
(Specifically, the lbr-entries offer a list of cycles-latencies for a bunch of consec branches ... we don't want to just ignore the older branches and only look at the most recent)

But because this patch is already quite big , we'll keep this as TODO and follow up.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D77422/new/

https://reviews.llvm.org/D77422