[PATCH] D77422: [llvm-exegesis] Add benchmark mode that uses LBR for more precise measurements.

Wed Jun 3 10:25:32 PDT 2020

ondrasej added inline comments.

================
Comment at: llvm/docs/CommandGuide/llvm-exegesis.rst:197

+ `latency` mode can be  make use of either RDTSC or LBR.
+ `latency[LBR]` is only available on X86 (at least `Skylake`).
----------------
This is more of a philosophical comment: IIRC, the LBR records the times when the branch instructions are retired. With correct branch prediction, the instructions in the measured block would start executing speculatively before the branch is retired, and so the number we get is probably closer to the throughput.

================
Comment at: llvm/docs/CommandGuide/llvm-exegesis.rst:211
+  could occur if the sampling is too frequent. A prime number should be used to
+  avoid consistently skipping certain blocks.
+  
----------------
I'm wondering how this applies to this use case. The issue with basic blocks skipped might be when sampling all basic blocks in a larger application over a longer period of time, but this is not our case.
In our case, we're interested in a single basic block, and run it in a tight loop for relatively long, so that the LBR sampling catches this exact basic block.

================
Comment at: llvm/tools/llvm-exegesis/lib/X86/X86Counter.cpp:87
+    // Read the perf_branch_entry array.
+    for (uint64_t i = 0; i < Count; ++i) {
+      CycleArray->push_back(Entry.cycles);
----------------
I think we also need to filter by the addresses of the branches (or store them) to make sure that we take the cycles only for the measured basic block, not for the other basic blocks surrounding it.

================
Comment at: llvm/tools/llvm-exegesis/llvm-exegesis.cpp:291
+  if (!ExpectedHostCpu.empty()) {
+    // The actual name could include variations, such as "skylake" vs
+    // "skylake-avx512" so we don't look for exact match.
----------------
courbet wrote:
> This is a bit brittle, because we could imagine the name of some unrelated CPUs being substrings of others. What about having  a repeated option `--allowed-host-cpu=skylake --allowed-host-cpu=skylake-avx512 --allowed-host-cpu=whateverlake`, and check that the exact value is one of these ?
Ideally, this should be defined in terms of CPU features (e.g. CPUID bits). Even better - each target should know which counters it supports, based on its platform-specific feature discovery mechanism. I understand that this would be a huge change for this CL, but we should at least have a FIXME here.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D77422/new/

https://reviews.llvm.org/D77422