[PATCH] D85254: [llvm-exegesis] Add option to check the hardware support for a given feature before benchmarking.

Ondrej Sykora via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Thu Oct 1 04:48:49 PDT 2020


ondrasej added a comment.

In D85254#2305179 <https://reviews.llvm.org/D85254#2305179>, @hliao wrote:

> This change is reverted as, on hosts without LBR supported but with LIBPFM installed and used, this change makes `llvm/test/tools/llvm-exegesis/X86/lbr/mov-add.s` failed. On that host, `perf_event_open` fails with `EOPNOTSUPP` on LBR config. That change's basic assumption
>
>> If this is run on a non-supported hardware, it will produce all zeroes for latency.
>
> could not stand as `perf_event_open` system call will fail if the underlying hardware really doesn't have LBR supported.

**tl;dr:** This patch is still needed because on certain microarchitectures, the LBR is present but it doesn't provide timing information. This patch detects these cases and adds an appropriate error message.

The issue here is that LBR support is not a simple yes or no: the LBR exists in multiple versions with different features and different output formats (see the Intel SDM, vol 3B, section 17.4, if you're interested in the details). The Linux kernel takes care of most of this, abstracts away the details, and transforms the data from the hardware format into a unified format defined in `perf_event.h`.
Earlier versions of the LBR, in Haswell and Broadwell, recorded only the addresses of the branches, with no timing information. On these hosts, the kernel returns the addresses as usual, but it reports zero cycles for all branches. Moreover, the LBR format and feature support are readable only in kernel mode (they are exposed through an MSR, not through CPUID), and the Linux kernel doesn't expose them to user mode.

So, depending on the microarchitecture of the host, one of the following may happen:

- if the host doesn't have LBR support at all, `perf_event_open` will fail as you said.
- if the host supports LBR but it's an older microarchitecture (Haswell, Broadwell, and maybe others), `perf_event_open` and reading from the LBR will succeed, but the data will contain zero cycles for all branches, as provided by the Linux kernel. The goal of this patch is to detect this situation and report an error rather than return invalid numbers.
- only if the host supports LBR //and// LBR timings (i.e. it is a Skylake or newer), will we get the number of cycles between the branches.
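
To make the three cases concrete, here is a minimal sketch of how they could be told apart from user mode. It is not the code from this patch: `LbrSupport` and `classifyLbrSupport` are hypothetical names, and the sketch assumes the caller has already attempted `perf_event_open` and collected the per-branch cycle counts from one sample.

```cpp
#include <cerrno>
#include <cstdint>
#include <vector>

// Hypothetical names for illustration only; not taken from D85254.
enum class LbrSupport {
  Unsupported, // perf_event_open failed (e.g. with EOPNOTSUPP).
  NoTiming,    // LBR works, but every branch reports zero cycles.
  Timing       // LBR works and at least one branch has a cycle count.
};

// OpenErrno is the errno left by perf_event_open, or 0 if it succeeded.
// BranchCycles holds the cycle count of each branch entry in one sample.
LbrSupport classifyLbrSupport(int OpenErrno,
                              const std::vector<uint64_t> &BranchCycles) {
  if (OpenErrno != 0)
    return LbrSupport::Unsupported;
  for (uint64_t Cycles : BranchCycles)
    if (Cycles != 0)
      return LbrSupport::Timing;
  // All zeroes (or an empty sample): no evidence of timing support.
  return LbrSupport::NoTiming;
}
```

Because the LBR format is not visible from user mode, a check of this kind can only look at the data the kernel returns; a sample made up entirely of zero-cycle branches is treated as "no timing information" here, which is an assumption of the sketch.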

We need to find a way to make the `mov-add.s` test behave correctly in all three cases above, regardless of the microarchitecture it runs on. But the situation this patch handles is real, so the patch itself is still valid.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D85254/new/

https://reviews.llvm.org/D85254


