[PATCH] D77422: [llvm-exegesis] Add benchmark mode that uses LBR for more precise measurements.

Ondrej Sykora via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Thu Jun 4 03:45:31 PDT 2020


ondrasej added inline comments.


================
Comment at: llvm/tools/llvm-exegesis/llvm-exegesis.cpp:291
+  if (!ExpectedHostCpu.empty()) {
+    // The actual name could include variations, such as "skylake" vs
+    // "skylake-avx512" so we don't look for exact match.
----------------
oontvoo wrote:
> ondrasej wrote:
> > courbet wrote:
> > > This is a bit brittle, because we could imagine the name of some unrelated CPUs being substrings of others. What about having  a repeated option `--allowed-host-cpu=skylake --allowed-host-cpu=skylake-avx512 --allowed-host-cpu=whateverlake`, and check that the exact value is one of these ?
> > Ideally, this should be defined in terms of CPU features (e.g. CPUID bits). Even better - each target should know which counters it supports, based on its platform-specific feature discovery mechanism. I understand that this would be a huge change for this CL, but we should at least have a FIXME here.
> Actually, I had a question here that I was going to ask in an email, but here goes.
> 
> We know that the LBR format can be queried from the perf-capabilities MSR.
> Specifically, we want `IA32_PERF_CAPABILITIES[5:0] == 000110B` (bits `59...63` are not relevant).
> 
> If I'm not mistaken, `perf_event_mmap_page::capabilities` should(?) give us that.
> Except that when I run this on both Broadwell and Skylake, the `capabilities` field has a value of `30` on both platforms (it shouldn't). Of course, the difference here is that the `cycle` entries are all zeroes on Broadwell.
> I haven't looked into the details, so I don't know if this is an implementation detail of libpfm or if we're just misinterpreting the SDM here.
> 
> Thoughts?
This is strange.

I read the SDM section on IA32_PERF_CAPABILITIES the same way: bits [5:0] should read 000110B when the LBR entries carry timing (cycle) information, and that should be the case only on Skylake and later.

That said, I don't think that perf_event_mmap_page::capabilities contains the IA32_PERF_CAPABILITIES MSR. Instead, it looks like a struct that is defined and filled in by the kernel. Check out the [[ https://www.man7.org/linux/man-pages/man2/perf_event_open.2.html | definition/docs ]] of the struct (and [[ https://elixir.bootlin.com/linux/v5.7/source/arch/x86/events/core.c#L2374 | the code that fills it ]]). The kernel sets the individual bit fields itself, and the layout of the bit field in that union does not match the bits defined in the MSR.
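
For what it's worth, the value of 30 reported above is consistent with this reading. Below is a minimal sketch that decodes the field using the flag order documented in perf_event_open(2). The bit positions assume the usual x86 bit-field layout (first-declared field in the lowest bit), and `MmapCap`, `hasMmapCap`, `lowSixBits` and `describeMmapCaps` are hypothetical names made up for this illustration; they are not part of llvm-exegesis, libpfm or the kernel headers.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Bit positions of the flags in perf_event_mmap_page::capabilities, in the
// order documented in perf_event_open(2). They are mirrored here as plain
// constants (an assumption of this sketch) so that it compiles without
// <linux/perf_event.h>.
enum MmapCap : unsigned {
  CapBit0 = 0,             // cap_bit0 (legacy, deprecated)
  CapBit0IsDeprecated = 1, // cap_bit0_is_deprecated
  CapUserRdpmc = 2,        // cap_user_rdpmc
  CapUserTime = 3,         // cap_user_time
  CapUserTimeZero = 4,     // cap_user_time_zero
};

// Returns true if the given flag is set in the capabilities word.
constexpr bool hasMmapCap(std::uint64_t Caps, MmapCap Bit) {
  return ((Caps >> Bit) & 1u) != 0;
}

// Extracts the low six bits, i.e. the field that would hold the LBR format
// if the value really were IA32_PERF_CAPABILITIES.
constexpr std::uint64_t lowSixBits(std::uint64_t Value) {
  return Value & 0x3F;
}

// Lists the names of all flags that are set in the capabilities word.
std::vector<std::string> describeMmapCaps(std::uint64_t Caps) {
  static const char *const Names[] = {
      "cap_bit0", "cap_bit0_is_deprecated", "cap_user_rdpmc",
      "cap_user_time", "cap_user_time_zero"};
  std::vector<std::string> Set;
  for (unsigned Bit = 0; Bit < 5; ++Bit)
    if ((Caps >> Bit) & 1u)
      Set.push_back(Names[Bit]);
  return Set;
}
```

With these definitions, `describeMmapCaps(30)` returns `cap_bit0_is_deprecated`, `cap_user_rdpmc`, `cap_user_time` and `cap_user_time_zero` (30 is 011110B), while `lowSixBits(30)` is not 000110B. That would explain why Broadwell and Skylake report the same value: these flags describe what the kernel exposes through the mmap page, not the LBR format.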

After some digging through the kernel sources, I can see where the kernel keeps this data internally, but I haven't found a way to read it from user space. We might need to do more research on that.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D77422/new/

https://reviews.llvm.org/D77422




