[PATCH] D77422: [llvm-exegesis] Add benchmark mode that uses LBR for more precise measurements.
Ondrej Sykora via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Thu Jun 4 03:45:31 PDT 2020
ondrasej added inline comments.
================
Comment at: llvm/tools/llvm-exegesis/llvm-exegesis.cpp:291
+ if (!ExpectedHostCpu.empty()) {
+ // The actual name could include variations, such as "skylake" vs
+ // "skylake-avx512" so we don't look for exact match.
----------------
oontvoo wrote:
> ondrasej wrote:
> > courbet wrote:
> > > This is a bit brittle, because we could imagine the names of some unrelated CPUs being substrings of others. What about having a repeated option `--allowed-host-cpu=skylake --allowed-host-cpu=skylake-avx512 --allowed-host-cpu=whateverlake`, and checking that the exact value is one of these?
> > Ideally, this should be defined in terms of CPU features (e.g. CPUID bits). Even better - each target should know which counters it supports, based on its platform-specific feature discovery mechanism. I understand that this would be a huge change for this CL, but we should at least have a FIXME here.
> Actually, I had a question here that I was going to ask in an email, but here goes.
>
> We know that the LBR format can be queried from the IA32_PERF_CAPABILITIES MSR.
> Specifically, we want `IA32_PERF_CAPABILITIES[5:0] == 000110B` (bits `59...63` are not relevant).
>
> If I'm not mistaken, `perf_event_mmap_page::capabilities` should(?) give us that.
> Except that when I run this on both Broadwell and Skylake, the `capabilities` field has the value `30` on both platforms (it shouldn't). Of course, the difference here is that the `cycle` entries are all zeroes on Broadwell.
> I haven't looked into this in detail, so I don't know whether it is an implementation detail of libpfm or whether we're just misinterpreting the SDM here.
>
> Thoughts?
This is strange.
I understand the SDM section about IA32_PERF_CAPABILITIES the same way, i.e. we should see 000110B if there are timings, and this should happen only on Skylake and later.
That said, I don't think that perf_event_mmap_page::capabilities contains the IA32_PERF_CAPABILITIES MSR. Instead, it looks like a struct that is defined and filled by the kernel. Check out the [[ https://www.man7.org/linux/man-pages/man2/perf_event_open.2.html | definition/docs ]] of the struct (and [[ https://elixir.bootlin.com/linux/v5.7/source/arch/x86/events/core.c#L2374 | the code that fills it ]]). They set individual bit fields (and even the bit field defined in the union does not match the other bits defined in the MSR).
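To illustrate why `30` does not look like the MSR value, here is a minimal sketch that decodes it with the bit positions of the `capabilities` union. The masks assume the v5.7 layout of the bit field on little-endian x86, and the helper names are hypothetical, not part of this patch.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Assumed bit positions of the flags in perf_event_mmap_page::capabilities
// (v5.7 <linux/perf_event.h>, little-endian bit-field layout).
constexpr std::uint64_t CapBit0 = 1ULL << 0;
constexpr std::uint64_t CapBit0IsDeprecated = 1ULL << 1;
constexpr std::uint64_t CapUserRdpmc = 1ULL << 2;
constexpr std::uint64_t CapUserTime = 1ULL << 3;
constexpr std::uint64_t CapUserTimeZero = 1ULL << 4;

// Hypothetical helper: returns the names of the flags set in Caps.
std::vector<std::string> decodeCapabilities(std::uint64_t Caps) {
  std::vector<std::string> Names;
  if (Caps & CapBit0)
    Names.push_back("cap_bit0");
  if (Caps & CapBit0IsDeprecated)
    Names.push_back("cap_bit0_is_deprecated");
  if (Caps & CapUserRdpmc)
    Names.push_back("cap_user_rdpmc");
  if (Caps & CapUserTime)
    Names.push_back("cap_user_time");
  if (Caps & CapUserTimeZero)
    Names.push_back("cap_user_time_zero");
  return Names;
}

// Hypothetical helper: true if the low six bits equal 000110B, the LBR
// format value we would expect from IA32_PERF_CAPABILITIES[5:0].
constexpr bool hasLbrTimingFormat(std::uint64_t Value) {
  return (Value & 0x3F) == 0x06;
}
```

Under these assumptions, `30` (binary `11110`) decodes to `cap_bit0_is_deprecated`, `cap_user_rdpmc`, `cap_user_time` and `cap_user_time_zero`, and `hasLbrTimingFormat(30)` is false, which is consistent with the field being kernel-defined flags rather than the MSR contents.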
After some digging through kernel sources, I see where the kernel keeps the data internally, but I didn't see a way to get it out of the kernel. We might need to do more research on that.
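As for the substring check on the host CPU name at the top of this thread, the difference from the exact-match list suggested above can be sketched as follows. Both helpers are hypothetical and only illustrate the two matching strategies; `--allowed-host-cpu` is the option proposed in the review, not an existing flag.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical helper: the substring-style check, which accepts any host
// CPU name that contains the expected name.
bool matchesBySubstring(const std::string &Expected,
                        const std::string &Actual) {
  return Actual.find(Expected) != std::string::npos;
}

// Hypothetical helper: the exact-match check against a list of allowed
// names, e.g. collected from a repeated --allowed-host-cpu option.
bool isAllowedHostCpu(const std::vector<std::string> &Allowed,
                      const std::string &Actual) {
  return std::find(Allowed.begin(), Allowed.end(), Actual) != Allowed.end();
}
```

With the substring check, an expected name of `core` would accept both `core2` and `corei7`, while the list-based check accepts `skylake-avx512` only if it is listed explicitly.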
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D77422/new/
https://reviews.llvm.org/D77422