[llvm-dev] [RFC] Implementing the BHive methodology in llvm-exegesis
Ondrej Sykora via llvm-dev
llvm-dev at lists.llvm.org
Fri Mar 19 02:16:55 PDT 2021
I'm sorry to revive such an old thread. Due to lack of time, we did not
make progress as fast as we planned. We started building a prototype based
on the original proposal, but we found a couple of blockers:
- we found it very difficult to use the LLDB API to implement the
interaction between the llvm-exegesis process and the child process
running the benchmarked code.
- in the meantime, the MIT team continued development of the BHive
algorithm, and replaced most of the assembly with C. The new code is
simpler and easier to port to other architectures.
Based on our experience, we're considering a simpler approach than
the original proposal:
- if possible, we will use the same C-oriented design as the latest version
of the tool developed at MIT.
- we will focus on a Linux and x86-64 implementation first, with porting to
other architectures (but not operating systems) in mind. This would allow
us to depend on the stable and well-defined Linux syscall interface. To the
best of our knowledge, llvm-exegesis is already limited to Linux because of its
dependence on the Linux perf subsystem, so this does not create any new
- we would use ptrace <https://en.wikipedia.org/wiki/Ptrace> as a simpler
and more powerful alternative to LLDB. It is a syscall, so it does not
introduce any new external library dependencies.
- in the same spirit, we will depend on the mmap and munmap syscalls rather
than on their abstractions.
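
To make the ptrace and mmap points more concrete, here is a minimal sketch of
the kind of parent/child interaction we have in mind. It is only an
illustration under our own assumptions, not the planned implementation: the
helper names map_inaccessible_page and probe_fault_address are hypothetical,
and the code assumes Linux with ptrace permitted for child processes. The
parent forks a traced child, lets it touch an address, and reads the faulting
address back through PTRACE_GETSIGINFO.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: reserves one page that can be neither read nor
 * written, so that any access to it faults. Returns NULL on failure. */
void *map_inaccessible_page(void) {
  long page_size = sysconf(_SC_PAGESIZE);
  assert(page_size > 0);
  void *page = mmap(NULL, (size_t)page_size, PROT_NONE,
                    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  return page == MAP_FAILED ? NULL : page;
}

/* Hypothetical helper: forks a traced child that reads one byte from `addr`.
 * Returns the faulting address that the kernel reports to the tracer, or 0
 * if the child did not stop with SIGSEGV. */
uintptr_t probe_fault_address(uintptr_t addr) {
  pid_t child = fork();
  if (child < 0) return 0;
  if (child == 0) {
    /* Child: ask to be traced by the parent, then stop until resumed. */
    if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) != 0) _exit(1);
    raise(SIGSTOP);
    (void)*(volatile const char *)addr; /* the access being probed */
    _exit(0);
  }

  uintptr_t fault = 0;
  int status = 0;
  /* Parent: wait for the initial SIGSTOP of the child. */
  if (waitpid(child, &status, 0) != child) return 0;
  /* Resume the child without delivering the SIGSTOP, then wait for the
   * next event: either a normal exit or a signal-delivery stop. */
  if (WIFSTOPPED(status) &&
      ptrace(PTRACE_CONT, child, NULL, NULL) == 0 &&
      waitpid(child, &status, 0) == child &&
      WIFSTOPPED(status) && WSTOPSIG(status) == SIGSEGV) {
    siginfo_t info;
    if (ptrace(PTRACE_GETSIGINFO, child, NULL, &info) == 0)
      fault = (uintptr_t)info.si_addr;
  }
  /* This sketch simply kills a child that is still stopped and reaps it. */
  if (WIFSTOPPED(status)) {
    kill(child, SIGKILL);
    waitpid(child, &status, 0);
  }
  return fault;
}
```

Calling probe_fault_address on an address inside the page returned by
map_inaccessible_page should report that same address, while a readable
address should yield 0. A real monitor would presumably fix up the mapping
in the child and resume it instead of killing it; that part is left out here.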
Let us know what you think!
On Mon, Jan 27, 2020 at 1:21 PM Ondrej Sykora <ondrasej at google.com> wrote:
> Hi Clement,
> thanks for the feedback!
> On Fri, Jan 17, 2020 at 11:47 AM Clement Courbet <courbet at google.com>
> wrote:
>> On Thu, Jan 16, 2020 at 6:32 PM Ondrej Sykora via llvm-dev <
>> llvm-dev at lists.llvm.org> wrote:
>>> In a recent IISWC paper
>>> we've proposed BHive - a new methodology for benchmarking arbitrary basic
>>> blocks that has several advantages over the one currently used in
>>> llvm-exegesis. In particular, the new methodology:
>>> - automatically handles memory accesses in the basic block, without the
>>> need to manually annotate live-ins,
>>> - maps all memory addresses accessed by the basic block to the same
>>> page, significantly reducing the probability of cache misses during
>>> - the benchmarked code runs in a separate process, reducing risks of
>>> compromising the monitor process memory,
>>> - computes the throughput in a way that subtracts away the effects of
>>> the scaffolding code.
>> I've never actually seen a case where the scaffolding code had much
>> influence on the results (at least on X86), especially in loop mode.
>> However, I can see some value in snippet mode (not generated code mode):
>> this allows the snippet code to exhaust all available registers and still
>> be measurable.
> Yes, our main goal is benchmarking arbitrary basic blocks, where we do not
> control the register allocation.
>>> A possible challenge is increased complexity of the code: BHive uses a
>>> separate process to run the benchmarked basic block and changes memory
>>> mapping of the process to ensure that all memory accesses lead to the same
>>> page. Most operating systems have the necessary APIs, but these may differ
>>> significantly. In particular, the Windows API for memory mapping and
>>> process creation/control is very different from the Unix world. Initially,
>>> we might be able to support the new methodology only on Linux and Unix-like
>> Though I think it's fine to have Linux only as an initial implementation,
>> I think there should be a clear plan to support Windows: there are people
>> in the LLVM community who are using llvm-exegesis on Windows (e.g. folks at
>> Sony). Note that you might be able to reuse some code in LLVM: compiler-rt
>> already has an abstraction layer in "WindowsMMap.c" on top of MapViewOfFile.
> Thanks for the pointers! That said, replacing mmap is relatively
> straightforward. The difficult part is replacing munmap, which does not
> have a direct equivalent on Windows: you need to query the system for
> all mapped blocks, and then unmap them one by one. This is very specific
> functionality, and I'd be surprised if someone had implemented it.
>>> Before we start the implementation, we would like to collect feedback on
>>> the proposed design:
>>> - We're planning to implement the methodology as a new implementation of
>>> BenchmarkRunner::FunctionExecutor that will exist alongside the current
>>> runner. The existing functionality will be preserved, and the user will be
>>> able to select the benchmark runner using a command-line flag.
>>> - We're considering using the LLDB API to control the execution of the
>>> benchmarking process in a platform-independent way.
>> I think it's a great idea to avoid introducing any other external