[llvm-dev] RFC: Reconsidering adding gmock to LLVM's unittest utilities

Wed Jan 4 06:11:25 PST 2017

A long time ago I suggested that we might want to add gmock to complement
the facilities provided by gtest in LLVM's unittests. It didn't go over
well:

1) There was concern over the benefit vs. the cost
2) Also concern about what the facilities would look like in practice and
whether they would actually help
3) At the time, I didn't have good, large examples of what these things
might look like or why they might be attractive
4) I didn't provide any real explanation of what gmock *did* and so it was
vague and unclear.

Since then, a lot has changed. We make much heavier use of unit testing in
the project, and more developers are finding it beneficial. And I think I
have compelling examples.

## Matchers

To start off, it is important to understand that there are two components
to what gmock offers. The first has very little to do with "mocks". It is
actually a matcher language and system for writing test predicates:

  EXPECT_EQ(expected, actual);
  EXPECT_NE(not_expected, actual);

These instead become:

  EXPECT_THAT(actual, Eq(expected));
  EXPECT_THAT(actual, Ne(not_expected));

This pattern moves the *matcher* out of the *macro*, giving it a proper C++
API. With that, we get two huge benefits: extensibility and composability.
You can easily write a matcher that summarizes concisely the expectation
for custom data types. And you can compose these matchers in powerful ways.
I'll give one example here:

  EXPECT_THAT(MyDenseMap, UnorderedElementsAre(Pair(Eq(key1), Eq(value1)),
                                               Pair(Eq(key2), Eq(value2)),
                                               Pair(Eq(key3), Eq(value3))));

Here I'm composing equality matchers inside a matcher that can handle
*unordered* container element-wise comparison for generic, arbitrary
containers. With a small patch, I've even extended it to support arbitrary
iterator ranges! Combine this with custom matchers for the elements, and it
becomes a very expressive and declarative way to write expectations in tests.

I wanted a realistic and compelling example, so I rewrote an entire
test: https://reviews.llvm.org/D28290
Note that I moved *every* EXPECT to the new syntax, so this is essentially
the worst case. It also involves a non-trivial custom matcher. Despite
this, the code is shorter, easier to read, and easier to maintain. It
enforces fewer unnecessary orderings. And it is much easier to extend.
Also, the failure messages are substantially better, because these
composed matchers have logic to carefully explain *why* they failed to match.

I hope folks find this compelling. I think this alone is worth carrying the
gmock code in tree -- it is just used by tests and not substantially larger
than gtest. Even if we decide we want nothing to do with mocks, I would
very much like to have the matchers.

## Mocks

So, now let's consider mocks. First off, what are mocks? I'll give a fairly
casual definition here: they are test objects which implement some API and
allow the test to explicitly set expectations on how that API is used and
how it in turn should behave. For a more detailed vocabulary see [1] and
for a more lengthy discussion see [2].

As came up in the original discussion, LLVM relatively rarely needs to
test API interactions in this way. Usually we're in the business of
translating things from format A to B (instructions, metadata, whatever)
and can write down one format and write checks against the other format for
tests. This is a wonderful world to live in when it comes to testing. I never
want LLVM to *decrease* how much we leverage this.

But we *do* have API interactions that we need to test. We have plugin
APIs and hookable interfaces, ranging from Clang frontend actions to JIT
listeners. We also have *generic* code in ADT that is all about API
interactions. In fact, most generic code is: we want it to work for *any*
T that behaves in a certain way, so we need to give it interesting Ts to
test it.

My immediate example is the pass manager. We plug a bunch of passes into
it and expect it to run them in a precise way over specific bits of IR.
When testing this, it is extremely cumbersome to write a test pass that
behaves in interesting yet controllable and comprehensible ways.
Let's look at a concrete example:

https://github.com/llvm-project/llvm-project/blob/master/llvm/unittests/IR/PassManagerTest.cpp#L481-L509

Here we have over 20 lines of code spent testing that the correct set of
things happened the correct number of times. I had to write a long comment
just to explain what these numbers mean. And I still can never tell
whether a change in the numbers is a good or a bad thing.

Now, we *do* have detailed logging-based tests that use FileCheck, which is
the primary way to avoid this in LLVM. But it isn't enough. In these tests we
want to carefully *permute* the behavior of very specific runs of
individual passes. A simple example of this can be seen here, where we have
somewhat magical state in a pass to flip-flop its behavior:
https://github.com/llvm-project/llvm-project/blob/master/llvm/unittests/IR/PassManagerTest.cpp#L138-L139

And it gets more complicated if you want statefulness like triggering on
the *3rd* run of the pass.

But these are exactly the kinds of scenarios that I needed to write tests
for in order to get the code correct. I have consistently found and been
able to fix bugs throughout the pass manager by writing careful unittests.

Mocks with GoogleMock are, IMO, a *tool to create interesting and
debuggable test objects*. These objects can then be fed into an API to
exercise it in ways that are hard or impossible to control from a command
line in sufficient granularity and precision. While this kind of testing is
never fun and should be avoided where possible, when we do need it I think
gmock is a powerful tool for the job.

Here is how it works at the highest level:
1) Create a class with a MOCK_METHOD*(...) API. This API is then hookable
by gmock.
2) Use gmock's APIs (such as ON_CALL) to register default behaviors for the
mocked methods.
3) Set up the *minimal* set of expected API interactions (EXPECT_CALL) for a
given test, i.e., for this test to pass, X has to happen, and in response to
that my code needs to do Y.
4) Feed this class, or a wrapper around it if you need a copyable object,
into the system you are testing and run it.

If the expected interactions don't occur, you get a trace of which ones
failed and why. These traces are somewhat verbose and hard to read, but
they actually have the information needed to debug the system which saves
you from building infrastructure to extract that over and over again.

But a concrete example will likely work better. I've used gmock to build
the unit tests for a major revision of the LoopPassManager in the new pass
manager. This is a substantial redesign that now handles inserting new
loops, deleting loops, and invalidating analyses. The tests for it are,
IMO, dramatically more readable than the test linked above. And they are
substantially more thorough and precise:

https://reviews.llvm.org/D28292

I hope this is compelling for folks. Just writing and debugging this one
test was extremely compelling for me. I ended up with much better coverage
and precision than I could have achieved with any other technique short of a
tremendous amount of plumbing, essentially re-inventing a framework for
building test pass objects that work exactly the way these mock pass handles do.

That said, all is not perfect. For instance, gmock suffers from being
designed in a C++98 world. It has relatively poor support for move and value
semantics, which resulted in my using a wrapper around the mock interfaces
in the above patch to let a pimpl idiom provide the value semantics I
wanted. However, that idiom works well, and this didn't substantially
impede my use of the infrastructure.

Also, I remain very sympathetic to the idea that this kind of testing
apparatus should be relatively rarely needed. We shouldn't be writing new
complex unit tests for APIs every week. But even a few use cases, such as
testing ADTs and generic tools like the pass manager, seem to justify the cost
to me, and I'm happy to help draw up fairly restrictive guidance around
mocks for the coding standards.

Thanks, and sorry for the long email, but I wanted to try and lay out the
issues in a way folks could understand, and the examples, while hopefully
useful, are quite large and complex.

Please don't hesitate to ask questions if stuff isn't clear.
-Chandler

[1]: https://en.wikipedia.org/wiki/Test_double
[2]: http://martinfowler.com/articles/mocksArentStubs.html