[cfe-dev] (From the LLVM Blog) C++ at Google: Here Be Dragons

Chandler Carruth chandlerc at google.com
Mon May 23 14:07:14 PDT 2011


Posting here for comments / questions from the mailing list. To read on the
web, check out the LLVM
<http://blog.llvm.org/2011/05/c-at-google-here-be-dragons.html> or Google
Engineering Tools
<http://google-engtools.blogspot.com/2011/05/c-at-google-here-be-dragons.html>
blogs.


Google has one of the largest monolithic C++ codebases in the world. We have
thousands of engineers working on millions of lines of C++ code every day.
To help keep the entire thing running and all these engineers fast and
productive, we have had to build some unique C++ tools centered on the
Clang C++ compiler. These help engineers understand their code and prevent
bugs before they get to our production systems.

Of course, improving the speed of Google engineers (their productivity)
doesn’t always correlate to *speed* in the traditional sense.
It requires the holistic acceleration of Google’s engineering efforts.
Making any one tool faster just doesn’t cut it; the entire process has to be
improved from end to end.

As a performance junkie, I like to think of this in familiar terms. It’s
analogous to an *algorithmic* performance improvement. You get “algorithmic”
improvements in productivity when you reduce the total work required for an
engineer to get the job done, or fundamentally shift the time scale that the
work requires. However, improving the time a single task requires often runs
afoul of all the adages about performance tuning, 80/20 rules, and the
pitfalls of over-optimizing.

One of the best ways to get these algorithmic improvements to productivity
is to completely remove a set of tasks. Let’s take the task of triaging and
debugging serious production bugs. If you’ve worked on a large software
project, you’ve probably seen bugs which are somehow missed during code
review, testing, and QA. When these bugs make it to production they cause a
massive drain on developer productivity as the engineers cope with outages,
data loss, and user complaints.

What if we could build a tool that would find these exact kinds of bugs in
software automatically? What if we could prevent them from ever bringing
down a server, reaching a user’s data, or causing a pager to go off? Many of
these bugs boil down to simple C++ programming errors. Consider this snippet
of code:

Response ProcessRequest(Widget foo, Whatsit bar, bool *charge_acct) {
  // Do some fancy stuff...
  if (/* Detect a subscription user */) {
    charge_acct = false;
  }
  // Lots more fancy stuff...
}



Do you see the bug? Careful testing and code reviews catch these and other
bugs constantly, but inevitably one will sneak through, because the code *looks
fine*. It says that it shouldn’t charge the account right there, plain as
day. Unfortunately, C++ treats ‘false’ as the integer constant ‘0’, which
converts to a null pointer just as easily as it serves as a boolean flag.
This code sets the local pointer to NULL and never touches the flag the
caller passed in.
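
As a minimal sketch of the intended behavior, the assignment has to go
through the pointer rather than to it. The helper below is hypothetical
(MarkSubscriptionUser and is_subscriber are stand-ins for the elided logic,
not names from the original code):

```cpp
#include <cassert>

// Hypothetical stand-in for the relevant part of ProcessRequest.
// The buggy form was:
//   charge_acct = false;   // overwrites the local pointer, flag untouched
void MarkSubscriptionUser(bool is_subscriber, bool *charge_acct) {
  assert(charge_acct != nullptr);  // The caller must pass a real flag.
  if (is_subscriber) {
    *charge_acct = false;  // Dereference: write to the caller's flag.
  }
}
```

A caller that passes the address of a flag initialized to true would see it
flipped to false for a subscriber and left alone otherwise. Taking the flag
as a 'bool &' instead of a 'bool *' is another option, since assigning
'false' to a reference always writes to the flag itself.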

Humans aren’t good at spotting this type of devious typo, any more than
humans are good at translating C++ code into machine instructions. We have
tools to do that, and the tool of choice in this case is the compiler. Not
just any compiler will do, because while the code above is *one* example of
a bug, we need to teach our compiler to find lots of other examples. We also
have to be careful to make certain that developers will act upon the
information these tools provide. Within Google’s C++ codebase, that means we
break the build for every compiler diagnostic, even warnings. We continually
need to enhance our tools to find new bugs in new code based on new
patterns, all while maintaining enough precision to immediately break the
build and have high confidence that the code is wrong.

To address these issues we started a project at Google which is working with
the LLVM Project <http://llvm.org/> to develop the
Clang<http://clang.llvm.org/> C++
compiler. We can rapidly add warnings to Clang and customize them to emit
precise diagnostics about dangerous and potentially buggy constructs. Clang
is designed as a collection of libraries with the express goal of supporting
diverse tools and application uses. These libraries can be directly
integrated into IDEs and commandline tools while still forming the core of
the compiler itself.

We’ve been working on Clang for over a year now so that it can understand
and reason about all of the C++ code at Google. But building the tools and
technology to catch these bugs is only half the battle; we have to get
engineers to *use* them as well. When other teams at Google respond to
production bugs, our team will often begin working to enable any Clang
diagnostics that might have caught the bug. Within one week of production
issues, we can sweep the entire code base using these diagnostics to fix any
latent bugs.

Recently we enabled the Clang C++ compiler for every C++ build at Google in
order to provide accurate and helpful warnings and diagnostics to engineers.
Some examples of how Clang can help developers with bad code are discussed
in this post
<http://blog.llvm.org/2010/04/amazing-feats-of-clang-error-recovery.html>
on the LLVM blog. Beyond that, once we have swept the codebase with a
bug-finding diagnostic, we can enable it for all our engineers to catch
future bugs before they’re committed. These diagnostics break the entire
build of that piece of software to ensure that they aren’t ignored and are
acted on immediately. For the code sample above, the user gets an error
message:

example1.cc:4:17: error: initialization of pointer of type 'bool *'
from literal 'false' [-Werror,-Wbool-conversions]
    charge_acct = false;
                ^



Here are two other classes of bugs we’ve found:

long kMaxDiskSpace = 10 << 30;  // Thirty gigs ought to be enough for anybody.

void SomeService() {
  // Setup task using external resource...
  while (/* Check if resource is available yet ... */) {
    sleep(0.5);  // Yield the CPU
  }
}



Both now trigger the following errors:

example2.cc:12:25: error: shift result (10737418240) requires 35 bits
to represent, but 'int' only has 32 bits [-Werror,-Wshift-overflow]
long kMaxDiskSpace = 10 << 30;  // Thirty gigs ought to be enough for anybody.
                     ~~ ^  ~~
example2.cc:16:11: error: implicit conversion turns literal
floating-point number into integer: 'double' to 'unsigned int'
[-Werror,-Wliteral-conversion]
    sleep(0.5);
    ~~~~~ ^~~



All of these represent real bugs that we have found in our code, and that we
are catching and fixing with the help of Clang today.
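
For illustration, here is one way the two snippets could be repaired. This
is a sketch under our own assumptions, not the fix that was actually
applied: WaitUntilReady, kPollInterval, and the max_polls bound are
hypothetical names, and the is_ready callback stands in for the elided
resource check.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <thread>

// Widen the left operand first so the shift is evaluated in 64 bits.
// `10 << 30` is evaluated in 'int' and overflows where int has 32 bits.
constexpr std::int64_t kMaxDiskSpace = std::int64_t{10} << 30;
static_assert(kMaxDiskSpace == 10737418240LL, "10 * 2^30 bytes");

// sleep() takes whole seconds as an unsigned int, so the literal 0.5 is
// truncated to 0 and the loop never actually yields for half a second.
static_assert(static_cast<unsigned int>(0.5) == 0u, "0.5 truncates to 0");

// A duration type keeps the half second intact.
constexpr std::chrono::milliseconds kPollInterval{500};

// Hypothetical polling helper: calls is_ready() up to max_polls times,
// sleeping kPollInterval after each unsuccessful check.
template <typename ReadyFn>
bool WaitUntilReady(ReadyFn is_ready, int max_polls) {
  assert(max_polls > 0);
  for (int i = 0; i < max_polls; ++i) {
    if (is_ready()) return true;
    std::this_thread::sleep_for(kPollInterval);
  }
  return false;  // Gave up: the resource never became available.
}
```

The constant comes to 10737418240, the same value the shift-overflow
diagnostic reports. Bounding the number of polls is an extra design choice
on our part, so that a resource that never appears cannot hang the service
forever.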

Clang and its diagnostics don’t in any way obviate the need for careful code
review and thorough testing. Rather, they complement these practices,
combining to help reduce the number of bugs in our code. This is the
platform on which we are developing new and better diagnostics for engineers
going forward. This is how we are providing an algorithmic improvement to
their productivity, and accelerating Google.

Stay tuned for more posts about how we rolled Clang out to Google engineers,
how we have enhanced Clang to make it even more relevant for our code and
our developers’ needs, and some of the exciting tools we’re building on top
of this platform.