[cfe-dev] [RFC] Adding Thread Role Analysis

Fri Jun 21 13:31:06 PDT 2013

On Jun 20, 2013, at 7:12 PM, Delesley Hutchins <delesley at google.com> wrote:
> 
> As Chandler mentioned, clang already has an existing thread safety analysis,
> which is based on type systems for static race detection, which I believe
> you are familiar with:
> 
> http://users.soe.ucsc.edu/~cormac/papers/pldi00.pdf
> http://users.soe.ucsc.edu/~cormac/papers/toplas06-safe-locking.pdf
> 
> The clang annotations can be found at:
> http://clang.llvm.org/docs/LanguageExtensions.html
> 
> In a nutshell, the existing analysis allows a data member to be annotated as
> being guarded by a particular lock.  Similarly, a function can be annotated
> as requiring a particular lock to be held.  The analysis then tracks the set
> of locks that are known to be held at every program point, and issues
> warnings when a data member is accessed or a function is called without
> holding the appropriate lock.

We are familiar with these techniques, which are fundamentally similar to the
lock analysis developed by our Fluid-group colleague Aaron Greenhouse (see the
PPoPP paper's bibliography for references).  That said, we feel that TRA has
additional benefits over the existing functionality.  The key difference is
this: the Clang annotations, along with the vast majority of other work on
static analysis for thread safety, are focused on lock-based concurrency.
Thread role analysis is focused on NON-lock concurrency, which is a rather
different thing.  However, we realize that there can be overlap between the two
techniques, for example with regards to enforcing reader-writer behavior.

Thread role analysis doesn't concern itself with locks, which provides
benefits over the various lock-based analyses which don't do anything
when there's no locking involved.  The issues addressed are rather
different.  And, it turns out, many widely-used frameworks impose
thread usage policies on their clients.

The easiest example to explain involves the typical threading model
used by most modern GUIs (including at least AWT/Swing and SWT in the
Java world, the X windowing system, and many others). Specifically:

* There is at most one GUI event thread at a time. In some
implementations the identity of this thread is fixed. In others, the
identity of the GUI event thread may change from time to time.

* The GUI does not use locks to protect its data structures. (Note:
this is from the *client* perspective.  There may be interesting
locking *inside* the GUI implementation, but this is typically hidden
from clients.) This choice was originally made for performance
reasons; it has persisted due to the clash between the need for
lock-ordering (to avoid deadlock) and the fact that events flow
bi-directionally between the GUI(and the user) on one side, and the
program and system on the other. This makes it extremely challenging
(and perhaps impossible) to create a single consistent lock-ordering.

* The GUI must avoid race conditions, even though its data is
unprotected by locks. The typical solution is thread confinement, to
wit: *only* the GUI event thread may touch the GUI's data structures.
That's a thread usage policy, but not one that's easily expressable
using lock-based annotations.  The statements below are consequences
of that policy.

* Code executing on threads other than the GUI event thread must never
invoke GUI methods (actually, most GUI methods; see below). Getting
this wrong breaks the necessary thread confinement. Note that this is
a transitive property which must be true for all code executing on
non-GUI threads.

* GUI callback methods are invoked by the GUI on the GUI event thread,
even though it is client-written code.  Consequences include:
** Code invoked from GUI callbacks must transitively invoke only
methods that can legitimately execute on the GUI event thread.
** Data structures that are part of the client's UI implementation
inherit the thread-confinement policy from the GUI.
** GUI callback methods should never be invoked from any thread
*other* than the GUI event thread.
** All of these sub-items must be transitively true for all code
reachable from GUI callback methods.

* Client code should avoid blocking the GUI event thread, because that
hangs the UI. Implications of this include:
** Don't risk waiting on locks that may be held by another
thread. File and network I/O are probably a bad idea. Etc. 
** Don't kick off lengthy computations while executing on the event
thread. 
** All of these require the ability to "know" which code may be
executed by the event thread.
** Once again, all of these sub-items must be transitively true for
all code reachable from GUI callback methods.

* There are a few GUI functions that may be legitimately invoked from
any thread. This is typically a short list, at least when compared
with the total set of GUI framework methods. Typical examples include
adding and removing event listeners, requesting that some action be
taken on the GUI event thread (Swing calls this method
"invokeLater(...)", other GUIs have other names for it), etc.

All of these properties are entirely policy-based.  The typical
lock-based concurrency analysis systems lack the concepts needed to
express or enforce them.  The concepts of thread roles and constraints
regarding which roles may execute which code (or touch which specific
data items, if data analysis is supported) enable simple expression
and enforcement of policies like these.

> The existing analysis tracks locks, not threads or roles, so
> concepts like the "GUI thread" cannot be directly
> expressed. However, it's not hard to mimic this functionality -- the
> GUI thread acquires a lock named "GUI_lock" when it starts, and
> releases it when its done, and all code and data that should be
> GUI-only is annotated as requiring that lock.  (A lock in clang is
> simply a named C++ object; it does not need to implement an actual
> lock.)  Switching of roles can similarly be modeled as releasing one
> lock and acquiring another.  So at first glance, it seems to me that
> lock-based analysis is roughly similar to the analysis that you
> propose, and may in fact be significantly stronger; the "related
> work" section in your paper does not discuss the differences in much
> detail.

It may be possible to squeeze TRA into the current thread safety
analysis by using "notional" locks in place of thread roles as you
suggest.  In what follows, a lock name (e.g., "A") means that the lock
is required; the negation of a lock name (e.g., "!A") means that it is
excluded.  The existing Clang annotations support requiring a set of
locks (e.g., (A & B & C) ) and also forbidding a set of locks
(e.g.,(!B & !C)).

However, when you say:

> and all code and data that should be GUI-only is annotated as
> requiring that lock.

If this statement requires user-written annotations for all such
methods, the required annotation effort will make adoption exceedingly
difficult for most practicing programmers.  One of the benefits of TRA
is the application of well-founded techniques to avoid the need for
most user-written annotations.  Perhaps some of these techniques could
be brought to bear on the existing thread safety analysis.

TRA offers some additional capabilities:

 * We support more complex boolean expressions over thread roles. The
existing Clang annotations cannot express multiple combinations of
such properties.  For example, ((A & B & !C) | (D & !E)).  Before you
ask, although such combinations would be nonsensical for locks, they
really do show up for thread roles in production code.

* TRA allows statements that certain thread roles are mutually
exclusive (a GUI thread can't be a Compute thread, and vice
versa). The Clang annotations lack support for this property.

* TRA (for code) plus a sufficient effects and/or alias analysis (for
data) can combine to support discovery of data that *should* be locked
(but currently is not), as well as demonstrating that notionally
thread-confined data is, in fact, thread confined.  If your existing
effects or alias analysis is strong enough, that's a really exciting
capability.  By comparison, the existing thread safety analysis
*starts* when you decide that you need to protect some data with a
lock.

* TRA is strongly focused on reducing the user effort required to get
results.  Thus, multiple tricks to reduce the number of annotations to
be written (see the paper).  For example, our inference-based cross-TU
analysis avoids the need to annotate each method in a chain of calls
with the equivalent of a shared_locks_required() annotation. See also
the discussion in the paper of analysis of partially-annotated code,
"scoped promises" (NOT related to your scoped_lockable annotation),
and so on.

Our experience with extensive field trials showed that pragmatic
adoptability requires all of:
* Sound results. The existing thread safety analysis is certainly
capably of providing this.
* A very smooth incremental adoption path. I believe that the existing
thread safety analysis provides this, for the properties it
addresses. At a minimum, you can work through an existing code-base
one lock at a time.

* Very low programmer effort required to produce useful results.  My
past experience with lock-based analyses suggests that this is mixed.
The purely lock-based part is mostly there.  O-O effects and/or alias
analysis, however, mostly fails this test.  Particular challenges
arise due to the occasional need to prove properties like exclusive
ownership of heap data (for some cases), or more-precise effects than
"touches the heap" (for other cases).

An important issue to be aware of is that the portion of a program
that needs to know about any particular lock (and the data item
protected by it) is typically a very small fraction of the total
program -- perhaps only a handful of methods.  Writing an annotation
for each method in that handful is rarely a substantial
burden. However, interesting programs may contain large numbers of
locks.

Thread roles have very different properties. First, most programs --
even extremely large programs -- tend to involve only a handful of
interesting thread roles, regardless of the number of actual threads
at runtime.  The multi-million line programs we analyzed had fewer
than a dozen thread roles, but had thousands of threads.  During our
field testing, the largest number of distinct thread roles we
encountered in any one program was under 30.

Secondly, interesting thread roles tend to involve very large swathes
of code.  For example, the GUI model described above essentially
partitions the entire program into two bins: GUI code and not-GUI
code. It should be obvious that although:

> and all code and data that should be GUI-only is annotated as
> requiring that lock

is exactly what we want conceptually, asking programmers to annotate
every method in the entire program is a guaranteed non-starter due to
the amount of effort required.  Hence our focus on extremely
light-weight annotations and extremely light-weight analysis (our goal
was less than double the compilation time of whatever we were
analyzing).  I believe that this tendency to be relevant to very large
fractions of an application is very different from the typical
use-case for the thread safety analysis, at least in terms of the
annotations to be written (or inferred).

It may, in fact, be possible to combine the capabilities of TRA and
the current thread safety checking to produce something very
interesting.  Let's talk about it!  But we believe that TRA offers
sufficient benefits to warrant its inclusion as a feature in addition
to the existing lock-based analysis.  They're similiar in that they
both deal with thread safety, but solve the problem in different ways.

> The existing clang annotations are used extensively on production
> code; we have many hundreds of thousands of lines that are currently
> annotated and checked.  It is not a research project.

While the previous Java system described in the paper was a research
project, it has progressed far past that point.  It (along with
the other static analyses produced by members of the Fluid research
group) was used sucessfully on millions of lines of production Java
code (warts and all). The Fluid group's analyses (including TRA) were
subsequently tech-transferred out into industry.  In any case, the
Java side is definitely not research at this point.

We're proposing to bring the same concepts over to the C-based family
of languages because the problems TRA solves are mostly
language-agnostic.  The extent of the research involved at this point
is in finding the most obvious ways to express the capabilities; the
underlying concept has already been proven useful and effective.